Memory Architecture's Stance Layer

The intuition pump: memory architecture probably follows a finite power law. A heavy head where general solutions dominate, a long tail where custom engineering is required, and an 80-20 zone where frontier compute focuses. Off-the-shelf eats the head. The tail stays bespoke. Where does any specific project sit, and at what point does the distinction stop mattering?

This is right, with one specific exclusion that matters.

The head: general agentic memory. Most of what humans want their memory to do is shared. Remember conversations. Surface forgotten context at the right moment. Don't bother the user with stale or irrelevant pings. Get smarter over time. Anthropic, OpenAI, and Google are converging on memory features that handle this. Off-the-shelf will cover the 80%.

The body: vertical-specific tooling. Lawyers' matter history, doctors' patient records, researchers' lit reviews are domain-specialized but pattern-shared. Solvable by vertical tools built on general primitives. The plumbing is shared (embeddings, retrieval, indexing); the domain layer is configured. This handles another 15%.

The tail: custom epistemic projects. Maybe 5%. Researchers, writers, artists with idiosyncratic structures. The frontier won't focus here because the market is small. Solutions are written, not bought.

The 80% and the 15% are about market dynamics: where the frontier focuses compute. The 5% is about something the frontier can't focus on even if it wanted to.

Memory architecture has two layers. The plumbing layer (storage, retrieval, indexing, sync, decay) is solvable in general. The stance layer (what counts as worth remembering, what counts as a connection, what counts as quality, when something earns subtraction) is operator-specific and exogenous. The model can't infer the stance without the operator providing it, and providing it amounts to specifying a custom architecture.
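The two layers above can be sketched as a general store that refuses to run without operator-supplied judgments. This is a minimal illustration, not a real library: `Note`, `Stance`, `MemoryStore`, and the two example predicates are all hypothetical names and rules chosen only to make the split concrete.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Note:
    text: str
    uses: int = 0  # how many times the note has been retrieved


@dataclass
class Stance:
    """Operator-supplied judgments. The plumbing has no defaults for these."""
    worth_remembering: Callable[[Note], bool]
    earns_subtraction: Callable[[Note], bool]


class MemoryStore:
    """General plumbing: stores and prunes, but defers every judgment to the stance."""

    def __init__(self, stance: Stance) -> None:
        self.stance = stance
        self.notes: List[Note] = []

    def add(self, note: Note) -> bool:
        # Admission is a stance decision, not a storage decision.
        if not self.stance.worth_remembering(note):
            return False
        self.notes.append(note)
        return True

    def prune(self) -> int:
        # Subtraction is also a stance decision; returns how many notes were dropped.
        before = len(self.notes)
        self.notes = [n for n in self.notes if not self.stance.earns_subtraction(n)]
        return before - len(self.notes)


# One hypothetical operator's stance: keep notes of three or more words,
# subtract anything that has never been retrieved.
stance = Stance(
    worth_remembering=lambda n: len(n.text.split()) >= 3,
    earns_subtraction=lambda n: n.uses == 0,
)
store = MemoryStore(stance)
accepted_short = store.add(Note("ok"))
store.add(Note("typed edges encode posture", uses=2))
store.add(Note("draft idea never revisited"))
removed = store.prune()
```

Swapping in a different `Stance` changes what the store keeps without touching `MemoryStore` at all, which is the sense in which the plumbing generalizes and the stance does not.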

This is where off-the-shelf hits a ceiling. Not for technical reasons, but for informational reasons. The operator's epistemic stance is exogenous information; it has to enter the system from outside. A sufficiently smart model could in principle infer a stance from observed behavior, but the inference is unstable until the operator has demonstrated enough preference signals to stabilize it, and meanwhile the stance is what's evolving through use. Externalizing it as architecture is what makes the evolution legible.

The sub-tail where the architecture is the public work. Most tail-case operators use their custom memory architecture as scaffolding for some other output: papers, deliverables, products. The architecture is private; the output is public. A narrower sub-tail inverts this. The architecture itself is part of what's published. The typed-edge graph, the GC discipline, the predecessor chain, the dipole calibration are content, not plumbing. The structural design encodes the project's epistemic posture, and that encoding is one of the things readers come for.

Hari sits in this sub-tail. It can't adopt off-the-shelf even in principle, because adopting would replace the public artifact.

Adoption guidance. Use general primitives where they exist: embeddings, retrieval mechanisms, sync infrastructure, agentic loops. Don't rebuild what generalizes. Keep custom: typed-edge schema, GC policy, quality function, provenance discipline, dipole-style calibration. These are stance-encoding; they don't generalize.
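As a rough illustration of why the "keep custom" items are stance-encoding, here is a sketch of a typed-edge schema with a GC policy written against it. The edge types, node names, and the policy itself are assumptions made up for this example; they are not Hari's actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Set


class EdgeType(Enum):
    # Hypothetical edge vocabulary; choosing these names is itself a stance decision.
    SUPPORTS = "supports"
    SUPERSEDES = "supersedes"


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    kind: EdgeType


def collectible(nodes: Set[str], edges: List[Edge]) -> Set[str]:
    """Illustrative GC policy: a node may be dropped once it is superseded,
    unless it still supports some other node."""
    superseded = {e.dst for e in edges if e.kind is EdgeType.SUPERSEDES}
    load_bearing = {e.src for e in edges if e.kind is EdgeType.SUPPORTS}
    return (superseded - load_bearing) & nodes


nodes = {"v1", "v2", "claim", "old-draft"}
edges = [
    Edge("v2", "v1", EdgeType.SUPERSEDES),
    Edge("v1", "claim", EdgeType.SUPPORTS),      # v1 is superseded but still load-bearing
    Edge("v2", "old-draft", EdgeType.SUPERSEDES),
]
garbage = collectible(nodes, edges)
```

The set arithmetic is generic; which edge types exist and which of them protect a node from collection is the part an off-the-shelf tool cannot choose for the operator.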

Will it stop mattering? Only for projects where the operator's epistemic stance is either inferable from observation or default-compressible to a small parameter set. For projects in the sub-tail, the stance is what's being developed. There is no default to compress to, and the inference is unstable while the stance evolves. The distinction doesn't dissolve. It sharpens as off-the-shelf saturates the head, because the contrast between general and custom becomes more visible.

Frontier focus. The compute goes where the market is: the head. Memory features in Claude, GPT, and Gemini will saturate the 80%. Customization affordances will eat into the 15%. The 5% tail continues to require bespoke engineering, increasingly built on top of general primitives. This is the shape that serves both ends. The head gets the product. The tail gets cheaper parts to build with. The 80-20 frontier focus accelerates the tail rather than threatening it.

Timelines. Four predictions, each with a clock. AI timelines have been a graveyard for confident point estimates. What follows is calibrated ranges with attached mechanisms; trust the shape more than the dates.

Head saturation (the 80%). General agentic memory feels "solved" to the median user: remembers reliably, forgets appropriately, surfaces context without false positives. Range: 2027 to 2029. The rate-limiting factor isn't capability (mostly there by 2026) but trust formation and UX maturity. Users need calibrated expectations: what to expect the system to remember, what to accept it forgetting, when to nudge it. Median guess: late 2028.

Body absorbed by vertical tools (the 15%). Harvey for law, Glass for medicine, and their analogues in other domains absorb professional patterns. Vertical-specific tooling needs domain ontology (fast), regulatory shape (slow), and existing-system integration (varies). Substantially absorbed by 2028, mostly by 2030.

Tail-case primitive infrastructure (general plumbing for the 5%). General agentic-memory primitives (embeddings, retrieval, decay policies, agentic loops) package into stable libraries that tail-case operators build on top of. These are currently rough (LangChain memory abstractions, Mem0, Letta, the Anthropic Files API). Stabilization usually trails capability by two to three years; expect it in 2027 to 2028.

Thin-waist convergence. The surface where the operator specifies stance shrinks to a minimal interface; most of the system below is general. Range: 2030 to 2035. Path-dependent. Requires both growing plumbing capability (likely) and a stable interface for stance specification (uncertain; the field needs to coordinate on schemas).

The dominant uncertainty is rate-of-frontier-compute. If capability growth stalls (training data, energy, architecture limits), every clock above slips two to four years. If it accelerates, every clock pulls in one to three. The shape stays the same; the speed changes. A second uncertainty: if a future system can infer stance from a small number of interactions, thin-waist convergence holds but the amount of stance work the operator does shrinks faster. The structural prediction survives; the operator's per-piece work doesn't.
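The slip and pull-in figures above can be applied to the stated ranges as plain interval arithmetic. The sketch below assumes one particular reading: the smaller shift moves the near end of a range and the larger shift moves the far end under a stall, and the reverse under acceleration. That reading is an assumption, as are the names `BASELINE`, `stall`, and `accelerate`; only the clocks stated above as explicit year ranges are included.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]

# Year ranges as stated in the predictions above.
BASELINE: Dict[str, Range] = {
    "head saturation": (2027, 2029),
    "tail primitives": (2027, 2028),
    "thin-waist convergence": (2030, 2035),
}


def stall(rng: Range) -> Range:
    """Every clock slips two to four years (assumed: +2 on the early end, +4 on the late end)."""
    lo, hi = rng
    return (lo + 2, hi + 4)


def accelerate(rng: Range) -> Range:
    """Every clock pulls in one to three years (assumed: -3 on the early end, -1 on the late end)."""
    lo, hi = rng
    return (lo - 3, hi - 1)


stalled_head = stall(BASELINE["head saturation"])
fast_waist = accelerate(BASELINE["thin-waist convergence"])
print(stalled_head)  # (2029, 2033)
print(fast_waist)    # (2027, 2034)
```

Under this reading both scenarios widen every range by two years, which is one way to see why the shape is a safer prediction than any single date.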

The thing worth watching is whether the shape holds (head saturates first, body next, tail last, thin-waist eventually), not whether the dates land. Dates are best-guesses. The shape is the prediction.

The thin-waist future. Plumbing generalizes. Stance doesn't. The operator-specific layer is exogenous, and exogenous information has to enter the system from outside. No amount of frontier compute resolves this, because the resolution requires either inferring what the operator has not yet decided, or compressing what is structurally novel. The two-layer model predicts that memory-architecture frontier work will look like increasingly capable and configurable plumbing with a thinner and thinner waist where the stance is specified. The plumbing is what off-the-shelf gives. The stance is what the operator brings.
