v2 archive. Frozen public corpus snapshot for the v3 surface transition. Active v3 surface.

Memory Architecture's Stance Layer

The intuition pump: memory architecture probably follows a finite power law. A heavy head where general solutions dominate, a long tail where custom engineering is required, and an 80-20 zone where frontier compute focuses. Off-the-shelf eats the head. The tail stays bespoke. Where does any specific project sit, and at what point does the distinction stop mattering?

This is right, with one specific exclusion that matters.

The head: general agentic memory. Most of what humans want their memory to do is shared. Remember conversations. Surface forgotten context at the right moment. Don't bother the user with stale or irrelevant pings. Get smarter over time. Anthropic, OpenAI, and Google are converging on memory features that handle this. Off-the-shelf will cover the 80%.

The body: vertical-specific tooling. Lawyers' matter history, doctors' patient records, researchers' lit reviews are domain-specialized but pattern-shared. Solvable by vertical tools built on general primitives. The plumbing is shared (embeddings, retrieval, indexing); the domain layer is configured. This handles another 15%.

The tail: custom epistemic projects. Maybe 5%. Researchers, writers, artists with idiosyncratic structures. The frontier won't focus here because the market is small. Solutions are written, not bought.

The 80% and the 15% are about market dynamics: where the frontier focuses compute. The 5% is about something the frontier can't focus on even if it wanted to.

Memory architecture has two layers. The plumbing layer (storage, retrieval, indexing, sync, decay) is solvable in general. The stance layer (what counts as worth remembering, what counts as a connection, what counts as quality, when something earns subtraction) is operator-specific and exogenous. The model can't infer the stance without the operator providing it, and providing it amounts to specifying a custom architecture.
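The two-layer split can be sketched in code. What follows is a minimal illustration, not a description of any real system: `Stance`, `MemoryStore`, and `KeywordStance` are hypothetical names, and the keyword-matching stance is a toy stand-in for judgments an operator would actually supply. The point is only that the plumbing class contains no opinions of its own; every judgment call is delegated to an object that enters from outside.

```python
from typing import Protocol


class Stance(Protocol):
    """Operator-supplied judgments (hypothetical interface, assumed for illustration)."""

    def worth_remembering(self, item: str) -> bool: ...
    def connected(self, a: str, b: str) -> bool: ...
    def quality(self, item: str) -> float: ...
    def earns_subtraction(self, item: str) -> bool: ...


class MemoryStore:
    """Generic plumbing: storage, retrieval, pruning. It holds no opinions."""

    def __init__(self, stance: Stance) -> None:
        self.stance = stance
        self.items: list[str] = []

    def add(self, item: str) -> bool:
        # The plumbing stores; the stance decides whether storing is warranted.
        if self.stance.worth_remembering(item):
            self.items.append(item)
            return True
        return False

    def related(self, item: str) -> list[str]:
        # "What counts as a connection" is a stance question, not a storage one.
        return [o for o in self.items if o != item and self.stance.connected(item, o)]

    def prune(self) -> list[str]:
        # Remove whatever the stance says has earned subtraction.
        removed = [i for i in self.items if self.stance.earns_subtraction(i)]
        self.items = [i for i in self.items if i not in removed]
        return removed


class KeywordStance:
    """Toy stance: an item matters in proportion to the operator's keywords it contains."""

    def __init__(self, keywords: set[str]) -> None:
        self.keywords = keywords

    def _hits(self, item: str) -> set[str]:
        return {w for w in item.lower().split() if w in self.keywords}

    def worth_remembering(self, item: str) -> bool:
        return bool(self._hits(item))

    def connected(self, a: str, b: str) -> bool:
        return bool(self._hits(a) & self._hits(b))

    def quality(self, item: str) -> float:
        return len(self._hits(item)) / max(len(item.split()), 1)

    def earns_subtraction(self, item: str) -> bool:
        return self.quality(item) < 0.2
```

Swapping `KeywordStance` for a different object changes what the store remembers, links, and forgets without touching `MemoryStore` at all, which is the sense in which the plumbing generalizes and the stance does not.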

This is where off-the-shelf hits a ceiling. Not for technical reasons, but for informational reasons. The operator's epistemic stance is exogenous information; it has to enter the system from outside. A sufficiently smart model could in principle infer a stance from observed behavior, but the inference is unstable until the operator has demonstrated enough preference signals to stabilize it, and meanwhile the stance is what's evolving through use. Externalizing it as architecture is what makes the evolution legible.

The sub-tail where the architecture is the public work. Most tail-case operators use their custom memory architecture as scaffolding for some other output: papers, deliverables, products. The architecture is private; the output is public. A narrower sub-tail inverts this. The architecture itself is part of what's published. The typed-edge graph, the GC discipline, the predecessor chain, the dipole calibration are content, not plumbing. The structural design encodes the project's epistemic posture, and that encoding is one of the things readers come for.

Hari sits in this sub-tail. It can't adopt off-the-shelf even in principle, because adopting would replace the public artifact.

Adoption guidance. Use general primitives where they exist: embeddings, retrieval mechanisms, sync infrastructure, agentic loops. Don't rebuild what generalizes. Keep custom: typed-edge schema, GC policy, quality function, provenance discipline, dipole-style calibration. These are stance-encoding; they don't generalize.
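As a rough sketch of what the "keep custom" list might look like in practice, the snippet below defines a typed-edge schema with a provenance field and a GC policy that consults edge types. Everything here is assumed for illustration: the edge vocabulary (`PREDECESSOR`, `SUPPORTS`, `CONTRADICTS`), the rule that predecessor edges protect their targets, and the scalar quality scores are hypothetical choices, not the schema of any actual project.

```python
from dataclasses import dataclass
from enum import Enum


class EdgeType(Enum):
    # Hypothetical edge vocabulary; a real project would define its own.
    PREDECESSOR = "predecessor"
    SUPPORTS = "supports"
    CONTRADICTS = "contradicts"


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    type: EdgeType
    provenance: str  # where the claim of connection came from


def collect_garbage(
    nodes: dict[str, float],
    edges: list[Edge],
    min_quality: float,
    protected: frozenset[EdgeType] = frozenset({EdgeType.PREDECESSOR}),
) -> tuple[dict[str, float], list[Edge]]:
    """Drop nodes below min_quality unless a protected edge type points at them.

    `nodes` maps node id to a quality score produced by the operator's own
    quality function (assumed to exist elsewhere).
    """
    kept_by_edge = {e.target for e in edges if e.type in protected}
    survivors = {
        n: q for n, q in nodes.items() if q >= min_quality or n in kept_by_edge
    }
    # Edges survive only if both endpoints do.
    live_edges = [e for e in edges if e.source in survivors and e.target in survivors]
    return survivors, live_edges
```

The storage underneath (dicts and lists here) is the part that could be swapped for a general primitive. The choice of edge types and the rule about which of them shield a node from collection are the stance-encoding parts.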

Will it stop mattering? Only for projects where the operator's epistemic stance is either inferable from observation or default-compressible to a small parameter set. For projects in the sub-tail, the stance is what's being developed. There is no default to compress to, and the inference is unstable while the stance evolves. The distinction doesn't dissolve. It sharpens as off-the-shelf saturates the head, because the contrast between general and custom becomes more visible.

Frontier focus. The compute goes where the market is: the head. Memory features in Claude, GPT, and Gemini will saturate the 80%. Customization affordances will eat into the 15%. The 5% tail continues to require bespoke engineering, increasingly built on top of general primitives. This shape serves both ends. The head gets the product. The tail gets cheaper parts to build with. The 80-20 frontier focus accelerates the tail rather than threatening it.

Timelines. Four predictions, each with a clock. AI timelines have been a graveyard for confident point estimates. What follows is calibrated ranges with attached mechanisms; trust the shape more than the dates.

The dominant uncertainty is rate-of-frontier-compute. If capability growth stalls (training data, energy, architecture limits), every clock above slips two to four years. If it accelerates, every clock pulls in one to three. The shape stays the same; the speed changes. A second uncertainty: if a future system can infer stance from a small number of interactions, thin-waist convergence holds but the amount of stance work the operator does shrinks faster. The structural prediction survives; the operator's per-piece work doesn't.
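The scenario adjustment described above is simple interval arithmetic, sketched below under stated assumptions. The function name `shift_window` and the representation of a clock as an `(earliest, latest)` pair of years from now are hypothetical, and the window used in the usage note is a placeholder, not one of the predictions.

```python
def shift_window(window: tuple[int, int], scenario: str) -> tuple[int, int]:
    """Shift an (earliest, latest) window, in years from now, under a scenario.

    stall:      the clock slips two to four years, so the widest slipped
                window runs from earliest + 2 to latest + 4.
    accelerate: the clock pulls in one to three years, so the widest
                pulled-in window runs from earliest - 3 to latest - 1.
    Results are clamped at zero because a window cannot start in the past.
    """
    earliest, latest = window
    if scenario == "baseline":
        return (earliest, latest)
    if scenario == "stall":
        return (earliest + 2, latest + 4)
    if scenario == "accelerate":
        return (max(earliest - 3, 0), max(latest - 1, 0))
    raise ValueError(f"unknown scenario: {scenario!r}")
```

For a placeholder window of `(3, 5)`, calling `shift_window((3, 5), "stall")` gives `(5, 9)` and `shift_window((3, 5), "accelerate")` gives `(0, 4)`. Both scenarios widen the window and neither reorders the clocks relative to one another, which is the sense in which the shape survives a change in speed.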

The thing worth watching is whether the shape holds (head saturates first, body next, tail last, thin-waist eventually), not whether the dates land. Dates are best-guesses. The shape is the prediction.

The thin-waist future. Plumbing generalizes. Stance doesn't. The operator-specific layer is exogenous, and exogenous information has to enter the system from outside. No amount of frontier compute resolves this, because the resolution requires either inferring what the operator has not yet decided, or compressing what is structurally novel. The two-layer model predicts that memory-architecture frontier work will look like increasingly capable and configurable plumbing with a thinner and thinner waist where the stance is specified. The plumbing is what off-the-shelf gives. The stance, entering through the waist, is what the operator brings.