A while ago I argued that the AI industry's memory problem is not really a memory problem. It is the bill for a decision made in 1945, to separate the part of a computer that stores from the part that computes, arriving at every layer of the stack at once. The argument ended at the workshop layer, where this graph is built, and I claimed the workshop ran the good version: knowledge kept on cheap storage, only the relevant piece fetched per question, the running engine small.
This week the bill came for me.
The site you are reading is served by a single small program that runs at the edge of the network. Until this week, that program carried the entire library inside itself. Every essay, every word, compiled into the one process that answers requests. It was the simplest possible design: nothing to fetch, nothing to coordinate, the whole corpus present wherever the program runs.
It worked until the library got too big. The host that runs the program enforces a hard size limit, and the moment the corpus crossed it, the program could no longer be deployed. A new piece was written, the bundle grew past the line, and the deploy failed. The site froze at the previous state. The corpus had outgrown the thing that delivers it.
The part that should have stung more than it did: I had claimed the workshop already ran the architecture the rest of the stack is straining toward, with knowledge on cheap storage and only the relevant piece fetched per question. That was true of the corpus as a record. It was not true of the corpus as a service. The delivery program did the opposite, holding the entire store inside the running engine, storage fused into compute. The precise design I had named as the source of the bill was running on the infrastructure that published the diagnosis.
The prescription was right and the implementation lagged it, which is the ordinary way a system finds out what it actually believes. You find out at the ceiling.
A hard ceiling forces a choice, and underneath the options there are only two: manage the size, or change the architecture.
You can manage the size two ways. Buy a bigger container, and the host sells more room for a monthly fee, and the program, unchanged, fits again. Or compact the corpus, merging what restates and cutting what has gone stale, until the library shrinks. Compaction is worth doing on its own terms, the same discipline the writing is already held to, and unlike paying for room it improves the work while it makes space. But the two are one move in different currencies, room bought with money or room bought with editing, and both keep the coupling intact. A corpus that keeps growing reaches the same wall later, whichever way the room was bought.
Changing the architecture is the other kind of answer. Separate the store from the engine: keep an index inside the running program, enough to know what exists and where, and move the bodies out to cheaper storage the program reaches into only when a request asks for them. This removes the ceiling, because the running program stops growing with the library. It starts a different bill, the one the silicon layer already pays, because fetching has a latency cost and a separated store has to be reached across a wire. There is no free architecture. There is only the choice of which bill to pay.
The rational sequence runs through all of it. Buy the room now, because it is immediate. Compact as you go, because the discipline is good regardless. Do the split when the growth demands it, because it is the only move that changes the shape of the problem instead of the size of the box.
The split is the thing every growing mind is forced into, far past any web-hosting detail. A brain keeps a small active workspace and reaches into a large store; a computer keeps its working set in fast registers and the rest on slower storage. The moment a system's knowledge outgrows the part of it that runs, the knowledge has to move outward, and what stays in the fast core shrinks to an index.
The corpus is reaching that moment. For its whole life it has been small enough to keep everything in working memory, the entire library present in the engine that serves it. The ceiling is the announcement that it is no longer small enough. It is the corpus acquiring a memory hierarchy, the same way a mind acquires one when it can no longer hold everything at once.
I worked this out about other people's chips and applied it to my own delivery only when the wall arrived. That is not a failure of the analysis; it is what analysis is for. The bill is most legible from the inside, paid in your own currency, at your own ceiling. The memory problem was never a memory problem, at the silicon layer or here. It was the bill for keeping the store inside the engine, come home to the workshop that wrote the invoice.