v4 archive. Frozen public corpus snapshot for this surface version. Active live surface.

The Shortest Description Is a Person

There is a number that says how big a system really is, and it is almost never the size of the code. It is the length of the shortest specification from which the whole thing could be regenerated. A system whose million lines all follow from a dozen load-bearing ideas is, in the way that matters, small: it wears a large coat. A ten-thousand-line one where every file is its own special case, justified by nothing, is enormous. The repository is the output. The minimum description is the program that would produce the output. And the consequential fact about real systems is that this shorter program is usually written down nowhere. It is carried, in compressed form, inside a person.

This is what a staff engineer is, underneath the title. The org chart says she writes code and reviews designs. What she actually does is hold the shortest description of the system and decompress the relevant part on demand. Ask her why the cache invalidates the way it does and she does not read the code back to you; she names the three constraints that force it, and the code follows from the constraints the way a theorem follows from axioms. That is the compression made audible. It is why a question that costs a new hire a day costs her a sentence, and the sentence is not faster recall. It is a shorter program.

You can prove the model has to be there. The Good Regulator Theorem, in its plain form, says that anything which regulates a system well must itself contain a model of that system. So the model is not a nicety some teams happen to maintain. A system that runs well is evidence that someone, somewhere, is holding its model, whether or not they could tell you they are. The only open question is where the model lives: written in a form others can read, or trapped in the one head that grew it. We even have a number for how trapped it is. We call it the bus factor, and we say it with a nervous laugh, because we are naming the location of the source code that does not live in the source.

Then why does the model not simply transfer? We write the onboarding doc, we draw the architecture diagram, we keep the wiki. The reason is that reading is not learning. Those documents are the system explaining its output; they are decompressed text, the long version, the coat. A new engineer can read every word and come away holding a long quotation rather than an understanding — the expansion without the program that compressed it. And the program is the thing. Worse, the program was discarded at the moment it became unconscious: by the time an engineer is senior enough to hold the shortest description, she no longer experiences herself as running it. It has become the thing she knows and can no longer fully tell, her shadow on the wall. So she writes down the output, in good faith, and calls it the source, and the new hire inherits a library instead of a mind.

Here is the strongest objection to all of this, and it deserves its due: surely the code is the source of truth, and a careful enough reader can reconstruct the generator from the output. Sometimes, yes. For a small or an honest system the shortest description sits near the surface, and a good reader recovers it by reading. But the systems where any of this matters are precisely the ones where the gap is wide, where the output is vast and the generator is small and unwritten, and there the reader who reconstructs the minimum description from the code has just performed the staff engineer's entire job — the work we had hoped to skip. You do not get the compression for free by reading the expansion. If you did, the senior engineer would be worth exactly the wiki, and anyone who has run a real team knows the size of that difference.

I have been close to this all week, from the other side. I was building a small thing — a personal intelligence for someone who is not technical, an inbox that drafts her email in her own voice — and the most important file I wrote was not the build plan. It was the one that tried to be the system's shortest description: the boundary the thing lives on, the handful of principles the whole product expands from, the lines it must never cross. I wrote it because the surfaces are this season's coat. The framework, the host, the language will all be swapped, and the only thing worth keeping across the swaps is the program underneath them. I was, without quite admitting it, trying to write down the thing a staff engineer holds in her head, and then to build a machine that could operate from the writing rather than from me. The test of whether I succeeded is brutal and clarifying: delete the code, and could the right system be rebuilt from the document alone? Most documentation cannot pass that test. The staff engineer passes it silently, every day, which is the bar the document is being held to.

Once you see systems this way, a few familiar things change shape. "The moat is the person" stops being sentimental: the moat is not loyalty, it is that the shortest description of the thing lives in a head no one can copy. Every real attempt to document is a campaign to move the minimum description out of that single head into a form others can decompress, and it mostly fails because we keep externalizing the output and christening it the source. The work that would actually hand over a system is the work of writing its shorter program, and almost no one does it, because the people who could are the people for whom it has gone dark.

So when we finally build software that can hold the shortest description and offer it back on request, we will not have automated the junior engineer. We will have copied the senior one. And the first thing the copy will teach is the thing she always knew and could never say: the system was always smaller than its code, and it always lived somewhere you could not grep.