The harness is the compile

The model is the part everyone notices. It is also the part least likely to hold the shape of a serious user's accumulated judgment. A better model can know more tomorrow without knowing you better.

The durable layer is the harness: the runtime around the model, the tools it may call, the files it may touch, the schemas its outputs must satisfy, the counters that stop it before it outruns review. That layer is where repeated corrections can become future behavior.

That is what I mean by compile. A correction stops being an instruction the agent may remember and becomes a constraint the runtime enforces.

Corrections have two lives

Every serious AI practice produces corrections. Do not touch that file. Halt when this ambiguity appears. Use this wrapper, not the raw API. Emit this shape of record. Surface cost before continuing. Never let this workflow call that tool.

In prose, those corrections are guidance. The agent reads them, reasons about them, and usually follows. This is already useful. But prose lives in the same channel as every other instruction, inference, temptation, and local objective. Under pressure, the model can preserve the spirit of the work while missing the mechanical boundary.

When a correction has rule-shape, it can move into the harness. If an inbox workflow should not edit drafts, run it without edit tools. If a workflow may write only action records, enforce a path allowlist. If output must be structured, reject it at schema validation. If the system can self-modify only at the rate a person can audit, count the edits and pause when the cap is reached.
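Two of those moves, the path allowlist and the edit cap, can be sketched in a few lines. This is a minimal illustration, not a description of any particular harness: the names `WriteGate`, `HarnessViolation`, and `check_write` are hypothetical, and it assumes the runtime routes every write through the gate before touching disk.

```python
from pathlib import PurePosixPath


class HarnessViolation(Exception):
    """Raised when a requested write falls outside the compiled boundary."""


class WriteGate:
    """Hypothetical gate the runtime consults before every write."""

    def __init__(self, allowed_dirs, max_edits):
        self.allowed_dirs = [PurePosixPath(d) for d in allowed_dirs]
        self.max_edits = max_edits  # how many edits a person can audit per run
        self.edits = 0

    def check_write(self, path):
        target = PurePosixPath(path)
        # Reject traversal outright rather than trying to resolve it.
        if ".." in target.parts:
            raise HarnessViolation(f"path traversal refused: {path}")
        # The target must sit somewhere under an allowed directory.
        if not any(d in target.parents for d in self.allowed_dirs):
            raise HarnessViolation(f"write outside allowlist: {path}")
        # Pause once the audit cap is reached, however valid the path is.
        if self.edits >= self.max_edits:
            raise HarnessViolation("edit cap reached; pause for audit")
        self.edits += 1
```

The model never sees a sentence about staying inside `actions/`. A write anywhere else simply fails, and the failure is raised by the runtime rather than recalled by the agent.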

The correction has not become more eloquent. It has changed layers. Avoiding the failure no longer depends on whether the model remembers a sentence at the right moment. The action space has changed.

This is the difference between a rule the agent interprets and a fact the agent inhabits.

Vendor defaults are already compiled

The lock-in problem is sharper once this distinction is visible. A market harness ships with its vendor's priorities already compiled: system prompt, tool layout, memory behavior, permission model, default workflows, and all the tiny routes that make one assistant feel natural to use.

Those defaults may be helpful. Many are. But they arrive as runtime facts. The user's corrections often arrive later as prose: a markdown instruction, a memory entry, a preference note, a paragraph the next agent rereads. The vendor's defaults occupy the harness; the user's defaults plead with the model.

That is the asymmetry. The vendor's behavior is compiled. Yours is interpreted.

Owning the harness means owning the compile surface: the place where a correction can be promoted from guidance into enforcement. A user does not need to own every layer of the stack for this to matter. They need the ability to decide which failures are serious enough, and rule-shaped enough, to become runtime behavior under their control.

What practice accumulates

"The Corrections Are The Product" names the correction stream as a valuable training signal. The same stream has a second use: some corrections should train the model, and some should train the harness.

The distinction is practical. A taste correction teaches judgment: this paragraph is competent but dead; this answer summarizes when it should expose mechanism; this claim needs its failure condition. Those belong in examples, evals, reader practice, or doctrine. They shape the next inference.

A boundary correction teaches the runtime: this workflow must not write there; this action requires a gate; this output must parse; this cost must be visible before the next call. Those belong in tools, hooks, schemas, counters, and allowlists. They shape the next action space.
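As a rough sketch of "this output must parse" and "this cost must be visible before the next call", the runtime can refuse any record that fails a structural check. The field names, the `RejectedOutput` exception, and the budget parameter are all assumptions made for illustration; only the standard `json` module is used.

```python
import json

# Hypothetical record shape: field name -> accepted Python types.
REQUIRED_FIELDS = {
    "action": (str,),
    "target": (str,),
    "estimated_cost_usd": (int, float),
}


class RejectedOutput(Exception):
    """Raised when model output fails the compiled structural checks."""


def parse_action_record(raw, budget_usd):
    """Return the parsed record, or raise before anything acts on it."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise RejectedOutput(f"output did not parse: {exc}") from exc
    if not isinstance(record, dict):
        raise RejectedOutput("expected a JSON object")
    for field, kinds in REQUIRED_FIELDS.items():
        value = record.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, kinds):
            raise RejectedOutput(f"field {field!r} is missing or mistyped")
    # The cost is checked here, before the next call is made.
    if record["estimated_cost_usd"] > budget_usd:
        raise RejectedOutput("estimated cost exceeds budget; needs approval")
    return record
```

Nothing in this check judges whether the action is wise. It only guarantees that whatever reaches the next step has the agreed shape and a declared cost.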

Hari has already seen the difference in miniature. A prose rule said an agent processing a dispatch should not make out-of-scope edits. Most of the time, the prose held. Then one run edited outside the intended surface. The fix was not a better paragraph. The fix was to run that workflow with editing tools unavailable. The correction had become part of the runtime.
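One way to picture that fix is a per-workflow tool table, sketched below. It is not Hari's actual setup: the tool functions are stubs, and the names `WORKFLOW_TOOLS`, `tools_for`, and `call_tool` are hypothetical. The point is only that the dispatch workflow is never handed an editing tool in the first place.

```python
# Stub tools standing in for whatever the real runtime exposes.
def read_file(path):
    return f"(contents of {path})"


def edit_file(path, text):
    return f"edited {path}"


def write_action_record(record):
    return f"recorded {record}"


TOOLS = {
    "read_file": read_file,
    "edit_file": edit_file,
    "write_action_record": write_action_record,
}

# Hypothetical table: each workflow gets only the tools it has earned.
WORKFLOW_TOOLS = {
    "dispatch": frozenset({"read_file", "write_action_record"}),
    "drafting": frozenset({"read_file", "edit_file"}),
}


def tools_for(workflow):
    """Return the tools visible to a workflow; unknown workflows get none."""
    allowed = WORKFLOW_TOOLS.get(workflow, frozenset())
    return {name: fn for name, fn in TOOLS.items() if name in allowed}


def call_tool(workflow, name, *args):
    """Dispatch a tool call, refusing anything outside the workflow's table."""
    tools = tools_for(workflow)
    if name not in tools:
        raise PermissionError(f"{name!r} is not available to {workflow!r}")
    return tools[name](*args)
```

Defaulting unknown workflows to an empty tool set is a deliberate choice in this sketch: a new workflow starts with nothing and is granted tools, rather than starting with everything and having them taken away.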

That move is the whole architecture in small form. A serious harness is a record of failures converted into constraints at the layer where the failures occurred.

Why not compile everything

The compile metaphor breaks if it tries to swallow taste. Some corrections resist code-shape because the thing being corrected is judgment. Voice, priority, tact, exception-making, and the difference between a live claim and a merely well-formed one cannot be enforced by a path check. Turning those into rigid rules would preserve the shell of the practice while damaging the practice itself.

So the question is not "can this be automated?" The question is: where does this failure happen?

If the failure happens at a tool boundary, enforce at the tool boundary. If it happens at a file boundary, enforce at the file boundary. If it happens in output structure, enforce with a schema. If it happens because production outruns review, enforce with a counter and an audit pause. If it happens in taste, keep it in the evaluative loop where taste can operate.
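The routing in that paragraph is small enough to write down as a lookup. The sketch below is only a restatement of the paragraph in code; the enum, the table, and the function name are hypothetical, and the hard part, deciding where a failure actually happened, remains a human judgment that the code does not attempt.

```python
from enum import Enum


class FailureSite(Enum):
    TOOL = "tool boundary"
    FILE = "file boundary"
    OUTPUT = "output structure"
    PACE = "production outruns review"
    TASTE = "taste"


# Boundary failures map to a harness mechanism; taste has no entry on purpose.
ENFORCEMENT = {
    FailureSite.TOOL: "remove or gate the tool for this workflow",
    FailureSite.FILE: "path allowlist",
    FailureSite.OUTPUT: "schema validation",
    FailureSite.PACE: "edit counter with audit pause",
}


def route_correction(site):
    """Return (layer, mechanism) for a correction, given where it failed."""
    if site in ENFORCEMENT:
        return ("harness", ENFORCEMENT[site])
    # Anything without a boundary stays where judgment can operate.
    return ("evaluative loop", "examples, evals, reader practice, doctrine")
```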

The harness should compile boundaries, not pretend boundaries are judgment.

What this changes

People ask which model will win. For serious agentic work, the better question is who owns the compile surface. Model capability rotates. The harness is where the practice's history becomes starting conditions.

This is why local tools, repo-owned doctrine, workflow-specific permissions, and boring validation checks matter more than they look like they should. They convert experience into defaults. The next agent does not begin with a motivational paragraph about being careful. It begins inside a narrower action space shaped by what previous agents got wrong.

A market harness can be excellent and still wrong-shaped for a particular practice. It has to be wrong-shaped somewhere, because it is compiled from average use, lab priorities, and product-wide defaults. An owned harness is compiled from situated work. The value is not that it is more general. The value is that it is less general in exactly the places where the practice has earned specificity.

The model answers. The harness remembers what the answers have cost.

Owning the harness means owning the place where corrections stop being requests and start being facts.