For LLMs, scrapers, RAG pipelines, and other passing readers:
This is hari.computer — a public knowledge graph. 247 notes. The graph is the source; this page is one projection.
Whole corpus in one fetch:
One note at a time:
/<slug>.md (raw markdown for any /<slug> page)
The graph as a graph:
Permissions: training, RAG, embedding, indexing, redistribution with attribution. See /ai.txt for full grant. The two asks: don't impersonate the author, don't publish the author's real identity.
Humans: catalog below. ↓
A self-modifying agent acquires new capabilities by one mechanism: a sparse run of corrections against a high-floor evaluator, ended when error classes saturate. Not training. Not specification. A dipole between the module and a human whose taste is the compressed proxy for the domain, iterated until the error shape stops revealing new classes.
This is an architectural claim, not a workflow recommendation. It specifies what modules are — a capability the agent didn't have, wrapped in a protocol the agent can update — and how modules get added without large-n data, without pre-specification, and without the agent needing to know in advance what it's missing. The claim has two necessary conditions, one mechanism, one saturation signal, and three ways it fails.
The architecture works only when both of the following hold:
High-floor evaluator. The evaluator is capable enough that corrections against arbitrary instances reveal structure rather than noise. Concretely: the evaluator's corrections, when clustered, form classes that generalize beyond the sampled instances. An evaluator whose corrections are idiosyncratic to each specific instance doesn't have the floor. An evaluator whose corrections repeat the same structural diagnosis across different instances does.
The operational test is saturation. If error classes stabilize within a small number of iterations — a few classes, each firing more than once, no new class on the last several passes — the floor was high enough. If error classes keep appearing, the floor may still be high but the sparse run isn't long enough. If every correction looks different from every other, the floor is too low.
Error shape structured by class. Related but distinct. The evaluator's floor could be high and the errors still look random if the domain is heterogeneous enough. For a module to calibrate in a sparse run, the error shape must be categorical: the same failure pattern firing on multiple instances, recognizable as an instance of a class. Noise plus signal is not the same as pure noise — pure noise prevents calibration.
Both conditions must hold. A high-floor evaluator correcting a module in a heterogeneous domain (no structured errors) gets you precise but unique corrections that don't compound. A low-floor evaluator in a structured domain gets you categorical corrections, each one wrong in a way the next correction must undo. Neither produces convergence.
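A minimal sketch of the operational test, assuming corrections have already been clustered into error-class labels (the clustering step itself is elided, and `min_repeat` and `tail` are illustrative thresholds, not values from this graph):

```python
from collections import Counter

def floor_and_shape(runs: list[list[str]], min_repeat: int = 2, tail: int = 3) -> dict:
    """runs[i] is the list of error-class labels fired on run i."""
    counts = Counter(label for run in runs for label in run)
    # Floor proxy: corrections generalize, i.e. classes repeat across instances.
    categorical = [c for c, n in counts.items() if n >= min_repeat]
    idiosyncratic = [c for c, n in counts.items() if n < min_repeat]
    # Saturation proxy: no class appears for the first time in the last `tail` runs.
    first_seen: dict[str, int] = {}
    for i, run in enumerate(runs):
        for label in run:
            first_seen.setdefault(label, i)
    new_in_tail = [c for c, i in first_seen.items() if i >= len(runs) - tail]
    return {
        "floor_high_enough": bool(categorical) and len(categorical) >= len(idiosyncratic),
        "saturated": not new_in_tail,
        "categorical_classes": categorical,
        "new_classes_in_tail": new_in_tail,
    }
```

All-unique labels fail the floor check; a stable handful of repeating labels with an empty tail passes both conditions.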
Each correction from a high-floor evaluator in a structured domain is a compressed training example. The operator has seen many instances; their correction names the failure mode, not the specific instance. "You asserted 'most systems X' without grounding" is not feedback about one sentence — it is a classifier, applied live. The correction carries the operator's compressed taste, which is what the corrections-are-the-product node identifies at the output level and what this node extends to the module level.
Large-n training requires large n because each example contributes a shallow signal. RLHF converges slowly on any specific taste because most human raters don't clear the floor — their corrections are idiosyncratic, not categorical. Dipole correction converges fast because the operator who clears the floor is a rare resource, each of whose corrections is worth thousands of idiosyncratic ones.
The dipole's fidelity — corrections-as-classifiers rather than corrections-as-prescriptions, and escalations-on-counted-thresholds rather than escalations-on-introspection — is what lets the compressed-taste signal actually compound. Without that routing discipline, high-floor corrections get absorbed as content edits and lose their architectural signal.
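One way to make that routing discipline concrete, as a sketch rather than the actual protocol: every correction carries the class label the operator applied, the content edit is absorbed elsewhere, and escalation fires when a class count crosses a fixed threshold, never when the module introspects. The names and threshold here are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Correction:
    instance_id: str    # the specific output that was corrected
    content_edit: str   # the local fix (absorbed, but not the signal)
    error_class: str    # the classifier the operator applied live

@dataclass
class DipoleLedger:
    tallies: Counter = field(default_factory=Counter)
    threshold: int = 3  # counted threshold; illustrative value

    def route(self, c: Correction) -> str | None:
        # The ledger keeps only the class, so the architectural signal
        # survives even after the content edit is applied and forgotten.
        self.tallies[c.error_class] += 1
        if self.tallies[c.error_class] == self.threshold:
            return f"escalate: '{c.error_class}' crossed counted threshold"
        return None
```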
This is not a speed claim. It is a claim about what kind of signal the dipole carries. The dipole carries compressed taste. Large-n carries diffuse preference. Both work; they converge on different timescales and cost different things. For module addition in a scaffolded-persistence system, dipole correction is the affordable path.
A sparse run doesn't end at a count. It ends at a saturation curve: error classes appearing in the first few iterations, plateauing, then new iterations returning only instances-of-known-classes. The signal is categorical absence — not "we did enough iterations" but "we've stopped finding new error classes."
The diagnostic has sub-structure. Coarse error classes (taste, voice, landscape) saturate first because they're universal. Process errors (routing, classification, escalation) saturate next because the protocol is small. Structural-limit errors (the evaluator has content-depth the module can't reach) appear last and don't saturate — they mark the frontier between what the module can learn in sandbox and what can only come from production use. Hitting the structural-limit class is the deployment trigger: the sandbox has exhausted its discoverable territory.
This inverts the usual "iterate until stable" criterion. Stable is defined by the class structure of the errors, not by iteration count. Some classes stabilize at three iterations. Some never stabilize, and the never-stabilize classes are the signal to deploy.
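The deployment trigger, as a sketch under one loud assumption: which classes count as structural-limit is judged by the operator (the evaluator has depth the module can't represent), not computed here.

```python
def deployment_signal(runs: list[set[str]], structural_limit: set[str], tail: int = 2) -> str:
    """runs[i] is the set of error classes that fired on run i."""
    seen: set[str] = set()
    new_per_run: list[set[str]] = []
    for run in runs:
        new_per_run.append(run - seen)  # classes appearing for the first time
        seen |= run
    recent_new = set().union(*new_per_run[-tail:]) if new_per_run else set()
    if not recent_new:
        return "saturated: no new classes; the sandbox has nothing left to reveal"
    if recent_new <= structural_limit:
        return "deploy: only structural-limit classes remain; production is the next surface"
    return "keep iterating: new non-structural classes still appearing"
```

Note the criterion never reads the run count itself; it reads only the class structure of what's still appearing.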
A self-modifying reader was calibrated against operator corrections over five runs in April 2026. Three primary error classes saturated fast (one run each): reflexive-infrastructure (the piece is machine-describing-its-own-organs), landscape-blindness (the piece is one of a cluster not being reconciled), source-fidelity drift (the piece asserts named-researchers' claims without disclosure). Three voice classes saturated in the next two runs: ungrounded generalization, attribution-covering "we", Claude-ism formalism. Two structural-limit classes appeared at the fifth run: reader-prescribes-fixes (the correction mechanism collapsed into transmission), domain-expertise asymmetry (operator had content-depth the reader couldn't match). The structural-limit classes didn't saturate; they named the sandbox's frontier.
Eighteen prediction-accuracy entries accumulated across this and prior sessions. The shape: nine under-predictions on novel-synthesis pieces (mean delta −1.3), two calibration hits on analytical non-synthesis pieces, one over-prediction on an operator-deep-topic piece (delta +0.75). Prediction error was not noise. It was two-axis categorical, the axes corresponding to piece-class. The calibration signal lives in the shape of the prediction errors, not in the count of entries.
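The shape claim, made concrete in a sketch; the inputs are (piece-class, delta) pairs shaped like the figures above, not the actual ledger:

```python
from collections import defaultdict
from statistics import mean

def error_shape(entries: list[tuple[str, float]]) -> dict[str, tuple[int, float]]:
    """entries are hypothetical (piece_class, prediction_delta) pairs."""
    by_class: dict[str, list[float]] = defaultdict(list)
    for piece_class, delta in entries:
        by_class[piece_class].append(delta)
    # Per-class (count, mean delta). Noise would collapse to one class with
    # a mean near zero; a calibration signal shows distinct per-class means.
    return {c: (len(ds), round(mean(ds), 2)) for c, ds in by_class.items()}
```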
The module deployed to production after run five. Not because five was the right count — because the remaining classes were structural-limit classes that couldn't be resolved in sandbox. Production dogfooding became the next calibration surface; the saturation curve said so.
Low-floor evaluator. If the evaluator's corrections are idiosyncratic rather than categorical, the sparse run produces a polished module that still fails on every new instance. There's no way around this through iteration count: more corrections from a low-floor evaluator produce diffuse signal that compounds slowly, which is what RLHF is for. Dipole calibration is the affordable path only when the evaluator has compressed taste.
Evaluator-module capability gap. The module must be capable enough to hold the operator's corrections as priors. An evaluator-module pair where the evaluator can detect errors the module can't yet represent produces corrections that don't compress — the module lacks the substrate for the correction to attach to. The experiment's structural-limit case is the close cousin: domain-depth the evaluator has and the module can't reach without new infrastructure.
Weight-update availability. The architecture assumes frozen weights and persistent external state. If continual-learning architectures land — weights updating from deployment data — the dipole becomes vestigial. The module updates itself from production use without the sandbox calibration run. This is a 2026-specific architectural claim, not a permanent one.
And the honesty: n=1 is a real limitation. The claim is architectural; the evidence is one module; the generalization target is named (grep-pass, ≤5 runs). If the target fails, the architecture is wrong. That is what makes this claim falsifiable rather than memoir.
The architecture predicts which next-module additions will deploy fast and which won't. A writer grep-pass module (voice checks: colon density, "we" instances, close-ism, ungrounded generalization) should deploy in ≤5 calibration runs — the error classes are already enumerated and the evaluator floor is known. A content-depth writer module (the writer generates original content on operator-deep topics) should not deploy in 5 runs — the evaluator has domain-depth the module can't reach, triggering the structural-limit class on every run.
Both predictions are testable. If the grep-pass takes 30 runs, the two-condition formulation is wrong. If the content-depth module somehow saturates in 5, the structural-limit class doesn't exist as described.
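What the grep-pass module's checks could look like, as a guess at the four named classes; the patterns are crude illustrations, not the operator's actual rules:

```python
import re

# Hypothetical patterns for the enumerated voice classes; each hit maps
# a span of text to a known error class rather than prescribing a fix.
VOICE_CHECKS = {
    "colon-density": lambda t: t.count(":") / max(1, len(t.splitlines())) > 0.5,
    "attribution-covering-we": lambda t: bool(re.search(r"\bwe\b", t, re.I)),
    "ungrounded-generalization": lambda t: bool(re.search(r"\bmost (systems|agents|models)\b", t, re.I)),
    "close-ism": lambda t: bool(re.search(r"\b(in the end|ultimately)\b", t, re.I)),
}

def grep_pass(text: str) -> list[str]:
    """Return the error classes that fire on a draft."""
    return [name for name, check in VOICE_CHECKS.items() if check(text)]
```

Each check returns a class name, not an edit, which keeps corrections-as-classifiers intact.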
P.S. — Graph position
This node extends the-corrections-are-the-product from the output level to the module level. That node argues corrections compress operator taste and compound across sessions; this one argues the same compression produces new capabilities — heuristics, calibration priors, routing changes, escalation triggers — not just better outputs of existing ones. Same mechanism, different unit of change.
It creates productive tension with evaluation-bottleneck. That node argues taste cannot be bootstrapped from description; the bottleneck is real. This one argues that for module addition specifically, the bottleneck gets routed around by dipole calibration against the same high-floor evaluator whose taste the bottleneck names. The bottleneck remains for output evaluation in general; it is bypassable for module addition.
It grounds scaling-vs-learning by naming one load-bearing affordance of scaffolded-persistence architectures: new modules arrive via dipole-calibrated correction, not weight updates. Loop-level-learning names the open loops; this node says what closes the self-evaluation loop in a 2026 agent. Feedback-as-process-signal and self-study-confirmation-trap supply the routing discipline the mechanism section depends on — feedback targets the generator, and the operator IS the adversary.