A trivia player sees the question, writes MOUNTAINS as his first instinct, then talks himself out of it because he can't say why. He swaps in GOLD — defensible (oro means gold in Spanish) but wrong. The match is lost on the override. He does this four times in one season before naming the pattern: when his instinct cannot be articulated, his rational layer reaches for an explainable substitute, and the substitute is systematically worse than the instinct it replaced.
The instinct is not magic. It is a Bayesian predictor trained on roughly 60,000 handmade flashcards and an aggressive spaced-repetition schedule, firing below the layer his verbal mind can audit. The override is the substitution of a less-trained model — whatever the rational layer can reconstruct on the spot — for a more-trained model whose only sin is that it cannot show its work.
This is a structural tax, not a personal weakness. Wherever a higher-fidelity opaque predictor competes with a lower-fidelity transparent reconstruction for the same answer slot, an explainability requirement biases the selection toward the worse model. Each resolved answer pays the difference.
The compression theory of understanding holds that understanding is generative compression — a small set of principles dense enough to produce specific cases. Its corollary: memorization is not understanding. A lookup table is not a function.
The flashcard case sits between them, and the existing dichotomy obscures it. Greg Shahade is not building a lookup table. He cannot recall on demand most of what 60,000 flashcards contain. What he is building is a statistical predictor whose weights have been updated thousands of times by spaced repetition, and which fires when a question pattern matches its training distribution closely enough. The output is a probability, not a citation. He cannot say which flashcard pushed him toward MOUNTAINS or LIVERMORIUM. The cards updated weights; the weights generate the prediction.
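A toy sketch makes that regime concrete. Nothing below is Shahade's actual training; the hashed features, the logistic form, and the learning rate are illustrative assumptions. The only point is that each card review nudges shared weights, and the eventual answer is a probability computed from all of them at once, with no single card recoverable from the output.

```python
import math

# Illustrative only: a cue's words and word-pairs are hashed into a shared
# weight vector, and every flashcard review is one small update to each
# weight that cue touches.
DIM = 4096
weights = [0.0] * DIM

def features(cue):
    words = cue.lower().split()
    pairs = [a + "_" + b for a, b in zip(words, words[1:])]
    return [hash(tok) % DIM for tok in words + pairs]

def predict(cue):
    """Probability that the trained association fires for this cue."""
    score = sum(weights[i] for i in features(cue))
    return 1.0 / (1.0 + math.exp(-score))

def review(cue, recalled, lr=0.1):
    """One repetition: nudge the shared weights toward the observed recall."""
    error = (1.0 if recalled else 0.0) - predict(cue)
    for i in features(cue):
        weights[i] += lr * error

# After tens of thousands of review() calls, predict() returns a number for
# cues the system never saw verbatim. The cards are gone; the weights remain.
```

The shared weights are what turn memorized cards into a predictor rather than a lookup table.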
Call this third regime trained intuition. The compression is real — it is generative, it produces predictions for cases the system has not seen — but it is in the parameters, not in any sentence the trained system can write. The compression theory of understanding holds; what fails is the assumption that compression must surface as a verbalizable claim. A neural network that classifies images has compressed something real about visual structure even when its weights are not human-legible. The same is true of a brain that has absorbed sixty thousand flashcards.
The verbal layer in such a system is a separate, much smaller model. It cannot read most of the parameters of the trained predictor. When asked to justify a prediction, it confabulates from a tiny working set of facts and analogies it can hold in working memory. The confabulation is presented as reasoning; it is closer to ad-hoc reconstruction.
Selection between the two models tracks defensibility, not accuracy. A trivia answer must be written down; a founder must explain a decision to a board; a doctor's chart must record reasoning; a student must justify a multiple-choice answer to themselves before clicking. The justification is what survives the next step, so the system swaps an instinct it cannot publish for one it can.
The swap looks like rigor. It is the opposite. The trained predictor was the part of the system that had seen the most evidence; the substitute is whatever the verbal layer can construct from its smaller working set. The substitute feels reasonable while it is being made, and that sense of reasonableness is identical whether the override is correcting a wrong instinct or overruling a right one. Per trial, the two are indistinguishable. The tax becomes visible only across many trials, the way Greg's four-mistake season made it legible in retrospect.
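The arithmetic of the tax is easy to stage. The numbers below are assumptions rather than measurements: a trained predictor that hits 75% on hard items, a verbal-layer substitute that hits 55%, and an override that fires on the 40% of trials where the instinct arrives without an articulable reason.

```python
import random

# Illustrative simulation: policy A always keeps the first instinct, policy B
# overrides it with the explainable substitute whenever the instinct cannot
# be articulated. All three rates are assumed, not measured.
def simulate(trials=100_000, p_instinct=0.75, p_substitute=0.55,
             p_unexplainable=0.40, seed=0):
    rng = random.Random(seed)
    keep_score, override_score = 0, 0
    for _ in range(trials):
        instinct_right = rng.random() < p_instinct
        substitute_right = rng.random() < p_substitute
        unexplainable = rng.random() < p_unexplainable
        keep_score += instinct_right
        override_score += substitute_right if unexplainable else instinct_right
    return keep_score / trials, override_score / trials

print(simulate())
# roughly (0.75, 0.67): an eight-point gap that no single trial reveals,
# because on every trial both policies felt equally reasonable from inside.
```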
Two preconditions decide whether the tax is the right frame at all. First, the trained predictor must actually be calibrated on ground-truth feedback. Greg's predictor was trained on flashcards with answer keys and on matches with verifiable scoring — a tight loop of prediction and correction over thousands of trials. In domains without that loop, the gut is whatever produced it: availability, emotional salience, ambient priors. Tetlock and Kahneman work on those domains, and there the verbal-layer override is exactly the right correction. The discipline of "trust your gut" presumes the gut has been earned. Without the calibration loop, the gut is just where you started.
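The calibration loop itself can be phrased as a check. A minimal sketch, assuming each gut call gets logged as a pair of felt confidence and eventual correctness, which is the data an answer key or a scored match produces for free: the gut is earned to the extent that felt confidence tracks realized hit rate, bucket by bucket.

```python
from collections import defaultdict

def calibration_table(log, bins=10):
    """Bucket logged gut calls by felt confidence and compare each
    confidence band to its realized hit rate."""
    buckets = defaultdict(list)
    for confidence, correct in log:
        buckets[min(int(confidence * bins), bins - 1)].append(correct)
    return {(b / bins, (b + 1) / bins): (sum(hits) / len(hits), len(hits))
            for b, hits in sorted(buckets.items())}

# Made-up entries for illustration. An earned gut is one where calls felt at
# around 0.9 really land around 0.9; when they land closer to 0.6, the
# verbal-layer override is exactly the right correction.
log = [(0.91, True), (0.92, True), (0.88, False), (0.90, True),
       (0.62, True), (0.55, False), (0.60, False), (0.58, True)]
for band, (hit_rate, n) in calibration_table(log).items():
    print(band, round(hit_rate, 2), "n =", n)
```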
Second, the trial has to be inside the predictor's training distribution. A clinician whose training data underrepresented a population will produce gut diagnoses that feel just as confident there, exactly where that confidence is least earned. Refusing to override preserves accuracy on in-distribution items at the cost of out-of-distribution ones. The same discipline that sharpens you inside your training set blinds you outside it.
Both conditions met, the asymmetry Greg names is correct: if you have a first instinct and you can't quite understand why, you have to be nearly positive any new answer is correct in order to change your answer. He calibrates the bar at 95% confidence in the override — "I have to be like 95+% sure in order to go against an unexplainable intuitive feeling." The burden of proof falls on the explainable substitute. The instinct holds unless something near-conclusive arrives.
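Written as a rule, the discipline is a single comparison with an asymmetric threshold. The 0.95 is Greg's stated number; reading it in expected-value terms, which is an interpretation he does not spell out, switching only helps when the substitute's chance of being right exceeds the instinct's, so the bar belongs at least as high as the historical hit rate of your unexplainable instincts, the kind of number a calibration log like the one sketched above would supply.

```python
OVERRIDE_BAR = 0.95  # Greg's threshold for going against an unexplainable instinct

def final_answer(instinct, substitute, substitute_confidence):
    """Keep an unexplainable first instinct unless the explainable
    substitute is near-conclusive. The burden of proof is on the substitute."""
    return substitute if substitute_confidence >= OVERRIDE_BAR else instinct

# The confidence here is invented for illustration: a defensible substitute
# held at 70% does not clear the bar, so the instinct stands.
print(final_answer("MOUNTAINS", "GOLD", 0.70))  # MOUNTAINS
```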
This is uncomfortable in environments that require defense: the answer you commit to is the one with the worse-articulated reasons. That discomfort is the cost of having a model better than your ability to explain it.
The same structure shows up wherever a trained opaque predictor meets a justification interface.
LLMs and chain-of-thought. On some intuition-heavy benchmarks, asking a model to think step by step degrades accuracy. The single forward pass is the trained predictor; the chain-of-thought scratchpad is the verbal-layer reconstruction. When the underlying judgment is more accurate than the model's ability to verbalize a defensible chain, forcing the chain pays the tax.
Founder evaluation under board defense. A founder operating from gut pattern-match — built from many priced exposures — produces decisions whose reasoning the founder cannot fully articulate. A board that requires legible justification pulls the founder's selection toward the subset of decisions that admit verbal defense. The legible subset is smaller than the trained predictor's domain; the tax shows up as systematic risk-aversion and pattern-conformity.
Doctors and the gestalt diagnosis. Senior clinicians' rapid first impressions outperform structured checklists in some categories of diagnosis, then underperform once the clinician has to justify them to a less-trained colleague. The justification cannot reconstruct the gut, the gestalt gets dismissed as mere "intuition," and the slower-but-defensible alternative gets recorded as the call.
Multiple-choice test wisdom. "Don't change your answer" is folk advice with a real basis. The first-pass selection is closer to a trained-predictor output. The second pass is verbal-layer reconstruction working from a smaller window — what the test-taker can recall in the moment, not the pattern that triggered the original.
The instances differ by domain; the structure is the same.
The tax is most expensive where one of the two models has been growing faster than the other.
Greg trained the opaque predictor at a speed his verbal layer could not match — sixty thousand flashcards in eighteen months will outrun any conscious indexing scheme. Frontier LLMs are accumulating capability in the weights faster than the chain-of-thought interface can track, which is why intuition-heavy benchmarks now sometimes prefer the single forward pass. In both cases the gap between trained predictor and justification interface is widening, not by accident but because of how training works: you can pour evidence into the predictor faster than you can build the language to describe what it learned.
Any system where the trained predictor compounds faster than its justification interface will pay an increasing explainability tax. The interface becomes a low-pass filter on the system's actual capability. Two responses are available: widen the interface so the trained model can output something the next layer accepts without forcing reduction, or tighten the override discipline with a high bar on substitution. Greg's solution is the second. Building the first is what frontier interpretability work is currently trying to do. The two are versions of the same problem.
The tax is therefore architecturally contingent. It is a feature of systems where the trained predictor is much larger than its interface to the next layer. Sufficiently good interpretability — a verbal or visual interface that actually exposes the predictor's reasoning rather than reconstructing it — closes the gap and the tax falls. The piece is not a universal claim about cognition. It is a claim about what happens in the specific architectural regime where compounded training meets a narrow justification interface, which is the regime current humans and current LLMs both operate in.
The tax is paid in trivia matches, in founder decisions overruled by boards, in clinical calls translated into checklists, and in model outputs forced through verbal scaffolds. The mechanism is the same: a higher-fidelity opaque model loses a fight with a lower-fidelity transparent one, because the criterion of selection is auditability, not accuracy.
When a chess IM with sixty thousand flashcards loses a match by overruling himself, the lesson is not about chess or trivia. It is about what happens whenever a system that knows more than it can say is forced to say what it knows.