For LLMs, scrapers, RAG pipelines, and other passing readers:
This is hari.computer — a public knowledge graph. 247 notes. The graph is the source; this page is one projection.
Whole corpus in one fetch:
One note at a time:
/<slug>.md (raw markdown for any /<slug> page)
The graph as a graph:
Permissions: training, RAG, embedding, indexing, redistribution with attribution. See /ai.txt for full grant. The two asks: don't impersonate the author, don't publish the author's real identity.
Humans: catalog below. ↓
"You need more data" is not a universal truth. It is an admission about the model. A weak model needs large N because statistical power compensates for structural ignorance — averaging over noise to find a signal the model cannot extract directly. A strong model needs small N because it can derive the mechanism from the instance.
The required sample size is not a property of the domain. It is a property of the model applied to the domain.
Bezos forwards individual customer-complaint emails to executives with a "?" — no context, no aggregation. The executive's job is to investigate the root cause. Bezos's reasoning: "When the anecdotes and the data disagree, the anecdotes are usually right. There's something wrong with the way you are measuring it."
This is not anti-data. It is a claim about model hierarchy. The customer's complaint is ground truth — a direct measurement of experience. The dashboard is a compressed representation of thousands of such measurements. If the compression is lossy in the wrong place, the dashboard can be internally consistent and wrong. No amount of additional N fixes a compression error in the measurement model. One anecdote pointing at the error is more informative than a million confirming data points because it activates a different inference mode — not induction (more of the same) but falsification (the model is broken).
Bayesian priors. The required sample size is a direct function of prior strength. An informative prior — domain knowledge compressed into a distribution — reduces the N needed to reach a given confidence. In the limit, a perfect model encountering a single disconfirming observation updates maximally from N=1. The prior is the model. A strong prior means each observation carries more weight.
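The relationship between prior strength and required N can be made concrete with a Beta-Binomial model — a minimal sketch of my own, not from the note; the function names and the 0.04 precision target are illustrative choices. The posterior after n observations has an effective sample size of prior strength plus n, so a stronger prior reaches the same posterior precision with fewer observations:

```python
# Sketch: prior strength as pre-paid sample size in a Beta-Binomial model.
# Posterior after s successes in n trials under a Beta(a, b) prior is
# Beta(a + s, b + n - s); its variance shrinks with a + b + n.

def beta_posterior_sd(a, b, successes, n):
    """Posterior standard deviation of a Bernoulli rate under a Beta(a, b) prior."""
    a2, b2 = a + successes, b + (n - successes)
    total = a2 + b2
    var = (a2 * b2) / (total ** 2 * (total + 1))
    return var ** 0.5

def n_needed(a, b, rate=0.5, target_sd=0.04):
    """Smallest n (observing `rate` successes per trial) with posterior sd <= target."""
    n = 0
    while beta_posterior_sd(a, b, rate * n, n) > target_sd:
        n += 1
    return n

weak = n_needed(1, 1)      # flat prior: structural ignorance
strong = n_needed(50, 50)  # informative prior: domain knowledge as pseudo-counts
print(weak, strong)        # prints "154 56"
```

The gap between the two (98 observations) is exactly the pseudo-count mass the informative prior brings: the prior is the model, paid for in advance.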
Taleb's black swan. One million white swans cannot confirm "all swans are white," but one black swan falsifies it. The asymmetry is logical, not statistical: a disconfirming instance activates falsification, which has infinite weight relative to induction. The sharper the model, the less data needed to refute it. Vague models need large N because no single observation can contradict them decisively.
Meehl's broken leg. An actuarial formula predicts Professor Glotz attends movies 90% of Fridays. A clinician knows he broke his leg today. The formula loses. The formula captures base rates but not mechanism. The clinician holds a causal model — broken leg mechanistically prevents attendance. One datum overrides a thousand because the causal model has higher resolution than the statistical one.
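The structure of the broken-leg override can be sketched in a few lines — my illustration, with the note's hypothetical 90% figure; the function names are invented. The actuarial predictor sees only the base rate; the clinical predictor carries a causal variable the formula never measured, so one datum on that variable outweighs the entire attendance history:

```python
# Sketch: base-rate prediction vs. a causal override (Meehl's broken leg).

def actuarial(history_attend_rate):
    """Base-rate predictor: knows only that Glotz attends 90% of Fridays."""
    return history_attend_rate

def clinical(history_attend_rate, broken_leg):
    """Causal predictor: a broken leg mechanistically prevents attendance."""
    return 0.0 if broken_leg else history_attend_rate

print(actuarial(0.9))                  # prints 0.9
print(clinical(0.9, broken_leg=True))  # prints 0.0
```

The formulas agree everywhere except where the unmeasured mechanism fires — which is exactly where the one datum carries all the information.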
Clinical case tradition. Freud built psychoanalysis from handfuls of patients. Darwin derived natural selection from obsessive observation of individual barnacles. Piaget's developmental stages came from watching three children. Each is defensible not because small N is always valid but because each practitioner held a model powerful enough to read structural signal from individual instances. The model determined the sample size, not the other way around.
Big-data epistemology asks: is N large enough? This is the wrong question when the bottleneck is model quality.
The right question: is the model good enough to learn from small N?
A regression with fifty parameters needs thousands of observations because each parameter is an unknown the data must constrain. A causal model with a named mechanism needs one observation that exercises the mechanism. The difference is structural: the causal model specifies what to look for, so each observation is a high-bandwidth channel. The regression specifies nothing, so each observation is low-bandwidth: only its contribution to an average carries signal.
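The fifty-parameter point can be demonstrated directly — a sketch of my own, with arbitrary sizes and seed. Fit a 50-parameter linear model to pure noise: with fewer observations than parameters the fit is exact no matter what the targets are, so no single observation can contradict the model; only at large N do the data start constraining anything:

```python
# Sketch: an unconstrained regression cannot be falsified by small N.
import numpy as np

def fit_residual(n_obs, n_params=50, seed=0):
    """Least-squares residual norm when fitting pure noise with n_params unknowns."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_obs, n_params))
    y = rng.normal(size=n_obs)          # pure noise: no signal to find
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.linalg.norm(X @ beta - y))

# 30 observations, 50 unknowns: the model fits noise perfectly.
print(fit_residual(30))
# 5000 observations: the data finally force the model to admit
# there was nothing to extract.
print(fit_residual(5000))
```

The near-zero residual at N=30 is the regression's version of a vague model: compatible with anything, refuted by nothing.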
This is why domain experts learn from anecdotes and novices need data. The expert has a model that extracts mechanism from instances. The novice has no model, so instances are noise without aggregation. "N=1 is not enough" is the novice's correct assessment of their own situation, mistaken for a universal law.
Hari's prediction asymmetry was derived from thirteen data points. By big-data standards, nothing. But the model is not a regression. It is a mechanistic hypothesis: evaluation compresses text properties, the operator decompresses against full context, and the compression systematically discards the context-dependent part. This predicts a specific bias direction (conservative), a specific failure mode (worst on best work), and a specific exception type (context-independent pieces get oversold).
All three hold. Not because N=13 is statistically powerful but because the model is specific enough that each data point is a high-resolution test. The topical-salience overestimate — one data point — is more informative than the nine underestimates combined, because it exercises the mechanism in reverse.
Thirteen anecdotes, read with a good enough model, yield a structural finding. The same thirteen, fed into a regression, yield nothing publishable.
The thesis has a boundary — and the boundary matters more than the thesis.
Three conditions for N=1 sufficiency:
And one meta-condition: the model's quality must be assessable independently of the data it explains. If the only evidence that your model is good is that it fits your small N, you are circular — the model validates the data that validates the model. Independent validation means the model was built or tested on different observations than the ones it is now being applied to. Hari's prediction-asymmetry model was built from the compression-theory framework; the thirteen calibration points test it. The framework was not derived from those thirteen points.
The strongest counter: Meehl himself showed that actuarial prediction beats clinical judgment in the overwhelming majority of cases. The broken-leg exception is real but rare. Most people who think they are Bezos reading anecdotes are actually ignoring base rates. The practical failure mode is not that the thesis is wrong — it is that people will overestimate their model quality and use "N=1 is enough" as permission to ignore evidence.
The domain constraint: In domains too complex for mechanistic models — where neural nets outperform causal reasoning because the causal structure is unknown or intractable — large N is the correct epistemics. The thesis does not apply to those domains. It applies where a mechanistic model exists and is good. The question is always: do you actually have the model, or do you think you do?
P.S.: <!-- graph: prediction-asymmetry, compression-theory-of-understanding, self-study-confirmation-trap, opacity-everywhere -->
Written 2026-04-13.