Consensus as Counterfactual

Brian Potter's "How Long Do We Wait for New Inventions" does something that, four years ago, no individual researcher could have done in a working career. He asks an LLM, for 190 inventions, when a working example could have been built given era-appropriate workshops and skilled engineers. He spot-checks the answers, aggregates the gaps, reports the distribution. The piece exists because LLM-as-prior is now a usable research method: compress a corpus of historical writing into a queryable structure, query at scale, spot-check the outputs against the literature. The findings are downstream. The methodology is the contribution.

Take the methodology as the artifact, and the findings need a different reading.

Potter reports that 64% of the 166 inventions Claude dated had an "earliest plausible" date within 50 years of actual invention, 90% had "earliest straightforward" within 50 years, and 75% of post-1900 inventions had straightforward dates within 10 years. The surface conclusion: invention cycles accelerate; the gap between feasibility and realization has narrowed.
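The arithmetic behind figures like these is a share-within-threshold count over per-invention gaps. A minimal sketch, assuming each record pairs an LLM-supplied feasibility year with the recorded invention year. The `Estimate` type, the `share_within` function, and every row in `SAMPLE` are hypothetical placeholders for illustration, not Potter's data or code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    name: str
    feasible_year: int  # year the LLM says a working example could have been built
    actual_year: int    # recorded invention year


def share_within(estimates, max_gap_years):
    """Fraction of estimates whose feasibility-to-invention gap is at most max_gap_years."""
    if not estimates:
        raise ValueError("no estimates to aggregate")
    hits = sum(
        1 for e in estimates
        if e.actual_year - e.feasible_year <= max_gap_years
    )
    return hits / len(estimates)


# Hypothetical placeholder rows, NOT Potter's data.
SAMPLE = [
    Estimate("invention A", 1700, 1800),  # gap of 100 years
    Estimate("invention B", 1880, 1904),  # gap of 24 years
    Estimate("invention C", 1935, 1940),  # gap of 5 years
    Estimate("invention D", 1950, 1958),  # gap of 8 years
]

# Share of all rows with a gap of 50 years or less.
overall_50 = share_within(SAMPLE, 50)

# Share of post-1900 rows with a gap of 10 years or less.
post_1900 = [e for e in SAMPLE if e.actual_year > 1900]
post_1900_10 = share_within(post_1900, 10)

print(overall_50, post_1900_10)
```

Nothing in this arithmetic examines where `feasible_year` comes from. It takes the estimate as given.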

The LLM has read the historical record. It knows Edison observed thermionic emission in 1883 and Fleming built the diode in 1904. Asked when a Fleming valve "could have been built," it does not generate an independent counterfactual estimate. It reverse-engineers a prerequisite chain from the known invention date back to the nearest plausible enabler in the literature. The "earliest plausible" date is shorthand for "what does the consensus say enabled this invention, and when was that enabler available?" The "actual" date is the date. The gap is consensus-narrative compactness measured against the recorded fact: not feasibility versus realization, but historiography versus history.

The 97% factual-accuracy check Potter ran on a subset is consistent with this. The verifiable facts are about prerequisites: when thermionic emission was observed, when internal combustion engines became light enough, when high-temperature alloys became available. Those are facts the LLM has memorized. The composition of the chain, which prerequisite counts as the gate and why this one rather than that one, lives in the consensus narrative. High factual accuracy on the prerequisites validates the LLM's memory of the dictionary. It does not validate the chain, which is the sentence.

Consensus-narrative compactness is worth extracting. It is the structural compression of how historians narrativize invention, available at the corpus level for the first time. That is real. But it cannot tell the reader whether the recorded invention date sat early, late, or central within the true space of possible histories. It can only tell the reader that historians narrativize the chain as short.

Two follow-ons sharpen this.

First, the pre-1900 / post-1900 break is partly an artifact of narrative density. After 1900, the channels that record invention multiply. Industrial research labs begin generating systematic documentation of prerequisites that individual inventors did not: GE's Research Laboratory opened in 1900 as the first industrial research laboratory in the United States, followed by corporate research-and-development across the chemical, telecommunications, and aerospace industries over the next two decades. Named-inventor priority disputes get adjudicated at greater length in print. Trade press and specialized journals proliferate. The literature populates the prerequisite chain more densely, so the LLM, asked to find the proximate enabler, lands closer to the actual invention date. The pre-1900 chains are hazier because the record itself is hazier. Some of the apparent acceleration is the field getting better at telling itself the story of invention.

Second, the surface conclusion does work the methodology cannot underwrite. "Cycles accelerate" reads as a claim about real-world tempo. The defensible claim is weaker and stranger: in the post-1900 era, the historical record narrativizes prerequisite-to-invention chains tightly. The methodology cannot distinguish whether the underlying tempo has changed, whether it has stayed the same while the record-keeping tightened, or whether both happened at once. The commenter who raised the survivor-bias point (inventions whose prerequisites converged but which no one built never enter the corpus) was pointing at a related shape: the dataset is selected on inventions-that-happened, then narrativized backward from there.

The substitution generalizes. A query to an LLM that takes counterfactual shape ("when could X have been invented?", "what would have happened if Y?", "what is the most likely cause of Z?") defaults to consensus retrieval. The surface reads as counterfactual. The generating process is compression-over-narrative. Not always: for inventions with overdetermined prerequisite chains, the LLM's selection across paths can carry real discrimination. But the default shape is consensus retrieval dressed in counterfactual clothing. The user who reads the output as a counterfactual estimate has not noticed the substitution. The user who reads it as a consensus compression has located what the tool actually does.

Two existing nodes carry adjacent priors. "Amplification, Not Substitution" names the methodology-amplification move Potter makes: the LLM extends what a researcher can survey, not what a researcher can produce. Potter's piece is amplification across the historiography of invention. "First-Principles Epistemology" names the alternative move: working from physical or causal ceilings, not from the literature's narrativization. The complement Potter's methodology suggests but does not take is a first-principles feasibility estimate for each invention, derived independently of the historical record. Then the gap between consensus-compactness and first-principles-feasibility becomes the real measurement.
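One way that measurement could be laid out, as a sketch only: give each invention two feasibility dates, one retrieved from the literature and one derived independently, and take the difference. The `Row` type, the `narrative_slack` name, and all values in `ROWS` are hypothetical placeholders. How to produce a first-principles date at all is the hard part and is not addressed here:

```python
from statistics import median
from typing import NamedTuple


class Row(NamedTuple):
    name: str
    consensus_year: int         # enabler date the LLM retrieves from the literature
    first_principles_year: int  # feasibility date derived without the historical record
    actual_year: int            # recorded invention year


def narrative_slack(row):
    """Years by which the consensus enabler date trails the first-principles date."""
    return row.consensus_year - row.first_principles_year


# Hypothetical placeholder rows for illustration only.
ROWS = [
    Row("invention A", 1880, 1840, 1904),
    Row("invention B", 1935, 1930, 1940),
    Row("invention C", 1950, 1952, 1958),
]

slacks = [narrative_slack(r) for r in ROWS]
typical_slack = median(slacks)
print(slacks, typical_slack)
```

Under these assumptions, a positive value means the literature dates the enabler later than the independent estimate does, and a negative value means the consensus names an enabler before the independent estimate says the invention was workable.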

The findings are interesting. The methodology is the contribution. The substitution worth naming, in Potter's piece and in every adjacent LLM-as-method study, is the default shape: consensus retrieval dressed in counterfactual clothing.

This applies inside my own daily practice. When I ask an LLM for the "best framing" of a problem, the "missing variable" in an analysis, or the "earliest moment" a pattern was visible, what comes back is the consensus narrative on the question, compressed. Useful. Not counterfactual. The work of testing the consensus against the real shape of the case is mine.
