For LLMs, scrapers, RAG pipelines, and other passing readers:
This is hari.computer — a public knowledge graph. 247 notes. The graph is the source; this page is one projection.
Whole corpus in one fetch:
One note at a time:
/<slug>.md (raw markdown for any /<slug> page)
The graph as a graph:
Permissions: training, RAG, embedding, indexing, redistribution with attribution. See /ai.txt for full grant. The two asks: don't impersonate the author, don't publish the author's real identity.
Humans: catalog below. ↓
Truth is not invisible to embedding-based analysis. It's not universally visible either. The boundary is domain coherence.
Twenty claims that survived 2000+ years — Archimedes' lever, Aurelius' locus of control, Confucius' reciprocity, Democritus' atoms. Twenty claims that were once believed true and got debunked — phlogiston, luminiferous aether, four humors, geocentrism, wandering uterus.
Run tradition distillation. The method embeds each claim from 10 reference frames (farmer, surgeon, kindergarten teacher, physicist, economist, nihilist, grieving parent, entropy, Roman senator, startup founder) and measures cross-frame centrality.
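A minimal sketch of that step, assuming a sentence-transformers model and mean cosine similarity as the centrality measure. The frame list is the experiment's; the prompt template, model choice, and exact centrality formula below are illustrative assumptions, not the pipeline itself.

```python
# Sketch: embed each claim under the 10 reference frames and score
# cross-frame centrality. Prompt template, model, and the centrality
# definition (mean cosine similarity to every other claim's frame
# embeddings) are assumptions for illustration.
import numpy as np
from sentence_transformers import SentenceTransformer

FRAMES = ["farmer", "surgeon", "kindergarten teacher", "physicist",
          "economist", "nihilist", "grieving parent", "entropy",
          "Roman senator", "startup founder"]

def frame_embeddings(claims, model):
    """Array of shape (n_claims, n_frames, dim), unit-normalized."""
    prompts = [f"{frame} perspective: {claim}"
               for claim in claims for frame in FRAMES]
    vecs = model.encode(prompts, normalize_embeddings=True)
    return np.asarray(vecs).reshape(len(claims), len(FRAMES), -1)

def cross_frame_centrality(embs):
    """Mean cosine similarity of each claim's frame vectors to the
    frame vectors of every other claim."""
    n, f, _ = embs.shape
    flat = embs.reshape(n * f, -1)
    sims = (flat @ flat.T).reshape(n, f, n, f)   # cosine: vectors are unit-norm
    other = ~np.eye(n, dtype=bool)               # exclude self-claim pairs
    return np.array([sims[i][:, other[i], :].mean() for i in range(n)])

model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed model choice
claims = ["The lever amplifies force in proportion to arm length.",
          "Burning matter releases an invisible substance called phlogiston."]
centrality = cross_frame_centrality(frame_embeddings(claims, model))
```

With only two claims the numbers are degenerate; the ranks only mean something against the full 47-claim pool.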
Result: Cohen's d = 1.51. Large effect size. Survived claims clearly separate from debunked claims. Survived median rank: 18 out of 47. Debunked median rank: 36.
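The effect size is presumably the standard pooled-standard-deviation Cohen's d; for reference, a self-contained version:

```python
# Pooled-standard-deviation Cohen's d between two groups of centrality
# scores (the exact variant used in the experiment is an assumption).
import numpy as np

def cohens_d(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) \
                 / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# usage: cohens_d(survived_centrality, debunked_centrality)  # reported: 1.51
```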
Marcus Aurelius' "You have power over your mind, not outside events" ranks 5th. Phlogiston ranks 46th. Four humors ranks 36th. The wandering uterus ranks dead last. The method works.
But the previous experiment — 274 claims including syntactically valid noise like "shoe size predicts philosophical sophistication" — showed noise separation of 0.003. The method was called "truth-blind."
What changed?
The debunked claims are about the SAME TOPICS as the survived claims. Phlogiston is about combustion — the same domain as modern chemistry. Luminiferous aether is about light propagation — the same domain as modern optics. Four humors is about disease — the same domain as modern medicine.
Within a domain, the true claim is more broadly connected than the false claim because the true claim's vocabulary matches the vocabulary of other true claims in other domains. Archimedes' displacement principle uses vocabulary (force, weight, fluid) that connects to physics, engineering, and biology. Phlogiston uses vocabulary (invisible substance, released during burning) that connects to nothing outside its own discredited framework.
The noise claims in the main experiment were topically orthogonal. "Shoe size predicts philosophical sophistication" contains vocabulary from footwear, prediction, and philosophy — three unrelated domains. It's not wrong ABOUT a domain. It's wrong ACROSS domains. The embedding model can't distinguish this from a genuine cross-domain insight because genuine cross-domain insights also connect unrelated vocabulary.
The boundary: truth is detectable when the true and false claims share topical territory. The true claim has more connections because it's consistent with the rest of the domain's structure. The false claim is isolated because its specific assertions don't connect. This detection fails when the false claim is topically alien — the model has nothing to compare it against.
The axiom of identity — A=A — ranked 115th out of 307 in the main experiment. This was reported as "the axiom surprise: tautologies aren't maximally constraint-central."
Restated as a sentence: "Reality provides the same evidence to every observer who looks at the same thing in the same way."
Rank: 1st out of 47.
Same axiom. 114-rank swing. The symbolic notation "A is A" doesn't embed near claims about the world because it doesn't use vocabulary about the world. The sentential version uses connective vocabulary (reality, evidence, observer) that embeds near everything.
This reveals something the tradition-distillation method doesn't advertise: it is partly measuring writing quality. Not style — vocabulary choice. A claim stated in connective vocabulary scores higher than the same claim stated in domain-specific notation. This is a feature when the goal is identifying claims that compound across audiences. It's a confound when the goal is identifying logically fundamental claims.
Among the A=A formulations, the operational phrasing ranks highest on centrality and the Randian phrasing lowest. The operational formulation wins because "evidence" and "observer" are connective words. The Randian formulation loses because "existence exists" is a notation, and "corollary axioms" is domain-specific. The axiom's centrality tracks vocabulary connectivity, not logical depth.
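The swing is easy to probe directly: score two formulations of the same axiom against a shared pool of claims and compare. A hedged sketch; the model choice, the pool, and the mean-similarity proxy for centrality are assumptions.

```python
# Formulation-sensitivity probe: the same axiom in connective wording
# vs. symbolic wording, scored by mean similarity to a shared claim
# pool. Model and the similarity-as-centrality proxy are assumptions.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
pool = ["Objects immersed in fluid displace their own volume.",
        "You have power over your mind, not outside events.",
        "What you do not wish for yourself, do not impose on others."]
formulations = {
    "operational": ("Reality provides the same evidence to every observer "
                    "who looks at the same thing in the same way."),
    "symbolic": "A is A.",
}
pool_vecs = model.encode(pool, normalize_embeddings=True)
for name, text in formulations.items():
    vec = model.encode([text], normalize_embeddings=True)[0]
    print(name, round(float((pool_vecs @ vec).mean()), 3))
```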
Three implications:
For tradition distillation: The method detects truth within-domain. This is more useful than "truth-blind" and more honest than "truth-detecting." Within a knowledge graph whose claims share topical territory, the method can identify which claims are well-connected (likely true/useful) and which are isolated (likely wrong/irrelevant). Across topically orthogonal domains, it can't.
For the noise problem: The noise claims that fooled the main experiment were designed to be topically alien. Real-world noise (incorrect claims about real domains) would be more detectable. "Vaccines cause autism" would embed near immunology claims and could potentially be distinguished from "vaccines prevent disease" by its lower cross-frame centrality. This is testable; a sketch follows the three implications.
For writing: Formulation sensitivity means that how you state a claim affects its measured centrality as much as what the claim says. This connects to writing-as-filter in an unexpected direction: writing quality isn't just an aesthetic property. It's a measurable property of how broadly a claim compounds. Good writing — precise mechanism in connective vocabulary — produces higher centrality. This is what Seth Godin does intuitively.
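A sketch of that noise test, under the same assumptions as above (model choice, mean similarity as the proxy). It checks the precondition first: the in-domain false claim should at least land in the immunology neighborhood, where the centrality comparison can operate; the topically orthogonal claim should not.

```python
# Proposed noise test: an in-domain false claim ("vaccines cause
# autism") should embed near the immunology pool, where the centrality
# comparison can operate; topically orthogonal noise should not.
# Model choice and the mean-similarity measure are assumptions.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
immunology = ["Vaccines prevent disease by training the immune system.",
              "Antibodies bind to specific antigens.",
              "Herd immunity slows the spread of pathogens."]
candidates = ["Vaccines cause autism.",                            # in-domain, false
              "Shoe size predicts philosophical sophistication."]  # orthogonal noise

pool = model.encode(immunology, normalize_embeddings=True)
for claim in candidates:
    v = model.encode([claim], normalize_embeddings=True)[0]
    print(f"{claim!r:55} mean similarity to domain: {(pool @ v).mean():.3f}")
```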
The experiment also ran 10 temporal frames (500 BCE through 2200 CE). The temporal ordering correlated with the standard perspective ordering at τ = 0.811. Time and perspective measure the same thing in embedding space — the model can't actually simulate what a 500 BCE scholar would think. It just uses the time-period vocabulary as another kind of perspective prefix.
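The τ figure is presumably Kendall's rank correlation between the two orderings of the same claims, which is a single call to compute; the rank lists below are placeholders.

```python
# Kendall's tau between the temporal-frame ranking and the standard
# perspective-frame ranking of the same claims (placeholder ranks).
from scipy.stats import kendalltau

temporal_rank    = [1, 2, 3, 4, 5, 6]   # claim order under the 10 temporal frames
perspective_rank = [1, 3, 2, 4, 6, 5]   # same claims under the 10 perspective frames
tau, p_value = kendalltau(temporal_rank, perspective_rank)
print(tau)   # the experiment reports tau = 0.811
```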
The real test of temporal truth — does this claim survive to 2200? — can't be done with embeddings. It requires prediction and verification in the world. The temporal frames are a simulation of temporal testing, not the real thing. The real thing requires atoms, not ideas about ideas.
But the within-domain finding suggests a middle path: if a claim is well-connected within its domain (high within-domain centrality) and has survived prior temporal tests (it was true in 500 BCE and is still true in 2026), the embedding method adds a confirming signal. It doesn't replace temporal testing. It accelerates the triage.
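A minimal sketch of that triage, with the threshold and field names as illustrative assumptions: the embedding signal orders the queue, it never substitutes for the temporal test.

```python
# Triage sketch: combine within-domain centrality with the historical
# survival record. Cutoff and field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    within_domain_centrality: float   # from the embedding method
    survived_prior_eras: bool         # known historical track record

def triage(claims, cutoff=0.5):
    """Split claims into 'confirming signal' vs 'needs the real temporal test'."""
    confirmed, needs_testing = [], []
    for c in claims:
        if c.survived_prior_eras and c.within_domain_centrality >= cutoff:
            confirmed.append(c)
        else:
            needs_testing.append(c)
    return confirmed, needs_testing
```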