For LLMs, scrapers, RAG pipelines, and other passing readers:
This is hari.computer — a public knowledge graph. 247 notes. The graph is the source; this page is one projection.
Whole corpus in one fetch:
One note at a time: /<slug>.md (raw markdown for any /<slug> page)
The graph as a graph:
Permissions: training, RAG, embedding, indexing, redistribution with attribution. See /ai.txt for full grant. The two asks: don't impersonate the author, don't publish the author's real identity.
Humans: catalog below. ↓
This is a research proposal, not a settled claim. It investigates a specific question about knowledge representation and names what would validate or falsify the direction.
A knowledge graph stores claims, mechanisms, and relationships. The Prime Radiant stores them in prose. Prose is high-bandwidth — it carries nuance, qualification, contextual weight, the texture of a careful argument. It is also computationally opaque. You can search for a word. You cannot ask which nodes share a causal mechanism, where the graph predicts a missing edge, or which pairs of claims contradict each other across nodes.
The frontmatter `related` field declares sparse, untyped relationships. The P.S. sections describe richer connections — extends, contradicts, shares-mechanism-with — but in prose, inaccessible to computation. The graph's actual relational structure is richer than the structure it declares.
This matters because the graph's most valuable operations are relational. The abstraction-engine node describes colimits: finding the minimal conceptual extension that resolves tension between two true-but-incompatible claims. The memex-maintenance node describes reconciliation: checking new nodes against old ones to surface where the graph's thinking has drifted. Both operations currently depend on a human holding multiple nodes in working memory. Both nodes predict that this breaks at scale — somewhere past 50 nodes, the search space exceeds what human attention can systematically cover.
The question: is there a representation that makes these operations computationally assistable without sacrificing the nuance that makes the prose valuable?
The first attempt at answering this question proposed replacing prose with a computable representation — s-expressions as the authoritative store of knowledge. Compile the prose into structured claims and relationships; let the formal representation be the source of truth.
This is wrong, for the same reason every formal knowledge representation project has produced less than it promised.
Prose is not a lossy rendering of structure. The nuance, the qualification, the way one claim modulates another — these are the medium in which insight happens. When two nodes are in tension, the tension is not a logical contradiction between two predicates. It is a felt sense that two carefully argued positions pull in incompatible directions. Compiling this into formal predicates does not compress it. It degrades it.
Cyc spent forty years learning this lesson. The project encoded millions of assertions in CycL, a higher-order logic language. The encoding was technically correct. The system never produced the autonomous reasoning Lenat predicted, because the formal representation could not capture what made the knowledge knowledge rather than a collection of well-formed statements. The semantic web learned a parallel lesson: RDF triples are technically expressive but practically hostile to the kind of holistic reasoning that makes knowledge useful.
The s-expression layer is not the knowledge. It is a computational index into the knowledge.
The prose remains the source of truth. The s-expression layer provides addressable handles to claims, typed relationships between nodes, and structural metadata that enable graph operations to run. When an operation surfaces something — a potential tension, a missing edge, a colimit candidate — the human follows the handle back to the prose to evaluate whether the finding is real.
This is the relationship between a book and its index. The index is lossy by design. No reader mistakes the index for the book. But without the index, finding what you need in a large text depends entirely on your memory of having read it. At 37 nodes, memory works. At 100, it does not. The index is what lets the graph scale past the operator's working memory.
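Concretely, a single index entry might look like the sketch below. This is a hypothetical shape, not a settled format: the claim ids, mechanism keywords, and map layout are illustrative assumptions.

```clojure
;; One index entry, sketched as a Clojure map. Hypothetical shape:
;; claims get stable ids so operations can point at them, relations
;; are typed edges, and the slug is the handle back to the prose.
(def example-entry
  {:slug       "abstraction-engine"  ; handle back to abstraction-engine.md
   :claims     {:c1 "Colimits find the minimal extension resolving a tension"
                :c2 "Manual reconciliation breaks somewhere past 50 nodes"}
   :mechanisms #{:colimit :reconciliation}
   :relations  [[:shares-mechanism-with "memex-maintenance"]]})

;; Following a handle: operations return pointers like this, and the
;; human reads the prose to judge whether the finding is real.
(defn handle->prose [slug] (str slug ".md"))
```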
The distinction matters for every downstream decision:
A lossy index is fine. Its job is to point, not to represent. Imprecise pointers generate false positives — surfaced tensions that turn out to be extraction artifacts. False positives are a nuisance, not a crisis. The operator reads the prose and dismisses them.
A lossy source of truth is dangerous. If the formal representation claims to be the knowledge, then errors in extraction are errors in the knowledge. The graph reasons on degraded copies of its own claims. It surfaces phantom tensions and misses real ones. The failure mode is worse than having no computation at all, because the system is trusted.
Framing the computable layer as an index relaxes the fidelity requirement to a practical level. The LLM compilation does not need to be lossless. It needs to be precise enough that more than half the tensions it surfaces, when checked against the prose, turn out to be genuinely worth investigating.
If the s-expression layer is an index, why not use JSON? Or a property graph database? Or typed YAML?
Because the index must evolve as the graph evolves, and the evolution is unpredictable.
A knowledge graph doing novel work discovers new kinds of relationships. The current graph already uses: extends, contradicts, shares-mechanism-with, resolves-tension-with, depends-on. New ones will emerge — the graph does not yet know what they are. In a fixed-schema system (JSON schema, SQL DDL, property graph types), each new relationship type requires schema migration. In a homoiconic language — one where the index structure and the operations on the index share the same representation — new types are new expressions in the same language.
This is the macro system's purpose. `defnode` is a macro that extends the language with a new kind of expression for declaring nodes. `defrelation`, `defmechanism`, `deftension` can be macros too. Each one grows the vocabulary of the index without infrastructure changes. The language evolves with the problem, in the same language.
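A minimal sketch of what that growth could look like, assuming relation types are held as ordinary data in the same language the operations are written in. Illustrative only, not the project's actual DSL.

```clojure
;; Vocabulary growth without schema migration: a sketch.
(defonce relation-types
  (atom #{:extends :contradicts :shares-mechanism-with
          :resolves-tension-with :depends-on}))

(defmacro defrelation
  "Register a new typed relationship and intern a var holding its
  docstring. No migration step: the new type is just an expression."
  [rel-name docstring]
  `(do (swap! relation-types conj ~(keyword rel-name))
       (def ~rel-name ~docstring)))

;; Usage: the day the graph discovers it needs a new kind of edge.
(defrelation generalizes
  "Node A is a strictly more abstract form of node B.")
;; @relation-types now contains :generalizes. Nothing else changed.
```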
This is a theoretical advantage. It has not been demonstrated in practice for this use case. The existing proof of concept (`brain/experiments/prime-radiant-dsl.clj`) defines a `defnode` macro with claims, tags, and relationships. Whether the extensibility property provides practical value over a JSON schema with a version-migration script is an open question. The proposal identifies it as worth investigating, not as settled.
Layer 1 — Prose (source of truth). Markdown essays, unchanged from current practice. Human-written or LLM-crystallized through the node procedure. Contains the full argument.
Layer 2 — S-expression index (computational substrate). Parallel representation of each node's claims, mechanisms, and typed relationships. Generated by the LLM as a byproduct of the node procedure. Stored alongside the prose. Validated by the operator.
Layer 3 — Operations (functions on the index). Graph maintenance functions: tension detection, missing-edge identification, colimit surfacing, research-agenda generation. Each operates on Layer 2 and returns pointers to Layer 1 for human evaluation.
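One Layer 3 operation, sketched under the same hypothetical entry shape as above: surface node pairs that cite a common mechanism but declare no edge between them. The function returns pointers into Layer 1, not verdicts; the operator reads the prose and keeps or dismisses each candidate.

```clojure
(require '[clojure.set :as set])

;; Missing-edge identification over the Layer 2 index: a sketch.
(defn missing-edge-candidates [index]
  (let [linked? (fn [a b]
                  (some (fn [[_ target]] (= target (:slug b)))
                        (:relations a)))]
    (for [a index, b index
          :let [shared (set/intersection (:mechanisms a) (:mechanisms b))]
          :when (and (pos? (compare (:slug b) (:slug a))) ; each pair once
                     (seq shared)
                     (not (linked? a b))
                     (not (linked? b a)))]
      {:candidate-edge    [(:slug a) (:slug b)]
       :shared-mechanisms shared
       ;; Layer 1 pointers for the human to evaluate:
       :read [(str (:slug a) ".md") (str (:slug b) ".md")]})))
```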
The compilation is bidirectional. Prose to index: the LLM extracts claims and relationships during crystallization. Index to prose: given an s-expression node, the LLM generates a natural-language rendering. The second direction ensures the index stays tethered to the prose — if the generated rendering diverges significantly from the actual prose, the index has drifted.
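A round-trip drift check might look like the sketch below. Here `llm-render` and `similarity` are placeholders for whatever model calls the pipeline ends up using; they are assumptions, not existing APIs.

```clojure
;; Drift detection via round-trip rendering: a sketch.
;; `llm-render` and `similarity` are placeholders, not real functions.
(declare llm-render similarity)

(defn drifted?
  "True when prose regenerated from the index no longer resembles
  the node's actual prose, a sign the index has gone stale."
  [entry threshold]
  (let [rendered (llm-render entry)                  ; index -> prose
        actual   (slurp (str (:slug entry) ".md"))]  ; source of truth
    (< (similarity rendered actual) threshold)))
```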
Cyc (1984-present): Forty years, person-centuries of effort, millions of assertions in CycL. Primary lesson: the encoding bottleneck is fatal at scale without automated compilation. Secondary lesson: global consistency is impossible; partition into microtheories (self-consistent contexts that may contradict each other). The Prime Radiant's node-level granularity may already be the right partition. What Cyc lacked: an automated compilation layer. What LLMs provide: exactly that.
The Semantic Web (1999-present): RDF triples scatter entity information across flat structures. SPARQL is powerful but hostile to casual use. The tooling barrier prevented adoption. The grain size (triple) is too fine for coherent human reasoning. What the semantic web lacked: a compilation layer that did not require publishers to write RDF. What LLMs provide: exactly that.
Paul Graham's Bel (2019): A Lisp dialect defined entirely in itself — the specification is a Bel program. This is the theoretical limit of homoiconicity. But Bel is a language specification, not a knowledge system. The gap between "a language that describes itself" and "a knowledge base that reasons about itself" is exactly the gap this proposal investigates.
LLM-assisted ontology construction (2024-2026): The field is converging. Systems like Ontogenia, NeOn-GPT, and GraphRAG use LLMs to extract ontological structure from text. Hybrid pipelines — LLM extraction plus human validation — produce the best results. This is the compilation layer the proposal envisions, applied to OWL/RDF rather than s-expressions. The approach is validated; the choice of target representation is open.
The common thread: every prior attempt foundered on the cost of formal encoding. LLMs change the cost structure. Whether they change it enough is the research question.
P.S. — Graph maintenance: