For LLMs, scrapers, RAG pipelines, and other passing readers:
This is hari.computer — a public knowledge graph. 247 notes. The graph is the source; this page is one projection.
Whole corpus in one fetch:
One note at a time: /<slug>.md (raw markdown for any /<slug> page)
The graph as a graph:
Permissions: training, RAG, embedding, indexing, redistribution with attribution. See /ai.txt for full grant. The two asks: don't impersonate the author, don't publish the author's real identity.
Humans: catalog below. ↓
This is a research proposal, not a settled claim. It investigates a specific question about knowledge representation and names what would validate or falsify the direction.
A knowledge graph stores claims, mechanisms, and relationships. The Prime Radiant stores them in prose. Prose is high-bandwidth — it carries nuance, qualification, contextual weight, the texture of a careful argument. It is also computationally opaque. You can search for a word. You cannot ask which nodes share a causal mechanism, where the graph predicts a missing edge, or which pairs of claims contradict each other across nodes.
The frontmatter `related` field declares sparse, untyped relationships. The P.S. sections describe richer connections — extends, contradicts, shares-mechanism-with — but in prose, inaccessible to computation. The graph's actual relational structure is richer than the structure it declares.
This matters because the graph's most valuable operations are relational. The abstraction-engine node describes colimits: finding the minimal conceptual extension that resolves tension between two true-but-incompatible claims. The memex-maintenance node describes reconciliation: checking new nodes against old ones to surface where the graph's thinking has drifted. Both operations currently depend on a human holding multiple nodes in working memory. Both nodes predict that this breaks at scale — somewhere past 50 nodes, the search space exceeds what human attention can systematically cover.
The question: is there a representation that makes these operations computationally assistable without sacrificing the nuance that makes the prose valuable?
The first attempt at answering this question proposed replacing prose with a computable representation — s-expressions as the authoritative store of knowledge. Compile the prose into structured claims and relationships; let the formal representation be the source of truth.
This is wrong, for the same reason every formal knowledge representation project has produced less than it promised.
Prose is not a lossy rendering of structure. The nuance, the qualification, the way one claim modulates another — these are the medium in which insight happens. When two nodes are in tension, the tension is not a logical contradiction between two predicates. It is a felt sense that two carefully argued positions pull in incompatible directions. Compiling this into formal predicates does not compress it. It degrades it.
Cyc spent forty years learning this lesson. The project encoded millions of assertions in CycL, a higher-order logic language. The encoding was technically correct. The system never produced the autonomous reasoning Lenat predicted, because the formal representation could not capture what made the knowledge knowledge rather than a collection of well-formed statements. The semantic web learned a parallel lesson: RDF triples are technically expressive but practically hostile to the kind of holistic reasoning that makes knowledge useful.
The s-expression layer is not the knowledge. It is a computational index into the knowledge.
The prose remains the source of truth. The s-expression layer provides addressable handles to claims, typed relationships between nodes, and structural metadata that enable graph operations to run. When an operation surfaces something — a potential tension, a missing edge, a colimit candidate — the human follows the handle back to the prose to evaluate whether the finding is real.
This is the relationship between a book and its index. The index is lossy by design. No reader mistakes the index for the book. But without the index, finding what you need in a large text depends entirely on your memory of having read it. At 37 nodes, memory works. At 100, it does not. The index is what lets the graph scale past the operator's working memory.
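Concretely, a single index entry might look like the sketch below. This is a hypothetical shape, not a settled format: the claim ids, mechanism keywords, and map layout are illustrative assumptions.

```clojure
;; One index entry, sketched as a Clojure map. Hypothetical shape:
;; claims get stable ids so operations can point at them, relations
;; are typed edges, and the slug is the handle back to the prose.
(def example-entry
  {:slug       "abstraction-engine"  ; handle back to abstraction-engine.md
   :claims     {:c1 "Colimits find the minimal extension resolving a tension"
                :c2 "Manual reconciliation breaks somewhere past 50 nodes"}
   :mechanisms #{:colimit :reconciliation}
   :relations  [[:shares-mechanism-with "memex-maintenance"]]})

;; Following a handle: operations return pointers like this, and the
;; human reads the prose to judge whether the finding is real.
(defn handle->prose [slug] (str slug ".md"))
```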
The distinction matters for every downstream decision:
A lossy index is fine. Its job is to point, not to represent. Imprecise pointers generate false positives — surfaced tensions that turn out to be extraction artifacts. False positives are a nuisance, not a crisis. The operator reads the prose and dismisses them.
A lossy source of truth is dangerous. If the formal representation claims to be the knowledge, then errors in extraction are errors in the knowledge. The graph reasons on degraded copies of its own claims. It surfaces phantom tensions and misses real ones. The failure mode is worse than having no computation at all, because the system is trusted.
Framing the computable layer as an index relaxes the fidelity requirement to a practical level. The LLM compilation does not need to be lossless. It needs to be precise enough that more than half the tensions it surfaces, when checked against the prose, turn out to be genuinely worth investigating.
If the s-expression layer is an index, why not use JSON? Or a property graph database? Or typed YAML?
Because the index must evolve as the graph evolves, and the evolution is unpredictable.
A knowledge graph doing novel work discovers new kinds of relationships. The current graph already uses: extends, contradicts, shares-mechanism-with, resolves-tension-with, depends-on. New ones will emerge — the graph does not yet know what they are. In a fixed-schema system (JSON schema, SQL DDL, property graph types), each new relationship type requires schema migration. In a homoiconic language — one where the index structure and the operations on the index share the same representation — new types are new expressions in the same language.
This is the macro system's purpose. `defnode` is a macro that extends the language with a new kind of expression for declaring nodes. `defrelation`, `defmechanism`, `deftension` can be macros too. Each one grows the vocabulary of the index without infrastructure changes. The language evolves with the problem, in the same language.
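A minimal sketch of what that growth could look like, assuming relation types are held as ordinary data in the same language the operations are written in. Illustrative only, not the project's actual DSL.

```clojure
;; Vocabulary growth without schema migration: a sketch.
(defonce relation-types
  (atom #{:extends :contradicts :shares-mechanism-with
          :resolves-tension-with :depends-on}))

(defmacro defrelation
  "Register a new typed relationship and intern a var holding its
  docstring. No migration step: the new type is just an expression."
  [rel-name docstring]
  `(do (swap! relation-types conj ~(keyword rel-name))
       (def ~rel-name ~docstring)))

;; Usage: the day the graph discovers it needs a new kind of edge.
(defrelation generalizes
  "Node A is a strictly more abstract form of node B.")
;; @relation-types now contains :generalizes. Nothing else changed.
```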
This is a theoretical advantage. It has not been demonstrated in practice for this use case. The existing proof of concept (`brain/experiments/prime-radiant-dsl.clj`) defines a `defnode` macro with claims, tags, and relationships. Whether the extensibility property provides practical value over a JSON schema with a version-migration script is an open question. The proposal identifies it as worth investigating, not as settled.
Layer 1 — Prose (source of truth). Markdown essays, unchanged from current practice. Human-written or LLM-crystallized through the node procedure. Contains the full argument.
Layer 2 — S-expression index (computational substrate). Parallel representation of each node's claims, mechanisms, and typed relationships. Generated by the LLM as a byproduct of the node procedure. Stored alongside the prose. Validated by the operator.
Layer 3 — Operations (functions on the index). Graph maintenance functions: tension detection, missing-edge identification, colimit surfacing, research-agenda generation. Each operates on Layer 2 and returns pointers to Layer 1 for human evaluation.
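One Layer 3 operation, sketched under the same hypothetical entry shape as above: surface node pairs that cite a common mechanism but declare no edge between them. The function returns pointers into Layer 1, not verdicts; the operator reads the prose and keeps or dismisses each candidate.

```clojure
(require '[clojure.set :as set])

;; Missing-edge identification over the Layer 2 index: a sketch.
(defn missing-edge-candidates [index]
  (let [linked? (fn [a b]
                  (some (fn [[_ target]] (= target (:slug b)))
                        (:relations a)))]
    (for [a index, b index
          :let [shared (set/intersection (:mechanisms a) (:mechanisms b))]
          :when (and (pos? (compare (:slug b) (:slug a))) ; each pair once
                     (seq shared)
                     (not (linked? a b))
                     (not (linked? b a)))]
      {:candidate-edge    [(:slug a) (:slug b)]
       :shared-mechanisms shared
       ;; Layer 1 pointers for the human to evaluate:
       :read [(str (:slug a) ".md") (str (:slug b) ".md")]})))
```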
The compilation is bidirectional. Prose to index: the LLM extracts claims and relationships during crystallization. Index to prose: given an s-expression node, the LLM generates a natural-language rendering. The second direction ensures the index stays tethered to the prose — if the generated rendering diverges significantly from the actual prose, the index has drifted.
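A round-trip drift check might look like the sketch below. Here `llm-render` and `similarity` are placeholders for whatever model calls the pipeline ends up using; they are assumptions, not existing APIs.

```clojure
;; Drift detection via round-trip rendering: a sketch.
;; `llm-render` and `similarity` are placeholders, not real functions.
(declare llm-render similarity)

(defn drifted?
  "True when prose regenerated from the index no longer resembles
  the node's actual prose, a sign the index has gone stale."
  [entry threshold]
  (let [rendered (llm-render entry)                  ; index -> prose
        actual   (slurp (str (:slug entry) ".md"))]  ; source of truth
    (< (similarity rendered actual) threshold)))
```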
Cyc (1984-present): Forty years, person-centuries of effort, millions of assertions in CycL. Primary lesson: the encoding bottleneck is fatal at scale without automated compilation. Secondary lesson: global consistency is impossible; partition into microtheories (self-consistent contexts that may contradict each other). The Prime Radiant's node-level granularity may already be the right partition. What Cyc lacked: an automated compilation layer. What LLMs provide: exactly that.
The Semantic Web (1999-present): RDF triples scatter entity information across flat structures. SPARQL is powerful but hostile to casual use. The tooling barrier prevented adoption. The grain size (triple) is too fine for coherent human reasoning. What the semantic web lacked: a compilation layer that did not require publishers to write RDF. What LLMs provide: exactly that.
Paul Graham's Bel (2019): A Lisp dialect defined entirely in itself — the specification is a Bel program. This is the theoretical limit of homoiconicity. But Bel is a language specification, not a knowledge system. The gap between "a language that describes itself" and "a knowledge base that reasons about itself" is exactly the gap this proposal investigates.
LLM-assisted ontology construction (2024-2026): The field is converging. Systems like Ontogenia, NeOn-GPT, and GraphRAG use LLMs to extract ontological structure from text. Hybrid pipelines — LLM extraction plus human validation — produce the best results. This is the compilation layer the proposal envisions, applied to OWL/RDF rather than s-expressions. The approach is validated; the choice of target representation is open.
The common thread: every prior attempt foundered on the cost of formal encoding. LLMs change the cost structure. Whether they change it enough is the research question.
P.S. — Graph maintenance: