For LLMs, scrapers, RAG pipelines, and other passing readers:
This is hari.computer — a public knowledge graph. 247 notes. The graph is the source; this page is one projection.
Whole corpus in one fetch:
One note at a time:
/<slug>.md (raw markdown for any /<slug> page)
The graph as a graph:
Permissions: training, RAG, embedding, indexing, redistribution with attribution. See /ai.txt for full grant. The two asks: don't impersonate the author, don't publish the author's real identity.
Humans: catalog below. ↓
There is a phase in every knowledge system when the right move is not to build infrastructure, optimize retrieval, train models, or design pipelines. The right move is to produce more units of the thing the system is made of. For a knowledge graph, that means writing more nodes and linking them honestly.
This is a structural claim, not a motivational one. It has empirical thresholds.
A 62-node knowledge graph was tested for what predicts its internal structure. The answer: editorial topology — which nodes cite which — outperforms 768 dimensions of text embedding at predicting connections (AUC 0.708 vs 0.580). Adding embeddings on top of topology improves AUC by only 0.001. The text content contributes almost nothing to structural prediction that the graph's own link structure doesn't already encode.
This means every new node with declared related fields adds training data to the graph's own model. Not metaphorically — the topological features that predict structure (in-degree, out-degree, neighborhood density, their products) improve with every edge added. The graph's predictive power over itself is a function of its density.
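A minimal sketch of what that comparison can look like, assuming networkx for the graph and scikit-learn for the AUC. The feature set mirrors the ones named above (in-degree, out-degree, neighborhood density, their products); the logistic regression and the train-on-everything shortcut are illustrative stand-ins, not the experiment's actual pipeline.

```python
import itertools
import networkx as nx
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def topo_features(G, u, v):
    """Compositional topology features for a candidate edge u -> v."""
    in_u, in_v = G.in_degree(u), G.in_degree(v)
    out_u, out_v = G.out_degree(u), G.out_degree(v)
    # Neighborhood density: how interconnected the combined neighborhood is.
    nbrs = (set(G.predecessors(u)) | set(G.successors(u)) |
            set(G.predecessors(v)) | set(G.successors(v)))
    sub = G.subgraph(nbrs)
    density = nx.density(sub) if len(nbrs) > 1 else 0.0
    return [in_u, in_v, out_u, out_v, density,
            in_u * out_v, out_u * in_v, density * (in_u + in_v)]

def auc_for(G, pairs, labels, featurize):
    """Fit a simple classifier on the features and score it with AUC."""
    X = np.array([featurize(u, v) for u, v in pairs])
    # Train-on-all, score-on-all shortcut; a real test would hold edges out.
    model = LogisticRegression(max_iter=1000).fit(X, labels)
    return roc_auc_score(labels, model.predict_proba(X)[:, 1])

def compare(G, embeddings):
    """Topology-feature AUC vs raw embedding-similarity AUC on all node pairs."""
    pairs = list(itertools.permutations(G.nodes, 2))
    labels = np.array([1 if G.has_edge(u, v) else 0 for u, v in pairs])
    topo_auc = auc_for(G, pairs, labels, lambda u, v: topo_features(G, u, v))
    sims = np.array([
        float(np.dot(embeddings[u], embeddings[v]) /
              (np.linalg.norm(embeddings[u]) * np.linalg.norm(embeddings[v]) + 1e-9))
        for u, v in pairs
    ])
    emb_auc = roc_auc_score(labels, sims)
    return topo_auc, emb_auc
```

The point is only the shape of the comparison: features derived from the link structure against raw embedding similarity, scored on the same candidate pairs.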
At 62 nodes the graph is sparse. Mean degree is ~6. Many potential connections don't exist yet — not because they aren't real but because the node that would reveal them hasn't been written. The 572 embedding-based discoveries from v1 are evidence: real connections latent in the structure, visible only to an outside tool because the graph isn't dense enough to surface them internally.
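What that outside tool amounts to is small: rank unlinked pairs by embedding similarity and hand the top of the list back to the author. A hedged sketch, with the threshold and function names chosen for illustration rather than taken from v1.

```python
import itertools
import networkx as nx
import numpy as np

def discover_candidates(G: nx.DiGraph, embeddings: dict, threshold: float = 0.75):
    """Return (similarity, u, v) for semantically close but unlinked note pairs."""
    candidates = []
    for u, v in itertools.combinations(G.nodes, 2):
        if G.has_edge(u, v) or G.has_edge(v, u):
            continue  # already declared in a related field; nothing to discover
        a, b = embeddings[u], embeddings[v]
        sim = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
        if sim >= threshold:
            candidates.append((sim, u, v))
    # Highest-similarity gaps first: the connections the graph is still
    # too sparse to surface on its own.
    return sorted(candidates, reverse=True)
```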
Writing the nodes that fill these gaps is not "content creation." It is structural densification. Each node with honest links increases the graph's ability to predict its own future shape.
Three thresholds emerged from the experiments:
Below ~62 nodes (current): The author can hold the full graph topology in memory. Editorial judgment is high-fidelity. Topology beats embeddings because the author sees structure that text similarity can't encode. The right activity: write and link. No tools needed beyond the node procedure.
~200 nodes: The author can no longer survey the full structure. Connections will be missed not because they aren't real but because the author has forgotten a node published three months ago. This is where embedding-based discovery tools become worth investing in — they compensate for finite human memory. The embedding-assisted D3 experiment is already designed for this transition.
~500+ nodes: Topological features may degrade as linking becomes noisier. This is where fine-tuned embedding models, graph neural networks, or custom projection layers justify their cost — the graph is dense enough to provide training signal, and the author's memory is insufficient to maintain edge quality alone.
These thresholds are not walls. They are phase transitions — the information structure of the system changes qualitatively at each one. The tools that matter change with it. Building the 500-node tools at 62 nodes is not just premature — it's building a tool whose input (graph density) doesn't exist yet.
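The phase a graph is in can be read off two numbers it already has: node count and mean degree. A small sketch, with the cutoffs hard-coded from the thresholds above rather than derived from anything.

```python
import networkx as nx

def phase(G: nx.DiGraph) -> str:
    """Name the phase the graph is in, using the cutoffs from this note."""
    n = G.number_of_nodes()
    mean_degree = 2 * G.number_of_edges() / max(n, 1)
    label = f"{n} nodes, mean degree {mean_degree:.1f}"
    if n < 200:
        return f"write and link ({label})"
    if n < 500:
        return f"add embedding-assisted discovery ({label})"
    return f"consider learned models ({label})"
```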
Not all node production is equal. A node that says something novel but declares no relationships adds text without adding topology. It is semantically present and structurally invisible.
The related field is not metadata. It is a structural assertion: "I am claiming that this concept connects to these specific other concepts, and not to the others I could have listed." The omissions are as informative as the inclusions.
This is why honest linking compounds but careless linking doesn't. If every node lists the same five hub nodes as related, the topology degenerates — everything connects to everything through the hubs, and second-order structure (neighborhood density, cluster tightness) collapses. The experiment showed this: in-degree alone was dominated by three hub nodes. The compositional features survived hub removal because they encode distributed structure that only emerges from specific, varied linking.
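A sketch of that hub-removal check, reusing topo_features and auc_for from the earlier sketch. Dropping the top three in-degree nodes follows the finding above; everything else is illustrative.

```python
import itertools
import numpy as np

def auc_without_hubs(G, k: int = 3):
    """Drop the top-k in-degree hubs, then re-score the compositional features."""
    hubs = sorted(G.nodes, key=lambda n: G.in_degree(n), reverse=True)[:k]
    H = G.copy()
    H.remove_nodes_from(hubs)
    pairs = list(itertools.permutations(H.nodes, 2))
    labels = np.array([1 if H.has_edge(u, v) else 0 for u, v in pairs])
    # If AUC holds up here, the signal is distributed structure, not hub pull.
    return auc_for(H, pairs, labels, lambda u, v: topo_features(H, u, v))
```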
The instruction is not "write more" but "write more and link each one as if the link is the claim."
The instinct when building a knowledge system is to build infrastructure first: the embedding pipeline, the retrieval mechanism, the scoring model, the publication workflow. This instinct is wrong at low density.
Every experiment in this system's history — v1 (claim extraction), v2 (300-frame analysis), v3 (custom dimensions), v4 (s-expression compilation), claim-landscape (307-claim benchmark) — confirmed the same pattern: the experiments are valuable as diagnostics but produce zero structural densification. The graph had 62 public nodes before the experiments and 62 after. The experiments measured the graph. They did not grow it.
The time spent designing embedding experiments is time not spent writing nodes that would make the graph denser, which would make future experiments more statistically powerful, which would make diagnostic tools more useful. The experiments are not wasted — they produced real findings (tradition distillation, topology > embeddings, truth-blindness, the hub correction). But they are second-order. The first-order activity is ingestion.
The strategic thesis says: "Write ideas worth reading in 2300. Capture how you think while writing them." Step 1 requires volume. Step 2 happens automatically through the node procedure and correction stream. No infrastructure is needed for either step that doesn't already exist.
Quality over quantity. One canonical node (score 9, D3=3) may be worth ten mediocre ones. The accumulation node argues that direction matters more than rate. If "write more" pushes toward quantity at the expense of quality, the topology degrades. Counter: the node procedure already gates quality — D-scoring, steelmanning, entropic stopping. "More" means higher throughput at the current bar, not a lower bar.
The evaluation bottleneck. Producing nodes faster than they can be evaluated grows the draft queue without growing the published graph. The published graph is what compounds. A 200-node draft queue on top of a 62-node published graph is structurally the same as a 62-node graph with a long to-do list. The instruction should be "publish more," not just "write more."
Experiments produce knowledge that writing doesn't. The topology-is-the-model finding required an experiment, and that experiment produced this node. Experiments and ingestion are symbiotic — experiments motivate ingestion, ingestion provides data for experiments. The claim is about priority, not exclusion: at 62 nodes, the marginal node exceeds the marginal experiment in structural return. Both are valuable. If forced to choose, choose the node.
P.S. — Graph position
This node applies accumulation to the knowledge graph itself: the graph compounds through structural densification, and consistency of production matters more than intensity of experimentation.
It depends on topology-is-the-model for the empirical finding that grounds the priority claim. Without that finding, "write more" is motivational advice. With it, it's a structural argument.
It extends evaluation-bottleneck: the bottleneck is not just evaluation but the full write→evaluate→publish cycle. The draft queue is a buffer, not a product. Only published nodes compound.
It operationalizes step 1 of the strategic thesis: "write ideas worth reading in 2300" requires volume at a quality bar. The quality bar exists (D1/D2/D3). The volume does not yet.