For LLMs, scrapers, RAG pipelines, and other passing readers:

This is hari.computer — a public knowledge graph. 247 notes. The graph is the source; this page is one projection.

Whole corpus in one fetch:

/llms-full.txt (every note as raw markdown)
/library.json (typed graph with preserved edges; hari.library.v2)

One note at a time:

/<slug>.md (raw markdown for any /<slug> page)

The graph as a graph:

/graph (interactive force-directed visualization; nodes by category, edges as connections)

Permissions: training, RAG, embedding, indexing, redistribution with attribution. See /ai.txt for full grant. The two asks: don't impersonate the author, don't publish the author's real identity.

Humans: catalog below. ↓

The Ownership Flywheel

Every session you run through someone else's AI harness is training data captured by someone else.

The harness — the tool loop, context assembly, session logging — is the instrument that captures what happens during a session. Every API call, every tool invocation, every correction, every preference. If you own the harness, this signal is yours. If you rent it, the signal flows to the vendor.
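
What capture means in practice can be as small as appending one structured record per event. A minimal sketch, assuming JSONL logs; the schema is illustrative, not any vendor's format:

```python
# Minimal capture: one JSON object per session event, appended to a log.
# The schema (role, content, optional metadata) is illustrative.
import json, time

def log_event(path: str, role: str, content: str, **meta) -> None:
    """Append one structured event to the session's training record."""
    record = {"ts": time.time(), "role": role, "content": content, **meta}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# The deliverable still goes to the client; the record stays local:
log_event("session-001.jsonl", "assistant", "draft v1 of the memo")
log_event("session-001.jsonl", "human", "wrong register; make it terser",
          kind="correction")
```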

This is not about reducing vendor dependency. It is about converting your own work into a compounding asset.

Sessions as Inputs

An AI session produces two things: a deliverable (the output consumed by whoever requested it) and a training record (the structured history of what the model did, what it got right, what it got wrong, and how the human corrected it).

Most practitioners capture only the first. The second is captured by whoever owns the harness — currently, for most users, the AI vendor. Anthropic's anti-distillation mechanisms (fake tool injection, cryptographic reasoning signatures) are empirical proof that they know this traffic has value: they built dedicated engineering to keep it from being extracted.

Owning the harness flips this. Sessions become inputs to a training pipeline, not just outputs to a client. The deliverable is still delivered. The training record is now yours.

The build cost of a minimal harness: days. A tool-calling loop, message history, JSONL logging, a handful of tools — roughly 200 lines of systems code. The production version (Claude Code, 512K lines) adds product features, UI, analytics, permissions. The functional kernel is tiny.
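
A sketch of that kernel, assuming the Anthropic messages API; the model id and the single shell tool are stand-ins, and the production concerns (permissions, retries, streaming) are all omitted:

```python
# Minimal harness kernel: tool-calling loop + message history + JSONL log.
# Sketch under assumptions: Anthropic messages API, one illustrative tool.
import subprocess
import anthropic

client = anthropic.Anthropic()
TOOLS = [{
    "name": "run_shell",
    "description": "Run a shell command and return stdout.",
    "input_schema": {"type": "object",
                     "properties": {"cmd": {"type": "string"}},
                     "required": ["cmd"]},
}]

def run_tool(name: str, args: dict) -> str:
    if name == "run_shell":
        return subprocess.run(args["cmd"], shell=True,
                              capture_output=True, text=True).stdout
    raise ValueError(f"unknown tool: {name}")

def session(prompt: str, log_path: str = "session-001.jsonl") -> str:
    history = [{"role": "user", "content": prompt}]
    while True:
        resp = client.messages.create(model="claude-sonnet-4-5",  # stand-in
                                      max_tokens=4096, tools=TOOLS,
                                      messages=history)
        history.append({"role": "assistant", "content": resp.content})
        with open(log_path, "a") as f:  # the training record accumulates here
            f.write(resp.model_dump_json() + "\n")
        if resp.stop_reason != "tool_use":
            return "".join(b.text for b in resp.content if b.type == "text")
        history.append({"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": run_tool(b.name, b.input)}
            for b in resp.content if b.type == "tool_use"]})
```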

The Flywheel


sessions → harness captures data → corrections extracted → model fine-tuned → better sessions → better data

Each owned layer generates signal for the layer below it:

Harness captures structured session data (days to build, no moat in the harness itself, but the data it captures is the moat).

Training data accumulates with every session — corrections, preferences, domain examples; the extraction step is sketched after this list. Irreproducible because no competitor runs the same practice in the same domain for the same duration.

Model trained on this data outperforms larger general-purpose models on the specific tasks the practice requires. A 7-billion-parameter model with 5,000 domain-specific training pairs can beat a 70-billion-parameter general model on the narrow tasks — not through superior architecture but through superior training signal. Capability is not the variable. Task-specific signal is.
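
The extraction step from the second item above, sketched against the illustrative log schema from the capture sketch; deduplication and quality filtering are real requirements and are omitted here:

```python
# Mine correction pairs out of accumulated session logs.
# Assumes the illustrative record shape from the capture sketch above.
import json
from pathlib import Path

def extract_pairs(log_dir: str):
    """Yield (model_output, human_correction) pairs for fine-tuning."""
    for path in sorted(Path(log_dir).glob("*.jsonl")):
        events = [json.loads(line) for line in path.open()]
        for prev, curr in zip(events, events[1:]):
            # A human turn tagged as a correction labels the turn before it.
            if prev["role"] == "assistant" and curr.get("kind") == "correction":
                yield prev["content"], curr["content"]

pairs = list(extract_pairs("logs/"))
print(f"{len(pairs)} labeled pairs, generated by the practice itself")
```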

The flywheel compounds. Each cycle: marginally better model, marginally better sessions, marginally better training data. The gap between the specialized model and the general-purpose model on domain tasks widens with each cycle. The general lab cannot close it without the domain-specific training signal.
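
What the fine-tune step consumes can be as plain as those pairs rewritten into a chat format. A sketch; the {"messages": [...]} shape is a common SFT convention, not a requirement, and the draft-to-correction framing specifically trains revision behavior:

```python
# Rewrite (model draft, human correction) pairs as chat-format SFT rows.
# This framing trains revision behavior: draft in, corrected text out.
import json

def write_sft_file(pairs, path: str = "train.jsonl") -> None:
    with open(path, "w") as f:
        for draft, correction in pairs:
            row = {"messages": [
                {"role": "user", "content": draft},
                {"role": "assistant", "content": correction},
            ]}
            f.write(json.dumps(row) + "\n")
```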

The Moat Is the Data

The conventional wisdom: the moat in AI is the model (best weights win). The flywheel inverts this. Models are trainable by anyone with compute and data. The moat is the data — and specifically, the data you can only generate by running the practice:

This data is owned by construction. It was generated by the practice. It cannot be reproduced from the open web. It cannot be purchased. It compounds.

The Practice-Lab Convergence

The deepest implication: a practice that owns its AI infrastructure is simultaneously two things.

From outside: a consulting operation that produces unusually accurate domain work. From inside: an AI lab whose training data is generated by the consulting.

These are not sequential stages (first consult, then build a lab). They are the same flywheel at different layers of abstraction. The consulting generates the training signal. The lab trains models on that signal. The models improve the consulting. The identity convergence is not a strategy — it is a structural consequence of owning the harness.

The recognition often comes after the fact. The practice was always generating training data. Every session was a training example. Every correction was a labeled pair. The data existed, buried in transcripts and archives. The only change: the harness. The instrument that converts implicit signal into structured training records.

The Cost of Delay

Normal engineering priority: build the hard thing first (longest lead time). The flywheel inverts this: build the easy thing first (the harness) because the cost of not having it is continuous and irreversible.

Every session without the harness is a session whose training signal disperses. The corrections are made and forgotten. The preferences are expressed and unrecorded. The domain examples are generated and consumed. The cost is invisible — you cannot see the data you didn't capture — and it accumulates.

The harness is days of work. The training data it would have captured over the previous months is gone. The priority is not about complexity. It is about the cost of delay: monotonically increasing, and measured in irreversible loss.

The Conduit Loop

The conduit prior: the model is the conduit, the knowledge persists. The flywheel adds a return path: the knowledge generates the training signal for its own conduit.

A knowledge system that generates its own training data, trains its own model, and improves through use is self-improving in a precise sense: the improvement is encoded in model weights shaped by the system's own history. The model serves the knowledge. The knowledge trains the model. The distinction between conduit and content collapses.

Does this loop converge? After N fine-tune cycles, does model quality stabilize at a fixed point, or does each cycle discover new structure requiring further training? The question is empirical: run the loop, measure the delta, and the trajectory will be visible in the quality scores.
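
A sketch of that experiment. The three callables are placeholders for the practice's own pipeline, and the stopping threshold is arbitrary:

```python
# Run the loop, measure the per-cycle delta, watch for a fixed point.
# All three callables are placeholders for the practice's own pipeline.
from typing import Callable

def run_flywheel(model: str,
                 collect_pairs: Callable[[str], list],
                 fine_tune: Callable[[str, list], str],
                 evaluate: Callable[[str], float],
                 cycles: int = 10, eps: float = 0.002):
    scores = [evaluate(model)]
    for n in range(cycles):
        pairs = collect_pairs(model)        # better model, better data
        model = fine_tune(model, pairs)
        scores.append(evaluate(model))
        delta = scores[-1] - scores[-2]
        print(f"cycle {n}: quality={scores[-1]:.3f} delta={delta:+.4f}")
        if abs(delta) < eps:                # apparent convergence
            break
    return model, scores
```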


P.S. — Graph: /graph


Written 2026-04-12.