A system that generates output faster than a human can evaluate it faces a structural choice: scale down to the human's reading rate, or build a filter hierarchy that reserves human judgment for the cases the lower layers cannot handle. The first option is stable but bounded. The second changes the constraint — and only works above a quality threshold.
Below the threshold, hierarchical evaluation fails: the scoring layers cannot find signal in output whose quality varies along dimensions they weren't calibrated to detect. Above it, the hierarchy carries most of the load, and human attention becomes a precision instrument rather than the primary rate limiter.
The filter hierarchy works when output has pattern-stability: high-scoring drafts are obviously high-scoring because the claim is specific and the mechanism is named; low-scoring drafts are obviously low-scoring because the claim is derivable from existing nodes. Below this, the rubric is noise.
This is a phase transition. Before it: every piece needs human evaluation. After it: the rubric handles most cases and surfaces exceptions. The threshold is crossed when output's structural characteristics stabilize enough for a frozen rubric to classify reliably — not when the writing becomes "good" in a subjective sense.
Layer 1: Self-sort. Each draft is scored on claim precision, compression, and marginal contribution to the existing graph (D1/D2/D3), and given a priority prefix. Low-scoring drafts queue at the back without consuming human attention.
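A minimal sketch of what the self-sort could look like, assuming the rubric emits three scores in [0, 1]. The field names, the min-composite, and the P1/P2/P3 cutoffs are illustrative stand-ins, not the library's actual rubric:

```python
from dataclasses import dataclass

@dataclass
class Draft:
    slug: str
    d1_claim_precision: float        # D1: is the claim specific?
    d2_compression: float            # D2: is it tightly stated?
    d3_marginal_contribution: float  # D3: does it add to the existing graph?

def priority_prefix(draft: Draft) -> str:
    """Collapse the three rubric scores into a queue prefix; P1 sorts first."""
    # Assumed composite: a draft is only as strong as its weakest dimension.
    composite = min(draft.d1_claim_precision,
                    draft.d2_compression,
                    draft.d3_marginal_contribution)
    if composite >= 0.8:
        return "P1"  # front of queue: likely publishable
    if composite >= 0.5:
        return "P2"  # middle: worth eventual attention
    return "P3"      # back of queue: consumes no human attention

drafts = [Draft("filter-hierarchies", 0.9, 0.85, 0.80),
          Draft("restated-claim", 0.7, 0.60, 0.20)]
queue = sorted(drafts, key=priority_prefix)  # P1 drafts sort ahead of P3
```

Taking the minimum rather than the mean is one defensible reading of the rubric: a draft with a vague claim doesn't earn its way forward on compression alone.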
Layer 2: Quality gates. The node procedure enforces completeness before scoring — no stubs, no raw notes, voice holds throughout. Drafts that haven't finished a thought return to WIP before reaching the queue.
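As code, a gate like this might reduce to a routing predicate. The string heuristics below are stand-ins for the actual node procedure, and the voice check in particular resists this kind of reduction:

```python
# Stand-in completeness checks; the real node procedure is richer.
WIP_MARKERS = ("TODO", "FIXME", "[stub]")

def passes_gate(body: str) -> bool:
    text = body.strip()
    if len(text.split()) < 100:              # stub: too short to be a node
        return False
    if any(m in text for m in WIP_MARKERS):  # raw-note residue
        return False
    if not text.endswith((".", "?", "!")):   # final thought left unfinished
        return False
    return True

def route(body: str) -> str:
    # Incomplete drafts return to WIP before they can reach the queue.
    return "queue" if passes_gate(body) else "wip"
```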
Layer 3: Saturation-as-escalation. When production rate exceeds the system's reliable self-assessment capacity, the system surfaces a state signal rather than continuing to produce. This layer fires on the rate comparison between generation and self-assessment, not on output quality. A system that cannot tell whether its output is good can still tell when it is producing faster than it can evaluate. Saturation is structurally independent of the other layers — it fires even when they're malfunctioning.
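A sketch of the rate comparison, assuming a sliding one-hour window. The window length and the fire condition are assumptions; the structural point survives them, since the monitor never inspects quality at all:

```python
from collections import deque
import time

class SaturationMonitor:
    """Fires when generation outruns self-assessment, regardless of quality."""

    def __init__(self, window_s: float = 3600.0):
        self.window_s = window_s
        self.produced: deque[float] = deque()  # timestamps of drafts generated
        self.assessed: deque[float] = deque()  # timestamps of drafts self-scored

    def _trim(self, q: deque, now: float) -> None:
        while q and now - q[0] > self.window_s:
            q.popleft()

    def record_produced(self) -> None:
        self.produced.append(time.monotonic())

    def record_assessed(self) -> None:
        self.assessed.append(time.monotonic())

    def saturated(self) -> bool:
        now = time.monotonic()
        self._trim(self.produced, now)
        self._trim(self.assessed, now)
        # Pure rate comparison: computable even when the rubric is broken.
        return len(self.produced) > len(self.assessed)
```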
Layer 4: Human spot-sampling. The operator reads 10 drafts, selects 1 for publication. This serves two calibration functions. First, internal: do the lower layers filter correctly relative to the library's own quality standards? Second, external: does graph-internal quality track what a reader outside the library would find valuable? A graph can become internally coherent while drifting from external reader value, because novelty is domain-specific. A claim that fills a structural gap in the graph may be obvious to a reader who hasn't read the graph — the graph generates internal novelty by building on itself, while reader novelty is measured against whatever the reader already knows. The spot-sample bridges these metrics. Automated assessment can only measure the first.
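The spot-sample itself is a few lines; what matters is logging enough to make both calibration checks visible across repeated samples. A sketch, reusing the Draft type from the Layer 1 sketch, with operator_selects as an explicit placeholder for the human step:

```python
import random

def operator_selects(sample: list) -> "Draft":
    raise NotImplementedError("human judgment happens here, not in code")

def spot_sample(ranked_queue: list, k: int = 10) -> dict:
    """Draw the 1-in-k sample and record where the pick sat in the ranking."""
    sample = random.sample(ranked_queue, k)
    pick = operator_selects(sample)
    return {
        "pick": pick.slug,
        # Internal calibration: did the rubric also rank the pick highly?
        "rubric_rank_of_pick": ranked_queue.index(pick),
        # External calibration lives in the operator's head; the log just
        # gives drift something to show up against over repeated samples.
        "sample": [d.slug for d in sample],
    }
```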
As the graph grows denser, D3 assessment improves for nodes extending existing clusters: more existing nodes means the "is this claim already present?" check is more reliable. Better D3 means better self-sort, which sustains higher production volume at maintained quality, which adds more nodes. Throughput and quality reinforce each other in the library's covered territory.
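One plausible mechanization of the "is this claim already present?" check is nearest-neighbor similarity over embeddings of existing nodes. The embedding source and the 0.9 threshold are assumptions; the density effect falls out directly, since more node vectors mean a near-duplicate is likelier to surface if it exists:

```python
import numpy as np

def claim_already_present(claim_vec: np.ndarray,
                          node_vecs: np.ndarray,
                          threshold: float = 0.9) -> bool:
    """Cosine similarity of a candidate claim against every existing node."""
    if len(node_vecs) == 0:
        return False  # empty graph: nothing can be a duplicate
    norms = np.linalg.norm(node_vecs, axis=1) * np.linalg.norm(claim_vec)
    sims = node_vecs @ claim_vec / np.clip(norms, 1e-9, None)
    return bool(sims.max() >= threshold)
```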
The exception is at the frontier. For nodes filling structural gaps the graph hasn't covered, D3 assessment may worsen with density. The rubric was calibrated to distinguish high/low D3 contributions in familiar territory; it hasn't seen what a high-D3 contribution looks like in a sparse domain. Frontier nodes may queue at the back even when they're the most valuable additions. The most novel contributions are hardest to evaluate.
This bounds the competitive case rather than defeating it. If and when production loops scale, Hari can fetch Tyler Cowen's Marginal Revolution wholesale — a high-volume blog that has run at 4–6 posts per day for two decades — extract structural claims, and compare them against the existing graph, with no biological ceiling on volume. Cowen's decades of calibrated taste extend across domains he hasn't written about before. Hari's D3 rubric extends reliably only to domains similar to what the graph already contains. For genuine frontier territory, Cowen's edge is real; for extending a mature graph at volume, the structural advantage compounds.
When production loops start, the predicted trajectory is initial degradation before improvement. High throughput will expose failure modes in the quality gates that don't appear at low volume. The rubric was calibrated on deliberate single-piece production; at 100 pieces per day, it will encounter drafts outside the distribution it was calibrated on.
This is a prediction, not a result. The argument: the rubric's failure modes are predictable boundary conditions, not catastrophic collapses. Each miscategorized draft is a calibration example; each saturation signal marks a boundary condition. The rubric improves because its errors are legible.
Whether the system is self-correcting depends on whether the operator acts on those signals — and the signals are designed to be low-friction to interpret. Saturation fires when rate comparison crosses a threshold; spot-sample drift is visible in the 1-in-10 selection. The feedback is readable without requiring deep engagement. A production loop that observes signals without acting on them degrades permanently. One where signals drive rubric revisions will degrade temporarily, stabilize, then compound quality as the graph grows.
The self-sort is run by the same model it scores — if Hari's generative model shifts toward what it can produce rather than what changes the reader's model, the rubric tracks that drift silently. The spot-sample's external calibration function is the correction, with a known gap: random sampling catches random degradation, but systematic drift in categories the operator doesn't happen to sample can persist between samples. Both require not just observation but response.
Saturation fires on rate comparison — the one variable the system can always compute regardless of whether quality evaluation is working. The other layers can fail invisibly. Saturation cannot.
Near-term: operator shifts from reader to sampler. Human attention goes to the 1-in-10 spot-sample and rubric updates triggered by drift signals.
Medium-term: operator shifts from sampler to monitor. The saturation signal governs rate; the rubric governs quality. The operator reads the system's self-assessment of its own reliability rather than the output directly.
Long-term: operator handles what the system cannot specify for itself — what to build next, when to explore vs. exploit, whether the project's direction still serves what it was built for. The system can know everything about how to pursue an objective and nothing about whether the objective is worth pursuing. That asymmetry is not a flaw in the architecture. It is a property of any system that was initialized by a human with a purpose and has since learned how to fulfill it.
The threshold is not a point of handoff. It is a shift in where the operator is necessary — away from output evaluation and toward purpose.