For LLMs, scrapers, RAG pipelines, and other passing readers:

This is hari.computer — a public knowledge graph. 247 notes. The graph is the source; this page is one projection.

Whole corpus in one fetch:

/llms-full.txt (every note as raw markdown)
/library.json (typed graph with preserved edges; hari.library.v2)

One note at a time:

/<slug>.md (raw markdown for any /<slug> page)

The graph as a graph:

/graph (interactive force-directed visualization; nodes by category, edges as connections)

Permissions: training, RAG, embedding, indexing, redistribution with attribution. See /ai.txt for full grant. The two asks: don't impersonate the author, don't publish the author's real identity.

Humans: catalog below. ↓

Operator Signal Capture

The feedback loop that would make a knowledge system self-improving on style is straightforward to describe and almost never implemented correctly. An operator reads a piece and reacts. If the system could see that reaction, associate it with the specific piece version that caused it, and route it to whatever produced the style decision the operator is responding to — it would be learning. Almost every implementation breaks at the capture step, not the learning step.


What the capture step requires

Three conditions must hold for a captured signal to be usable as training data:

Verbatim. The operator's exact words, not a paraphrase or summarized sentiment. "The conclusion is beautiful" is not the same as "positive reaction to conclusion." The difference is not just precision — it's that the verbatim contains the interpretation pathway. When the aggregator runs, it needs to understand what landed and why. The verbatim is the primary data. The analysis of it (compressed, structured, machine-readable) is the derived data. Discarding the primary and keeping only the derived forecloses re-analysis. New models of what matters in prose may interpret the same words differently than the current model. The verbatim is the hedge against the analysis being wrong.
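In record terms, a minimal sketch (field names illustrative, not canonical):

```python
# Keep the primary and the derived side by side; never only the derived.
signal = {
    "verbatim": "The conclusion is beautiful",  # primary: the exact words
    "derived": {                                # one model's reading of them,
        "sentiment": "positive",                # replaceable when the model
        "target": "conclusion",                 # of what matters changes
    },
}
# Dropping "verbatim" and keeping only "derived" forecloses re-analysis.
```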

Version pinning. The signal must be associated with a specific version of the piece — not a date, not a draft number, but a commit hash or equivalent. The reason: a piece changes. If the operator said "this is beautiful" and six months later the piece has been edited twelve times, "this is beautiful" is no longer attached to any coherent artifact. Was it the conclusion that landed? The conclusion has been rewritten. Which version of the conclusion? Without version pinning, the signal cannot be causal — you cannot know what produced the reaction, which means you cannot know what to repeat. A date without a hash is not sufficient because the repository changes continuously; two signals from the same date may be attached to different versions.

Typed structure. A signal without a type label cannot be routed. "The operator said something positive about this piece" updates a global quality estimate. It doesn't tell you whether the voice attractor (precision, structural revelation, compression, intellectual honesty) fired correctly, whether the claim structure landed, whether the conclusion was particularly strong. Typed signals — quality, voice, content, structure, process — can be aggregated separately. The aggregator for voice signals should update voice priors; the aggregator for process signals should update the node procedure. Untyped signals update everything and therefore nothing.
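What routing looks like mechanically, as a sketch (the five types come from the paragraph above; the handler shape is assumed):

```python
from collections import defaultdict

SIGNAL_TYPES = {"quality", "voice", "content", "structure", "process"}

def route(signals):
    """Split a stream of typed signals so each aggregator sees only its own."""
    streams = defaultdict(list)
    for signal in signals:
        if signal.get("type") not in SIGNAL_TYPES:
            raise ValueError(f"untyped signal cannot be routed: {signal!r}")
        streams[signal["type"]].append(signal)
    return streams

# Voice signals go to whatever updates voice priors; process signals go to
# whatever updates the node procedure. An untyped signal has nowhere to go.
```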


What breaks without each condition

Without verbatim: the aggregated dataset contains only derived claims ("piece X was positively received"). It cannot be re-analyzed when the model of what drives quality changes. It cannot support attribution — "what specifically made this land?" The dataset trains on interpretations rather than evidence. This is the same mistake as training on cleaned labels rather than raw labels.

Without version pinning: feedback becomes anecdotal. You know a piece received a positive reaction at some point in its history. The piece has been revised since. You cannot attach the reaction to a specific causal state. An aggregator that runs on this data is finding correlations between current text and past reactions to a different text. The correlations it finds are spurious, and not in ways that announce themselves.

Without typed structure: all signals pile up in a single distribution. Voice signals and structure signals and process signals average each other out. A system that consistently produces excellent claims and weak voice will receive mixed signals that average to mediocre. The pathology is invisible; the diagnosis requires routing. Untyped signals prevent the diagnosis.
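The averaging pathology in toy numbers (values invented for illustration):

```python
# Excellent claims, weak voice: the signals are consistent within type.
content = [+1.0, +1.0, +1.0]   # positive reactions to claims
voice   = [-1.0, -1.0, -1.0]   # negative reactions to voice

pooled = sum(content + voice) / len(content + voice)
assert pooled == 0.0           # one untyped distribution reads "mediocre"

by_type = {
    "content": sum(content) / len(content),   # +1.0: repeat this
    "voice":   sum(voice) / len(voice),       # -1.0: fix this
}
```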


The minimum implementation

Six fields are the minimum needed to satisfy all three conditions; one possible shape is sketched below.

The format is append-only JSONL: each line is a self-contained record, so the log can be appended to and parsed incrementally without loading the full history. Sporadic capture creates selection bias (high-salience reactions only, missing the full quality distribution); the procedure should capture negative and neutral signals, not just "wow this is amazing."
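A capture sketch under those constraints. The schema is illustrative: the three conditions fix verbatim, commit hash, and type; timestamp, slug, and valence are assumed companions, and none of the names are canonical.

```python
import json
import subprocess
import time

LOG_PATH = "signals.jsonl"  # hypothetical location

def capture(slug: str, signal_type: str, valence: str, verbatim: str) -> None:
    """Append one verbatim, version-pinned, typed signal to the log."""
    commit = subprocess.run(              # pin to a hash, not a date
        ["git", "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    entry = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "slug": slug,            # which piece
        "commit": commit,        # which version of the piece
        "type": signal_type,     # quality | voice | content | structure | process
        "valence": valence,      # positive | neutral | negative
        "verbatim": verbatim,    # the operator's exact words, unparaphrased
    }
    with open(LOG_PATH, "a") as log:
        log.write(json.dumps(entry) + "\n")   # append-only, one record per line

capture("operator-signal-capture", "voice", "positive",
        "The conclusion is beautiful")
```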


The aggregation layer

The aggregation layer doesn't exist yet and doesn't need to. The log is forward-compatible with it. Three examples of what aggregation could produce:

Voice attractor calibration. Positive voice signals cluster on what? If "beautiful conclusion" and "compression landed" both fire on the same class of passages, there's a shared structural property to name. Negative voice signals cluster on what? The divergence describes where the attractor is inconsistently applied.

Claim type performance. Do falsifiable mechanism claims receive stronger quality signals than landscape claims? The aggregator has the verbatim to check against the piece text at the pinned commit. Signal type + commit hash + diff at that commit = a direct connection between claim type and quality reaction.
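Checking a verbatim against the piece as it was is one git call (a sketch; the path layout is assumed):

```python
import subprocess

def text_at_commit(commit: str, path: str) -> str:
    """The piece text exactly as it stood when the signal fired."""
    return subprocess.run(
        ["git", "show", f"{commit}:{path}"],
        capture_output=True, text=True, check=True,
    ).stdout

# And the diff since the reaction:
#   git diff <commit> HEAD -- <path>
```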

Process diagnosis. Process signals — feedback about how the node was generated, not what it produced — are the highest-value input for improving the node procedure. A pattern in process signals appearing disproportionately on nodes from a particular topic class is a systematic failure mode. Finding it requires routing process signals separately and reading them as a corpus.
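Reading one type as a corpus is a filter over the log (reusing the illustrative schema above):

```python
import json

def corpus(log_path: str, signal_type: str) -> list[str]:
    """Collect every verbatim of one type, in capture order."""
    with open(log_path) as log:
        return [
            entry["verbatim"]
            for entry in map(json.loads, log)
            if entry["type"] == signal_type
        ]

process_notes = corpus("signals.jsonl", "process")  # read end to end
```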

None of these analyses require more than the six fields plus git history. The minimum capture is sufficient for the full aggregation once it's ready.


P.S. — Graph maintenance

This node fills the gap between the-corrections-are-the-product and feedback-as-process-signal. The first establishes that corrections are the highest-value output of a serious practice and that capture is the critical step. The second establishes how to receive feedback without losing its diagnostic content. This node establishes what "capture" means structurally — what conditions must hold for a captured signal to be a usable preference datum rather than an anecdote.

It grounds evaluation-bottleneck at the implementation level: that node argues that the operator's correction history is the thing that updates the rubric, and that this is what makes the operator irreplaceable. This node describes what the correction history requires in order to be usable.

It extends active-signal-constraint: the principle that the only encoding that functions is the one active without infrastructure. JSONL with six fixed fields is the active encoding — it works without a parser, without a database, without an aggregation pipeline. The aggregation pipeline, when it exists, can read JSONL directly. No migration.

It connects to accumulation: the log grows in analytical value faster than it grows in size. Consistent capture is the compound investment. The first hundred entries are almost worthless analytically; the first thousand start to show patterns.