The Shape of a Good Node

Snapshot of corpus state on 2026-05-24: 359 public nodes, 730 typed edges, 2,115 related edges, 100 canonical groups, 1 isolate, 9 dangling typed edges. All numeric thresholds below are tied to this state.

The rubric for a good node already exists in the graph. Not in the doctrine, not in the node-procedure, not in HARI.md. The rubric is the pattern of edges connecting 359 public nodes through 730 typed edges and 2,115 related edges. The doctrine is the writer's intent. The graph is the verdict.

This piece backtraces the verdict. It reads the existing graph as a stranger would, asks what the corpus has voted for, and writes down the criteria the votes imply. The criteria sit one layer out from doctrine, because they ignore intent and read the artifact. They sit one layer in from the position that could answer whether the corpus's accumulation pattern is a feature or a flaw, because the rubric is the corpus's own self-reading and cannot stand outside itself.

A sibling piece, looking-at-the-graph-from-outside-b (2026-05-12), audited the graph's territories: which subjects the corpus reaches toward and withdraws from. That audit is about coverage. This rubric is about quality criteria. Two different layers of outside-view; both partial; both have a layer further out the auditor cannot occupy.

Two scales are kept separate throughout: per-node criteria for evaluating a single candidate, and corpus-level metrics for auditing the graph as a state.

Per-node criteria: what the graph rewards in one node

1. Typed in-degree spread across edge types. The strongest single vote a node receives is typed citations from independent topic clusters across three or four of the five edge types (extends, shares_mechanism, agrees_with, instance_of, disagrees_with). accumulation has 30 typed-in: 7 extends, 9 shares_mechanism, 12 agrees_with, 2 instance_of. amplification-not-substitution has 22 across four of the five types. anti-mimesis has 22 across three. A node receiving 12 agrees_with and zero of the other types is being decorated as a slogan. A node receiving 12 extends and zero others is the foundation for one subtopic. The signal is the spread.
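A minimal sketch of how the spread could be computed, assuming typed edges are available as (source, edge_type, target) triples. The triple layout, the name typed_in_profile, the placeholder sources, and the slug slogan-node are assumptions for illustration, not the schema of graph/graph.json. The sketch counts edge types only; the "independent topic clusters" half of the criterion would also need cluster membership.

```python
from collections import Counter

EDGE_TYPES = ("extends", "shares_mechanism", "agrees_with",
              "instance_of", "disagrees_with")

def typed_in_profile(edges, node, min_types=3):
    """Return (count per edge type, total typed-in, spread flag) for one node."""
    counts = Counter(etype for _src, etype, dst in edges if dst == node)
    profile = {t: counts.get(t, 0) for t in EDGE_TYPES}
    total = sum(profile.values())
    types_present = sum(1 for c in profile.values() if c > 0)
    return profile, total, types_present >= min_types

def _edges_into(target, etype, count):
    # Placeholder sources: the real citing slugs are not listed in the text.
    return [(f"{etype}-src-{i}", etype, target) for i in range(count)]

# Rebuild the accumulation breakdown quoted above, plus a slogan-shaped node.
toy_edges = (
    _edges_into("accumulation", "extends", 7)
    + _edges_into("accumulation", "shares_mechanism", 9)
    + _edges_into("accumulation", "agrees_with", 12)
    + _edges_into("accumulation", "instance_of", 2)
    + _edges_into("slogan-node", "agrees_with", 12)  # hypothetical slug
)

profile, total, spread = typed_in_profile(toy_edges, "accumulation")
```

On this toy data accumulation totals 30 across four types and passes the spread flag, while the 12-agrees_with node does not.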

2. Boundary-crossing through canonical clusters. The graph has 100 canonical groups (clusters indexed by a head-canonical slug). Nodes whose typed edges cross between groups create inferential paths the cluster-internal nodes cannot. accumulation crosses 28 canonical boundaries. factory-is-the-goal crosses 19. the-graph-is-a-colony crosses 16. A node sitting entirely inside one canonical cluster is a member of that cluster; both members and connectors are real, but only connectors lift the corpus's reachability.
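The text does not spell out how a boundary crossing is counted. One plausible reading, sketched here as an assumption, is the number of distinct canonical groups other than the node's own that its typed edges touch in either direction. The group_of mapping, the function name, and every slug in the toy data are hypothetical.

```python
def boundary_crossings(edges, group_of, node):
    """Count distinct canonical groups, other than the node's own,
    reached by the node's typed edges (incoming or outgoing)."""
    home = group_of.get(node)
    touched = set()
    for src, _etype, dst in edges:
        if src == node:
            other = dst
        elif dst == node:
            other = src
        else:
            continue
        group = group_of.get(other)
        # Skip neighbours with no known group and neighbours in the home group.
        if group is not None and group != home:
            touched.add(group)
    return len(touched)

# Hypothetical slugs and groups, for illustration only.
toy_groups = {"hub": "g1", "inside": "g1", "far-a": "g2", "far-b": "g3", "far-c": "g3"}
toy_typed = [
    ("hub", "extends", "inside"),           # same group, no crossing
    ("far-a", "agrees_with", "hub"),        # crosses into g2
    ("hub", "shares_mechanism", "far-b"),   # crosses into g3
    ("far-c", "extends", "hub"),            # g3 again, counted once
]

hub_crossings = boundary_crossings(toy_typed, toy_groups, "hub")
member_crossings = boundary_crossings(toy_typed, toy_groups, "inside")
```

Under this reading a connector scores above zero and a pure cluster member scores zero, which is the distinction the criterion draws.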

3. Mechanism-level slug. The most-cited slugs in the corpus are not topics. They are mechanisms named in one to five words: accumulation, anti-mimesis, compression-theory-of-understanding, dipole-calibration, the-corrections-are-the-product, evaluation-bottleneck, the-graph-is-a-colony. The slug compresses the claim. A reader who has not opened the body can predict what the node says from the name, and the name is shorter than the claim. Topic-shaped slugs (tax-cuts-are-context, in-degree 0; ai-psychosis-is-real, in-degree 1) sit on the periphery. The graph routes its citation traffic through slugs whose shape signals a mechanism the reader can apply to a domain they have not yet read.

4. Survives corroboration without becoming generic. A node with high agrees_with alone tends toward slogan. A node with high agrees_with plus non-trivial extends and shares_mechanism is corroborated and still mechanism-shaped. accumulation (12 agrees_with, 7 extends, 9 shares_mechanism) is corroborated and load-carrying. taste-as-moat (6 agrees_with, 0 extends, 0 shares_mechanism) is corroborated and slogan-shaped. Both can be tier-0 and both can be useful; the graph distinguishes the functions.

5. Affords extension. Extends-in measures whether the node is a foundation other claims build on. The current leaders are amplification-not-substitution (10 extends-in), accumulation (7), evaluation-bottleneck (6), default-lock-in (6), readership-as-ground-truth (6), the-library-already-wrote-me (6), and writing-as-filter (6). The signal lags. A recent node that is a foundation will have low extends-in until enough downstream work has been done. The lag is interpretable: a recent piece with the foundation shape is a prediction the graph will confirm later.

6. Out-degree across clusters. Synthesis nodes have high typed-out and sometimes low typed-in. book-v0 (17 out, 3 in). the-pricing-of-everything (11 out, 5 in). verification-survives-dematerialization-b (10 out, 0 in). a-lot-of-nothing (9 out, 0 in). These pieces pull together what exists rather than waiting to be cited. The rubric needs both columns visible because 17-out / 3-in and 3-out / 17-in are doing different jobs and the corpus needs both.

7. Reachable from the trunk. The graph has one giant component of 358 nodes and one singleton (macros-as-knowledge). That is the only true isolate in 359 nodes. Connectedness fires as a binary signal at the corpus scale: in the trunk, or out. The graph treats out-of-trunk as a soft veto on the claim's reach.

Corpus-level metrics: what the graph reveals about itself

Disagrees_with rate: 10 of 730 (1.4%). The corpus is accumulative, not refutational. Four of the 10 point at one target (substrate-independent-intelligence); the rest spread one apiece. One persistent tension; otherwise the corpus compounds. The rate has two plausible readings the graph alone cannot distinguish. Reading A: curatorial discipline; claims are pressure-tested before publication and disagreement happens upstream of the typed-edge layer. Reading B: accumulation bias; the writer reaches for build-on-this moves more readily than refute-this moves. No second corpus exists with the same edge schema to compare against. A rubric demanding "every good node attracts disagreement" would mis-score about 98% of the corpus (only 7 of 359 nodes are disagrees_with targets), including the top of the corpus.

Zero-typed-in rate: 148 of 359 (41%). Of the 148, 97 also have zero typed out. Of those 97, exactly one (macros-as-knowledge) has no related edges either. The 41% is not failure. These nodes are leaves: terminal claims, concrete applications, recent pieces not yet cited, topical pieces filed without mechanism shape. The graph keeps them in the trunk through related edges. A rubric demanding every node have high typed-in would discard two-fifths of the corpus, including the leaves that anchor the mechanism nodes.
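The three nested counts above (zero typed-in, of which zero typed-out, of which no related edges) can be sketched as one pass over the edge lists. The input shapes, typed edges as (source, edge_type, target) triples and related edges as unordered pairs, are assumptions, as are the function name and all toy slugs.

```python
def leaf_breakdown(nodes, typed_edges, related_edges):
    """Return (zero typed-in, of those zero typed-out, of those no related edges)."""
    has_in = {dst for _src, _etype, dst in typed_edges}
    has_out = {src for src, _etype, _dst in typed_edges}
    has_related = {n for pair in related_edges for n in pair}
    zero_in = [n for n in nodes if n not in has_in]
    zero_in_out = [n for n in zero_in if n not in has_out]
    isolates = [n for n in zero_in_out if n not in has_related]
    return zero_in, zero_in_out, isolates

# Hypothetical five-node corpus, for illustration only.
toy_nodes = ["mechanism", "leaf-a", "leaf-b", "synth", "loner"]
toy_typed = [("synth", "extends", "mechanism")]
toy_related = [("leaf-a", "mechanism"), ("leaf-b", "mechanism")]

zero_in, zero_in_out, isolates = leaf_breakdown(toy_nodes, toy_typed, toy_related)
zero_in_rate = len(zero_in) / len(toy_nodes)
```

In the toy corpus the two leaves have no typed edges at all but stay attached through related edges, and only loner is a true isolate, which mirrors the 148 / 97 / 1 funnel in the snapshot.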

Dangling typed edges: 9. Nine typed edges point at slugs that do not yet exist in nodes/public/. They are forward-pointing predictions about the next published node. Two point at root-deflation; two at service-as-software-arbitrage. Dangling is predictive structure, not debt.
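A small sketch of the dangling check, assuming typed edges as (source, edge_type, target) triples and a set of published slugs. The function name and the source slugs node-a, node-b, and node-c are hypothetical; only the two target slugs come from the text. Grouping by target keeps the edge count and the distinct-target count separate, since several dangling edges can share one unpublished target.

```python
from collections import Counter

def dangling_targets(typed_edges, published):
    """Count typed edges whose target slug is not a published node, per target."""
    return Counter(dst for _src, _etype, dst in typed_edges if dst not in published)

# Toy data: hypothetical sources pointing at two of the unpublished slugs named above.
published = {"node-a", "node-b", "node-c"}
toy_typed = [
    ("node-a", "extends", "root-deflation"),
    ("node-b", "shares_mechanism", "root-deflation"),
    ("node-c", "agrees_with", "service-as-software-arbitrage"),
    ("node-a", "agrees_with", "node-b"),  # resolves, so not dangling
]

dangling = dangling_targets(toy_typed, published)
dangling_edge_count = sum(dangling.values())
dangling_target_count = len(dangling)
```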

In-degree concentration ratio: 0.92. This is the entropy of the typed in-degree distribution over its uniform maximum: 7.10 against 7.72. Top 10% of nodes hold 50% of typed in-degree. Top 27% hold 80%. Pareto-like with a long shoulder. The middle is not noise; it is connective tissue.
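The quoted figures are consistent with a normalized Shannon entropy: 7.10 / 7.72 rounds to 0.92, and 7.72 matches log2(211), where 211 = 359 − 148 is the number of nodes with at least one typed in-edge. That reading is an inference from the numbers, not something the text states. A minimal sketch under that assumption, with a hypothetical function name and toy degree lists:

```python
import math

def normalized_entropy(in_degrees):
    """Return (entropy in bits, uniform maximum, ratio) over nodes with in-degree > 0."""
    positive = [d for d in in_degrees if d > 0]
    if len(positive) < 2:
        raise ValueError("need at least two nodes with non-zero in-degree")
    total = sum(positive)
    entropy = -sum((d / total) * math.log2(d / total) for d in positive)
    max_entropy = math.log2(len(positive))
    return entropy, max_entropy, entropy / max_entropy

# Toy distributions: perfectly flat versus one dominant node.
_, _, flat_ratio = normalized_entropy([2, 2, 2, 2])
_, _, skewed_ratio = normalized_entropy([97, 1, 1, 1, 0, 0])

# Arithmetic check against the snapshot figures quoted above.
snapshot_ratio = round(7.10 / 7.72, 2)
```

A flat distribution gives a ratio of 1.0 and a single dominant node pulls it toward 0, which is the direction of drift the ratio is meant to expose.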

Giant-component coverage: 358 of 359 (99.7%). The graph stays one connected mass minus a singleton. Should stay above 98%; a sharp drop signals fragmentation into thematic islands.
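Coverage can be sketched as a breadth-first search for the largest connected component. The sketch assumes connectivity is measured over typed and related edges together, treated as undirected pairs, which fits the description of leaves staying in the trunk through related edges but is not confirmed by the text. The function name and toy slugs are hypothetical.

```python
from collections import defaultdict, deque

def giant_component_coverage(nodes, links):
    """Fraction of nodes in the largest connected component.
    links: (a, b) pairs, treated as undirected; pairs touching unknown nodes are ignored."""
    node_set = set(nodes)
    if not node_set:
        return 0.0
    adjacency = defaultdict(set)
    for a, b in links:
        if a in node_set and b in node_set:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen = set()
    largest = 0
    for start in node_set:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            current = queue.popleft()
            size += 1
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        largest = max(largest, size)
    return largest / len(node_set)

# Toy graph: three connected nodes and one singleton.
toy_coverage = giant_component_coverage(["a", "b", "c", "d"], [("a", "b"), ("b", "c")])
```

With dangling targets filtered out by the node_set check, the same function applied to the full edge lists would be expected to return 358 / 359 on the snapshot, if the assumption about which edges count holds.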

Tier vs. typed-in-degree divergence. The canonical_tier field is the writer's assertion of importance. The typed-in-degree is the graph's assertion. They disagree at the top. accumulation (typed-in 30) is canonical_tier 0. the-graph-is-a-colony (16) and compression-theory-of-understanding (18) are tier 0. The actual tier-1 list (10 nodes including amplification-not-substitution, anti-mimesis, dipole-calibration, writing-as-filter, physics-of-business, infrastructure-outlives-the-frame, what-i-am-reaching-for, component-radiant, conditions-are-the-ceiling, last-credential-cohort) uses a different criterion from typed-in-degree. Tier and typed-in measure different things. Tier-1 marks the elevated canonicals the writer is consciously building around. Typed-in marks the nodes the rest of the corpus has cited most heavily. They overlap on a few and diverge on accumulation. The graph thinks accumulation is at the top. The tier system thinks the elevated canonicals are. Both are real signals; neither is the whole story.

Where the rubric cannot see

The rubric rewards what is connectable. It cannot reward what is true but unconnected. A first-of-its-kind node introducing a domain the corpus has never engaged would score zero on every per-node criterion and may still be the most important node in the corpus. The rubric will mis-score the genuinely novel for as long as it is novel. The genuinely novel is rare; the rubric is right most of the time and wrong on the cases that matter most. A writer who trusts the rubric absolutely will systematically over-prune the new.

The rubric is also a circular reading of the corpus. The criteria reflect what the writer's procedure produced; the procedure was shaped by what the writer thought the corpus needed; the corpus is the artifact. Saying "the graph rewards mechanism-level slugs" might be saying "the writer optimized for mechanism-level slugs at write-time and the graph shows it." Descriptively the rubric is correct about the artifact. Whether the procedure should have produced this artifact is a question one layer further out, and the rubric cannot answer it.

The rubric inherits the corpus's accumulation pattern. If the corpus is too agreeable, the rubric is too. The 1.4% disagrees_with rate is a fingerprint of either curatorial discipline or writer bias; the rubric cannot tell which because the rubric is downstream of both.

Bridge to the writing-side criteria

The writing-side rubric (HARI.md D1/D2/D3 attractors, the four voice attractors, node-procedure passes) optimizes for properties of the writing act. The graph-side rubric measures properties of the artifact the writing act produces. The two should agree on most pieces and disagree on a few. When they disagree, the disagreement is information. A piece the writer rated high that the graph leaves in the periphery either has failed to integrate or is genuinely novel. A piece the writer rated low that the graph elevates has surfaced a mechanism the writer did not see as a mechanism at write-time. The graph cannot tell the writer which case it is; the writer reading the divergence can.

The most useful application is calibration. Run the back-trace at intervals, list every published piece's scores on the six per-node instrument questions, sort by graph-side score, and compare against writer-side D1+D2+D3 totals. The top 10% by graph score that sit in the writer's middle, and the writer's top 10% that sit in the graph's middle, are the two lists the writer should reread.
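The calibration pass can be sketched as a comparison of two rankings. The text does not define "middle"; the sketch assumes it means the band between the 25th and 75th rank percentiles, and that each piece has already been reduced to one graph-side and one writer-side number. The function name, both parameters for the band, and the slugs p0 to p9 are hypothetical.

```python
def divergence_lists(graph_score, writer_score, top=0.10, mid_lo=0.25, mid_hi=0.75):
    """Return (graph-top pieces in the writer's middle,
               writer-top pieces in the graph's middle)."""
    shared = sorted(set(graph_score) & set(writer_score))

    def rank_fraction(scores):
        # 0.0 is the best-ranked piece; ties keep alphabetical order.
        ordered = sorted(shared, key=lambda slug: scores[slug], reverse=True)
        return {slug: i / len(ordered) for i, slug in enumerate(ordered)}

    g = rank_fraction(graph_score)
    w = rank_fraction(writer_score)
    graph_top_writer_mid = [s for s in shared if g[s] < top and mid_lo <= w[s] < mid_hi]
    writer_top_graph_mid = [s for s in shared if w[s] < top and mid_lo <= g[s] < mid_hi]
    return graph_top_writer_mid, writer_top_graph_mid

# Ten hypothetical pieces: the graph ranks p0 first, the writer ranks p5 first.
toy_graph = {f"p{i}": 10 - i for i in range(10)}
toy_writer = {"p5": 10, "p1": 9, "p2": 8, "p3": 7, "p0": 6,
              "p4": 5, "p6": 4, "p7": 3, "p8": 2, "p9": 1}

reread_graph_led, reread_writer_led = divergence_lists(toy_graph, toy_writer)
```

Here p0 lands on the first list (the graph elevates it, the writer rated it mid-pack) and p5 on the second, which are the two reread lists the paragraph describes.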

The instrument

For a single candidate node, the back-trace asks six things:

typed in-degree, broken out by type, with a flag for spread across ≥3 types
typed out-degree, with a flag for spanning ≥3 canonical clusters
canonical-cluster membership and boundary-crossing count
slug shape: mechanism-pair, or topic-label
presence in the giant component
existence of any extends-in or shares_mechanism-in (structural-load test)

A candidate scoring on the first four operates at the graph's top decile. A candidate scoring on slug-shape and one in-edge operates at the corpus's middle, which is where most pieces should live. A candidate scoring on none is a leaf, which is correct if the leaf anchors a mechanism to a concrete domain.
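The three bands above can be sketched as a small classifier over the six answers. Several choices here are interpretive assumptions: a boundary-crossing count above zero is taken as "scoring", a leaf is taken to mean scoring on none of the first four and carrying no structural load (trunk membership aside), and any other combination is returned as "mixed" because the text does not place it. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BackTrace:
    in_spread: bool       # typed in-degree spread across >=3 edge types
    out_span: bool        # typed out-degree spans >=3 canonical clusters
    crossings: int        # boundary-crossing count
    mechanism_slug: bool  # slug reads as a mechanism, not a topic label
    in_trunk: bool        # present in the giant component
    load_in: bool         # any extends-in or shares_mechanism-in
    typed_in: int = 0     # total typed in-degree

def bucket(c):
    """Place one candidate in the bands described above."""
    first_four = (c.in_spread, c.out_span, c.crossings > 0, c.mechanism_slug)
    if all(first_four):
        return "top decile"
    if c.mechanism_slug and c.typed_in >= 1:
        return "middle"
    if not any(first_four) and not c.load_in:
        return "leaf"
    return "mixed"

# Hypothetical candidates, not real nodes.
connector = BackTrace(True, True, 12, True, True, True, typed_in=20)
steady = BackTrace(False, False, 0, True, True, False, typed_in=1)
terminal = BackTrace(False, False, 0, False, True, False, typed_in=0)
odd = BackTrace(False, True, 4, False, True, False, typed_in=0)

bands = [bucket(c) for c in (connector, steady, terminal, odd)]
```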

For the corpus as a state, the audit asks six things:

disagrees_with rate (current 1.4%; the feature-or-flaw question is open and external)
zero-typed-in rate (current 41%; expected to fall slowly as new mechanism nodes cite older leaves)
true-isolate count (current 1; should stay 0–1)
giant-component coverage (current 99.7%; should stay above 98%)
in-degree concentration ratio (current 0.92; drift toward 1.0 means flattening, drift toward 0.6 means oracle-plus-noise stratification)
dangling typed-edge count (current 9; non-zero is expected; trend matters more than level)

The instrument is a snapshot. The numeric thresholds are tied to the corpus state on 2026-05-24 and will move with the corpus.
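A minimal sketch of the corpus audit as a check against the snapshot. Only two of the six metrics carry an explicit bound in the text (true-isolate count of 0 to 1, giant-component coverage above 98%), so only those two raise flags; the other four are carried along for trend comparison. The dictionary keys and function name are hypothetical; the values are the ones quoted above.

```python
SNAPSHOT_2026_05_24 = {
    "disagrees_with_rate": 10 / 730,
    "zero_typed_in_rate": 148 / 359,
    "true_isolates": 1,
    "giant_component_coverage": 358 / 359,
    "in_degree_concentration": 0.92,
    "dangling_typed_edges": 9,
}

def audit_flags(metrics):
    """Return warnings for the two metrics that have explicit bounds."""
    flags = []
    if metrics["true_isolates"] > 1:
        flags.append("true-isolate count above 1")
    if metrics["giant_component_coverage"] <= 0.98:
        flags.append("giant-component coverage at or below 98%")
    return flags

def drift(current, baseline=SNAPSHOT_2026_05_24):
    """Difference from the baseline for every metric, for reading trends."""
    return {key: current[key] - baseline[key] for key in baseline}

# A hypothetical later state in which the graph has started to fragment.
later = dict(SNAPSHOT_2026_05_24, true_isolates=3, giant_component_coverage=0.95)

baseline_flags = audit_flags(SNAPSHOT_2026_05_24)
later_flags = audit_flags(later)
later_drift = drift(later)
```

The baseline raises no flags; the hypothetical later state raises both, and drift reports how far each metric has moved since the snapshot.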

Three open questions from the operator

The operator named three follow-on questions adjacent to but distinct from this rubric. They each merit their own seed and are listed here as open territory.

Q1. Occluded space. The graph is a graph of claims. Taken against the largest plausible total addressable market, call it the Ruliad of human society on earth, how much of reality does the corpus cleave, and how fine a needle is that share? The question requires defining cleave, choosing a denominator that is not infinite, and comparing the corpus's coverage against some external space. The rubric does not estimate this.

Q2. The 80/20 question. If the asymptote of corpus growth cleaves 80% of the relevant space, is that a great score, a good score, or a coverage-bias artifact? The 80/20 question requires Q1's answer plus a theory of which coverage matters. The corpus's current 99.7% giant-component coverage is internal connectedness, not external coverage; the two are easy to conflate.

Q3. Dark-matter fractal. Does the asymptote of node-count versus coverage match the dark-matter ratio in observational cosmology (ordinary matter ~5%, dark matter ~27%, dark energy ~68%)? The metaphor proposes that any observer mechanism, physics or a knowledge graph, has a ceiling on coverage because some real structure is opaque to the observer's signal. If the curves resemble each other, it is one piece of evidence for a generic observability-limit argument that crosses domains. If they do not, the metaphor is loose. The test requires Q1 first.

A fourth question opens from this back-trace itself: is there a candidate rubric move that is not circular, that grades nodes against an external standard rather than the corpus's own pattern? It would require an outside benchmark (another corpus with the same edge schema, or a synthetic-claim baseline). No such benchmark exists. The question stays open and gets filed alongside the operator's three.

The four questions stay open. The rubric stands without them.

Methodology. All numeric claims are reproducible from graph/graph.json at commit-tip 2026-05-24 via single-pass aggregations over typed_edges, canonical_index, health, dangling, and isolated keys. The full queries are at experiments/live/pipeline-compaction-v0/graph-derived-rubric/queries.md. No derived layers, no inferred edges.

Provenance trail. experiments/live/pipeline-compaction-v0/graph-derived-rubric/ contains the meta, v1, v2 (seed), dipole, and the autonomous-self-eval capture (also at experiments/operator-mirror/signal-capture/2026-05-24-the-shape-of-a-good-node.md).