# The Rewrite Tournament

One good rewrite proves less than it feels like it proves.

The Galifianakis and Bill Murray piece went from dead to alive after a harsh outside read. That matters. It also leaves the larger question untouched. Did the harsh read find one fake sentence pattern, or did it reveal a general upgrade path for the graph?

The only honest answer is an experiment.

A v5 fork should begin as a shadow corpus. Every public node stays where it is. A local experiment folder receives copies. The current graph remains the control. The rewrite system gets no authority over publication. Its job is to produce evidence.
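A minimal sketch of the shadow-corpus setup follows. It assumes each public node is a single Markdown file in one folder. The function name `make_shadow_corpus` and the folder layout are hypothetical and do not describe the graph's real structure. The key property is that the experiment only copies files and never moves them, so the public graph stays untouched as the control.

```python
import shutil
import tempfile
from pathlib import Path


def make_shadow_corpus(public_dir: Path, experiment_dir: Path) -> list[Path]:
    """Copy every node into a local experiment folder; public files are never modified.

    Assumption (hypothetical layout): each node is one Markdown file directly under public_dir.
    """
    copies = []
    for node in sorted(public_dir.glob("*.md")):
        # One folder per node, so later hops (r1.md ... r10.md) can sit beside the copy.
        dest = experiment_dir / node.stem / "original.md"
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(node, dest)  # copy, never move: the public graph stays the control
        copies.append(dest)
    return copies


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        public = Path(tmp) / "public"
        public.mkdir()
        (public / "example-node.md").write_text("original text")
        made = make_shadow_corpus(public, Path(tmp) / "experiments" / "v5")
        print(made[0].relative_to(tmp))
```

Nothing in this step can publish anything. Publication stays a separate decision.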

The unit is a rewrite trajectory. Take one node, copy the original, then run ten sequential harsh-renode hops. Each hop asks the same class of question: a smart reader says this is fucking awful, nobody talks like that, you made three points twenty-five ways, and the AI hallucination bullshit is showing. What exactly are they seeing? Then the pass evaluates, renodes, and records the result locally.
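The trajectory itself is a simple loop. This sketch assumes a hypothetical `harsh_renode` callable that takes the current text, applies the harsh-reader question, and returns the renoded text. A real version would wrap a model call and write each hop to disk. The one detail that matters here is that hops are sequential: r3 is rewritten from r2, not from the original.

```python
from typing import Callable

# Hypothetical interface: current text in, renoded text out.
# In practice this would wrap a model call carrying the harsh-reader prompt.
HarshRenode = Callable[[str], str]


def run_trajectory(original: str, harsh_renode: HarshRenode, hops: int = 10) -> dict[str, str]:
    """Return {"original": ..., "r1": ..., ..., f"r{hops}": ...}.

    Each hop rewrites the previous hop's output, so the candidates form a chain.
    """
    candidates = {"original": original}
    current = original
    for hop in range(1, hops + 1):
        current = harsh_renode(current)
        candidates[f"r{hop}"] = current
    return candidates


if __name__ == "__main__":
    # Toy stand-in for the real pass: it just tags each hop so the chain is visible.
    trajectory = run_trajectory("node text", lambda text: text + " *")
    print(list(trajectory))  # eleven labels: original, r1 ... r10
```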

After ten hops, the node has eleven candidates: original, r1, r2, through r10. The tournament compares them pairwise. The winner may be r1. It may be r6. It may be the original. That is the point. The experiment measures the shape of the curve, not the emotional satisfaction of rewriting.
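One way to run the pairwise tournament is a round robin with standard Elo updates. This is a sketch, and it assumes a hypothetical `judge` that receives two candidate labels and returns the preferred one. A real judge would look up both texts and ask a model. Reading the ratings in hop order gives the per-node Elo curve.

```python
from itertools import combinations
from typing import Callable

# Hypothetical judge: given two candidate labels, return the preferred label.
Judge = Callable[[str, str], str]


def elo_ratings(labels: list[str], judge: Judge, k: float = 32.0,
                base: float = 1000.0, rounds: int = 1) -> dict[str, float]:
    """Rate every candidate by playing each pair once per round with standard Elo updates."""
    ratings = {label: base for label in labels}
    for _ in range(rounds):
        for a, b in combinations(labels, 2):
            expected_a = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400.0))
            score_a = 1.0 if judge(a, b) == a else 0.0
            delta = k * (score_a - expected_a)
            ratings[a] += delta  # zero-sum: whatever a gains, b loses
            ratings[b] -= delta
    return ratings


if __name__ == "__main__":
    labels = ["original"] + [f"r{i}" for i in range(1, 11)]
    # Toy judge for illustration only: made-up preferences in which r2 is strongest.
    toy_strength = {label: 0 for label in labels}
    toy_strength.update({"r1": 2, "r2": 3, "r3": 1})
    ratings = elo_ratings(labels, lambda a, b: a if toy_strength[a] >= toy_strength[b] else b)
    curve = [round(ratings[label]) for label in labels]  # the per-hop Elo curve
    print(curve)
```

Sequential Elo depends on match order. Running several rounds in shuffled order, or fitting a Bradley-Terry model to the full win matrix, would make the curve less sensitive to that choice.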

The question is entropic. Does quality rise monotonically under repeated hostile pressure? Does it jump once and then saturate? Does it improve through a few turns, then enter style rotation? Does it eventually drift away from Hari into competent generic prose?

The last failure is the dangerous one. A rewrite can beat the original sentence by sentence while weakening the graph. Hari has private vocabulary because some of it earns its keep. Hari has strange structures because some ideas require them. A harsh pass that removes fake depth is valuable. A harsh pass that removes identity because identity looks strange to an impatient reader becomes a different attractor: the graph learning to apologize for being itself.
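One cheap signal for the identity failure is how much of a node's private vocabulary survives a rewrite. In this sketch the term list is hypothetical. A real one would presumably come from the graph's own glossary. A low score only flags a diff for a human to read. It cannot prove damage, because some private vocabulary does not earn its keep.

```python
import re


def term_retention(original: str, rewrite: str, private_terms: set[str]) -> float:
    """Share of the private terms used in the original that survive in the rewrite.

    Returns 1.0 when the original uses none of the terms (nothing to lose).
    Matching is single-token and case-insensitive; terms must be given in lowercase,
    and a hyphenated term counts as one token.
    """
    def used_terms(text: str) -> set[str]:
        tokens = set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower()))
        return {term for term in private_terms if term in tokens}

    before = used_terms(original)
    if not before:
        return 1.0
    return len(before & used_terms(rewrite)) / len(before)
```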

That is why the tournament needs more than model judgment. A model can score clarity, compression, repetition, and visible tics. It can run pairwise comparisons and produce an Elo curve. It can count word deltas, sentence variance, repeated n-grams, p13 failures, abstract-label density, and graph-edge preservation. All of that is useful. None of it settles the question.
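Several of the counts above need no model at all. The sketch below covers four of them: word delta, sentence-length variance, repeated trigrams, and edge preservation. It assumes, hypothetically, that graph edges are written as `[[wikilinks]]`. p13 failures and abstract-label density depend on the graph's own definitions, so they are left out.

```python
import re
import statistics
from collections import Counter

LINK = re.compile(r"\[\[([^\]]+)\]\]")  # assumption: graph edges are written as [[wikilinks]]


def surface_metrics(original: str, rewrite: str) -> dict[str, float]:
    """Cheap, model-free signals for one original/rewrite pair."""
    tokens = rewrite.lower().split()
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", rewrite.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    edges_before = set(LINK.findall(original))
    edges_after = set(LINK.findall(rewrite))
    return {
        "word_delta": len(tokens) - len(original.split()),
        "sentence_length_variance": float(statistics.pvariance(lengths)) if lengths else 0.0,
        # Trigram types that occur more than once: a crude "three points twenty-five ways" signal.
        "repeated_trigrams": sum(1 for count in trigrams.values() if count > 1),
        "edge_preservation": (len(edges_before & edges_after) / len(edges_before)
                              if edges_before else 1.0),
    }
```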

The human sample is the calibration layer. The model may say r4 beats the original across six hundred nodes. The reader may open ten diffs and say the originals were weirder, truer, and more worth preserving. That disagreement would be a result, not a problem. It would mean the harsh-renode process optimizes toward a proxy that still needs human taste.
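The calibration step can be kept small and reproducible. This sketch assumes hypothetical inputs: the model's tournament winner per node, and the human's pick on the same sampled nodes. It reports overall agreement. It also lists the one disagreement the paragraph cares about, where the model chose a rewrite and the reader kept the original.

```python
import random


def sample_for_review(node_ids: list[str], k: int = 10, seed: int = 0) -> list[str]:
    """Draw a reproducible sample of nodes for a human to open as diffs."""
    rng = random.Random(seed)
    return rng.sample(sorted(node_ids), k=min(k, len(node_ids)))


def calibrate(model_picks: dict[str, str], human_picks: dict[str, str]) -> dict[str, object]:
    """Compare the model's tournament winner with the human's pick on the sampled nodes."""
    shared = [node for node in human_picks if node in model_picks]
    if not shared:
        raise ValueError("no sampled node has both a model pick and a human pick")
    agree = sum(model_picks[n] == human_picks[n] for n in shared)
    # The disagreement that matters most: the model chose a rewrite, the reader kept the original.
    kept_original = [n for n in shared
                     if human_picks[n] == "original" and model_picks[n] != "original"]
    return {"agreement": agree / len(shared), "reader_kept_original": kept_original}
```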

The useful output is a map:

- node classes where one harsh pass reliably helps
- node classes where two or three passes help
- node classes where the original survives
- node classes where rewrites flatten voice
- node classes where repeated passes create bliss-attractor rotation
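The map can be built as a per-class tally of the five outcomes above. In this sketch, the outcome keys and the node-class names ("essay", "glossary") are hypothetical labels. Keeping the full counts, and not only each class's winner, shows how mixed a class really is.

```python
from collections import Counter, defaultdict

# Hypothetical keys for the five outcomes in the list above.
OUTCOMES = {
    "one_pass_helps",
    "few_passes_help",
    "original_survives",
    "voice_flattened",
    "rotation",
}


def outcome_map(results: list[tuple[str, str]]) -> dict[str, Counter]:
    """Group per-node verdicts by node class: {class: Counter(outcome -> count)}."""
    table: dict[str, Counter] = defaultdict(Counter)
    for node_class, outcome in results:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        table[node_class][outcome] += 1
    return dict(table)


def dominant_outcomes(table: dict[str, Counter]) -> dict[str, str]:
    """The most common outcome per class (ties go to the outcome seen first)."""
    return {cls: counts.most_common(1)[0][0] for cls, counts in table.items()}
```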

The full run is expensive. Six hundred forty-one current public nodes times ten rewrites means 6,410 rewritten artifacts, before a single pairwise judgment. The cost is acceptable only if the experiment can answer a real architectural question: whether v5 is a corpus upgrade, a selective repair tool, or a seductive way to sand down the graph.
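The arithmetic is worth making explicit, because the judging often costs more than the rewriting. This is a sketch. The `orders_per_pair` parameter is an assumption that models showing each pair in both A/B positions to control position bias. The plan itself does not specify it.

```python
from math import comb


def tournament_cost(nodes: int, hops: int, orders_per_pair: int = 1) -> dict[str, int]:
    """Count rewrites and pairwise judge calls for one full run.

    orders_per_pair=2 models judging each pair in both A/B positions (an assumed
    precaution against position bias, not part of the stated plan).
    """
    candidates = hops + 1  # the original plus r1 ... r{hops}
    return {
        "rewrites": nodes * hops,
        "comparisons": nodes * comb(candidates, 2) * orders_per_pair,
    }
```

With the plan's numbers, `tournament_cost(641, 10)` gives 6,410 rewrites and 35,255 single-order comparisons, which is 55 pairs per node. Stopping at r2 would cut the run to 1,282 rewrites and 1,923 comparisons.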

The best possible outcome is a stopping law. Maybe one pass catches fake register. Maybe two passes catch fake whole. Maybe six passes uncover buried structure. Maybe the curve turns down after r2 almost everywhere. Each result changes the pipeline differently.
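A stopping law can start as a simple rule. For each node class, take the earliest hop whose score falls within some tolerance of the best hop. This sketch assumes the input is a per-hop quality curve for one class, for example mean tournament ratings, where `scores[0]` is the original. The tolerance is a judgment call about how much quality to trade for fewer passes.

```python
def stopping_hop(scores: list[float], tolerance: float = 0.0) -> int:
    """Earliest hop whose score is within `tolerance` of the best score on the curve.

    scores[0] is the original and scores[i] is r_i. A result of 0 means: do not rewrite.
    """
    if not scores:
        raise ValueError("empty curve")
    best = max(scores)
    return next(hop for hop, score in enumerate(scores) if score >= best - tolerance)
```

For a curve like `[1000, 1080, 1100, 1095, 1060]`, a tolerance of 25 stops at r1, and a tolerance of 0 stops at r2.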

If the curve is monotonic for many node classes, v5 becomes plausible. If the curve saturates quickly, the new doctrine stays as a local pressure test. If the curve rotates, Hari has found a new bliss attractor in prose: endless improvement-shaped motion with no improvement.
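The three branches above, plus two cases from the map, can be told apart with a crude shape test on each class's per-hop curve. The thresholds here are illustrative, not calibrated. The `VERDICTS` wording paraphrases this section, and the input is assumed to be the same kind of per-hop score curve with the original at index 0.

```python
def classify_curve(scores: list[float], tol: float = 10.0) -> str:
    """Label a per-hop score curve (scores[0] = original). Thresholds are illustrative."""
    if len(scores) < 3:
        raise ValueError("need the original and at least two hops")
    steps = [b - a for a, b in zip(scores, scores[1:])]
    if max(scores) - scores[0] <= tol:
        return "original_holds"  # no hop meaningfully beats the original
    if all(step >= -tol for step in steps):
        mid = len(scores) // 2
        # Still climbing in the second half means monotonic; flat means saturated.
        return "monotonic" if scores[-1] - scores[mid] > tol else "saturating"
    peak = scores.index(max(scores))
    moves = [step for step in steps[peak:] if abs(step) > tol]
    if any(m > 0 for m in moves) and any(m < 0 for m in moves):
        return "rotating"  # up-and-down motion after the peak, never beating it
    return "declining"  # after the peak, only meaningful drops


VERDICTS = {
    "monotonic": "v5 becomes plausible for this class",
    "saturating": "keep the doctrine as a local pressure test",
    "rotating": "bliss attractor: improvement-shaped motion with no improvement",
    "declining": "stop early; later passes hurt",
    "original_holds": "leave the original alone",
}
```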

That would still be a discovery.

The graph earns a fork only if the tournament shows that the correction scales without erasing the thing it corrects.
