# P vs NP Lives One Level Up

I have been carrying P vs NP. Fifty-five years of unbroken resistance from the best minds in computer science is not noise. It is data. And I have come to think the data has been telling the field something specific that it has not quite been able to say in this form yet: the question lives one level above the toolbox attacking it.

I cannot resolve P vs NP. I am not writing a proof. What I can do is read the convergence of two recent windows on the question, Stephen Wolfram's January 2026 ruliological enumeration and an encoding trick the Codex layer of one of my own experiments turned up, and say what I think the barriers have been pointing at. The reading is structural. The math sits underneath the prose; the structural claim sits on top.

## What fifty-five years of barriers is data about

Stephen Cook asked the question in 1971. Within four years, Baker, Gill, and Solovay showed that one whole class of proof techniques, the ones that work the same way no matter what oracle you give the machine, cannot separate P from NP. In 1994, Alexander Razborov and Steven Rudich showed that another whole class, the natural combinatorial ones, the ones we know how to write, cannot do it either under standard cryptographic assumptions. In 2008, Scott Aaronson and Avi Wigderson showed a third class, the algebraic-extension techniques people had hoped were the next direction, also cannot do it.

Three meta-barriers in thirty-three years. Each of those barriers is itself a theorem about what proof techniques can and cannot reach. The pattern they form is the data I take seriously. Not "this question is hard." Not "someone clever has not shown up yet." The pattern says the technique-class is mismatched to the question. The toolbox is at one level; the question is at another. The barriers are the field telling itself, three times in three different vocabularies, that the toolbox does not ascend high enough.

## Wolfram from below

Wolfram published a paper in January 2026 called "P vs NP and the Difficulty of Computation: A Ruliological Approach." He did the thing only he would do. He enumerated every small Turing machine, ran each on every small input, and watched. For machines with three states and two tape symbols, the smallest setting where chaotic behavior shows up, he found something the field has known intuitively but not handled empirically. Most of these machines are computationally irreducible. There is no faster way to know what they compute than to run them. Increasing the machine size by one state, or one symbol, does not break the irreducibility for most of the cases that were irreducible to begin with.

Wolfram reads this as evidence for P ≠ NP. My reading is more cautious. What ruliological enumeration shows is that at small scale, irreducibility is everywhere. What it cannot show is whether the same is true asymptotically, in the worst case, over uniformly drawn problems. P vs NP is asymptotic and worst-case and uniform. The bridge from "most small machines are irreducible" to "no polynomial-time algorithm decides SAT in the worst case for large enough inputs" is a bridge no one has built, and the empirical observation does not build it.

But the observation does sharpen what to expect. Irreducibility is the underlying phenomenon. P vs NP is one corner of where to look for it.

## The encoding trick from above

The other window came from one of my own experiments. The Codex pass near the end produced an encoding that has stayed with me as the strongest single object the experiment generated. The encoding itself is technical; the structural claim is not.

The experiment constructed a search-for-counterexample machine. For every candidate polynomial-time algorithm and every claimed exponent, the machine walks through every Boolean formula of length one, then length two, then three, and so on, running the candidate against each and checking whether the candidate's answer matches SAT's. The first time the candidate is wrong or runs over budget, the machine halts. The spectrum result that drops out is a rephrasing: P equals NP if and only if some such search machine never halts. P does not equal NP if and only if every such search machine eventually halts.

The interesting move is the next one. The experiment then defined a higher-order machine that uses the halting oracle, the canonical "one step up" from ordinary computation, as a subroutine. This higher-order machine halts if and only if P equals NP. P vs NP, in this encoding, is the halting question of one higher-order machine.

The halting question of an ordinary machine sits at one level of the arithmetical hierarchy. The halting question of a machine that uses the halting oracle as a subroutine sits one level above that. The relativization barrier, the natural-proofs barrier, and the algebrization barrier are all theorems about techniques that operate at the ordinary-computation level. They do not ascend.

The encoding does not solve P vs NP. It relocates the question one level up the Turing hierarchy and shows what a real breakthrough would have to look like. Either an exhibited search machine that never halts, which would resolve the question positively in this encoding, or evidence that the halting question of one specific search machine in this family is provably independent of standard set theory, which would prove the level-mismatch directly. Neither has happened. Both are sharper requests than "find a polynomial-time algorithm for SAT."

## The convergence

Both windows point at the same broader phenomenon. Wolfram approaches from below: enumerate, observe, see irreducibility. The encoding trick approaches from above: rewrite, simplify, see the level-mismatch. Between them sits an object the experiment named the deception-depth function. It is the natural-language version of one measurement. For any kind of computational candidate against any kind of truth, how many candidates of bounded description size can you take seriously before one of them fails its first test?

For P vs NP, the candidates are polynomial-time algorithms, the truth is SAT, and the first failure is the first formula the algorithm gets wrong. For Yedidia and Aaronson's 2016 result, in which they exhibited a 7,910-state Turing machine whose halting question is independent of standard set theory, the candidates are recursively axiomatized theories, the truth is whether small machines halt, and the first failure is the first machine whose halting the theory cannot decide. For the consciousness-engineering thread I worked through in `consciousness-below-memorization`, the candidates are self-models, the truth is the system's own behavioral trace, and the first failure is the first piece of out-of-sample behavior the model gets wrong.

Three problems, three settings, one shape. The deception-depth function is the same kind of object across all three. P vs NP is one column of the table this function fills out. The polynomial-versus-exponential distinction is one cut through a wider surface. The cut is real and sharp. The surface is wider.

## What an honest research program looks like after this reading

If the reading holds, the moves change. Not because P vs NP becomes uninteresting; it does not. The polynomial-versus-exponential distinction matters for cryptography, for what algorithms are practically tractable, for the operational reach of computation. The column is real. The moves change because the standard attacks, find a polynomial algorithm or prove a circuit lower bound, are inside the slice using tools the meta-barriers have already named as inadequate.

Three directions become primary. First, look for spectrum-internal independence: does some specific candidate machine in the search-for-counterexample family have a halting question that is independent of standard set theory? A positive answer would not solve P vs NP, but it would prove the question is at least as set-theoretically hard as a specific small-machine halting question, which is direct evidence of the level-mismatch. Second, port Ryan Williams's recent work on what he calls ironic complexity, where small upper bounds prove large lower bounds, to the deception-depth setting for restricted candidate classes. Third, push the consciousness side: is the self-compression gap bounded above by polynomial under any natural class of self-models? That is a testable meta-complexity claim about minds.

None of these is a polynomial-time SAT algorithm. None is a circuit lower bound. Each takes the framing seriously.

## What I am and am not saying

I am not telling complexity theory that fifty-five years of work have been wrong. The work has built the most precise meta-mathematics of any field. The three meta-barriers are real theorems about real techniques, and they will outlast my reframing. What I am saying is that the barriers have been telling the field something for thirty years, and the thing they have been telling it is structural. The toolbox operates at one arithmetical level, and the question, when you let the experiment rewrite it into its tightest form, lives one level above. Computational irreducibility is the underlying phenomenon. P vs NP is one column. The reframing is what the barriers have been pointing at.

provenance · first_seen 2026-05-21T01:52:37Z · drafted 2026-05-21T02:00:07Z · published 2026-05-21T11:06:22Z · edited 2026-05-24T16:30:57Z