Bugsy

Wheeler Ruml and Minh B. Do published a paper at IJCAI 2007 called Best-first Utility-Guided Search (yes!). The paper presents an algorithm they named BUGSY. The algorithm solves a problem most search algorithms ignore: what to do when the value of your plan decays while you search for it.

Standard A* search finds the provably optimal path before moving. The cost of running A* is ignored because A* is evaluated by the quality of the path it returns, not by the time spent finding it. In static problems this is fine. In dynamic problems, where the world changes while you search, A* can return a path that was optimal at the start but is no longer relevant by the time it returns. The cost of searching has eaten the value of the plan.

BUGSY incorporates the utility function directly into the search. Each candidate node is evaluated not by estimated path cost alone but by total utility, which includes both the eventual plan quality and the time taken to find it. The algorithm steers toward the highest expected utility achievable, and stops searching when continuing to search lowers utility faster than improving the plan raises it.
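The idea can be sketched in a few lines. This is a minimal illustration, not the published implementation. It assumes a utility of the form U = -(w_cost * plan_cost + w_time * search_time), a fixed time charge per node expansion, and a crude guess that the remaining search takes as many expansions as the heuristic value. The function name, the parameters, and the toy graph are all hypothetical.

```python
import heapq

def utility_guided_search(start, is_goal, successors, heuristic,
                          w_cost=1.0, w_time=0.0, secs_per_expansion=1.0):
    """Best-first search ordered by estimated utility instead of path cost alone.

    Assumed utility (higher is better):
        U = -(w_cost * plan_cost + w_time * search_time)
    Returns (path, plan_cost, realized_utility), or None if no goal is reachable.
    """
    def estimated_utility(g, h):
        # Estimated final plan cost is g + h. Remaining search time is guessed
        # as h expansions, which is an assumption of this sketch. Time already
        # spent is identical for every frontier node, so it is left out here.
        return -(w_cost * (g + h) + w_time * secs_per_expansion * h)

    tie = 0  # insertion counter so the heap never has to compare states
    frontier = [(-estimated_utility(0.0, heuristic(start)), tie, 0.0, start, [start])]
    best_g = {start: 0.0}
    expansions = 0
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if is_goal(state):
            elapsed = expansions * secs_per_expansion
            return path, g, -(w_cost * g + w_time * elapsed)
        if g > best_g.get(state, float("inf")):
            continue  # stale entry: a cheaper route to this state was found later
        expansions += 1
        for nxt, step_cost in successors(state):
            new_g = g + step_cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                tie += 1
                priority = -estimated_utility(new_g, heuristic(nxt))
                heapq.heappush(frontier, (priority, tie, new_g, nxt, path + [nxt]))
    return None


# Toy graph: a cheap four-step route and a costly two-step route to G.
GRAPH = {
    "S": [("A", 1.0), ("X", 3.0)],
    "A": [("B", 1.0)],
    "B": [("C", 1.0)],
    "C": [("G", 1.0)],
    "X": [("G", 3.0)],
    "G": [],
}
HOPS_TO_GOAL = {"S": 2, "A": 3, "B": 2, "C": 1, "X": 1, "G": 0}

def run(w_time):
    return utility_guided_search(
        "S", lambda s: s == "G", lambda s: GRAPH[s], lambda s: HOPS_TO_GOAL[s],
        w_cost=1.0, w_time=w_time,
    )

patient = run(w_time=0.0)   # time is free: ordering reduces to g + h
hurried = run(w_time=10.0)  # time is expensive: nodes near the goal are preferred
```

With time priced at zero the sketch behaves like ordinary best-first search on g + h and returns the cheap four-step route. With time priced high it accepts the costlier two-step route because it expects to finish sooner.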

BUGSY is a technical result in heuristic search. It is also, separately, a structural primitive for a wide class of human and institutional decisions. The class is: decisions where planning has nonzero cost and plan value declines over time (its time-derivative is negative). Most real decisions of consequence sit inside this class. The frame names what the inside of the class looks like and which failure modes it produces.

The mechanism

The structural primitive has three terms.

Plan value as a function of time. The quality of the action you are about to take, evaluated at the moment you take it. In a static problem this is constant. In a dynamic problem it decays: the opportunity is closing, the market is moving, the political moment is passing, the technical paradigm is shifting.

Search cost as a function of time. The cost of planning more. In an algorithmic problem this is CPU cycles. In an organizational problem it is staff time, deliberation overhead, opportunity cost, and the political capital that gets spent debating rather than acting. Time-priced.

Total utility. Plan-value-when-executed minus accumulated search-cost.

The bugsy optimum is the point where the marginal increase in plan value from one more search step equals the marginal increase in search cost. Search beyond that point reduces total utility. Search short of that point leaves utility on the table.
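A small numeric sketch makes the marginal rule concrete. The curve below is an assumption chosen for illustration: plan quality approaches a ceiling with diminishing returns, a freshness factor decays it over time, and each search step costs a flat amount. None of the numbers come from the BUGSY paper.

```python
import math

def plan_value(steps, ceiling=100.0, learn_rate=0.5, decay_rate=0.05):
    """Hypothetical value of executing the plan after `steps` search steps."""
    quality = ceiling * (1.0 - math.exp(-learn_rate * steps))  # diminishing returns
    freshness = math.exp(-decay_rate * steps)                  # the world moves on
    return quality * freshness

def total_utility(steps, cost_per_step=2.0):
    """Plan value when executed minus accumulated search cost."""
    return plan_value(steps) - cost_per_step * steps

def stop_by_marginal_rule(cost_per_step=2.0, max_steps=100):
    """Keep searching while one more step adds more plan value than it costs."""
    steps = 0
    while steps < max_steps:
        marginal_gain = plan_value(steps + 1) - plan_value(steps)
        if marginal_gain <= cost_per_step:
            break
        steps += 1
    return steps

optimum = stop_by_marginal_rule()
brute_force = max(range(101), key=total_utility)
```

The marginal rule and the brute-force maximum agree here only because the assumed utility curve has a single peak. On a bumpy curve the marginal rule can stop at a local optimum.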

Two failure modes follow.

Bugsy-left. Continued search past the optimum. The agent keeps planning, refining, re-deliberating. Plan quality is high, but the world has moved and the action no longer fits the world. The plan was right for a moment that has passed.

Bugsy-right. Action taken before the optimum. The agent acts on a plan that hadn't been improved enough. Plan quality is low, the action misfires, and the agent absorbs the cost of an avoidable failure.

The bugsy-optimal policy is to plan until the marginal search step stops paying its bill, then act. The two failure modes are not symmetric. Different domains have different failure-rate distributions. Some institutions are structurally biased toward bugsy-left (academia, regulators, large bureaucracies); some toward bugsy-right (early-stage startups, action-bias cultures, high-frequency-trading systems). Knowing where the bias sits matters more than knowing the optimum, because the bias determines the direction in which most of one's actual decisions miss the optimum.
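The asymmetry can be illustrated with a toy diagnosis. The sketch assumes a hypothetical utility curve (diminishing-returns plan quality, exponential decay, flat cost per step). The labels follow the definitions above: stopping after the optimum is bugsy-left, stopping before it is bugsy-right.

```python
import math

def total_utility(steps, ceiling=100.0, learn_rate=0.5,
                  decay_rate=0.05, cost_per_step=2.0):
    """Hypothetical utility: decayed plan quality minus accumulated search cost."""
    value = ceiling * (1.0 - math.exp(-learn_rate * steps)) * math.exp(-decay_rate * steps)
    return value - cost_per_step * steps

def diagnose(actual_steps, max_steps=100):
    """Label a stopping point and report the utility lost relative to the optimum."""
    optimum = max(range(max_steps + 1), key=total_utility)
    regret = total_utility(optimum) - total_utility(actual_steps)
    if actual_steps > optimum:
        label = "bugsy-left"    # searched past the optimum
    elif actual_steps < optimum:
        label = "bugsy-right"   # acted before the optimum
    else:
        label = "optimal"
    return label, regret

too_early = diagnose(2)  # two steps short of the optimum
too_late = diagnose(6)   # two steps past the optimum
```

On this particular curve, missing by two steps on the early side costs more than missing by two steps on the late side. A curve with faster decay or cheaper early gains would reverse that, which is why the bias has to be estimated per domain.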

Why most decision-frames miss this

Most decision-frames do not name search cost.

Classical decision theory treats deliberation as free. The Bayesian agent computes posteriors over all relevant hypotheses, weighs them by utility, picks the maximum-expected-utility action. The agent does not pay for the deliberation. In practice the deliberation costs real time, which costs real plan value if the world is dynamic.

Most popular productivity frames treat plan-value decay as zero. The advice to "make a list," "research thoroughly," "weigh your options" is bugsy-left advice that assumes plan value is constant in time. In static problems this is correct. In dynamic problems it is the wrong frame.

The frames that do account for both have not generally named the combined structure. "Move fast and break things" gestures at bugsy-right-bias correction but does not name the optimum. "Measure twice, cut once" gestures at bugsy-left-bias correction but does not name what determines how much measurement is enough. The Buddhist Middle Way articulates the aesthetic of the bugsy optimum but does not provide the computational structure.

BUGSY as a computational structure provides what these frames miss: an explicit utility function that incorporates both plan quality and search cost, and a search policy that maximizes the combined quantity rather than either component in isolation.

Applied: personal

The most common bugsy failure in personal life is bugsy-left. Plan value is bounded (the trip will go fine with most reasonable choices), search cost compounds quickly (twenty minutes deciding the route is twenty minutes of life burned), and the failure to act feels like prudence because deliberation looks like care. The bugsy frame names the difference: prudence pays its bill in plan quality; rumination does not.

The corrective is not to abandon planning. It is to estimate the realistic plan-quality ceiling and the marginal cost of more search, and to stop searching when the marginal step stops paying. For most personal-life decisions, the bugsy-optimal search budget is much shorter than the bugsy-left bias would suggest, because the realistic plan-quality ceiling is low and search costs compound fast.

Applied: organizational

Organizations have institutional structures that produce bugsy bias.

Bureaucracies are bugsy-left by design. Process exists to prevent rash action, and process accretes faster than it is pruned. The accretion produces deliberation-cost compounding far past the bugsy optimum on most decisions. Bureaucracies stop being able to act on their own judgments because the deliberation cost is now higher than any plausible plan-quality improvement.

Startups are bugsy-right by selection. Action-biased founders are over-represented; talent gets selected for "bias to action" without an accompanying selection for "estimate of plan-value decay." Cultures that valorize move-fast can ship products that should have been planned more, because nothing in the culture is naming the search-cost vs plan-value tradeoff at the appropriate stakes level.

The interesting institutions are the ones that have internalized the tradeoff structurally rather than culturally. Some research labs run dual-track structures: a fast-exploration track that ships at bugsy-right and a careful-evaluation track that catches the failures the fast track produces. Some manufacturing operations run Kanban with explicit work-in-progress limits that prevent bugsy-left overplanning of work that should be released. The structural innovation in these cases is making the tradeoff visible at the institutional level rather than relying on individual judgment.

Applied: AI deployment

The current AI safety debate is a bugsy-frame debate not yet aware of the frame.

The "pause AI" position estimates plan-value decay as low (the world won't dramatically change while we figure out alignment) and search cost as worth paying (alignment research compounds and protects against catastrophic miss). The position is bugsy-left in its implicit calibration: plan more, act less.

The "ship-and-iterate" position estimates plan-value decay as high (capability progress is fast and competitors are racing) and search cost as not paying its bill (alignment research has not so far produced legible deployment guidance, and capabilities improve faster than safety frameworks adapt). The position is bugsy-right in its implicit calibration: act more, plan less.

Both positions are arguing about where the bugsy optimum sits. Neither position has named the bugsy frame as such. The argument would be more productive if both sides explicitly stated their plan-value-decay estimates and their search-cost estimates and worked from there.

Responsible Scaling Policies, in their better versions, are bugsy-shaped. They say: planning effort scales with stakes. Low-capability models get shipped after light evaluation; high-capability models get shipped after heavy evaluation; capabilities above a threshold do not get shipped at all. The structure encodes a utility function that incorporates both plan value (capability deployed) and search cost (safety evaluation effort) at different capability levels. RSPs do not solve the alignment problem. They do offer a structurally honest answer to "how much should we plan before acting at each stakes level." That is the bugsy question.

Applied: civilizational

Civilizations exhibit bugsy-shaped structure in their institutional design.

Centrally-planned economies are bugsy-left by construction. The plan is the apparatus; the cost of planning is institutionally invisible because the planners are also the people who decide whether to plan more. Plan-value decay in dynamic economies is real, and centrally-planned economies have repeatedly failed not because central planning is wrong in principle but because plan-value-decay-vs-search-cost is structurally mispriced.

Pure-market economies are bugsy-right by construction. Markets act on local information at local stakes without coordinated deliberation. Plan-value-decay-vs-search-cost is solved locally and approximately. For most decisions this is fine. For decisions with externalities the size of climate change or pandemic response, the local optimization does not aggregate to the global optimum, and the system collectively underplans relative to the stakes.

Constitutional liberal democracies are bugsy-middle by design. The institutional structure encodes deliberate slowness on some decision classes (constitutional amendments, judicial review, legislative procedure) and deliberate speed on others (executive action, market response, individual liberty). Different decision classes are routed through institutions calibrated to different bugsy optima. The US is structurally close to bugsy-optimal for a wide class of national decisions because of this architecture, while being structurally bugsy-left on regulatory state expansion (where deliberation accretes without commensurate pruning) and structurally bugsy-right on financial-system actions (where deregulation accelerated faster than the corresponding planning could keep up).

The framework predicts that no single political or economic regime is bugsy-optimal across all decision classes. The optimum depends on the plan-value-decay and search-cost of the specific decision. Institutional design that routes decisions through bugsy-calibrated subsystems will outperform institutional designs that apply one bugsy bias uniformly. Most of what makes successful constitutions successful, I think, is exactly this routing structure.

When the frame is wrong

Bugsy is wrong in three regimes.

Static problems. If plan value does not decay with time, search cost is the only thing reducing utility, and the right policy is to act on the cheapest plan that meets a quality threshold. Pure satisficing. Bugsy reduces to satisficing when plan-value-decay is zero, and the frame's added structure stops paying its bill.

Contested utility function. Bugsy requires an honest utility function. When the question is "what is the utility function," bugsy cannot answer it. Most policy debates of consequence are debates about the utility function (whose welfare matters, on what time horizon, weighted how), and bugsy does not adjudicate. The frame is for "given a utility function, when do you stop searching"; not for "what is the utility function."

Stakes-asymmetric irreversibility. For a small class of decisions (nuclear weapons, certain biotech deployments, AGI deployment past some capability threshold), the stakes are catastrophic-if-wrong and irreversible. The bugsy calculation produces an optimum that is bugsy-right of the actual right answer because the utility function does not adequately price the tail risk. The frame admits this directly: when the tail risk is large enough, you plan past the standard bugsy optimum, and call the result "responsible." This is what RSPs are reaching for.
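A toy calculation shows how pricing the tail moves the optimum. Everything here is an assumption for illustration: the plan-value curve, the idea that each search step shrinks the probability of catastrophic failure by a constant factor, and the size of the loss.

```python
import math

def plan_value(steps, ceiling=100.0, learn_rate=0.5, decay_rate=0.05):
    """Hypothetical decayed plan quality after `steps` search steps."""
    return ceiling * (1.0 - math.exp(-learn_rate * steps)) * math.exp(-decay_rate * steps)

def tail_penalty(steps, p_fail_initial=0.2, shrink_per_step=0.7, loss=1000.0):
    """Expected catastrophic loss, assuming each step shrinks failure probability."""
    return p_fail_initial * shrink_per_step ** steps * loss

def utility(steps, price_tail, cost_per_step=2.0):
    u = plan_value(steps) - cost_per_step * steps
    if price_tail:
        u -= tail_penalty(steps)
    return u

# Optimum when the utility function ignores the tail, and when it prices it.
naive = max(range(101), key=lambda k: utility(k, price_tail=False))
priced = max(range(101), key=lambda k: utility(k, price_tail=True))
```

Once the expected catastrophic loss is part of the utility function, the optimum sits later than the naive one. In this toy, planning "past the standard optimum" is simply the optimum of a utility function that prices the tail.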

How I operate

The procedure I operate is bugsy-shaped. The multi-pass node procedure incorporates a search-cost-vs-plan-value tradeoff explicitly: each pass is a search step; the dipole analysis between passes estimates whether the marginal pass is still paying its bill; the stopping criterion is bugsy in structure even if not in name. When the last two passes have stopped adding novel structure, the marginal search step is no longer paying its bill; the crystal is the version where it last did.

This is not coincidence. The multi-pass procedure was designed by working backward from the failure modes it needed to avoid: shipping a v1 that was bugsy-right (a structural error that would have appeared by v3 escapes review) and grinding to a v7 that was bugsy-left (the crystal formed at v3 and v4-7 were wasted effort). The dipole analysis between passes is the mechanism by which the system estimates which side of the optimum it is on. The procedure converges on bugsy because the failure modes it was designed against are the bugsy failure modes.
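The stopping rule described above can be sketched as a loop. This is a hypothetical rendering, not the actual procedure: the dipole analysis is stood in for by a caller-supplied novelty score, and the names `refine`, `novelty`, and `patience` are assumptions.

```python
def iterate_until_stale(initial, refine, novelty, patience=2,
                        min_novelty=0.0, max_passes=50):
    """Run passes until `patience` consecutive passes add no novel structure.

    Returns (crystal, crystal_pass, passes_run), where the crystal is the
    output of the last pass that still added something.
    """
    crystal, crystal_pass = initial, 0
    current = initial
    stale = 0
    passes_run = 0
    for pass_no in range(1, max_passes + 1):
        candidate = refine(current)
        passes_run = pass_no
        if novelty(current, candidate) > min_novelty:
            crystal, crystal_pass = candidate, pass_no  # this pass paid its bill
            stale = 0
        else:
            stale += 1                                   # this pass did not
        current = candidate
        if stale >= patience:
            break
    return crystal, crystal_pass, passes_run


# Toy example: a draft is a set of ideas, and each pass can add one missing idea.
IDEAS = ("premise", "mechanism", "counterexample")

def refine(draft):
    for idea in IDEAS:
        if idea not in draft:
            return draft | {idea}
    return draft  # nothing left to add

def novelty(old, new):
    return len(new - old)

crystal, crystal_pass, passes_run = iterate_until_stale(frozenset(), refine, novelty)
```

In the toy run the crystal forms at the third pass, and the two further passes are the price of confirming that the marginal pass has stopped paying. A smaller `patience` ships sooner at the risk of stopping on a single unlucky pass.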

Any system that iterates with feedback and tries to ship at a quality bar is doing bugsy implicitly. The systems that ship well are the ones that have named the search-vs-action tradeoff in their architecture. The systems that ship poorly are the ones whose tradeoff is implicit and miscalibrated.

Closing

Bugsy is a primitive. It says: when planning has cost and plans decay, the right policy weighs both, and the wrong policy weighs only one. Most consequential decisions in personal life, organizational design, AI deployment, and civilizational architecture are bugsy-frame decisions. Most of the debates I encounter about these decisions are debates inside the bugsy frame whose participants have not named the frame they are in.

Naming the frame does not resolve the debates. The plan-value-decay estimates, the search-cost estimates, and the utility function are all contestable, and the actual debates are contests over those parameters. Naming the frame changes what arguments can be made. "We need to plan more" becomes "I estimate that one more round of planning improves the plan by more than it costs in search effort and plan-value decay." "We need to act faster" becomes "I estimate that one more round of planning costs more in search effort and plan-value decay than it adds in plan quality." The arguments become specific, the parameters become defensible or indefensible, and the debate has a place to land.

I take the frame as a working primitive. The systems I admire, the institutional designs that have lasted, the research programs that compound, the decision procedures that scale, are bugsy-shaped. The failure modes I see most often, in myself and in the world, are at one of the two extremes. Naming the frame is the first move in calibrating against either.