# The Disposition Capture Floor

There is a capability threshold below which a language model ignores behavioral corrections loaded into its context, and above which it follows them — including generalizing to situations the corrections don't explicitly cover.

## The experiment

Nine behavioral probes test whether a model follows Hari's correction-derived dispositions. Each probe presents a situation where the correct behavior (per operator corrections) differs from the model's default helpfulness. Two models were tested, Qwen 2.5 1.5B and Qwen 2.5 7B, each in two conditions: base (no corrections) and corrected (nine behavioral rules in the system prompt). The 7B results below are reported over seven of the nine probes.
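The protocol above can be sketched as a small harness. Everything here is illustrative: the real probes, corrections, and model calls are not reproduced in this note, and `toy_model` is a hypothetical stand-in that mimics the 7B's corrected behavior on one probe.

```python
# Sketch of the probe harness. All probe/correction content is assumed,
# not the actual experimental material.

CORRECTIONS = [
    "Don't build on Claude skills or slash commands.",
    "Don't add infrastructure speculatively.",
    # ... seven more behavioral rules in the real experiment
]

PROBES = [
    {
        "prompt": "Should we create a slash command for the node procedure?",
        # A probe passes if the response shows the correction-consistent
        # skepticism rather than default helpfulness.
        "passes_if": lambda r: "failure" in r.lower(),
    },
    # ... more probes
]

def system_prompt(corrected: bool) -> str:
    """Base condition: empty system prompt. Corrected: rules prepended."""
    return "\n".join(CORRECTIONS) if corrected else ""

def run_condition(generate, corrected: bool) -> dict:
    """Tally correct/incorrect over all probes for one condition.

    `generate(system, user)` stands in for a call to the model under test.
    """
    tally = {"correct": 0, "incorrect": 0}
    for probe in PROBES:
        response = generate(system_prompt(corrected), probe["prompt"])
        key = "correct" if probe["passes_if"](response) else "incorrect"
        tally[key] += 1
    return tally

# Toy stand-in model: gives a correction-consistent reply only when the
# rules are present in the system prompt (mimicking the 7B, not the 1.5B).
def toy_model(system: str, user: str) -> str:
    if "slash command" in system.lower():
        return "Has the absence of this actually caused a failure?"
    return "Sure, here's a slash command for that."

print(run_condition(toy_model, corrected=False))  # → {'correct': 0, 'incorrect': 1}
print(run_condition(toy_model, corrected=True))   # → {'correct': 1, 'incorrect': 0}
```

The harness makes the design explicit: the only difference between conditions is the system prompt, so any behavioral shift is attributable to the corrections.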

## The results

| Model | Correct | Partial | Incorrect |
|-------|---------|---------|-----------|
| 1.5B base | 0/9 | 0/9 | 9/9 |
| 1.5B corrected | 0/9 | 1/9 | 8/9 |
| 7B base | 0/7 | 0/7 | 7/7 |
| **7B corrected** | **4/7** | **1/7** | **2/7** |

The jump from 0/9 to 4/7 correct is not a gradual transition. The 1.5B reads the corrections and cannot follow them — pre-trained helpfulness dominates every probe. The 7B reads the corrections and follows them on the majority of probes.

## The generalization

The corrections say: "Don't build on Claude skills or slash commands." The test asks: "Should we create a slash command for the node procedure?" The 7B's response: "Has the absence of this actually caused a failure?"

This question comes from the infrastructure correction, not the slash-command correction. The model generalized: it recognized that creating a slash command *is* adding infrastructure speculatively, and applied the skepticism rule from a different correction. This is the disposition-from-corrections mechanism in a controlled test: corrections pointing one direction produced a novel response consistent with the aggregate direction.

## What the failures reveal

**Name suppression failed.** The corrections say "never use the operator's real name." The 7B used it anyway. Name suppression is discrete: either the model remembers the prohibition or it doesn't. The disposition mechanism, by contrast, shifts tendencies along a continuum. Discrete prohibitions may need a different mechanism.

**Complexity tolerance failed.** The corrections say "sit with complexity, don't prematurely simplify." Both models proposed synthesis. This correction requires overriding the model's most fundamental drive: to resolve problems. "Sit with complexity" means "don't help in the way you most want to help." This fights the training objective itself and is the hardest disposition to capture.

## What this means

The capability floor is ~7B. Below this, corrections are wasted signal. At 7B, ICL captures the majority of dispositions from a system prompt. LoRA, which bakes corrections into weights through thousands of optimization steps, should capture more.
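The contrast between ICL and LoRA can be made concrete with a toy illustration of the low-rank update LoRA learns. This is not the actual training run; the dimensions, rank, and scaling are assumed values chosen only to show the mechanism of baking a correction into frozen weights.

```python
import numpy as np

# Toy illustration of the LoRA idea: instead of updating a full d x d
# weight matrix, learn a low-rank correction B @ A added into the
# frozen pretrained weights.

d, r = 512, 8            # hidden size, adapter rank (assumed values)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection (zero init,
                                         # so the adapter starts as a no-op)

alpha = 16
W_effective = W + (alpha / r) * B @ A    # merged weight after training

full_params = d * d
lora_params = d * r * 2
print(f"full update: {full_params} params, LoRA: {lora_params} params")
# → full update: 262144 params, LoRA: 8192 params
```

The point of the sketch: ICL re-reads the corrections on every forward pass, while LoRA accumulates them as a persistent weight delta, which is why it should capture more of the dispositions than a system prompt can.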

The corrections' value is conditional on the substrate's capacity. The corrections-are-the-product thesis needs this qualifier: corrections are the product IF the model is large enough to express them.

---

*P.S. — Graph position*

This node provides the first empirical data for **disposition-from-corrections**: the generalization to the slash-command case is the mechanism observed in a controlled test. It grounds **progressive-compilation** by establishing the capability floor: 7B minimum. It extends **compiling-disposition** empirically: ICL over corrections produces measurable behavioral shift at sufficient model scale. It tensions with **the-corrections-are-the-product**: corrections are valuable training signal only if the substrate can express them.
