In plain words

We built one AI agent, used it as our single test subject (n=1), and carefully switched off pieces of it one at a time to learn what each piece actually does. Before each test we wrote down what we expected — and we published every result, including the prediction that turned out wrong.

n=1

n=1 means one test subject. We make no claim that what we found generalizes — we report what happened to this specific system.

Ablation

An ablation removes one component to see how the whole behaves without it, the same way medicine learns what an organ does by observing what changes when it is missing.
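As a rough illustration of the idea, an ablation loop can be sketched as: score the full system, then score it again with exactly one component switched off, and record the drop. The component names, the weights, and the `evaluate` function below are hypothetical stand-ins, not the subsystems or scoring used in this project.

```python
from typing import Callable, Dict, FrozenSet, Iterable


def run_ablations(
    components: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], int],
) -> Dict[str, int]:
    """Return the score drop caused by removing each component on its own."""
    full = frozenset(components)
    baseline = evaluate(full)
    drops = {}
    for name in sorted(full):
        # Switch off exactly one component; everything else stays enabled.
        drops[name] = baseline - evaluate(full - {name})
    return drops


# Hypothetical toy system: each enabled component adds a fixed amount to the score.
TOY_WEIGHTS = {"memory": 3, "planner": 5, "retrieval": 2}


def toy_evaluate(enabled: FrozenSet[str]) -> int:
    return sum(TOY_WEIGHTS[name] for name in enabled)


drops = run_ablations(TOY_WEIGHTS, toy_evaluate)
```

In this toy case each drop simply equals the component's weight. In a real system components interact, so the drops need not add up to the baseline score.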

Pre-registered

Pre-registered means we wrote our predictions down and hashed them before running the experiment — so we can't quietly rewrite history if a guess turns out wrong.
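A minimal sketch of what "hashed" can mean in practice, assuming the predictions are stored as a small JSON-style record and fingerprinted with SHA-256. The record contents and function names here are hypothetical; the source does not say which hash or file format was actually used.

```python
import hashlib
import json


def commit_predictions(predictions: dict) -> str:
    """Return a SHA-256 fingerprint of the predictions."""
    # Canonical form: sorted keys and fixed separators, so the same
    # content always produces the same bytes and therefore the same hash.
    canonical = json.dumps(predictions, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_predictions(predictions: dict, published_digest: str) -> bool:
    """Check that the revealed predictions match the digest published earlier."""
    return commit_predictions(predictions) == published_digest


# Hypothetical example record, written down before the experiment runs.
predictions = {"H_example": "metric >= 60%"}
digest = commit_predictions(predictions)

# Later, anyone can recompute the hash from the revealed predictions.
unchanged = verify_predictions(predictions, digest)
tampered = verify_predictions({"H_example": "metric >= 10%"}, digest)
```

Publishing `digest` before the experiment and the full record afterwards lets a reader confirm the predictions were not edited: any change to the text produces a different hash.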

In numbers · 2026

One production system, five subsystems, ablated one at a time, with each prediction registered before the data came in.

0/90

Score · architect rating · upper

0/0

Ablations hit pre-registered targets

0%

H2 observed · pre-registered ≥60% — failed, reported

n=1

Honest sample size — no generalization claim

Publications

The working paper + methodology companion.


Earlier writing