Institute for Agentic Research · live
Research that gives AI agents the power to act .
Most institutes publish their successes. We publish the ablations that broke our predictions.
Independent Austrian research institute. Pre-registered ablations. n=1 case studies. No hype. No black boxes. Founded by Dr. Andreas Unterweger and Gabriel Gschaider.
The system we study
Frank.ink — a stateful agent platform we built before we wrote about it.
Frank is a multi-tenant agent platform running in production on a single small VPS with no local GPU. A CPU-only vision pipeline (OCR + YOLO + CLIP + DINO), STT/TTS running locally on the host, persistent per-tenant state, and Hivemind — user-owned machines joined via Tailscale. The whole stack is the case-study subject of the working paper.
4 vCPU · 8 GB
Single small VPS
0 GPU
Inference rented, not run locally
~900 ms
p50 vision pipeline · 1 vCPU
95% recall
Internal 6-image benchmark
SYSTEM · Digital Retina
VLM-class image coverage on a CPU — without renting one.
Sixteen cooperating perceptual stages approximate the output of large vision-language models on a CPU — a local pipeline, no GPU, no external VLM. 92–95 % visual concept coverage (CLIP-lenient) and 70–87 % strict text coverage against Gemini 2.0 Flash and Llama-4 Scout. 1.5 – 2 s per image on a 4-vCPU box. Live at retina.frank.ink. Patent pending.
93.1 %
visual concept coverage (CLIP-lenient)
n = 44
images · Gemini 2.0 Flash + Llama-4 Scout
1.7 s
warm p50, 16 stages
4 vCPU
AMD EPYC · no GPU · no external VLM
Verifiable identity
Registered Austrian non-profit (gemeinnütziger Verein). The institute's legal record is public.
Registered seat
Feldkirchen
bei Graz
Austria · EU
Our method
Five subsystems, removed one at a time. Predictions hashed before each test.
We wrote down what we expected each subsystem to be doing — and cryptographically sealed those predictions before any data was collected. Four hit their pre-registered targets. One didn't. Below: what we removed, what we expected, and what actually happened.
Figure 1 · Predicted vs observed · 5 ablations
Score impact · 0 — 12 pts
Identity Forge
Within range
HitMemory accuracy 91% → 73%
Predictions Ledger
Within range
HitBrier-score 0.142 → 0.27
Thalamus
Beyond range
Honest failureAST-1 collateral — predicted ≤6, observed 8
Presence Scheduler
Within range
HitLong-horizon completion 74% → 25%
BODY block
Within range
HitNull control · confirmed null
- № 01Hit target
Identity Forge
Cross-session relationship + pact memory. The system that lets each Frank remember who you are.
Predicted
Memory accuracy degrades; user-history hallucinations rise.
Observed
Accuracy 91% → 73%. Hallucinations 4.7% → 12.4%. Hit pre-registered range.
- № 02Hit target
Predictions Ledger
The component that calibrates Frank's own confidence on predictions.
Predicted
Brier-score calibration degrades; other capabilities unchanged.
Observed
Brier-score 0.142 → 0.27. Selective and within range.
- № 03Beyond range
Thalamus
Attention-gating subsystem that mediates mode-sensitivity.
Predicted
Mode-sensitivity flattens; attention-schema unchanged (∆ −4 to −6).
Observed
Mode-sensitivity flat as predicted, BUT attention-schema dropped −8 — uncovered an undocumented AST-1 dependency on Thalamus channel-gain.
- № 04Hit target
Presence Scheduler
The background-task scheduler that keeps long-running work alive between sessions.
Predicted
Long-horizon task completion collapses.
Observed
Completion rate 74% → 25%. Hit pre-registered range.
- № 05Hit target
BODY block
Optional proprioceptive context block in the system prompt.
Predicted
Null operational drop — included as negative control.
Observed
Null. Confirmed as null control.
“We do not publish papers about systems we cannot ablate, audit, or shut down.”
— On deployment discipline
In numbers · 2026
One production system, five subsystems , ablated one at a time — every prediction registered before the data came in.
73/90
Score · architect rater · upper bound
5/5
Ablations hit their pre-registered targets
28%
H2 observed · pre-registered ≥60% — failed, published as-is
n=1
Honest sample size · no generalization claim
Publications + transparency
Everything on the record, downloadable.
Working paper, methodology companion, raw markdown sources, registry-of-record. Verify it yourself.
Working paper
Ablating a Stateful Agent
Methodology companion
Operational Self-Model Density in Stateful LLM Agents
Public registry
Verein registry record
ZVR 1741094409


