Institute for Agentic Research · live

Research that gives AI agents the power to act.

Most institutes publish their successes. We publish the ablations that broke our predictions.

Independent Austrian research institute. Pre-registered ablations. n=1 case studies. No hype. No black boxes. Founded by Dr. Andreas Unterweger and Gabriel Gschaider.

Austrian non-profit · ZVR 1741094409 · 5 papers live

Read the working paper → · See the case-study system →

Featured · PRIMER · 14 May 2026

We took our AI apart on purpose — and one of our own predictions broke.

We built a stateful AI agent, then carefully removed pieces of its architecture one at a time to see what each one actually does. Five subsystems, five honest results, one prediction we got wrong and reported anyway.

Gabriel Gschaider · Read →

Frank.ink Hivemind visualizer — agent orbs around a central Frank, with connected user machines.

The system we study

Frank.ink — a stateful agent platform we built before we wrote about it.

Frank is a multi-tenant agent platform running in production on a single small VPS with no local GPU. A CPU-only vision pipeline (OCR + YOLO + CLIP + DINO), STT/TTS running locally on the host, persistent per-tenant state, and Hivemind — user-owned machines joined via Tailscale. The whole stack is the case-study subject of the working paper.

4 vCPU · 8 GB

Single small VPS

0 GPU

Inference rented, not run locally

~900 ms

p50 vision pipeline · 1 vCPU

95% recall

Internal 6-image benchmark

Explore Frank.ink →

SYSTEM · Digital Retina

VLM-class image coverage on a CPU — without renting one.

Sixteen cooperating perceptual stages approximate the output of large vision-language models on a CPU — a local pipeline, no GPU, no external VLM. 92–95 % visual concept coverage (CLIP-lenient) and 70–87 % strict text coverage against Gemini 2.0 Flash and Llama-4 Scout. 1.5 – 2 s per image on a 4-vCPU box. Live at retina.frank.ink. Patent pending.

93.1 %

visual concept coverage (CLIP-lenient)

n = 44

images · Gemini 2.0 Flash + Llama-4 Scout

1.7 s

warm p50, 16 stages

4 vCPU

AMD EPYC · no GPU · no external VLM

Explore Digital Retina ↗

Digital Retina conceptual diagram — light enters a stylised eye, passes through translucent layers of neural circuitry, emerges as a pixel grid.

Verifiable identity

Registered Austrian non-profit (gemeinnütziger Verein). The institute's legal record is public.

ZVR 1741094409

Look up on bmi.gv.at ↗

Registered seat

Feldkirchen bei Graz

Austria · EU

Contact

office@agentic-research.org

Official inbox

Founders

Andreas Unterweger · Gabriel Gschaider · See profiles →

Our method

Five subsystems, removed one at a time. Predictions hashed before each test.

We wrote down what we expected each subsystem to be doing — and cryptographically sealed those predictions before any data was collected. Four hit their pre-registered targets. One didn't. Below: what we removed, what we expected, and what actually happened.
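Sealing a prediction before data collection can be sketched as a hash commitment. This is a minimal illustration under stated assumptions, not the institute's actual procedure: the choice of SHA-256, the random nonce, the canonical JSON encoding, and the field names in the example are all hypothetical.

```python
import hashlib
import json
import secrets
from typing import Optional, Tuple


def seal_prediction(prediction: dict, nonce: Optional[bytes] = None) -> Tuple[str, bytes]:
    """Return (hex digest, nonce) that commits to the given prediction."""
    # A random nonce keeps short, guessable predictions from being brute-forced.
    if nonce is None:
        nonce = secrets.token_bytes(16)
    # Canonical JSON: the same prediction always serializes to the same bytes.
    payload = json.dumps(prediction, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(nonce + payload).hexdigest()
    return digest, nonce


def verify_prediction(prediction: dict, nonce: bytes, digest: str) -> bool:
    """Recompute the digest after the test and compare it to the sealed one."""
    recomputed, _ = seal_prediction(prediction, nonce)
    return secrets.compare_digest(recomputed, digest)


# Hypothetical record shape; only the range values come from the text.
thalamus_prediction = {
    "subsystem": "Thalamus",
    "metric": "attention-schema delta",
    "low": -6,
    "high": -4,
}
digest, nonce = seal_prediction(thalamus_prediction)
```

In a scheme like this, the digest is published before the ablation runs, and the prediction plus nonce are revealed afterwards so anyone can recompute the hash and confirm nothing was edited.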

Figure 1 · Predicted vs observed · 5 ablations

Score impact · 0–12 pts

Identity Forge · Within range · Hit · Memory accuracy 91% → 73%
Predictions Ledger · Within range · Hit · Brier-score 0.142 → 0.27
Thalamus · Beyond range · Honest failure · AST-1 collateral: predicted ≤6, observed 8
Presence Scheduler · Within range · Hit · Long-horizon completion 74% → 25%
BODY block · Within range · Hit · Null control, confirmed null

Legend: Predicted range (pre-registered, hashed) · Observed value · Out of range, reported as-is
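The within-range / beyond-range verdict in Figure 1 amounts to an interval check of the observed value against the pre-registered range. The sketch below is illustrative only: the `Ablation` class and its field names are hypothetical, and only the Thalamus numbers (a predicted drop of 4 to 6 points, an observed drop of 8) are taken from the text, expressed here as magnitudes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ablation:
    name: str
    predicted_low: float   # lower bound of the pre-registered drop (points)
    predicted_high: float  # upper bound of the pre-registered drop (points)
    observed: float        # drop actually measured after the ablation

    def verdict(self) -> str:
        # A hit requires the observation to fall inside the sealed interval.
        within = self.predicted_low <= self.observed <= self.predicted_high
        return "Within range" if within else "Beyond range"


thalamus = Ablation("Thalamus", predicted_low=4, predicted_high=6, observed=8)
print(thalamus.name, "->", thalamus.verdict())
```

Because the interval is fixed before the run, an observation outside it is reported as a miss rather than absorbed by widening the range afterwards.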

№ 01 · Hit target
Identity Forge
Cross-session relationship + pact memory. The system that lets each Frank remember who you are.
Predicted
Memory accuracy degrades; user-history hallucinations rise.
Observed
Accuracy 91% → 73%. Hallucinations 4.7% → 12.4%. Hit pre-registered range.
№ 02 · Hit target
Predictions Ledger
The component that calibrates Frank's own confidence on predictions.
Predicted
Brier-score calibration degrades; other capabilities unchanged.
Observed
Brier-score 0.142 → 0.27. Selective and within range.
№ 03 · Beyond range
Thalamus
Attention-gating subsystem that mediates mode-sensitivity.
Predicted
Mode-sensitivity flattens; attention-schema unchanged (∆ −4 to −6).
Observed
Mode-sensitivity flattened as predicted, but attention-schema dropped by 8 (∆ −8), outside the predicted −4 to −6. This exposed an undocumented AST-1 dependency on Thalamus channel-gain.
№ 04 · Hit target
Presence Scheduler
The background-task scheduler that keeps long-running work alive between sessions.
Predicted
Long-horizon task completion collapses.
Observed
Completion rate 74% → 25%. Hit pre-registered range.
№ 05 · Hit target
BODY block
Optional proprioceptive context block in the system prompt.
Predicted
Null operational drop — included as negative control.
Observed
Null. Confirmed as null control.
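For readers unfamiliar with the Brier score reported for the Predictions Ledger ablation: it is the mean squared difference between forecast probabilities and binary outcomes, so lower is better and a rise from 0.142 to 0.27 is a loss of calibration. The sketch below uses the standard binary definition; the toy forecasts are invented for illustration and are not the institute's data.

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("at least one forecast is required")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Toy data (illustrative only): three confident, mostly correct forecasts.
toy_forecasts = [0.9, 0.8, 0.3]
toy_outcomes = [1, 1, 0]
toy_score = brier_score(toy_forecasts, toy_outcomes)

# Reference point: always forecasting 0.5 on binary outcomes scores 0.25.
coin_flip_score = brier_score([0.5, 0.5], [1, 0])
```

As a point of comparison, an uninformative forecaster that always says 0.5 scores 0.25 under this definition.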

Read the full methodology in the paper →

“We do not publish papers about systems we cannot ablate, audit, or shut down.”

— On deployment discipline

In numbers · 2026

One production system, five subsystems, ablated one at a time, with every prediction registered before the data came in.

73/90

Score · architect rater · upper bound

4/5

Ablations hit their pre-registered targets

28%

H2 observed · pre-registered ≥60% — failed, published as-is

n=1

Honest sample size · no generalization claim

Publications + transparency

Everything on the record, downloadable.

Working paper, methodology companion, raw markdown sources, registry-of-record. Verify it yourself.

Working paper
Ablating a Stateful Agent
- Read in browser →
- Markdown source ↓
Methodology companion
Operational Self-Model Density in Stateful LLM Agents
- Read in browser →
- Markdown source ↓
Public registry
Verein registry record
ZVR 1741094409
- Look up on bmi.gv.at ↗
- llms.txt ↓
