Institute for Agentic Research · live

Research that gives AI agents the power to act.

Most institutes publish their successes. We publish the ablations that broke our predictions.

Independent Austrian research institute. Pre-registered ablations. n=1 case studies. No hype. No black boxes. Founded by Dr. Andreas Unterweger and Gabriel Gschaider.

Austrian non-profit · ZVR 1741094409 · 5 papers live

Featured · PRIMER · 14 May 2026

We took our AI apart on purpose — and one of our own predictions broke.

We built a stateful AI agent, then carefully removed pieces of its architecture one at a time to see what each one actually does. Five subsystems, five honest results, one prediction we got wrong and reported anyway.

Gabriel Gschaider · Read →
Frank.ink Hivemind visualizer — agent orbs around a central Frank, with connected user machines.

The system we study

Frank.ink — a stateful agent platform we built before we wrote about it.

Frank is a multi-tenant agent platform running in production on a single small VPS with no local GPU. The stack combines a CPU-only vision pipeline (OCR + YOLO + CLIP + DINO), STT/TTS running locally on the host, persistent per-tenant state, and Hivemind (user-owned machines joined via Tailscale). The whole stack is the case-study subject of the working paper.

4 vCPU · 8 GB

Single small VPS

0 GPU

Inference rented, not run locally

~900 ms

p50 vision pipeline · 1 vCPU

95% recall

Internal 6-image benchmark

Explore Frank.ink →

SYSTEM · Digital Retina

VLM-class image coverage on a CPU — without renting one.

Sixteen cooperating perceptual stages approximate the output of large vision-language models on a CPU: a local pipeline, no GPU, no external VLM. 92–95 % visual concept coverage (CLIP-lenient) and 70–87 % strict text coverage against Gemini 2.0 Flash and Llama-4 Scout. 1.5–2 s per image on a 4-vCPU box. Live at retina.frank.ink. Patent pending.

93.1 %

visual concept coverage (CLIP-lenient)

n = 44

images · Gemini 2.0 Flash + Llama-4 Scout

1.7 s

warm p50, 16 stages

4 vCPU

AMD EPYC · no GPU · no external VLM

Explore Digital Retina ↗
Digital Retina conceptual diagram — light enters a stylised eye, passes through translucent layers of neural circuitry, emerges as a pixel grid.

Verifiable identity

Registered Austrian non-profit (gemeinnütziger Verein). The institute's legal record is public.

ZVR 1741094409

Registered seat

Feldkirchen bei Graz

Austria · EU

Contact

office@agentic-research.org

Official inbox

Our method

Five subsystems, removed one at a time. Predictions hashed before each test.

We wrote down what we expected each subsystem to be doing — and cryptographically sealed those predictions before any data was collected. Four hit their pre-registered targets. One didn't. Below: what we removed, what we expected, and what actually happened.
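As an illustration of what sealing a prediction can look like, here is a minimal hash-commitment sketch in Python. It is not the institute's actual tooling: the function names, the JSON field names, and the choice of SHA-256 with a random nonce are all assumptions made for this example. The idea is that the digest is published before the test, and the prediction and nonce are revealed afterwards so anyone can recompute the digest.

```python
import hashlib
import hmac
import json
import secrets
from typing import Optional, Tuple


def seal_prediction(prediction: dict, nonce: Optional[str] = None) -> Tuple[str, str]:
    """Return (digest, nonce) committing to a prediction without revealing it.

    Hypothetical helper: the source does not describe its sealing scheme.
    """
    # A random nonce stops anyone from guessing the prediction by brute force.
    nonce = nonce or secrets.token_hex(16)
    # Canonical JSON so the same prediction always serialises to the same bytes.
    payload = json.dumps(prediction, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256((nonce + payload).encode("utf-8")).hexdigest()
    return digest, nonce


def verify_prediction(prediction: dict, nonce: str, digest: str) -> bool:
    """Recompute the digest from the revealed prediction and compare."""
    recomputed, _ = seal_prediction(prediction, nonce)
    return hmac.compare_digest(recomputed, digest)


if __name__ == "__main__":
    # Field names are illustrative; the range is the Thalamus prediction quoted on this page.
    prediction = {"subsystem": "Thalamus", "metric": "attention-schema delta", "range": [-6, -4]}
    digest, nonce = seal_prediction(prediction)
    print("publish before the test:", digest)
    print("verifies after reveal:", verify_prediction(prediction, nonce, digest))
```

Changing any field of the prediction after the fact yields a different digest, which is what makes an out-of-range result such as the Thalamus one checkable by outsiders.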

Figure 1 · Predicted vs observed · 5 ablations

Score impact · 0–12 pts

  • Identity Forge

    Within range

    Hit

    Memory accuracy 91% → 73%

  • Predictions Ledger

    Within range

    Hit

    Brier-score 0.142 → 0.27

  • Thalamus

    Beyond range

    Honest failure

    AST-1 collateral — predicted ≤6, observed 8

  • Presence Scheduler

    Within range

    Hit

    Long-horizon completion 74% → 25%

  • BODY block

    Within range

    Hit

    Null control · confirmed null

Legend: predicted range (pre-registered, hashed) · observed value · out of range, reported as-is
  • № 01 · Hit target

    Identity Forge

    Cross-session relationship + pact memory. The system that lets each Frank remember who you are.

    Predicted

    Memory accuracy degrades; user-history hallucinations rise.

    Observed

    Accuracy 91% → 73%. Hallucinations 4.7% → 12.4%. Hit pre-registered range.

  • № 02 · Hit target

    Predictions Ledger

    The component that calibrates Frank's own confidence on predictions.

    Predicted

    Brier-score calibration degrades; other capabilities unchanged.

    Observed

    Brier-score 0.142 → 0.27. Selective and within range.

  • № 03 · Beyond range

    Thalamus

    Attention-gating subsystem that mediates mode-sensitivity.

    Predicted

    Mode-sensitivity flattens; attention-schema unchanged (∆ −4 to −6).

    Observed

    Mode-sensitivity flattened as predicted, but attention-schema dropped by 8 points (∆ −8), outside the predicted range. This uncovered an undocumented AST-1 dependency on Thalamus channel-gain.

  • № 04 · Hit target

    Presence Scheduler

    The background-task scheduler that keeps long-running work alive between sessions.

    Predicted

    Long-horizon task completion collapses.

    Observed

    Completion rate 74% → 25%. Hit pre-registered range.

  • № 05 · Hit target

    BODY block

    Optional proprioceptive context block in the system prompt.

    Predicted

    Null operational drop — included as negative control.

    Observed

    Null. Confirmed as null control.
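For readers unfamiliar with the Brier score reported for the Predictions Ledger ablation, the sketch below computes the standard binary form: the mean squared difference between a forecast probability and the 0/1 outcome. Lower is better, so a move from 0.142 to 0.27 means worse calibration. The page does not say which variant was used, so the binary form and the function name are assumptions, and the sample forecasts are invented for illustration only.

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and binary outcomes."""
    if len(forecasts) != len(outcomes) or len(forecasts) == 0:
        raise ValueError("forecasts and outcomes must be non-empty and the same length")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    # Invented example data, not taken from the paper.
    forecasts = [0.9, 0.2, 0.7, 0.4]
    outcomes = [1, 0, 1, 1]
    print(f"Brier score: {brier_score(forecasts, outcomes):.3f}")
```

A forecaster who always answers 0.5 scores exactly 0.25 on binary outcomes, which gives a rough reference point for reading the two values above.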

Read the full methodology in the paper →
We do not publish papers about systems we cannot ablate, audit, or shut down.

— On deployment discipline

In numbers · 2026

One production system, five subsystems, ablated one at a time — every prediction registered before the data came in.

73/90

Score · architect rater · upper bound

4/5

Ablations hit their pre-registered targets

28%

H2 observed · pre-registered ≥60% — failed, published as-is

n=1

Honest sample size · no generalization claim

Publications + transparency

Everything on the record, downloadable.

Working paper, methodology companion, raw markdown sources, registry-of-record. Verify it yourself.