Live · in production · case-study system
Frank.ink — an agent you can hand a task to.
A stateful agentic AI companion. You give it a brief in plain language; it runs in the background — across mail, browser, files, your own machines via a private tunnel — and comes back with the work done, not a transcript of how it would do it.

Frank.ink is a multi-tenant agentic AI platform. Each user runs their own isolated master Frank, which can spawn purpose-built specialist Franks for individual projects — long-running tasks that need their own memory, tool access, and execution scope.
Unlike a chat product, Frank is built around persistence. A specialist Frank keeps state between sessions: relationship graph, voice drift, pacts with the user, project memory, ongoing TODOs. You can close the tab. It keeps working.
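In code terms, the persisted state might look like the sketch below; the field names are illustrative, not Frank.ink's actual schema.

```python
# What a specialist Frank could plausibly persist between sessions.
# Field names are illustrative, not Frank.ink's actual schema.
import json
from dataclasses import asdict, dataclass, field

@dataclass
class SpecialistState:
    project_id: str
    relationship_graph: dict[str, list[str]] = field(default_factory=dict)  # who/what links to what
    pacts: list[dict] = field(default_factory=list)    # promises to the user, honored or broken
    voice_drift: float = 0.0                           # 0.0 = house voice, 1.0 = fully user-shaped
    todos: list[str] = field(default_factory=list)     # work items that survive a closed tab
    memory: list[str] = field(default_factory=list)    # distilled project memory

    def save(self, path: str) -> None:
        with open(path, "w") as f:
            json.dump(asdict(self), f, indent=2)
```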
The platform runs on a single small VPS, augmented by user-provided hardware (laptops, desktops, servers) joined into a private Tailscale tunnel. That distributed layer is called Hivemind — every connected machine becomes accessible to the user's own Franks for SSH-grade administration and computation, but never to anyone else's.
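One plausible shape of that hop, assuming the tailscale CLI is installed on the VPS and Tailscale SSH is enabled on the target machines; the hostname and helper name are made up for the example.

```python
# Run a command on one of the user's own Hivemind machines over the tailnet.
# Because each user's tailnet is separate, other tenants' machines are
# simply unreachable from this Frank's process.
import subprocess

def hivemind_run(host: str, command: str, timeout: int = 120) -> str:
    """Execute `command` on `host` via Tailscale SSH and return stdout."""
    result = subprocess.run(
        ["tailscale", "ssh", host, command],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout

# e.g. hivemind_run("alices-laptop", "df -h /")
```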
Master chat
A single conversation that orchestrates everything — projects, mail, files, system tasks.
Project Franks
Specialist agents per project, with isolated state, tools, and long-running heartbeats.
Mail · Cloud · Calendar
Reads and acts on your behalf via IMAP/SMTP, R2 object storage, and CalDAV. Never deletes, never mass-sends without sign-off.
Web hosting
Each user gets a slug — name.frank.ink — and can have Frank build them a site live, deployed behind a wildcard cert.
Hivemind
Connect your own machines via Tailscale. Frank SSHs in, installs, debugs, ships — under your audit log.
Marketplace + Browser
Structured searches across Amazon + eBay with real listings, plus a sandboxed Chromium for arbitrary web tasks.
Identity Forge
Each project Frank builds its own relationship graph with the user — pacts honored or broken, voice drift, mood.
Terminal
Browser-resident shell sandboxed per chat with bwrap + gVisor — full root within its own filesystem, invisible to peers. Sketched below.
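The Terminal card compresses a lot. Below, a sketch of the bubblewrap half of that jail using standard bwrap flags; the exact production profile, and the gVisor layer wrapped around it, are assumptions.

```python
# Spawn a per-chat shell that is root inside its own filesystem and blind to
# every other chat. Flags are standard bwrap; the profile is illustrative.
import subprocess

def spawn_chat_shell(chat_root: str) -> subprocess.Popen:
    return subprocess.Popen([
        "bwrap",
        "--unshare-all",             # fresh pid/net/ipc/uts/user namespaces
        "--die-with-parent",         # jail dies with the orchestrator
        "--uid", "0", "--gid", "0",  # root, but only inside the user namespace
        "--bind", chat_root, "/",    # the chat's private directory becomes /
        "--ro-bind", "/usr", "/usr", # shared binaries, read-only
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "/usr/bin/bash",
    ])
```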
When the user pastes a screenshot or drag-drops an image, Frank routes it through a five-stage CPU-only vision pipeline. No GPU is involved. The stages, in order:
Stage 1 · OCR: PaddleOCR · text layer
Stage 2 · Object detection: YOLOv8n · INT8 · 80 COCO classes
Stage 3 · Open-vocab: CLIP-B/32 · INT8 · ~280 phrases
Stage 4 · Embed: DINOv2 · scene similarity
Stage 5 · Narrative: VLM-style 2–5 sentence description
A smart router decides which stages to run. A pure UI screenshot short-circuits the semantic models — OCR + a glance at the layout is enough. A photo with possible people, objects, or scenes runs the full pipeline. The final stage writes a natural-language summary ("a robot head sliced open in profile, copper filament behind the eye…") — what Frank then reasons about.
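The routing decision reduces to a few cheap features computed before any heavy model loads. A sketch, with illustrative feature names and thresholds rather than the production values:

```python
# Gate the semantic stages on cheap pre-features. Thresholds are illustrative.
def route_stages(ocr_coverage: float, photo_score: float) -> list[str]:
    """ocr_coverage: fraction of the image covered by detected text boxes.
    photo_score: cheap naturalness estimate (e.g. color/edge statistics)."""
    # Heavily texted, un-photo-like images are treated as UI screenshots:
    # OCR plus a layout glance is enough, and the semantic models never load.
    if ocr_coverage > 0.4 and photo_score < 0.2:
        return ["ocr", "layout"]
    # Anything that might hold people, objects, or scenes gets the full run.
    return ["ocr", "detect", "open_vocab", "embed", "narrative"]
```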
p50 latency: ~900 ms on 1 vCPU. Concept recall: 95% on a 6-image internal benchmark. The whole pipeline costs roughly the same as one extra LLM round-trip — and the user's screenshot never leaves the institute's VPS.
Voice in Frank is push-to-talk by default. Two stages each direction, wrapped around Frank's normal LLM turn:
Stage 1 · Mic: WebRTC capture · 16 kHz
Stage 2 · STT: faster-whisper · small · INT8
Stage 3 · LLM turn: Frank's normal text path
Stage 4 · TTS: Piper · per-user voice profile
Stage 5 · Stream: WS frames · sub-200 ms first byte
STT runs locally on the VPS — recordings are not sent to a third party. TTS uses Piper voices selected per user; the voice drift system slowly shifts pace and intonation toward the user's own speech patterns over time (the same drift mechanism that shapes Frank's text voice). Latency end-to-end on a quiet user message is roughly 1.2–1.8 seconds.
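The two local hops around the LLM turn, sketched; faster-whisper and Piper are as described above, while the model paths and helper names are assumptions.

```python
# STT and TTS both run on the VPS; audio never leaves the box.
import subprocess
from faster_whisper import WhisperModel

# "small" + INT8 keeps the model inside the CPU/RAM budget.
stt = WhisperModel("small", device="cpu", compute_type="int8")

def transcribe(wav_path: str) -> str:
    segments, _info = stt.transcribe(wav_path)
    return " ".join(seg.text.strip() for seg in segments)

def synthesize(text: str, voice_model: str, out_path: str) -> None:
    # Piper reads text on stdin and writes a wav; one ONNX voice per user,
    # nudged over time by the same drift mechanism as the text voice.
    subprocess.run(
        ["piper", "--model", voice_model, "--output_file", out_path],
        input=text, text=True, check=True,
    )
```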
§5 · The footprint
One small VPS. Four vCPU. Eight GB. No GPUs in the loop.
Frank's production deployment runs on a single Hetzner-class VPS augmented by user-provided Hivemind machines. Inference is rented from external token providers; everything else — orchestration, state, presence, vision, audio, sandboxes — runs on the box itself.
4 vCPU
Compute budget
8 GB
RAM ceiling at peak
0 GPU
Inference rented, not run locally
~900 ms
p50 vision pipeline · 1 vCPU
95%
Concept recall · internal 6-image bench
—
Prompt-cache hit-rate · chitchat path
~$0.06
Cost per 20-turn chitchat session
5
Subsystems ablated · case-study
Numbers are operational, not benchmark-curated. Vision and audio latencies measured on the production VPS under typical load; cost figures from real provider billing dashboards after the cache double-count bug was patched (May 12, 2026).
The institute's working paper used this exact system as its case study — five subsystems ablated, one at a time.
Read the paper →