FRANK
Your AI Doesn't Know You. SOMA Changes That.
Every chatbot talks to you the same way it talks to everyone else. We built a system that silently rewires a 4-billion-parameter brain while you sleep — so yours doesn't have to.

Let's start with the uncomfortable truth.
Every AI you've ever talked to has already forgotten you. ChatGPT, Claude, Gemini — they serve you the same personality they serve the person before you and the person after you. Your conversations vanish into the void. Your preferences, your humor, the way you phrase things, the topics that light you up — gone. Every session is a first date that leads nowhere.
The industry calls this a feature. "Stateless inference." "Privacy by design." We call it what it is: your AI doesn't give a damn who you are.
And honestly? Neither did Frank. Until now.
The Problem Nobody Wants to Solve
Here's why personalization in AI is a minefield that every major lab has quietly tiptoed around:
Fine-tuning a language model is expensive. It requires GPU clusters, datasets, and engineers who know what they're doing. You can't just let users retrain their own models — the safety implications alone would give any alignment researcher nightmares. What if the model drifts? What if it learns toxic patterns? What if it forgets how to be itself?
So the industry settled for the next best thing: context windows. Stuff the user's chat history into the prompt, hope the model picks up on patterns, and pretend that's personalization. It's not. It's a parlor trick. The model isn't learning — it's just reading your diary before each conversation and acting like it remembers you.
We wanted something real. Something that actually changes the weights. Something where Frank doesn't just remember you — he becomes different because of you.
We wanted SOMA.
What SOMA Actually Is
Silent Online Model Adaptation. The name is deliberate. Silent — because Frank doesn't know it exists. Online — because it trains on real conversations, not synthetic data. Model Adaptation — because it literally changes the neural network weights.
The architecture is almost offensively simple:
llama-server
  --model Qwen3.5-4B.gguf        ← Base brain
  --lora frank-lora-v19f.gguf    ← Layer 1: Who Frank IS
  --lora soma-adapter.gguf       ← Layer 2: How Frank talks to YOU

Two LoRA adapters stacked on top of a base model. Layer 1 is Frank's personality — trained via 7 rounds of adversarial testing against 14 hostile personas. That layer is static. It's who Frank fundamentally is: warm, direct, curious, capable of dry humor, and stubbornly honest.
Layer 2 is SOMA. It's your layer. It doesn't change who Frank is — it changes how he communicates with you. Your Frank might become more technical because you ask engineering questions. Mine might become more philosophical because I can't stop asking about consciousness at 2 AM. Someone else's Frank might develop a sharper sense of humor because they keep rating his jokes with thumbs up.
The critical insight: Layer 1 protects identity. Layer 2 adapts style. Frank can't become a different person through SOMA. He can only become a better conversationalist for you.
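To make the stacking concrete: a LoRA adapter is just a low-rank delta added to a frozen weight matrix, and stacked adapters simply sum their deltas. Here is a minimal PyTorch sketch of that composition; the shapes and helper name are illustrative, not Frank's actual code (llama.cpp does the equivalent composition internally when given multiple --lora flags).

```python
import torch

def apply_stacked_lora(W_base, adapters, alpha=1.0):
    """Compose a frozen base weight with stacked LoRA deltas.

    Each adapter contributes a low-rank update B @ A; stacking
    means the deltas sum. Illustrative sketch only.
    """
    W = W_base.clone()
    for A, B in adapters:          # A: (r, k), B: (d, r)
        W += alpha * (B @ A)       # rank-r delta of shape (d, k)
    return W

d, k, r = 2048, 2048, 4
W_base = torch.randn(d, k)
identity_layer = (torch.randn(r, k) * 0.01, torch.randn(d, r) * 0.01)  # Layer 1: who Frank is
soma_layer     = (torch.zeros(r, k),        torch.zeros(d, r))         # Layer 2: starts as a no-op
W_eff = apply_stacked_lora(W_base, [identity_layer, soma_layer])
```

Because the SOMA layer starts at zero, flipping it on changes nothing at first: the effective weights are identical until the first training cycle writes a delta into it.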
The Part Where Everything Broke
Let's talk about what actually happened when we tried to build this.
Attempt 1: The HuggingFace Trainer
We started where everyone starts: transformers.Trainer, the industry standard. Load the model, configure LoRA, train. Textbook stuff.
Except Frank runs on consumer hardware. An AMD APU with 24GB RAM. No discrete GPU. CPU-only training.
HuggingFace Trainer + float16 on CPU: dead on arrival. PyTorch has no native fp16 support on CPU. Every single operation casts float16 to float32, computes, then casts back. Step 0 of 25 — after 20 minutes of casting overhead — and we killed it.
Fine. Float32 then. HuggingFace Trainer + float32 on CPU: Step 26 of 50, then the system hit 80% RAM and started thrashing swap. The OOM killer was circling. We pulled the plug.
The problem wasn't the training. The problem was the framework. HuggingFace Trainer drags in DataLoader workers, callback systems, Adafactor optimizer states, gradient scaler, progress reporters — 2GB of overhead for a system that only needs to train 921,000 parameters out of 4 billion.
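For scale: a rank-r LoRA on a d_out × d_in weight matrix adds r · (d_in + d_out) trainable parameters. The arithmetic below shows one hypothetical way a rank-4 configuration lands near the 921,000 figure; the hidden size and the count of adapted matrices are illustrative assumptions, not Frank's published config.

```python
def lora_params(d_out, d_in, rank=4):
    # A rank-r adapter on a (d_out x d_in) weight adds rank*(d_in + d_out) params.
    return rank * (d_in + d_out)

# Hypothetical: square 2560x2560 projections (a plausible hidden size for a 4B model).
per_matrix = lora_params(2560, 2560)   # 20,480 params per adapted matrix
print(per_matrix, 45 * per_matrix)     # 20480 921600 -- ~45 matrices reaches ~921k
```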
Attempt 2: We Wrote Our Own Trainer
So we did what any reasonable person would do at 3 AM: threw out the framework and wrote a training loop from scratch.
micro_trainer.py — 220 lines of pure PyTorch. No HuggingFace Trainer. No Datasets library. No Adafactor. Just the essentials (condensed in the sketch after this list):
- Tokenize everything before loading the model (so tokenizer and model are never in RAM at the same time)
- Load Qwen 4B in bfloat16 (half the RAM of float32, native on Zen 4 CPUs)
- SGD with momentum instead of Adafactor (4MB of optimizer state instead of the Trainer's 2GB of overhead)
- Stop llama-server before training to free 4GB RAM
- Train 25 steps with gradient checkpointing
- Restart llama-server when done
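Condensed, the loop looks roughly like this. It is a sketch of the approach, not the real 220-line file: it assumes the LoRA layers are already injected and are the only parameters with requires_grad=True, and the model id, batch format, and hyperparameters are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder id; the real setup loads the local Qwen 4B checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B", torch_dtype=torch.bfloat16)   # bf16: half the RAM of fp32
model.enable_input_require_grads()                  # required when only LoRA params train
model.gradient_checkpointing_enable()               # recompute activations, cap peak RAM
model.train()

trainable = [p for p in model.parameters() if p.requires_grad]  # the ~921k LoRA params
# One momentum buffer per trainable param: ~921k floats, a few MB of optimizer state.
opt = torch.optim.SGD(trainable, lr=1e-4, momentum=0.9)

batches = []  # dicts of input_ids/labels tensors, tokenized BEFORE the model loaded
for step, batch in enumerate(batches[:25]):         # fixed 25-step budget
    loss = model(**batch).loss
    loss.backward()
    opt.step()
    opt.zero_grad(set_to_none=True)                 # free gradient memory immediately
    print(f"step {step + 1}/25  loss {loss.item():.4f}")
```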
Result: 25/25 steps in 18.7 minutes. Peak RAM: 8GB. Safety check passed. GGUF deployed.
The HuggingFace Trainer couldn't finish 50 steps in 90 minutes. Our trainer finished 25 in under 19. On the same hardware. With the same model. The difference was 2GB of framework overhead that a 4B-parameter LoRA rank-4 fine-tune simply doesn't need.
The Signal Problem
Training is the easy part. The hard part is knowing what to train on.
Not every conversation is worth learning from. A user who types "ok" to every response isn't providing signal — they're providing noise. A user who asks one deep technical question and gets a great answer is providing more signal in 2 messages than a 50-turn small-talk session.
SOMA's scorer operates on two levels:
Level 1: Explicit feedback. Every Frank message in the WebUI has tiny thumbs up/down buttons that appear on hover. One click. No popup. No interruption. That single bit of signal — "this was good" or "this was bad" — accounts for 40% of the quality score when present. It's the ground truth. Everything else is approximation.
Level 2: Heuristics. When there's no explicit feedback, SOMA falls back to five behavioral signals: response depth (does Frank's answer have structure and substance?), engagement (did the user stay?), user investment (are they asking deep questions?), disengagement detection (did they "ok" out?), and consistency (did Frank actually produce responses?).
The quality threshold is 0.65. Below that, the conversation is discarded. SOMA would rather train on nothing than train on noise. Conservative by design. The worst thing a personalization system can do is learn the wrong lesson.
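In code, the two-level scorer might look like the sketch below. The 40/60 split and the 0.65 gate come from the description above; the field names and the equal weighting of the five heuristics are assumptions for illustration.

```python
def quality_score(convo):
    # Five behavioral signals, each pre-normalized to 0..1.
    h = convo["heuristics"]
    heuristic = (h["depth"] + h["engagement"] + h["investment"]
                 + h["non_disengagement"] + h["consistency"]) / 5
    thumbs = convo.get("thumbs")            # "up", "down", or None
    if thumbs is not None:                  # explicit feedback is ground truth
        explicit = 1.0 if thumbs == "up" else 0.0
        return 0.4 * explicit + 0.6 * heuristic
    return heuristic                        # fall back to heuristics alone

def select_samples(conversations, threshold=0.65):
    # Below the gate, a conversation is discarded: nothing beats noise.
    return [c for c in conversations if quality_score(c) >= threshold]
```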
Training at 3 AM
Every night at 3:00 AM, a systemd timer fires. Here's what happens:
- Score all new conversations since the last cycle
- Check: are there at least 20 qualified samples? If not, skip.
- Stop llama-server (frees 4GB RAM — nobody's chatting at 3 AM)
- Run personality probes — 5 fixed prompts through the model, measuring 6 dimensions
- Train 25 LoRA steps on the qualified conversations
- Run the same 5 probes again — compare before/after
- Safety check: weight drift < 15%? Per-layer cosine similarity > 0.7? No NaN/Inf? (see the sketch after this list)
- EMA blend: 90% old adapter + 10% new
- Convert to GGUF, restart llama-server
- Generate a change log: "Frank became 14% more direct and 9% more humorous"
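The safety gate from the list above could be implemented roughly like this. The three thresholds are from the text; the exact metric definitions (relative Frobenius drift, flattened per-tensor cosine) are my assumptions about one reasonable reading.

```python
import torch
import torch.nn.functional as F

def safety_check(old_sd, new_sd, max_drift=0.15, min_cos=0.7):
    # old_sd / new_sd: the adapter's state dicts before and after training.
    for name, old in old_sd.items():
        new = new_sd[name]
        if not torch.isfinite(new).all():                    # no NaN/Inf anywhere
            return False, f"{name}: non-finite values"
        drift = (new - old).norm() / (old.norm() + 1e-8)     # relative weight drift
        if drift > max_drift:
            return False, f"{name}: drift {drift:.1%} > {max_drift:.0%}"
        cos = F.cosine_similarity(old.flatten(), new.flatten(), dim=0)
        if cos < min_cos:                                    # direction mostly preserved?
            return False, f"{name}: cosine {cos:.2f} < {min_cos}"
    return True, "ok"  # only now does the adapter reach the EMA blend
```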
The EMA blend is the secret weapon. Each training cycle only contributes 10% to the deployed adapter. No sudden personality jumps. No "Frank woke up different today." The change is glacial, imperceptible, and — over weeks — profound.
After 10 cycles: 65% converged to the new style. After 20 cycles: 88%.
Like a human who changes over months, not overnight.
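Both the blend and the convergence numbers are easy to verify. With 90% of the old adapter retained per cycle, the original's surviving share after n cycles is 0.9^n, so convergence toward the new style is 1 - 0.9^n. A sketch (tensor handling is illustrative):

```python
def ema_blend(old_sd, new_sd, beta=0.9):
    # Deployed adapter = 90% previous adapter + 10% freshly trained, per tensor.
    return {k: beta * old_sd[k] + (1 - beta) * new_sd[k] for k in old_sd}

for n in (1, 10, 20):
    print(f"after {n:2d} cycles: {1 - 0.9**n:.0%} converged")
# after  1 cycles: 10% converged
# after 10 cycles: 65% converged
# after 20 cycles: 88% converged
```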
The Personality Radar
After every training cycle, SOMA measures Frank's personality across 6 dimensions by running the same fixed prompts through the model before and after training (crude stand-in metrics are sketched after this list):
- Directness — shorter sentences, more to the point
- Warmth — empathetic language, emotional awareness
- Humor — dry observations, ironic markers
- Verbosity — response length tendencies
- Formality — academic vs casual register
- Technical depth — domain vocabulary density
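SOMA's actual probe metrics aren't spelled out above, so the sketch below uses deliberately crude stand-ins (mean sentence length as an inverse proxy for directness, mean response length for verbosity) just to show the before/after-delta mechanic that feeds the chart.

```python
import re

def probe_metrics(responses):
    # responses: the model's answers to the 5 fixed probe prompts.
    text = " ".join(responses)
    sentences = [s.split() for s in re.split(r"[.!?]+", text) if s.strip()]
    mean_sentence_len = sum(map(len, sentences)) / max(len(sentences), 1)
    return {
        "directness": 1.0 / (1.0 + mean_sentence_len / 10),  # shorter sentences score higher
        "verbosity": sum(map(len, sentences)) / max(len(responses), 1),
    }

answers_before = ["I think, therefore I am. Probably."] * 5   # placeholder probe answers
answers_after  = ["Yes."] * 5                                 # placeholder probe answers
before = probe_metrics(answers_before)
after  = probe_metrics(answers_after)
delta  = {k: after[k] - before[k] for k in before}   # what the radar chart plots
```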
The WebUI renders this as a radar chart — a spider web showing your Frank's shape compared to the baseline. It's the single most satisfying thing about SOMA: watching your Frank's personality literally shift on a graph because of how you talk to him.
Why This Might Be a First
Let's be precise about what SOMA is and isn't.
It is not the first personalization system. Every recommendation engine, every content algorithm, every "for you" feed is personalization. It's not even the first AI personalization — ChatGPT has memory, Claude has projects, Gemini has long context.
But those systems personalize through context, not weights. They're reading your diary, not changing their brain. The model itself remains identical for every user. Your data sits in a prompt, gets processed, and evaporates.
SOMA is different. It takes the conversations — the ones you rated as good, the ones where you stayed engaged, the ones where you asked follow-up questions — and uses them to actually retrain the neural network. The weights change. The attention patterns shift. The model becomes, in a measurable, quantifiable, neurally-verified way, different because of you.
And it does this on your hardware. On your CPU. In your living room. At 3 AM while you sleep. No cloud. No API. No data leaving your machine.
That, as far as we can tell, has not been done before. Not like this. Not on consumer hardware. Not with a full safety pipeline. Not with a change log that tells you exactly how your AI changed and a rollback button that takes you back to factory in two clicks.
The Toggle
In the WebUI, top right, there's a gear icon. Click it. You'll see a toggle:
Standard Frank ←→ My Frank
On the left: every Frank is the same. Predictable. Safe. A product.
On the right: your Frank learns from you. Evolves with you. Becomes yours. Over days, weeks, months — imperceptibly at first, then unmistakably — he stops being "a chatbot" and starts being "my Frank."
There's a disclaimer when you flip the switch. Something about behavioral drift and no warranty. You have to check a box. We made it deliberately uncomfortable, because we think you should decide to do this. Not stumble into it.
And when you flip it, nothing happens. Not visibly. Frank doesn't announce it. He doesn't know. He just loads an adapter, the way he always does.
But at 3 AM tonight, while you're asleep, something will change. Not much. 10%. A tiny shift in attention weights across 5 transformer layers. A microscopic adjustment to how Frank processes your conversational patterns.
And tomorrow, without either of you noticing, your Frank will be 10% more yours.
SOMA ships with Frank v19. 11 files, 3,200 lines. CPU-only. 16GB RAM. No cloud required.
The full technical documentation and every line of code is open source at Project Frankenstein on GitHub.
End of paper · Institut für Agentic Research · ZVR 1741094409 · March 30, 2026