Personal AI Operations Agent
Self-hosted multi-agent system. One Telegram chat runs the entire business and personal productivity across 21 functional modules.
A tool-use orchestrator with ~95 LLM tools, hybrid RAG over corporate documents, an auto-extracting knowledge graph, finance auto-categorization at 97% accuracy, and 8 background daemons. Single-VM deploy at €5/mo. Full source code, no vendor lock-in.
- 21 functional modules
- ~95 LLM tools
- 49 DB migrations
- 14 external integrations
- 8 background daemons
- 289 documents indexed
- 97% finance auto-categorization
- €5/mo Hetzner VM
Operating a one-person business from memory and 17 browser tabs
Each pair below is a real scenario that used to cost hours every week.
Without
Notes scattered across Apple Notes, Google Drive drafts, saved emails, Telegram Saved Messages, and paper. ~30% of ideas get lost.
With
Single inbox capture: voice / photo / text into Telegram → auto-classify → routed to the right store (task / fact / contact / event / commitment / journal).
Without
Finances done by hand: export CSV from each bank, categorize 200+ rows / mo manually, project month-end on a piece of paper.
With
Auto-sync with 5 bank accounts. 97% LLM auto-categorize, recurring-merchant rules, EOM forecast with category breakdown, bulk-undo for mistakes.
Without
Overdue tasks pile up in layers. Deadlines lost in the stream. Promises to people forgotten — trust erodes.
With
Smart task lifecycle: due_at coercion (odd hours → end-of-day), 2-tier dedup (exact + fuzzy), auto-snooze stale > 7 days with one nudge, commitments auto-expire 24 h after due.
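A minimal sketch of the due_at coercion and 2-tier dedup described above. The odd-hour window (00:00 to 06:00), the 0.85 similarity threshold, and the function names are assumptions for illustration; the source does not state them, and the real system may do fuzzy matching in the database (for example with pg_trgm) rather than with `difflib`.

```python
from datetime import datetime, time
from difflib import SequenceMatcher

# Assumed values: the source only says "odd hours" and "fuzzy".
ODD_HOUR_START, ODD_HOUR_END = 0, 6
FUZZY_THRESHOLD = 0.85


def coerce_due_at(due: datetime) -> datetime:
    """Move a due time that falls in the odd-hour window to 23:59 the same day."""
    if ODD_HOUR_START <= due.hour < ODD_HOUR_END:
        return datetime.combine(due.date(), time(23, 59))
    return due


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(title.lower().split())


def is_duplicate(new_title: str, existing_titles: list[str]) -> bool:
    """Tier 1: exact match after normalization. Tier 2: fuzzy similarity."""
    new_norm = normalize(new_title)
    existing_norm = [normalize(t) for t in existing_titles]
    if new_norm in existing_norm:
        return True
    return any(
        SequenceMatcher(None, new_norm, other).ratio() >= FUZZY_THRESHOLD
        for other in existing_norm
    )
```

Running the exact check first keeps the common case cheap; the fuzzy pass only runs when no exact match exists.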
Without
“Where was that pricing email from X / decision in some Drive doc / note about Y” — 20–40 minute hunt every time.
With
Hybrid RAG search (BM25 + vector cosine) over 289 unified documents (email + Drive + voice). Answer in 1–2 sec with source citation [src: email «Pricing inquiry from X»].
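To illustrate what "BM25 + vector cosine" means in practice, here is a small pure-Python sketch. The fusion rule (BM25 normalized by its maximum, then a weighted sum with `alpha`) is an assumption; the source does not say how the two scores are combined, and the production system runs on PostgreSQL with pgvector rather than in-memory lists.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with the standard BM25 formula."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_rank(query, query_vec, docs, doc_vecs, alpha: float = 0.5) -> list[int]:
    """Return document indices, best first, by a weighted lexical + vector score."""
    lexical = bm25_scores(query, docs)
    top = max(lexical, default=0.0) or 1.0  # avoid dividing by zero
    fused = [
        alpha * (lex / top) + (1 - alpha) * cosine(query_vec, vec)
        for lex, vec in zip(lexical, doc_vecs)
    ]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
```

The lexical side catches exact names and numbers; the vector side catches paraphrases, which is why the two are combined.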
Without
Voice messages garbled in noisy environments (car, street), Ukrainian / mixed terms misheard.
With
Production voice pipeline: gpt-4o-transcribe + ffmpeg pre-process (highpass 80 Hz, FFT-denoise, EBU R128 loudnorm) + auto-detect lang (RU/UK/EN) + entity-name prompt context.
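The pre-processing chain above maps onto three standard ffmpeg audio filters: `highpass`, `afftdn` (FFT denoiser) and `loudnorm` (EBU R128). The sketch below only builds the command and the prompt string. The mono 16 kHz output settings, the function names, and the prompt wording are assumptions, not taken from the source.

```python
def build_ffmpeg_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that cleans a voice note before transcription."""
    filters = ",".join([
        "highpass=f=80",  # cut rumble below 80 Hz
        "afftdn",         # FFT-based noise reduction
        "loudnorm",       # EBU R128 loudness normalization
    ])
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-af", filters,
        "-ac", "1",       # assumed: downmix to mono
        "-ar", "16000",   # assumed: resample to 16 kHz
        dst,
    ]


def build_transcription_prompt(entity_names: list[str]) -> str:
    """Build a hint string of known names for the speech-to-text prompt (hypothetical wording)."""
    return "Names that may appear: " + ", ".join(entity_names)
```

The command list can be passed to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed.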
Without
LLM occasionally invents actions (“I deleted task X”) without an actual tool call.
With
Two-layer anti-hallucination: action-verb-name match in system prompt + supervisor judge (gpt-4o) verifies every reply against tool_results. Hallucination → reply replaced with system fallback. Logged in /quality dashboard.
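The idea behind verifying a reply against tool_results can be sketched without an LLM. The code below is a deterministic stand-in for the gpt-4o judge, not the actual implementation: the verb-to-tool mapping, the shape of the tool result dicts (`tool`, `ok`), and the fallback text are all hypothetical.

```python
import re

# Hypothetical mapping from a claimed action verb to a tool-name prefix.
ACTION_VERBS = {
    "deleted": "delete_",
    "created": "create_",
    "updated": "update_",
    "sent": "send_",
}
FALLBACK_REPLY = "I could not confirm that this action was performed. Please try again."


def unverified_claims(reply: str, tool_results: list[dict]) -> list[str]:
    """Return action verbs the reply claims that no successful tool call backs up."""
    called = [r["tool"] for r in tool_results if r.get("ok")]
    missing = []
    for verb, prefix in ACTION_VERBS.items():
        claimed = re.search(rf"\bI(?: have|'ve)? {verb}\b", reply, re.IGNORECASE)
        if claimed and not any(name.startswith(prefix) for name in called):
            missing.append(verb)
    return missing


def guard_reply(reply: str, tool_results: list[dict]) -> str:
    """Replace the reply with a system fallback when a claim is unverified."""
    return FALLBACK_REPLY if unverified_claims(reply, tool_results) else reply
```

A regex check like this only catches first-person phrasing in English; an LLM judge is what handles paraphrases and other languages.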
How the agent works: the request pipeline
Every tool-use call passes through 8 layers — from security guard to supervisor judge. Hallucinations are caught before the reply reaches the user.
| # | Layer | What it does |
|---|---|---|
| 01 | Quick check | in-process regex denylist |
| 02 | Validate | agent_security: input sanity, prompt injection guard |
| 03 | Routing | agent_brain LLM router → tier (cheap / mid / premium) |
| 04 | Cost gate | downgrade @80 % budget, hard-block @95 % |
| 05 | Recall | knowledge graph + session summary injection |
| 06 | Tool loop | Claude Sonnet 4.6, max 6 iterations, ~95 tools |
| 07 | Verify | supervisor judge (gpt-4o) anti-hallucination |
| 08 | Reply | auto-split > 4096 chars, strip markdown for TG |
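The final Reply layer can be sketched as two small helpers. Telegram limits a text message to 4096 characters. Preferring to cut at a newline and the specific markdown patterns stripped are assumptions; the source only says replies are auto-split and markdown is removed.

```python
import re

TG_LIMIT = 4096  # Telegram's maximum length for one text message


def strip_markdown(text: str) -> str:
    """Remove a few common markdown markers (bold, inline code, headings)."""
    text = re.sub(r"\*\*|`", "", text)
    return re.sub(r"^#+\s*", "", text, flags=re.MULTILINE)


def split_reply(text: str, limit: int = TG_LIMIT) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring newline cuts."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind("\n", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable newline: hard cut
        chunk = text[:cut].rstrip("\n")
        if chunk:
            chunks.append(chunk)
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks
```

Stripping markdown before splitting avoids cutting a message in the middle of a formatting marker.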
Phase 20 architectural refactor: 11 previously-separate sub-agents (capture, scheduler, inbox, drive, tasks, surface, finance, wellness, journal, subscriptions, viz) moved into backend/skills/*.py — they now run in-process in the orchestrator. Container count: 21 → 10. Zero HTTP latency for hot tool calls.
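One way to picture in-process skills: each skill module registers its tools in a shared dictionary, and the orchestrator dispatches by name with a plain function call instead of an HTTP request. This is a sketch of that general pattern only; the decorator, the `TOOLS` registry, and the `tasks.create` tool name are hypothetical and not taken from backend/skills/*.py.

```python
import asyncio
from typing import Awaitable, Callable

ToolFn = Callable[..., Awaitable[dict]]
TOOLS: dict[str, ToolFn] = {}  # hypothetical shared registry


def tool(name: str) -> Callable[[ToolFn], ToolFn]:
    """Decorator that registers an async function as a named tool."""
    def register(fn: ToolFn) -> ToolFn:
        TOOLS[name] = fn
        return fn
    return register


@tool("tasks.create")
async def create_task(title: str) -> dict:
    """Example skill function; a real one would write to the database."""
    return {"ok": True, "title": title}


async def call_tool(name: str, **kwargs) -> dict:
    """Dispatch a tool call in-process; unknown names return an error dict."""
    fn = TOOLS.get(name)
    if fn is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    return await fn(**kwargs)


def call_tool_sync(name: str, **kwargs) -> dict:
    """Convenience wrapper for scripts and tests outside an event loop."""
    return asyncio.run(call_tool(name, **kwargs))
```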
One Telegram chat. 21 functional modules. Zero context switching.
Compact catalog: modules grouped by purpose. Full tool list and ~95 endpoint details live in the project's docs.
Personal productivity
Capture
Universal inbox: voice / photo / text → auto-classify
Tasks
Smart due-coercion, 2-tier dedup, auto-snooze stale, escalation
Routines
Recurring cadence (cron), streak tracking, building/steady/wobbly trend analytics
Life
Goals + dreams + achievements + profile across 12 life areas
Reminders
TG push from notifications table with auto-dedupe and priority filtering
Wellness
Habit logging + daily state (mood / energy / focus) + summary analytics
Business operations
Finance
Monobank auto-sync (5 accounts), 97% LLM auto-categorize, EOM forecast, bulk-undo
Subscriptions
Auto-detect from email, weekly renewal alerts at +7 days, recurring tracking
Inbox
Gmail every 15 min, per-email LLM classification, auto-task for high+requires_action
Drive
GDrive metadata + textual ingest (Google Docs / text / PDF) into RAG store
Calendar
Two-way Google Calendar sync, conflict detection, plan-for-window queries
Commitments
Promise tracking, owe / awaiting direction, auto-expire 24 h after due
Knowledge & memory
Documents (RAG)
Universal documents table: 289 docs / 441 vector chunks, hybrid BM25 + vector search
Memory graph
entities + facts (subject-predicate-object), 4-step entity resolve, /memory react-flow UI
Conversations
Long-session summary daemon: > 50 msg → gpt-4o-mini summary injected into the next message
Library / PKB
Legacy knowledge base with daily / weekly / monthly automatic summaries
Privacy, system & ops
Journal
AES-GCM 256 column-level encryption, key never logged, never plaintext in DB
Surface
Morning Brief / Evening Recap / Weekly Review (cron)
Approvals queue
Human-in-the-loop for destructive / high-cost / bulk > 10 ops, 24 h auto-expire
Quality dashboard
verdict per LLM call (ok / suspicious / hallucination), 24 h hallucination rate
Cost tracking
Single source of truth for LLM / embedding / voice ops, EOM projection, hard caps
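The hard caps tie back to the cost gate in the request pipeline (downgrade at 80 % of budget, hard-block at 95 %). A minimal sketch of that rule follows. Stepping down exactly one tier and the exception name are assumptions; the source gives only the two thresholds.

```python
from enum import Enum


class Tier(str, Enum):
    CHEAP = "cheap"
    MID = "mid"
    PREMIUM = "premium"


DOWNGRADE_AT = 0.80  # from the pipeline description
BLOCK_AT = 0.95      # from the pipeline description

# Assumed: a downgrade steps down exactly one tier.
_ONE_TIER_DOWN = {Tier.PREMIUM: Tier.MID, Tier.MID: Tier.CHEAP, Tier.CHEAP: Tier.CHEAP}


class BudgetExceeded(Exception):
    """Raised when spend has reached the hard-block threshold."""


def apply_cost_gate(tier: Tier, spent_usd: float, budget_usd: float) -> Tier:
    """Return the tier to use given current spend, or raise if blocked."""
    used = spent_usd / budget_usd
    if used >= BLOCK_AT:
        raise BudgetExceeded(f"{used:.0%} of budget used")
    if used >= DOWNGRADE_AT:
        return _ONE_TIER_DOWN[tier]
    return tier
```

With the $5/day cap, a premium request at $4.25 spent (85 %) would run on the mid tier, and any request at $4.80 spent (96 %) would be refused.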
LLM routing: cheap tasks run on cheap models
agent_brain classifies the request and picks a tier. Premium requests go to Opus 4.7; a fallback chain swaps to Haiku or Gemini when the primary provider is down.
| Tier | Model | Use case |
|---|---|---|
| Premium | Claude Opus 4.7 | Complex planning queries |
| Default | Claude Sonnet 4.6 | Orchestrator main loop (1 h prompt cache, −47 % input) |
| Mid | GPT-4o | Supervisor anti-hallucination judge |
| Cheap | GPT-4o-mini | Categorization, summarization, entity extraction |
| Embedding | text-embedding-3-small | 1536-dim vectors for hybrid RAG |
| Voice STT | gpt-4o-transcribe | Auto-detect language with entity-name prompt |
| Fallback | Claude Haiku 4.5 + Gemini 2.0 Flash | Provider outage chain |
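The table can be read as a lookup plus a fallback chain. The sketch below assumes a caller-supplied `call_model` function and a `ProviderDown` exception, both hypothetical. The model strings are the display names from the table, not API identifiers, and mapping the router's three tiers to Opus, Sonnet and GPT-4o-mini is an illustrative reading of the table.

```python
from typing import Callable

# Display names from the table above; real API model identifiers will differ.
TIER_MODELS = {
    "premium": "Claude Opus 4.7",
    "default": "Claude Sonnet 4.6",
    "cheap": "GPT-4o-mini",
}
FALLBACK_CHAIN = ["Claude Haiku 4.5", "Gemini 2.0 Flash"]


class ProviderDown(Exception):
    """Hypothetical error a model client raises when its provider is unavailable."""


def complete_with_fallback(
    tier: str,
    prompt: str,
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try the tier's primary model, then each fallback; return (model, reply)."""
    last_error = None
    for model in [TIER_MODELS[tier], *FALLBACK_CHAIN]:
        try:
            return model, call_model(model, prompt)
        except ProviderDown as exc:
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```

Returning the model name alongside the reply lets cost tracking attribute the call to the model that actually served it.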
What's under the hood
AI core
Claude Sonnet 4.6 / Opus 4.7 · GPT-4o + mini · Haiku 4.5 · Gemini 2.0 Flash · text-embedding-3-small
Voice & input
gpt-4o-transcribe · ffmpeg pipeline (highpass / FFT-denoise / EBU R128) · Telegram Bot API + MTProto Pyrogram · Web Speech
Storage
PostgreSQL 16 · pgvector (1536-dim ivfflat) · pg_trgm · 49 migrations · RLS with 18 per-agent roles · AES-GCM 256
Frontend
Vite + React 19 + TypeScript · TanStack Query · react-router v7 · react-flow · Recharts · framer-motion
Infrastructure
Hetzner CX22 VM (€4.99/mo) · Ubuntu 24.04 LTS · Docker Compose (10 containers) · Tailscale Funnel · least-privilege service user
Backend
Python 3.12 · FastAPI 0.115 · uvicorn · asyncpg · pydantic v2 · OpenAI / Anthropic / Google GenAI SDKs
What an agent like this costs
The build is paid once; runtime costs $65–$130/mo. For comparison: ChatGPT Team is $25/user with no custom tools, no bank integrations, no automation logic.
1 — Agent Development
$6,000
one-time setup
Tool-use orchestrator + 14 in-process skills + Docker Compose deploy + 30+ dashboard pages + 49 DB migrations with RLS + 18 per-agent roles + 2-layer anti-hallucination + RAG ingest pipeline + voice quality pipeline + full docs (ARCHITECTURE / INFRASTRUCTURE / SECURITY / PHASES). 1 month support after deploy.
2 — LLM Operating Cost
~$50 – $100
per month
Claude Sonnet 4.6 main loop ~$2–5/day (1 h prompt caching = −47 % on input), embedding pipeline ~$0.95/day, GPT-4o supervisor judge ~$10–20/mo, gpt-4o-mini categorization ~$5/mo. Hard caps: $5/day, $100/mo, $0.05/task.
3 — Optional add-ons
$5 – $30
per month each
ElevenLabs voice TTS for TG voice replies $5–$22/mo · Vapi outbound voice calls $0.07/min · Twilio SMS notifications $0.01/msg · Hetzner Storage Box off-site backup €4/mo.
4 — Infrastructure
€5
per month
Hetzner Cloud CX22 VM (2 vCPU / 4 GB RAM / 40 GB SSD / 20 TB egress). Tailscale Free Personal $0. GitHub private repo $0. Total runtime ~$65 – $130 / mo (= $780 – $1,560 / year) with infra + LLM + add-ons.
What actually changed
- 30-minute hunt for «that email/doc about X» → 2-second RAG search with citation
- 0 missed subscription renewals: each one appears in the weekly brief 7 days ahead
- 97 % auto-categorized finance (1073 / 1102 tx) — was 0 %
- 5–15 % more accurate voice transcription in noisy environments (ffmpeg pre-process)
- Container count: 21 → 10 after Phase 20 merge — −65 % RAM vs initial architecture
- 289 documents indexed, 441 vector chunks — searchable knowledge corpus
- 8 background daemons run unattended — no manual triggers required
- Cost transparency — daily / monthly / per-agent / per-operation breakdown, hard caps. No surprises.
Want this agent? 6 weeks from zero to production.
Self-hosted multi-agent system: one Telegram interface for the entire business. Single source of truth for tasks / finance / knowledge / contacts. LLM-powered automation with anti-hallucination guarantees. Full source code + docs. €5/mo VM, no vendor lock-in.
The architecture leaves room to grow for years: realtime voice over WebSocket, self-hosted Whisper.cpp, GitHub / Notion integrations, and a multi-tenant variant for team deployments.