    Personal AI Operations Agent

    Self-hosted multi-agent system. One Telegram chat for the entire business and personal productivity across 21 functional modules.

    A tool-use orchestrator with ~95 LLM tools, hybrid RAG over corporate documents, an auto-extracting knowledge graph, finance auto-categorization at 97%, and 8 background daemons. Single-VM deploy at €5/mo. Full source code, no vendor lock-in.

    Solo entrepreneur|Self-hosted|6 weeks (Phase 0 → 26)|Telegram + Web Dashboard
    21 functional modules
    ~95 LLM tools
    49 DB migrations
    14 external integrations
    8 background daemons
    289 documents indexed
    97% finance auto-categorization
    €5/mo Hetzner VM

    Operating a one-person business from memory and 17 browser tabs

    Each pair below is a real scenario that used to cost hours every week.

    Without

    Notes scattered across Apple Notes, Google Drive drafts, saved emails, Telegram Saved Messages, and paper. ~30% of ideas get lost.

    With

    Single inbox capture: voice / photo / text into Telegram → auto-classify → routed to the right store (task / fact / contact / event / commitment / journal).

    Without

    Finances done by hand: export CSV from each bank, categorize 200+ rows / mo manually, project month-end on a piece of paper.

    With

    Auto-sync with 5 bank accounts. 97% LLM auto-categorize, recurring-merchant rules, EOM forecast with category breakdown, bulk-undo for mistakes.
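
    One way to combine recurring-merchant rules with LLM categorization is to try the rules first and call the model only for unknown merchants. The sketch below assumes that ordering; the rule table, the categorize function, and the llm_classify callback are hypothetical names, not the project's actual code.

```python
from typing import Callable

# Hypothetical rule table: normalized merchant name -> category.
MERCHANT_RULES = {"netflix": "subscriptions", "hetzner": "infrastructure"}


def categorize(
    merchant: str,
    llm_classify: Callable[[str], str],
    rules: dict[str, str] = MERCHANT_RULES,
) -> tuple[str, str]:
    """Return (category, source), where source says which path decided."""
    key = merchant.strip().lower()
    if key in rules:
        # Known recurring merchant: no LLM call needed.
        return rules[key], "rule"
    # Unknown merchant: defer to the (assumed) LLM classifier.
    return llm_classify(merchant), "llm"
```

    Recording the source alongside the category makes it possible to bulk-undo only the LLM-decided rows if a batch turns out wrong.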

    Without

    Overdue tasks pile up in layers. Deadlines lost in the stream. Promises to people forgotten — trust erodes.

    With

    Smart task lifecycle: due_at coercion (odd hours → end-of-day), 2-tier dedup (exact + fuzzy), auto-snooze stale > 7 days with one nudge, commitments auto-expire 24 h after due.
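
    The due_at coercion and 2-tier dedup could look roughly like the sketch below. It assumes "odd hours" means a due time before 06:00 and that fuzzy matching uses a string-similarity ratio with a 0.85 cut-off; both values and all function names are illustrative assumptions.

```python
from datetime import datetime, time
from difflib import SequenceMatcher

ODD_HOURS_END = time(6, 0)    # assumption: due times before 06:00 count as "odd"
END_OF_DAY = time(23, 59)
FUZZY_THRESHOLD = 0.85        # hypothetical similarity cut-off


def coerce_due_at(due_at: datetime) -> datetime:
    """Move a due time in the odd-hours window to the end of the same day."""
    if due_at.time() < ODD_HOURS_END:
        return datetime.combine(due_at.date(), END_OF_DAY, tzinfo=due_at.tzinfo)
    return due_at


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so exact matching is less brittle."""
    return " ".join(title.lower().split())


def is_duplicate(new_title: str, existing_titles: list[str]) -> bool:
    """Tier 1: exact match on normalized titles. Tier 2: fuzzy ratio."""
    new_norm = normalize(new_title)
    normalized = [normalize(t) for t in existing_titles]
    if new_norm in normalized:
        return True
    return any(
        SequenceMatcher(None, new_norm, t).ratio() >= FUZZY_THRESHOLD
        for t in normalized
    )
```

    Running the cheap exact check first means the fuzzy comparison only matters for near-misses such as a trailing period or a small typo.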

    Without

    “Where was that pricing email from X / decision in some Drive doc / note about Y” — 20–40 minute hunt every time.

    With

    Hybrid RAG search (BM25 + vector cosine) over 289 unified documents (email + Drive + voice). Answer in 1–2 sec with source citation [src: email «Pricing inquiry from X»].
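
    To illustrate how BM25 and vector cosine scores can be combined, here is a minimal in-memory sketch. The fusion method (min-max normalization followed by a weighted sum with alpha = 0.5) is an assumption, and the production system stores vectors in pgvector rather than Python lists.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against the query terms with BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def min_max(values):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(query_terms, query_vec, docs, doc_vecs, alpha=0.5):
    """Return doc indices ordered by fused lexical + semantic score."""
    lexical = min_max(bm25_scores(query_terms, docs))
    semantic = min_max([cosine(query_vec, v) for v in doc_vecs])
    fused = [alpha * l + (1 - alpha) * s for l, s in zip(lexical, semantic)]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
```

    The lexical side catches exact names and identifiers, while the vector side catches paraphrases; alpha controls the balance between the two.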

    Without

    Voice messages garbled in noisy environments (car, street), Ukrainian / mixed terms misheard.

    With

    Production voice pipeline: gpt-4o-transcribe + ffmpeg pre-process (highpass 80 Hz, FFT-denoise, EBU R128 loudnorm) + auto-detect lang (RU/UK/EN) + entity-name prompt context.
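
    The ffmpeg pre-processing step can be expressed as a single audio filter chain. The sketch below only builds the command line; the filter names (highpass, afftdn, loudnorm) are real ffmpeg filters, but running them with default settings and the build_ffmpeg_cmd helper itself are assumptions.

```python
def build_ffmpeg_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that cleans up a voice message before STT."""
    filters = ",".join(
        [
            "highpass=f=80",  # drop rumble below 80 Hz
            "afftdn",         # FFT-based denoiser, default settings (assumed)
            "loudnorm",       # EBU R128 loudness normalization, default targets
        ]
    )
    # -y overwrites the output file if it already exists.
    return ["ffmpeg", "-y", "-i", src, "-af", filters, dst]
```

    The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where ffmpeg is installed.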

    Without

    LLM occasionally invents actions (“I deleted task X”) without an actual tool call.

    With

    Two-layer anti-hallucination: action-verb-name match in system prompt + supervisor judge (gpt-4o) verifies every reply against tool_results. Hallucination → reply replaced with system fallback. Logged in /quality dashboard.
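
    A simplified version of the first layer, matching claimed actions against tool results, might look like this. It assumes tool names follow a verb_noun pattern such as delete_task; the claim-to-verb mapping, the ToolResult type, and the fallback text are hypothetical. The second layer, the gpt-4o judge, is not shown.

```python
import re
from dataclasses import dataclass

# Hypothetical mapping from past-tense claims in a reply to tool-name verbs.
CLAIM_TO_VERB = {
    "deleted": "delete",
    "created": "create",
    "updated": "update",
    "sent": "send",
}
FALLBACK = "I could not confirm that action. Nothing was changed."


@dataclass
class ToolResult:
    tool_name: str  # assumed verb_noun naming, e.g. "delete_task"
    ok: bool


def unsupported_claims(reply: str, results: list[ToolResult]) -> list[str]:
    """Return action words in the reply with no successful matching tool call."""
    executed = {r.tool_name.split("_")[0] for r in results if r.ok}
    words = set(re.findall(r"[a-z]+", reply.lower()))
    return sorted(
        w for w in words if w in CLAIM_TO_VERB and CLAIM_TO_VERB[w] not in executed
    )


def guard_reply(reply: str, results: list[ToolResult]) -> str:
    """Replace the reply with a system fallback if it claims unbacked actions."""
    return FALLBACK if unsupported_claims(reply, results) else reply
```

    A failed tool call counts the same as no call at all, so a reply that says "I deleted task X" after an error is also replaced.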

    How the agent works: the request pipeline

    Every tool-use call passes through 8 layers — from security guard to supervisor judge. Hallucinations are caught before the reply reaches the user.

      01

      Quick check

      in-process regex denylist

      02

      Validate

      agent_security: input sanity, prompt injection guard

      03

      Routing

      agent_brain LLM router → tier (cheap / mid / premium)

      04

      Cost gate

      downgrade @80 % budget, hard-block @95 %

      05

      Recall

      knowledge graph + session summary injection

      06

      Tool loop

      Claude Sonnet 4.6, max 6 iterations, ~95 tools

      07

      Verify

      supervisor judge (gpt-4o) anti-hallucination

      08

      Reply

      auto-split > 4096 chars, strip markdown for TG
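
    The cost gate in step 04 can be sketched as a small function over the share of budget already spent. The 80% and 95% thresholds come from the pipeline above; downgrading by exactly one tier, and the Tier, BudgetExceeded, and cost_gate names, are assumptions.

```python
from enum import Enum


class Tier(Enum):
    CHEAP = "cheap"
    MID = "mid"
    PREMIUM = "premium"


DOWNGRADE_AT = 0.80  # from the pipeline: downgrade at 80% of budget
BLOCK_AT = 0.95      # from the pipeline: hard-block at 95% of budget

# Assumption: a downgrade moves one tier down; CHEAP has nowhere to go.
_DOWNGRADE = {Tier.PREMIUM: Tier.MID, Tier.MID: Tier.CHEAP, Tier.CHEAP: Tier.CHEAP}


class BudgetExceeded(Exception):
    """Raised when spending has reached the hard-block threshold."""


def cost_gate(requested: Tier, spent: float, budget: float) -> Tier:
    """Return the tier the request may use, or raise if the budget is blocked."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    used = spent / budget
    if used >= BLOCK_AT:
        raise BudgetExceeded(f"{used:.0%} of budget used")
    if used >= DOWNGRADE_AT:
        return _DOWNGRADE[requested]
    return requested
```

    Placing the gate after routing means the router can always ask for the ideal tier and the gate alone decides what is affordable.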

    Phase 20 architectural refactor: 11 previously separate sub-agents (capture, scheduler, inbox, drive, tasks, surface, finance, wellness, journal, subscriptions, viz) moved into backend/skills/*.py, where they now run in-process in the orchestrator. Container count: 21 → 10. Hot tool calls no longer make an HTTP round trip.

    One Telegram chat. 21 functional modules. Zero context switching.

    Compact catalog: modules grouped by purpose. The full list of ~95 tools and their endpoint details lives in the project's docs.

    Personal productivity

    Capture

    Universal inbox: voice / photo / text → auto-classify

    Tasks

    Smart due-coercion, 2-tier dedup, auto-snooze stale, escalation

    Routines

    Recurring cadence (cron), streak tracking, building/steady/wobbly trend analytics

    Life

    Goals + dreams + achievements + profile across 12 life areas

    Reminders

    TG push from notifications table with auto-dedupe and priority filtering

    Wellness

    Habit logging + daily state (mood / energy / focus) + summary analytics

    Business operations

    Finance

    Monobank auto-sync (5 accounts), 97% LLM auto-categorize, EOM forecast, bulk-undo

    Subscriptions

    Auto-detect from email, weekly renewal alerts at +7 days, recurring tracking

    Inbox

    Gmail every 15 min, per-email LLM classification, auto-task for high+requires_action

    Drive

    GDrive metadata + textual ingest (Google Docs / text / PDF) into RAG store

    Calendar

    Two-way Google Calendar sync, conflict detection, plan-for-window queries

    Commitments

    Promise tracking, owe / awaiting direction, auto-expire 24 h after due

    Knowledge & memory

    Documents (RAG)

    Universal documents table: 289 docs / 441 vector chunks, hybrid BM25 + vector search

    Memory graph

    Entities + facts (subject-predicate-object), 4-step entity resolve, /memory react-flow UI

    Conversations

    Long-session summary daemon: once a session passes 50 messages, a gpt-4o-mini summary is injected into the next message

    Library / PKB

    Legacy knowledge base with daily / weekly / monthly automatic summaries

    Privacy, system & ops

    Journal

    AES-GCM 256 column-level encryption, key never logged, never plaintext in DB

    Surface

    Morning Brief / Evening Recap / Weekly Review (cron)

    Approvals queue

    Human-in-the-loop for destructive / high-cost / bulk > 10 ops, 24 h auto-expire

    Quality dashboard

    Verdict per LLM call (ok / suspicious / hallucination), 24 h hallucination rate

    Cost tracking

    Single source of truth for LLM / embedding / voice ops, EOM projection, hard caps
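
    The approvals queue rule from the catalog above can be sketched as a predicate plus an expiry check. The bulk limit of 10 and the 24 h expiry come from the catalog; the high-cost threshold of $1.00 and the Operation type are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

BULK_LIMIT = 10                      # from the catalog: bulk > 10 ops
APPROVAL_TTL = timedelta(hours=24)   # from the catalog: 24 h auto-expire


@dataclass
class Operation:
    name: str
    destructive: bool = False
    estimated_cost_usd: float = 0.0
    item_count: int = 1


def needs_approval(op: Operation, high_cost_usd: float = 1.0) -> bool:
    """True if the operation must wait for a human decision.

    high_cost_usd is a hypothetical threshold; the source does not state one.
    """
    return (
        op.destructive
        or op.estimated_cost_usd >= high_cost_usd
        or op.item_count > BULK_LIMIT
    )


def is_expired(requested_at: datetime, now: datetime) -> bool:
    """True once a pending approval is at least 24 hours old."""
    return now - requested_at >= APPROVAL_TTL
```

    An expired request is simply dropped, so an unanswered prompt never turns into a silent approval.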

    LLM routing: cheap tasks run on cheap models

    agent_brain classifies the request and picks a tier. Premium requests go to Opus 4.7; a fallback chain switches to Haiku / Gemini when the primary provider is down.

    Tier | Model | Use case
    Premium | Claude Opus 4.7 | Complex planning queries
    Default | Claude Sonnet 4.6 | Orchestrator main loop (1 h prompt cache, −47 % input)
    Mid | GPT-4o | Supervisor anti-hallucination judge
    Cheap | GPT-4o-mini | Categorization, summarization, entity extraction
    Embedding | text-embedding-3-small | 1536-dim vectors for hybrid RAG
    Voice STT | gpt-4o-transcribe | Auto-detect language with entity-name prompt
    Fallback | Claude Haiku 4.5 + Gemini 2.0 Flash | Provider outage chain
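
    The provider outage chain can be sketched as a loop that tries the primary model and then each fallback in order. The model strings below are display labels from the table, not real API identifiers, and ProviderDown, call_with_fallback, and the call_model callback are hypothetical names.

```python
from typing import Callable

# Display labels taken from the tier table; real API model IDs will differ.
FALLBACK_CHAIN = ["Claude Haiku 4.5", "Gemini 2.0 Flash"]


class ProviderDown(Exception):
    """Raised by call_model when a provider cannot be reached."""


def call_with_fallback(
    primary: str,
    prompt: str,
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try the primary model, then each fallback; return (model_used, reply)."""
    for model in [primary, *FALLBACK_CHAIN]:
        try:
            return model, call_model(model, prompt)
        except ProviderDown:
            continue  # provider outage: move on to the next model in the chain
    raise ProviderDown("all providers in the chain failed")
```

    Returning the model that actually answered lets cost tracking attribute the call to the right provider.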

    What's under the hood

    AI core

    Claude Sonnet 4.6 / Opus 4.7 · GPT-4o + mini · Haiku 4.5 · Gemini 2.0 Flash · text-embedding-3-small

    Voice & input

    gpt-4o-transcribe · ffmpeg pipeline (highpass / FFT-denoise / EBU R128) · Telegram Bot API + MTProto Pyrogram · Web Speech

    Storage

    PostgreSQL 16 · pgvector (1536-dim ivfflat) · pg_trgm · 49 migrations · RLS with 18 roles · AES-GCM 256

    Frontend

    Vite + React 19 + TypeScript · TanStack Query · react-router v7 · react-flow · Recharts · framer-motion

    Infrastructure

    Hetzner CX22 VM (€4.99/mo) · Ubuntu 24.04 LTS · Docker Compose (10 containers) · Tailscale Funnel · least-privilege service user

    Backend

    Python 3.12 · FastAPI 0.115 · uvicorn · asyncpg · pydantic v2 · OpenAI / Anthropic / Google GenAI SDKs

    What an agent like this costs

    The build is paid once; runtime costs $65–$130/mo. For comparison: ChatGPT Team is $25/user with no custom tools, no bank integrations, no automation logic.

    1 — Agent Development

    $6,000

    one-time setup

    Tool-use orchestrator + 14 in-process skills + Docker Compose deploy + 30+ dashboard pages + 49 DB migrations with RLS + 18 per-agent roles + 2-layer anti-hallucination + RAG ingest pipeline + voice quality pipeline + full docs (ARCHITECTURE / INFRASTRUCTURE / SECURITY / PHASES). 1 month support after deploy.

    2 — LLM Operating Cost

    ~$50 – $100

    per month

    Claude Sonnet 4.6 main loop ~$2–5/day (1 h prompt caching = −47 % on input), embedding pipeline ~$0.95/day, GPT-4o supervisor judge ~$10–20/mo, gpt-4o-mini categorization ~$5/mo. Hard caps: $5/day, $100/mo, $0.05/task.

    3 — Optional add-ons

    $5 – $30

    per month each

    ElevenLabs voice TTS for TG voice replies $5–$22/mo · Vapi outbound voice calls $0.07/min · Twilio SMS notifications $0.01/msg · Hetzner Storage Box off-site backup €4/mo.

    4 — Infrastructure

    €5

    per month

    Hetzner Cloud CX22 VM (2 vCPU / 4 GB RAM / 40 GB SSD / 20 TB egress). Tailscale Free Personal $0. GitHub private repo $0. Total runtime: ~$65–$130/mo ($780–$1,560/year) including infra, LLM, and add-ons.

    What actually changed

    • 30-minute hunt for «that email/doc about X» → 2-second RAG search with citation
    • 0 missed subscription renewals: each one appears in the weekly brief 7 days ahead
    • 97 % auto-categorized finance (1073 / 1102 tx) — was 0 %
    • 5–15 % more accurate voice transcription in noisy environments (ffmpeg pre-process)
    • Container count 21 → 10 after the Phase 20 merge, with 65 % less RAM than the initial architecture
    • 289 documents indexed, 441 vector chunks — searchable knowledge corpus
    • 8 background daemons run unattended — no manual triggers required
    • Cost transparency — daily / monthly / per-agent / per-operation breakdown, hard caps. No surprises.

    Want this agent? 6 weeks from zero to production.

    Self-hosted multi-agent system: one Telegram interface for the entire business. Single source of truth for tasks / finance / knowledge / contacts. LLM-powered automation with two-layer anti-hallucination checks. Full source code + docs. €5/mo VM, no vendor lock-in.

    The architecture has room to grow for years: realtime voice over WebSocket, self-hosted Whisper.cpp, GitHub / Notion integrations, and a multi-tenant variant for team deployments.