AI Stack¶
AKKO ships a complete AI-native layer that runs 100% in-cluster. No OpenAI, no Anthropic, no Bedrock, no external embedding service. Every model lives next to your data.
Topology¶
flowchart TB
subgraph Interfaces[User and Agent Interfaces]
direction LR
COCK[Cockpit chat<br/>/api/cockpit/aden]
JH[JupyterHub<br/>jupyter-ai]
CLAUDE[Claude Desktop<br/>Cursor, VS Code]
SQL[SQL client<br/>Trino CLI, DBeaver]
end
subgraph Sovereign[Sovereign AI Layer]
direction TB
ADEN[ADEN<br/>FastAPI + Streamlit]
MCPT[MCP Trino<br/>8 tools]
MCPO[MCP OpenMetadata<br/>catalog + lineage]
TPL[Trino AI Plugin<br/>25 ai_* UDFs<br/>20 data + 5 admin]
AIS[AI Service<br/>FastAPI]
RAG[RAG pipeline<br/>pgvector + Ollama]
end
subgraph Gateway[Inference Gateway]
LLM[LiteLLM<br/>OpenAI-compatible]
OLL[Ollama<br/>qwen2.5-coder:7b<br/>qwen2.5:3b<br/>nomic-embed-text]
VLLM[vLLM<br/>optional GPU backend]
end
subgraph Data[Data Plane]
TRINO[Trino 480]
OM[OpenMetadata]
PG[(PostgreSQL<br/>pgvector)]
MLF[MLflow<br/>model registry]
end
COCK --> ADEN
JH --> AIS
JH --> MLF
CLAUDE --> MCPT
CLAUDE --> MCPO
SQL --> TRINO
TRINO --> TPL
TPL --> AIS
ADEN --> OM
ADEN --> TRINO
ADEN --> LLM
MCPT --> TRINO
MCPT --> AIS
MCPO --> OM
AIS --> LLM
RAG --> PG
RAG --> LLM
LLM --> OLL
LLM -.optional.-> VLLM
Component Map¶
| Component | Role | Doc |
|---|---|---|
| ADEN | Natural-language question -> SQL -> Trino -> Streamlit dashboard | AI / ADEN |
| Trino AI Plugin | 25 ai_* scalar UDFs inside the Trino JVM | AI / Trino functions |
| AI Service | FastAPI backend called by the Trino plugin (sentiment, classify, summarize, embed, translate, entities) | Services / AI Service |
| RAG pipeline | pgvector + nomic-embed-text + Ollama, demo notebook | AI / RAG pipeline |
| MCP Servers | Trino (8 tools) + OpenMetadata (catalog, lineage) for AI agents | AI / MCP servers |
| LiteLLM | OpenAI-compatible gateway, multi-tenant keys | Services / LiteLLM |
| Ollama | Local LLM inference (CPU / GPU) | Services / Ollama |
| vLLM | Optional GPU backend for higher throughput | Services / vLLM |
| MLflow | Experiment tracking, model registry, artifact store on MinIO | Services / MLflow |
| jupyter-ai | In-notebook AI assistant wired to LiteLLM | Services / JupyterHub |
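Because LiteLLM speaks the standard OpenAI chat-completions schema, any OpenAI-compatible client can talk to the gateway. A minimal sketch of the request shape — the in-cluster URL, model alias, and virtual-key placeholder below are illustrative assumptions, not AKKO's actual values:

```python
import json

# Assumed in-cluster gateway endpoint -- adjust to your deployment.
LITELLM_URL = "http://litellm.ai.svc.cluster.local:4000/v1/chat/completions"

payload = {
    "model": "qwen2.5:3b",   # alias routed by LiteLLM to Ollama
    "temperature": 0,        # reproducible inference
    "messages": [
        {"role": "system", "content": "You are a SQL assistant for Trino."},
        {"role": "user", "content": "Top 5 merchants by volume last week?"},
    ],
}

# Send with any HTTP client, authenticating with a LiteLLM virtual key, e.g.:
#   requests.post(LITELLM_URL, json=payload,
#                 headers={"Authorization": "Bearer <virtual-key>"})
print(json.dumps(payload, indent=2))
```

The same payload works unchanged if the gateway later routes to vLLM instead of Ollama — that swap is invisible to clients.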
ADEN Request Life-Cycle¶
sequenceDiagram
participant U as User (Cockpit)
participant A as ADEN
participant OM as OpenMetadata
participant OPA as OPA
participant L as LiteLLM / Ollama
participant T as Trino
participant S as Streamlit
U->>A: POST /ask {question, session_id}
A->>OM: search candidate tables (top N)
A->>OPA: allow(user, role, table) per candidate
OPA-->>A: allowed subset
A->>L: prompt (system + tables + history)
L-->>A: SQL candidate
A->>A: sqlglot validate + keyword denylist + LIMIT 10000
A->>T: EXPLAIN (TYPE IO)
T-->>A: bytes_estimate
alt estimate > cost gate
A-->>U: 413 (confirm_cost=true to override)
else under gate
A->>T: execute with user OAuth token
T-->>A: rows
A->>A: redact PII columns from OM tags
A->>S: publish dashboard to PVC
A-->>U: 200 {dashboard_url, sql, session_id}
end
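The validation step in the diagram (sqlglot parse, keyword denylist, forced LIMIT 10000) can be sketched as follows. This is a simplified stand-in using plain token checks rather than ADEN's real sqlglot-based validator; the denylist contents and helper name are illustrative, only the SELECT-only policy and the 10000-row cap come from the diagram:

```python
# Hypothetical guard sketch -- ADEN's actual validator uses sqlglot.
DENYLIST = {"insert", "update", "delete", "drop", "alter", "create", "grant"}
MAX_ROWS = 10000

def guard_sql(sql: str) -> str:
    """Reject non-SELECT statements, deny DDL/DML keywords, force a row cap."""
    tokens = sql.lower().replace(";", " ").split()
    if not tokens or tokens[0] != "select":
        raise ValueError("only SELECT statements are allowed")
    bad = DENYLIST.intersection(tokens)
    if bad:
        raise ValueError(f"denied keywords: {sorted(bad)}")
    if "limit" not in tokens:
        sql = sql.rstrip().rstrip(";") + f" LIMIT {MAX_ROWS}"
    return sql

print(guard_sql("SELECT id FROM iceberg.banking.transactions"))
# -> SELECT id FROM iceberg.banking.transactions LIMIT 10000
```

Only after this guard passes does ADEN spend a Trino round-trip on the EXPLAIN (TYPE IO) cost estimate.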
Trino ai_* Functions — Inline AI in SQL¶
SELECT id,
akko_ai_sentiment(comment) AS sentiment,
akko_ai_classify(comment, 'fraud,retention,support') AS topic,
akko_ai_pii(comment) AS redacted,
akko_ai_embed(comment) AS vector
FROM iceberg.banking.transactions
WHERE ts > current_date - INTERVAL '7' DAY;
25 functions total — 20 data functions (sentiment, classification, pii, sql, keywords, language, entities, risk, anomaly, embed, similarity, search, summarize, translate, ask, sensitivity, ocr, parse_document, transcribe, describe_image) and 5 admin helpers (cache_clear, cb_reset, stats, health, version). See Trino functions for the full list, signatures, and performance notes. Run bash scripts/check-ai-functions-count.sh to verify the live count against the docs.
RAG Pipeline — Local Embeddings + Retrieval¶
flowchart LR
DOCS[Docs / PDFs<br/>customer KB] --> CHUNK[Chunker<br/>LangChain]
CHUNK --> EMB[Ollama<br/>nomic-embed-text<br/>768 dims]
EMB --> PG[(PostgreSQL<br/>pgvector)]
Q[User question] --> EMBQ[Ollama<br/>nomic-embed-text]
EMBQ --> SEARCH[pgvector<br/>cosine top-k]
PG --> SEARCH
SEARCH --> CTX[Context<br/>top 5 chunks]
CTX --> LLM[Ollama<br/>qwen2.5:3b]
Q --> LLM
LLM --> ANS[Answer]
See the full walkthrough in RAG pipeline and the notebook notebooks/rag-pipeline-demo.ipynb.
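The retrieval half of the diagram reduces to a cosine top-k over stored vectors. In production this is a single pgvector query ordered by the cosine-distance operator; the dependency-free sketch below shows the same ranking step with toy 3-dimensional vectors (real nomic-embed-text embeddings are 768-dimensional, and the chunk texts are made up for illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, chunks, k=5):
    """Rank (text, vector) chunks by similarity to the query vector."""
    scored = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in scored[:k]]

chunks = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("office hours",  [0.0, 0.2, 0.9]),
    ("chargebacks",   [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], chunks, k=2))
# -> ['refund policy', 'chargebacks']
```

The top-k chunks then become the context block that is prepended to the question before the final Ollama call, exactly as the diagram shows.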
Sovereignty Checklist¶
| Guarantee | How |
|---|---|
| No external API call | LiteLLM routes only to Ollama / vLLM; egress blocked by NetworkPolicies in production values |
| Model provenance | Models pulled at install time by ollama-init; hashes pinned |
| Reproducible inference | temperature=0 defaults in ADEN, seed in notebooks |
| RBAC on models | LiteLLM virtual keys per Keycloak role (see LLM RBAC) |
| Observability | aden_query_duration_seconds, ai_stats() JMX, Grafana dashboards |
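The egress guarantee in the first row is typically enforced with a namespace-wide NetworkPolicy that permits only in-cluster destinations. A sketch of what such a policy could look like — the namespace name and policy name are assumptions, not AKKO's actual production manifests:

```yaml
# Config sketch: allow egress only to in-cluster pods; external CIDRs stay blocked.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ai-deny-external-egress
  namespace: ai                  # assumed namespace name
spec:
  podSelector: {}                # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector: {}  # any in-cluster namespace, nothing outside
```

With this in place, a pod that tried to reach api.openai.com would simply time out, which is the behaviour the checklist promises.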