AI Stack¶
AKKO ships a complete AI-native layer that runs 100% in-cluster. No OpenAI, no Anthropic, no Bedrock, no external embedding service. Every model lives next to your data.
Topology¶
flowchart TB
subgraph Interfaces[User and Agent Interfaces]
direction LR
COCK[Cockpit chat<br/>/api/cockpit/aden]
JH[JupyterHub<br/>jupyter-ai]
CLAUDE[Claude Desktop<br/>Cursor, VS Code]
SQL[SQL client<br/>Trino CLI, DBeaver]
end
subgraph Sovereign[Sovereign AI Layer]
direction TB
ADEN[ADEN<br/>FastAPI + Streamlit]
MCPT[MCP Trino<br/>8 tools]
MCPO[MCP OpenMetadata<br/>catalog + lineage]
TPL[Trino AI Plugin<br/>25 ai_* UDFs<br/>20 data + 5 admin]
AIS[AI Service<br/>FastAPI]
RAG[RAG pipeline<br/>pgvector + Ollama]
end
subgraph Gateway[Inference Gateway]
LLM[LiteLLM<br/>OpenAI-compatible]
OLL[Ollama<br/>qwen2.5-coder:7b<br/>qwen2.5:3b<br/>nomic-embed-text]
VLLM[vLLM<br/>optional GPU backend]
end
subgraph Data[Data Plane]
TRINO[Trino 480]
OM[OpenMetadata]
PG[(PostgreSQL<br/>pgvector)]
MLF[MLflow<br/>model registry]
end
COCK --> ADEN
JH --> AIS
JH --> MLF
CLAUDE --> MCPT
CLAUDE --> MCPO
SQL --> TRINO
TRINO --> TPL
TPL --> AIS
ADEN --> OM
ADEN --> TRINO
ADEN --> LLM
MCPT --> TRINO
MCPT --> AIS
MCPO --> OM
AIS --> LLM
RAG --> PG
RAG --> LLM
LLM --> OLL
LLM -.optional.-> VLLM
Component Map¶
| Component | Role | Doc |
|---|---|---|
| ADEN | Natural-language question -> SQL -> Trino -> Streamlit dashboard | AI / ADEN |
| Trino AI Plugin | 25 ai_* scalar UDFs inside the Trino JVM | AI / Trino functions |
| AI Service | FastAPI backend called by the Trino plugin (sentiment, classify, summarize, embed, translate, entities) | Services / AI Service |
| RAG pipeline | pgvector + nomic-embed-text + Ollama, demo notebook | AI / RAG pipeline |
| MCP Servers | Trino (8 tools) + OpenMetadata (catalog, lineage) for AI agents | AI / MCP servers |
| LiteLLM | OpenAI-compatible gateway, multi-tenant keys | Services / LiteLLM |
| Ollama | Local LLM inference (CPU / GPU) | Services / Ollama |
| vLLM | Optional GPU backend for higher throughput | Services / vLLM |
| MLflow | Experiment tracking, model registry, artifact store on MinIO | Services / MLflow |
| jupyter-ai | In-notebook AI assistant wired to LiteLLM | Services / JupyterHub |
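Because LiteLLM speaks the standard OpenAI chat-completions schema, any OpenAI-compatible client can talk to the gateway. A minimal sketch of the request shape — the in-cluster URL, model alias, and virtual-key placeholder below are illustrative assumptions, not AKKO's actual values:

```python
import json

# Assumed in-cluster gateway endpoint -- adjust to your deployment.
LITELLM_URL = "http://litellm.ai.svc.cluster.local:4000/v1/chat/completions"

payload = {
    "model": "qwen2.5:3b",   # alias routed by LiteLLM to Ollama
    "temperature": 0,        # reproducible inference
    "messages": [
        {"role": "system", "content": "You are a SQL assistant for Trino."},
        {"role": "user", "content": "Top 5 merchants by volume last week?"},
    ],
}

# Send with any HTTP client, authenticating with a LiteLLM virtual key, e.g.:
#   requests.post(LITELLM_URL, json=payload,
#                 headers={"Authorization": "Bearer <virtual-key>"})
print(json.dumps(payload, indent=2))
```

The same payload works unchanged if the gateway later routes to vLLM instead of Ollama — that swap is invisible to clients.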
ADEN Request Life-Cycle¶
sequenceDiagram
participant U as User (Cockpit)
participant A as ADEN
participant OM as OpenMetadata
participant OPA as OPA
participant L as LiteLLM / Ollama
participant T as Trino
participant S as Streamlit
U->>A: POST /ask {question, session_id}
A->>OM: search candidate tables (top N)
A->>OPA: allow(user, role, table) per candidate
OPA-->>A: allowed subset
A->>L: prompt (system + tables + history)
L-->>A: SQL candidate
A->>A: sqlglot validate + keyword denylist + LIMIT 10000
A->>T: EXPLAIN (TYPE IO)
T-->>A: bytes_estimate
alt estimate > cost gate
A-->>U: 413 (confirm_cost=true to override)
else under gate
A->>T: execute with user OAuth token
T-->>A: rows
A->>A: redact PII columns from OM tags
A->>S: publish dashboard to PVC
A-->>U: 200 {dashboard_url, sql, session_id}
end
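The validation step in the diagram (sqlglot parse, keyword denylist, forced LIMIT 10000) can be sketched as follows. This is a simplified stand-in using plain token checks rather than ADEN's real sqlglot-based validator; the denylist contents and helper name are illustrative, only the SELECT-only policy and the 10000-row cap come from the diagram:

```python
# Hypothetical guard sketch -- ADEN's actual validator uses sqlglot.
DENYLIST = {"insert", "update", "delete", "drop", "alter", "create", "grant"}
MAX_ROWS = 10000

def guard_sql(sql: str) -> str:
    """Reject non-SELECT statements, deny DDL/DML keywords, force a row cap."""
    tokens = sql.lower().replace(";", " ").split()
    if not tokens or tokens[0] != "select":
        raise ValueError("only SELECT statements are allowed")
    bad = DENYLIST.intersection(tokens)
    if bad:
        raise ValueError(f"denied keywords: {sorted(bad)}")
    if "limit" not in tokens:
        sql = sql.rstrip().rstrip(";") + f" LIMIT {MAX_ROWS}"
    return sql

print(guard_sql("SELECT id FROM iceberg.banking.transactions"))
# -> SELECT id FROM iceberg.banking.transactions LIMIT 10000
```

Only after this guard passes does ADEN spend a Trino round-trip on the EXPLAIN (TYPE IO) cost estimate.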
Trino ai_* Functions — Inline AI in SQL¶
SELECT id,
akko_ai_sentiment(comment) AS sentiment,
akko_ai_classify(comment, 'fraud,retention,support') AS topic,
akko_ai_pii(comment) AS redacted,
akko_ai_embed(comment) AS vector
FROM iceberg.banking.transactions
WHERE ts > current_date - INTERVAL '7' DAY;
25 functions total — 20 data functions (sentiment, classification, pii, sql, keywords, language, entities, risk, anomaly, embed, similarity, search, summarize, translate, ask, sensitivity, ocr, parse_document, transcribe, describe_image) and 5 admin helpers (cache_clear, cb_reset, stats, health, version). See Trino functions for the full list, signatures, and performance notes. Run bash scripts/check-ai-functions-count.sh to verify the live count against the docs.
RAG Pipeline — Local Embeddings + Retrieval¶
flowchart LR
DOCS[Docs / PDFs<br/>customer KB] --> CHUNK[Chunker<br/>LangChain]
CHUNK --> EMB[Ollama<br/>nomic-embed-text<br/>768 dims]
EMB --> PG[(PostgreSQL<br/>pgvector)]
Q[User question] --> EMBQ[Ollama<br/>nomic-embed-text]
EMBQ --> SEARCH[pgvector<br/>cosine top-k]
PG --> SEARCH
SEARCH --> CTX[Context<br/>top 5 chunks]
CTX --> LLM[Ollama<br/>qwen2.5:3b]
Q --> LLM
LLM --> ANS[Answer]
See the full walkthrough in RAG pipeline and the notebook notebooks/rag-pipeline-demo.ipynb.
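The retrieval half of the diagram reduces to a cosine top-k over stored vectors. In production this is a single pgvector query ordered by the cosine-distance operator; the dependency-free sketch below shows the same ranking step with toy 3-dimensional vectors (real nomic-embed-text embeddings are 768-dimensional, and the chunk texts are made up for illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, chunks, k=5):
    """Rank (text, vector) chunks by similarity to the query vector."""
    scored = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in scored[:k]]

chunks = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("office hours",  [0.0, 0.2, 0.9]),
    ("chargebacks",   [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], chunks, k=2))
# -> ['refund policy', 'chargebacks']
```

The top-k chunks then become the context block that is prepended to the question before the final Ollama call, exactly as the diagram shows.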
Sovereignty Checklist¶
| Guarantee | How |
|---|---|
| No external API call | LiteLLM routes only to Ollama / vLLM; egress blocked by NetworkPolicies in production values |
| Model provenance | Models pulled at install time by ollama-init; hashes pinned |
| Reproducible inference | temperature=0 defaults in ADEN, seed in notebooks |
| RBAC on models | LiteLLM virtual keys per Keycloak role (see LLM RBAC) |
| Observability | aden_query_duration_seconds, ai_stats() JMX, Grafana dashboards |
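The egress guarantee in the first row is typically enforced with a namespace-wide NetworkPolicy that permits only in-cluster destinations. A sketch of what such a policy could look like — the namespace name and policy name are assumptions, not AKKO's actual production manifests:

```yaml
# Config sketch: allow egress only to in-cluster pods; external CIDRs stay blocked.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ai-deny-external-egress
  namespace: ai                  # assumed namespace name
spec:
  podSelector: {}                # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector: {}  # any in-cluster namespace, nothing outside
```

With this in place, a pod that tried to reach api.openai.com would simply time out, which is the behaviour the checklist promises.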