Architecture Overview
AKKO is organised as six horizontal layers that together deliver a complete sovereign analytics platform. Every layer is backed by open-source components, runs on Kubernetes, and is wired to Keycloak SSO, OPA authorization, OpenMetadata cataloguing, and Prometheus/Loki observability.
Six-Layer Reference Diagram
```mermaid
flowchart TB
    subgraph L6[Layer 6 — Governance and Security]
        direction LR
        KC[Keycloak SSO<br/>5 RBAC roles]
        OPA[OPA<br/>row/column policies]
        OM[OpenMetadata<br/>catalog and lineage]
        O2P[oauth2-proxy<br/>ForwardAuth]
        LLDAP[LLDAP<br/>optional LDAP]
    end
    subgraph L5[Layer 5 — Analytics and Consumption]
        direction LR
        SUP[Superset<br/>dashboards]
        JH[JupyterHub<br/>notebooks]
        AF[Airflow 3<br/>orchestration]
        COCK[Cockpit<br/>portal]
    end
    subgraph L4[Layer 4 — AI and ML]
        direction LR
        ADEN[ADEN<br/>NL to SQL to dashboard]
        OLL[Ollama<br/>Qwen 2.5 and Nomic]
        LLM[LiteLLM<br/>AI gateway]
        MLF[MLflow<br/>tracking and registry]
        MCP[MCP Servers<br/>Trino and OpenMetadata]
    end
    subgraph L3[Layer 3 — Compute and Query]
        direction LR
        TRINO[Trino 480<br/>federated SQL + 25 ai_* UDFs]
        SPARK[Spark Connect<br/>gRPC]
        DBT[dbt Core<br/>semantic layer]
    end
    subgraph L2[Layer 2 — Storage and Catalog]
        direction LR
        POLARIS[Apache Polaris<br/>Iceberg REST catalog]
        MINIO[SeaweedFS<br/>S3 object store]
        PG[PostgreSQL<br/>PostGIS + pgvector]
    end
    subgraph L1[Layer 1 — Ingestion]
        direction LR
        AFI[Airflow DAGs<br/>file + API + JDBC]
        SPARKI[Spark ETL<br/>batch loaders]
        SRC[External sources<br/>OLTP, files, APIs]
    end
    SRC --> AFI
    SRC --> SPARKI
    AFI --> PG
    AFI --> MINIO
    SPARKI --> MINIO
    PG --> SPARK
    PG --> TRINO
    MINIO --> POLARIS
    POLARIS --> TRINO
    POLARIS --> SPARK
    SPARK --> POLARIS
    DBT --> TRINO
    TRINO --> ADEN
    TRINO --> SUP
    TRINO --> JH
    TRINO --> MCP
    LLM --> OLL
    ADEN --> LLM
    MCP --> LLM
    JH --> MLF
    JH --> SPARK
    AF --> SPARK
    AF --> TRINO
    OM -.lineage.-> TRINO
    OM -.lineage.-> AF
    OM -.catalog.-> POLARIS
    KC -.OIDC.-> SUP
    KC -.OIDC.-> JH
    KC -.OIDC.-> AF
    KC -.OIDC.-> OM
    KC -.OIDC.-> COCK
    O2P -.forward-auth.-> MLF
    OPA -.policies.-> TRINO
    LLDAP -.users.-> KC
```
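The solid arrows in the diagram can be read as a directed graph, which makes simple questions such as "what consumes Trino directly?" or "what can data from external sources eventually reach?" easy to answer. The sketch below is only an illustration: it copies the solid edges from the diagram into a Python list, and the helper names `successors` and `reachable` are hypothetical, not part of AKKO.

```python
from collections import deque

# Solid (data-path) edges copied from the six-layer diagram.
# Dotted governance edges (OIDC, lineage, policies) are left out on purpose.
EDGES = [
    ("SRC", "AFI"), ("SRC", "SPARKI"),
    ("AFI", "PG"), ("AFI", "MINIO"), ("SPARKI", "MINIO"),
    ("PG", "SPARK"), ("PG", "TRINO"),
    ("MINIO", "POLARIS"), ("POLARIS", "TRINO"),
    ("POLARIS", "SPARK"), ("SPARK", "POLARIS"),
    ("DBT", "TRINO"),
    ("TRINO", "ADEN"), ("TRINO", "SUP"), ("TRINO", "JH"), ("TRINO", "MCP"),
    ("LLM", "OLL"), ("ADEN", "LLM"), ("MCP", "LLM"),
    ("JH", "MLF"), ("JH", "SPARK"),
    ("AF", "SPARK"), ("AF", "TRINO"),
]


def successors(node: str) -> list[str]:
    """Nodes that the given node points to directly, sorted by name."""
    return sorted(dst for src, dst in EDGES if src == node)


def reachable(start: str) -> set[str]:
    """All nodes reachable from start by following edges (breadth-first).

    The graph contains cycles (for example POLARIS <-> SPARK), so visited
    nodes are tracked to guarantee termination.
    """
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in successors(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    print("Direct consumers of Trino:", successors("TRINO"))
    print("Reachable from ADEN:", sorted(reachable("ADEN")))
```

Because dbt Core and Airflow only have outgoing edges in the diagram, neither appears in `reachable("SRC")`; they drive Trino and Spark rather than receive data from them.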
Layer Responsibilities
| Layer | Purpose | Key Services |
|---|---|---|
| 1. Ingestion | Move data from source systems into the platform | Airflow DAGs, Spark batch, Postgres CDC |
| 2. Storage | Durable raw and curated tables | SeaweedFS (S3), Apache Polaris (Iceberg REST), PostgreSQL (PostGIS + pgvector) |
| 3. Compute | Run ELT and interactive queries | Trino 480 (federated SQL + 17 ai_* functions), Spark Connect (gRPC), dbt Core |
| 4. AI / ML | Natural-language SQL, embeddings, training, serving | ADEN, Ollama, LiteLLM, MLflow, MCP servers, jupyter-ai |
| 5. Analytics | Dashboards, notebooks, pipeline authoring | Superset, JupyterHub, Airflow 3, Cockpit portal |
| 6. Governance | Identity, policy, catalogue, lineage | Keycloak, OPA, OpenMetadata, oauth2-proxy, LLDAP |
See the dedicated diagrams for each dimension:
- Data Flow: end-to-end pipeline from source to consumption.
- AI Stack: ADEN, Trino `ai_*`, RAG and MCP.
- Security Flow: Keycloak → OPA → Trino masking.
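To make the "Keycloak → OPA → Trino masking" chain more concrete, here is a rough Python stand-in for the decision a column-masking policy makes. It is not how AKKO implements it: real OPA policies are written in Rego and evaluated by OPA, and the role names, the `ColumnRule` type, and the `apply_masking` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRule:
    """Hypothetical rule: only the listed roles see the column unmasked."""
    column: str
    allowed_roles: frozenset
    mask: str = "****"


def apply_masking(row: dict, roles: set, rules: list) -> dict:
    """Return a copy of row with restricted columns masked for these roles."""
    masked = dict(row)
    for rule in rules:
        # Mask when the caller holds none of the roles the rule allows.
        if rule.column in masked and not (roles & rule.allowed_roles):
            masked[rule.column] = rule.mask
    return masked


if __name__ == "__main__":
    # Role names are assumptions; in the platform they would come from the
    # identity token issued by Keycloak.
    rules = [ColumnRule("email", frozenset({"admin"}))]
    row = {"email": "a@example.com", "country": "FR"}
    print(apply_masking(row, {"analyst"}, rules))
    print(apply_masking(row, {"admin"}, rules))
```

The point of the sketch is the shape of the flow: identity supplies roles, policy maps roles to column visibility, and the query engine applies the result before rows reach the user.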
Cross-Cutting Concerns
```mermaid
flowchart LR
    subgraph Users
        U1[Analyst]
        U2[Engineer]
        U3[Admin]
        U4[AI Agent]
    end
    subgraph Edge
        TRAEFIK[Traefik<br/>TLS + routing]
        COCK[Cockpit<br/>portal]
    end
    subgraph Control[Control Plane]
        KC[Keycloak]
        OPA[OPA]
        OM[OpenMetadata]
    end
    subgraph Observe[Observability]
        PROM[Prometheus]
        GRAF[Grafana]
        LOKI[Loki]
        TEMPO[Tempo]
        AM[Alertmanager]
    end
    U1 --> TRAEFIK
    U2 --> TRAEFIK
    U3 --> TRAEFIK
    U4 --> TRAEFIK
    TRAEFIK --> COCK
    TRAEFIK --> KC
    COCK -.health probes.-> Observe
    KC --> OPA
    OPA --> OM
    Observe -.dashboards.-> COCK
    AM -.alerts.-> Control
```
Every service exposes Prometheus metrics, writes structured JSON logs that are collected in Loki, and sends OTLP traces to Tempo. Alertmanager routes SLO breaches (ADEN fast-burn, trace ingestion stalled, Trino AI plugin breaker open) to the Cockpit Alerts page.
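As an illustration of what "structured JSON logs" can mean for a Python service, the sketch below uses only the standard library to emit one JSON object per log line. The field names (`ts`, `level`, `logger`, `message`) and the `fields` convention for extra attributes are assumptions, not AKKO's actual log schema, and the sketch assumes a log collector tails container output and forwards it to Loki.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers pass extra={"fields": {...}}
        # to attach structured attributes to the record.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:
        handler = logging.StreamHandler()  # stderr by default
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


if __name__ == "__main__":
    log = build_logger("example-service")
    log.info("query finished", extra={"fields": {"duration_ms": 42}})
```

Keeping every line a self-contained JSON object lets the log backend filter on fields such as `level` without per-service parsing rules.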
Sovereignty Guarantees
- No external API calls. Every AI inference, embedding, and catalogue lookup happens in-cluster.
- Reproducible images. All 12 custom images are pinned to `2026.03`. No `:latest` anywhere.
- Portable manifests. The Helm chart runs unchanged on k3d, k3s, EKS, GKE, AKS, OpenShift, bare-metal, and air-gapped clusters.
- Open standards. Iceberg, S3, OIDC, OpenLineage, Prometheus exposition format, OpenTelemetry.
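The "no `:latest`" guarantee lends itself to an automated check. The following is a minimal sketch of such a check, assuming the pin is an exact tag match on `2026.03`; the function names and the example image references are hypothetical, and how image lists are extracted from the Helm chart is left out.

```python
def is_pinned(image: str, required_tag: str = "2026.03") -> bool:
    """Return True if the image reference carries exactly the required tag."""
    # Drop a digest suffix if present (name:tag@sha256:...).
    name_and_tag = image.split("@", 1)[0]
    # Only look for ':' in the last path segment, so a registry port
    # such as registry.example.com:5000 is not mistaken for a tag.
    last_segment = name_and_tag.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # an untagged reference resolves to :latest by default
    tag = last_segment.rsplit(":", 1)[1]
    return tag == required_tag


def unpinned(images: list) -> list:
    """Return the image references that violate the pinning rule."""
    return [image for image in images if not is_pinned(image)]


if __name__ == "__main__":
    # Hypothetical image references, for illustration only.
    candidates = [
        "registry.example.com/akko/aden:2026.03",
        "registry.example.com:5000/akko/cockpit",
        "akko/trino:latest",
    ]
    for image in unpinned(candidates):
        print("not pinned:", image)
```

A check like this could run in CI against the rendered manifests so that an unpinned reference fails the build rather than reaching a cluster.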
See also: Security Flow · Kubernetes Deployment · Service Catalogue