Architecture¶
AKKO (Analytics Kernel, Keep Ownership) is a sovereign analytics stack that unifies 33+ services into a turnkey lakehouse stack. Deploy with Helm on Kubernetes -- locally via k3d for development, or on any CNCF-conformant cluster for production. Every component is open-source, every byte of data stays on your infrastructure.
High-Level Overview¶
Traefik (TLS + routing)
|
+------------------------+------------------------+
| | | | |
JupyterHub Superset Dashboards Airflow Cockpit
(notebooks) (BI) (monitoring) (pipelines) (portal)
| | | |
+-----+------+-----+----+-----------+
| |
Trino (SQL) Spark Connect
federation (gRPC)
| |
+------+------+
|
Apache Polaris
(Iceberg catalog)
|
Object storage (S3)
|
PostgreSQL
(PostGIS + pgvector)
Data flow: the object storage layer stores raw data as Parquet files. Apache Polaris manages Iceberg table metadata (schemas, snapshots, partitions). Spark and Trino both connect to Polaris via the Iceberg REST protocol to read and write tables. JupyterHub and Superset consume query results for analysis and visualization.
Service Layers¶
Infrastructure¶
| Service | Image | Role |
|---|---|---|
| Traefik | traefik:v3.6.9 |
Reverse proxy, TLS termination, routes *.akko.local to services via Kubernetes Ingress |
| Cockpit | nginx:1.29-alpine |
Portal dashboard with service health cards, latency display, architecture diagram |
| PostgreSQL/PostGIS | akko-postgres:2026.04 (custom) |
Relational database with PostGIS (geospatial) and pgvector (embeddings) extensions |
| oauth2-proxy | oauth2-proxy:v7.14.3 |
ForwardAuth middleware for services without native OIDC support |
| AKKO Docs | nginx:1.29-alpine |
Serves published Quarto reports at docs.akko.local |
Data Lake¶
| Service | Image | Role |
|---|---|---|
| Object storage | chrislusf/seaweedfs |
S3-compatible object storage for Iceberg data files |
| storage-init | amazon/aws-cli |
Creates the akko-warehouse + per-service buckets on first start |
Iceberg Catalog¶
| Service | Image | Role |
|---|---|---|
| Apache Polaris | apache/polaris:1.3.0-incubating |
Iceberg REST catalog server (replaced tabulario/iceberg-rest). Stores metadata in PostgreSQL, serves the Iceberg REST protocol on port 8181 |
| polaris-init | curlimages/curl:8.11.1 |
Idempotent bootstrap: creates catalog, RBAC principals/roles, and default namespaces via the Polaris Management API |
Compute¶
| Service | Image | Role |
|---|---|---|
| Spark Master | akko-spark:2026.04 (custom) |
Cluster manager for distributed Spark jobs |
| Spark Worker | akko-spark:2026.04 (custom) |
Executor node (2 cores, 2GB RAM) |
| Spark Connect | akko-spark:2026.04 (custom) |
gRPC server (port 15002) for remote Spark sessions from notebooks |
| Trino | trinodb/trino:480 |
Federated SQL engine with Iceberg (via Polaris REST) and PostgreSQL catalogs, file-based RBAC |
| DuckDB | (in-process) | Embedded analytics engine inside notebook containers, zero-copy Arrow interop |
IDE¶
| Service | Image | Role |
|---|---|---|
| JupyterHub | akko-jupyterhub (custom) |
Multi-user hub that spawns individual notebook containers via KubeSpawner |
| akko-notebook | akko-notebook (custom) |
User container with Python, R (IRkernel + tidyverse), Julia (DataFrames.jl), code-server (VS Code), Quarto, jupyter-ai, dbt-trino, Great Expectations |
Analytics¶
| Service | Image | Role |
|---|---|---|
| Apache Superset | apache/superset:4.1.1 |
BI platform with auto-provisioned Trino datasource and banking dashboard |
| Apache Airflow | apache/airflow:3.1.7-python3.12 |
Pipeline orchestration with OpenLineage console transport for lineage tracking |
AI¶
| Service | Image | Role |
|---|---|---|
| Ollama | ollama/ollama:0.17.4 |
Local LLM inference server (CPU/GPU). Ships with qwen2.5-coder:7b (code generation), qwen2.5:3b (chat), and nomic-embed-text (embeddings, 768 dim) |
| ollama-init | ollama/ollama:0.17.4 |
Pulls embedding and LLM models on first start |
| LiteLLM | ghcr.io/berriai/litellm:main-stable |
Unified AI gateway with OpenAI-compatible API. Routes requests to Ollama models, provides usage tracking and rate limiting |
Data Governance (profile: governance)¶
These services require 16GB+ RAM. Enable them by setting openmetadata.enabled=true and akko-opensearch.enabled=true in your Helm values.
| Service | Image | Role |
|---|---|---|
| OpenMetadata Server | openmetadata/server:1.12.1 |
Data catalog, discovery, lineage, quality tests, glossary. Authenticated via Keycloak OIDC (server-side, confidential client) |
| OpenMetadata Ingestion | openmetadata/ingestion:1.12.1 |
Embedded Airflow for metadata ingestion workflows |
| OpenMetadata Migrate | openmetadata/server:1.12.1 |
Database migration sidecar (runs once before server starts) |
| OpenSearch | opensearchproject/opensearch:2.19.1 |
Search backend for OpenMetadata (Apache 2.0 licensed, replaces Elasticsearch) |
Observability (metrics · logs · traces)¶
| Service | Image | Role |
|---|---|---|
| Prometheus | prom/prometheus:3.6.7 |
Metrics collection from all services. Scrapes ADEN /metrics for aden_query_duration_seconds + aden_query_total |
| Dashboards | persesdev/perses |
PromQL panel rendering, alerting, log exploration. Keycloak SSO integration. Twelve first-party dashboards ship out of the box (cluster-overview, aden-slo, storage-layer, trino, audit-trail, litellm, pipelines, platform-slo, trino-slo, mlflow, jupyterhub, trino-ai-plugin). |
| Logs | victoriametrics/victoria-logs |
Log aggregation + LogsQL search backend |
| Log shipper | fluent/fluent-bit |
Container log collector, ships to the logs layer |
| Tempo | grafana/tempo:2.6.1 |
Distributed tracing. ADEN emits OTLP spans (service.name=akko-aden) via opentelemetry-instrumentation-fastapi. Datasource auto-wired (service map → Prometheus, traces-to-logs). |
| Alertmanager | prom/alertmanager:v0.31.1 |
Alert routing. Ships AdenSLOFastBurn (1 h × 14.4 rate, warning), AdenSLOSlowBurn (6 h × 6 rate, info), AdenTraceIngestionStalled (10 min). |
Security¶
| Service | Image | Role |
|---|---|---|
| Keycloak | keycloak:26.1 |
SSO identity provider with the akko realm: 13 OAuth2 clients, 5 RBAC roles (akko-admin, akko-user, akko-engineer, akko-analyst, akko-viewer), pre-configured test users |
| OPA | openpolicyagent/opa:1.4.2 |
Open Policy Agent for fine-grained access control. Provides row-level security and column masking policies for Trino via the OPA plugin |
Network Topology¶
Kubernetes (primary): Services run as pods in the akko namespace. Traefik handles ingress routing with TLS termination via cert-manager or self-signed certificates. Inter-service communication uses Kubernetes service DNS names. In k3d, KC_HOSTNAME_BACKCHANNEL_DYNAMIC=true enables pod-to-pod OIDC token exchange with internal URLs.
Port Mapping¶
| Service | Ingress URL | Internal Port |
|---|---|---|
| Cockpit | https://akko.local |
80 |
| Traefik Dashboard | https://traefik.akko.local |
8080 |
| JupyterHub | https://lab.akko.local |
8000 |
| Superset | https://bi.akko.local |
8088 |
| Airflow | https://orchestrator.akko.local |
8082 |
| Dashboards | https://metrics.akko.local |
8080 |
| Keycloak | https://identity.akko.local |
8080 |
| Trino | https://federation.akko.local |
8080 |
| Spark Master UI | https://compute.akko.local |
8080 |
| Object storage console | https://storage.akko.local |
8888 |
| Polaris | https://polaris.akko.local |
8181 |
| Prometheus | https://prometheus.akko.local |
9090 |
| Ollama | https://ollama.akko.local |
11434 |
| LiteLLM | https://llm.akko.local |
4000 |
| AKKO Docs | https://docs.akko.local |
80 |
| OpenMetadata | https://catalog.akko.local |
8585 |
| PostgreSQL | ClusterIP (internal only) | 5432 |
| Object storage S3 API | ClusterIP (internal only) | 8333 |
| Spark Master | ClusterIP (internal only) | 7077 |
| Spark Connect | ClusterIP (internal only) | 15002 |
Init Sidecars¶
AKKO uses idempotent init sidecars (Helm hooks in Kubernetes, or init containers) to guarantee that all required resources exist on every deployment, regardless of whether it is a fresh install or an upgrade.
| Sidecar | Runs After | What It Does |
|---|---|---|
| postgres-init | postgres healthy |
Executes ensure.sql: creates extensions (PostGIS, pgvector), schemas (rag, geospatial), tables, and users. Safe to re-run (uses IF NOT EXISTS and ON CONFLICT DO NOTHING) |
| polaris-init | polaris + storage healthy |
Creates the akko-warehouse catalog with S3 storage config, sets up RBAC (principal role ALL, catalog role with CATALOG_MANAGE_CONTENT), creates default namespaces (banking, raw, staging, analytics). Idempotent via HTTP status codes (201 = created, 409 = exists) |
| storage-init | storage healthy |
Creates the akko-warehouse bucket with --ignore-existing |
| superset-init | superset healthy |
Imports Trino datasource YAML and runs bootstrap_dashboard.py to create 8 datasets, 8 charts, and 1 dashboard |
| ollama-init | ollama healthy |
Pulls qwen2.5-coder:7b, qwen2.5:3b, and nomic-embed-text models |
Custom Docker Images¶
AKKO ships 12 custom Docker images built by helm/scripts/build-images.sh.
All images are tagged 2026.04 (no :latest anywhere).
| Image | Dockerfile | Contents |
|---|---|---|
akko-postgres:2026.04 |
docker/postgres/Dockerfile |
PostgreSQL 16 with PostGIS 3.4, pgvector, and pgaudit extensions |
akko-spark:2026.04 |
docker/spark/Dockerfile |
Apache Spark 3.5.1 with Iceberg 1.5.2, AWS S3 JARs, Spark Connect server. JARs must be baked into the image (not loaded via --jars or --packages) |
akko-notebook:2026.04 |
docker/jupyterhub/Dockerfile.notebook |
Python 3.11, R (IRkernel, tidyverse, sf, RPostgres), Julia 1.11 (DataFrames.jl, CSV.jl), Scala 2.13 (Almond kernel), code-server v4.109.5, Quarto, Mermaid diagrams, jupyter-ai, dbt-trino, Great Expectations, GeoPandas, Polars, DuckDB, Altair, Folium |
akko-cockpit:2026.04 |
branding/cockpit/Dockerfile |
Portal with 8 pages (Home, DevHub, AI, Governance, Architecture, Logs, Monitoring, Alerts) + nginx health proxies |
akko-trino:2026.04 |
docker/trino-ai-functions/Dockerfile |
Trino 480 with AI functions plugin (14 scalar functions: akko_ai_ask, akko_ai_sentiment, akko_ai_classify, etc.) |
akko-ai-service:2026.04 |
docker/ai-service/Dockerfile |
FastAPI AI service (sentiment, classify, summarize, embed, translate, entities) |
akko-mlflow:2026.04 |
docker/mlflow/Dockerfile |
MLflow tracking server with S3 object storage and PostgreSQL backends |
akko-airflow:2026.04 |
docker/airflow/Dockerfile |
Airflow 3.1.7 with trino, mlflow, boto3, psycopg2, PyJWT baked in |
akko-dbt:2026.04 |
docker/dbt/Dockerfile |
dbt Core + dbt-trino adapter (semantic layer, 6 banking models) |
akko-mcp-trino:2026.04 |
docker/mcp-trino/Dockerfile |
MCP Server for Trino (8 tools, Model Context Protocol, sovereign) |
akko-mcp-openmetadata:2026.04 |
docker/mcp-openmetadata/Dockerfile |
MCP Server for OpenMetadata (catalog discovery, lineage) |
Deployment Modes¶
AKKO deploys identically in two modes:
Helm on Kubernetes (Primary)¶
All services are packaged as an umbrella Helm chart (helm/akko/) with 29 sub-chart dependencies. Deploy on any CNCF-conformant Kubernetes distribution (k3s, k3d, kubeadm, RKE2, OVHcloud, OpenShift). Use k3d for local development.
cd helm/scripts
./k3d-create.sh # Create a local k3d cluster
./deploy.sh # Single-phase deploy: all services at once
The deploy.sh script uses a two-phase deployment strategy: Step 1 installs infrastructure (PostgreSQL, object storage, Keycloak) with --no-hooks, then Step 2 runs helm upgrade with hooks enabled to execute init jobs (postgres-init, polaris-init) in the correct order.
For full details, see the Kubernetes Deployment guide.
Design Principles¶
-
Idempotent initialization: Init jobs and sidecars run on every startup and guarantee resource existence without errors if resources already exist. No manual steps are ever required after
helm install. -
No manual fixes: Every configuration change must survive a full
helm uninstall && helm install. If a fix requires a manual command, it is an architecture bug. -
Health probes everywhere: Every pod defines readiness and liveness probes. Helm hooks and init containers enforce correct startup ordering.
-
Pinned versions: Every container image uses explicit version tags. Zero
latesttags in the entire chart. -
Kubernetes-native networking: All services communicate over the
akkonamespace. Traefik handles ingress routing and TLS termination. Internal communication uses Kubernetes service DNS names (e.g.,akko-postgresql,akko-trino,akko-akko-polaris). -
Secrets from Kubernetes: All credentials are stored in Kubernetes Secrets (auto-generated by Helm templates or pre-created). Services reference secrets via
valueFrom.secretKeyRefin pod specs. For dev, deterministic passwords are set invalues-dev.yaml.