Architecture

AKKO (Analytics Kernel, Keep Ownership) is a sovereign analytics stack that unifies 33+ services into a turnkey lakehouse. Deploy it with Helm on Kubernetes -- locally via k3d for development, or on any CNCF-conformant cluster for production. Every component is open source, and every byte of data stays on your infrastructure.

High-Level Overview

                         Traefik (TLS + routing)
                                 |
        +------------------------+------------------------+
        |            |           |           |            |
   JupyterHub    Superset   Dashboards   Airflow    Cockpit
   (notebooks)    (BI)     (monitoring) (pipelines) (portal)
        |            |           |           |
        +-----+------+-----+----+-----------+
              |             |
         Trino (SQL)   Spark Connect
         federation      (gRPC)
              |             |
              +------+------+
                     |
              Apache Polaris
             (Iceberg catalog)
                     |
           Object storage (S3)
                     |
              PostgreSQL
            (PostGIS + pgvector)

Data flow: the object storage layer stores raw data as Parquet files. Apache Polaris manages Iceberg table metadata (schemas, snapshots, partitions). Spark and Trino both connect to Polaris via the Iceberg REST protocol to read and write tables. JupyterHub and Superset consume query results for analysis and visualization.
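The Spark side of this wiring can be sketched as Iceberg catalog properties. A minimal sketch, assuming in-cluster service hostnames (which are deployment-specific) and the ports listed in the port table below (8181 for Polaris, 8333 for the S3 API):

```python
# Hypothetical helper: Spark SQL properties that point an Iceberg catalog
# named "akko" at the Polaris REST endpoint. The hostnames here are
# assumptions; the ports come from this document's port table.
def polaris_catalog_conf(warehouse: str = "akko-warehouse") -> dict:
    return {
        "spark.sql.catalog.akko": "org.apache.iceberg.spark.SparkCatalog",
        "spark.sql.catalog.akko.type": "rest",
        "spark.sql.catalog.akko.uri": "http://akko-polaris:8181/api/catalog",
        "spark.sql.catalog.akko.warehouse": warehouse,
        # Iceberg data files live in the S3-compatible object store
        "spark.sql.catalog.akko.s3.endpoint": "http://akko-storage:8333",
    }

conf = polaris_catalog_conf()
```

A session built with these properties addresses Iceberg tables as `akko.<namespace>.<table>`; Trino reaches the same catalog through its own Iceberg connector configuration.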

Service Layers

Infrastructure

Service Image Role
Traefik traefik:v3.6.9 Reverse proxy, TLS termination, routes *.akko.local to services via Kubernetes Ingress
Cockpit nginx:1.29-alpine Portal dashboard with service health cards, latency display, architecture diagram
PostgreSQL/PostGIS akko-postgres:2026.04 (custom) Relational database with PostGIS (geospatial) and pgvector (embeddings) extensions
oauth2-proxy oauth2-proxy:v7.14.3 ForwardAuth middleware for services without native OIDC support
AKKO Docs nginx:1.29-alpine Serves published Quarto reports at docs.akko.local

Data Lake

Service Image Role
Object storage chrislusf/seaweedfs S3-compatible object storage for Iceberg data files
storage-init amazon/aws-cli Creates the akko-warehouse + per-service buckets on first start

Iceberg Catalog

Service Image Role
Apache Polaris apache/polaris:1.3.0-incubating Iceberg REST catalog server (replaced tabulario/iceberg-rest). Stores metadata in PostgreSQL, serves the Iceberg REST protocol on port 8181
polaris-init curlimages/curl:8.11.1 Idempotent bootstrap: creates catalog, RBAC principals/roles, and default namespaces via the Polaris Management API

Compute

Service Image Role
Spark Master akko-spark:2026.04 (custom) Cluster manager for distributed Spark jobs
Spark Worker akko-spark:2026.04 (custom) Executor node (2 cores, 2GB RAM)
Spark Connect akko-spark:2026.04 (custom) gRPC server (port 15002) for remote Spark sessions from notebooks
Trino trinodb/trino:480 Federated SQL engine with Iceberg (via Polaris REST) and PostgreSQL catalogs, file-based RBAC
DuckDB (in-process) Embedded analytics engine inside notebook containers, zero-copy Arrow interop
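Notebooks reach Spark Connect through a `sc://` connection string as consumed by `SparkSession.builder.remote(...)`. A small sketch of assembling such a URL, where the hostname and user are illustrative assumptions and port 15002 is the gRPC port from the table above:

```python
# Build a Spark Connect URL of the form sc://host:port/;param=value.
# The service hostname and user are hypothetical; 15002 is the gRPC
# port listed for Spark Connect in this document.
def spark_connect_url(host: str = "akko-spark-connect",
                      port: int = 15002,
                      user: str = "jovyan") -> str:
    return f"sc://{host}:{port}/;user_id={user}"

url = spark_connect_url()
```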

IDE

Service Image Role
JupyterHub akko-jupyterhub (custom) Multi-user hub that spawns individual notebook containers via KubeSpawner
akko-notebook akko-notebook (custom) User container with Python, R (IRkernel + tidyverse), Julia (DataFrames.jl), code-server (VS Code), Quarto, jupyter-ai, dbt-trino, Great Expectations

Analytics

Service Image Role
Apache Superset apache/superset:4.1.1 BI platform with auto-provisioned Trino datasource and banking dashboard
Apache Airflow apache/airflow:3.1.7-python3.12 Pipeline orchestration with OpenLineage console transport for lineage tracking

AI

Service Image Role
Ollama ollama/ollama:0.17.4 Local LLM inference server (CPU/GPU). Ships with qwen2.5-coder:7b (code generation), qwen2.5:3b (chat), and nomic-embed-text (embeddings, 768 dim)
ollama-init ollama/ollama:0.17.4 Pulls embedding and LLM models on first start
LiteLLM ghcr.io/berriai/litellm:main-stable Unified AI gateway with OpenAI-compatible API. Routes requests to Ollama models, provides usage tracking and rate limiting
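Because LiteLLM exposes an OpenAI-compatible API, any OpenAI-style client can talk to it by overriding the base URL. A minimal sketch of the request body, using one of the Ollama models listed above:

```python
import json

# Build an OpenAI-compatible chat-completions payload for the LiteLLM
# gateway (POST /v1/chat/completions on llm.akko.local). The model name
# is one of the Ollama models this document ships; LiteLLM routes the
# request to Ollama behind the scenes.
def chat_request(model: str, prompt: str) -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

body = json.dumps(chat_request("qwen2.5:3b", "Explain the Iceberg snapshot model."))
```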

Data Governance (profile: governance)

These services require 16GB+ RAM. Enable them by setting openmetadata.enabled=true and akko-opensearch.enabled=true in your Helm values.

Service Image Role
OpenMetadata Server openmetadata/server:1.12.1 Data catalog, discovery, lineage, quality tests, glossary. Authenticated via Keycloak OIDC (server-side, confidential client)
OpenMetadata Ingestion openmetadata/ingestion:1.12.1 Embedded Airflow for metadata ingestion workflows
OpenMetadata Migrate openmetadata/server:1.12.1 Database migration sidecar (runs once before server starts)
OpenSearch opensearchproject/opensearch:2.19.1 Search backend for OpenMetadata (Apache 2.0 licensed, replaces Elasticsearch)

Observability (metrics · logs · traces)

Service Image Role
Prometheus prom/prometheus:3.6.7 Metrics collection from all services. Scrapes ADEN /metrics for aden_query_duration_seconds + aden_query_total
Dashboards persesdev/perses PromQL panel rendering, alerting, log exploration. Keycloak SSO integration. Twelve first-party dashboards ship out of the box (cluster-overview, aden-slo, storage-layer, trino, audit-trail, litellm, pipelines, platform-slo, trino-slo, mlflow, jupyterhub, trino-ai-plugin).
Logs victoriametrics/victoria-logs Log aggregation + LogsQL search backend
Log shipper fluent/fluent-bit Container log collector, ships to the logs layer
Tempo grafana/tempo:2.6.1 Distributed tracing. ADEN emits OTLP spans (service.name=akko-aden) via opentelemetry-instrumentation-fastapi. Datasource auto-wired (service map → Prometheus, traces-to-logs).
Alertmanager prom/alertmanager:v0.31.1 Alert routing. Ships AdenSLOFastBurn (1 h × 14.4 rate, warning), AdenSLOSlowBurn (6 h × 6 rate, info), AdenTraceIngestionStalled (10 min).
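The burn-rate figures behind those alerts follow the standard multi-window pattern: a burn rate of B sustained for a window of W hours consumes B × W / P of a P-hour error budget. A quick check of the shipped thresholds against a 30-day budget period:

```python
# Fraction of an SLO error budget consumed when a given burn rate is
# sustained for `window_h` hours of a `period_h`-hour budget period.
def budget_consumed(burn_rate: float, window_h: float,
                    period_h: float = 30 * 24) -> float:
    return burn_rate * window_h / period_h

fast = budget_consumed(14.4, 1)  # AdenSLOFastBurn: ~2% of the budget in 1 h
slow = budget_consumed(6, 6)     # AdenSLOSlowBurn: ~5% of the budget in 6 h
```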

Security

Service Image Role
Keycloak keycloak:26.1 SSO identity provider with the akko realm: 13 OAuth2 clients, 5 RBAC roles (akko-admin, akko-user, akko-engineer, akko-analyst, akko-viewer), pre-configured test users
OPA openpolicyagent/opa:1.4.2 Open Policy Agent for fine-grained access control. Provides row-level security and column masking policies for Trino via the OPA plugin

Network Topology

Kubernetes (primary): Services run as pods in the akko namespace. Traefik handles ingress routing with TLS termination via cert-manager or self-signed certificates. Inter-service communication uses Kubernetes service DNS names. In k3d, KC_HOSTNAME_BACKCHANNEL_DYNAMIC=true enables pod-to-pod OIDC token exchange with internal URLs.

Port Mapping

Service Ingress URL Internal Port
Cockpit https://akko.local 80
Traefik Dashboard https://traefik.akko.local 8080
JupyterHub https://lab.akko.local 8000
Superset https://bi.akko.local 8088
Airflow https://orchestrator.akko.local 8082
Dashboards https://metrics.akko.local 8080
Keycloak https://identity.akko.local 8080
Trino https://federation.akko.local 8080
Spark Master UI https://compute.akko.local 8080
Object storage console https://storage.akko.local 8888
Polaris https://polaris.akko.local 8181
Prometheus https://prometheus.akko.local 9090
Ollama https://ollama.akko.local 11434
LiteLLM https://llm.akko.local 4000
AKKO Docs https://docs.akko.local 80
OpenMetadata https://catalog.akko.local 8585
PostgreSQL ClusterIP (internal only) 5432
Object storage S3 API ClusterIP (internal only) 8333
Spark Master ClusterIP (internal only) 7077
Spark Connect ClusterIP (internal only) 15002

Init Sidecars

AKKO uses idempotent init sidecars (Helm hooks in Kubernetes, or init containers) to guarantee that all required resources exist on every deployment, regardless of whether it is a fresh install or an upgrade.

Sidecar Runs After What It Does
postgres-init postgres healthy Executes ensure.sql: creates extensions (PostGIS, pgvector), schemas (rag, geospatial), tables, and users. Safe to re-run (uses IF NOT EXISTS and ON CONFLICT DO NOTHING)
polaris-init polaris + storage healthy Creates the akko-warehouse catalog with S3 storage config, sets up RBAC (principal role ALL, catalog role with CATALOG_MANAGE_CONTENT), creates default namespaces (banking, raw, staging, analytics). Idempotent via HTTP status codes (201 = created, 409 = exists)
storage-init storage healthy Creates the akko-warehouse bucket with --ignore-existing
superset-init superset healthy Imports Trino datasource YAML and runs bootstrap_dashboard.py to create 8 datasets, 8 charts, and 1 dashboard
ollama-init ollama healthy Pulls qwen2.5-coder:7b, qwen2.5:3b, and nomic-embed-text models
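The status-code convention described for polaris-init generalizes to all of these sidecars: treat both "created" and "already exists" as success, and anything else as a real failure. A minimal sketch of that decision:

```python
# Idempotent bootstrap outcome, following polaris-init's convention:
# HTTP 201 = resource created, 409 = resource already exists. Both
# count as success on a re-run; anything else fails the init job.
def bootstrap_outcome(status: int) -> str:
    if status == 201:
        return "created"
    if status == 409:
        return "exists"
    raise RuntimeError(f"bootstrap failed with HTTP {status}")
```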

Custom Docker Images

AKKO ships 12 custom Docker images built by helm/scripts/build-images.sh. All images are tagged 2026.04 (no :latest anywhere).

Image Dockerfile Contents
akko-postgres:2026.04 docker/postgres/Dockerfile PostgreSQL 16 with PostGIS 3.4, pgvector, and pgaudit extensions
akko-spark:2026.04 docker/spark/Dockerfile Apache Spark 3.5.1 with Iceberg 1.5.2, AWS S3 JARs, Spark Connect server. JARs must be baked into the image (not loaded via --jars or --packages)
akko-notebook:2026.04 docker/jupyterhub/Dockerfile.notebook Python 3.11, R (IRkernel, tidyverse, sf, RPostgres), Julia 1.11 (DataFrames.jl, CSV.jl), Scala 2.13 (Almond kernel), code-server v4.109.5, Quarto, Mermaid diagrams, jupyter-ai, dbt-trino, Great Expectations, GeoPandas, Polars, DuckDB, Altair, Folium
akko-cockpit:2026.04 branding/cockpit/Dockerfile Portal with 8 pages (Home, DevHub, AI, Governance, Architecture, Logs, Monitoring, Alerts) + nginx health proxies
akko-trino:2026.04 docker/trino-ai-functions/Dockerfile Trino 480 with AI functions plugin (14 scalar functions: akko_ai_ask, akko_ai_sentiment, akko_ai_classify, etc.)
akko-ai-service:2026.04 docker/ai-service/Dockerfile FastAPI AI service (sentiment, classify, summarize, embed, translate, entities)
akko-mlflow:2026.04 docker/mlflow/Dockerfile MLflow tracking server with S3 object storage and PostgreSQL backends
akko-airflow:2026.04 docker/airflow/Dockerfile Airflow 3.1.7 with trino, mlflow, boto3, psycopg2, PyJWT baked in
akko-dbt:2026.04 docker/dbt/Dockerfile dbt Core + dbt-trino adapter (semantic layer, 6 banking models)
akko-mcp-trino:2026.04 docker/mcp-trino/Dockerfile MCP Server for Trino (8 tools, Model Context Protocol, sovereign)
akko-mcp-openmetadata:2026.04 docker/mcp-openmetadata/Dockerfile MCP Server for OpenMetadata (catalog discovery, lineage)

Deployment Modes

AKKO deploys identically in two modes:

Helm on Kubernetes (Primary)

All services are packaged as an umbrella Helm chart (helm/akko/) with 29 sub-chart dependencies. Deploy on any CNCF-conformant Kubernetes distribution (k3s, k3d, kubeadm, RKE2, OVHcloud, OpenShift). Use k3d for local development.

cd helm/scripts
./k3d-create.sh    # Create a local k3d cluster
./deploy.sh        # Deploy all services (two-phase; see below)

The deploy.sh script uses a two-phase deployment strategy: Step 1 installs infrastructure (PostgreSQL, object storage, Keycloak) with --no-hooks, then Step 2 runs helm upgrade with hooks enabled to execute init jobs (postgres-init, polaris-init) in the correct order.

For full details, see the Kubernetes Deployment guide.


Design Principles

  1. Idempotent initialization: Init jobs and sidecars run on every startup and guarantee that required resources exist, without erroring when they already do. No manual steps are ever required after helm install.

  2. No manual fixes: Every configuration change must survive a full helm uninstall && helm install. If a fix requires a manual command, it is an architecture bug.

  3. Health probes everywhere: Every pod defines readiness and liveness probes. Helm hooks and init containers enforce correct startup ordering.

  4. Pinned versions: Every container image uses explicit version tags. Zero latest tags in the entire chart.

  5. Kubernetes-native networking: All services communicate within the akko namespace. Traefik handles ingress routing and TLS termination. Internal communication uses Kubernetes service DNS names (e.g., akko-postgresql, akko-trino, akko-akko-polaris).

  6. Secrets from Kubernetes: All credentials are stored in Kubernetes Secrets (auto-generated by Helm templates or pre-created). Services reference secrets via valueFrom.secretKeyRef in pod specs. For dev, deterministic passwords are set in values-dev.yaml.
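The secretKeyRef pattern from principle 6 can be sketched as the env entry a pod spec would carry. The secret and key names below are hypothetical examples, not AKKO's actual secret names:

```python
# Kubernetes env-var entry that sources its value from a Secret via
# valueFrom.secretKeyRef, as described in principle 6. The secret and
# key names are illustrative, not AKKO's real ones.
def secret_env(name: str, secret: str, key: str) -> dict:
    return {
        "name": name,
        "valueFrom": {"secretKeyRef": {"name": secret, "key": key}},
    }

entry = secret_env("POSTGRES_PASSWORD", "akko-postgresql", "password")
```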