Architecture Overview

AKKO is organised as six horizontal layers that together deliver a complete sovereign analytics platform. Every layer is backed by open-source components, runs on Kubernetes, and is wired to Keycloak SSO, OPA authorization, OpenMetadata cataloguing, and Prometheus/Loki observability.

Six-Layer Reference Diagram

flowchart TB
    subgraph L6[Layer 6 — Governance and Security]
        direction LR
        KC[Keycloak SSO<br/>5 RBAC roles]
        OPA[OPA<br/>row/column policies]
        OM[OpenMetadata<br/>catalog and lineage]
        O2P[oauth2-proxy<br/>ForwardAuth]
        LLDAP[LLDAP<br/>optional LDAP]
    end

    subgraph L5[Layer 5 — Analytics and Consumption]
        direction LR
        SUP[Superset<br/>dashboards]
        JH[JupyterHub<br/>notebooks]
        AF[Airflow 3<br/>orchestration]
        COCK[Cockpit<br/>portal]
    end

    subgraph L4[Layer 4 — AI and ML]
        direction LR
        ADEN[ADEN<br/>NL to SQL to dashboard]
        OLL[Ollama<br/>Qwen 2.5 and Nomic]
        LLM[LiteLLM<br/>AI gateway]
        MLF[MLflow<br/>tracking and registry]
        MCP[MCP Servers<br/>Trino and OpenMetadata]
    end

    subgraph L3[Layer 3 — Compute and Query]
        direction LR
        TRINO[Trino 480<br/>federated SQL + 25 ai_* UDFs]
        SPARK[Spark Connect<br/>gRPC]
        DBT[dbt Core<br/>semantic layer]
    end

    subgraph L2[Layer 2 — Storage and Catalog]
        direction LR
        POLARIS[Apache Polaris<br/>Iceberg REST catalog]
        MINIO[SeaweedFS<br/>S3 object store]
        PG[PostgreSQL<br/>PostGIS + pgvector]
    end

    subgraph L1[Layer 1 — Ingestion]
        direction LR
        AFI[Airflow DAGs<br/>file + API + JDBC]
        SPARKI[Spark ETL<br/>batch loaders]
        SRC[External sources<br/>OLTP, files, APIs]
    end

    SRC --> AFI
    SRC --> SPARKI
    AFI --> PG
    AFI --> MINIO
    SPARKI --> MINIO
    PG --> SPARK
    PG --> TRINO
    MINIO --> POLARIS
    POLARIS --> TRINO
    POLARIS --> SPARK
    SPARK --> POLARIS
    DBT --> TRINO
    TRINO --> ADEN
    TRINO --> SUP
    TRINO --> JH
    TRINO --> MCP
    LLM --> OLL
    ADEN --> LLM
    MCP --> LLM
    JH --> MLF
    JH --> SPARK
    AF --> SPARK
    AF --> TRINO
    OM -.lineage.-> TRINO
    OM -.lineage.-> AF
    OM -.catalog.-> POLARIS
    KC -.OIDC.-> SUP
    KC -.OIDC.-> JH
    KC -.OIDC.-> AF
    KC -.OIDC.-> OM
    KC -.OIDC.-> COCK
    O2P -.forward-auth.-> MLF
    OPA -.policies.-> TRINO
    LLDAP -.users.-> KC
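
The data path in the diagram above can also be read as a directed graph. The following is a minimal sketch, not part of AKKO itself, that transcribes only the solid arrows (using the diagram's node IDs) and walks them to see which services a given node ultimately feeds. The `EDGES` dictionary and the `downstream` helper are illustrative names introduced here.

```python
from collections import deque

# Solid (data-path) edges transcribed from the diagram above. The dotted
# governance edges (OIDC, lineage, policies) are deliberately left out.
EDGES = {
    "SRC": ["AFI", "SPARKI"],
    "AFI": ["PG", "MINIO"],
    "SPARKI": ["MINIO"],
    "PG": ["SPARK", "TRINO"],
    "MINIO": ["POLARIS"],
    "POLARIS": ["TRINO", "SPARK"],
    "SPARK": ["POLARIS"],
    "DBT": ["TRINO"],
    "TRINO": ["ADEN", "SUP", "JH", "MCP"],
    "LLM": ["OLL"],
    "ADEN": ["LLM"],
    "MCP": ["LLM"],
    "JH": ["MLF", "SPARK"],
    "AF": ["SPARK", "TRINO"],
}


def downstream(start):
    """Return every node reachable from `start` via data-path edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in EDGES.get(node, []):
            if nxt not in seen:  # guards against the SPARK <-> POLARIS cycle
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

For example, `"SUP" in downstream("SRC")` holds because external sources reach Superset through Airflow DAGs, PostgreSQL, and Trino, while `downstream("OLL")` is empty because Ollama has no outgoing data-path edge in the diagram.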

Layer Responsibilities

| Layer | Purpose | Key Services |
|---|---|---|
| 1. Ingestion | Move data from source systems into the platform | Airflow DAGs, Spark batch, Postgres CDC |
| 2. Storage | Durable raw and curated tables | SeaweedFS (S3), Apache Polaris (Iceberg REST), PostgreSQL (PostGIS + pgvector) |
| 3. Compute | Run ELT and interactive queries | Trino 480 (federated SQL + 17 ai_* functions), Spark Connect (gRPC), dbt Core |
| 4. AI / ML | Natural-language SQL, embeddings, training, serving | ADEN, Ollama, LiteLLM, MLflow, MCP servers, jupyter-ai |
| 5. Analytics | Dashboards, notebooks, pipeline authoring | Superset, JupyterHub, Airflow 3, Cockpit portal |
| 6. Governance | Identity, policy, catalogue, lineage | Keycloak, OPA, OpenMetadata, oauth2-proxy, LLDAP |

See the dedicated diagrams for each dimension:

  • Data Flow — end-to-end pipeline from source to consumption.
  • AI Stack — ADEN, Trino ai_*, RAG and MCP.
  • Security Flow — Keycloak → OPA → Trino masking.

Cross-Cutting Concerns

flowchart LR
    subgraph Users
        U1[Analyst]
        U2[Engineer]
        U3[Admin]
        U4[AI Agent]
    end
    subgraph Edge
        TRAEFIK[Traefik<br/>TLS + routing]
        COCK[Cockpit<br/>portal]
    end
    subgraph Control[Control Plane]
        KC[Keycloak]
        OPA[OPA]
        OM[OpenMetadata]
    end
    subgraph Observe[Observability]
        PROM[Prometheus]
        GRAF[Grafana]
        LOKI[Loki]
        TEMPO[Tempo]
        AM[Alertmanager]
    end
    U1 --> TRAEFIK
    U2 --> TRAEFIK
    U3 --> TRAEFIK
    U4 --> TRAEFIK
    TRAEFIK --> COCK
    TRAEFIK --> KC
    COCK -.health probes.-> Observe
    KC --> OPA
    OPA --> OM
    Observe -.dashboards.-> COCK
    AM -.alerts.-> Control

Every service exposes Prometheus metrics, writes structured JSON logs to Loki, and sends OTLP traces to Tempo. Alertmanager routes SLO breaches (ADEN fast-burn, trace ingestion stalled, Trino AI plugin breaker open) to the Cockpit Alerts page.
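
As an illustration of the structured-logging part, here is a minimal sketch of a service emitting single-line JSON log records with the Python standard library. It assumes logs are written to stdout and shipped to Loki by a cluster-level collector, which this page does not specify. The field names (`ts`, `level`, `logger`, `message`, `trace_id`) and the `build_logger` helper are hypothetical, not AKKO's actual log schema.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical extra field for correlating a log line with a trace.
        trace_id = getattr(record, "trace_id", None)
        if trace_id is not None:
            payload["trace_id"] = trace_id
        return json.dumps(payload)


def build_logger(name):
    """Return a logger that writes JSON lines to stdout (assumed collection point)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger
```

A call such as `build_logger("aden").info("query translated", extra={"trace_id": "abc123"})` would then print one JSON object per line, which keeps log labels and fields machine-parseable downstream.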

Sovereignty Guarantees

  • No external API calls. Every AI inference, embedding, and catalogue lookup happens in-cluster.
  • Reproducible images. All 12 custom images are pinned to the 2026.03 tag; none uses :latest.
  • Portable manifests. The Helm chart runs unchanged on k3d, k3s, EKS, GKE, AKS, OpenShift, bare-metal, and air-gapped clusters.
  • Open standards. Iceberg, S3, OIDC, OpenLineage, Prometheus exposition format, OpenTelemetry.
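
The image-pinning guarantee is the kind of rule that can be checked mechanically. The sketch below is one possible check, assuming `2026.03` is used as the image tag; the helper names and the image references in the usage note are hypothetical and not taken from the AKKO chart.

```python
def is_pinned(image, expected_tag="2026.03"):
    """Return True only if `image` carries exactly `expected_tag`."""
    # Drop any digest suffix, e.g. repo/name:tag@sha256:...
    ref = image.split("@", 1)[0]
    # The tag, if present, follows a ':' in the last path segment; a ':'
    # earlier in the reference belongs to a registry port (host:5000).
    last_segment = ref.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # untagged references resolve to :latest
    tag = last_segment.rsplit(":", 1)[1]
    return tag == expected_tag


def unpinned(images, expected_tag="2026.03"):
    """Return the image references that violate the pinning rule."""
    return [img for img in images if not is_pinned(img, expected_tag)]
```

Given the hypothetical list `["registry.example:5000/akko/aden:2026.03", "akko/cockpit:latest", "akko/trino"]`, `unpinned` returns the last two entries: one uses `:latest` explicitly and the other is untagged.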

See also: Security Flow · Kubernetes Deployment · Service Catalogue