AKKO Control Plane

AKKO integrates ~42 components (storage, catalog, compute, orchestration, BI, AI, RBAC). Exposing each one's native UI made the platform read like "a stack of open-source bricks" rather than "one product". The Control Plane fixes that: a thin FastAPI service that owns the product abstractions and proxies downstream to the right component.

Principle — the cockpit and external clients only talk to /api/v1/*. Each native component keeps its URL for direct operator use, but day-to-day product flows never need it.

What it exposes

| Abstraction | Endpoint | What's behind |
| --- | --- | --- |
| Dataset | GET /api/v1/datasets | OpenMetadata tables + OPA allow/deny + Trino FQN |
| Workspace | GET /api/v1/workspaces, POST /api/v1/workspaces | JupyterHub single-user server (spawn / stop / describe) |
| Pipeline | GET /api/v1/pipelines | Airflow DAGs + DagRun history (trigger coming Phase 1.5) |
| Agent | POST /api/v1/agents/ask | Forwards to ADEN with caller identity |

OpenAPI spec: /api/openapi.json · Swagger UI: /api/docs.
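As a rough illustration of the "only talk to /api/v1/*" contract, a client helper might look like the sketch below. The base URL, the `api_request` helper and the POST payload shape are hypothetical and not taken from the Control Plane's OpenAPI spec; authentication through oauth2-proxy is left out.

```python
import json
from typing import Optional
from urllib.request import Request

API_PREFIX = "/api/v1/"


def api_request(base_url: str, path: str, payload: Optional[dict] = None) -> Request:
    """Build (but do not send) a request against the Control Plane API."""
    # Product flows stay on /api/v1/*; anything else is rejected early.
    if not path.startswith(API_PREFIX):
        raise ValueError(f"product flows only use {API_PREFIX}*, got {path!r}")
    url = base_url.rstrip("/") + path
    if payload is None:
        return Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host name, used for illustration only.
list_datasets = api_request("https://akko.example.com", "/api/v1/datasets")
```

The returned `Request` can be passed to `urllib.request.urlopen` once a valid session is attached.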

Architecture

flowchart LR
    Client[Cockpit React / CLI / SDK] --> Traefik[Traefik + oauth2-proxy]
    Traefik -->|X-Auth-Request-* headers| CP[akko-control-plane]
    CP --> OM[OpenMetadata]
    CP --> JH[JupyterHub]
    CP --> AF[Airflow 3]
    CP --> AD[ADEN]
    CP --> OPA[OPA]
    CP --> TR[Trino]

The Control Plane is stateless: no database, no background jobs. Identity comes from the X-Auth-Request-Email and X-Auth-Request-Groups headers injected by oauth2-proxy forward-auth. A NetworkPolicy limits ingress to cockpit + Traefik pods, so the headers cannot be forged by other workloads in the namespace.
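A minimal sketch of how the caller identity could be derived from those two headers on every request, with nothing persisted. The `Caller` type and `caller_from_headers` function are hypothetical names, and the comma-separated format of the groups header is an assumption about the oauth2-proxy configuration.

```python
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class Caller:
    email: str
    groups: Tuple[str, ...]


def caller_from_headers(headers: Mapping[str, str]) -> Caller:
    """Derive the caller from the forward-auth headers; nothing is stored."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    email = lowered.get("x-auth-request-email", "").strip()
    if not email:
        raise PermissionError("missing X-Auth-Request-Email")
    # Assumption: groups arrive as a single comma-separated value.
    raw_groups = lowered.get("x-auth-request-groups", "")
    groups = tuple(g.strip() for g in raw_groups.split(",") if g.strip())
    return Caller(email=email, groups=groups)
```

Trusting these headers is only safe because of the NetworkPolicy described above; the function itself performs no verification.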

What it is NOT

  1. Not a god service. No business logic duplicated from downstream components. If a call can be a 1-line proxy, it is. The role of the Control Plane is translation, not re-implementation.
  2. Not on the hot path. Trino query execution, JupyterHub WebSocket kernel traffic, and Airflow task logs stream directly from their native ingress. The Control Plane is for product intent (list / create / describe), not sub-second data flow.
  3. Not the auth layer. oauth2-proxy + Keycloak do the JWT validation; the Control Plane only reads the trusted headers that land after.

Role of each abstraction

Dataset

A Dataset is the product-level view of a governed data asset. Each Dataset record composes:

  • the OpenMetadata entity (name, owner, description, tags, domain)
  • the Trino FQN the caller can paste into any notebook or BI client (catalog.schema.table)
  • the OPA allow/deny flag for the caller (arriving in Phase 1.2)
  • the PII / classification tags from OpenMetadata

The cockpit never shows iceberg.analytics.transactions as a raw path — it shows "Transactions dataset, owner @carol, PII tagged, you can read".
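The composition above can be sketched as a small record type. The field names, the `summary` wording for a denied caller, and the `trino_fqn` helper are illustrative assumptions, not the service's actual schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Dataset:
    name: str
    owner: str
    trino_fqn: str            # catalog.schema.table
    tags: Tuple[str, ...]     # classification tags, e.g. "PII"
    allowed: Optional[bool]   # None while no OPA decision is available

    def summary(self) -> str:
        """Render the product-level line instead of the raw path."""
        parts = [f"{self.name} dataset", f"owner @{self.owner}"]
        parts += [f"{tag} tagged" for tag in self.tags]
        if self.allowed is not None:
            parts.append("you can read" if self.allowed else "no read access")
        return ", ".join(parts)


def trino_fqn(catalog: str, schema: str, table: str) -> str:
    """Join the three parts into the name a notebook or BI client can use."""
    return ".".join((catalog, schema, table))


transactions = Dataset(
    name="Transactions",
    owner="carol",
    trino_fqn=trino_fqn("iceberg", "analytics", "transactions"),
    tags=("PII",),
    allowed=True,
)
print(transactions.summary())
# Transactions dataset, owner @carol, PII tagged, you can read
```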

Workspace

A Workspace is the user's isolated working area. Behind the scenes it composes:

  • a JupyterHub single-user server (the notebook compute)
  • a Trino session scoped to the caller's identity (so OPA row-filter + column-mask apply)
  • a MinIO prefix under akko-users/<user>/ for artefacts
  • metadata (name, description, created_at)

POST /api/v1/workspaces returns 501 until the JupyterHub admin-token wiring ships (Phase 1.4b). Read-only endpoints already work.
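For clients, the practical consequence is that a 501 on workspace creation means "not shipped yet" rather than a transient failure. A hedged sketch of that handling follows; the function name and the outcome labels are hypothetical.

```python
from http import HTTPStatus


def interpret_create_workspace(status_code: int) -> str:
    """Map the POST /api/v1/workspaces status to a client-side outcome."""
    if status_code == HTTPStatus.NOT_IMPLEMENTED:  # 501
        # The feature has not shipped yet, so retrying will not help.
        return "unavailable"
    if 200 <= status_code < 300:
        return "created"
    return "error"
```

A cockpit could use the "unavailable" outcome to hide or disable the create button instead of surfacing an error.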

Pipeline

A Pipeline projects an Airflow DAG as a business pipeline:

  • the DAG run status (healthy / stale / failing)
  • the OpenLineage graph (upstream / downstream Datasets)
  • the last N DagRun states — at a glance, not one click into Airflow

Users never learn the word "DAG". Trigger / pause / resume land in Phase 1.5b.
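One way the healthy / stale / failing status could be derived from recent DagRun states is sketched below. The rule itself (newest run failed means failing, no run within a threshold means stale) and the one-day default are assumptions for illustration; the document does not specify how the status is computed.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


def pipeline_health(
    recent_states: Sequence[str],      # newest first, e.g. ["success", "failed"]
    last_run_end: Optional[datetime],  # end time of the newest run, if any
    now: datetime,
    stale_after: timedelta = timedelta(days=1),  # hypothetical threshold
) -> str:
    """Collapse recent DagRun states into one business-level status."""
    if recent_states and recent_states[0] == "failed":
        return "failing"
    if last_run_end is None or now - last_run_end > stale_after:
        return "stale"
    return "healthy"


now = datetime(2026, 4, 22, 12, 0, tzinfo=timezone.utc)
status = pipeline_health(["success", "success"], now - timedelta(hours=2), now)
```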

Agent

Today there is a single Agent: ADEN. POST /api/v1/agents/ask forwards a natural-language question to ADEN with the caller's identity headers injected, so ADEN's per-user OPA decisions see the right principal. Additional Agents (SQL duel validator, auto-doc, auto-remediation) plug in here in Phase 3 without breaking the cockpit contract.
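The header forwarding can be sketched as a filter that copies only the identity headers onto the downstream call. The `identity_headers` helper is hypothetical, and selecting by the `X-Auth-Request-` prefix is an assumption about which headers ADEN needs.

```python
from typing import Dict, Mapping

IDENTITY_PREFIX = "x-auth-request-"


def identity_headers(incoming: Mapping[str, str]) -> Dict[str, str]:
    """Select the caller identity headers to copy onto the downstream call."""
    return {
        name: value
        for name, value in incoming.items()
        # Header names are case-insensitive, so compare in lower case.
        if name.lower().startswith(IDENTITY_PREFIX)
    }


forwarded = identity_headers({
    "X-Auth-Request-Email": "carol@example.com",
    "X-Auth-Request-Groups": "analysts",
    "Cookie": "session=abc",
})
```

Dropping everything else (cookies, arbitrary client headers) keeps the downstream call limited to what the principal check needs.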

Deployment

Part of the umbrella chart when enabled. Ships as its own sub-chart:

# helm/akko/charts/akko-control-plane/values.yaml
image:
  repository: akko-control-plane
  tag: "2026.04"
env:
  openmetadataUrl: "http://openmetadata:8585"
  trinoUrl: "http://akko-trino:8080"
  airflowUrl: "http://akko-api-server:8080"
  jupyterhubUrl: "http://proxy-public"
  jupyterhubHubUrl: "http://hub:8081"
  adenUrl: "http://akko-akko-aden:8000"
  opaUrl: "http://akko-akko-opa:8181"

Every URL is overridable for air-gapped / non-standard naming.
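On the service side, overridable URLs typically reduce to "use the configured value, else the default". The sketch below assumes the chart passes the values as environment variables; the variable names are hypothetical and only three of the URLs are shown.

```python
import os
from typing import Dict, Mapping

# Defaults copied from values.yaml; the environment variable names are hypothetical.
DEFAULT_URLS = {
    "OPENMETADATA_URL": "http://openmetadata:8585",
    "TRINO_URL": "http://akko-trino:8080",
    "OPA_URL": "http://akko-akko-opa:8181",
}


def resolve_urls(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return each downstream URL, preferring an override from the environment."""
    return {key: env.get(key, default) for key, default in DEFAULT_URLS.items()}


# An air-gapped install overriding only the Trino address.
urls = resolve_urls({"TRINO_URL": "http://trino.internal:8080"})
```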

Roadmap

The Control Plane is Phase 1 of the broader product-over-tech plan documented in the AKKO planning repository (private). Timeline:

| Phase | Period | Goal |
| --- | --- | --- |
| 1.1 (done) | 2026-04-22 | Service skeleton + Helm chart + 4 endpoint stubs |
| 1.2 | Sprint 42 | OPA enrichment, projection polish, Trino FQN rewrite |
| 1.3-1.5 | Sprint 42 | Full abstraction coverage (Dataset / Workspace / Pipeline mutating verbs) |
| 2 | Sprint 43-44 | Cockpit canvas + Cmd+K built on top of the Control Plane |
| 3 | Sprint 45 | Additional Agents (SQL duel, auto-remediation, auto-doc) |

See also