
ADR-040 — Cockpit backend with Keycloak service account (no user-bearer pass-through)

  • Status: Accepted (2026-04-27)
  • Stewards: platform-security, cockpit-team
  • Supersedes: none (companion to ADR-039)
  • Related: ADR-038 (sub-based OIDC matching), gotcha_keycloak_realm_drift, feedback_no_bricolage_research_first

Context

The cockpit Gouvernance page (and its tabs: Data Access, LLM RBAC, audit trail builder) needs to read and write Keycloak data: list groups, search users, attach role mappings, persist attributes['data-policies'] on groups, and so on. Today the cockpit JS calls /api/keycloak/admin/{groups,users,...}, which the akko-cockpit nginx pod reverse-proxies to Keycloak Admin REST, forwarding the user's own access token as the Bearer credential.

This design has two problems:

  1. It does not work out of the box. The access token issued for the oauth2-proxy client does not include realm-management in its audience claim. Keycloak Admin REST returns 401, and the cockpit surfaces ERROR: Keycloak admin call denied. We patched it with a protocol mapper (oidc-audience-mapper for realm-management); see the post-upgrade hook in akko-keycloak/templates/post-upgrade-audience-mapper-hook.yaml and the realm template patch in helm/examples/realm-akko-k3d.json. This fix unblocks the demo but does not address the design issue.

  2. It violates the principle of least privilege. Forwarding a user's bearer token to the IdP admin plane means every cockpit tab transitively carries realm-admin privileges: any XSS on cockpit, or any compromised user with akko-admin, becomes a Keycloak realm admin. The blast radius is the entire identity provider for AKKO.
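
Problem 1 can be confirmed by inspecting the aud claim of a user's access token. The following is a debugging sketch only: it decodes the payload without verifying the signature, and the function names are hypothetical, not part of the cockpit code.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (debugging only)."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def has_audience(claims: dict, audience: str) -> bool:
    """The aud claim may be a single string or a list of strings (RFC 7519)."""
    aud = claims.get("aud", [])
    if isinstance(aud, str):
        aud = [aud]
    return audience in aud
```

If has_audience(decode_jwt_payload(token), "realm-management") is False for a token issued to the oauth2-proxy client, the audience mapper is missing for that realm.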

Decision

We will introduce a dedicated akko-cockpit-backend sub-chart (FastAPI or Go) that holds the only credentials authorised to call Keycloak Admin REST. The browser will only call our own backend, never Keycloak directly.

High-level shape

Browser
   │  Authorization: Bearer <user-access-token>
oauth2-proxy ─(ForwardAuth, sets X-Auth-Request-User/Email/Groups)──┐
   │                                                                 │
   │  /api/governance/groups, /api/governance/policies, …            │
   ▼                                                                 │
akko-cockpit-backend  ◄── reads X-Auth-Request-* + verifies JWT ────┘
   │                       (validates `akko-admin` group; rejects 403 otherwise)
   │  Authorization: Bearer <service-account-token>   (client_credentials grant)
Keycloak Admin REST  /admin/realms/akko/{users,groups,…}
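
The last hop in the diagram relies on the OAuth2 client_credentials grant. As a minimal sketch, assuming the standard Keycloak token endpoint under /realms/<realm>/protocol/openid-connect/token, the backend could obtain its service-account token as follows. The function names and the injectable opener parameter are illustrative assumptions, and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(base_url: str, realm: str, client_id: str,
                        client_secret: str) -> urllib.request.Request:
    """Build the client_credentials token request for a Keycloak realm."""
    url = f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch_service_account_token(req: urllib.request.Request,
                                opener=urllib.request.urlopen) -> str:
    """Send the request and return the access_token field of the JSON reply."""
    with opener(req, timeout=10) as resp:
        return json.load(resp)["access_token"]
```

The returned token would then be sent as Authorization: Bearer on calls to /admin/realms/<realm>/…; a real service would also cache it until shortly before expiry rather than fetching one per request.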

Authorization layers

  1. oauth2-proxy validates the user session cookie and injects X-Auth-Request-{User,Email,Groups,Access-Token} headers.
  2. akko-cockpit-backend validates the JWT signature against https://identity.<domain>/realms/akko/protocol/openid-connect/certs, checks the groups claim contains akko-admin (or whatever role the endpoint requires), and writes an audit row (actor_sub, actor_email, action, target, ts) to PostgreSQL before performing the admin call.
  3. The service account (Keycloak client akko-cockpit-backend with serviceAccountsEnabled: true, granted realm-admin) is the only identity that ever talks to Keycloak Admin REST.
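
Layer 2 can be sketched as a guard that checks the group, writes the audit row, and only then performs the admin call. This is an illustration under stated assumptions: the claims are taken to be already signature-verified, sqlite3 stands in for PostgreSQL, and the helper names and the flat cockpit_events table are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


class Forbidden(Exception):
    """Would map to an HTTP 403 response in a real handler."""


def create_audit_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cockpit_events "
        "(actor_sub TEXT, actor_email TEXT, action TEXT, target TEXT, ts TEXT)"
    )


def require_group(claims: dict, group: str = "akko-admin") -> None:
    """Reject the request unless the groups claim contains the required group."""
    if group not in claims.get("groups", []):
        raise Forbidden(f"missing group {group!r}")


def write_audit_row(conn, claims: dict, action: str, target: str) -> None:
    conn.execute(
        "INSERT INTO cockpit_events (actor_sub, actor_email, action, target, ts) "
        "VALUES (?, ?, ?, ?, ?)",
        (claims["sub"], claims.get("email"), action, target,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def guarded_admin_call(conn, claims: dict, action: str, target: str, admin_call):
    """Authorize, audit, then run the Keycloak admin call, in that order."""
    require_group(claims)
    write_audit_row(conn, claims, action, target)
    return admin_call()
```

Writing the audit row before the admin call means a crash mid-call still leaves a trace of the attempt; a rejected request raises before anything is written or sent to Keycloak.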

Why this is the enterprise norm

  • Apache Ranger / Knox: the Ranger Admin UI calls the Ranger REST API, which holds the Hadoop service-impersonation credentials. The browser never carries Hadoop Kerberos tickets.
  • OpenShift Console: the console pod uses an OAuth client + ServiceAccount token to call kube-apiserver. The browser cookie talks to the console only.
  • HashiCorp Boundary admin UI: same split. UI ↔ Boundary controller (holding the privileged credentials) ↔ targets.
  • Snowflake / Dremio / DataHub: same shape. The frontend calls a backend that holds the privileged credentials for the identity store.

What we explicitly avoid

  • Forwarding user bearers to the IdP admin plane — even with the audience mapper in place, this creates a privilege escalation path through cockpit JS.
  • A "thin" cockpit backend that just relays calls — the backend must enforce its own RBAC + write its own audit log. Otherwise the delta from "nginx proxy with bearer" is zero.

Consequences

Positive

  • Single point of audit (cockpit-backend → Postgres akko_audit.cockpit_events).
  • Service account credentials stored in Vault/k8s Secret only, rotated via existing helm/scripts/generate-dev-secrets.sh flow.
  • Frontend code shrinks: no more kcFetch wrapper, no more 401/403 handling spread across branding/cockpit/app.js.
  • Keycloak oauth2-proxy client can drop the realm-management audience mapper once cockpit no longer calls Admin REST directly.
  • ADR-039 alignment: per-tenant cockpit-backend with the customer's IdP client credentials, never AKKO-side hardcoded admin keys.

Negative

  • New sub-chart to maintain (akko-cockpit-backend). Sprint 57.5 estimate: 5 days (3 dev + 2 test + audit log schema).
  • Each governance feature in cockpit JS needs a backend handler; current kcFetch(...) call sites become fetch('/api/governance/…'). Total: ~25 endpoints (groups, users, role-mappings, scope-mappings, client-scopes, attributes).
  • Migration is staged: we keep the audience-mapper hook + nginx proxy active until the backend covers 100% of cockpit governance JS calls.

Implementation roadmap

Sprint | Deliverable
Sprint 56 (today) | Audience mapper short-term fix + this ADR + xfail E2E test + memory entry. Demo unblocked.
Sprint 57.5 D1-2 | Scaffold akko-cockpit-backend sub-chart: FastAPI app, NetworkPolicy, Service, Deployment, Ingress (/api/governance/*). Service account akko-cockpit-backend in realm with realm-admin clientRole. Vault-stored client secret.
Sprint 57.5 D3 | First three endpoints migrated: GET /groups, POST /groups/{id}/role-mappings/realm, PUT /groups/{id} (attributes). Each writes audit row + checks akko-admin group. Backend pytest test suite (12 cases).
Sprint 57.5 D4 | Remaining 22 endpoints migrated. Cockpit JS removes kcFetch. nginx /api/keycloak/admin/ location deleted from akko-cockpit/templates/configmap.yaml.
Sprint 57.5 D5 | Audience mapper hook + realm patch removed. E2E test test_persona_alice_governance.py flips from xfail to required. PR + ADR-040 closeout.

Open questions

  • Audit retention: 90 days hot in Postgres, 7 years cold in Iceberg (DPIA Art. 30 alignment). To resolve in Sprint 57.5 D2.
  • Backend language: FastAPI is the AKKO default (akko-rag, ADEN, catalog-manager). No reason to deviate.
  • Per-tenant credential isolation: cockpit-backend must not hardcode the realm name "akko"; the realm is templated via the env var AKKO_KEYCLOAK_REALM, derived from global.auth.realm (default: akko, customer override via Helm value).
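
A minimal sketch of the realm templating, assuming the Helm chart always injects AKKO_KEYCLOAK_REALM so that the default lives in Helm values rather than in code. The helper names and the fail-fast behaviour are illustrative assumptions.

```python
import os


def keycloak_realm(env=os.environ) -> str:
    """Return the realm from AKKO_KEYCLOAK_REALM; fail fast if it is not set."""
    realm = env.get("AKKO_KEYCLOAK_REALM", "").strip()
    if not realm:
        # No in-code fallback: the default ("akko") is expected to come from
        # global.auth.realm in the Helm values, not from the backend itself.
        raise RuntimeError("AKKO_KEYCLOAK_REALM is not set")
    return realm


def admin_url(base_url: str, resource: str, env=os.environ) -> str:
    """Build a Keycloak Admin REST URL for the configured realm."""
    return (
        f"{base_url.rstrip('/')}/admin/realms/"
        f"{keycloak_realm(env)}/{resource.lstrip('/')}"
    )
```

With AKKO_KEYCLOAK_REALM=acme, admin_url("https://identity.example.com", "groups") yields https://identity.example.com/admin/realms/acme/groups; the host here is a placeholder.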

References

  • gotcha_keycloak_realm_drift (why Keycloak realm imports are not idempotent, and why we run post-upgrade Helm hooks)
  • ADR-038 (sub claim is the stable identifier — same matching applies to the cockpit-backend audit log)
  • ADR-039 (no hardcoded identities — applies to the service account credentials, which must come from the customer's secret store, not the umbrella chart's defaults)
  • feedback_no_bricolage_research_first (this ADR is the research-first rationale documented as required by the rule)
  • Apache Ranger Admin Architecture (Ranger Admin Server + REST API + Postgres) — public docs at https://ranger.apache.org/
  • OpenShift Console source (pkg/server/middleware.go, service account bearer pattern)