External Source Federation — Climscore (read-only)¶
Sprint 82 A3 — first external customer data source federated through AKKO Query Engine.
This page documents how AKKO federates a customer's existing PostgreSQL database without touching its data. The reference deployment uses the Climscore climate-risk database hosted in a separate Docker Compose stack on the same Netcup host, but the pattern applies to any external source.
The promise — and the only thing the customer needs to trust — is read-only access. AKKO never writes, never alters, never deletes upstream data. Three independent enforcement layers stack on top of each other so a single misconfiguration does not break that promise.
Architecture¶
+-------------------- AKKO cluster (k3s, akko namespace) ----------------+
| |
| alice (cockpit) -> ADEN/Trino -> climscore.public.communes (SELECT) |
| | |
| v |
| Trino coordinator/workers --> Service akko-climscore-postgres:15432 |
| |
| +-----------+ hostNetwork=true |
| | socat pod | bound to host:15432 +-------- forwards --> |
| +-----------+ | |
+-------------------------------------------------|---------------------+
v
+---------------------- Netcup host (Linux) ----------+
| Docker bridge climscore_climscore (172.18.0.0/16) |
| 172.18.0.2:5432 climscore-postgres-1 (postgis) |
| role akko_readonly GRANT SELECT ONLY |
+-----------------------------------------------------+
Three Kubernetes resources are created in the akko namespace, all gated by global.climscore.enabled (see helm/akko/templates/climscore-readonly-federation.yaml):
| Resource | Purpose |
|---|---|
Deployment akko-climscore-proxy |
alpine/socat running with hostNetwork: true bridges k3s pods (10.42.0.0/16) into the Docker bridge that hosts the climscore PostGIS container. The Docker DOCKER iptables chain rejects pod ingress into 172.18.0.0/16 directly, so the proxy lives in the host network namespace where routing is open. |
Service akko-climscore-postgres |
ClusterIP facade on port 15432. Trino's catalog uses the DNS name akko-climscore-postgres.akko.svc.cluster.local — no host IP leaks into the catalog. |
Secret akko-climscore-readonly-creds |
Holds username, password, host, port, database. Mounted into Trino as env vars (CLIMSCORE_RO_*) and read by the catalog climscore.properties block. |
The Trino catalog climscore is appended to additionalCatalogs in helm/examples/values-netcup.yaml:
connector.name=postgresql
connection-url=jdbc:postgresql://${ENV:CLIMSCORE_RO_HOST}:${ENV:CLIMSCORE_RO_PORT}/${ENV:CLIMSCORE_RO_DB}
connection-user=${ENV:CLIMSCORE_RO_USER}
connection-password=${ENV:CLIMSCORE_RO_PASS}
case-insensitive-name-matching=false
Defense in depth — three independent read-only layers¶
Layer 1 — Postgres role grants (primary)¶
The PostgreSQL role akko_readonly is created out-of-band on the upstream database with SELECT only on the public schema. The database engine rejects every INSERT, UPDATE, DELETE, TRUNCATE, DROP, ALTER, and CREATE regardless of which client issues it. This is the layer the customer audits — they own the database, they own the grants.
Provisioning script (one-shot, idempotent) :
DO $$
BEGIN
IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'akko_readonly') THEN
CREATE USER akko_readonly WITH PASSWORD '<random-strong-pass>';
ELSE
ALTER USER akko_readonly WITH PASSWORD '<random-strong-pass>';
END IF;
END$$;
GRANT CONNECT ON DATABASE climscore TO akko_readonly;
GRANT USAGE ON SCHEMA public TO akko_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO akko_readonly;
GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO akko_readonly;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO akko_readonly;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON SEQUENCES TO akko_readonly;
REVOKE CREATE ON SCHEMA public FROM akko_readonly;
REVOKE CREATE ON DATABASE climscore FROM akko_readonly;
Verification :
psql -U akko_readonly -d climscore -c 'SELECT count(*) FROM communes' # OK
psql -U akko_readonly -d climscore -c 'CREATE TABLE x()' # ERROR: permission denied
psql -U akko_readonly -d climscore -c 'INSERT INTO communes ...' # ERROR: permission denied
Layer 2 — Helm value marker¶
global.climscore.readOnly: true is set in values.yaml and explicitly overridden in values-netcup.yaml. The value is :
- written into the OPA data file
climscore.json(data.akko.climscore.read_only) - surfaced in the cockpit "External Sources" panel as a green badge
- visible in
helm get values akko -n akkofor the on-call operator
It is documentation that has teeth — flipping it to false requires a Helm overlay change reviewable by the security pipeline.
Layer 3 — OPA write-attempt block¶
The Trino-side OPA policy (helm/akko/charts/akko-opa/templates/configmap.yaml, package trino) carries a _climscore_write_attempt guard that fires when any caller targets the climscore catalog with a write or destructive DDL operation. The guard is ANDed onto every allow rule that grants write privileges :
_climscore_catalog_targeted if {
input.action.resource.table.catalogName == "climscore"
}
_climscore_write_op if {
input.action.operation in write_ops
}
_climscore_write_op if {
input.action.operation in destructive_ops
}
_climscore_write_attempt if {
_climscore_catalog_targeted
_climscore_write_op
not _climscore_user_exempt
}
allow if {
has_role("admin")
not _climscore_write_attempt # admins can still NOT write to climscore
}
Exemption is granted by adding usernames to global.climscore.opaWriteAllowList. Empty by default — even the admin role cannot write to the climscore catalog through Trino without an explicit operator opt-in.
This layer catches one class of mistake the other two miss : a future change that accidentally extends the akko_readonly grant or that switches the catalog to a different connection-user. The DB grant + the OPA guard would both need to break before write traffic reaches the upstream.
Use case — "let me see your data without touching it"¶
The most common objection to a federated query engine sale is :
"What if your engine corrupts our production database?"
The three-layer guarantee answers that objection :
| Customer fear | AKKO mitigation |
|---|---|
| "AKKO writes accidentally" | Postgres rejects writes — wrong privilege at the DB engine. |
| "Someone exploits an admin role" | OPA layer 3 blocks DML on climscore.* for every role. |
| "A future deploy enables writes" | Helm value readOnly: true plus PR review on opaWriteAllowList. |
The customer keeps their connection-string secret with us, scopes the DB role themselves, and audits the AKKO Postgres user. AKKO operators cannot widen that scope unilaterally.
Operations¶
Onboarding a new external source¶
- Customer creates a Postgres role
<src>_readonlywithGRANT SELECTonly. - Operator pre-creates the Secret
akko-<src>-readonly-credswithusername,password,host,port,database. - Add the catalog block to
helm/examples/values-netcup.yamladditionalCatalogs. - Enable
global.<src>.enabled: truein the same overlay. helm upgrade akko helm/akko -n akko -f helm/examples/values-netcup.yaml.- Validation :
kubectl -n akko exec deploy/akko-trino-coordinator -- trino --execute "SHOW TABLES FROM <src>.public".
Rotating the read-only password¶
kubectl -n akko delete secret akko-climscore-readonly-creds
kubectl -n akko create secret generic akko-climscore-readonly-creds \
--from-literal=username=akko_readonly \
--from-literal=password="<new-random>" \
--from-literal=host=akko-climscore-postgres \
--from-literal=port=15432 \
--from-literal=database=climscore
psql -U postgres -d climscore -c "ALTER USER akko_readonly WITH PASSWORD '<new-random>'"
kubectl -n akko rollout restart deploy/akko-trino-coordinator deploy/akko-trino-worker
Health checks¶
# Proxy alive
kubectl -n akko get pod -l app.kubernetes.io/name=akko-climscore-proxy
# Trino sees the catalog
kubectl -n akko exec deploy/akko-trino-coordinator -- trino --execute "SHOW CATALOGS LIKE 'climscore'"
# Read works
kubectl -n akko exec deploy/akko-trino-coordinator -- trino --execute "SELECT count(*) FROM climscore.public.communes"
# Write blocked (must error)
kubectl -n akko exec deploy/akko-trino-coordinator -- trino --execute "CREATE TABLE climscore.public.akko_should_fail (id int)"
Related¶
feedback_readonly_external_source_defense_in_depth.md— design rationale.gotcha_postgres_externalname_host_docker_internal.md— why we run a socat proxy with hostNetwork instead of a plain ExternalName.feedback_dont_touch_climscore.md— STRICT operational boundary on the climscore docker compose stack.