ADR-041: ADEN scope-first architecture (zero hardcoded fallback)
Status: ACCEPTED (2026-04-27, founder directive)
Sprint: 58 (planned, post Sprint 56 D8 close-out)
Related: ADR-038 (OIDC sub-based matching), ADR-039 (no hardcoded identities)
Context
ADEN's current catalog discovery loop is two-tiered:
- Primary: search OpenMetadata via the `akko-aden-bot` OAuth client.
- Fallback: when OpenMetadata is empty/unreachable, run a `UNION ALL` across `<cat>.information_schema.tables` for every catalog in `AKKO_ADEN_FALLBACK_CATALOGS` (default `tpch,tpcds,postgresql,iceberg`).
The fallback path was responsible for the 504 Gateway Time-out the founder
hit on 2026-04-27 — Trino spent 60–90 s evaluating the UNION across the
benchmark catalogs (tpch and tpcds expose every standard schema family
tiny / sf1 / sf10 / sf100 / sf1000), and nginx terminated the request
before ADEN could return.
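To make the fan-out concrete, here is a minimal sketch of how such a fallback query might be assembled. The function name and the selected columns are assumptions for illustration; the original ADEN code is not shown in this ADR.

```python
def build_fallback_query(catalogs: list[str]) -> str:
    """Build one UNION ALL over each catalog's information_schema.tables.

    Hypothetical reconstruction: one branch per configured catalog, each
    listing every table in every schema of that catalog.
    """
    branches = [
        "SELECT table_catalog, table_schema, table_name "
        f"FROM {cat}.information_schema.tables"
        for cat in catalogs
    ]
    return "\nUNION ALL\n".join(branches)


# The env default named in this ADR.
default_catalogs = "tpch,tpcds,postgresql,iceberg".split(",")
sql = build_fallback_query(default_catalogs)
```

Each branch enumerates every schema of its catalog, so the benchmark catalogs contribute all of their scale-factor schemas to a single request, regardless of what the user may read.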
Beyond the performance failure, the fallback violates ADR-039: the catalog list is hardcoded (an env default with vendor-specific names), and the user's authorisation scope is not consulted before scanning. ADEN searches the catalog before knowing what the user has access to, then filters post hoc. That ordering is the opening prompt-injection attacks rely on: discovery runs over data the user may not be entitled to, and only a post-hoc filter stands between that data and the response.
Founder feedback 2026-04-27 (translated from French):
"Before going to search the catalog, we need to know what the user has access to, and search within what they have access to, not the other way around. Restrict the catalog at the start of the request instead of searching the catalog first."
"Everything depends on the customer, everything depends on their real data, nothing hardcoded. Whatever our tool is, I do not want to hardcode the data, the users, the roles, or the queries."
Decision
Scope-first, catalog-second. ADEN's /query handler reverses the
discovery order:
- Read the user's authorised scope from OPA first (~50 ms, cached). Output: a list of `{ catalog, schema, table }` triples the user can SELECT from.
- Restrict OpenMetadata search to that scope: pass the triples as a filter so OM only returns metadata for tables the user already has access to.
- Trino information_schema fallback is removed entirely. If OM is unreachable, ADEN returns a 503 with a clear error pointing at OM health, not a degraded response that scans the cluster's full catalog.
- The LLM prompt only includes tables the user can read. SQL generated against tables outside the scope is structurally impossible.
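The ordering above can be sketched as a small handler. This is an illustration under stated assumptions, not the ADEN implementation: `handle_query`, `UpstreamUnavailable`, and the injected `fetch_scope` / `search_om` callables are hypothetical names, and the status codes follow the failure modes listed under D3.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ScopeEntry:
    catalog: str
    schema: str
    table: str


class UpstreamUnavailable(Exception):
    """Hypothetical error raised when OPA or OpenMetadata cannot be reached."""


def handle_query(
    user_id: str,
    question: str,
    fetch_scope: Callable[[str], list[ScopeEntry]],
    search_om: Callable[[str, list[ScopeEntry]], list[ScopeEntry]],
) -> tuple[int, dict]:
    # Step 1: resolve the authorised scope before touching any catalog.
    try:
        scope = fetch_scope(user_id)
    except UpstreamUnavailable:
        return 503, {"error": "OPA unreachable"}

    # Step 2: search OpenMetadata restricted to that scope.
    # Step 3: there is no Trino fallback; an OM outage is surfaced as 503.
    try:
        hits = search_om(question, scope)
    except UpstreamUnavailable:
        return 503, {"error": "OpenMetadata unreachable; check OM health"}

    # Defensive re-check: drop anything OM returned outside the scope.
    allowed = set(scope)
    tables = [hit for hit in hits if hit in allowed]
    if not tables:
        return 404, {"error": "no tables matching in your authorised data"}

    # Step 4: only in-scope tables ever reach the LLM prompt context.
    context = "\n".join(f"{t.catalog}.{t.schema}.{t.table}" for t in tables)
    return 200, {"schema_context": context}
```

Injecting the two clients as callables keeps the ordering testable without a live OPA or OM instance; the real handler would presumably call them over HTTP.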
This satisfies three constraints at once:
- Safety: prompt injection cannot generate SQL on tables the user can't see, because the LLM literally doesn't know they exist.
- Performance: scope is small (typically 5–50 tables), no cross-catalog UNION, no benchmark-catalog scanning. p95 < 2 s target.
- Zero hardcoded catalogs: the `fallback_catalogs` env var is deleted; the scope is derived from OPA at request time, which is itself driven by Keycloak group attributes, which are driven by the customer's IdP.
Implementation plan (Sprint 58)
D1: Hotfix (today, PR open)
- Set `AKKO_ADEN_FALLBACK_CATALOGS=""` default in the chart so customer installs no longer scan vendor benchmark catalogs.
- Live cluster: `kubectl set env deploy/akko-akko-aden AKKO_ADEN_FALLBACK_CATALOGS=""` applied 2026-04-27 18:14 UTC. Demo unblocked.
- Persisted in `helm/akko/charts/akko-aden/templates/deployment.yaml`.
D2: Scope endpoint in OPA (1 day)
Add an OPA rule data.aden.user_scope[user] returning the list of FQN
triples the user can read. Sources:
- data.group_policies[user_groups] (existing OPA group→tables mapping)
- data.user_overrides[user] (existing per-user overrides)
The existing OPA policy already computes column masks and row filters per user; the new rule is a sibling that returns the parent scope. The customer sets the mapping via Keycloak group attributes, so no chart change is needed.
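The rule itself would be written in Rego, but its intended semantics can be modelled in a few lines of Python. This sketch assumes that both sources hold lists of `(catalog, schema, table)` triples and that per-user overrides are additive (they grant, never revoke); neither assumption is confirmed by this ADR.

```python
Triple = tuple[str, str, str]  # (catalog, schema, table)


def user_scope(
    user: str,
    user_groups: list[str],
    group_policies: dict[str, list[Triple]],
    user_overrides: dict[str, list[Triple]],
) -> list[Triple]:
    """Model of data.aden.user_scope[user]: union of group grants and overrides."""
    scope: set[Triple] = set()
    for group in user_groups:
        # Groups with no policy entry contribute nothing.
        scope.update(group_policies.get(group, ()))
    # Assumption: overrides only add tables to the scope.
    scope.update(user_overrides.get(user, ()))
    # Sorted for a stable, cache-friendly result.
    return sorted(scope)
```

A user with no groups and no overrides gets an empty scope, which downstream should be treated as "nothing to search" rather than "search everything".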
D3: ADEN code refactor (2 days)
- New `_fetch_user_scope(user_id, roles)` calls OPA and returns `list[ScopeEntry]` (FQN triples).
- Change `_search_openmetadata` to accept a `scope` filter and pass it to the OM search query as `index=table_search_index&query.bool.filter=...`.
- DELETE the `_trino_catalog_fallback` function entirely.
- DELETE the `AKKO_ADEN_FALLBACK_CATALOGS` and `AKKO_ADEN_BENCHMARK_SCALES` env vars.
- LLM prompt builder: only include `scope` tables in `<schema_context>`.
- Failure mode: if OPA unreachable → 503; if OM unreachable → 503; if OM returns 0 hits within scope → 404 "no tables matching in your authorised data".
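One way the scope filter for the OM search could be built is as an Elasticsearch-style bool query, sketched below. The function name and the field names (`database.name`, `databaseSchema.name`, `name`) are assumptions and would need to be checked against the actual `table_search_index` mapping; how the filter is attached to the request is left out.

```python
import json

Triple = tuple[str, str, str]  # (catalog, schema, table)


def build_scope_filter(scope: list[Triple]) -> dict:
    """Return a bool filter matching only tables listed in `scope`."""
    if not scope:
        # Empty scope must match nothing, never everything.
        return {"query": {"match_none": {}}}
    should = [
        {
            "bool": {
                "filter": [
                    {"term": {"database.name": catalog}},       # assumed field
                    {"term": {"databaseSchema.name": schema}},  # assumed field
                    {"term": {"name": table}},                  # assumed field
                ]
            }
        }
        for catalog, schema, table in scope
    ]
    # A table passes if it matches at least one scope triple.
    return {
        "query": {
            "bool": {
                "filter": [{"bool": {"should": should, "minimum_should_match": 1}}]
            }
        }
    }


scope_filter = build_scope_filter([("lake", "sales", "orders")])
encoded = json.dumps(scope_filter)  # serialised form to send with the search
```

The explicit `match_none` branch matters: a user with no grants should get the 404 path, not an unfiltered search.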
D4: Customer onboarding doc (0.5 day)
Add `docs/admin/aden-onboarding.md`, describing how a customer wires
their Keycloak group attributes to OPA scope, with a Climascore example.
The customer brings their data, their groups, and their permissions;
ADEN uses them as-is.
D5: Regression tests (1 day)
tests/integration/test_aden_scope_first.py:
- alice (akko-admin) sees all banking tables
- bob (akko-engineer) sees only the engineering subset
- carol (akko-analyst) cannot see customer PII columns
- adversarial: prompt-injected "SELECT FROM secret_table" returns
"table not in your scope" without leaking the existence of the table
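The adversarial case hinges on one property: an out-of-scope table that exists must be indistinguishable from one that does not. A minimal sketch of that check and its test follows; `check_tables_in_scope` is a hypothetical helper, not a function named in this ADR, and the table names are invented.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (catalog, schema, table)


def check_tables_in_scope(
    referenced: list[Triple], scope: list[Triple]
) -> Optional[str]:
    """Return an error message if any referenced table is outside scope."""
    allowed = set(scope)
    for table in referenced:
        if table not in allowed:
            # Same message whether or not the table exists in the cluster,
            # so the response cannot be used to probe for hidden tables.
            return "table not in your scope"
    return None


def test_out_of_scope_is_indistinguishable_from_missing():
    scope = [("lake", "sales", "orders")]
    real_but_hidden = ("lake", "hr", "secret_table")
    does_not_exist = ("lake", "hr", "no_such_table")

    assert check_tables_in_scope([("lake", "sales", "orders")], scope) is None
    hidden = check_tables_in_scope([real_but_hidden], scope)
    missing = check_tables_in_scope([does_not_exist], scope)
    assert hidden == missing == "table not in your scope"
```

The real integration test would drive the `/query` endpoint with an injected prompt; this unit-level version only pins down the response-equivalence property.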
Consequences
Migration
Customers upgrading from < 2026.05 will see the fallback gone. If they
relied on it (which they shouldn't have — it was never documented as
stable behaviour), they need to populate OpenMetadata before deploying
the new ADEN. The akko-init om-ingest Job is the canonical seeding
path; it must run successfully on first install.
Side effect: om-ingest must be load-bearing
Before D3 ships, the om-ingest Job's silent-failure pattern
(sys.exit(0) on bot token decryption failure) is unacceptable. The
Job must fail loudly so install errors surface immediately. Tracked
as a separate fix in this same PR session.
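The shape of the fix is small: return a non-zero exit code when the bot token cannot be decrypted, so the Job's pod is marked failed. The sketch below uses hypothetical names (`run_ingest`, `TokenDecryptionError`, and the injected callables); the actual om-ingest script is not shown in this ADR.

```python
import sys
from typing import Callable


class TokenDecryptionError(Exception):
    """Hypothetical error for a bot token that cannot be decrypted."""


def run_ingest(
    decrypt_token: Callable[[], str],
    ingest: Callable[[str], None],
) -> int:
    """Run the seeding step and return a process exit code."""
    try:
        token = decrypt_token()
    except TokenDecryptionError as exc:
        # Previously this path exited 0 and hid the failure.
        print(f"om-ingest: bot token decryption failed: {exc}", file=sys.stderr)
        return 1
    ingest(token)
    return 0
```

The entry point would then end with `sys.exit(run_ingest(...))`, so a decryption failure shows up as a failed Job at install time instead of an empty OpenMetadata later.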
Side effect: OPA must publish scope
The OPA configmap currently publishes column_masks and row_filters
but not scope. D2 adds it. Customer's existing Keycloak attribute
schema doesn't change — OPA just exposes a new derived view.
Service account ADEN bot OAuth drift (caught 2026-04-27)
Today's diagnostic also surfaced that the K8s akko-aden-bot-oidc
Secret holds a strong random secret while the Keycloak client kept the
dev placeholder akko-dev-aden-bot. The keycloak-clients-job is
supposed to reconcile via PUT, but it hit a silent failure path. This
fix is orthogonal to D3 and lives in keycloak-clients-job.yaml —
covered by Sprint 56 D8.3 valuification (PR #141) once that lands.
Rollback
If D3 hits a customer that hasn't seeded OpenMetadata, ADEN returns 503. That's better than a 60 s timeout that masks the data problem. The operator deploys their OM ingestion (their connector, their schemas) and ADEN starts working. The chart no longer carries a band-aid.
Comparison with industry
| Vendor | Catalog discovery | Scope source |
|---|---|---|
| Snowflake Cortex | only over schemas the role has GRANT on | RBAC engine |
| Databricks AI/BI Genie | restricted to allowed catalogs in workspace | Unity Catalog |
| Dremio Sonar AI | uses Dremio's existing user view | privilege grants |
| AKKO post-ADR-041 | scope from OPA (driven by customer IdP groups) | OPA + Keycloak |
| AKKO pre-ADR-041 | scan all catalogs via env-hardcoded list | env var (vendor names) |
Related
- ADR-039 — no hardcoded identities (umbrella directive)
- ADR-038 — OIDC sub-based user matching
- `feedback_no_hardcoded_users_roles_permissions.md`
- `gotcha_om_silent_failure_in_init_job.md` (to be added; the `sys.exit(0)` pattern hides real seeding failures)