ADEN — Autonomous Data Exploration Node¶
ADEN turns a natural-language question into Trino SQL, runs it with the caller's identity, and publishes a Streamlit dashboard. It is AKKO's sovereign open-source equivalent of Databricks Genie, Snowflake Copilot, and Dremio Text-to-SQL — with one big difference: nothing ever leaves your cluster.
| Comparison | ADEN | Databricks Genie | Snowflake Copilot |
|---|---|---|---|
| Self-hosted, OSS | ✓ | ✗ | ✗ |
| RBAC enforcement | OPA | Unity Catalog | Snowflake Roles |
| PII redaction | ✓ | ✓ | ✓ |
| EXPLAIN cost gate | ✓ | ✗ | ✗ |
| Publish dashboard | Streamlit | Genie session | None |
| Conversation memory | ✓ | ✓ | ✓ |
| 100 % offline | ✓ | ✗ | ✗ |
Architecture¶
Cockpit /api/cockpit/aden/ (nginx forward-auth)
│
▼
docker/aden (FastAPI)
│ │ │ │
│ │ │ └──> OPA /v1/data/akko/aden/allow
│ │ └──────> OpenMetadata /api/v1/tables/... (PII tag lookup)
│ └──────────> LiteLLM → Ollama (qwen2.5-coder:7b)
└──────────────> Trino /v1/statement (user token, RBAC enforced)
│
▼
PVC akko-aden-reports (RWO)
│
▼
docker/streamlit (RO mount) — reports.<domain>
Every step is logged as structured JSON to logs layer via audit_log(). The
events you can grep for in Dashboards → logs layer are:
aden_query_receivedaden_cache_hitaden_no_tables,aden_opa_deniedaden_sql_generated,aden_cost_gate,aden_trino_executedaden_pii_masked,aden_dashboard_publishedaden_feedback
Safety pipeline¶
Every ADEN request goes through a strict pipeline before any query reaches Trino. The order is intentional — each step assumes the previous one succeeded.
- Cache lookup — sha256(question | role) keyed TTL cache (5 min).
force=trueon the request bypasses it. - Catalog search — OpenMetadata
_searchreturns the top-N candidate tables for the question. - OPA per-table check — for each candidate, POST to
/v1/data/akko/aden/allowwith{user, role, table}. Tables that come back denied are dropped silently from the prompt context. - LLM generation — qwen2.5-coder:7b receives the system prompt + the
allowed tables + the last 3 turns of session history (when a
session_idis provided). sqlglotvalidation — the LLM output is parsed in the Trino dialect. Anything that is not a singleSELECT/WITH/UNION/EXCEPT/INTERSECTis rejected. A defensive keyword denylist (INSERT/UPDATE/DELETE/...) runs as a second line of defence.LIMIT 10000is auto-injected when absent.EXPLAIN (TYPE IO)— Trino estimates how many bytes the query would scan. If the estimate is over the gate limit (default 1 GiB,AKKO_COST_BYTES_LIMIT), the request returnsHTTP 413. The user can re-send the same question withconfirm_cost=trueto override.- Trino execution — runs with the caller's OAuth token, so any row-filter or column-mask configured upstream still applies.
- PII redaction — every column tagged
PIIin OpenMetadata is replaced by***in the preview rows AND the persisted dashboard, unless the role is in the bypass list (default: admin + engineer). - Dashboard publication — a Jinja template renders a standalone
/srv/reports/aden-<id>.pyStreamlit page and writes it to the shared PVC. The runtime sub-chartakko-streamlitexposes it atreports.<domain>. - Cache + session memory — the
AdenResponseis stored under the cache key, and the conversation history is appended for follow-ups.
Sharing & multi-tenant runtime (Sprint 25)¶
Every dashboard is owned by the user who created it. The owner can change its visibility through the cockpit ADEN page → Share… modal:
| Visibility | Who can open the dashboard |
|---|---|
private |
Only the owner (and admins, who always override) |
users |
Owner + each address in allowed_users[] |
groups |
Owner + every member of any Keycloak group in allowed_groups[] |
public |
Every authenticated AKKO user except akko-viewer |
The decision is enforced server-side by the OPA akko.aden_share package
(see helm/akko/charts/akko-opa/templates/configmap.yaml). The cockpit
asks for the matrix via GET /dashboards?scope=mine|shared|public|all
and never relies on client-side filtering.
Why ShinyProxy¶
Visibility metadata in Postgres only solves discovery. It does not
prevent two users from sharing a Streamlit Python interpreter — a
malicious dashboard could read the next user's st.session_state. We
need physical isolation.
ShinyProxy (Apache 2.0, deployed as the akko-shinyproxy sub-chart)
spawns a fresh container per (user × dashboard) and routes the
browser to it. State is per-container, lifecycle is per-session, and a
NetworkPolicy locks the pod's egress to Trino + DNS only — no reading
sibling pods, no internet, no Postgres outside ADEN's schema.
App catalog flow¶
akko-adenwrites a row toaden.dashboardswhenever a query succeeds (/queryendpoint).- The
akko-app-catalog-syncerDeployment polls that table every 30 s and renders one ShinyProxy app spec per row into theakko-shinyproxy-appsConfigMap. - ShinyProxy hot-reloads on ConfigMap change (
apps-config-folder). - When a user opens
https://reports.<domain>/app/aden-<id>, ShinyProxy checks its OIDC session, matches the user's Keycloak groups against the app'saccess-groups, then spawns the pod.
Dashboard signing (HMAC-SHA256)¶
Because Streamlit executes arbitrary Python from the PVC, ADEN signs
every dashboard file with HMAC-SHA256 of AKKO_DASHBOARD_HMAC_KEY
before writing it. The header line # AKKO_SIG: <hex> is verified at
launch — a tampered file is refused.
Key rotation: set AKKO_DASHBOARD_HMAC_KEY_OLD to the previous key
before changing the active one. Both keys validate during the
transition; remove _OLD once every dashboard has been re-signed by
its owner (or after the configured retention period).
Configuration¶
All operational knobs are env vars on the akko-aden deployment, so no
code change is needed to tune the safety gates:
| Env var | Default | Purpose |
|---|---|---|
AKKO_SQL_AUTO_LIMIT |
10000 |
Auto-injected LIMIT |
AKKO_COST_BYTES_LIMIT |
1 GiB |
EXPLAIN gate threshold |
AKKO_COST_GATE |
true |
Disable the gate (testing only) |
AKKO_QUERY_CACHE_TTL |
300 |
Cache TTL in seconds |
AKKO_QUERY_CACHE_SIZE |
256 |
Cache entries |
AKKO_SESSION_TTL |
3600 |
Conversation history TTL |
AKKO_PII_BYPASS_ROLES |
akko-admin,akko-engineer |
Roles that see raw PII |
AKKO_PII_REDACTION |
*** |
Replacement string |
Role-to-table mapping lives in the OPA configmap under
aden_access.json, in the akko_aden_access.roles block. To open a new
schema to analysts, edit that file and helm upgrade — no code change.
REST API¶
POST /query¶
Headers: X-User-Id, X-User-Role, X-User-Token.
{
"question": "top 10 customers by revenue in 2025",
"session_id": "s-abcd1234",
"execute": true,
"confirm_cost": false,
"force": false,
"persist": true
}
Response shape includes sql, columns, preview_rows, row_count,
dashboard_url, reasoning, masked_columns, cost_bytes, cached,
session_history_size.
The optional model field overrides the server default
(AKKO_LLM_MODEL) for that single query. The value must match a
model_name declared in LiteLLM (GET /models); unknown names are
rejected with 400 Unknown LLM model.
GET /models¶
Returns the LLM models ADEN can route to and the current default:
{
"default": "akko-chat",
"models": ["akko-chat", "akko-fast", "akko-coder", "claude-sonnet"],
"details": [...]
}
Backed by LiteLLM's /v1/models (results memoised 60 s). The cockpit
"Modèle IA" dropdown above the chat is populated from this endpoint and
the user's last choice is persisted in localStorage.
To add a new provider (Claude, OpenAI, Mistral, a second Ollama
instance), append to akko-litellm.config.model_list in your values
file and helm upgrade — no code change. Example:
akko-litellm:
config:
model_list:
- model_name: claude-sonnet
litellm_params:
model: anthropic/claude-sonnet-4-6
api_key: os.environ/ANTHROPIC_API_KEY
- model_name: mistral-on-prem
litellm_params:
model: openai/mistral
api_base: http://internal-mistral.acme.local:8000/v1
api_key: os.environ/MISTRAL_KEY
The new entries appear in the picker on the next cockpit reload (60 s cache).
POST /feedback¶
Captures thumbs up/down per dashboard. Used to build the future quality dashboard and to seed prompt-tuning datasets.
GET /session/{session_id}¶
Returns the conversation history for the calling user. The cockpit calls this on page reload to restore the chat panel.
GET /catalog?q=<keywords>¶
Searches OpenMetadata then falls back to a Trino multi-catalog scan
(tpch, tpcds, postgresql, iceberg by default — override via
AKKO_ADEN_FALLBACK_CATALOGS). Returns fqn, name, description,
columns (filled via system.jdbc.columns so the LLM receives a real
schema, not hallucinated columns).
GET /dashboards?scope={mine,shared,public,all}¶
Lists dashboards visible to the caller. The cockpit "My dashboards", "Shared with me", "Public" tabs call this endpoint.
PATCH /dashboards/{id}/share¶
Updates visibility + allow-lists. Body:
{
"visibility": "public | users | groups | private",
"allowed_users": ["bob@akko.local"],
"allowed_groups": ["akko-analyst"]
}
Owner-only (non-owners get 404 so existence is not leaked).
DELETE /dashboards/{id}¶
Owner or allow-listed user. Deletes the DB row (the generated .py
file on the PVC is GC'd by the next janitor tick).
GET /dashboards/{id}/access¶
OPA-backed allow/deny decision. Used by ShinyProxy before spawning a
Streamlit pod for a non-owner caller. Returns {allow: bool, reason: string}.
Observability¶
Tracing is active when AKKO_TRACING_ENABLED=true (default on Netcup).
OTLP spans land in Tempo as service.name=akko-aden. The Dashboards
dashboard akko-aden-slo shows p50/p95/p99 latency, success rate,
error-budget burn (1 h/6 h/1 d windows).
/metrics is exposed by prometheus-fastapi-instrumentator:
aden_query_duration_seconds_bucket (histogram) and aden_query_total
(counter) are the series the SLO dashboard queries.
Operational gotchas (caught 2026-04-16 on Netcup)¶
- Env var prefix: Helm passes
AKKO_TRINO_URL,AKKO_LITELLM_URL,AKKO_OPA_URL,AKKO_OPENMETADATA_URL,AKKO_REPORTS_DIR. Python reads those first then falls back to legacy names (TRINO_URLetc.) for localdocker run -e …testing. - NetworkPolicy labels:
app.kubernetes.io/namematches bothakko-opa/akko-litellm/akko-keycloak/akko-postgres/akko-tempoAND the unprefixed equivalents so the policy stays portable. - OPA ConfigMap: the 3 ADEN policy files (
aden_access.json,aden_access.rego,aden_share.rego) are mounted only whenakko-opa.adenPoliciesEnabled=true. - Database bootstrap:
akko-initcreates theadendatabase and appliesaden-schema.sqlat every helm upgrade. Idempotent. - Trino service user:
akko-aden.trino.serviceUseris the username ADEN uses for its OWN metadata queries (jdbc.tables scan in degraded mode, EXPLAIN for cost gate). Must match a real Keycloak user that OPA/Ranger accepts. User queries still run under their individual identity (fromX-User-Id). - Cockpit header:
X-User-Idcarries the Keycloakpreferred_username(not email). Trino OPA policy matches on username.
Cockpit page¶
The cockpit ships an ADEN sidebar item visible from akko-user and
above. The page has:
- A textarea for the question (Cmd/Ctrl+Enter to submit).
- Two checkboxes: Dry run (return SQL without exec) and Confirm cost (override the EXPLAIN gate).
- A Reasoning panel that shows what ADEN understood: intent, candidate tables, OPA-allowed tables, tables actually used in the SQL.
- An editable SQL textarea with cost estimate.
- A Results table with row count and PII-redaction badge.
- 👍 / 👎 feedback buttons + a link to the published dashboard.
See also¶
- LLM RBAC — quotas + role-to-model matrix
- OpenMetadata — where catalog metadata + PII tags live
- Trino — query engine, OPA Trino access policies
- ADR XXX (planned) — ADEN architecture decision record