Dashboards¶
AKKO embeds a first-party dashboarding layer so every metric the
platform emits has a curated visualisation out of the box. The
deployment lives in the akko-observability sub-chart and runs in
the same akko namespace as the rest of the platform.
The layer is exposed at https://metrics.akko-ai.com
(or the value of global.functionalAliases.metrics in your
overrides) — gated by oauth2-proxy + Keycloak SSO.
The cockpit Overview iframe pulls the cluster-overview dashboard
directly from this endpoint, so the chart bootstraps a default
project + datasource + dashboards on every helm upgrade.
What ships out of the box¶
The post-install Job loads every JSON file from
helm/akko/charts/akko-observability/dashboards/*.json into the
akko project. Twelve dashboards:
| Name | Story | What it shows |
|---|---|---|
| cluster-overview | Platform health (cockpit Overview) | Pods ready, ingress 2xx, restarts, node memory |
| aden-slo | ADEN AI service | Query latency p50/p95, success rate, LLM tokens |
| storage-layer | Object storage | Volume server stats |
| trino | Federation flagship | Running queries, failures/min, latency p50/p95, container CPU/mem |
| audit-trail | DORA / NIS2 / GDPR | OPA decisions, Trino queries by user, login activity, ADEN questions |
| litellm | AI gateway | Requests/sec, tokens/5min, errors/min, latency p95, models |
| pipelines | Orchestration | Run status, task duration p95, scheduler heartbeat, top failing pipelines |
| platform-slo | Top-level summary | Pods ready, ingress 2xx, 5xx by service, ingress p95 by service |
| trino-slo | Federation SLO | Success rate, queued queries, p99 latency, memory headroom, queries by catalog |
| mlflow | ML tracking | Runs/h, registered models, model versions, tracking HTTP rate, pod CPU/mem |
| jupyterhub | IDE pods | Active users, running notebooks, spawn p95, per-pod memory, hub HTTP rate |
| trino-ai-plugin | Plugin internals | AI HTTP requests/errors, cache hit rate, circuit breaker, latency by function |
Every panel falls back to vector(0) when its underlying metric is
absent — an idle deployment shows a clean baseline rather than "No data".
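For reference, this fallback is the standard PromQL `or vector(0)` idiom. The query below is purely illustrative (it reuses the hypothetical `my-team` metric from the example further down), not one of the shipped expressions:

```promql
# Empty result (metric absent) falls back to a constant 0 series.
sum(rate(http_requests_total{namespace="my-team"}[5m])) or vector(0)
```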
Add your own dashboard¶
Drop a JSON file alongside the existing ones. The bootstrap Job picks
it up automatically on the next helm upgrade.
# helm/akko/charts/akko-observability/dashboards/13-my-team.json
{
  "kind": "Dashboard",
  "metadata": { "name": "my-team", "project": "akko" },
  "spec": {
    "display": { "name": "My team — KPIs" },
    "duration": "1h",
    "refreshInterval": "30s",
    "panels": {
      "rps": {
        "kind": "Panel",
        "spec": {
          "display": { "name": "Requests / sec" },
          "plugin": { "kind": "TimeSeriesChart", "spec": {} },
          "queries": [{
            "kind": "TimeSeriesQuery",
            "spec": { "plugin": { "kind": "PrometheusTimeSeriesQuery", "spec": {
              "datasource": { "kind": "PrometheusDatasource", "name": "prometheus" },
              "query": "sum(rate(http_requests_total{namespace=\"my-team\"}[5m]))"
            } } }
          }]
        }
      }
    },
    "layouts": [{ "kind": "Grid", "spec": { "items": [
      { "x": 0, "y": 0, "width": 24, "height": 8, "content": { "$ref": "#/spec/panels/rps" } }
    ] } }]
  }
}
The default prometheus datasource is already wired (HTTPProxy mode),
so any new dashboard can simply reference
{"name": "prometheus", "kind": "PrometheusDatasource"}.
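If you want to push a single file by hand instead of waiting for the next helm upgrade, a POST against the same in-cluster API should work. This is a sketch, not a tested procedure: it assumes the akko-dashboards:8080 service and the deploy/akko-akko-cockpit pod used by the smoke test below, and that the API accepts a plain POST of the dashboard JSON on the project's dashboards path.

```shell
f=helm/akko/charts/akko-observability/dashboards/13-my-team.json

# Sanity-check locally first: the file must be a Dashboard in project "akko".
jq -e '.kind == "Dashboard" and .metadata.project == "akko"' "$f" >/dev/null

# POST it through a pod that can reach the in-cluster service (-i feeds stdin).
kubectl -n akko exec -i deploy/akko-akko-cockpit -- \
  curl -s -X POST -H 'Content-Type: application/json' \
       --data-binary @- \
       http://akko-dashboards:8080/api/v1/projects/akko/dashboards < "$f"
```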
Smoke test¶
To verify the bootstrap succeeded without opening a browser:
kubectl -n akko exec deploy/akko-akko-cockpit -- \
curl -s http://akko-dashboards:8080/api/v1/projects/akko/dashboards \
| jq -r '.[].metadata.name' | sort
Expected output (one name per line, in any order):
aden-slo
audit-trail
cluster-overview
jupyterhub
litellm
mlflow
pipelines
platform-slo
storage-layer
trino
trino-ai-plugin
trino-slo
If the count is below 12, the bootstrap Job logs (kubectl -n akko
logs job/akko-dashboards-bootstrap) will show which file failed
to POST.
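To get just the count without the name listing, the same endpoint can be piped through jq's length filter (the command is otherwise identical to the smoke test above):

```shell
# 12 is the expected count; anything lower means a file failed to load.
kubectl -n akko exec deploy/akko-akko-cockpit -- \
  curl -s http://akko-dashboards:8080/api/v1/projects/akko/dashboards \
  | jq 'length'
```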
Why the layer is swappable¶
The platform does not depend on a specific dashboarding engine. The only contract is:
- The engine speaks PromQL natively.
- It exposes a REST API to POST a dashboard JSON.
- It supports a public iframe behind oauth2-proxy.
That contract is satisfied by several Apache 2.0 / MIT dashboarding projects, so the engine can be swapped without rewriting any dashboard JSON or moving any metric. The twelve dashboards above are the contract the platform commits to; the binary that renders them is an implementation detail.
Related¶
- Cockpit — Overview tab embeds cluster-overview
- Monitoring stack — Prometheus-compatible metrics backend