# Monitoring Stack
Metrics, logs, dashboards, and alerting.

| Setting | Value |
|---|---|
| Dashboards URL | https://metrics.akko.local |
| Prometheus URL | https://prometheus.akko.local |
| Authentication | Keycloak SSO (Dashboards) |
| Helm sub-charts | monitoring (kube-prometheus-stack), loki (logs layer Stack) |
## Overview
AKKO includes a full observability stack that provides metrics collection, log aggregation, visualization, and alerting. All five monitoring services (Prometheus, Dashboards, logs layer, log shipper, and Alertmanager) start with the core profile, so no extra flags are needed.
```text
                metrics.akko.local
                        |
                  Traefik (TLS)
                        |
                Dashboards (:3000)
               /        |        \
      Prometheus    logs layer    Alertmanager
       (:9090)       (:3100)        (:9093)
          |             |
        scrape      log shipper
        targets       (:9080)
          |             |
    +-----+-----+     Docker
    |     |     |     container
 object  JHub  ...    logs
 storage
```
## Components
### Prometheus
Metrics collection engine. Scrapes targets every 15 seconds and evaluates alert rules every 15 seconds.
| Setting | Value |
|---|---|
| URL | https://prometheus.akko.local |
| Internal port | 9090 |
| Config file | monitoring/prometheus/prometheus.yml |
| Alert rules | monitoring/prometheus/rules/akko-alerts.yml |
| Scrape interval | 15s |
Active scrape targets:
| Job | Target | Metrics Path |
|---|---|---|
| prometheus | localhost:9090 | /metrics |
| alertmanager | akko-alertmanager:9093 | /metrics |
| minio | akko-minio:9000 | /minio/v2/metrics/cluster |
| jupyterhub | akko-jupyterhub:8000 | /hub/metrics |
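The scrape targets above could be written in prometheus.yml roughly as follows. This is a sketch, not the shipped file: the layout is assumed, and the authentication comments describe MinIO's and JupyterHub's usual upstream defaults rather than anything confirmed for AKKO.

```yaml
# Sketch of monitoring/prometheus/prometheus.yml (layout assumed).
global:
  scrape_interval: 15s       # how often every target is scraped
  evaluation_interval: 15s   # how often alert rules are evaluated

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']          # default metrics_path is /metrics

  - job_name: alertmanager
    static_configs:
      - targets: ['akko-alertmanager:9093']

  - job_name: minio
    metrics_path: /minio/v2/metrics/cluster
    # MinIO normally expects a bearer token on this endpoint unless
    # MINIO_PROMETHEUS_AUTH_TYPE=public is set on the server.
    static_configs:
      - targets: ['akko-minio:9000']

  - job_name: jupyterhub
    metrics_path: /hub/metrics
    # JupyterHub normally requires an API token here unless
    # authenticate_prometheus is disabled in its configuration.
    static_configs:
      - targets: ['akko-jupyterhub:8000']
```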
**Trino and Airflow metrics**

Trino and Airflow scrape targets are currently disabled because their health endpoints return JSON, not the Prometheus exposition format. To enable them:
- Trino: deploy JMX Exporter as a Java agent in the Trino image (port 9483)
- Airflow: deploy a statsd_exporter sidecar and enable Airflow's StatsD metrics (the apache-airflow[statsd] extra)
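Once the exporters are running, the two jobs might be added to scrape_configs as in the fragment below. The hostnames akko-trino and akko-statsd-exporter are hypothetical. Port 9483 is the JMX Exporter port given above, and 9102 is statsd_exporter's default metrics port. On the Airflow side, StatsD output is switched on in the [metrics] section (statsd_on, statsd_host, statsd_port), pointed at the exporter's StatsD listener (UDP 9125 by default).

```yaml
# Hypothetical scrape jobs; hostnames are assumptions, not AKKO service names.
scrape_configs:
  - job_name: trino
    static_configs:
      - targets: ['akko-trino:9483']            # JMX Exporter Java agent
  - job_name: airflow
    static_configs:
      - targets: ['akko-statsd-exporter:9102']  # statsd_exporter metrics endpoint
```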
### Dashboards
Visualization and dashboard platform. Pre-configured with three datasources and four dashboards.
| Setting | Value |
|---|---|
| URL | https://metrics.akko.local |
| Internal port | 3000 |
| Authentication | Keycloak SSO |
| Datasource config | monitoring/grafana/provisioning/datasources/datasource.yml |
| Dashboard config | monitoring/grafana/provisioning/dashboards/dashboards.yml |
Pre-configured datasources:
| Datasource | Type | URL | Default |
|---|---|---|---|
| Prometheus | prometheus | http://akko-prometheus-server:9090 | Yes |
| logs layer | loki | http://akko-loki:3100 | No |
| Alertmanager | alertmanager | http://akko-alertmanager:9093 | No |
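A sketch of how these three datasources could be declared in datasource.yml, using the Grafana provisioning file format. Names, types, and URLs come from the table. The maxLines value is an assumption chosen to match the logs layer query limit, and the Alertmanager implementation setting assumes a plain Prometheus Alertmanager.

```yaml
# Sketch of monitoring/grafana/provisioning/datasources/datasource.yml.
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                 # the server, not the browser, sends queries
    url: http://akko-prometheus-server:9090
    isDefault: true

  - name: logs layer
    type: loki
    access: proxy
    url: http://akko-loki:3100
    jsonData:
      maxLines: 1000              # assumed to mirror the logs layer limit

  - name: Alertmanager
    type: alertmanager
    access: proxy
    url: http://akko-alertmanager:9093
    jsonData:
      implementation: prometheus  # assumed: upstream Prometheus Alertmanager
```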
Pre-built dashboards:
| Dashboard | File | Description |
|---|---|---|
| AKKO Overview | akko-overview.json | Platform-wide health and resource usage |
| AKKO object storage | akko-minio.json | Object storage metrics (objects, bandwidth) |
| AKKO Trino | akko-trino.json | Trino query metrics (requires JMX Exporter) |
| AKKO JupyterHub | akko-jupyterhub.json | JupyterHub user and spawner metrics |
### logs layer
Log aggregation engine. Receives logs from the log shipper and makes them queryable in Dashboards using LogQL.
| Setting | Value |
|---|---|
| Internal port | 3100 |
| Config file | monitoring/loki/loki-config.yml |
| Max query lines | 1000 |
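The two settings in the table could map to loki-config.yml as shown below. This is a partial sketch: storage, schema, and retention sections are omitted, and the surrounding file layout is assumed.

```yaml
# Partial sketch of monitoring/loki/loki-config.yml.
server:
  http_listen_port: 3100               # internal port from the table
limits_config:
  max_entries_limit_per_query: 1000    # most log lines a single query may return
```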
### log shipper
Log collection agent. Scrapes Docker container logs and forwards them to the logs layer with labels for service name, container ID, and other metadata.
| Setting | Value |
|---|---|
| Internal port | 9080 |
| Config file | monitoring/promtail/promtail-config.yml |
| Source | Docker container logs |
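One way to collect Docker logs with these labels is Promtail's Docker service discovery. The sketch below assumes that approach. It also assumes the positions file path, the label names container, container_id, and service, and that the Docker socket is mounted into the container. The service label relies on the com.docker.compose.service label that Docker Compose sets.

```yaml
# Sketch of monitoring/promtail/promtail-config.yml (Promtail format).
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml          # read offsets (path assumed)

clients:
  - url: http://akko-loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock  # socket must be mounted
        refresh_interval: 5s
    relabel_configs:
      # Docker reports names as "/akko-minio"; drop the leading slash.
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: container
      - source_labels: ['__meta_docker_container_id']
        target_label: container_id
      # Compose service name, e.g. "alertmanager".
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: service
```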
### Alertmanager
Alert routing and notification engine. Receives alerts from Prometheus and routes them based on severity.
| Setting | Value |
|---|---|
| Internal port | 9093 |
| Config file | monitoring/alertmanager/alertmanager.yml |
| Default receiver | Logging only (stdout) |
## Alert Rules
AKKO ships with pre-configured alert rules in monitoring/prometheus/rules/akko-alerts.yml:
| Alert | Severity | Condition | For |
|---|---|---|---|
| ServiceDown | Critical | up == 0 | 2 min |
| HighLatency | Warning | Response time > 2s | 5 min |
| HighMemoryUsage | Warning | Container memory > 90% of limit | 2 min |
| HighCPUUsage | Warning | Container CPU > 80% | 5 min |
| DiskSpaceRunningLow | Warning | Filesystem > 85% used | 5 min |
| PostgresConnectionsHigh | Warning | Connections > 80% of max | 5 min |
| StorageBucketEmpty | Info | Bucket has 0 objects | 10 min |
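The first row of the table might look like this in akko-alerts.yml. This is a sketch: the expression, duration, and severity come from the table, while the group name and annotation text are assumptions.

```yaml
# Sketch of the ServiceDown rule (group name and annotations assumed).
groups:
  - name: akko-availability
    rules:
      - alert: ServiceDown
        expr: up == 0            # the target failed its most recent scrape
        for: 2m                  # must stay down for 2 minutes before firing
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.job }} is down"
          description: "{{ $labels.instance }} has been unreachable for 2 minutes."
```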
Alert routing is configured with severity-based grouping:
- Critical alerts: 5-second group wait, 15-minute repeat interval
- All other alerts: 10-second group wait, 1-hour repeat interval
- Inhibition: A critical alert suppresses warnings for the same service
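The routing rules above could be expressed in alertmanager.yml roughly as follows. This is a sketch, not the shipped configuration. The receiver name default, the group_by labels, and the assumption that alerts carry a service label are all hypothetical.

```yaml
# Sketch of severity-based routing and inhibition in alertmanager.yml.
route:
  receiver: default
  group_by: ['alertname', 'service']
  group_wait: 10s          # non-critical alerts: batch for 10 seconds
  repeat_interval: 1h      # re-notify hourly while still firing
  routes:
    - matchers:
        - severity="critical"
      receiver: default
      group_wait: 5s       # critical alerts go out faster
      repeat_interval: 15m

inhibit_rules:
  # While a critical alert fires for a service, mute its warnings.
  - source_matchers:
      - severity="critical"
    target_matchers:
      - severity="warning"
    equal: ['service']

receivers:
  - name: default          # no integrations configured: logging only
```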
## Accessing Dashboards
- Open https://metrics.akko.local
- Log in with Keycloak SSO (e.g., alice with the admin role)
- Navigate to Dashboards in the left sidebar to view the pre-built dashboards
- Use Explore to query Prometheus metrics or logs layer logs directly
## Querying Logs in Dashboards
To view logs from a specific AKKO service:
- Go to Explore (compass icon in the left sidebar)
- Select the logs layer datasource
- Use a LogQL query:
Filter by log level:
## Querying Metrics
Select the Prometheus datasource in Explore and use PromQL:

```promql
# Service availability (1 = up, 0 = down)
up

# Number of running JupyterHub single-user servers
jupyterhub_running_servers

# Object storage requests per second, averaged over 5 minutes
rate(minio_http_requests_total[5m])
```
## Adding Custom Alerts
Create a new YAML file in monitoring/prometheus/rules/:

```yaml
groups:
  - name: my-custom-alerts
    rules:
      - alert: SlowTrinoQueries
        # The metric name depends on the JMX Exporter rules in the Trino image.
        expr: trino_query_execution_time_seconds > 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow Trino queries detected"
          description: "Queries taking longer than 30 seconds for 5+ minutes."
```

Prometheus loads all *.yml files from the rules directory when it starts or reloads its configuration. Restart or reload Prometheus after adding a file.
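The directory is usually wired in through a rule_files glob in prometheus.yml, sketched below. The in-container path is an assumption and must match where monitoring/prometheus/rules/ is mounted. A new file can be checked with promtool check rules before loading. A running Prometheus re-reads its rules on SIGHUP, or on a POST to /-/reload when it was started with --web.enable-lifecycle.

```yaml
# Sketch: rule_files entry in monitoring/prometheus/prometheus.yml.
# The path below is assumed; it must be the container's mount point for
# monitoring/prometheus/rules/.
rule_files:
  - /etc/prometheus/rules/*.yml
```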
## Adding Custom Dashboards
Place JSON dashboard files in monitoring/grafana/dashboards/. The Dashboards provisioning system detects new files and loads them automatically.
To export an existing dashboard:
- Open the dashboard in Dashboards
- Click the Share icon (top bar)
- Select Export > Save to file
- Place the JSON file in monitoring/grafana/dashboards/
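Automatic loading is driven by a file provider in dashboards.yml. A sketch in the Grafana provisioning format follows. The provider name akko, the rescan interval, and the in-container path are assumptions, and the path must be where monitoring/grafana/dashboards/ is mounted. When exporting, leaving "Export for sharing externally" off avoids ${DS_...} datasource placeholders, which file provisioning does not fill in.

```yaml
# Sketch of monitoring/grafana/provisioning/dashboards/dashboards.yml.
apiVersion: 1

providers:
  - name: akko                     # hypothetical provider name
    type: file
    folder: ''                     # place dashboards in the General folder
    disableDeletion: false
    updateIntervalSeconds: 30      # how often the folder is re-scanned
    allowUiUpdates: false          # edits made in the UI are not saved back
    options:
      path: /var/lib/grafana/dashboards   # assumed mount point
```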
## Configuring Notifications
The default Alertmanager configuration uses a logging-only receiver (alerts appear in stdout). To enable real notifications, edit monitoring/alertmanager/alertmanager.yml:
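As an illustration, the sketch below sends critical alerts to Slack while all other alerts stay on the logging-only receiver. Slack is only an example channel. The receiver names, the webhook URL, and the channel are placeholders, not AKKO defaults.

```yaml
# Sketch: route critical alerts to Slack, keep everything else log-only.
route:
  receiver: default                # hypothetical name of the log-only receiver
  routes:
    - matchers:
        - severity="critical"
      receiver: slack-critical

receivers:
  - name: default                  # no integrations: nothing is forwarded
  - name: slack-critical
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/WITH/WEBHOOK
        channel: '#akko-alerts'    # placeholder channel
        send_resolved: true        # also notify when the alert clears
```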
After editing, restart Alertmanager:
## Configuration Files Reference
| File | Purpose |
|---|---|
| monitoring/prometheus/prometheus.yml | Prometheus scrape targets and global config |
| monitoring/prometheus/rules/akko-alerts.yml | Alert rule definitions |
| monitoring/grafana/provisioning/datasources/datasource.yml | Datasource auto-provisioning |
| monitoring/grafana/provisioning/dashboards/dashboards.yml | Dashboard provisioning config |
| monitoring/grafana/dashboards/*.json | Pre-built dashboard definitions |
| monitoring/loki/loki-config.yml | logs layer storage and retention config |
| monitoring/promtail/promtail-config.yml | log shipper log scraping config |
| monitoring/alertmanager/alertmanager.yml | Alert routing and notification config |

**Kubernetes Deployment**

In Helm/k8s mode, Prometheus, Dashboards, and Alertmanager are deployed via the kube-prometheus-stack chart (configured under the monitoring key in values.yaml). The logs layer and log shipper are deployed via a separate loki chart. In this mode, the log shipper collects pod logs from the Kubernetes node filesystem rather than from the Docker socket.
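A sketch of values.yaml overrides that keep the Helm deployment's intervals in line with the 15-second settings used elsewhere in this page. It assumes the kube-prometheus-stack sub-chart sits under the monitoring key as described above. The keys beneath it are the upstream chart's, but the choice to pin these values is illustrative.

```yaml
# Sketch of values.yaml: keys under "monitoring" go to kube-prometheus-stack.
monitoring:
  grafana:
    enabled: true                # Dashboards
  alertmanager:
    enabled: true
  prometheus:
    prometheusSpec:
      scrapeInterval: 15s        # same scrape interval as the Docker setup
      evaluationInterval: 15s    # same rule evaluation interval
```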