
Monitoring Stack

Metrics, logs, dashboards, and alerting.

Grafana URL https://metrics.akko.local
Prometheus URL https://prometheus.akko.local
Authentication Keycloak SSO (Grafana)
Helm sub-charts monitoring (kube-prometheus-stack), loki (Loki stack)

Overview

AKKO includes a full observability stack that provides metrics collection, log aggregation, visualization, and alerting. All five monitoring services start with the core profile -- no extra flags needed.

                metrics.akko.local
                       |
                 Traefik (TLS)
                       |
                Grafana (:3000)
               /       |        \
      Prometheus     Loki     Alertmanager
       (:9090)      (:3100)     (:9093)
          |            |
       scrape       Promtail
       targets      (:9080)
          |            |
   +------+-----+   Docker
   |      |     |   container
 MinIO  JHub   ...  logs

Components

Prometheus

Metrics collection engine. Scrapes targets and evaluates alert rules every 15 seconds.

| Setting | Value |
| --- | --- |
| URL | https://prometheus.akko.local |
| Internal port | 9090 |
| Config file | monitoring/prometheus/prometheus.yml |
| Alert rules | monitoring/prometheus/rules/akko-alerts.yml |
| Scrape interval | 15s |

Active scrape targets:

| Job | Target | Metrics Path |
| --- | --- | --- |
| prometheus | localhost:9090 | /metrics |
| alertmanager | akko-alertmanager:9093 | /metrics |
| minio | akko-minio:9000 | /minio/v2/metrics/cluster |
| jupyterhub | akko-jupyterhub:8000 | /hub/metrics |
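Taken together, the settings and targets above correspond to a prometheus.yml along these lines; this is a sketch, not the actual file (the rule_files path is an assumption, and only the minio job is shown):

```yaml
global:
  scrape_interval: 15s         # scrape targets every 15 seconds
  evaluation_interval: 15s     # evaluate alert rules every 15 seconds

rule_files:
  - /etc/prometheus/rules/*.yml   # container-side path is an assumption

scrape_configs:
  - job_name: minio
    metrics_path: /minio/v2/metrics/cluster
    static_configs:
      - targets: ['akko-minio:9000']
```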

Trino and Airflow metrics

Trino and Airflow scrape targets are currently disabled because their health endpoints return JSON, not Prometheus format. To enable them:

  • Trino: Deploy JMX Exporter as a Java agent in the Trino image (port 9483)
  • Airflow: Deploy a statsd_exporter sidecar or install apache-airflow-providers-statsd
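Once a JMX Exporter agent is listening on port 9483, the Trino job could be re-enabled with a scrape config along these lines (the akko-trino hostname follows the naming pattern of the other targets and is an assumption):

```yaml
scrape_configs:
  - job_name: trino
    static_configs:
      - targets: ['akko-trino:9483']   # hypothetical host; port from the JMX Exporter note above
```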

Grafana

Visualization and dashboard platform. Pre-configured with three datasources and four dashboards.

| Setting | Value |
| --- | --- |
| URL | https://metrics.akko.local |
| Internal port | 3000 |
| Authentication | Keycloak SSO |
| Datasource config | monitoring/grafana/provisioning/datasources/datasource.yml |
| Dashboard config | monitoring/grafana/provisioning/dashboards/dashboards.yml |

Pre-configured datasources:

| Datasource | Type | URL | Default |
| --- | --- | --- | --- |
| Prometheus | prometheus | http://akko-prometheus-server:9090 | Yes |
| Loki | loki | http://akko-loki:3100 | No |
| Alertmanager | alertmanager | http://akko-alertmanager:9093 | No |
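The table above corresponds to a Grafana datasource provisioning file; a sketch of the shape of monitoring/grafana/provisioning/datasources/datasource.yml (the actual file may set additional options):

```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://akko-prometheus-server:9090
    isDefault: true
  - name: Loki
    type: loki
    url: http://akko-loki:3100
  - name: Alertmanager
    type: alertmanager
    url: http://akko-alertmanager:9093
```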

Pre-built dashboards:

| Dashboard | File | Description |
| --- | --- | --- |
| AKKO Overview | akko-overview.json | Platform-wide health and resource usage |
| AKKO MinIO | akko-minio.json | MinIO object storage metrics (objects, bandwidth) |
| AKKO Trino | akko-trino.json | Trino query metrics (requires JMX Exporter) |
| AKKO JupyterHub | akko-jupyterhub.json | JupyterHub user and spawner metrics |

Loki

Log aggregation engine. Receives logs from Promtail and makes them queryable in Grafana using LogQL.

| Setting | Value |
| --- | --- |
| Internal port | 3100 |
| Config file | monitoring/loki/loki-config.yml |
| Max query lines | 1000 |
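The 1000-line query cap maps to Loki's limits_config block; a sketch of the relevant fragment of monitoring/loki/loki-config.yml (the rest of the file is omitted, so this is not a complete Loki config):

```yaml
limits_config:
  max_entries_limit_per_query: 1000   # cap on log lines returned per query
```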

Promtail

Log collection agent. Scrapes Docker container logs and forwards them to Loki with labels for service name, container ID, and other metadata.

| Setting | Value |
| --- | --- |
| Internal port | 9080 |
| Config file | monitoring/promtail/promtail-config.yml |
| Source | Docker container logs |

Alertmanager

Alert routing and notification engine. Receives alerts from Prometheus and routes them based on severity.

| Setting | Value |
| --- | --- |
| Internal port | 9093 |
| Config file | monitoring/alertmanager/alertmanager.yml |
| Default receiver | Logging only (stdout) |

Alert Rules

AKKO ships with pre-configured alert rules in monitoring/prometheus/rules/akko-alerts.yml:

| Alert | Severity | Condition | For |
| --- | --- | --- | --- |
| ServiceDown | Critical | up == 0 | 2 min |
| HighLatency | Warning | Response time > 2s | 5 min |
| HighMemoryUsage | Warning | Container memory > 90% of limit | 2 min |
| HighCPUUsage | Warning | Container CPU > 80% | 5 min |
| DiskSpaceRunningLow | Warning | Filesystem > 85% used | 5 min |
| PostgresConnectionsHigh | Warning | Connections > 80% of max | 5 min |
| StorageBucketEmpty | Info | Bucket has 0 objects | 10 min |

Alert routing is configured with severity-based grouping:

  • Critical alerts: 5-second group wait, 15-minute repeat interval
  • All other alerts: 10-second group wait, 1-hour repeat interval
  • Inhibition: A critical alert suppresses warnings for the same service
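The routing rules above can be sketched as an Alertmanager route tree and inhibit rule; field names follow Alertmanager's config schema, but the grouping labels are an assumption:

```yaml
route:
  receiver: default
  group_wait: 10s            # default for non-critical alerts
  repeat_interval: 1h
  routes:
    - matchers:
        - severity="critical"
      group_wait: 5s
      repeat_interval: 15m

inhibit_rules:
  - source_matchers:
      - severity="critical"
    target_matchers:
      - severity="warning"
    equal: ['service']       # the "service" label name is an assumption
```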

Accessing Grafana

  1. Open https://metrics.akko.local
  2. Log in with Keycloak SSO (e.g., alice with the admin role)
  3. Navigate to Dashboards in the left sidebar to view pre-built dashboards
  4. Use Explore to query Prometheus metrics or Loki logs directly

Querying Logs in Grafana

To view logs from a specific AKKO service:

  1. Go to Explore (compass icon in the left sidebar)
  2. Select the Loki datasource
  3. Use a LogQL query:
{container_name="akko-trino"}

Filter by log level:

{container_name="akko-airflow"} |= "ERROR"
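LogQL can also aggregate logs into metrics. For example, to count ERROR lines per container over 5-minute windows (using the container_name label shown above; the akko-.* pattern is an assumption about container naming):

```logql
sum by (container_name) (
  count_over_time({container_name=~"akko-.*"} |= "ERROR" [5m])
)
```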

Querying Metrics

Select the Prometheus datasource in Explore and use PromQL:

# Service availability (1 = up, 0 = down)
up

# JupyterHub active users
jupyterhub_running_servers

# MinIO requests per second
rate(minio_http_requests_total[5m])
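As a sketch of how a condition like the HighMemoryUsage alert might look in PromQL, assuming cAdvisor-style container metrics are collected (the metric names are illustrative of that convention, and containers without a memory limit report a zero limit):

```promql
# Container memory usage as a fraction of its limit
container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.9
```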

Adding Custom Alerts

Create a new YAML file in monitoring/prometheus/rules/:

monitoring/prometheus/rules/my-alerts.yml
groups:
  - name: my-custom-alerts
    rules:
      - alert: SlowTrinoQueries
        expr: trino_query_execution_time_seconds > 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow Trino queries detected"
          description: "Queries taking longer than 30 seconds for 5+ minutes."

Prometheus automatically loads all *.yml files from the rules directory.
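Rule files can be syntax-checked before a reload with promtool, the CLI bundled with Prometheus (assuming it is available on your PATH):

```shell
promtool check rules monitoring/prometheus/rules/my-alerts.yml
```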


Adding Custom Dashboards

Place JSON dashboard files in monitoring/grafana/dashboards/. Grafana's provisioning system detects new files and loads them automatically.

To export an existing dashboard:

  1. Open the dashboard in Grafana
  2. Click the Share icon (top bar)
  3. Select Export > Save to file
  4. Place the JSON file in monitoring/grafana/dashboards/
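For reference, a file-based dashboard provider (the shape of monitoring/grafana/provisioning/dashboards/dashboards.yml) typically looks like the sketch below; the provider name and container-side path here are illustrative assumptions:

```yaml
apiVersion: 1
providers:
  - name: akko               # provider name is an assumption
    type: file
    disableDeletion: false
    options:
      path: /var/lib/grafana/dashboards   # container-side mount path is an assumption
```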

Configuring Notifications

The default Alertmanager configuration uses a logging-only receiver (alerts appear in stdout). To enable real notifications, edit monitoring/alertmanager/alertmanager.yml:

Slack:

receivers:
  - name: default
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
        channel: '#akko-alerts'
        send_resolved: true

Generic webhook:

receivers:
  - name: default
    webhook_configs:
      - url: 'http://your-webhook-endpoint:5001/'
        send_resolved: true

Email (SMTP settings go in the global block):

global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'akko-alerts@example.com'
  smtp_auth_username: 'user'
  smtp_auth_password: 'pass'

receivers:
  - name: default
    email_configs:
      - to: 'team@example.com'
        send_resolved: true

After editing, restart Alertmanager:

kubectl rollout restart deploy/akko-alertmanager -n akko
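Optionally, validate the edited file before restarting with amtool, the CLI bundled with Alertmanager (assuming it is installed locally):

```shell
amtool check-config monitoring/alertmanager/alertmanager.yml
```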

Configuration Files Reference

| File | Purpose |
| --- | --- |
| monitoring/prometheus/prometheus.yml | Prometheus scrape targets and global config |
| monitoring/prometheus/rules/akko-alerts.yml | Alert rule definitions |
| monitoring/alertmanager/alertmanager.yml | Alert routing and notification config |
| monitoring/grafana/provisioning/datasources/datasource.yml | Datasource auto-provisioning |
| monitoring/grafana/provisioning/dashboards/dashboards.yml | Dashboard provisioning config |
| monitoring/grafana/dashboards/*.json | Pre-built Grafana dashboard definitions |
| monitoring/loki/loki-config.yml | Loki storage and retention config |
| monitoring/promtail/promtail-config.yml | Promtail log scraping config |

Kubernetes Deployment

In Helm/k8s mode, Prometheus, Grafana, and Alertmanager are deployed via the kube-prometheus-stack chart (key monitoring in values.yaml). Loki and Promtail are deployed via a separate loki chart. Promtail collects pod logs from the Kubernetes node filesystem, not the Docker socket.
