Admin Starter Kit

This guide covers everything an AKKO platform administrator needs to harden, operate, and maintain a production deployment.


Production Hardening Checklist

TLS Certificates

Traefik handles TLS termination. For production, replace the self-signed dev certificates:

# helm/akko/values.yaml
traefik:
  tls:
    certificates:
      - certFile: /certs/tls.crt
        keyFile: /certs/tls.key

Never expose services without TLS in production

All inter-service communication should use TLS or stay within the cluster network.
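Before mounting production certificates, it can help to confirm they are not close to expiry. The following is a minimal sketch, assuming `openssl` is installed on the admin workstation; the function name `cert_valid_for_days` and the path in the usage line are illustrative, not part of AKKO.

```shell
# Hypothetical helper: succeeds if the certificate in $1 stays valid
# for at least $2 more days (default 30).
cert_valid_for_days() {
  local cert_file=$1 days=${2:-30}
  # -checkend takes seconds and exits non-zero if the certificate
  # expires within that window
  openssl x509 -in "$cert_file" -noout -checkend "$((days * 86400))" >/dev/null
}

# Usage (path is a placeholder):
#   cert_valid_for_days ./tls.crt 30 || echo "tls.crt expires within 30 days" >&2
```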

Secrets Management

  • Store all secrets in Kubernetes Secrets (not in values.yaml)
  • Use --set or --set-file at deploy time for sensitive values
  • Rotate secrets quarterly: PostgreSQL passwords, Keycloak admin, MinIO credentials
  • Never commit .env files or traefik/certs/ to version control
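One way to follow the first three points is to generate a value locally and hand it straight to the cluster, so it never lands in a file. This is a sketch under stated assumptions: `gen_secret` and `store_secret` are hypothetical helpers, and the Secret name and key in the usage line are placeholders rather than names the chart defines.

```shell
# Hypothetical helper: print N random bytes as 2*N hex characters (default 16 bytes).
gen_secret() {
  head -c "${1:-16}" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Hypothetical helper: create or update a Secret without writing the value to disk.
# --dry-run=client renders the manifest locally; kubectl apply then creates or updates it.
store_secret() {
  local name=$1 key=$2 value=$3
  kubectl create secret generic "$name" \
    --from-literal="$key=$value" \
    --dry-run=client -o yaml | kubectl apply -f -
}

# Usage (Secret name and key are placeholders):
#   store_secret akko-postgresql-credentials password "$(gen_secret 24)"
```

Running the same command again with a fresh value is one way to handle the quarterly rotation; the consuming pods typically need a restart to pick up the change.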

Resource Limits

Every pod must have resource requests and limits defined:

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

Recommended resources for critical services:

| Service | CPU Request | Memory Request | Memory Limit |
|---|---|---|---|
| PostgreSQL (infra) | 250m | 512Mi | 1Gi |
| PostgreSQL (data) | 500m | 1Gi | 2Gi |
| Keycloak | 250m | 512Mi | 1Gi |
| OpenMetadata | 500m | 2Gi | 2.5Gi |
| Spark Connect | 500m | 1Gi | 2Gi |
| Trino | 500m | 1Gi | 2Gi |
| JupyterHub | 250m | 512Mi | 1Gi |
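To check that every pod actually carries a limit, the release can be audited from the command line. The sketch below assumes the `app.kubernetes.io/instance=akko` label used in the upgrade procedure; both function names are hypothetical.

```shell
# Hypothetical helper: list containers that have no memory limit.
# Reads "pod<TAB>container=limit container=limit ..." lines on stdin
# and prints "pod container" for every container whose limit is empty.
flag_missing_limits() {
  awk -F'\t' '{
    n = split($2, c, " ")
    for (i = 1; i <= n; i++)
      if (c[i] ~ /=$/) print $1, substr(c[i], 1, length(c[i]) - 1)
  }'
}

# Emit one line per pod in the format flag_missing_limits expects.
list_memory_limits() {
  kubectl get pods -l app.kubernetes.io/instance=akko \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.containers[*]}{.name}={.resources.limits.memory}{" "}{end}{"\n"}{end}'
}

# Usage:
#   list_memory_limits | flag_missing_limits
```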

Pod Disruption Budgets (PDBs)

For high availability, define PDBs on stateful services:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: akko-postgresql-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: akko-postgresql
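A PDB only permits a voluntary eviction while the number of healthy pods exceeds `minAvailable`, so with a single replica the budget above allows none and a node drain waits on that pod. The arithmetic can be sketched as follows; `allowed_disruptions` is a hypothetical helper that mirrors the calculation, not a Kubernetes command.

```shell
# Evictions a PDB permits: healthy pods minus minAvailable, never below zero.
allowed_disruptions() {
  local healthy=$1 min_available=$2
  local allowed=$((healthy - min_available))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

allowed_disruptions 1 1   # single replica: prints 0
allowed_disruptions 2 1   # two replicas: prints 1

# Against a live cluster, the same figure appears in the ALLOWED DISRUPTIONS column:
#   kubectl get pdb akko-postgresql-pdb
```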

RBAC Management

AKKO uses Keycloak for SSO with 5 predefined roles:

| Role | Description | Access |
|---|---|---|
| admin | Platform administrator | Full access to all services, user management, configuration |
| engineer | Data/platform engineer | Airflow DAGs, Spark jobs, Trino queries, MLflow, JupyterHub |
| analyst | Data analyst | Superset dashboards, Trino read-only queries, JupyterHub notebooks |
| steward | Data steward | OpenMetadata catalog, data quality, glossary, lineage management |
| viewer | Read-only viewer | Cockpit portal, Superset dashboards (view only), Grafana dashboards |

Managing Users

  1. Log in to Keycloak Admin Console at https://keycloak.<domain>/admin
  2. Navigate to Users > Add user
  3. Assign the appropriate role under Role Mappings
  4. For LDAP integration, configure the LLDAP federation under User Federation
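The same steps can be scripted with Keycloak's admin CLI, `kcadm.sh`, which ships in `/opt/keycloak/bin`. This is a sketch, assuming the realm is named `akko` (as in the realm export command) and that the five roles are realm-level roles; `add_akko_user` and the `KCADM` override are hypothetical.

```shell
# Hypothetical helper: create a user in the akko realm and assign one of the five roles.
# KCADM defaults to kcadm.sh; set KCADM=echo to print the commands instead of running them.
add_akko_user() {
  local username=$1 role=$2
  case $role in
    admin|engineer|analyst|steward|viewer) ;;
    *) echo "unknown role: $role" >&2; return 1 ;;
  esac
  "${KCADM:-kcadm.sh}" create users -r akko -s "username=$username" -s enabled=true &&
    "${KCADM:-kcadm.sh}" add-roles -r akko --uusername "$username" --rolename "$role"
}

# Usage (after authenticating with kcadm.sh config credentials):
#   add_akko_user jdoe analyst
```

Rejecting unknown role names up front avoids creating a user that ends up with no role at all.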

OPA Fine-Grained Access

Trino queries are governed by OPA policies. Edit policies in helm/akko/charts/akko-opa/:

# Example: restrict analyst role to SELECT only
allow {
    input.action.operation == "SelectFromColumns"
    input.context.identity.groups[_] == "analyst"
}
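A policy change can be tried locally with `opa eval` before it is deployed. The sketch below builds a minimal input containing only the two fields the example rule reads. The policy file name `policy.rego` and the query `data.trino.allow` are assumptions; substitute the file and package your policy actually declares. Both helper names are hypothetical.

```shell
# Hypothetical helper: print a minimal authorization input with an operation
# and a single group for the calling identity.
opa_input() {
  local operation=$1 group=$2
  printf '{"action":{"operation":"%s"},"context":{"identity":{"user":"test","groups":["%s"]}}}\n' \
    "$operation" "$group"
}

# Evaluate the allow rule against that input, read from stdin.
# policy.rego and data.trino.allow are placeholders (see the note above).
check_policy() {
  opa_input "$1" "$2" |
    opa eval --data policy.rego --stdin-input --format pretty 'data.trino.allow'
}

# Usage:
#   check_policy SelectFromColumns analyst
#   check_policy InsertIntoTable analyst
```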

Backup & Restore

PostgreSQL Backups

AKKO uses two PostgreSQL instances. Back up both:

# Infrastructure database (Keycloak, Airflow, Superset, etc.)
kubectl exec deploy/akko-postgresql -- pg_dumpall -U postgres > infra-backup.sql

# Data database (analytics, geospatial, RAG)
kubectl exec deploy/akko-postgresql-data -- pg_dumpall -U postgres > data-backup.sql

Restore:

kubectl exec -i deploy/akko-postgresql -- psql -U postgres < infra-backup.sql
kubectl exec -i deploy/akko-postgresql-data -- psql -U postgres < data-backup.sql
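For scheduled backups, timestamped and compressed dumps are easier to retain and compare. The following is a minimal sketch built on the same `pg_dumpall` command; `backup_name` and `backup_pg` are hypothetical helpers, and the file naming scheme is an assumption.

```shell
# Hypothetical helper: build a UTC-timestamped file name, e.g. infra-<timestamp>.sql.gz
backup_name() {
  echo "$1-$(date -u +%Y%m%dT%H%M%SZ).sql.gz"
}

# Hypothetical helper: dump one instance and compress the result.
backup_pg() {
  local deploy=$1 out
  out=$(backup_name "$2")
  # pipefail makes a failed pg_dumpall fail the whole pipeline instead of
  # leaving a truncated archive that looks like a successful backup
  ( set -o pipefail
    kubectl exec "deploy/$deploy" -- pg_dumpall -U postgres | gzip > "$out" ) || return 1
  # gzip -t verifies the archive is readable before reporting its name
  gzip -t "$out" && echo "$out"
}

# Usage:
#   backup_pg akko-postgresql infra
#   backup_pg akko-postgresql-data data
# Restore from a compressed dump:
#   gunzip -c infra-<timestamp>.sql.gz | kubectl exec -i deploy/akko-postgresql -- psql -U postgres
```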

MinIO Backups

Use the MinIO Client (mc) to mirror buckets:

mc alias set akko http://minio:9000 "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD"
mc mirror akko/lakehouse /backup/lakehouse
mc mirror akko/mlflow /backup/mlflow
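When more buckets are added, the mirror commands can be driven from a list. This sketch assumes the `akko` alias configured above; `mirror_buckets`, `MC`, and `BACKUP_ROOT` are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: mirror each named bucket from the akko alias into BACKUP_ROOT.
# MC defaults to mc; set MC=echo to preview the commands without copying anything.
mirror_buckets() {
  local bucket
  for bucket in "$@"; do
    # --overwrite replaces destination objects that differ from the source
    "${MC:-mc}" mirror --overwrite "akko/$bucket" "${BACKUP_ROOT:-/backup}/$bucket" || return 1
  done
}

# Usage:
#   mirror_buckets lakehouse mlflow
```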

Keycloak Realm Export

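kubectl exec deploy/akko-keycloak -- /opt/keycloak/bin/kc.sh export \
  --dir /tmp/export --realm akko
# kubectl cp needs an explicit pod name; confirm it with kubectl get pods
# so that exec and cp target the same pod
kubectl cp akko-keycloak-0:/tmp/export/akko-realm.json ./realm-backup.json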
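A quick sanity check on the copied file catches empty or mismatched exports before they are archived. This is a sketch; `check_realm_export` is a hypothetical helper, and it only confirms that the file is non-empty and names the expected realm, not that the export is complete.

```shell
# Hypothetical helper: succeed only if the file is non-empty and contains
# a "realm" field with the expected name (default akko).
check_realm_export() {
  local file=$1 realm=${2:-akko}
  [ -s "$file" ] &&
    grep -Eq "\"realm\"[[:space:]]*:[[:space:]]*\"$realm\"" "$file"
}

# Usage:
#   check_realm_export ./realm-backup.json akko || echo "export looks incomplete" >&2
```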

Monitoring Setup

Prometheus Alerts

Key alerts to configure in monitoring/prometheus/alerts.yml:

  • PostgreSQL down — both infra and data instances
  • MinIO disk usage > 80%
  • Keycloak authentication failures > 10/min
  • Spark executor OOM kills
  • Airflow DAG failure rate > 5%
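As a starting point, the first alert in the list could look like the rule written by the sketch below. It assumes each PostgreSQL instance is scraped through postgres_exporter, which exposes the `pg_up` metric; the group name, `for` duration, and severity label are illustrative choices, and `write_sample_alerts` is a hypothetical helper.

```shell
# Hypothetical helper: write a sample Prometheus rules file to the given path.
write_sample_alerts() {
  # The quoted 'EOF' keeps the shell from expanding $labels in the template
  cat > "$1" <<'EOF'
groups:
  - name: akko-postgresql
    rules:
      - alert: PostgreSQLDown
        # pg_up comes from postgres_exporter (assumed to run for both instances)
        expr: pg_up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "PostgreSQL instance {{ $labels.instance }} is down"
EOF
}

# Usage:
#   write_sample_alerts /tmp/akko-alerts.yml
#   promtool check rules /tmp/akko-alerts.yml
```

Validating with `promtool check rules` before merging into monitoring/prometheus/alerts.yml catches syntax errors that would otherwise stop Prometheus from loading the file.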

Grafana Dashboards

Pre-configured dashboards are available:

  • AKKO Platform Overview — all service health at a glance
  • PostgreSQL Metrics — connections, query latency, replication lag
  • Spark Metrics — job duration, executor utilization
  • Trino Metrics — query throughput, memory usage

Access Grafana at https://grafana.<domain> with your Keycloak SSO credentials.


Upgrade Procedure

Always pass --set-file for the Keycloak realm

The realm file is mandatory on every helm upgrade. Omitting it resets Keycloak configuration.

# 1. Review changes
helm diff upgrade akko helm/akko/ \
  -f helm/examples/values-dev.yaml \
  --set-file akko-keycloak.realm.data=helm/examples/realm-akko-k3d.json

# 2. Upgrade (two-phase for safety)
# Phase 1: deploy without hooks (ensures DBs exist)
helm upgrade akko helm/akko/ \
  -f helm/examples/values-dev.yaml \
  --set-file akko-keycloak.realm.data=helm/examples/realm-akko-k3d.json \
  --no-hooks

# Phase 2: run hooks (migrations, init jobs)
helm upgrade akko helm/akko/ \
  -f helm/examples/values-dev.yaml \
  --set-file akko-keycloak.realm.data=helm/examples/realm-akko-k3d.json

# 3. Verify all pods are healthy
kubectl get pods -l app.kubernetes.io/instance=akko
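Because omitting the realm file resets Keycloak configuration, it may be worth wrapping the upgrade in a guard that refuses to run without it. The following is a sketch; `akko_upgrade` and the `HELM` override are hypothetical, and the chart path and value key are copied from the commands above.

```shell
# Hypothetical wrapper: abort unless the realm file exists and is non-empty,
# then always pass it via --set-file. Extra arguments (such as --no-hooks)
# are forwarded to helm. Set HELM=echo to preview the command.
akko_upgrade() {
  local values=$1 realm=$2
  shift 2
  if [ ! -s "$realm" ]; then
    echo "realm file missing or empty: $realm" >&2
    return 1
  fi
  "${HELM:-helm}" upgrade akko helm/akko/ \
    -f "$values" \
    --set-file "akko-keycloak.realm.data=$realm" \
    "$@"
}

# Usage (two-phase upgrade):
#   akko_upgrade helm/examples/values-dev.yaml helm/examples/realm-akko-k3d.json --no-hooks
#   akko_upgrade helm/examples/values-dev.yaml helm/examples/realm-akko-k3d.json
```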

Rollback

helm rollback akko <revision>

Check revision history with helm history akko. If <revision> is omitted, Helm rolls back to the previous release.