Admin Starter Kit
This guide covers everything an AKKO platform administrator needs to harden, operate, and maintain a production deployment.
Production Hardening Checklist
TLS Certificates
Traefik handles TLS termination. For production, replace the self-signed dev certificates:
# helm/akko/values.yaml
traefik:
  tls:
    certificates:
      - certFile: /certs/tls.crt
        keyFile: /certs/tls.key
Never expose services without TLS in production
All inter-service communication should use TLS or stay within the cluster network.
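Before deploying a replacement certificate, it can help to confirm that it is not just another self-signed one. The sketch below shows one way to do that, assuming the `openssl` CLI is installed. The helper name `is_self_signed` is hypothetical, and comparing subject with issuer is a heuristic, not a full chain validation.

```shell
# Heuristic: a certificate whose subject equals its issuer is self-signed.
# is_self_signed is a hypothetical helper name, not part of AKKO.
# Returns 0 if self-signed, 1 if not, 2 if the file cannot be read or parsed.
is_self_signed() {
  cert_file=$1
  if [ ! -r "$cert_file" ]; then
    echo "cannot read certificate: $cert_file" >&2
    return 2
  fi
  # Strip the "subject=" / "issuer=" prefixes so the two names can be compared.
  subject=$(openssl x509 -noout -subject -in "$cert_file" 2>/dev/null | sed 's/^subject= *//')
  issuer=$(openssl x509 -noout -issuer -in "$cert_file" 2>/dev/null | sed 's/^issuer= *//')
  if [ -z "$subject" ] || [ -z "$issuer" ]; then
    echo "could not parse certificate: $cert_file" >&2
    return 2
  fi
  [ "$subject" = "$issuer" ]
}
```

Run it against the certificate file you intend to deploy, for example `is_self_signed tls.crt && echo "still self-signed"`.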
Secrets Management
- Store all secrets in Kubernetes Secrets (not in values.yaml)
- Use --set or --set-file at deploy time for sensitive values
- Rotate secrets quarterly: PostgreSQL passwords, Keycloak admin, MinIO credentials
- Never commit .env files or traefik/certs/ to version control
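The last rule in the list above can be enforced mechanically. The following is a minimal sketch of a Git pre-commit check, assuming the repository layout implied by the paths in this guide. The function names `is_forbidden_path` and `check_staged_files` are hypothetical.

```shell
# Return 0 if a repository path matches something that must never be committed:
# any .env file (including variants such as .env.local) or anything under traefik/certs/.
is_forbidden_path() {
  case "$1" in
    .env|*/.env|.env.*|*/.env.*|traefik/certs/*) return 0 ;;
    *) return 1 ;;
  esac
}

# List staged files that break the rule; returns 1 if any are found.
check_staged_files() {
  bad=$(git diff --cached --name-only | while IFS= read -r path; do
    if is_forbidden_path "$path"; then
      printf '%s\n' "$path"
    fi
  done)
  if [ -n "$bad" ]; then
    printf 'refusing to commit sensitive files:\n%s\n' "$bad" >&2
    return 1
  fi
}
```

To use it as a hook, save the functions in `.git/hooks/pre-commit`, add a final line that calls `check_staged_files`, and make the file executable.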
Resource Limits
Every pod must have resource requests and limits defined.
Recommended values for critical services:
| Service | CPU Request | Memory Request | Memory Limit |
|---|---|---|---|
| PostgreSQL (infra) | 250m | 512Mi | 1Gi |
| PostgreSQL (data) | 500m | 1Gi | 2Gi |
| Keycloak | 250m | 512Mi | 1Gi |
| OpenMetadata | 500m | 2Gi | 2.5Gi |
| Spark Connect | 500m | 1Gi | 2Gi |
| Trino | 500m | 1Gi | 2Gi |
| JupyterHub | 250m | 512Mi | 1Gi |
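To illustrate how a row of the table maps onto a workload, the helper below prints a `kubectl set resources` command without running it. This is a sketch under assumptions: the helper name `resources_cmd` is hypothetical, and the workload kind and name must match your cluster. Because AKKO is deployed with Helm, a change applied this way may be overwritten by the next upgrade, so persistent settings belong in the chart values.

```shell
# Print the kubectl command that applies one row of the recommendations table.
# Arguments: workload, CPU request, memory request, memory limit.
# The table defines no CPU limit, so none is set here.
resources_cmd() {
  printf 'kubectl set resources %s --requests=cpu=%s,memory=%s --limits=memory=%s\n' \
    "$1" "$2" "$3" "$4"
}

# Keycloak row; deployment/akko-keycloak is an assumed workload name.
resources_cmd deployment/akko-keycloak 250m 512Mi 1Gi
```

Review the printed command before running it, for example by copying it into a terminal.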
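Pod Disruption Budgets (PDBs)
For high availability, define PDBs on stateful services. A PDB with minAvailable: 1 only helps when the service runs more than one replica; with a single replica it blocks voluntary evictions such as node drains: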
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: akko-postgresql-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: akko-postgresql
RBAC Management
AKKO uses Keycloak for SSO with 5 predefined roles:
| Role | Description | Access |
|---|---|---|
| admin | Platform administrator | Full access to all services, user management, configuration |
| engineer | Data/platform engineer | Airflow DAGs, Spark jobs, Trino queries, MLflow, JupyterHub |
| analyst | Data analyst | Superset dashboards, Trino read-only queries, JupyterHub notebooks |
| steward | Data steward | OpenMetadata catalog, data quality, glossary, lineage management |
| viewer | Read-only viewer | Cockpit portal, Superset dashboards (view only), Grafana dashboards |
Managing Users
- Log in to Keycloak Admin Console at https://keycloak.<domain>/admin
- Navigate to Users > Add user
- Assign the appropriate role under Role Mappings
- For LDAP integration, configure the LLDAP federation under User Federation
OPA Fine-Grained Access
Trino queries are governed by OPA policies. Edit policies in helm/akko/charts/akko-opa/:
# Example: allow the analyst role to run SELECT (column reads) only.
# This is restrictive only when the policy denies by default (default allow := false).
allow {
    input.action.operation == "SelectFromColumns"
    input.context.identity.groups[_] == "analyst"
}
Backup & Restore
PostgreSQL Backups
AKKO uses two PostgreSQL instances. Back up both:
# Infrastructure database (Keycloak, Airflow, Superset, etc.)
kubectl exec deploy/akko-postgresql -- pg_dumpall -U postgres > infra-backup.sql
# Data database (analytics, geospatial, RAG)
kubectl exec deploy/akko-postgresql-data -- pg_dumpall -U postgres > data-backup.sql
Restore:
kubectl exec -i deploy/akko-postgresql -- psql -U postgres < infra-backup.sql
kubectl exec -i deploy/akko-postgresql-data -- psql -U postgres < data-backup.sql
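For scheduled backups, fixed file names such as infra-backup.sql overwrite the previous dump. The sketch below wraps the same `pg_dumpall` commands and writes timestamped files instead. The function names `backup_name` and `backup_postgres` and the file naming scheme are assumptions, not part of AKKO.

```shell
# Build a timestamped dump file name from a label and a timestamp.
backup_name() {
  printf '%s-backup-%s.sql' "$1" "$2"
}

# Dump both PostgreSQL instances into the directory given as $1
# (default: current directory). Requires kubectl access to the cluster.
backup_postgres() {
  out_dir=${1:-.}
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  # Each pair is <label>:<deployment name>, matching the two commands above.
  for pair in infra:akko-postgresql data:akko-postgresql-data; do
    label=${pair%%:*}
    deploy=${pair#*:}
    out="$out_dir/$(backup_name "$label" "$stamp")"
    kubectl exec "deploy/$deploy" -- pg_dumpall -U postgres > "$out" || return 1
    # An empty file usually means the dump produced no output.
    if [ ! -s "$out" ]; then
      echo "empty dump: $out" >&2
      return 1
    fi
  done
}
```

Call `backup_postgres /backup/postgres` from a cron job or CI task; the target directory is an example and must already exist.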
MinIO Backups
Use the MinIO Client (mc) to mirror buckets:
mc alias set akko http://minio:9000 "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD"
mc mirror akko/lakehouse /backup/lakehouse
mc mirror akko/mlflow /backup/mlflow
Keycloak Realm Export
kubectl exec deploy/akko-keycloak -- /opt/keycloak/bin/kc.sh export \
  --dir /tmp/export --realm akko
kubectl cp akko-keycloak-0:/tmp/export/akko-realm.json ./realm-backup.json
Monitoring Setup
Prometheus Alerts
Key alerts to configure in monitoring/prometheus/alerts.yml:
- PostgreSQL down — both infra and data instances
- MinIO disk usage > 80%
- Keycloak authentication failures > 10/min
- Spark executor OOM kills
- Airflow DAG failure rate > 5%
Grafana Dashboards
Pre-configured dashboards are available:
- AKKO Platform Overview — all service health at a glance
- PostgreSQL Metrics — connections, query latency, replication lag
- Spark Metrics — job duration, executor utilization
- Trino Metrics — query throughput, memory usage
Access Grafana at https://grafana.<domain> with your Keycloak SSO credentials.
Upgrade Procedure
Always pass --set-file for the Keycloak realm
The realm file is mandatory on every helm upgrade. Omitting it resets Keycloak configuration.
# 1. Review changes (requires the helm-diff plugin)
helm diff upgrade akko helm/akko/ \
  -f helm/examples/values-dev.yaml \
  --set-file akko-keycloak.realm.data=helm/examples/realm-akko-k3d.json

# 2. Upgrade (two-phase for safety)
# Phase 1: deploy without hooks (ensures DBs exist)
helm upgrade akko helm/akko/ \
  -f helm/examples/values-dev.yaml \
  --set-file akko-keycloak.realm.data=helm/examples/realm-akko-k3d.json \
  --no-hooks

# Phase 2: run hooks (migrations, init jobs)
helm upgrade akko helm/akko/ \
  -f helm/examples/values-dev.yaml \
  --set-file akko-keycloak.realm.data=helm/examples/realm-akko-k3d.json

# 3. Verify all pods are healthy
kubectl get pods -l app.kubernetes.io/instance=akko
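Because omitting the realm file resets Keycloak configuration, a small wrapper can refuse to run when the file is missing. The sketch below assumes the paths used in the commands above as defaults. The names `realm_file_ok` and `akko_upgrade` and the `REALM_FILE` and `VALUES_FILE` environment variables are hypothetical.

```shell
# Return 0 only if the realm file exists and is non-empty.
realm_file_ok() {
  [ -s "$1" ]
}

# Run helm upgrade with the realm file always attached.
# Extra arguments are forwarded, e.g. `akko_upgrade --no-hooks` for phase 1.
akko_upgrade() {
  realm=${REALM_FILE:-helm/examples/realm-akko-k3d.json}
  values=${VALUES_FILE:-helm/examples/values-dev.yaml}
  if ! realm_file_ok "$realm"; then
    echo "refusing to upgrade: realm file '$realm' is missing or empty" >&2
    return 1
  fi
  helm upgrade akko helm/akko/ \
    -f "$values" \
    --set-file "akko-keycloak.realm.data=$realm" \
    "$@"
}
```

With this in place, the two phases become `akko_upgrade --no-hooks` followed by `akko_upgrade`.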
Rollback
Check revision history with helm history akko.
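Once a target revision is known, `helm rollback` restores it; if the revision is omitted, Helm rolls back to the previous release. The helper below is a sketch that only prints the command so it can be reviewed first. The name `rollback_cmd` is hypothetical, and adding `--wait` is a choice made for this example, not something this guide prescribes.

```shell
# Print the helm rollback command for the akko release.
# $1 = revision number from `helm history akko`; omit it to target the previous revision.
rollback_cmd() {
  if [ -n "${1:-}" ]; then
    # Reject anything that is not a plain non-negative integer.
    case "$1" in
      *[!0-9]*)
        echo "revision must be a number: $1" >&2
        return 1
        ;;
    esac
    printf 'helm rollback akko %s --wait\n' "$1"
  else
    printf 'helm rollback akko --wait\n'
  fi
}
```

For example, `rollback_cmd 7` prints `helm rollback akko 7 --wait`. After running the printed command, check pod health with the same `kubectl get pods` command used after an upgrade.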