Enterprise Deployment Guide¶
This guide walks you through deploying AKKO in a production or enterprise environment. It covers prerequisites, security hardening, TLS, DNS, and day-2 operations. Every command is copy-pasteable and every values key references the actual Helm chart.
1. Prerequisites¶
Kubernetes Cluster¶
| Requirement | Minimum | Recommended |
|---|---|---|
| Kubernetes version | 1.28+ | 1.30+ |
| Nodes | 3 (control plane HA) | 5+ (dedicated workload nodes) |
| CPU (total) | 12 cores | 24 cores |
| RAM (total) | 32 GB | 64 GB |
| Storage class | Any CSI provisioner | SSD-backed (e.g., gp3, csi-cinder-high-speed) |
| Ingress controller | Traefik (bundled) or existing | Traefik v3 (bundled) |
| Load balancer | MetalLB or cloud LB | Cloud-native LB |
AKKO is tested on: k3s, kubeadm, EKS, GKE, AKS, OVHcloud MKS, and OpenShift. Only standard Kubernetes APIs are used -- no distribution-specific CRDs.
Tools¶
Install these on your workstation before proceeding:
# Verify versions
kubectl version --client # 1.28+
helm version # 3.12+
docker version # 24+ (for building custom images)
DNS¶
You need a domain (e.g., akko.example.com) with either:
- A wildcard DNS record `*.akko.example.com` pointing to your load balancer IP, or
- Individual A/CNAME records for each service (see Section 5)
TLS Certificates¶
Choose one of:
- cert-manager with Let's Encrypt (recommended for production)
- Manual wildcard certificate (e.g., from your CA)
- Self-signed (development only)
2. Planning Your Deployment¶
Deployment Profiles¶
AKKO supports three profiles, controlled by enabling or disabling sub-charts
in your values.yaml:
Minimal (~8 GB RAM, ~4 CPU)¶
Core analytics stack. Good for evaluation and small teams.
# Disable these in values-production.yaml
akko-ollama:
  enabled: false
akko-litellm:
  enabled: false
akko-lldap:
  enabled: false
loki:
  enabled: false
openmetadata:
  enabled: false
akko-opensearch:
  enabled: false
Services included: PostgreSQL, object storage, Polaris, Trino, Spark, Airflow, Superset, JupyterHub, Keycloak, OPA, Cockpit, Docs, Prometheus, Dashboards.
Standard (~20 GB RAM, ~12 CPU) -- Default¶
Full platform with AI and log aggregation. Recommended for most deployments.
Additionally enabled: Ollama (local LLM), LiteLLM (AI gateway), MLflow (experiment tracking), Loki (log aggregation), log shipper, Alertmanager.
Governance (~28 GB RAM, ~16 CPU)¶
Everything, including the data catalog and governance layer.
Additionally enabled: OpenMetadata Server, OpenSearch.
# Enable governance in values-production.yaml
openmetadata:
  enabled: true
akko-opensearch:
  enabled: true
Governance is resource-intensive
OpenMetadata + OpenSearch require an additional ~4 GB of RAM. Dedicate specific nodes or increase cluster capacity before enabling.
Storage Decisions¶
Every stateful service uses a PersistentVolumeClaim. Plan your storage:
| Service | Purpose | Recommended Size | Storage Type |
|---|---|---|---|
| PostgreSQL | Shared relational database | 50 Gi | SSD |
| object storage | S3 data lake (Iceberg tables) | 100 Gi+ | SSD |
| Prometheus | Metrics retention (15d default) | 50 Gi | SSD or HDD |
| logs layer | Log aggregation | 50 Gi | HDD |
| Ollama | LLM model files | 20 Gi | SSD |
| Dashboards | Dashboard persistence | 5 Gi | Any |
| OpenSearch | Governance catalog index | 30 Gi | SSD |
| Alertmanager | Alert state | 5 Gi | Any |
Set the storage class globally or per-service:
global:
  storageClass: "gp3-ssd" # Cluster-wide default

# Or override per service
akko-postgres:
  persistence:
    storageClass: "fast-ssd"
    size: 50Gi
Network Requirements¶
| Port | Protocol | Purpose |
|---|---|---|
| 80 | TCP | HTTP (redirects to 443) |
| 443 | TCP | HTTPS (all service ingresses) |
| 6443 | TCP | Kubernetes API (cluster admin only) |
All inter-service communication happens over the cluster network. No additional ports need to be exposed externally.
3. Changing Default Credentials (CRITICAL)¶
Do not deploy with default passwords
The default values.yaml leaves all passwords empty (auto-generated) or
uses dev defaults. In production, you must set every credential
explicitly in your values-production.yaml.
All credentials are set under global.auth in your values file. Generate
strong passwords using:
# Generate a random 32-character password
openssl rand -base64 32
# Generate a Fernet key for Airflow
python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
# Generate a base64 cookie secret for OAuth2 Proxy
openssl rand -base64 32 | head -c 32 | base64
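If Python is available, the OAuth2 Proxy cookie secret can also be generated without the openssl pipeline. This is a sketch; any method that produces 32 random bytes, base64-encoded, works:

```python
# Generate an OAuth2 Proxy cookie secret: 32 random bytes, base64-encoded
import base64
import secrets

cookie_secret = base64.b64encode(secrets.token_bytes(32)).decode()
print(cookie_secret)
```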
Complete Credentials Reference¶
Create your values-production.yaml with all credentials:
global:
  auth:
    # --- PostgreSQL ---
    # Superuser password (used by init jobs)
    postgresPassword: "<STRONG_PASSWORD_1>"
    # Application user password (used by Trino, Airflow, Superset, etc.)
    postgresAkkoPassword: "<STRONG_PASSWORD_2>"
    # --- Object storage (S3 / MinIO) ---
    minioRootUser: "akko-admin"
    minioRootPassword: "<STRONG_PASSWORD_3>"
    # --- Keycloak (Identity Provider) ---
    keycloakAdminPassword: "<STRONG_PASSWORD_4>"
    # --- Apache Polaris (Iceberg Catalog) ---
    polarisRootSecret: "<STRONG_PASSWORD_5>"
    # --- Dashboards (Grafana) ---
    grafanaAdminPassword: "<STRONG_PASSWORD_6>"
    # --- Apache Superset (Data Visualization) ---
    supersetAdminUser: "admin"
    supersetAdminPassword: "<STRONG_PASSWORD_7>"
    supersetSecretKey: "<RANDOM_HEX_32>"
    # --- Apache Airflow (Pipeline Orchestration) ---
    airflowAdminUser: "admin"
    airflowAdminPassword: "<STRONG_PASSWORD_8>"
    airflowFernetKey: "<FERNET_KEY>"
    # --- OAuth2 Proxy (ForwardAuth for services without native OIDC) ---
    oauth2ProxyCookieSecret: "<BASE64_32_BYTES>"
    # --- Keycloak OIDC Client Secrets ---
    # Each service that authenticates via Keycloak needs its own client secret.
    # These must match the secrets configured in the Keycloak realm.
    oauth2ProxyClientSecret: "<RANDOM_HEX_1>"
    grafanaClientSecret: "<RANDOM_HEX_2>"
    supersetClientSecret: "<RANDOM_HEX_3>"
    jupyterhubClientSecret: "<RANDOM_HEX_4>"
    trinoClientSecret: "<RANDOM_HEX_5>"
    airflowClientSecret: "<RANDOM_HEX_6>"
    openmetadataClientSecret: "<RANDOM_HEX_7>"
LiteLLM Master Key¶
The LiteLLM AI gateway has its own master key, configured separately:
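The key lives under the `akko-litellm` sub-chart values, mirroring the production template later in this guide (the value shown is a placeholder):

```yaml
akko-litellm:
  config:
    general_settings:
      master_key: "<LITELLM_MASTER_KEY>"
```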
JupyterHub Database URL¶
The JupyterHub hub connects directly to PostgreSQL. Update the password in the connection string:
jupyterhub:
  hub:
    db:
      type: postgres
      url: "postgresql+psycopg2://postgres:<POSTGRES_PASSWORD>@akko-postgresql:5432/jupyterhub"
Airflow Metadata Connection¶
Airflow also requires an explicit database password:
airflow:
  data:
    metadataConnection:
      protocol: postgresql
      host: "akko-postgresql"
      port: 5432
      db: airflow_metadata
      user: postgres
      pass: "<POSTGRES_PASSWORD>"
Superset Database Connection¶
superset:
  supersetNode:
    connections:
      db_host: "akko-postgresql"
      db_port: "5432"
      db_user: "postgres"
      db_pass: "<POSTGRES_PASSWORD>"
      db_name: "superset"
Use Kubernetes Secrets for additional security
For maximum security, use existingSecret references instead of
plaintext passwords in values. Many sub-charts support this pattern.
See Section 9 for details.
4. TLS Certificate Configuration¶
Option A: cert-manager with Let's Encrypt (Recommended)¶
Install cert-manager if not already present:
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager --create-namespace \
--set crds.enabled=true
Create a ClusterIssuer:
# cluster-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            ingressClassName: traefik
Configure AKKO to use it:
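A sketch of the corresponding values, using the same `global.tls` keys shown under Option B:

```yaml
global:
  tls:
    enabled: true
    issuer: "letsencrypt-prod"  # Must match the ClusterIssuer name
    secretName: akko-tls        # cert-manager stores the issued certificate here
```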
Option B: Manual Certificates (Wildcard or Per-Service)¶
If you have a wildcard certificate from your organization's CA:
# Create the TLS secret from your certificate files
kubectl create secret tls akko-tls \
--cert=wildcard.crt \
--key=wildcard.key \
-n akko
Configure AKKO to use the existing secret:
global:
  tls:
    enabled: true
    issuer: "" # Empty = no cert-manager
    secretName: akko-tls # Must match the secret name above
Option C: Self-Signed (Development Only)¶
For development or internal testing only:
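One way to generate a self-signed wildcard certificate (filenames and the CN are placeholders; load the result with the `kubectl create secret tls` command from Option B):

```shell
# Self-signed wildcard certificate for *.akko.example.com, valid 1 year
openssl req -x509 -newkey rsa:4096 -nodes \
  -keyout wildcard.key -out wildcard.crt \
  -days 365 -subj "/CN=*.akko.example.com" \
  -addext "subjectAltName=DNS:*.akko.example.com,DNS:akko.example.com"
```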
Self-signed certificates cause browser warnings
Users will see certificate warnings. Some services (JupyterHub OIDC
callbacks, Superset OAuth) may require tls_verify: false settings,
which are insecure for production.
Traefik TLS Configuration¶
Traefik is the bundled ingress controller. TLS termination happens at Traefik, and backend services communicate over HTTP within the cluster.
traefik:
  enabled: true
  service:
    type: LoadBalancer
    annotations:
      # Cloud-specific annotations (e.g., AWS NLB)
      # service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
  ports:
    web:
      exposedPort: 80
    websecure:
      exposedPort: 443
If you already have an ingress controller (e.g., nginx-ingress on RKE2), disable Traefik and set the ingress class:
traefik:
  enabled: false
# Set the ingress class for all services
# (Each service's ingress.className will use this)
5. DNS Configuration¶
Required DNS Entries¶
Every AKKO service exposes an ingress hostname under your base domain. The
global.domain value determines the suffix.
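It is set once in your values file, using this guide's example domain:

```yaml
global:
  domain: akko.example.com
```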
This generates the following hostnames:
| Service | Hostname | Purpose |
|---|---|---|
| Cockpit | akko.example.com | Platform portal (landing page) |
| Documentation | docs.akko.example.com | Platform documentation |
| Keycloak | keycloak.akko.example.com | Identity & access management |
| Trino | trino.akko.example.com | Distributed SQL engine |
| Spark | spark.akko.example.com | Spark master UI |
| Object storage console | minio.akko.example.com | Object storage console |
| Object storage API | minio-api.akko.example.com | S3 API endpoint |
| Superset | superset.akko.example.com | Data visualization |
| JupyterHub | jupyter.akko.example.com | Notebook environment |
| Airflow | airflow.akko.example.com | Pipeline orchestration |
| Dashboards | grafana.akko.example.com | Monitoring dashboards |
| Polaris | polaris.akko.example.com | Iceberg REST catalog |
| MLflow | mlflow.akko.example.com | ML experiment tracking |
| Ollama | ollama.akko.example.com | Local LLM API |
| LiteLLM | litellm.akko.example.com | AI gateway |
| Prometheus | prometheus.akko.example.com | Metrics |
| Alertmanager | alertmanager.akko.example.com | Alert routing |
| OpenMetadata | catalog.akko.example.com | Data catalog (governance) |
| Directory service | lldap.akko.example.com | LDAP admin (optional) |
| Traefik | traefik.akko.example.com | Ingress dashboard |
Wildcard DNS (Recommended)¶
The simplest approach is a single wildcard record pointing to your load balancer:
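In BIND zone-file syntax, for example (the IP is a placeholder from the documentation range; use your load balancer's external IP):

```
*.akko.example.com.  300  IN  A  203.0.113.10
```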
Split-Horizon DNS¶
For environments where internal and external access use different IPs:
| Zone | Record | Target |
|---|---|---|
| External | *.akko.example.com | Public load balancer IP |
| Internal | *.akko.example.com | Internal/private load balancer IP |
This ensures that in-office users route directly to the internal network while external users go through the public endpoint.
6. Step-by-Step Installation¶
Step 1: Clone the Repository¶
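A sketch of the clone step; the repository remote is a placeholder, substitute your actual source:

```shell
# Replace <YOUR_GIT_REMOTE> with the actual AKKO repository location
git clone <YOUR_GIT_REMOTE>/AKKO.git
cd AKKO
```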
Step 2: Create Your Production Values File¶
Start from the production defaults and customize:
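A typical starting point, assuming the chart's default values file lives at `helm/akko/values.yaml` in your checkout (adjust the path if your layout differs):

```shell
# Copy the chart defaults as the basis for your production values
cp helm/akko/values.yaml values-production.yaml
```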
Edit values-production.yaml with your settings. Here is a minimal
production template:
# =============================================================================
# AKKO — Production Values Template
# =============================================================================
global:
  domain: akko.example.com
  storageClass: "gp3-ssd" # Your cluster's SSD storage class
  tls:
    enabled: true
    issuer: "letsencrypt-prod" # cert-manager ClusterIssuer
    secretName: akko-tls
  auth:
    postgresPassword: "<CHANGE_ME>"
    postgresAkkoPassword: "<CHANGE_ME>"
    minioRootUser: "akko-admin"
    minioRootPassword: "<CHANGE_ME>"
    keycloakAdminPassword: "<CHANGE_ME>"
    polarisRootSecret: "<CHANGE_ME>"
    grafanaAdminPassword: "<CHANGE_ME>"
    supersetAdminPassword: "<CHANGE_ME>"
    supersetSecretKey: "<CHANGE_ME>"
    airflowAdminPassword: "<CHANGE_ME>"
    airflowFernetKey: "<CHANGE_ME>"
    oauth2ProxyCookieSecret: "<CHANGE_ME>"
    # OIDC client secrets (generate unique values)
    oauth2ProxyClientSecret: "<CHANGE_ME>"
    grafanaClientSecret: "<CHANGE_ME>"
    supersetClientSecret: "<CHANGE_ME>"
    jupyterhubClientSecret: "<CHANGE_ME>"
    trinoClientSecret: "<CHANGE_ME>"
    airflowClientSecret: "<CHANGE_ME>"
  image:
    registry: "registry.example.com/akko/" # Your private registry
    pullPolicy: IfNotPresent
    pullSecrets:
      - name: registry-credentials

# --- Infrastructure ---
traefik:
  enabled: true
  service:
    type: LoadBalancer

akko-postgres:
  persistence:
    size: 50Gi
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 4Gi

akko-keycloak:
  hostname: "https://keycloak.akko.example.com"
  ingress:
    host: "keycloak.akko.example.com"

# --- Data Lake ---
minio:
  mode: distributed # HA: distributed with 4 replicas
  replicas: 4
  persistence:
    size: 100Gi
  consoleIngress:
    hosts:
      - minio.akko.example.com
  ingress:
    hosts:
      - minio-api.akko.example.com

# --- Compute ---
trino:
  server:
    workers: 3 # Production: 3+ workers
    config:
      query:
        maxMemory: "8GB"
        maxMemoryPerNode: "2GB"
  coordinator:
    resources:
      requests:
        cpu: "1"
        memory: 2Gi
      limits:
        cpu: "4"
        memory: 8Gi
  worker:
    resources:
      requests:
        cpu: "1"
        memory: 2Gi
      limits:
        cpu: "4"
        memory: 8Gi

akko-spark:
  worker:
    replicaCount: 3 # Production: 3+ workers
    memory: "4G"
    cores: 4

# --- Orchestration ---
airflow:
  executor: KubernetesExecutor # Production: KubernetesExecutor
  config:
    core:
      executor: KubernetesExecutor

# --- AI / LLM ---
akko-ollama:
  gpu:
    enabled: true # Enable if GPU nodes available
    # nvidia.com/gpu: 1
  models:
    - qwen2.5-coder:7b
    - qwen2.5:3b
    - nomic-embed-text

akko-litellm:
  config:
    general_settings:
      master_key: "<CHANGE_ME>"

# --- Monitoring ---
monitoring:
  prometheus:
    prometheusSpec:
      retention: 30d
      storageSpec:
        volumeClaimTemplate:
          spec:
            resources:
              requests:
                storage: 50Gi
  grafana:
    persistence:
      enabled: true
      size: 10Gi
  alertmanager:
    enabled: true

loki:
  enabled: true
  loki:
    persistence:
      size: 50Gi

# --- Governance (optional) ---
openmetadata:
  enabled: false # Set to true if needed
akko-opensearch:
  enabled: false
Step 3: Build and Push Custom Images¶
AKKO includes 12 custom Docker images that must be built and pushed to your container registry:
# Set your registry and tag
export AKKO_REGISTRY="registry.example.com/akko"
export AKKO_TAG="2026.03"
# Build all 12 custom images
bash helm/scripts/build-images.sh
This builds and pushes:
| Image | Source | Purpose |
|---|---|---|
| akko-postgres | docker/postgres/Dockerfile | PostgreSQL with PostGIS + pgvector + pgaudit |
| akko-spark | docker/spark/Dockerfile | Spark with Iceberg + AWS + Connect JARs |
| akko-notebook | docker/jupyterhub/Dockerfile.notebook | JupyterLab + R + Julia + Scala/Almond + Quarto + Mermaid + ML libs |
| akko-mlflow | docker/mlflow/Dockerfile | MLflow tracking server with S3/PostgreSQL backends |
| akko-cockpit | branding/cockpit/Dockerfile | Portal (nginx + 8 pages + health proxies) |
| akko-trino | docker/trino-ai-functions/Dockerfile | Trino with AI functions plugin (14 scalar functions) |
| akko-ai-service | docker/ai-service/Dockerfile | FastAPI AI service (sentiment, classify, summarize, etc.) |
| akko-airflow | docker/airflow/Dockerfile | Airflow 3.1.7 + trino/mlflow/boto3/psycopg2/PyJWT |
| akko-dbt | docker/dbt/Dockerfile | dbt Core + dbt-trino adapter (semantic layer) |
| akko-mcp-trino | docker/mcp-trino/Dockerfile | MCP Server for Trino SQL + AI functions |
| akko-mcp-openmetadata | docker/mcp-openmetadata/Dockerfile | MCP Server for OpenMetadata (catalog discovery, lineage) |
Update your values-production.yaml to reference the images in your registry:
global:
  image:
    registry: "registry.example.com/akko/"

akko-postgres:
  image:
    repository: registry.example.com/akko/akko-postgres
    tag: "2026.03"

akko-spark:
  image:
    repository: registry.example.com/akko/akko-spark
    tag: "2026.03"

jupyterhub:
  singleuser:
    image:
      name: registry.example.com/akko/akko-notebook
      tag: "2026.03"

akko-mlflow:
  image:
    repository: registry.example.com/akko/akko-mlflow
    tag: "2026.03"

akko-cockpit:
  image:
    repository: registry.example.com/akko/akko-cockpit
    tag: "2026.03"
If your registry requires authentication, create the pull secret:
kubectl create namespace akko
kubectl create secret docker-registry registry-credentials \
--docker-server=registry.example.com \
--docker-username=<USERNAME> \
--docker-password=<PASSWORD> \
-n akko
Step 4: Update Helm Dependencies¶
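The update itself is one command from the chart directory (the `helm/akko` path follows this guide's repository layout):

```shell
cd helm/akko
helm dependency update .
cd ../..
```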
This downloads all 29 chart dependencies (Traefik, object storage, Trino, Airflow, Superset, JupyterHub, kube-prometheus-stack, logs layer, OpenMetadata, and AKKO custom sub-charts).
Step 5: Prepare the Keycloak Realm¶
AKKO ships with a pre-configured Keycloak realm containing OAuth2 clients for
every service. For production, you must update the client secrets to match
your values-production.yaml:
- Copy the realm template:
- In `realm-production.json`, update every `"secret"` field for each client (`jupyterhub`, `superset`, `airflow`, `grafana`, `trino`, `oauth2-proxy`, `openmetadata`) to match the corresponding `global.auth.*ClientSecret` values in your `values-production.yaml`.
- Update all URLs from `akko.local` to your production domain (`akko.example.com`).
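The copy step might look like the following; the template's path inside the repository is an assumption, so locate the realm JSON shipped with the akko-keycloak sub-chart in your checkout:

```shell
# Hypothetical template path -- adjust to where the realm JSON lives in your checkout
cp helm/akko/charts/akko-keycloak/realm/akko-realm.json realm-production.json
```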
Step 6: Install AKKO¶
A single command deploys all services, init jobs, and seeds. Helm's
--wait and --wait-for-jobs flags ensure that databases are ready before
dependent services start:
helm upgrade --install akko helm/akko/ -n akko --create-namespace \
--wait --wait-for-jobs --timeout 20m \
-f values-production.yaml \
--set-file akko-keycloak.realm.data=realm-production.json
Realm file is required on every upgrade
The --set-file akko-keycloak.realm.data=... flag is mandatory on
every helm upgrade. Omitting it will cause Keycloak to lose its realm
configuration.
Step 7: Verify the Deployment¶
# Check all pods are running
kubectl get pods -n akko
# Check services have endpoints
kubectl get svc -n akko
# Check ingress resources
kubectl get ingress -n akko
# Check PersistentVolumeClaims are bound
kubectl get pvc -n akko
# Check for any failed pods
kubectl get pods -n akko --field-selector=status.phase!=Running,status.phase!=Succeeded
All pods should reach Running status within 5-10 minutes. The Ollama
model-pull init job may take longer depending on network speed (models are
several GB).
Step 8: First Login¶
- Open the Cockpit at https://akko.example.com
- Log in to the Keycloak admin console at https://keycloak.akko.example.com with the admin credentials you set in `global.auth.keycloakAdminPassword`
- Verify SSO by clicking any service link in the Cockpit -- you should be redirected to Keycloak for authentication
7. Post-Installation¶
Create Users in Keycloak¶
- Go to https://keycloak.akko.example.com/admin
- Select the akko realm
- Navigate to Users and click Create new user
- After creating the user, go to the Credentials tab to set a password
- Go to the Role mapping tab to assign a platform role
AKKO RBAC Roles¶
The Keycloak realm defines 5 platform roles that propagate to all services:
| Role | Trino | Airflow | Superset | Dashboards | Description |
|---|---|---|---|---|---|
| akko-admin | Full access | Admin | Admin | Admin | Platform administrator |
| akko-engineer | Create tables, write | User | Engineer | Editor | Data engineer |
| akko-analyst | Read-only | User | Analyst | Editor | Business analyst |
| akko-user | Read basic tables | User | Viewer | Viewer | Default authenticated user |
| akko-viewer | Dashboards only | Viewer | Viewer | Viewer | Executive / read-only |
Verify SSO for Each Service¶
After creating a test user, verify SSO works by logging in to each service:
- https://trino.akko.example.com -- Trino Web UI
- https://airflow.akko.example.com -- Airflow 3 API Server
- https://superset.akko.example.com -- Superset dashboards
- https://jupyter.akko.example.com -- JupyterHub notebooks
- https://grafana.akko.example.com -- Dashboards
- https://catalog.akko.example.com -- OpenMetadata (if enabled)
Each should redirect to Keycloak for authentication and then back to the service with the correct role.
Import Demo Data (Optional)¶
AKKO includes demo notebooks and DAGs that create sample datasets:
- Log in to JupyterHub as a user with the `akko-admin` role
- Open the `akko-banking-demo.ipynb` notebook from the `work/` directory
- Run all cells to create the banking demo dataset in the Iceberg lakehouse
The Airflow DAGs (akko_e2e_pipeline, akko_data_quality_dag,
akko_catalog_sync_dag) are pre-loaded and can be triggered from the
Airflow UI.
Configure Monitoring Alerts¶
Alertmanager is deployed with the monitoring stack. Configure notification receivers:
monitoring:
  alertmanager:
    config:
      receivers:
        - name: "team-email"
          email_configs:
            - to: "ops@example.com"
              from: "akko-alerts@example.com"
              smarthost: "smtp.example.com:587"
        - name: "slack"
          slack_configs:
            - api_url: "https://hooks.slack.com/services/..."
              channel: "#akko-alerts"
      route:
        receiver: "team-email"
        routes:
          - match:
              severity: critical
            receiver: "slack"
8. Day-2 Operations¶
Upgrading AKKO¶
To upgrade to a new AKKO version:
# Pull the latest chart
cd AKKO
git pull origin main
# Update dependencies
cd helm/akko
helm dependency update .
# Rebuild custom images (if Dockerfiles changed)
export AKKO_REGISTRY="registry.example.com/akko"
export AKKO_TAG="2026.04" # New version tag
bash helm/scripts/build-images.sh
# Apply the upgrade (realm file is mandatory!)
helm upgrade akko ./helm/akko \
--namespace akko \
-f ../../values-production.yaml \
--set-file akko-keycloak.realm.data=../../realm-production.json \
--timeout 15m
Rolling updates
Kubernetes performs rolling updates by default. Services with multiple replicas (Trino workers, Spark workers, object storage distributed) will be updated one pod at a time with zero downtime.
Backup and Restore¶
PostgreSQL¶
# Backup all databases
kubectl exec -n akko statefulset/akko-postgresql -- \
pg_dumpall -U postgres > akko-pg-backup-$(date +%Y%m%d).sql
# Restore
cat akko-pg-backup-20260313.sql | \
kubectl exec -i -n akko statefulset/akko-postgresql -- \
psql -U postgres
object storage (S3 Data Lake)¶
# Use any S3-compatible CLI (the AWS CLI is fine).
aws --endpoint-url https://minio-api.akko.example.com \
  s3 sync s3://akko-warehouse /backup/akko-warehouse/
Keycloak Realm¶
# Export the realm (includes users if requested)
kubectl exec -n akko deploy/akko-akko-keycloak -- \
/opt/keycloak/bin/kc.sh export --realm akko --dir /tmp/export
kubectl cp akko/$(kubectl get pod -n akko -l app=akko-keycloak -o name | head -1 | cut -d/ -f2):/tmp/export ./keycloak-export/
Scaling Services¶
Scale individual services by updating replica counts in your values file:
# Scale Trino workers
trino:
  server:
    workers: 5

# Scale Spark workers
akko-spark:
  worker:
    replicaCount: 5

# Scale the object storage layer to distributed mode
minio:
  mode: distributed
  replicas: 4
Apply with:
helm upgrade akko ./helm/akko -n akko \
-f ../../values-production.yaml \
--set-file akko-keycloak.realm.data=../../realm-production.json
For on-demand scaling with Horizontal Pod Autoscalers (HPA), configure directly in Kubernetes:
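A sketch of an HPA targeting the Trino workers; the workload name `akko-trino-worker` is an assumption, so check `kubectl get deploy -n akko` for the real name in your release:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: trino-worker
  namespace: akko
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: akko-trino-worker   # assumed name -- verify against your release
  minReplicas: 3
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
```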
Adding Users and Managing Access¶
All user management is centralized in Keycloak:
- Create user: Keycloak Admin Console > Users > Create
- Assign role: User > Role mapping > Assign `akko-admin`, `akko-engineer`, etc.
- LDAP federation: Keycloak > User federation > Add LDAP provider (connect to Active Directory or the directory service)
- Group mapping: Create Keycloak groups, assign roles to groups, then add users to groups for bulk role assignment
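User creation can also be scripted with Keycloak's bundled kcadm.sh CLI. This is a sketch: the deployment name follows this guide's realm-export example, and the `KEYCLOAK_ADMIN_PASSWORD` environment variable assumes the standard Keycloak image:

```shell
# Create a user and assign a platform role via kcadm.sh (run inside the Keycloak pod)
kubectl exec -n akko deploy/akko-akko-keycloak -- bash -c '
  /opt/keycloak/bin/kcadm.sh config credentials \
    --server http://localhost:8080 --realm master \
    --user admin --password "$KEYCLOAK_ADMIN_PASSWORD"
  /opt/keycloak/bin/kcadm.sh create users -r akko \
    -s username=jane.doe -s enabled=true
  /opt/keycloak/bin/kcadm.sh add-roles -r akko \
    --uusername jane.doe --rolename akko-analyst
'
```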
Monitoring and Alerting¶
Access the monitoring stack:
- Dashboards: https://grafana.akko.example.com -- pre-built dashboards for cluster, Trino, Spark, Airflow, PostgreSQL, object storage
- Prometheus: https://prometheus.akko.example.com -- raw metrics and PromQL queries
- Alertmanager: https://alertmanager.akko.example.com -- alert routing and silencing
Key metrics to monitor:
| Metric | Alert Threshold | Service |
|---|---|---|
| Pod restart count | > 3 in 10 min | All |
| PostgreSQL connection pool | > 80% | akko-postgresql |
| object storage disk usage | > 85% | akko-minio |
| Trino query failures | > 10% rate | akko-trino |
| Airflow DAG failure rate | > 5% | akko-airflow |
| Node memory pressure | > 90% | Cluster |
9. Security Hardening Checklist¶
Use this checklist before going live:
Credentials¶
- [ ] All `global.auth.*` passwords are unique, strong, and not defaults
- [ ] LiteLLM master key is set (`akko-litellm.config.general_settings.master_key`)
- [ ] All Keycloak OIDC client secrets match between realm JSON and values
- [ ] Keycloak admin password is strong and stored securely
- [ ] PostgreSQL volume is clean (no leftover dev passwords)
TLS¶
- [ ] TLS is enabled (`global.tls.enabled: true`)
- [ ] cert-manager is installed with a valid ClusterIssuer
- [ ] All ingress hostnames serve valid certificates
- [ ] Internal services do not expose HTTP externally
- [ ] `tls_skip_verify_insecure` settings are removed or set to `false`
Network¶
- [ ] Traefik LoadBalancer has appropriate firewall rules
- [ ] Only ports 80 and 443 are exposed externally
- [ ] Kubernetes API (6443) is not publicly accessible
- [ ] Network policies restrict pod-to-pod traffic (enable with `global.networkPolicies: true`)
- [ ] JupyterHub single-user pods have controlled egress
Pod Security¶
- [ ] Pods run as non-root (`global.podSecurityContext.runAsNonRoot: true` -- default)
- [ ] No containers run with `privileged: true`
- [ ] All containers drop `ALL` capabilities
- [ ] Pod Security Standards are enforced at the namespace level:
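One way to enforce the standards via the built-in Pod Security admission labels; the `restricted` profile matches the checklist above, though some sub-charts may require relaxing `enforce` to `baseline`:

```shell
kubectl label namespace akko \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/warn=restricted
```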
Secrets Management¶
For enterprise deployments, consider external secrets management:
# Example: External Secrets Operator with HashiCorp Vault
# Install the operator, then create ExternalSecret resources
# that sync from Vault to Kubernetes Secrets
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: akko-postgresql
  namespace: akko
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: akko-postgresql
  data:
    - secretKey: postgres-password
      remoteRef:
        key: akko/postgresql
        property: password
Audit Logging¶
- [ ] Keycloak audit events are enabled (login, token exchange, admin actions)
- [ ] Kubernetes audit logging is configured at the cluster level
- [ ] Airflow task logs are persisted (not ephemeral)
- [ ] Dashboards and Superset track user login events
Image Security¶
- [ ] All images use pinned version tags (no `:latest`)
- [ ] Custom images are scanned for vulnerabilities before deployment
- [ ] Private registry uses TLS and authentication
- [ ] `global.image.pullSecrets` is configured for private registries
Further Reading¶
- Kubernetes Deployment Guide -- quick start and platform-specific guides (k3s, OVHcloud, OpenShift)
- Architecture Overview -- how services connect
- RBAC Administration -- detailed role configuration
- Configuration Reference -- environment variables and tuning
- Troubleshooting Guide -- common issues and fixes
- Adding a Service -- extending the platform