Enterprise Deployment Guide

This guide walks you through deploying AKKO in a production or enterprise environment. It covers prerequisites, security hardening, TLS, DNS, and day-2 operations. Every command is copy-pasteable and every values key references the actual Helm chart.


1. Prerequisites

Kubernetes Cluster

| Requirement        | Minimum                       | Recommended                                   |
|--------------------|-------------------------------|-----------------------------------------------|
| Kubernetes version | 1.28+                         | 1.30+                                         |
| Nodes              | 3 (control plane HA)          | 5+ (dedicated workload nodes)                 |
| CPU (total)        | 12 cores                      | 24 cores                                      |
| RAM (total)        | 32 GB                         | 64 GB                                         |
| Storage class      | Any CSI provisioner           | SSD-backed (e.g., gp3, csi-cinder-high-speed) |
| Ingress controller | Traefik (bundled) or existing | Traefik v3 (bundled)                          |
| Load balancer      | MetalLB or cloud LB           | Cloud-native LB                               |

AKKO is tested on: k3s, kubeadm, EKS, GKE, AKS, OVHcloud MKS, and OpenShift. Only standard Kubernetes APIs are used -- no distribution-specific CRDs.

Tools

Install these on your workstation before proceeding:

# Verify versions
kubectl version --client    # 1.28+
helm version                # 3.12+
docker version              # 24+ (for building custom images)
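If you script these preflight checks (for example in CI), a small comparison helper avoids eyeballing version strings. This is a generic sketch using `sort -V`, not part of AKKO:

```shell
#!/bin/sh
# version_ge A B: succeeds (exit 0) when version A >= minimum B
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example: gate on the minimum Kubernetes version from the table above
if version_ge "1.30.2" "1.28"; then
  echo "kubectl version OK"
else
  echo "kubectl too old" >&2
fi
```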

DNS

You need a domain (e.g., akko.example.com) with either:

  • A wildcard DNS record *.akko.example.com pointing to your load balancer IP, or
  • Individual A/CNAME records for each service (see Section 5)

TLS Certificates

Choose one of:

  • cert-manager with Let's Encrypt (recommended for production)
  • Manual wildcard certificate (e.g., from your CA)
  • Self-signed (development only)

2. Planning Your Deployment

Deployment Profiles

AKKO supports three profiles, controlled by enabling or disabling sub-charts in your values.yaml:

Minimal (~8 GB RAM, ~4 CPU)

Core analytics stack. Good for evaluation and small teams.

# Disable these in values-production.yaml
akko-ollama:
  enabled: false
akko-litellm:
  enabled: false
akko-lldap:
  enabled: false
loki:
  enabled: false
openmetadata:
  enabled: false
akko-opensearch:
  enabled: false

Services included: PostgreSQL, object storage, Polaris, Trino, Spark, Airflow, Superset, JupyterHub, Keycloak, OPA, Cockpit, Docs, Prometheus, Dashboards.

Standard (~20 GB RAM, ~12 CPU) -- Default

Full platform with AI and log aggregation. Recommended for most deployments.

Additionally enabled: Ollama (local LLM), LiteLLM (AI gateway), MLflow (experiment tracking), the Loki logs layer, a log shipper, and Alertmanager.

Governance (~28 GB RAM, ~16 CPU)

Everything, including the data catalog and governance layer.

Additionally enabled: OpenMetadata Server, OpenSearch.

# Enable governance in values-production.yaml
openmetadata:
  enabled: true
akko-opensearch:
  enabled: true

Governance is resource-intensive

OpenMetadata + OpenSearch require an additional ~8 GB of RAM over the Standard profile. Dedicate specific nodes or increase cluster capacity before enabling.

Storage Decisions

Every stateful service uses a PersistentVolumeClaim. Plan your storage:

| Service        | Purpose                         | Recommended Size | Storage Type |
|----------------|---------------------------------|------------------|--------------|
| PostgreSQL     | Shared relational database      | 50 Gi            | SSD          |
| Object storage | S3 data lake (Iceberg tables)   | 100 Gi+          | SSD          |
| Prometheus     | Metrics retention (15d default) | 50 Gi            | SSD or HDD   |
| Logs layer     | Log aggregation                 | 50 Gi            | HDD          |
| Ollama         | LLM model files                 | 20 Gi            | SSD          |
| Dashboards     | Dashboard persistence           | 5 Gi             | Any          |
| OpenSearch     | Governance catalog index        | 30 Gi            | SSD          |
| Alertmanager   | Alert state                     | 5 Gi             | Any          |

Set the storage class globally or per-service:

global:
  storageClass: "gp3-ssd"    # Cluster-wide default

# Or override per service
akko-postgres:
  persistence:
    storageClass: "fast-ssd"
    size: 50Gi

Network Requirements

| Port | Protocol | Purpose                            |
|------|----------|------------------------------------|
| 80   | TCP      | HTTP (redirects to 443)            |
| 443  | TCP      | HTTPS (all service ingresses)      |
| 6443 | TCP      | Kubernetes API (cluster admin only)|

All inter-service communication happens over the cluster network. No additional ports need to be exposed externally.


3. Changing Default Credentials (CRITICAL)

Do not deploy with default passwords

The default values.yaml leaves all passwords empty (auto-generated) or uses dev defaults. In production, you must set every credential explicitly in your values-production.yaml.

All credentials are set under global.auth in your values file. Generate strong passwords using:

# Generate a random 32-character password
openssl rand -base64 32

# Generate a Fernet key for Airflow
python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

# Generate a base64 cookie secret for OAuth2 Proxy
openssl rand -base64 32 | head -c 32 | base64
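OAuth2 Proxy rejects cookie secrets that do not decode to exactly 16, 24, or 32 bytes, so it is worth sanity-checking the generated value before deploying. A quick check, reusing the command above:

```shell
# The cookie secret must decode to exactly 32 bytes
SECRET="$(openssl rand -base64 32 | head -c 32 | base64)"
DECODED_LEN="$(printf '%s' "$SECRET" | base64 -d | wc -c)"
echo "decoded length: $DECODED_LEN"   # must print 32
test "$DECODED_LEN" -eq 32 && echo "cookie secret OK"
```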

Complete Credentials Reference

Create your values-production.yaml with all credentials:

global:
  auth:
    # --- PostgreSQL ---
    # Superuser password (used by init jobs)
    postgresPassword: "<STRONG_PASSWORD_1>"

    # Application user password (used by Trino, Airflow, Superset, etc.)
    postgresAkkoPassword: "<STRONG_PASSWORD_2>"

    # --- Object storage (S3) ---
    storageRootUser: "akko-admin"
    storageRootPassword: "<STRONG_PASSWORD_3>"

    # --- Keycloak (Identity Provider) ---
    keycloakAdminPassword: "<STRONG_PASSWORD_4>"

    # --- Apache Polaris (Iceberg Catalog) ---
    polarisRootSecret: "<STRONG_PASSWORD_5>"

    # --- Dashboards ---
    dashboardsAdminPassword: "<STRONG_PASSWORD_6>"

    # --- Apache Superset (Data Visualization) ---
    supersetAdminUser: "admin"
    supersetAdminPassword: "<STRONG_PASSWORD_7>"
    supersetSecretKey: "<RANDOM_HEX_32>"

    # --- Apache Airflow (Pipeline Orchestration) ---
    airflowAdminUser: "admin"
    airflowAdminPassword: "<STRONG_PASSWORD_8>"
    airflowFernetKey: "<FERNET_KEY>"

    # --- OAuth2 Proxy (ForwardAuth for services without native OIDC) ---
    oauth2ProxyCookieSecret: "<BASE64_32_BYTES>"

    # --- Keycloak OIDC Client Secrets ---
    # Each service that authenticates via Keycloak needs its own client secret.
    # These must match the secrets configured in the Keycloak realm.
    oauth2ProxyClientSecret: "<RANDOM_HEX_1>"
    grafanaClientSecret: "<RANDOM_HEX_2>"
    supersetClientSecret: "<RANDOM_HEX_3>"
    jupyterhubClientSecret: "<RANDOM_HEX_4>"
    trinoClientSecret: "<RANDOM_HEX_5>"
    airflowClientSecret: "<RANDOM_HEX_6>"
    openmetadataClientSecret: "<RANDOM_HEX_7>"

LiteLLM Master Key

The LiteLLM AI gateway has its own master key, configured separately:

akko-litellm:
  config:
    general_settings:
      master_key: "<STRONG_LITELLM_KEY>"

JupyterHub Database URL

The JupyterHub hub connects directly to PostgreSQL. Update the password in the connection string:

jupyterhub:
  hub:
    db:
      type: postgres
      url: "postgresql+psycopg2://postgres:<POSTGRES_PASSWORD>@akko-postgresql:5432/jupyterhub"
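If the PostgreSQL password contains characters such as `@`, `:`, or `/`, it must be percent-encoded before being embedded in the SQLAlchemy URL, or the URL will parse incorrectly. A quick way to encode it (standard Python, not AKKO-specific; the sample password is illustrative):

```shell
# Percent-encode a password for safe use inside a SQLAlchemy connection URL
python3 -c "import urllib.parse; print(urllib.parse.quote('p@ss:w/rd', safe=''))"
# -> p%40ss%3Aw%2Frd
```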

Airflow Metadata Connection

Airflow also requires an explicit database password:

airflow:
  data:
    metadataConnection:
      protocol: postgresql
      host: "akko-postgresql"
      port: 5432
      db: airflow_metadata
      user: postgres
      pass: "<POSTGRES_PASSWORD>"

Superset Database Connection

superset:
  supersetNode:
    connections:
      db_host: "akko-postgresql"
      db_port: "5432"
      db_user: "postgres"
      db_pass: "<POSTGRES_PASSWORD>"
      db_name: "superset"

Use Kubernetes Secrets for additional security

For maximum security, use existingSecret references instead of plaintext passwords in values. Many sub-charts support this pattern. See Section 9 for details.


4. TLS Certificate Configuration

Option A: cert-manager with Let's Encrypt (Recommended)

Install cert-manager if not already present:

helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --set crds.enabled=true

Create a ClusterIssuer:

# cluster-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            ingressClassName: traefik

Apply it:

kubectl apply -f cluster-issuer.yaml

Configure AKKO to use it:

global:
  tls:
    enabled: true
    issuer: "letsencrypt-prod"
    secretName: akko-tls

Option B: Manual Certificates (Wildcard or Per-Service)

If you have a wildcard certificate from your organization's CA:

# Create the TLS secret from your certificate files
kubectl create secret tls akko-tls \
  --cert=wildcard.crt \
  --key=wildcard.key \
  -n akko
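Before creating the secret, it is worth confirming that the certificate and key actually belong together; a mismatched pair is a common cause of TLS handshake failures. A standard OpenSSL check (file names taken from the command above):

```shell
# The two digests must be identical if the key matches the certificate
# (works for both RSA and EC keys)
openssl x509 -noout -pubkey -in wildcard.crt | openssl md5
openssl pkey  -pubout        -in wildcard.key | openssl md5
```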

Configure AKKO to use the existing secret:

global:
  tls:
    enabled: true
    issuer: ""              # Empty = no cert-manager
    secretName: akko-tls    # Must match the secret name above

Option C: Self-Signed (Development Only)

For development or internal testing only:

global:
  tls:
    enabled: true
    selfSigned: true
    issuer: ""
    secretName: akko-tls

Self-signed certificates cause browser warnings

Users will see certificate warnings. Some services (JupyterHub OIDC callbacks, Superset OAuth) may require tls_verify: false settings, which are insecure for production.

Traefik TLS Configuration

Traefik is the bundled ingress controller. TLS termination happens at Traefik, and backend services communicate over HTTP within the cluster.

traefik:
  enabled: true
  service:
    type: LoadBalancer
    annotations:
      # Cloud-specific annotations (e.g., AWS NLB)
      # service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
  ports:
    web:
      exposedPort: 80
    websecure:
      exposedPort: 443

If you already have an ingress controller (e.g., nginx-ingress on RKE2), disable Traefik and set the ingress class:

traefik:
  enabled: false

# Set the ingress class for all services
# (Each service's ingress.className will use this)

5. DNS Configuration

Required DNS Entries

Every AKKO service exposes an ingress hostname under your base domain. The global.domain value determines the suffix.

global:
  domain: akko.example.com    # Your production domain

This generates the following hostnames:

| Service                | Hostname                      | Purpose                         |
|------------------------|-------------------------------|---------------------------------|
| Cockpit                | akko.example.com              | Platform portal (landing page)  |
| Documentation          | docs.akko.example.com         | Platform documentation          |
| Keycloak               | keycloak.akko.example.com     | Identity & access management    |
| Trino                  | trino.akko.example.com        | Distributed SQL engine          |
| Spark                  | spark.akko.example.com        | Spark master UI                 |
| Object storage console | minio.akko.example.com        | Object storage console          |
| Object storage API     | minio-api.akko.example.com    | S3 API endpoint                 |
| Superset               | superset.akko.example.com     | Data visualization              |
| JupyterHub             | jupyter.akko.example.com      | Notebook environment            |
| Airflow                | airflow.akko.example.com      | Pipeline orchestration          |
| Dashboards             | grafana.akko.example.com      | Monitoring dashboards           |
| Polaris                | polaris.akko.example.com      | Iceberg REST catalog            |
| MLflow                 | mlflow.akko.example.com       | ML experiment tracking          |
| Ollama                 | ollama.akko.example.com       | Local LLM API                   |
| LiteLLM                | litellm.akko.example.com      | AI gateway                      |
| Prometheus             | prometheus.akko.example.com   | Metrics                         |
| Alertmanager           | alertmanager.akko.example.com | Alert routing                   |
| OpenMetadata           | catalog.akko.example.com      | Data catalog (governance)       |
| Directory service      | lldap.akko.example.com        | LDAP admin (optional)           |
| Traefik                | traefik.akko.example.com      | Ingress dashboard               |

The simplest approach is a single wildcard record pointing to your load balancer:

*.akko.example.com    A    <LOAD_BALANCER_IP>
akko.example.com      A    <LOAD_BALANCER_IP>

Split-Horizon DNS

For environments where internal and external access use different IPs:

| Zone     | Record             | Target                            |
|----------|--------------------|-----------------------------------|
| External | *.akko.example.com | Public load balancer IP           |
| Internal | *.akko.example.com | Internal/private load balancer IP |

This ensures that in-office users route directly to the internal network while external users go through the public endpoint.


6. Step-by-Step Installation

Step 1: Clone the Repository

git clone https://github.com/AKKO-p/AKKO.git
cd AKKO

Step 2: Create Your Production Values File

Start from the production defaults and customize:

cp helm/akko/values.yaml values-production.yaml

Edit values-production.yaml with your settings. Here is a minimal production template:

# =============================================================================
# AKKO — Production Values Template
# =============================================================================

global:
  domain: akko.example.com
  storageClass: "gp3-ssd"         # Your cluster's SSD storage class

  tls:
    enabled: true
    issuer: "letsencrypt-prod"    # cert-manager ClusterIssuer
    secretName: akko-tls

  auth:
    postgresPassword: "<CHANGE_ME>"
    postgresAkkoPassword: "<CHANGE_ME>"
    storageRootUser: "akko-admin"
    storageRootPassword: "<CHANGE_ME>"
    keycloakAdminPassword: "<CHANGE_ME>"
    polarisRootSecret: "<CHANGE_ME>"
    dashboardsAdminPassword: "<CHANGE_ME>"
    supersetAdminPassword: "<CHANGE_ME>"
    supersetSecretKey: "<CHANGE_ME>"
    airflowAdminPassword: "<CHANGE_ME>"
    airflowFernetKey: "<CHANGE_ME>"
    oauth2ProxyCookieSecret: "<CHANGE_ME>"
    # OIDC client secrets (generate unique values)
    oauth2ProxyClientSecret: "<CHANGE_ME>"
    grafanaClientSecret: "<CHANGE_ME>"
    supersetClientSecret: "<CHANGE_ME>"
    jupyterhubClientSecret: "<CHANGE_ME>"
    trinoClientSecret: "<CHANGE_ME>"
    airflowClientSecret: "<CHANGE_ME>"

  image:
    registry: "registry.example.com/akko/"  # Your private registry
    pullPolicy: IfNotPresent
    pullSecrets:
      - name: registry-credentials

# --- Infrastructure ---
traefik:
  enabled: true
  service:
    type: LoadBalancer

akko-postgres:
  persistence:
    size: 50Gi
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 4Gi

akko-keycloak:
  hostname: "https://keycloak.akko.example.com"
  ingress:
    host: "keycloak.akko.example.com"

# --- Data Lake ---
minio:
  mode: distributed              # HA: distributed with 4 replicas
  replicas: 4
  persistence:
    size: 100Gi
  consoleIngress:
    hosts:
      - minio.akko.example.com
  ingress:
    hosts:
      - minio-api.akko.example.com

# --- Compute ---
trino:
  server:
    workers: 3                   # Production: 3+ workers
    config:
      query:
        maxMemory: "8GB"
        maxMemoryPerNode: "2GB"
  coordinator:
    resources:
      requests:
        cpu: "1"
        memory: 2Gi
      limits:
        cpu: "4"
        memory: 8Gi
  worker:
    resources:
      requests:
        cpu: "1"
        memory: 2Gi
      limits:
        cpu: "4"
        memory: 8Gi

akko-spark:
  worker:
    replicaCount: 3              # Production: 3+ workers
    memory: "4G"
    cores: 4

# --- Orchestration ---
airflow:
  executor: KubernetesExecutor   # Production: KubernetesExecutor
  config:
    core:
      executor: KubernetesExecutor

# --- AI / LLM ---
akko-ollama:
  gpu:
    enabled: true                # Enable if GPU nodes available
    # nvidia.com/gpu: 1
  models:
    - qwen2.5-coder:7b
    - qwen2.5:3b
    - nomic-embed-text

akko-litellm:
  config:
    general_settings:
      master_key: "<CHANGE_ME>"

# --- Monitoring ---
monitoring:
  prometheus:
    prometheusSpec:
      retention: 30d
      storageSpec:
        volumeClaimTemplate:
          spec:
            resources:
              requests:
                storage: 50Gi
  grafana:
    persistence:
      enabled: true
      size: 10Gi
  alertmanager:
    enabled: true

loki:
  enabled: true
  loki:
    persistence:
      size: 50Gi

# --- Governance (optional) ---
openmetadata:
  enabled: false                 # Set to true if needed
akko-opensearch:
  enabled: false
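Before installing, it helps to confirm that no placeholder credentials survived editing. A simple grep guard (the file name matches the template above):

```shell
# Abort if any <CHANGE_ME> placeholders remain in the values file
if grep -n 'CHANGE_ME' values-production.yaml; then
  echo "ERROR: unreplaced placeholders found (see lines above)" >&2
  exit 1
fi
echo "values-production.yaml: no placeholders remaining"
```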

Step 3: Build and Push Custom Images

AKKO includes a set of custom Docker images that must be built and pushed to your container registry:

# Set your registry and tag
export AKKO_REGISTRY="registry.example.com/akko"
export AKKO_TAG="2026.03"

# Build all custom images
bash helm/scripts/build-images.sh

This builds and pushes:

| Image                 | Source                                | Purpose                                                             |
|-----------------------|---------------------------------------|---------------------------------------------------------------------|
| akko-postgres         | docker/postgres/Dockerfile            | PostgreSQL with PostGIS + pgvector + pgaudit                        |
| akko-spark            | docker/spark/Dockerfile               | Spark with Iceberg + AWS + Connect JARs                             |
| akko-notebook         | docker/jupyterhub/Dockerfile.notebook | JupyterLab + R + Julia + Scala/Almond + Quarto + Mermaid + ML libs  |
| akko-mlflow           | docker/mlflow/Dockerfile              | MLflow tracking server with S3/PostgreSQL backends                  |
| akko-cockpit          | branding/cockpit/Dockerfile           | Portal (nginx + 8 pages + health proxies)                           |
| akko-trino            | docker/trino-ai-functions/Dockerfile  | Trino with AI functions plugin (14 scalar functions)                |
| akko-ai-service       | docker/ai-service/Dockerfile          | FastAPI AI service (sentiment, classify, summarize, etc.)           |
| akko-airflow          | docker/airflow/Dockerfile             | Airflow 3.1.7 + trino/mlflow/boto3/psycopg2/PyJWT                   |
| akko-dbt              | docker/dbt/Dockerfile                 | dbt Core + dbt-trino adapter (semantic layer)                       |
| akko-mcp-trino        | docker/mcp-trino/Dockerfile           | MCP Server for Trino SQL + AI functions                             |
| akko-mcp-openmetadata | docker/mcp-openmetadata/Dockerfile    | MCP Server for OpenMetadata (catalog discovery, lineage)            |

Update your values-production.yaml to reference the images in your registry:

global:
  image:
    registry: "registry.example.com/akko/"

akko-postgres:
  image:
    repository: registry.example.com/akko/akko-postgres
    tag: "2026.03"

akko-spark:
  image:
    repository: registry.example.com/akko/akko-spark
    tag: "2026.03"

jupyterhub:
  singleuser:
    image:
      name: registry.example.com/akko/akko-notebook
      tag: "2026.03"

akko-mlflow:
  image:
    repository: registry.example.com/akko/akko-mlflow
    tag: "2026.03"

akko-cockpit:
  image:
    repository: registry.example.com/akko/akko-cockpit
    tag: "2026.03"

If your registry requires authentication, create the pull secret:

kubectl create namespace akko

kubectl create secret docker-registry registry-credentials \
  --docker-server=registry.example.com \
  --docker-username=<USERNAME> \
  --docker-password=<PASSWORD> \
  -n akko

Step 4: Update Helm Dependencies

cd helm/akko
helm dependency update .

This downloads all 29 chart dependencies (Traefik, object storage, Trino, Airflow, Superset, JupyterHub, kube-prometheus-stack, logs layer, OpenMetadata, and AKKO custom sub-charts).

Step 5: Prepare the Keycloak Realm

AKKO ships with a pre-configured Keycloak realm containing OAuth2 clients for every service. For production, you must update the client secrets to match your values-production.yaml:

  1. Copy the realm template:

    cp helm/examples/realm-akko-k3d.json realm-production.json
    

  2. In realm-production.json, update every "secret" field for each client (jupyterhub, superset, airflow, grafana, trino, oauth2-proxy, openmetadata) to match the corresponding global.auth.*ClientSecret values in your values-production.yaml.

  3. Update all URLs from akko.local to your production domain (akko.example.com).

Step 6: Install AKKO

A single command deploys all services, init jobs, and seeds. Helm's --wait and --wait-for-jobs flags ensure that databases are ready before dependent services start:

helm upgrade --install akko helm/akko/ -n akko --create-namespace \
  --wait --wait-for-jobs --timeout 20m \
  -f values-production.yaml \
  --set-file akko-keycloak.realm.data=realm-production.json

Realm file is required on every upgrade

The --set-file akko-keycloak.realm.data=... flag is mandatory on every helm upgrade. Omitting it will cause Keycloak to lose its realm configuration.

Step 7: Verify the Deployment

# Check all pods are running
kubectl get pods -n akko

# Check services have endpoints
kubectl get svc -n akko

# Check ingress resources
kubectl get ingress -n akko

# Check PersistentVolumeClaims are bound
kubectl get pvc -n akko

# Check for any failed pods
kubectl get pods -n akko --field-selector=status.phase!=Running,status.phase!=Succeeded

All pods should reach Running status within 5-10 minutes. The Ollama model-pull init job may take longer depending on network speed (models are several GB).

Step 8: First Login

  1. Open the Cockpit at https://akko.example.com
  2. Log in to Keycloak admin at https://keycloak.akko.example.com with the admin credentials you set in global.auth.keycloakAdminPassword
  3. Verify SSO by clicking any service link in the Cockpit -- you should be redirected to Keycloak for authentication

7. Post-Installation

Create Users in Keycloak

  1. Go to https://keycloak.akko.example.com/admin
  2. Select the akko realm
  3. Navigate to Users and click Create new user
  4. After creating the user, go to the Credentials tab to set a password
  5. Go to the Role mapping tab to assign a platform role

AKKO RBAC Roles

The Keycloak realm defines 5 platform roles that propagate to all services:

| Role          | Trino                | Airflow | Superset | Dashboards | Description                |
|---------------|----------------------|---------|----------|------------|----------------------------|
| akko-admin    | Full access          | Admin   | Admin    | Admin      | Platform administrator     |
| akko-engineer | Create tables, write | User    | Engineer | Editor     | Data engineer              |
| akko-analyst  | Read-only            | User    | Analyst  | Editor     | Business analyst           |
| akko-user     | Read basic tables    | User    | Viewer   | Viewer     | Default authenticated user |
| akko-viewer   | Dashboards only      | Viewer  | Viewer   | Viewer     | Executive / read-only      |

Verify SSO for Each Service

After creating a test user, verify SSO works by logging in to each service:

https://trino.akko.example.com      -- Trino Web UI
https://airflow.akko.example.com    -- Airflow 3 API Server
https://superset.akko.example.com   -- Superset dashboards
https://jupyter.akko.example.com    -- JupyterHub notebooks
https://grafana.akko.example.com    -- Dashboards
https://catalog.akko.example.com    -- OpenMetadata (if enabled)

Each should redirect to Keycloak for authentication and then back to the service with the correct role.

Import Demo Data (Optional)

AKKO includes demo notebooks and DAGs that create sample datasets:

  1. Log in to JupyterHub as a user with the akko-admin role
  2. Open the akko-banking-demo.ipynb notebook from the work/ directory
  3. Run all cells to create the banking demo dataset in the Iceberg lakehouse

The Airflow DAGs (akko_e2e_pipeline, akko_data_quality_dag, akko_catalog_sync_dag) are pre-loaded and can be triggered from the Airflow UI.

Configure Monitoring Alerts

Alertmanager is deployed with the monitoring stack. Configure notification receivers:

monitoring:
  alertmanager:
    config:
      receivers:
        - name: "team-email"
          email_configs:
            - to: "ops@example.com"
              from: "akko-alerts@example.com"
              smarthost: "smtp.example.com:587"
        - name: "slack"
          slack_configs:
            - api_url: "https://hooks.slack.com/services/..."
              channel: "#akko-alerts"
      route:
        receiver: "team-email"
        routes:
          - match:
              severity: critical
            receiver: "slack"

8. Day-2 Operations

Upgrading AKKO

To upgrade to a new AKKO version:

# Pull the latest chart
cd AKKO
git pull origin main

# Update dependencies
cd helm/akko
helm dependency update .

# Rebuild custom images (if Dockerfiles changed)
export AKKO_REGISTRY="registry.example.com/akko"
export AKKO_TAG="2026.04"    # New version tag
bash helm/scripts/build-images.sh

# Apply the upgrade (realm file is mandatory!)
helm upgrade akko ./helm/akko \
  --namespace akko \
  -f ../../values-production.yaml \
  --set-file akko-keycloak.realm.data=../../realm-production.json \
  --timeout 15m

Rolling updates

Kubernetes performs rolling updates by default. Services with multiple replicas (Trino workers, Spark workers, object storage distributed) will be updated one pod at a time with zero downtime.

Backup and Restore

PostgreSQL

# Backup all databases
kubectl exec -n akko statefulset/akko-postgresql -- \
  pg_dumpall -U postgres > akko-pg-backup-$(date +%Y%m%d).sql

# Restore
cat akko-pg-backup-20260313.sql | \
  kubectl exec -i -n akko statefulset/akko-postgresql -- \
  psql -U postgres
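For unattended operation, the dump can be compressed and rotated from a cron job. This is a sketch: the backup path and 14-day retention window are assumptions, and `kubectl` must have access to the cluster:

```shell
#!/bin/sh
# Nightly compressed PostgreSQL backup with 14-day retention (illustrative)
BACKUP_DIR="/backup/postgres"
mkdir -p "$BACKUP_DIR"

kubectl exec -n akko statefulset/akko-postgresql -- \
  pg_dumpall -U postgres | gzip > "$BACKUP_DIR/akko-pg-$(date +%Y%m%d).sql.gz"

# Prune dumps older than 14 days
find "$BACKUP_DIR" -name 'akko-pg-*.sql.gz' -mtime +14 -delete
```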

Object storage (S3 Data Lake)

# Use any S3-compatible CLI (the AWS CLI is fine).
aws --endpoint-url https://minio-api.akko.example.com \
    s3 sync s3://akko-warehouse /backup/akko-warehouse/

Keycloak Realm

# Export the realm (includes users if requested)
kubectl exec -n akko deploy/akko-akko-keycloak -- \
  /opt/keycloak/bin/kc.sh export --realm akko --dir /tmp/export
kubectl cp akko/$(kubectl get pod -n akko -l app=akko-keycloak -o name | head -1 | cut -d/ -f2):/tmp/export ./keycloak-export/

Scaling Services

Scale individual services by updating replica counts in your values file:

# Scale Trino workers
trino:
  server:
    workers: 5

# Scale Spark workers
akko-spark:
  worker:
    replicaCount: 5

# Scale the object storage layer to distributed mode
minio:
  mode: distributed
  replicas: 4

Apply with:

helm upgrade akko ./helm/akko -n akko \
  -f ../../values-production.yaml \
  --set-file akko-keycloak.realm.data=../../realm-production.json

For on-demand scaling with Horizontal Pod Autoscalers (HPA), configure directly in Kubernetes:

kubectl autoscale deployment akko-trino-worker -n akko \
  --min=2 --max=10 --cpu-percent=70
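The same autoscaler can also be kept under version control as a manifest. A standard autoscaling/v2 equivalent of the command above (the deployment name mirrors the command and may differ in your release):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: akko-trino-worker
  namespace: akko
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: akko-trino-worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```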

Adding Users and Managing Access

All user management is centralized in Keycloak:

  1. Create user: Keycloak Admin Console > Users > Create
  2. Assign role: User > Role mapping > Assign akko-admin, akko-engineer, etc.
  3. LDAP federation: Keycloak > User federation > Add LDAP provider (connect to Active Directory or directory service)
  4. Group mapping: Create Keycloak groups, assign roles to groups, then add users to groups for bulk role assignment

Monitoring and Alerting

Access the monitoring stack:

  • Dashboards: https://grafana.akko.example.com -- pre-built dashboards for cluster, Trino, Spark, Airflow, PostgreSQL, object storage
  • Prometheus: https://prometheus.akko.example.com -- raw metrics and PromQL queries
  • Alertmanager: https://alertmanager.akko.example.com -- alert routing and silencing

Key metrics to monitor:

| Metric                     | Alert Threshold | Service         |
|----------------------------|-----------------|-----------------|
| Pod restart count          | > 3 in 10 min   | All             |
| PostgreSQL connection pool | > 80%           | akko-postgresql |
| Object storage disk usage  | > 85%           | akko-minio      |
| Trino query failures       | > 10% rate      | akko-trino      |
| Airflow DAG failure rate   | > 5%            | akko-airflow    |
| Node memory pressure       | > 90%           | Cluster         |

9. Security Hardening Checklist

Use this checklist before going live:

Credentials

  • [ ] All global.auth.* passwords are unique, strong, and not defaults
  • [ ] LiteLLM master key is set (akko-litellm.config.general_settings.master_key)
  • [ ] All Keycloak OIDC client secrets match between realm JSON and values
  • [ ] Keycloak admin password is strong and stored securely
  • [ ] PostgreSQL volume is clean (no leftover dev passwords)

TLS

  • [ ] TLS is enabled (global.tls.enabled: true)
  • [ ] cert-manager is installed with a valid ClusterIssuer
  • [ ] All ingress hostnames serve valid certificates
  • [ ] Internal services do not expose HTTP externally
  • [ ] tls_skip_verify_insecure settings are removed or set to false

Network

  • [ ] Traefik LoadBalancer has appropriate firewall rules
  • [ ] Only ports 80 and 443 are exposed externally
  • [ ] Kubernetes API (6443) is not publicly accessible
  • [ ] Network policies restrict pod-to-pod traffic (enable with global.networkPolicies: true)
  • [ ] JupyterHub single-user pods have controlled egress
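Independently of the chart flag, a baseline default-deny policy can be applied at the namespace level. This is standard Kubernetes, shown as a starting point only; you will then need explicit allow rules for legitimate service-to-service traffic:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: akko
spec:
  podSelector: {}        # Applies to every pod in the namespace
  policyTypes:
    - Ingress
```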

Pod Security

  • [ ] Pods run as non-root (global.podSecurityContext.runAsNonRoot: true -- default)
  • [ ] No containers run with privileged: true
  • [ ] All containers drop ALL capabilities
  • [ ] Pod Security Standards are enforced at the namespace level:
    kubectl label namespace akko \
      pod-security.kubernetes.io/enforce=restricted \
      pod-security.kubernetes.io/warn=restricted
    

Secrets Management

For enterprise deployments, consider external secrets management:

# Example: External Secrets Operator with HashiCorp Vault
# Install the operator, then create ExternalSecret resources
# that sync from Vault to Kubernetes Secrets
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: akko-postgresql
  namespace: akko
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: akko-postgresql
  data:
    - secretKey: postgres-password
      remoteRef:
        key: akko/postgresql
        property: password

Audit Logging

  • [ ] Keycloak audit events are enabled (login, token exchange, admin actions)
  • [ ] Kubernetes audit logging is configured at the cluster level
  • [ ] Airflow task logs are persisted (not ephemeral)
  • [ ] Dashboards and Superset track user login events

Image Security

  • [ ] All images use pinned version tags (no :latest)
  • [ ] Custom images are scanned for vulnerabilities before deployment
  • [ ] Private registry uses TLS and authentication
  • [ ] global.image.pullSecrets is configured for private registries

Further Reading