
Object storage

AKKO ships an S3-compatible object storage layer as the substrate for the Iceberg data lake, Spark event logs, MLflow artifacts, akko-rag PDF uploads, and any other blob storage need.

The public surface is purely the S3 API, so the underlying engine can be swapped (or pointed at AWS S3, Wasabi, OVHcloud Object Storage, etc.) without any code change in the platform. The deployment lives in the akko-storage sub-chart and runs in the same akko namespace as the rest of the platform.


Architecture

  Polaris ──────┐
                │ S3 API (signature v4)
  Trino ────────┼──→ akko-storage:8333 ──→ /data PVC
  Spark ────────┤
  MLflow ───────┘

  akko-rag ─────→ /uploads/* via boto3
  ADEN dashboards ─→ /reports/*

All compute engines (Trino, Spark, akko-rag, MLflow) reach the storage layer through the standard S3 API; they don't know what the backing engine is. To swap to a different S3-compatible provider, change the endpoint URL and credentials in values.yaml; no consumer code changes.
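
As a sketch, an override pointing at an external provider might look like the following; the key names here are illustrative, so check the akko-storage chart's values.yaml for the actual schema:

# hypothetical keys: verify against the chart before use
storage:
  endpoint: https://s3.eu-west-1.wasabisys.com
  existingSecret: akko-storage-creds   # access key + secret key pair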


Ports

Port   Purpose                       Exposed
8333   S3 API (REST, signature v4)   ClusterIP
8888   Admin UI                      ClusterIP only — port-forward for ops
9091   Prometheus /metrics           scraped by ServiceMonitor

External access goes through Traefik: https://storage.akko-ai.com (or the value of global.functionalAliases.storage in your overrides), gated by oauth2-proxy + Keycloak SSO.
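
The admin UI is not exposed through Traefik; to reach it for ops work, port-forward the Service (assumed to be named akko-storage, matching the S3 endpoint used throughout this page):

kubectl -n akko port-forward svc/akko-storage 8888:8888
# then browse http://localhost:8888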


Buckets seeded at install

The post-install Job akko-storage-bucket-init creates the buckets the platform needs:

Bucket             Owner service             Purpose
akko-iceberg       Polaris / Trino / Spark   Iceberg table data + metadata
akko-spark-logs    Spark master + workers    Spark event logs
akko-mlflow        MLflow                    Model registry artifacts
akko-rag           akko-rag                  Uploaded PDFs / DOCX / chunks
akko-aden          ADEN                      Persisted dashboard files
akko-airflow-logs  Airflow                   Task logs

To add another bucket, append to helm/akko/charts/akko-storage/values.yaml:

storage:
  buckets:
    - name: my-team
      versioning: false

Then run helm upgrade; the init Job is idempotent and creates only the missing buckets.
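
For example, assuming the umbrella chart is installed as a release named akko in the akko namespace (adjust to your release name):

helm upgrade akko helm/akko -n akko
kubectl -n akko logs job/akko-storage-bucket-init --tail=5   # "[+] bucket my-team ready"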


Credentials

The S3 access key + secret are stored in the akko-storage-creds Secret and projected into every consumer pod under the standard env vars:

AKKO_S3_ENDPOINT=http://akko-storage:8333
AKKO_S3_ACCESS_KEY=<from secret>
AKKO_S3_SECRET_KEY=<from secret>
AKKO_S3_REGION=akko-local
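
To use the same credentials outside the cluster (e.g. the aws CLI on a workstation), pull them from the Secret; the jsonpath below assumes the Secret keys match the env var names, so adjust if your chart maps them differently:

kubectl -n akko get secret akko-storage-creds \
  -o jsonpath='{.data.AKKO_S3_ACCESS_KEY}' | base64 -d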

Quick reference — boto3 from a notebook

import os, boto3
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["AKKO_S3_ENDPOINT"],
    aws_access_key_id=os.environ["AKKO_S3_ACCESS_KEY"],
    aws_secret_access_key=os.environ["AKKO_S3_SECRET_KEY"],
    region_name=os.environ.get("AKKO_S3_REGION", "akko-local"),
)
# List buckets
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
# Upload a CSV
s3.upload_file("./customers.csv", "akko-iceberg", "raw/customers.csv")
# Read it back through Spark (assumes an active SparkSession named spark in the notebook)
df = spark.read.csv(
    "s3a://akko-iceberg/raw/customers.csv", header=True, inferSchema=True
)
df.show()

Quick reference — aws CLI

export AWS_ACCESS_KEY_ID=$AKKO_S3_ACCESS_KEY
export AWS_SECRET_ACCESS_KEY=$AKKO_S3_SECRET_KEY
export AWS_DEFAULT_REGION=akko-local

aws --endpoint-url $AKKO_S3_ENDPOINT s3 ls
aws --endpoint-url $AKKO_S3_ENDPOINT s3 cp ./report.pdf \
    s3://akko-rag/uploads/report.pdf

Smoke test

To verify the install end-to-end:

kubectl -n akko exec deploy/akko-akko-cockpit -- \
  curl -s http://akko-storage:8333/ | head
# Expect: ListAllMyBucketsResult XML

kubectl -n akko logs job/akko-storage-bucket-init --tail=20
# Expect: "[+] bucket <name> ready" for each seeded bucket

Operational notes

  • Replication — single-node deploy by default. To replicate, set storage.replication: "001" (one extra copy on a different rack; adapt to your cluster topology) and set storage.workerCount to at least 2.
  • Encryption at rest — chart-level encryption is supported but off by default. Set storage.encryptionAtRest.enabled: true and provision the KMS key per the encryption runbook.
  • Backup — the recommended approach is a CronJob running aws s3 sync to a secondary off-site S3-compatible endpoint; a sketch of the sync step follows this list. The DR drill log captures the procedure end-to-end.
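
A minimal sketch of that sync step, staged through a local directory since one aws s3 sync invocation can only target a single --endpoint-url; the OFFSITE_* variables and the dr-akko destination bucket are placeholders:

# Pull from the in-cluster endpoint, then push to the off-site one.
aws --endpoint-url $AKKO_S3_ENDPOINT s3 sync s3://akko-iceberg /backup/akko-iceberg
AWS_ACCESS_KEY_ID=$OFFSITE_ACCESS_KEY AWS_SECRET_ACCESS_KEY=$OFFSITE_SECRET_KEY \
aws --endpoint-url $OFFSITE_ENDPOINT s3 sync /backup/akko-iceberg s3://dr-akko/akko-iceberg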

Consumers

  • Apache Polaris — Iceberg catalog reading from akko-iceberg
  • Trino — federation engine using the Iceberg catalog
  • Spark Connect — writes Iceberg tables through Polaris
  • akko-rag — uploads PDFs to the akko-rag bucket