
Object storage

AKKO ships an S3-compatible object storage layer as the substrate for the Iceberg data lake, Spark event logs, MLflow artifacts, akko-rag PDF uploads, and any other blob storage need.

The public surface is purely the S3 API, so the underlying engine can be swapped (or pointed at AWS S3, Wasabi, OVHcloud Object Storage, etc.) without any code change in the platform. The deployment lives in the akko-storage sub-chart and runs in the same akko namespace as the rest of the platform.


Architecture

  Polaris ──────┐
                │ S3 API (signature v4)
  Trino ────────┼──→ akko-storage:8333 ──→ /data PVC
  Spark ────────┤
  MLflow ───────┘

  akko-rag ─────→ /uploads/* via boto3
  ADEN dashboards ─→ /reports/*

All compute engines (Trino, Spark, akko-rag, MLflow) reach the storage layer through the standard S3 API; they don't know what the backing engine is. To swap to a different S3-compatible provider, change the endpoint URL and credentials in values.yaml; no consumer code changes.
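
As a sketch, an override pointing at an external provider might look like the following; the key names here are illustrative, so check the akko-storage chart's values.yaml for the actual schema:

# hypothetical keys: verify against the chart before use
storage:
  endpoint: https://s3.eu-west-1.wasabisys.com
  existingSecret: akko-storage-creds   # access key + secret key pair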


Ports

Port   Purpose                       Exposed
8333   S3 API (REST, signature v4)   ClusterIP
8888   Admin UI                      ClusterIP only — port-forward for ops
9091   Prometheus /metrics           scraped by ServiceMonitor

External access goes through Traefik: https://storage.akko-ai.com (or the value of global.functionalAliases.storage in your overrides), gated by oauth2-proxy + Keycloak SSO.
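
The admin UI is not exposed through Traefik; to reach it for ops work, port-forward the Service (assumed to be named akko-storage, matching the S3 endpoint used throughout this page):

kubectl -n akko port-forward svc/akko-storage 8888:8888
# then browse http://localhost:8888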


Buckets seeded at install

The post-install Job akko-storage-bucket-init creates the buckets the platform needs:

Bucket             Owner service             Purpose
akko-iceberg       Polaris / Trino / Spark   Iceberg table data + metadata
akko-spark-logs    Spark master + workers    Spark event logs
akko-mlflow        MLflow                    Model registry artifacts
akko-rag           akko-rag                  Uploaded PDFs / DOCX / chunks
akko-aden          ADEN                      Persisted dashboard files
akko-airflow-logs  Airflow                   Task logs

To add another bucket, append to helm/akko/charts/akko-storage/values.yaml:

storage:
  buckets:
    - name: my-team
      versioning: false

Then run helm upgrade; the init Job is idempotent and creates only the missing buckets.
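
For example, assuming the umbrella chart is installed as a release named akko in the akko namespace (adjust to your release name):

helm upgrade akko helm/akko -n akko
kubectl -n akko logs job/akko-storage-bucket-init --tail=5   # "[+] bucket my-team ready"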


Credentials

The S3 access key + secret are stored in the akko-storage-creds Secret and projected into every consumer pod under the standard env vars:

AKKO_S3_ENDPOINT=http://akko-storage:8333
AKKO_S3_ACCESS_KEY=<from secret>
AKKO_S3_SECRET_KEY=<from secret>
AKKO_S3_REGION=akko-local
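
To use the same credentials outside the cluster (e.g. the aws CLI on a workstation), pull them from the Secret; the jsonpath below assumes the Secret keys match the env var names, so adjust if your chart maps them differently:

kubectl -n akko get secret akko-storage-creds \
  -o jsonpath='{.data.AKKO_S3_ACCESS_KEY}' | base64 -d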

Quick reference — boto3 from a notebook

import os, boto3
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["AKKO_S3_ENDPOINT"],
    aws_access_key_id=os.environ["AKKO_S3_ACCESS_KEY"],
    aws_secret_access_key=os.environ["AKKO_S3_SECRET_KEY"],
    region_name=os.environ.get("AKKO_S3_REGION", "akko-local"),
)
# List buckets
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
# Upload a CSV
s3.upload_file("./customers.csv", "akko-iceberg", "raw/customers.csv")
# Read it back through Spark (assumes an active SparkSession named spark in the notebook)
df = spark.read.csv(
    "s3a://akko-iceberg/raw/customers.csv", header=True, inferSchema=True
)
df.show()

Quick reference — aws CLI

export AWS_ACCESS_KEY_ID=$AKKO_S3_ACCESS_KEY
export AWS_SECRET_ACCESS_KEY=$AKKO_S3_SECRET_KEY
export AWS_DEFAULT_REGION=akko-local

aws --endpoint-url $AKKO_S3_ENDPOINT s3 ls
aws --endpoint-url $AKKO_S3_ENDPOINT s3 cp ./report.pdf \
    s3://akko-rag/uploads/report.pdf

Smoke test

To verify the install end-to-end:

kubectl -n akko exec deploy/akko-akko-cockpit -- \
  curl -s http://akko-storage:8333/ | head
# Expect: ListAllMyBucketsResult XML

kubectl -n akko logs job/akko-storage-bucket-init --tail=20
# Expect: "[+] bucket <name> ready" for each seeded bucket

Operational notes

  • Replication — single-node deploy by default. To replicate, set storage.replication: "001" (one extra copy on a different rack; adapt to your cluster topology) and set storage.workerCount to at least 2.
  • Encryption at rest — chart-level encryption is supported but off by default. Set storage.encryptionAtRest.enabled: true and provision the KMS key per the encryption runbook.
  • Backup — the recommended approach is a CronJob running aws s3 sync to a secondary off-site S3-compatible endpoint; a sketch of the sync step follows this list. The DR drill log captures the procedure end-to-end.
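
A minimal sketch of that sync step, staged through a local directory since one aws s3 sync invocation can only target a single --endpoint-url; the OFFSITE_* variables and the dr-akko destination bucket are placeholders:

# Pull from the in-cluster endpoint, then push to the off-site one.
aws --endpoint-url $AKKO_S3_ENDPOINT s3 sync s3://akko-iceberg /backup/akko-iceberg
AWS_ACCESS_KEY_ID=$OFFSITE_ACCESS_KEY AWS_SECRET_ACCESS_KEY=$OFFSITE_SECRET_KEY \
aws --endpoint-url $OFFSITE_ENDPOINT s3 sync /backup/akko-iceberg s3://dr-akko/akko-iceberg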

Consumers

  • Apache Polaris — Iceberg catalog reading from akko-iceberg
  • Trino — federation engine using the Iceberg catalog
  • Spark Connect — writes Iceberg tables through Polaris
  • akko-rag — uploads PDFs to the akko-rag bucket