Object storage¶
AKKO ships an S3-compatible object storage layer as the substrate for the Iceberg data lake, Spark event logs, MLflow artifacts, akko-rag PDF uploads, and any other blob storage need.
The public surface is purely the S3 API, so the underlying engine can be swapped (or pointed at AWS S3, Wasabi, OVHcloud Object Storage, etc.) without any code change in the platform. The deployment lives in the akko-storage sub-chart and runs in the same akko namespace as the rest of the platform.
Architecture¶
Polaris ──────┐
              │  S3 API (signature v4)
Trino ────────┼──→ akko-storage:<s3-port> ──→ /data PVC
              │
Spark ────────┤
              │
MLflow ───────┘

akko-rag ─────→ /uploads/* via boto3
ADEN dashboards ─→ /reports/*
All compute engines (Trino, Spark, akko-rag, MLflow) reach the storage
layer through the standard S3 API; they don't know what the backing
engine is. To swap to a different S3-compatible provider, change the
endpoint URL + credentials in values.yaml — no consumer code change.
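As a rough sketch of such an override — the key names under storage below are illustrative, not the chart's actual schema; check helm/akko/charts/akko-storage/values.yaml for the real keys:

```yaml
# values-override.yaml — key names under storage are hypothetical,
# read the akko-storage chart's values.yaml for the real schema
storage:
  endpoint: https://s3.eu-west-1.amazonaws.com   # external S3-compatible endpoint
  region: eu-west-1
  accessKey: "<access key>"                      # better sourced from an existing Secret
  secretKey: "<secret access key>"
```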
Ports¶
| Port | Purpose | Exposed |
|---|---|---|
| 8333 | S3 API (REST, signature v4) | ClusterIP |
| 8888 | Admin UI | ClusterIP only — port-forward for ops |
| 9091 | Prometheus /metrics | scraped by ServiceMonitor |
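The Admin UI stays inside the cluster; for occasional ops work a port-forward is enough. A minimal sketch, assuming the Service carries the same akko-storage name used for the S3 endpoint and exposes port 8888:

```bash
kubectl -n akko port-forward svc/akko-storage 8888:8888
# then browse http://localhost:8888
```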
External access goes through Traefik at https://storage.akko-ai.com
(or the value of global.functionalAliases.storage in your overrides),
gated by oauth2-proxy + Keycloak SSO.
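To serve the storage UI under a different hostname, set the alias mentioned above in your overrides — a minimal sketch (the hostname is an example):

```yaml
# values-override.yaml
global:
  functionalAliases:
    storage: storage.my-company.internal   # hostname Traefik serves for the storage UI
```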
Buckets seeded at install¶
The post-install Job akko-storage-bucket-init creates the buckets the platform needs:
| Bucket | Owner service | Purpose |
|---|---|---|
| akko-iceberg | Polaris / Trino / Spark | Iceberg table data + metadata |
| akko-spark-logs | Spark master + workers | Spark event logs |
| akko-mlflow | MLflow | Model registry artifacts |
| akko-rag | akko-rag | Uploaded PDFs / DOCX / chunks |
| akko-aden | ADEN | Persisted dashboard files |
| akko-airflow-logs | Airflow | Task logs |
To add another bucket, append it to
helm/akko/charts/akko-storage/values.yaml:
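A sketch of what the entry could look like — the buckets key name is an assumption; mirror whatever structure the existing entries in that file already use:

```yaml
# helm/akko/charts/akko-storage/values.yaml — the buckets key name is an
# assumption; keep the structure the file already uses for the seeded buckets
buckets:
  - akko-iceberg
  - akko-spark-logs
  - akko-mlflow
  - akko-rag
  - akko-aden
  - akko-airflow-logs
  - akko-my-new-bucket   # the new bucket to seed
```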
Then helm upgrade — the init Job is idempotent and creates only the
missing buckets.
Credentials¶
The S3 access key + secret are stored in the akko-storage-creds
Secret and projected into every consumer pod under the standard env
vars:
AKKO_S3_ENDPOINT=http://akko-storage:8333
AKKO_S3_ACCESS_KEY=<from secret>
AKKO_S3_SECRET_KEY=<from secret>
AKKO_S3_REGION=akko-local
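For a workload that is not part of the chart but needs the same access, the projection can be reproduced with standard secretKeyRef entries. The key names inside akko-storage-creds (accessKey / secretKey below) are assumptions — inspect the Secret for the real ones:

```yaml
# container spec fragment — Secret key names below are assumptions
env:
  - name: AKKO_S3_ENDPOINT
    value: http://akko-storage:8333
  - name: AKKO_S3_REGION
    value: akko-local
  - name: AKKO_S3_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: akko-storage-creds
        key: accessKey     # assumed key name
  - name: AKKO_S3_SECRET_KEY
    valueFrom:
      secretKeyRef:
        name: akko-storage-creds
        key: secretKey     # assumed key name
```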
Quick reference — boto3 from a notebook¶
import os, boto3

s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["AKKO_S3_ENDPOINT"],
    aws_access_key_id=os.environ["AKKO_S3_ACCESS_KEY"],
    aws_secret_access_key=os.environ["AKKO_S3_SECRET_KEY"],
    region_name=os.environ.get("AKKO_S3_REGION", "akko-local"),
)

# List buckets
print([b["Name"] for b in s3.list_buckets()["Buckets"]])

# Upload a CSV
s3.upload_file("./customers.csv", "akko-iceberg", "raw/customers.csv")

# Read it through Spark
df = spark.read.csv(
    "s3a://akko-iceberg/raw/customers.csv", header=True, inferSchema=True
)
df.show()
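Because the surface is plain S3, the rest of the boto3 API works the same way — for example a time-limited download link for an uploaded file. This reuses the s3 client from above and only helps if the backing engine honours presigned signature-v4 requests, so verify against your deployment:

```python
# Presigned GET URL valid for one hour — requires presigned-URL support
# in the backing engine
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "akko-rag", "Key": "uploads/report.pdf"},
    ExpiresIn=3600,
)
print(url)
```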
Quick reference — aws CLI¶
export AWS_ACCESS_KEY_ID=$AKKO_S3_ACCESS_KEY
export AWS_SECRET_ACCESS_KEY=$AKKO_S3_SECRET_KEY
export AWS_DEFAULT_REGION=akko-local

aws --endpoint-url $AKKO_S3_ENDPOINT s3 ls

aws --endpoint-url $AKKO_S3_ENDPOINT s3 cp ./report.pdf \
    s3://akko-rag/uploads/report.pdf
Smoke test¶
To verify the install end-to-end:
kubectl -n akko exec deploy/akko-akko-cockpit -- \
    curl -s http://akko-storage:8333/ | head
# Expect: ListAllMyBucketsResult XML

kubectl -n akko logs job/akko-storage-bucket-init --tail=20
# Expect: "[+] bucket <name> ready" for each seeded bucket
Operational notes¶
- Replication — single-node deploy by default. To replicate, set storage.replication: "001" (one extra copy on a different rack — adapt to your cluster topology) and add at least 2 workers in storage.workerCount.
- Encryption at rest — chart-level encryption is supported but off by default. Flip storage.encryptionAtRest.enabled: true and provision the KMS key per the encryption runbook.
- Backup — the recommended approach is a CronJob running aws s3 sync to a secondary off-site S3-compatible endpoint; a sketch follows this list. The DR drill log captures the procedure end-to-end.
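A sketch of that backup CronJob, under stated assumptions: the off-site endpoint, the offsite-s3-creds Secret, the destination bucket and the schedule are placeholders, and the key names inside akko-storage-creds are assumed. A single aws s3 sync invocation can only talk to one endpoint with one credential set, so the sketch stages the data through an emptyDir volume in two steps:

```yaml
# Hypothetical backup CronJob — off-site endpoint, offsite-s3-creds Secret,
# destination bucket and schedule are placeholders; only akko-storage-creds
# and the source bucket come from this page.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: akko-storage-backup
  namespace: akko
spec:
  schedule: "0 3 * * *"              # nightly at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          volumes:
            - name: staging
              emptyDir: {}
          containers:
            - name: sync
              image: amazon/aws-cli:latest   # pin a version in practice
              command: ["/bin/sh", "-c"]
              args:
                - |
                  set -e
                  # 1) pull the bucket from the in-cluster store into the staging volume
                  AWS_ACCESS_KEY_ID="$SRC_KEY" AWS_SECRET_ACCESS_KEY="$SRC_SECRET" \
                    aws --endpoint-url http://akko-storage:8333 \
                        s3 sync s3://akko-iceberg /staging/akko-iceberg
                  # 2) push the staging copy to the off-site endpoint
                  AWS_ACCESS_KEY_ID="$DST_KEY" AWS_SECRET_ACCESS_KEY="$DST_SECRET" \
                    aws --endpoint-url https://s3.offsite.example.com \
                        s3 sync /staging/akko-iceberg s3://offsite-akko-iceberg
              env:
                - name: AWS_DEFAULT_REGION
                  value: akko-local           # off-site region may need to differ
                - name: SRC_KEY
                  valueFrom: {secretKeyRef: {name: akko-storage-creds, key: accessKey}}  # assumed key
                - name: SRC_SECRET
                  valueFrom: {secretKeyRef: {name: akko-storage-creds, key: secretKey}}  # assumed key
                - name: DST_KEY
                  valueFrom: {secretKeyRef: {name: offsite-s3-creds, key: accessKey}}    # hypothetical Secret
                - name: DST_SECRET
                  valueFrom: {secretKeyRef: {name: offsite-s3-creds, key: secretKey}}    # hypothetical Secret
              volumeMounts:
                - name: staging
                  mountPath: /staging
```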
Related¶
- Apache Polaris — Iceberg catalog reading from akko-iceberg
- Trino — federation engine using the Iceberg catalog
- Spark Connect — writes Iceberg tables through Polaris
- akko-rag — uploads PDFs to the akko-rag bucket