About AKKO¶
Vision¶
The modern data stack has become synonymous with cloud lock-in. Platforms like Databricks, Snowflake, and BigQuery offer undeniable convenience — but at the cost of sovereignty. Your data lives on someone else's infrastructure, governed by someone else's terms, priced by someone else's meter.
AKKO exists to prove that you do not have to choose between power and ownership. A single engineer, a small team, or an entire organization should be able to deploy a production-grade analytics stack — lakehouse storage, multi-engine compute, governance, orchestration, monitoring, and visualization — without surrendering control of a single component.
Philosophy: Own Every Layer¶
The name says it all: Analytics Kernel, Keep Ownership.
AKKO's design principle is that every layer of the analytics stack — from the bytes on disk to the pixels on screen — should be composed of open-source software that you can inspect, fork, and replace.
| Layer | AKKO Component | What It Replaces |
|---|---|---|
| Object Storage | MinIO | Amazon S3, Azure Blob, GCS |
| Table Format | Apache Iceberg | Delta Lake (Databricks), Hive |
| Catalog | Apache Polaris | Unity Catalog, AWS Glue, Hive Metastore |
| Distributed Compute | Spark Connect | Databricks Runtime, EMR |
| Query Federation | Trino | Snowflake, Athena, BigQuery |
| Notebooks | JupyterHub + code-server | Databricks Notebooks, Vertex AI Workbench |
| BI & Dashboards | Apache Superset | Looker, Tableau, Power BI |
| Orchestration | Apache Airflow | Databricks Workflows, Cloud Composer |
| Lineage | OpenLineage | Databricks Unity Catalog lineage |
| Governance | OpenMetadata | Alation, Collibra, Atlan |
| SSO & RBAC | Keycloak | Okta, Auth0, cloud IAM |
| Monitoring | Prometheus + Grafana + Loki | Datadog, CloudWatch, Splunk |
| Local AI | Ollama + LiteLLM + pgvector | OpenAI API, Vertex AI |
| Access Control | OPA (Open Policy Agent) | AWS Lake Formation, Unity Catalog policies |
| Directory | LLDAP | Active Directory, Azure AD |
| Reverse Proxy | Traefik | AWS ALB, Cloudflare |
No BSL (Business Source License) components. No "open core" traps. Apache 2.0 wherever possible.
Technology Choices¶
Every technology in AKKO was selected deliberately. Here is why:
Apache Iceberg over Delta Lake¶
Iceberg is a truly open table format governed by the Apache Software Foundation. Delta Lake, while open-source, is tightly coupled with the Databricks ecosystem. Iceberg's multi-engine support (Spark, Trino, Flink, DuckDB) and vendor-neutral governance make it the natural choice for a sovereign platform.
Apache Polaris over Hive Metastore¶
Polaris is the Apache-incubating Iceberg REST catalog donated by Snowflake. It provides a modern REST API, OAuth2 authentication, and fine-grained RBAC — replacing the aging Hive Metastore and proprietary alternatives like Unity Catalog and AWS Glue.
Trino over Presto¶
Trino (formerly PrestoSQL) is the continuation of the original Presto project by its creators. It has a more active community, faster release cadence, and better Iceberg integration. Its federated query engine connects Iceberg tables, PostgreSQL, and other sources through a single SQL interface.
Spark Connect over Classic Spark¶
Spark Connect provides a thin client/server architecture, allowing notebooks to submit Spark jobs over gRPC without running a full Spark driver in the notebook process. This reduces memory footprint and improves stability in multi-user environments.
JupyterHub over Managed Notebooks¶
JupyterHub spawns isolated, per-user notebook containers with Python, R, Julia, Quarto, and code-server pre-installed. Unlike managed notebook services, every dependency is explicit in a Dockerfile you control.
Keycloak over Cloud IAM¶
Keycloak provides enterprise-grade SSO with OpenID Connect across all 14+ AKKO services. Five RBAC roles (admin, engineer, analyst, steward, viewer) map to fine-grained permissions in Trino, Superset, Airflow, and OpenMetadata — without depending on any cloud identity provider.
OpenMetadata over Proprietary Catalogs¶
OpenMetadata is a fully open-source data governance platform with automated ingestion, lineage tracking, data quality testing, glossary management, and team-based ownership. Unlike Alation or Collibra, it has no enterprise paywall on core features.
Ollama over Cloud AI APIs¶
Ollama runs open-weight LLMs locally. Combined with pgvector in PostgreSQL and LangChain in notebooks, it enables RAG (Retrieval-Augmented Generation) pipelines where no data ever leaves your infrastructure.
AKKO vs. Cloud Platforms¶
A factual comparison of capabilities:
| Capability | AKKO | Databricks | Snowflake |
|---|---|---|---|
| Deployment | Self-hosted (Helm on Kubernetes) | Managed cloud | Managed cloud |
| Table Format | Apache Iceberg | Delta Lake | Iceberg (read), proprietary (native) |
| Catalog | Apache Polaris (REST) | Unity Catalog | Snowflake-managed |
| Compute Engines | Spark, Trino, DuckDB | Spark (Photon) | Snowflake engine |
| Notebooks | JupyterHub + code-server | Databricks Notebooks | Snowflake Notebooks |
| BI | Apache Superset | Databricks SQL / partner | Snowsight / partner |
| Orchestration | Apache Airflow | Databricks Workflows | Snowflake Tasks |
| Data Governance | OpenMetadata | Unity Catalog | Horizon |
| SSO | Keycloak (self-hosted) | Cloud IAM integration | Cloud IAM integration |
| Monitoring | Prometheus + Grafana + Loki | Cloud-native metrics | Query history |
| Local AI / LLMs | Ollama + LiteLLM (fully offline) | Mosaic AI | Cortex AI |
| Data Sovereignty | Full (your infra) | Cloud provider dependent | Cloud provider dependent |
| Pricing | Open-source (zero licensing costs) | Consumption-based (DBU) | Consumption-based (credits) |
| Vendor Lock-in | None | Moderate (Delta, Unity) | High (proprietary format) |
| License | Apache 2.0 | Apache 2.0 / BSL mix | Proprietary |
Fair comparison
Cloud platforms offer managed infrastructure, auto-scaling, and enterprise support that AKKO does not provide out of the box. AKKO's value proposition is sovereignty, transparency, and zero recurring cost — not competing with managed service SLAs.
License¶
AKKO is open-source software. The platform itself and all configuration, scripts, and documentation are released under an open-source license.
Every bundled component uses an open-source license, with a strong preference for Apache License 2.0. No BSL (Business Source License) components are included.
Own Every Layer.