Skip to content

About AKKO

Vision

The modern data stack has become synonymous with cloud lock-in. Platforms like Databricks, Snowflake, and BigQuery offer undeniable convenience — but at the cost of sovereignty. Your data lives on someone else's infrastructure, governed by someone else's terms, priced by someone else's meter.

AKKO exists to prove that you do not have to choose between power and ownership. A single engineer, a small team, or an entire organization should be able to deploy a production-grade analytics stack — lakehouse storage, multi-engine compute, governance, orchestration, monitoring, and visualization — without surrendering control of a single component.


Philosophy: Own Every Layer

The name says it all: Analytics Kernel, Keep Ownership.

AKKO's design principle is that every layer of the analytics stack — from the bytes on disk to the pixels on screen — should be composed of open-source software that you can inspect, fork, and replace.

Layer AKKO Component What It Replaces
Object Storage MinIO Amazon S3, Azure Blob, GCS
Table Format Apache Iceberg Delta Lake (Databricks), Hive
Catalog Apache Polaris Unity Catalog, AWS Glue, Hive Metastore
Distributed Compute Spark Connect Databricks Runtime, EMR
Query Federation Trino Snowflake, Athena, BigQuery
Notebooks JupyterHub + code-server Databricks Notebooks, Vertex AI Workbench
BI & Dashboards Apache Superset Looker, Tableau, Power BI
Orchestration Apache Airflow Databricks Workflows, Cloud Composer
Lineage OpenLineage Databricks Unity Catalog lineage
Governance OpenMetadata Alation, Collibra, Atlan
SSO & RBAC Keycloak Okta, Auth0, cloud IAM
Monitoring Prometheus + Grafana + Loki Datadog, CloudWatch, Splunk
Local AI Ollama + LiteLLM + pgvector OpenAI API, Vertex AI
Access Control OPA (Open Policy Agent) AWS Lake Formation, Unity Catalog policies
Directory LLDAP Active Directory, Azure AD
Reverse Proxy Traefik AWS ALB, Cloudflare

No BSL (Business Source License) components. No "open core" traps. Apache 2.0 wherever possible.


Technology Choices

Every technology in AKKO was selected deliberately. Here is why:

Apache Iceberg over Delta Lake

Iceberg is a truly open table format governed by the Apache Software Foundation. Delta Lake, while open-source, is tightly coupled with the Databricks ecosystem. Iceberg's multi-engine support (Spark, Trino, Flink, DuckDB) and vendor-neutral governance make it the natural choice for a sovereign platform.

Apache Polaris over Hive Metastore

Polaris is the Apache-incubating Iceberg REST catalog donated by Snowflake. It provides a modern REST API, OAuth2 authentication, and fine-grained RBAC — replacing the aging Hive Metastore and proprietary alternatives like Unity Catalog and AWS Glue.

Trino over Presto

Trino (formerly PrestoSQL) is the continuation of the original Presto project by its creators. It has a more active community, faster release cadence, and better Iceberg integration. Its federated query engine connects Iceberg tables, PostgreSQL, and other sources through a single SQL interface.

Spark Connect over Classic Spark

Spark Connect provides a thin client/server architecture, allowing notebooks to submit Spark jobs over gRPC without running a full Spark driver in the notebook process. This reduces memory footprint and improves stability in multi-user environments.

JupyterHub over Managed Notebooks

JupyterHub spawns isolated, per-user notebook containers with Python, R, Julia, Quarto, and code-server pre-installed. Unlike managed notebook services, every dependency is explicit in a Dockerfile you control.

Keycloak over Cloud IAM

Keycloak provides enterprise-grade SSO with OpenID Connect across all 14+ AKKO services. Five RBAC roles (admin, engineer, analyst, steward, viewer) map to fine-grained permissions in Trino, Superset, Airflow, and OpenMetadata — without depending on any cloud identity provider.

OpenMetadata over Proprietary Catalogs

OpenMetadata is a fully open-source data governance platform with automated ingestion, lineage tracking, data quality testing, glossary management, and team-based ownership. Unlike Alation or Collibra, it has no enterprise paywall on core features.

Ollama over Cloud AI APIs

Ollama runs open-weight LLMs locally. Combined with pgvector in PostgreSQL and LangChain in notebooks, it enables RAG (Retrieval-Augmented Generation) pipelines where no data ever leaves your infrastructure.


AKKO vs. Cloud Platforms

A factual comparison of capabilities:

Capability AKKO Databricks Snowflake
Deployment Self-hosted (Helm on Kubernetes) Managed cloud Managed cloud
Table Format Apache Iceberg Delta Lake Iceberg (read), proprietary (native)
Catalog Apache Polaris (REST) Unity Catalog Snowflake-managed
Compute Engines Spark, Trino, DuckDB Spark (Photon) Snowflake engine
Notebooks JupyterHub + code-server Databricks Notebooks Snowflake Notebooks
BI Apache Superset Databricks SQL / partner Snowsight / partner
Orchestration Apache Airflow Databricks Workflows Snowflake Tasks
Data Governance OpenMetadata Unity Catalog Horizon
SSO Keycloak (self-hosted) Cloud IAM integration Cloud IAM integration
Monitoring Prometheus + Grafana + Loki Cloud-native metrics Query history
Local AI / LLMs Ollama + LiteLLM (fully offline) Mosaic AI Cortex AI
Data Sovereignty Full (your infra) Cloud provider dependent Cloud provider dependent
Pricing Open-source (zero licensing costs) Consumption-based (DBU) Consumption-based (credits)
Vendor Lock-in None Moderate (Delta, Unity) High (proprietary format)
License Apache 2.0 Apache 2.0 / BSL mix Proprietary

Fair comparison

Cloud platforms offer managed infrastructure, auto-scaling, and enterprise support that AKKO does not provide out of the box. AKKO's value proposition is sovereignty, transparency, and zero recurring cost — not competing with managed service SLAs.



License

AKKO is open-source software. The platform itself and all configuration, scripts, and documentation are released under an open-source license.

Every bundled component uses an open-source license, with a strong preference for Apache License 2.0. No BSL (Business Source License) components are included.



AKKO — Analytics Kernel, Keep Ownership
Own Every Layer.