ADR-002: Use Apache Airflow as Orchestrator
Status
Accepted
Date
2026-03-09
Context
AKKO needs a workflow orchestration engine to schedule and monitor data pipelines (ETL/ELT, dbt runs, Spark jobs, data quality checks). The orchestrator must be open source and self-hostable, and it must integrate with Iceberg, Trino, and Spark. Options evaluated:
- Apache Airflow: Apache Software Foundation top-level project, the de facto industry standard, widely deployed in production
- Dagster: modern, asset-centric (Software-Defined Assets)
- Prefect: cloud-first, hybrid execution model
- Mage: notebook-style, newer entrant
Decision
Use Apache Airflow as the orchestration engine for AKKO.
Why Airflow wins:
1. Ecosystem scale: a much larger integration ecosystem than Dagster's. Airflow has around a hundred provider packages that together bundle well over a thousand operators, hooks, and sensors (AWS, GCP, Spark, Trino, dbt, Kubernetes, Slack, etc.). Every tool AKKO uses has an Airflow provider.
2. Community: a large global user base running Airflow in production, extensive Stack Overflow and blog coverage, easy to hire for.
3. Airflow 3.0: narrows the DX gap with Dagster. It builds on features introduced in the 2.x line (TaskFlow API for Pythonic DAG authoring, dynamic task mapping, Grid view, dataset-aware scheduling) and adds DAG versioning, a rebuilt UI, assets (the renamed datasets), and a Task SDK that lays groundwork for stronger task isolation.
4. OpenLineage integration: native lineage emission via the apache-airflow-providers-openlineage provider (the successor to the standalone openlineage-airflow package), feeding into OpenMetadata.
5. Battle-tested: used by Airbnb, Spotify, Twitter, Lyft, and thousands of enterprises. Production reliability is proven.
6. Self-hosted: no cloud dependency; runs entirely on customer infrastructure.
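To make the lineage point above concrete, the sketch below builds a simplified OpenLineage-style run event by hand, using only the Python standard library. In practice the Airflow OpenLineage provider constructs and sends these events automatically, so this is only an illustration of the payload shape. The function name `build_run_event`, the `akko` job namespace, the Trino host, the producer URI, and the table names are all hypothetical, and spec fields such as `schemaURL` are left out for brevity.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(event_type, job_name, inputs, outputs, run_id=None):
    """Build a simplified OpenLineage-style run event as a plain dict.

    inputs and outputs are lists of (namespace, name) pairs describing datasets.
    """
    return {
        "eventType": event_type,  # e.g. "START", "COMPLETE", "FAIL"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        # "akko" is a hypothetical job namespace chosen for this sketch.
        "job": {"namespace": "akko", "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
        # Hypothetical producer URI; a real integration sets its own value.
        "producer": "https://example.com/akko/lineage-sketch",
    }


# Hypothetical Trino endpoint and Iceberg tables, for illustration only.
TRINO = "trino://trino.example.internal:8080"

event = build_run_event(
    "COMPLETE",
    "orders_daily.load_orders",
    inputs=[(TRINO, "iceberg.raw.orders")],
    outputs=[(TRINO, "iceberg.mart.orders")],
)
print(json.dumps(event, indent=2))
```

A metadata catalog that consumes such events can link `iceberg.raw.orders` to `iceberg.mart.orders` through the job that ran between them, which is the lineage edge the decision relies on.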
Alternatives Considered
Dagster
- Superior developer experience (Software-Defined Assets, type-safe, built-in testing)
- Asset-centric model is more intuitive than task-centric
- But: a much smaller integration ecosystem than Airflow's, a smaller community, and fewer production deployments
- Enterprise customers ask "do you support Airflow?", never "do you support Dagster?"
- Rejected: better DX does not outweigh ecosystem and market reality
Prefect
- Clean Python API, modern architecture
- Cloud-first business model (Prefect Cloud is the primary product)
- Self-hosted ("Prefect Server") is community-supported, not the focus
- Rejected: cloud-first conflicts with AKKO's sovereignty promise
Mage
- Notebook-style pipeline building, good for data engineers who think in notebooks
- Young project, small team, uncertain long-term viability
- Rejected: maturity and ecosystem too limited
Consequences
Positive
- Most AKKO customers already know Airflow or can learn it quickly
- Rich provider ecosystem means most tools integrate through an existing provider rather than custom glue code
- Airflow 3.0 modernizes the developer experience significantly
- Massive hiring pool for Airflow skills
Negative
- More boilerplate than Dagster (DAG files, operator imports, connection setup)
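- Airflow's scheduler can be resource-hungry at scale (mitigated by running multiple scheduler replicas and by offloading task execution to workers with CeleryExecutor or KubernetesExecutor)
- Legacy patterns (shell-heavy BashOperator tasks, passing large payloads through XCom) require discipline to avoid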
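One way to back that discipline with tooling is a small check that runs in CI against DAG files. The sketch below is a hypothetical lint, not part of Airflow: it parses DAG source with the standard-library `ast` module and reports imports of operators the team has chosen to discourage. The names `DISCOURAGED_OPERATORS` and `find_discouraged_imports` are assumptions made for this example, and the check only catches `from ... import ...` statements.

```python
import ast

# Hypothetical team policy: operator names to flag during review.
DISCOURAGED_OPERATORS = {"BashOperator"}


def find_discouraged_imports(source: str) -> list[tuple[int, str]]:
    """Return (line_number, name) for each discouraged operator imported in source.

    The source is only parsed, never executed, so Airflow does not need
    to be installed where the check runs.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name in DISCOURAGED_OPERATORS:
                    findings.append((node.lineno, alias.name))
    return sorted(findings)


# Example DAG source as a string; in CI this would be read from each DAG file.
sample_dag = (
    "from airflow.providers.standard.operators.bash import BashOperator\n"
    "from airflow.sdk import dag, task\n"
)
print(find_discouraged_imports(sample_dag))  # prints [(1, 'BashOperator')]
```

Because the check works on source text, it can run as a pre-commit hook or pipeline step without a running Airflow environment. Detecting oversized XCom payloads is a runtime concern and is outside what this static sketch can cover.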
Neutral
- DAG-as-code pattern requires Git discipline (which AKKO enforces anyway)
- Migration to Dagster in the future is possible but unlikely given Airflow 3.0 trajectory