ADR-002: Use Apache Airflow as Orchestrator

Status

Accepted

Date

2026-03-09

Context

AKKO needs a workflow orchestration engine to schedule and monitor data pipelines (ETL/ELT, dbt runs, Spark jobs, data quality checks). The orchestrator must be open-source, self-hosted, and integrate with Iceberg/Trino/Spark. Options evaluated:

  • Apache Airflow — Apache TLP, industry standard, 10K+ production deployments
  • Dagster — Modern, asset-centric, Software-Defined Assets
  • Prefect — Cloud-first, hybrid model
  • Mage — Notebook-style, newer entrant

Decision

Use Apache Airflow as the orchestration engine for AKKO.

Why Airflow wins:

  1. Ecosystem scale: roughly 50x larger ecosystem than Dagster, with 2,000+ integrations shipped through provider packages (AWS, GCP, Spark, Trino, dbt, Kubernetes, Slack, etc.). Every tool AKKO uses has an Airflow provider.
  2. Community: 10,000+ production deployments globally, massive Stack Overflow and blog coverage, easy to hire for.
  3. Airflow 3.0: closes the DX gap with Dagster. It consolidates features that arrived during the 2.x line (Pythonic TaskFlow API, dynamic task mapping, Grid view, dataset-aware scheduling, with datasets renamed to assets in 3.0) alongside an improved UI and multi-tenant support.
  4. OpenLineage integration: native lineage emission via the apache-airflow-providers-openlineage provider, feeding directly into OpenMetadata.
  5. Battle-tested: used by Airbnb, Spotify, Twitter, Lyft, and thousands of enterprises. Production reliability is proven.
  6. Self-hosted: no cloud dependency, runs entirely on customer infrastructure.
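To make the TaskFlow API and dynamic task mapping mentioned in the decision concrete, here is a minimal sketch rather than an AKKO pipeline. The DAG name `akko_example_pipeline`, the helpers `list_partitions` and `summarize`, and the region and date values are all hypothetical. It assumes `dag` and `task` are importable from `airflow.decorators` (the Airflow 2.x path; Airflow 3 also exposes them under `airflow.sdk`).

```python
from datetime import datetime


def list_partitions(day: str, regions: list[str]) -> list[str]:
    """Return one hypothetical partition identifier per region for a day."""
    return [f"{region}/{day}" for region in regions]


def summarize(counts: list[int]) -> dict:
    """Aggregate per-partition row counts into a small summary dict."""
    return {"partitions": len(counts), "rows": sum(counts)}


def build_dag():
    """Wire the helpers into a TaskFlow DAG (requires apache-airflow)."""
    # Imported lazily so the helpers above can be unit-tested without Airflow.
    from airflow.decorators import dag, task

    @dag(schedule=None, start_date=datetime(2026, 1, 1), catchup=False)
    def akko_example_pipeline():
        @task
        def partitions() -> list[str]:
            return list_partitions("2026-03-09", ["eu", "us"])

        @task
        def load(partition: str) -> int:
            # Placeholder: a real task would trigger a Spark or Trino job
            # and return only a small value (a row count), not the data.
            return len(partition)

        @task
        def report(counts: list[int]) -> dict:
            return summarize(counts)

        # Dynamic task mapping: one `load` task instance per partition,
        # with the number of instances decided at run time.
        counts = load.expand(partition=partitions())
        report(counts)

    return akko_example_pipeline()
```

In a deployed DAG file, `build_dag()` would be called at module level so the scheduler can discover the DAG. Keeping the logic in plain functions means it can be tested with ordinary unit tests, and returning only small values from tasks avoids pushing large payloads through XCom.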

Alternatives Considered

Dagster

  • Superior developer experience (Software-Defined Assets, type-safe, built-in testing)
  • Asset-centric model is more intuitive than task-centric
  • But: tiny ecosystem (~50 integrations vs Airflow's 2,000+), small community, fewer production deployments
  • Enterprise customers ask "do you support Airflow?", never "do you support Dagster?"
  • Rejected: better DX does not outweigh ecosystem and market reality

Prefect

  • Clean Python API, modern architecture
  • Cloud-first business model (Prefect Cloud is the primary product)
  • Self-hosted ("Prefect Server") is community-supported, not the focus
  • Rejected: cloud-first conflicts with AKKO's sovereignty promise

Mage

  • Notebook-style pipeline building, good for data engineers who think in notebooks
  • Young project, small team, uncertain long-term viability
  • Rejected: maturity and ecosystem too limited

Consequences

Positive

  • Most AKKO customers already know Airflow or can learn it quickly
  • Rich provider ecosystem means near plug-and-play integration with the tools in the AKKO stack
  • Airflow 3.0 modernizes the developer experience significantly
  • Massive hiring pool for Airflow skills

Negative

  • More boilerplate than Dagster (DAG files, operator imports, connection setup)
  • Airflow can be resource-hungry at scale (mitigated by distributing task execution with CeleryExecutor/KubernetesExecutor and by running multiple scheduler instances)
  • Legacy patterns (BashOperator, XCom abuse) require discipline to avoid

Neutral

  • DAG-as-code pattern requires Git discipline (which AKKO enforces anyway)
  • Migration to Dagster in the future is possible but unlikely given Airflow 3.0 trajectory

References