ADR-004: Reserve Redpanda for Future Streaming Needs

Status

Proposed

Date

2026-03-09

Context

AKKO is currently a batch-first platform (Airflow + Spark + Trino). However, future use cases may require event streaming — real-time ingestion, CDC (Change Data Capture), event-driven architectures. When that time comes, AKKO needs a streaming backbone. Options evaluated:

  • Apache Kafka — Industry standard, massive ecosystem, JVM-based
  • Redpanda — Kafka-compatible, C++ implementation, no JVM/ZooKeeper
  • Apache Pulsar — Multi-tenant, tiered storage, separate compute/storage

Decision

When streaming is needed, use Redpanda as the event streaming platform. Redpanda is not deployed today; AKKO remains batch-first until a concrete streaming use case emerges.

Why Redpanda (when the time comes):

  1. Kafka-compatible: Same protocol, same client libraries, same ecosystem. Kafka Connect, Schema Registry, and other Kafka tools work unchanged.
  2. No JVM: Written in C++ on the Seastar framework (the same one ScyllaDB uses). 10x less memory, faster startup, predictable latency.
  3. No ZooKeeper/KRaft complexity: Self-contained binary with built-in Raft consensus. Dramatically simpler operations.
  4. Iceberg Topics: Redpanda's roadmap includes direct streaming-to-Iceberg, eliminating the Kafka Connect + Iceberg Sink Connector pipeline.
  5. Resource-efficient: Critical for AKKO's distribution model, where customers run on-premises with finite resources.
  6. Sovereignty-friendly: Fully self-hosted, BSL-1.1 license (source-available immediately, converts to Apache 2.0 after 4 years).
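To make the Kafka-compatibility point concrete, here is a minimal sketch of what "same client libraries" is expected to mean in practice: the client configuration stays the same and only the broker address changes. The hostnames are hypothetical, and the property names are standard Kafka producer settings chosen for illustration, not an AKKO configuration.

```python
# Hypothetical broker addresses; only this value is expected to differ
# between a Kafka cluster and a Redpanda cluster.
KAFKA_BOOTSTRAP = "kafka-1.example.internal:9092"
REDPANDA_BOOTSTRAP = "redpanda-1.example.internal:9092"


def producer_config(bootstrap_servers: str) -> dict[str, str]:
    """Build a producer config using standard Kafka client property names."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "acks": "all",
        "enable.idempotence": "true",
        "compression.type": "lz4",
    }


def config_diff(a: dict[str, str], b: dict[str, str]) -> set[str]:
    """Return the keys whose values differ between two configs."""
    return {key for key in a.keys() | b.keys() if a.get(key) != b.get(key)}


kafka_cfg = producer_config(KAFKA_BOOTSTRAP)
redpanda_cfg = producer_config(REDPANDA_BOOTSTRAP)
changed = config_diff(kafka_cfg, redpanda_cfg)
print(sorted(changed))  # ['bootstrap.servers']
```

A dictionary of this shape is what Kafka client libraries typically accept, so an application written against Kafka should need no code changes beyond the connection setting. Any feature used outside the core protocol would still need to be verified against Redpanda at deployment time.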

Alternatives Considered

Apache Kafka

  • Industry standard, 80%+ market share in streaming
  • Massive ecosystem (Kafka Connect, Kafka Streams, ksqlDB, Schema Registry)
  • But: JVM-based, requires significant memory (broker + ZooKeeper/KRaft), operationally complex
  • KRaft mode (no ZooKeeper) is GA, and the only mode from Kafka 4.0 onward, but still maturing
  • Confluent's commercial offerings create vendor gravity
  • Rejected for AKKO: resource overhead is too high for the on-premises distribution model. Redpanda provides the same API with roughly 10x fewer resources

Apache Pulsar

  • Multi-tenant by design, tiered storage built-in
  • Separate compute and storage (BookKeeper)
  • But: smaller ecosystem than Kafka, more complex architecture (broker + BookKeeper + ZooKeeper)
  • Not Kafka-compatible — different client libraries, different paradigm
  • Rejected: complexity and ecosystem size. Kafka compatibility is more valuable than Pulsar's architecture advantages

Consequences

Positive

  • No streaming infrastructure overhead until actually needed (batch-first saves resources)
  • When deployed, Redpanda slots in with minimal friction (Kafka-compatible)
  • Existing Kafka ecosystem knowledge transfers directly
  • Resource-efficient deployment aligns with on-premises distribution model

Negative

  • No real-time streaming capability until Redpanda is deployed
  • Redpanda uses BSL-1.1 license (not Apache 2.0) — source-available but with usage restrictions for competing products
  • Smaller community than Kafka (though growing rapidly)

Neutral

  • Batch pipelines via Airflow cover all current use cases
  • CDC can be approximated via scheduled incremental loads (Airflow + Spark) until streaming is deployed
  • Decision can be revisited — if Kafka ecosystem advantages outweigh resource concerns at deployment time, switching is low-cost (same protocol)
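The scheduled incremental load mentioned above as a CDC stand-in can be sketched with a high-watermark query. This is an illustration under stated assumptions, not AKKO's pipeline: the `orders` table, its `updated_at` column, and the in-memory SQLite databases are hypothetical stand-ins for a real source and target, and in practice the function would run as an Airflow-scheduled Spark job.

```python
import sqlite3

# Hypothetical schema; `updated_at` holds ISO-8601 strings, which sort
# chronologically when compared as text.
DDL = "CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT, updated_at TEXT)"


def incremental_load(source, target, watermark):
    """Copy rows changed after `watermark` and return the new watermark."""
    rows = source.execute(
        "SELECT id, payload, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Upsert by primary key so re-running a batch is idempotent.
    target.executemany(
        "INSERT OR REPLACE INTO orders (id, payload, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    target.commit()
    return rows[-1][2] if rows else watermark


def demo():
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    source.execute(DDL)
    target.execute(DDL)
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, "created", "2026-03-01T00:00:00"),
            (2, "created", "2026-03-02T00:00:00"),
        ],
    )
    # First scheduled run: an empty watermark picks up every row.
    watermark = incremental_load(source, target, "")
    # A row changes in the source between runs.
    source.execute(
        "UPDATE orders SET payload = 'shipped', "
        "updated_at = '2026-03-03T00:00:00' WHERE id = 1"
    )
    # Second scheduled run: only the changed row is copied.
    watermark = incremental_load(source, target, watermark)
    rows = target.execute("SELECT id, payload FROM orders ORDER BY id").fetchall()
    return watermark, rows


print(demo())
```

This pattern is only an approximation of CDC. It misses hard deletes and any intermediate states between two runs, and its latency is bounded by the schedule interval. Those gaps are the kind of requirement that would justify deploying streaming.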
