ADR-004: Reserve Redpanda for Future Streaming Needs
Status
Proposed
Date
2026-03-09
Context
AKKO is currently a batch-first platform (Airflow + Spark + Trino). However, future use cases may require event streaming — real-time ingestion, CDC (Change Data Capture), event-driven architectures. When that time comes, AKKO needs a streaming backbone. Options evaluated:
- Apache Kafka — Industry standard, massive ecosystem, JVM-based
- Redpanda — Kafka-compatible, C++ implementation, no JVM/ZooKeeper
- Apache Pulsar — Multi-tenant, tiered storage, separate compute/storage
Decision
When streaming is needed, use Redpanda as the event streaming platform. Redpanda is not deployed today; AKKO remains batch-first until a concrete streaming use case emerges.
Why Redpanda (when the time comes):
1. Kafka-compatible: same protocol, same client libraries, same ecosystem. Kafka Connect, Schema Registry clients, and other Kafka tooling work unchanged.
2. No JVM: written in C++ on the Seastar framework (the same one ScyllaDB uses). Roughly 10x less memory than a JVM broker, faster startup, and more predictable latency.
3. No ZooKeeper/KRaft complexity: a self-contained binary with built-in Raft consensus, which makes operations dramatically simpler.
4. Iceberg Topics: Redpanda's roadmap includes direct streaming-to-Iceberg, which eliminates the Kafka Connect + Iceberg Sink Connector pipeline.
5. Resource-efficient: critical for AKKO's distribution model, where customers run on-premises with finite resources.
6. Sovereignty-friendly: fully self-hosted and source-available immediately under the BSL-1.1 license, which converts to Apache 2.0 after 4 years.
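The Kafka-compatibility argument implies that client configuration should differ only in the broker address. The following is a minimal sketch of that idea, not a deployed configuration: the hostnames and the `client_config` helper are hypothetical, and the keys are standard librdkafka-style settings as accepted by common Kafka clients.

```python
from typing import Dict

# Hypothetical broker addresses; a real deployment would supply its own.
BACKENDS: Dict[str, str] = {
    "redpanda": "redpanda-0.akko.internal:9092",
    "kafka": "kafka-0.akko.internal:9092",
}


def client_config(backend: str, client_id: str = "akko-ingest") -> Dict[str, object]:
    """Return a librdkafka-style producer config for the chosen backend."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    return {
        "bootstrap.servers": BACKENDS[backend],  # the only backend-specific value
        "client.id": client_id,
        "acks": "all",               # wait for all in-sync replicas
        "enable.idempotence": True,  # avoid duplicates on producer retries
    }


redpanda_cfg = client_config("redpanda")
kafka_cfg = client_config("kafka")

# Collect the keys whose values differ between the two backends.
differing_keys = sorted(k for k in redpanda_cfg if redpanda_cfg[k] != kafka_cfg[k])
print(differing_keys)  # → ['bootstrap.servers']
```

Keeping the broker address as the single backend-specific value is what would make a later move between Redpanda and Kafka a configuration change rather than a code change.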
Alternatives Considered
Apache Kafka
- Industry standard, 80%+ market share in streaming
- Massive ecosystem (Kafka Connect, Kafka Streams, ksqlDB, Schema Registry)
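- But: JVM-based, requires significant memory (brokers plus ZooKeeper or KRaft controllers), operationally complex
- KRaft mode (no ZooKeeper) is GA and is the only mode from Kafka 4.0 onward, but the controllers are still JVM processes to size and operate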
- Confluent's commercial offerings create vendor gravity
- Rejected for AKKO: resource overhead is too high for the on-premises distribution model. Redpanda provides the same API with roughly 10x fewer resources
Apache Pulsar
- Multi-tenant by design, tiered storage built-in
- Separate compute and storage (BookKeeper)
- But: smaller ecosystem than Kafka, more complex architecture (broker + BookKeeper + ZooKeeper)
- Not Kafka-compatible — different client libraries, different paradigm
- Rejected on complexity and ecosystem size: Kafka compatibility is more valuable to AKKO than Pulsar's architectural advantages
Consequences
Positive
- No streaming infrastructure overhead until actually needed (batch-first saves resources)
- When deployed, Redpanda slots in with minimal friction (Kafka-compatible)
- Existing Kafka ecosystem knowledge transfers directly
- Resource-efficient deployment aligns with on-premises distribution model
Negative
- No real-time streaming capability until Redpanda is deployed
- Redpanda uses the BSL-1.1 license (not Apache 2.0): source-available, but with usage restrictions for competing products
- Smaller community than Kafka (though growing rapidly)
Neutral
- Batch pipelines via Airflow cover all current use cases
- CDC can be approximated via scheduled incremental loads (Airflow + Spark) until streaming is deployed
- The decision can be revisited: if the Kafka ecosystem's advantages outweigh resource concerns at deployment time, switching is low-cost because both platforms speak the same protocol
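The scheduled incremental loads mentioned above as a stand-in for CDC are commonly built on a high-watermark query. The following is a minimal sketch under stated assumptions: an in-memory SQLite database stands in for the source system, and the `orders` table, its `updated_at` column, and the `incremental_load` helper are all hypothetical. In AKKO the equivalent logic would run as an Airflow-scheduled Spark job.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, str, str]


def incremental_load(conn: sqlite3.Connection, watermark: str) -> Tuple[List[Row], str]:
    """Fetch rows modified after `watermark` and return them with the new watermark."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (watermark,),
    ).fetchall()
    # Advance the watermark only when something new was seen.
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark


# Hypothetical source table; ISO-8601 timestamps sort correctly as strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "new", "2026-03-01T10:00:00"), (2, "new", "2026-03-01T11:00:00")],
)

# First scheduled run: everything is newer than the initial watermark.
first_batch, watermark = incremental_load(conn, "1970-01-01T00:00:00")

# A row changes in the source between runs.
conn.execute(
    "UPDATE orders SET status = 'shipped', updated_at = '2026-03-02T09:00:00' WHERE id = 1"
)

# Second run picks up only the changed row; a third run finds nothing new.
second_batch, watermark = incremental_load(conn, watermark)
third_batch, watermark = incremental_load(conn, watermark)

print(len(first_batch), len(second_batch), len(third_batch))  # → 2 1 0
```

This is an approximation rather than true CDC: a watermark query sees only the latest state of each row at run time, so hard deletes and intermediate updates between runs go unobserved. Closing that gap is what a log-based streaming pipeline would provide once deployed.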