ADR-003: Use Apache Polaris as Iceberg Catalog
Status
Accepted
Date
2026-03-09
Context
AKKO needs a catalog service to manage Iceberg tables — namespaces, table metadata, access control, and multi-engine coordination (Trino + Spark must see the same tables). The catalog must implement the Iceberg REST catalog specification. Options evaluated:
- Apache Polaris: Apache TLP (incubating Feb 2025, graduated 2026), donated by Snowflake
- Hive Metastore (HMS): legacy standard, Thrift-based
- Unity Catalog OSS: donated by Databricks to the Linux Foundation
- Project Nessie: Git-like catalog with branching/tagging
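Because the REST spec is the hard requirement, it helps to see what implementing it means on the wire. A REST client first calls GET /v1/config and merges the server's response with its own settings in a fixed order: server defaults first, then client properties, then server overrides. The sketch below shows that precedence, assuming a Polaris-style base URL and a catalog named akko (both hypothetical, as are the sample response values); it only builds the URL and merges dictionaries, with no network call.

```python
from urllib.parse import urlencode

# Hypothetical values for illustration; AKKO's real host and catalog name may differ.
POLARIS_BASE = "http://polaris:8181/api/catalog"
WAREHOUSE = "akko"


def config_url(base: str, warehouse: str) -> str:
    """Build the Iceberg REST GET /v1/config URL for a given warehouse."""
    return f"{base.rstrip('/')}/v1/config?{urlencode({'warehouse': warehouse})}"


def effective_config(server_response: dict, client_props: dict) -> dict:
    """Apply the REST spec's precedence: server defaults, then client props, then server overrides."""
    merged = dict(server_response.get("defaults", {}))
    merged.update(client_props)
    merged.update(server_response.get("overrides", {}))
    return merged


# Example response shape from GET /v1/config (values are made up).
response = {
    "defaults": {"default-base-location": "s3://lake/akko"},
    "overrides": {"prefix": "akko"},
}
props = effective_config(
    response, {"prefix": "ignored", "io-impl": "org.apache.iceberg.aws.s3.S3FileIO"}
)
print(config_url(POLARIS_BASE, WAREHOUSE))
print(props["prefix"])  # the server override wins over the client's value
```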
Decision
Use Apache Polaris as the Iceberg catalog for AKKO.
Why Polaris wins:
1. Native REST catalog: implements the Iceberg REST catalog spec natively, with no adapters or bridges. Trino and Spark connect via standard REST endpoints.
2. Apache TLP: true community governance under the Apache Software Foundation. No single-vendor control, unlike Unity Catalog (Databricks).
3. Multi-engine: designed from day one for Trino, Spark, Flink, and any other REST-compatible engine. HMS was Hive-first, with other engines bolted on later.
4. PostgreSQL backend: uses a standard RDBMS for metadata persistence. No embedded Derby, no custom storage.
5. Built-in RBAC: catalog roles, principal roles, and privilege grants provide fine-grained access control at the catalog, namespace, and table level.
6. Credential vending: can vend temporary S3 credentials to engines (requires STS support), which centralizes storage access control.
7. Backed by Snowflake and Confluent: strong industry backing and active development.
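To illustrate the multi-engine point, the sketch below generates Spark and Trino settings that point both engines at the same Polaris catalog URI and warehouse, so they resolve the same namespaces and tables. The property keys are the Iceberg Spark REST catalog options and Trino Iceberg connector options as we understand them; the host, catalog name, and client credentials are hypothetical placeholders, and object-storage settings are omitted.

```python
# Hypothetical AKKO deployment values; replace with the real endpoint and principal.
POLARIS_URI = "http://polaris:8181/api/catalog"
CATALOG = "akko"
CLIENT_ID = "example-client-id"
CLIENT_SECRET = "example-client-secret"


def spark_conf(name: str = "polaris") -> dict:
    """Spark session properties for Iceberg's REST catalog pointed at Polaris."""
    p = f"spark.sql.catalog.{name}"
    return {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        p: "org.apache.iceberg.spark.SparkCatalog",
        f"{p}.type": "rest",
        f"{p}.uri": POLARIS_URI,
        f"{p}.warehouse": CATALOG,
        f"{p}.credential": f"{CLIENT_ID}:{CLIENT_SECRET}",
        f"{p}.scope": "PRINCIPAL_ROLE:ALL",
        # Ask the catalog to vend short-lived storage credentials with table loads.
        f"{p}.header.X-Iceberg-Access-Delegation": "vended-credentials",
    }


def trino_catalog_properties() -> str:
    """Contents of a Trino catalog .properties file for the same Polaris catalog."""
    props = {
        "connector.name": "iceberg",
        "iceberg.catalog.type": "rest",
        "iceberg.rest-catalog.uri": POLARIS_URI,
        "iceberg.rest-catalog.warehouse": CATALOG,
        "iceberg.rest-catalog.security": "OAUTH2",
        "iceberg.rest-catalog.oauth2.credential": f"{CLIENT_ID}:{CLIENT_SECRET}",
        "iceberg.rest-catalog.oauth2.scope": "PRINCIPAL_ROLE:ALL",
        "iceberg.rest-catalog.vended-credentials-enabled": "true",
    }
    return "\n".join(f"{k}={v}" for k, v in props.items())


print(trino_catalog_properties())
```

Deriving both configs from the same constants is the design point: if the URI or warehouse ever drifts between engines, they stop seeing the same tables.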
Alternatives Considered
Hive Metastore (HMS)
- Legacy standard, every Hadoop-era tool supports it
- Thrift-based protocol — old, complex, hard to extend
- No native REST catalog support (requires Iceberg's HiveCatalog adapter)
- No built-in RBAC, no credential vending
- Requires its own MySQL/PostgreSQL schema (separate from application data)
- Rejected: legacy technology, no REST catalog, adds unnecessary complexity
Unity Catalog OSS
- Donated by Databricks to the Linux Foundation (2024)
- Supports Iceberg, Delta, and Hudi table formats
- Still early-stage as an OSS project; community governance unclear
- Databricks retains significant influence over direction
- Rejected: too early, governance concerns, Databricks-centric
Project Nessie
- Git-like branching and tagging for data (a branch is an isolated, versioned view of the whole catalog)
- Interesting for data versioning and CI/CD for data
- Smaller community, less production deployment experience
- Branching model adds complexity for simple lakehouse use cases
- Rejected: branching model is overkill for AKKO's current needs, may revisit for advanced use cases
Consequences
Positive
- Standard REST catalog means any engine with an Iceberg REST catalog client works out of the box
- RBAC provides governance at the catalog level (complementing OPA at the query level)
- PostgreSQL backend integrates cleanly with AKKO's existing PostgreSQL infrastructure
- Future-proof — REST catalog is the emerging standard, HMS is declining
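The catalog-level RBAC can be pictured as a chain: a principal holds principal roles, principal roles are granted catalog roles, and catalog roles carry privileges on a catalog, namespace, or table. The sketch below is a deliberately simplified, hypothetical model of that chain, not Polaris's authorizer: it ignores privilege hierarchies and assumes a grant on a securable also covers everything beneath it. TABLE_READ_DATA and TABLE_WRITE_DATA are Polaris privilege names; the principal, roles, and paths are invented.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of Polaris-style RBAC; not Polaris's actual authorizer.
# Securables are tuples: ("akko",) is a catalog, ("akko", "sales") a namespace,
# ("akko", "sales", "orders") a table.


@dataclass
class Authz:
    # principal -> principal roles it holds
    principal_roles: dict = field(default_factory=dict)
    # principal role -> catalog roles granted to it, as (catalog, catalog_role) pairs
    catalog_roles: dict = field(default_factory=dict)
    # (catalog, catalog_role) -> privileges it carries, as (privilege, securable) pairs
    grants: dict = field(default_factory=dict)

    def allowed(self, principal: str, privilege: str, securable: tuple) -> bool:
        for prole in self.principal_roles.get(principal, set()):
            for crole in self.catalog_roles.get(prole, set()):
                for priv, target in self.grants.get(crole, set()):
                    # Assumed inheritance: a grant covers its target and every descendant.
                    if priv == privilege and securable[: len(target)] == target:
                        return True
        return False


authz = Authz(
    principal_roles={"analyst-svc": {"analysts"}},
    catalog_roles={"analysts": {("akko", "sales_reader")}},
    grants={("akko", "sales_reader"): {("TABLE_READ_DATA", ("akko", "sales"))}},
)

print(authz.allowed("analyst-svc", "TABLE_READ_DATA", ("akko", "sales", "orders")))   # True
print(authz.allowed("analyst-svc", "TABLE_WRITE_DATA", ("akko", "sales", "orders")))  # False
print(authz.allowed("analyst-svc", "TABLE_READ_DATA", ("akko", "hr", "salaries")))    # False
```

This is the layer OPA complements: Polaris answers "may this principal touch this table at all", while query-level policy can still filter what a permitted query returns.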
Negative
- Polaris is relatively new (Apache TLP 2026) — fewer production references than HMS
- Credential vending requires STS support from the object store; MinIO's STS support is limited, which is mitigated by setting stsUnavailable: true
- The Quarkus-based Polaris application has its own operational quirks (health check on port 8182, bootstrap credentials applied on first start only)
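The separate management port is easy to trip over in container healthchecks, because the health endpoint does not live on the API port (8181 by default). A minimal probe, assuming the health document is served by Quarkus SmallRye Health at /q/health on port 8182 and that the host is reachable as polaris (hypothetical): anything other than an overall "UP" status, including network errors and an HTTP 503, counts as unhealthy.

```python
import json
import urllib.request

# Assumed endpoint: Quarkus management interface on port 8182, SmallRye Health at /q/health.
# The host name "polaris" is hypothetical.
HEALTH_URL = "http://polaris:8182/q/health"


def is_healthy(body: str) -> bool:
    """Return True only when a health JSON document reports overall status UP."""
    try:
        return json.loads(body).get("status") == "UP"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object (list, string, null).
        return False


def probe(url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    """Fetch the health document; HTTP errors (e.g. 503 when DOWN) and network errors mean unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(resp.read().decode("utf-8"))
    except OSError:
        # urllib's URLError and HTTPError are both OSError subclasses.
        return False


if __name__ == "__main__":
    raise SystemExit(0 if probe() else 1)
```

Run as a script, the exit code (0 healthy, 1 unhealthy) is what a container healthcheck command expects.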
Neutral
- Migration from HMS to Polaris is possible via table registration (no data movement)
- Polaris does not replace data governance tools (OpenMetadata handles business metadata, lineage, quality)
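Table registration works because an Iceberg table is fully described by its current metadata file: registering that file with the new catalog adopts the table without rewriting any data. A sketch of the Iceberg REST registerTable call, assuming Polaris serves the spec under /api/catalog with the catalog name as the URL prefix; the host, catalog, namespace, metadata path, and token are hypothetical. For a table coming from HMS, the metadata file would be the one referenced by the table's metadata_location parameter. The function only builds the request; sending it means passing it to urllib.request.urlopen.

```python
import json
import urllib.request
from urllib.parse import quote

# Hypothetical Polaris base URL; the catalog name is used as the REST "prefix".
BASE = "http://polaris:8181/api/catalog"


def register_request(catalog: str, namespace: list, table: str,
                     metadata_location: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an Iceberg REST registerTable request."""
    # The REST spec joins multi-level namespace parts with the 0x1F unit separator.
    ns = quote("\x1f".join(namespace), safe="")
    url = f"{BASE}/v1/{quote(catalog, safe='')}/namespaces/{ns}/register"
    body = json.dumps({"name": table, "metadata-location": metadata_location}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
    )


req = register_request(
    "akko", ["sales", "raw"], "orders",
    "s3://lake/sales/raw/orders/metadata/00003-example.metadata.json", "example-token",
)
print(req.get_method(), req.full_url)
```

Writers should be cut over to Polaris before the next commit: if HMS and Polaris both keep committing to the same table, their metadata histories diverge.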