
ADR-003: Use Apache Polaris as Iceberg Catalog

Status

Accepted

Date

2026-03-09

Context

AKKO needs a catalog service to manage Iceberg tables — namespaces, table metadata, access control, and multi-engine coordination (Trino + Spark must see the same tables). The catalog must implement the Iceberg REST catalog specification. Options evaluated:

  • Apache Polaris — Apache TLP (incubating Feb 2025, graduated 2026), donated by Snowflake
  • Hive Metastore (HMS) — Legacy standard, Thrift-based
  • Unity Catalog OSS — Donated by Databricks to Linux Foundation
  • Project Nessie — Git-like catalog with branching/tagging

Decision

Use Apache Polaris as the Iceberg catalog for AKKO.

Why Polaris wins:

  1. Native REST catalog: Implements the Iceberg REST catalog spec natively. No adapters, no bridges. Trino and Spark connect via standard REST endpoints.
  2. Apache TLP: True community governance (Apache Software Foundation). No single-vendor control, unlike Unity Catalog (Databricks).
  3. Multi-engine: Designed from day one for Trino + Spark + Flink + any REST-compatible engine. HMS was Hive-first, others bolted on.
  4. PostgreSQL backend: Uses standard RDBMS for metadata persistence. No embedded Derby, no custom storage.
  5. Built-in RBAC: Catalog roles, principal roles, privilege grants. Fine-grained access control at catalog/namespace/table level.
  6. Credential vending: Can vend temporary S3 credentials to engines (with STS support). Centralizes storage access control.
  7. Backed by Snowflake + Confluent: Strong industry backing, active development.
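To make the "same tables from Trino and Spark" point concrete, here is a minimal sketch that derives both engines' catalog settings from one shared endpoint description. The property keys are the standard Iceberg REST catalog options for Spark and the Trino Iceberg connector. The host name, warehouse name, catalog name `akko`, and credentials are placeholders assumed for illustration, not values taken from AKKO's deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolarisEndpoint:
    """Connection details shared by every engine (all values are placeholders)."""
    uri: str            # REST catalog base URI, e.g. http://polaris:8181/api/catalog
    warehouse: str      # name of the catalog inside Polaris (assumed)
    client_id: str
    client_secret: str
    scope: str = "PRINCIPAL_ROLE:ALL"


def spark_conf(ep: PolarisEndpoint, name: str = "akko") -> dict[str, str]:
    """Spark settings that register an Iceberg REST catalog under `name`."""
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": ep.uri,
        f"{prefix}.warehouse": ep.warehouse,
        f"{prefix}.credential": f"{ep.client_id}:{ep.client_secret}",
        f"{prefix}.scope": ep.scope,
    }


def trino_properties(ep: PolarisEndpoint) -> dict[str, str]:
    """Trino catalog properties pointing at the same REST endpoint."""
    return {
        "connector.name": "iceberg",
        "iceberg.catalog.type": "rest",
        "iceberg.rest-catalog.uri": ep.uri,
        "iceberg.rest-catalog.warehouse": ep.warehouse,
        "iceberg.rest-catalog.security": "OAUTH2",
        "iceberg.rest-catalog.oauth2.credential": f"{ep.client_id}:{ep.client_secret}",
        "iceberg.rest-catalog.oauth2.scope": ep.scope,
    }
```

Because both dictionaries are generated from the same `PolarisEndpoint`, the two engines cannot drift onto different catalog URIs or warehouses, which is the coordination property this decision relies on.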

Alternatives Considered

Hive Metastore (HMS)

  • Legacy standard, every Hadoop-era tool supports it
  • Thrift-based protocol — old, complex, hard to extend
  • No native REST catalog support (requires Iceberg's HiveCatalog adapter)
  • No built-in RBAC, no credential vending
  • Requires its own MySQL/PostgreSQL schema (separate from application data)
  • Rejected: legacy technology, no REST catalog, adds unnecessary complexity

Unity Catalog OSS

  • Donated by Databricks to Linux Foundation (2024)
  • Supports Iceberg, Delta, and Hudi table formats
  • Still early-stage as an OSS project; community governance is unclear
  • Databricks retains significant influence over direction
  • Rejected: too early, governance concerns, Databricks-centric

Project Nessie

  • Git-like branching and tagging for data (branch = isolated namespace)
  • Interesting for data versioning and CI/CD for data
  • Smaller community, less production deployment experience
  • Branching model adds complexity for simple lakehouse use cases
  • Rejected: branching model is overkill for AKKO's current needs, may revisit for advanced use cases

Consequences

Positive

  • Standard REST catalog means any Iceberg-compatible engine works out of the box
  • RBAC provides governance at the catalog level (complementing OPA at the query level)
  • PostgreSQL backend integrates cleanly with AKKO's existing PostgreSQL infrastructure
  • Future-proof: the REST catalog is the emerging standard, while HMS is declining
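The catalog-level RBAC mentioned above can be pictured with a toy in-memory model. This is only a sketch of the concept (principals hold principal roles, principal roles hold catalog roles, catalog roles carry privileges on a catalog, namespace, or table, and a grant on a parent covers its children). It is not the Polaris API; the class names and the `is_allowed` helper are hypothetical.

```python
from dataclasses import dataclass, field

# A securable is a path inside one catalog: () is the catalog itself,
# ("sales",) a namespace, ("sales", "orders") a table in that namespace.
Securable = tuple[str, ...]


@dataclass
class CatalogRole:
    name: str
    grants: dict[Securable, set[str]] = field(default_factory=dict)

    def grant(self, securable: Securable, privilege: str) -> None:
        self.grants.setdefault(securable, set()).add(privilege)


@dataclass
class PrincipalRole:
    name: str
    catalog_roles: list[CatalogRole] = field(default_factory=list)


@dataclass
class Principal:
    name: str
    roles: list[PrincipalRole] = field(default_factory=list)


def is_allowed(principal: Principal, securable: Securable, privilege: str) -> bool:
    """Check the securable and every ancestor, since grants inherit downward."""
    ancestors = [securable[:i] for i in range(len(securable) + 1)]
    for principal_role in principal.roles:
        for catalog_role in principal_role.catalog_roles:
            if any(privilege in catalog_role.grants.get(a, set()) for a in ancestors):
                return True
    return False


# Usage: an analyst may read every table under the "sales" namespace.
reader = CatalogRole("sales_reader")
reader.grant(("sales",), "TABLE_READ_DATA")
alice = Principal("alice", [PrincipalRole("analyst", [reader])])
```

The two-level role indirection is what lets one principal role (for example, the identity Trino uses) be reused across several catalogs with different catalog roles attached.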

Negative

  • Polaris is relatively new (Apache TLP 2026) — fewer production references than HMS
  • Credential vending requires STS support from object storage (MinIO's STS is limited — mitigated by stsUnavailable: true)
  • Polaris is a Quarkus-based application with its own operational quirks (health check on port 8182; bootstrap credentials are applied on first start only)
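As an illustration of the health check quirk, a startup probe might look like the sketch below. The ADR only states the port (8182); the `/q/health` path is the Quarkus default and is assumed here, as are the host name and the timeout values.

```python
import json
import time
import urllib.error
import urllib.request


def is_up(body: bytes) -> bool:
    """Return True when a Quarkus-style health payload reports status UP."""
    try:
        return json.loads(body).get("status") == "UP"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: treat as unhealthy.
        return False


def wait_until_ready(url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll the health URL until it reports UP or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200 and is_up(resp.read()):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Not listening yet, or the endpoint returned an error status.
        time.sleep(interval_s)
    return False


if __name__ == "__main__":
    # Hypothetical host name; 8182 is the health port noted above.
    print(wait_until_ready("http://polaris:8182/q/health"))
```

Anything that depends on the catalog (for example a job that creates principals after bootstrap) can gate on `wait_until_ready` rather than on the API port simply being open.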

Neutral

  • Migration from HMS to Polaris is possible via table registration (no data movement)
  • Polaris does not replace data governance tools (OpenMetadata handles business metadata, lineage, quality)
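Migration by table registration can be sketched as follows. The Iceberg REST catalog specification defines a register-table operation that takes a table name and the location of an existing metadata file, so no data files move. The helper below only builds the request; the base URI, the use of the catalog name as the `{prefix}` path segment, and the example paths are assumptions, and authentication is omitted.

```python
import json
from urllib.parse import quote


def build_register_request(
    base_uri: str,
    prefix: str,
    namespace: list[str],
    table: str,
    metadata_location: str,
) -> tuple[str, str]:
    """Return (url, json_body) for POST /v1/{prefix}/namespaces/{namespace}/register."""
    # Multi-level namespaces are joined with the unit separator (0x1F), URL-encoded.
    ns = quote("\x1f".join(namespace), safe="")
    url = f"{base_uri.rstrip('/')}/v1/{quote(prefix, safe='')}/namespaces/{ns}/register"
    body = json.dumps({"name": table, "metadata-location": metadata_location})
    return url, body


# Usage with placeholder values: point the new catalog at the table's
# current metadata file, which stays where it already is in object storage.
url, body = build_register_request(
    "http://polaris:8181/api/catalog",
    "akko",
    ["sales", "raw"],
    "orders",
    "s3://lake/sales/raw/orders/metadata/00003-abc.metadata.json",
)
```

For a table currently tracked in HMS through Iceberg's HiveCatalog, the metadata file location is typically recorded in the table's `metadata_location` parameter, which is the value such a migration script would pass in.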

References