Apache Polaris

Overview

Apache Polaris 1.3.0 (incubating) serves as the Iceberg REST catalog for AKKO. It provides centralized metadata management for all Iceberg tables, with OAuth2 authentication, RBAC, and a PostgreSQL-backed persistence layer.

Architecture

  Trino ──────┐
              │  REST API + OAuth2
  Spark ──────┼──→  Polaris (:8181)  ──→  PostgreSQL
              │      (Quarkus)            (metadata)
  Notebooks ──┘          │
                     object storage (S3)
                    (table data)

Polaris is a Quarkus-based Java application that implements the Iceberg REST Catalog specification.

Ports

Port   Purpose                                                        Exposed
8181   All APIs: Iceberg catalog, management, OAuth2 token endpoint   Yes (via Traefik)
8182   Health checks (/q/health) and Prometheus metrics only          Internal only

Single API Port

Unlike some configurations that separate catalog and management ports, Polaris 1.3.0 serves all APIs (catalog, management, OAuth2) on port 8181. Port 8182 is exclusively for health and metrics.

Configuration

polaris/application.properties
# Persistence - PostgreSQL backend
polaris.persistence.type=relational-jdbc

# S3/object storage file I/O defaults
polaris.catalog.io.default-filesystem=s3

# Disable credential vending (object storage does not support STS AssumeRole)
polaris.features."SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION"=true
polaris.features."ALLOW_EXTERNAL_CATALOG_CREDENTIAL_VENDING"=false

PostgreSQL connection details (host, port, database, credentials) are passed via environment variables in the Helm values, overriding the defaults.

OAuth2 Authentication

Polaris provides its own OAuth2 token endpoint. Both Trino and Spark authenticate using client credentials (the Polaris root principal).

Token Endpoint

POST http://akko-akko-polaris:8181/api/catalog/v1/oauth/tokens
Content-Type: application/x-www-form-urlencoded

grant_type=client_credentials
&client_id=root
&client_secret={POLARIS_ROOT_SECRET}
&scope=PRINCIPAL_ROLE:ALL
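
The request above can be sketched in Python as follows. This is a minimal illustration using only the standard library, not the code AKKO itself runs. The helper names build_token_request and fetch_token are hypothetical, and the access_token response field is assumed from the standard OAuth2 client-credentials response.

```python
import json
import urllib.parse
import urllib.request

# Token endpoint as documented above (in-cluster service name).
TOKEN_URL = "http://akko-akko-polaris:8181/api/catalog/v1/oauth/tokens"


def build_token_request(client_secret: str, client_id: str = "root") -> urllib.request.Request:
    """Build (but do not send) the client-credentials token request."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "PRINCIPAL_ROLE:ALL",  # must be sent explicitly
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_token(client_secret: str) -> str:
    """Send the request and return the token. Needs network access to Polaris.

    Assumption: the JSON response carries the token in "access_token".
    """
    with urllib.request.urlopen(build_token_request(client_secret), timeout=10) as resp:
        return json.load(resp)["access_token"]
```

Building the request separately from sending it keeps the form encoding easy to inspect without a running Polaris instance.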

Client Configuration

Client          Credential                   Scope
Trino           root:{POLARIS_ROOT_SECRET}   PRINCIPAL_ROLE:ALL
Spark Connect   root:{POLARIS_ROOT_SECRET}   PRINCIPAL_ROLE:ALL

Scope is Mandatory

The PRINCIPAL_ROLE:ALL scope must be specified explicitly. If it is omitted, the client's default scope is sent instead and Polaris rejects the request with an invalid_scope error.

Management API

The management API allows programmatic administration of catalogs, principals, and roles.

Key Endpoints

Endpoint                                           Method        Purpose
/api/management/v1/catalogs                        GET, POST     List/create catalogs
/api/management/v1/catalogs/{name}                 GET, DELETE   Get/delete a catalog
/api/management/v1/principals                      GET, POST     List/create principals
/api/management/v1/principal-roles                 GET, POST     List/create principal roles
/api/management/v1/catalogs/{name}/catalog-roles   GET, POST     List/create catalog roles
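
As a rough sketch of how a read-only call to one of these endpoints might be prepared, the snippet below builds an authenticated request with the Python standard library. It assumes the token obtained from the OAuth2 token endpoint is sent as a Bearer token. The helper name management_request is hypothetical.

```python
import urllib.request

# Base path of the management API on the single API port (8181).
MANAGEMENT_BASE = "http://akko-akko-polaris:8181/api/management/v1"


def management_request(path: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a management API request.

    Assumption: Polaris accepts the OAuth2 access token as a Bearer token.
    """
    url = f"{MANAGEMENT_BASE}/{path.lstrip('/')}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method=method,
    )


# Example: prepare a request that lists catalogs (send it with urllib.request.urlopen).
list_catalogs = management_request("/catalogs", token="<access token>")
```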

RBAC Model

Polaris implements a two-level role model:

  Principal (root)
       │
  Principal Role (ALL)
       │
  Catalog Role (akko-warehouse-admin)
       │  CATALOG_MANAGE_CONTENT privilege
  Catalog (akko-warehouse)

Setup Sequence

The polaris-init job performs this setup automatically on every deployment (helm install or helm upgrade):

  1. Create catalog akko-warehouse with S3/object storage configuration
  2. Create principal role ALL
  3. Assign principal role ALL to the root principal
  4. Create catalog role with CATALOG_MANAGE_CONTENT privilege
  5. Assign catalog role to the ALL principal role

Never Recreate the Catalog Manually

Always use the polaris-init job to create or recreate the catalog (in Kubernetes: kubectl delete job akko-polaris-init -n akko && helm upgrade ...). The init script contains the correct storageConfigInfo format, stsUnavailable flag, and full RBAC setup. Manual curl commands frequently get the format wrong.

Storage Configuration

Polaris manages Iceberg table metadata while actual data files live in object storage. The catalog's storage configuration must follow a specific format:

Correct storageConfigInfo Format

{
  "storageType": "S3",
  "allowedLocations": ["s3://akko-warehouse/"],
  "stsUnavailable": true,
  "endpoint": "http://akko-minio:9000",
  "pathStyleAccess": true,
  "region": "us-east-1"
}

Format Pitfalls

  • Fields must be top-level on storageConfigInfo -- NOT nested under an s3 object.
  • Do not use dot-notation keys like s3.endpoint -- they are silently ignored.
  • stsUnavailable: true is required because object storage does not support AWS STS AssumeRole.
WRONG -- do not use these formats
// WRONG: nested s3 object
{ "s3": { "endpoint": "http://minio:9000" } }

// WRONG: dot-notation
{ "s3.endpoint": "http://minio:9000" }
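
The three pitfalls above are easy to check mechanically before a payload is sent. The following is a minimal sketch of such a check, not part of the polaris-init script. The function name storage_config_problems is hypothetical, and it only covers the pitfalls listed in this section.

```python
def storage_config_problems(config: dict) -> list[str]:
    """Return a list of format problems found in a storageConfigInfo dict.

    An empty list means none of the documented pitfalls were detected.
    """
    problems = []
    # Pitfall 1: settings nested under an "s3" object instead of top-level.
    if isinstance(config.get("s3"), dict):
        problems.append("fields are nested under an 's3' object")
    # Pitfall 2: dot-notation keys such as "s3.endpoint" are silently ignored.
    dotted = sorted(key for key in config if "." in key)
    if dotted:
        problems.append("dot-notation keys: " + ", ".join(dotted))
    # Pitfall 3: stsUnavailable must be the JSON boolean true.
    if config.get("stsUnavailable") is not True:
        problems.append("stsUnavailable must be true")
    return problems


correct = {
    "storageType": "S3",
    "allowedLocations": ["s3://akko-warehouse/"],
    "stsUnavailable": True,
    "endpoint": "http://akko-minio:9000",
    "pathStyleAccess": True,
    "region": "us-east-1",
}
print(storage_config_problems(correct))
```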

Credential Vending

Polaris normally uses AWS STS to vend temporary credentials to clients (Trino, Spark) for direct S3 access. Since object storage does not support STS AssumeRole, this feature is disabled via two feature flags in application.properties plus the stsUnavailable flag on the catalog:

Setting                                     Value   Purpose
SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION      true    Skip STS credential subscoping
ALLOW_EXTERNAL_CATALOG_CREDENTIAL_VENDING   false   Disable external credential vending
stsUnavailable (catalog)                    true    Signal that STS is not available

Clients (Trino, Spark) use their own direct S3 credentials (object storage access key and secret key) instead of relying on Polaris-vended credentials.

Health Check

The health endpoint is on port 8182 (not 8181):

GET http://akko-akko-polaris:8182/q/health

This is a Quarkus health endpoint that returns UP when the service is ready.
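
A readiness probe against this endpoint could look roughly like the sketch below. It assumes the usual Quarkus health response shape, a JSON object with a top-level "status" field. The helper names is_ready and check_health are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Health endpoint as documented above (port 8182, not 8181).
HEALTH_URL = "http://akko-akko-polaris:8182/q/health"


def is_ready(body: str) -> bool:
    """Return True only if the health response body reports status UP."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "UP"


def check_health(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Fetch the health endpoint; any connection or HTTP error counts as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_ready(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError):
        return False
```

Treating every error as "not ready" keeps the probe safe to call in a retry loop while Polaris is still starting.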

Bootstrap Credentials

On first startup with an empty database, Polaris bootstraps the root principal from the POLARIS_BOOTSTRAP_CREDENTIALS environment variable. The values supplied there become the root principal's client ID and secret.

First Start Only

Bootstrap credentials are only applied when the database is empty (first helm install). If the database is recreated, the polaris-init job must run again to re-establish the catalog and RBAC.

Known Issues

Important Gotchas

  • Bootstrap credentials persist: POLARIS_BOOTSTRAP_CREDENTIALS is only read on first startup. Changing it in .env has no effect on an existing database.
  • Catalog can disappear: After the database is recreated, polaris-init may report "already exists" even though the catalog is actually gone. Always verify via the management API, and rerun the polaris-init job if the catalog is missing.
  • DROP TABLE forbidden: Polaris RBAC blocks DROP_TABLE_WITH_PURGE. Use CALL iceberg.system.unregister_table('schema', 'table') from Trino instead.
  • Healthcheck port: Use /q/health on port 8182, not 8181. Port 8181 serves the REST API only.
  • Restart after config changes: After changing Polaris configuration, redeploy with helm upgrade to apply changes.
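
For the "catalog can disappear" case, the verification step can be reduced to interpreting the status code of a GET on the catalog endpoint. The sketch below is one way to express that. It assumes Polaris answers 200 for an existing catalog and 404 for a missing one. The function name catalog_status and the injected fetch callable are hypothetical, so the logic can be exercised without a live server.

```python
import urllib.parse
from typing import Callable


def catalog_status(name: str, fetch: Callable[[str], int]) -> str:
    """Classify a catalog as present, missing, or unknown.

    `fetch` takes a management API path, performs an authenticated GET,
    and returns the HTTP status code.
    Assumption: 200 means the catalog exists and 404 means it does not.
    """
    path = "/api/management/v1/catalogs/" + urllib.parse.quote(name, safe="")
    status = fetch(path)
    if status == 200:
        return "present"
    if status == 404:
        return "missing"
    return f"unknown (HTTP {status})"


# Example with a stubbed fetch that always reports "not found":
print(catalog_status("akko-warehouse", lambda path: 404))
```

If the result is "missing", the catalog should be recreated through the polaris-init job rather than by hand, as described under Setup Sequence.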