Apache Polaris¶
Overview¶
Apache Polaris 1.3.0 (incubating) serves as the Iceberg REST catalog for AKKO. It provides centralized metadata management for all Iceberg tables, with OAuth2 authentication, RBAC, and a PostgreSQL-backed persistence layer.
Architecture¶
Trino ──────┐
            │ REST API + OAuth2
Spark ──────┼──→ Polaris (:8181) ──→ PostgreSQL
            │    (Quarkus)           (metadata)
Notebooks ──┘           │
                        ▼
               object storage (S3)
                  (table data)
Polaris is a Quarkus-based Java application that implements the Iceberg REST Catalog specification.
Ports¶
| Port | Purpose | Exposed |
|---|---|---|
| 8181 | All APIs: Iceberg catalog, management, OAuth2 token endpoint | Yes (via Traefik) |
| 8182 | Health checks (/q/health) and Prometheus metrics only | Internal only |
Single API Port
Unlike some configurations that separate catalog and management ports, Polaris 1.3.0 serves all APIs (catalog, management, OAuth2) on port 8181. Port 8182 is exclusively for health and metrics.
Configuration¶
# Persistence - PostgreSQL backend
polaris.persistence.type=relational-jdbc
# S3/object storage file I/O defaults
polaris.catalog.io.default-filesystem=s3
# Disable credential vending (object storage does not support STS AssumeRole)
polaris.features."SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION"=true
polaris.features."ALLOW_EXTERNAL_CATALOG_CREDENTIAL_VENDING"=false
PostgreSQL connection details (host, port, database, credentials) are passed via environment variables in the Helm values, overriding the defaults.
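Quarkus (through SmallRye Config) lets an environment variable override any property, and the variable name is derived mechanically: every character that is neither alphanumeric nor an underscore becomes an underscore, then the name is upper-cased. The sketch below applies that rule. The quarkus.datasource.* keys are the standard Quarkus datasource properties and are an assumption about what the Helm values override; check the chart for the variables it actually sets.

```python
import re


def property_to_env_var(prop: str) -> str:
    """Map a Quarkus/SmallRye config property name to its environment-variable form.

    Rule: replace each character that is neither alphanumeric nor '_' with '_',
    then upper-case the result.
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", prop).upper()


# Standard Quarkus datasource keys (assumed; the Helm chart may use others).
DATASOURCE_PROPERTIES = [
    "quarkus.datasource.jdbc.url",
    "quarkus.datasource.username",
    "quarkus.datasource.password",
]

if __name__ == "__main__":
    for prop in DATASOURCE_PROPERTIES + [
        "polaris.persistence.type",
        'polaris.features."SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION"',
    ]:
        print(f"{prop:60} -> {property_to_env_var(prop)}")
```

The quoted feature name produces a double underscore after FEATURES and a trailing underscore, because each quote character is mapped to an underscore.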
OAuth2 Authentication¶
Polaris provides its own OAuth2 token endpoint. Both Trino and Spark authenticate using client credentials (the Polaris root principal).
Token Endpoint¶
POST http://akko-akko-polaris:8181/api/catalog/v1/oauth/tokens
Content-Type: application/x-www-form-urlencoded
grant_type=client_credentials
&client_id=root
&client_secret={POLARIS_ROOT_SECRET}
&scope=PRINCIPAL_ROLE:ALL
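The same client_credentials request can be issued from Python with only the standard library. This is a minimal sketch: the service URL is copied from the request above, the helper names are hypothetical, and it assumes the endpoint answers with a standard OAuth2 JSON body containing access_token.

```python
import json
import urllib.parse
import urllib.request

# Service URL from the Token Endpoint section; adjust for your environment.
POLARIS_URL = "http://akko-akko-polaris:8181"
TOKEN_PATH = "/api/catalog/v1/oauth/tokens"


def build_token_request(client_id: str, client_secret: str,
                        scope: str = "PRINCIPAL_ROLE:ALL",
                        base_url: str = POLARIS_URL) -> urllib.request.Request:
    """Build (but do not send) a form-encoded client_credentials token request."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,  # must be explicit: Polaris rejects the default scope
    }).encode("ascii")
    return urllib.request.Request(
        base_url + TOKEN_PATH,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_token(client_id: str, client_secret: str, timeout: float = 10.0) -> str:
    """Send the request and return the access_token field of the JSON response."""
    req = build_token_request(client_id, client_secret)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["access_token"]
```

For example, fetch_token("root", secret) returns a bearer token usable against both the catalog and management APIs, with secret holding the value of POLARIS_ROOT_SECRET.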
Client Configuration¶
| Client | Credential | Scope |
|---|---|---|
| Trino | root:{POLARIS_ROOT_SECRET} | PRINCIPAL_ROLE:ALL |
| Spark Connect | root:{POLARIS_ROOT_SECRET} | PRINCIPAL_ROLE:ALL |
Scope is Mandatory
The PRINCIPAL_ROLE:ALL scope must be specified explicitly. Without it, the client falls back to its default scope, which Polaris rejects with an invalid_scope error.
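The table maps onto the Iceberg REST client settings of each engine. The sketch below renders them as Python dicts, assuming the Trino Iceberg connector's iceberg.rest-catalog.* properties and Iceberg's Spark catalog properties (credential, scope), and assuming the Polaris catalog URI is /api/catalog, so that the client derives the token endpoint shown above. The Spark catalog name "iceberg" is hypothetical; for Spark Connect these entries belong in the server-side Spark configuration.

```python
POLARIS_CATALOG_URI = "http://akko-akko-polaris:8181/api/catalog"  # assumed base path
WAREHOUSE = "akko-warehouse"
SCOPE = "PRINCIPAL_ROLE:ALL"


def trino_catalog_properties(secret: str) -> dict[str, str]:
    """Trino Iceberg connector settings for a Polaris REST catalog with OAuth2."""
    return {
        "connector.name": "iceberg",
        "iceberg.catalog.type": "rest",
        "iceberg.rest-catalog.uri": POLARIS_CATALOG_URI,
        "iceberg.rest-catalog.warehouse": WAREHOUSE,
        "iceberg.rest-catalog.security": "OAUTH2",
        "iceberg.rest-catalog.oauth2.credential": f"root:{secret}",
        "iceberg.rest-catalog.oauth2.scope": SCOPE,
    }


def spark_catalog_conf(secret: str, name: str = "iceberg") -> dict[str, str]:
    """Spark conf entries for an Iceberg REST catalog called `name` (hypothetical)."""
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": POLARIS_CATALOG_URI,
        f"{prefix}.warehouse": WAREHOUSE,
        f"{prefix}.credential": f"root:{secret}",  # client_id:client_secret
        f"{prefix}.scope": SCOPE,                  # overrides the rejected default
    }


def to_properties_file(props: dict[str, str]) -> str:
    """Render key=value lines, e.g. for a Trino catalog properties file."""
    return "\n".join(f"{k}={v}" for k, v in props.items()) + "\n"
```

Keeping the scope in one constant makes it harder for one client to silently drift back to the default scope.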
Management API¶
The management API allows programmatic administration of catalogs, principals, and roles.
Key Endpoints¶
| Endpoint | Method | Purpose |
|---|---|---|
| /api/management/v1/catalogs | GET, POST | List/create catalogs |
| /api/management/v1/catalogs/{name} | GET, DELETE | Get/delete a catalog |
| /api/management/v1/principals | GET, POST | List/create principals |
| /api/management/v1/principal-roles | GET, POST | List/create principal roles |
| /api/management/v1/catalogs/{name}/catalog-roles | GET, POST | List/create catalog roles |
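All of these endpoints take a bearer token from the OAuth2 token endpoint and exchange JSON. A small request helper is sketched below; the helper names are hypothetical, the base URL reuses the service name from the token example, and request bodies for POST calls must follow the Polaris management API schema, which is not reproduced here.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

MANAGEMENT_BASE = "http://akko-akko-polaris:8181/api/management/v1"


def management_request(method: str, path: str, token: str,
                       body: Optional[dict] = None,
                       base_url: str = MANAGEMENT_BASE) -> urllib.request.Request:
    """Build (but do not send) a bearer-authenticated management API request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url + path, data=data,
                                  headers=headers, method=method)


def catalog_roles_path(catalog: str) -> str:
    """Path for the catalog-roles collection, quoting the name as one segment."""
    return f"/catalogs/{urllib.parse.quote(catalog, safe='')}/catalog-roles"


def call(req: urllib.request.Request, timeout: float = 10.0):
    """Send a request and decode the JSON body (None for an empty response)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None
```

For example, call(management_request("GET", "/catalogs", token)) lists catalogs, and management_request("GET", catalog_roles_path("akko-warehouse"), token) targets the catalog roles of akko-warehouse.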
RBAC Model¶
Polaris implements a two-level role model:
Principal (root)
│
▼
Principal Role (ALL)
│
▼
Catalog Role (akko-warehouse-admin)
│ CATALOG_MANAGE_CONTENT privilege
▼
Catalog (akko-warehouse)
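The chain can be read as a lookup: a principal holds principal roles, each principal role holds catalog roles, and each catalog role holds privileges on one catalog. The sketch below is a hypothetical in-memory model of that resolution, not Polaris code. It deliberately ignores Polaris's privilege hierarchy (for instance, which table-level privileges CATALOG_MANAGE_CONTENT implies), and it models the PRINCIPAL_ROLE:ALL scope as "every assigned principal role is active".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Grants:
    """Hypothetical model of the two-level role chain (not the Polaris implementation)."""
    # principal -> principal roles assigned to it
    principal_roles: dict[str, set[str]] = field(default_factory=dict)
    # principal role -> (catalog, catalog role) pairs assigned to it
    catalog_roles: dict[str, set[tuple[str, str]]] = field(default_factory=dict)
    # (catalog, catalog role) -> privileges granted on that catalog
    privileges: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def has_privilege(self, principal: str, catalog: str, privilege: str,
                      active_roles: Optional[set[str]] = None) -> bool:
        roles = self.principal_roles.get(principal, set())
        if active_roles is not None:  # None models the PRINCIPAL_ROLE:ALL scope
            roles = roles & active_roles
        for prole in roles:
            for cat, crole in self.catalog_roles.get(prole, set()):
                if cat == catalog and privilege in self.privileges.get((cat, crole), set()):
                    return True
        return False


def akko_grants() -> Grants:
    """The AKKO chain from the diagram above."""
    g = Grants()
    g.principal_roles["root"] = {"ALL"}
    g.catalog_roles["ALL"] = {("akko-warehouse", "akko-warehouse-admin")}
    g.privileges[("akko-warehouse", "akko-warehouse-admin")] = {"CATALOG_MANAGE_CONTENT"}
    return g
```

Breaking any single link (for example, the catalog role not being assigned to the ALL principal role) makes has_privilege return False, which mirrors why the full RBAC setup has to be in place.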
Setup Sequence¶
The polaris-init sidecar performs this setup automatically on every deployment (helm install/helm upgrade):
- Create catalog akko-warehouse with the S3/object storage configuration
- Create principal role ALL
- Assign principal role ALL to the root principal
- Create catalog role akko-warehouse-admin and grant it the CATALOG_MANAGE_CONTENT privilege
- Assign the catalog role to the ALL principal role
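To review what the init script is expected to send (not as a replacement for polaris-init, per the warning that follows), the steps can be listed as management API calls. This sketch only builds the plan and sends nothing. The PUT paths, the request body shapes, and the default-base-location property are our reading of the Polaris management API rather than something this page confirms, so compare them with the init script before relying on them.

```python
import json

CATALOG = "akko-warehouse"
PRINCIPAL = "root"
PRINCIPAL_ROLE = "ALL"
CATALOG_ROLE = "akko-warehouse-admin"

# Same values as the storageConfigInfo example in this page.
STORAGE_CONFIG = {
    "storageType": "S3",
    "allowedLocations": ["s3://akko-warehouse/"],
    "stsUnavailable": True,
    "endpoint": "http://akko-minio:9000",
    "pathStyleAccess": True,
    "region": "us-east-1",
}


def setup_plan() -> list[tuple[str, str, dict]]:
    """Return (method, path, JSON body) for each setup step, in order.

    Paths are relative to /api/management/v1. Body shapes are assumptions.
    """
    return [
        # 1. Create the catalog with its storage configuration.
        ("POST", "/catalogs", {"catalog": {
            "name": CATALOG,
            "type": "INTERNAL",
            "properties": {"default-base-location": "s3://akko-warehouse/"},
            "storageConfigInfo": STORAGE_CONFIG,
        }}),
        # 2. Create the principal role.
        ("POST", "/principal-roles", {"principalRole": {"name": PRINCIPAL_ROLE}}),
        # 3. Assign it to the root principal.
        ("PUT", f"/principals/{PRINCIPAL}/principal-roles",
         {"principalRole": {"name": PRINCIPAL_ROLE}}),
        # 4a. Create the catalog role ...
        ("POST", f"/catalogs/{CATALOG}/catalog-roles",
         {"catalogRole": {"name": CATALOG_ROLE}}),
        # 4b. ... and grant it CATALOG_MANAGE_CONTENT.
        ("PUT", f"/catalogs/{CATALOG}/catalog-roles/{CATALOG_ROLE}/grants",
         {"grant": {"type": "catalog", "privilege": "CATALOG_MANAGE_CONTENT"}}),
        # 5. Assign the catalog role to the principal role.
        ("PUT", f"/principal-roles/{PRINCIPAL_ROLE}/catalog-roles/{CATALOG}",
         {"catalogRole": {"name": CATALOG_ROLE}}),
    ]


if __name__ == "__main__":
    for method, path, body in setup_plan():
        print(method, path, json.dumps(body))
```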
Never Recreate the Catalog Manually
Always use the polaris-init job to create or recreate the catalog (in Kubernetes: kubectl delete job akko-polaris-init -n akko && helm upgrade ...). The init script contains the correct storageConfigInfo format, the stsUnavailable flag, and the full RBAC setup; manual curl commands frequently get the format wrong.
Storage Configuration¶
Polaris manages Iceberg table metadata while actual data files live in object storage. The catalog's storage configuration must follow a specific format:
Correct storageConfigInfo Format¶
{
  "storageType": "S3",
  "allowedLocations": ["s3://akko-warehouse/"],
  "stsUnavailable": true,
  "endpoint": "http://akko-minio:9000",
  "pathStyleAccess": true,
  "region": "us-east-1"
}
Format Pitfalls
- Fields must be top-level on storageConfigInfo, NOT nested under an s3 object.
- Do not use dot-notation keys like s3.endpoint; they are silently ignored.
- stsUnavailable: true is required because object storage does not support AWS STS AssumeRole.
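Because Polaris silently ignores misplaced keys, a pre-flight check on the JSON can catch these pitfalls before the catalog is created. The validator below is a hypothetical helper that encodes exactly the three pitfalls listed above plus the storageType value from the correct example; it is not part of Polaris.

```python
def storage_config_problems(cfg: dict) -> list[str]:
    """Return human-readable problems with a storageConfigInfo dict (empty if none found)."""
    problems = []
    if cfg.get("storageType") != "S3":
        problems.append('storageType should be "S3"')
    # Pitfall 1: fields nested under an "s3" object instead of top-level.
    if isinstance(cfg.get("s3"), dict):
        problems.append("fields are nested under an 's3' object; move them to the top level")
    # Pitfall 2: dot-notation keys such as "s3.endpoint" are ignored.
    dotted = sorted(k for k in cfg if "." in k)
    if dotted:
        problems.append(f"dot-notation keys are ignored: {', '.join(dotted)}")
    # Pitfall 3: STS is unavailable on object storage, so this flag must be true.
    if cfg.get("stsUnavailable") is not True:
        problems.append("stsUnavailable must be true (no STS AssumeRole on object storage)")
    return problems
```

Running it on the correct format above returns an empty list; a config that nests endpoint under "s3" and omits stsUnavailable yields one message per pitfall.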
Credential Vending¶
Polaris normally uses AWS STS to vend temporary credentials to clients (Trino, Spark) for direct S3 access. Since object storage does not support STS AssumeRole, this feature is disabled via two server feature flags and one catalog-level flag:
| Setting | Value | Purpose |
|---|---|---|
| SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION | true | Skip STS credential subscoping |
| ALLOW_EXTERNAL_CATALOG_CREDENTIAL_VENDING | false | Disable external credential vending |
| stsUnavailable (catalog) | true | Signal that STS is not available |
Clients (Trino, Spark) use their own direct S3 credentials (object storage access key and secret key) instead of relying on Polaris-vended credentials.
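On the client side, this means each engine carries the object storage endpoint and static keys itself. The sketch below assumes Trino's native S3 file system properties and Iceberg's S3FileIO properties for Spark. The endpoint and region are copied from the storageConfigInfo example, and the Spark catalog name "iceberg" is hypothetical. Note that these dotted keys are correct here because they are client catalog properties; the dot-notation pitfall applies only to Polaris's storageConfigInfo.

```python
ENDPOINT = "http://akko-minio:9000"  # from the storageConfigInfo example
REGION = "us-east-1"


def trino_s3_properties(access_key: str, secret_key: str) -> dict[str, str]:
    """Trino native S3 file system settings using static object storage keys."""
    return {
        "fs.native-s3.enabled": "true",
        "s3.endpoint": ENDPOINT,
        "s3.region": REGION,
        "s3.path-style-access": "true",
        "s3.aws-access-key": access_key,
        "s3.aws-secret-key": secret_key,
    }


def spark_s3_conf(access_key: str, secret_key: str,
                  catalog: str = "iceberg") -> dict[str, str]:
    """Iceberg S3FileIO settings for a Spark catalog (catalog name is hypothetical)."""
    p = f"spark.sql.catalog.{catalog}"
    return {
        f"{p}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{p}.s3.endpoint": ENDPOINT,
        f"{p}.s3.path-style-access": "true",
        f"{p}.s3.access-key-id": access_key,
        f"{p}.s3.secret-access-key": secret_key,
        f"{p}.client.region": REGION,
    }
```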
Health Check¶
The health endpoint is /q/health on port 8182 (not 8181).
This is the standard Quarkus health endpoint; it reports status UP when the service is ready.
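A readiness probe only needs the overall status field of the Quarkus health JSON. The sketch below assumes the in-cluster service name used elsewhere on this page (akko-akko-polaris) and treats any connection error, non-2xx response (Quarkus answers 503 when a check is DOWN), or unparsable body as unhealthy.

```python
import json
import urllib.error
import urllib.request

HEALTH_URL = "http://akko-akko-polaris:8182/q/health"  # port 8182, not 8181


def parse_health(body: str) -> bool:
    """True if a Quarkus health response reports overall status UP."""
    return json.loads(body).get("status") == "UP"


def is_healthy(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Probe the health endpoint; errors and non-2xx responses count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_health(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # HTTPError (e.g. 503 DOWN) is a URLError; bad JSON raises ValueError.
        return False
```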
Bootstrap Credentials¶
On first startup with an empty database, Polaris creates its bootstrap credentials from the POLARIS_BOOTSTRAP_CREDENTIALS environment variable. These credentials are the root principal's client ID and secret.
First Start Only
Bootstrap credentials are only applied when the database is empty (first helm install). If the database is recreated, the init sidecar must run again to re-establish the catalog and RBAC.
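It can help to check the variable's value before a first install, since mistakes cannot be fixed later by editing it. As we understand the Polaris documentation, the value has the form realm,clientId,clientSecret (for example POLARIS,root,s3cr3t), with ";" separating multiple realms; treat that format as an assumption to confirm for the deployed version. The parser below is a hypothetical helper and does not handle secrets that contain commas.

```python
from typing import NamedTuple


class BootstrapCredential(NamedTuple):
    realm: str
    client_id: str
    client_secret: str


def parse_bootstrap_credentials(value: str) -> list[BootstrapCredential]:
    """Parse 'realm,clientId,clientSecret' entries separated by ';' (assumed format)."""
    creds = []
    for entry in filter(None, (e.strip() for e in value.split(";"))):
        parts = [p.strip() for p in entry.split(",")]
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected realm,clientId,clientSecret, got {entry!r}")
        creds.append(BootstrapCredential(*parts))
    return creds
```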
Known Issues¶
Important Gotchas
- Bootstrap credentials persist: POLARIS_BOOTSTRAP_CREDENTIALS is only read on first startup. Changing it in .env has no effect on an existing database.
- Catalog can disappear: After a database recreate, polaris-init may report "already exists" even though the catalog is actually gone. Always verify via the management API and, if the catalog is missing, recreate it with polaris-init.
- DROP TABLE forbidden: Polaris RBAC blocks DROP_TABLE_WITH_PURGE. Use CALL iceberg.system.unregister_table('schema', 'table') from Trino instead; this removes the table from the catalog but leaves its data files in object storage.
- Healthcheck port: Use /q/health on port 8182, not 8181. Port 8181 serves the REST API only.
- Restart after config changes: After changing Polaris configuration, redeploy with helm upgrade so the changes take effect.
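The catalog check from the second item can be automated: list catalogs through the management API and look for akko-warehouse by name. This sketch assumes the list response has the shape {"catalogs": [{"name": ...}, ...]} and needs a bearer token from the OAuth2 token endpoint; the helper names are hypothetical.

```python
import json
import urllib.request

CATALOGS_URL = "http://akko-akko-polaris:8181/api/management/v1/catalogs"


def catalog_names(list_response: dict) -> set[str]:
    """Extract catalog names from a list-catalogs response ({"catalogs": [...]})."""
    return {c["name"] for c in list_response.get("catalogs", [])}


def catalog_exists(token: str, name: str = "akko-warehouse",
                   url: str = CATALOGS_URL, timeout: float = 10.0) -> bool:
    """Ask the management API whether the catalog really exists."""
    req = urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    })
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return name in catalog_names(json.load(resp))
```

If catalog_exists returns False after a database recreate, rerun polaris-init rather than recreating the catalog by hand.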