# OpenMetadata

Data governance, catalog, lineage, and quality.

| | |
|---|---|
| **URL** | https://catalog.akko.local |
| **Authentication** | Keycloak SSO (confidential client) |
| **Helm sub-charts** | `openmetadata` (community chart), `akko-opensearch` (custom) |
| **Enabled** | By default since Sprint 14 (requires 16 GB+ cluster memory) |
## Overview
OpenMetadata provides a centralized data governance layer for the AKKO platform. It serves as the metadata catalog where data engineers, analysts, and data owners can discover datasets, trace lineage, define glossaries, and monitor data quality.
**Memory requirement: 16 GB+ cluster memory**
OpenMetadata and its dependencies (OpenSearch) are memory-intensive. OpenMetadata is enabled by default in AKKO (decision 2026-03-14), but the cluster must have at least 16 GB of available memory (Docker Desktop for k3d, node memory for k3s/EKS/AKS/GKE).
## Deployment
OpenMetadata is enabled by default in `helm/akko/values.yaml`. To disable it on memory-constrained clusters:

```yaml
# values-dev.yaml or your override file
openmetadata:
  enabled: false
akko-opensearch:
  enabled: false
```

Then redeploy the Helm release with the override applied.

When enabled, OpenMetadata starts four additional services:
| Service | Role | Memory |
|---|---|---|
| `openmetadata-migrate` | Runs database migrations on startup, then exits | Minimal |
| `openmetadata-server` | Main application server (API + UI) | ~1 GB |
| `openmetadata-ingestion` | Metadata ingestion workers | ~512 MB |
| `opensearch` | Search engine backend (replaces Elasticsearch) | ~512 MB heap |
## Features

### Data Catalog
Browse and search all datasets ingested from Trino, including Iceberg tables and PostgreSQL tables. Each asset includes:
- Column-level schema with types
- Tags and classifications
- Ownership (team or individual)
- Descriptions (table and column level)
- Data profiling statistics
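Catalog search is also exposed over the REST API. As a quick sketch, a full-text search URL can be built with the standard library; note that the `/api/v1/search/query` path and the `table_search_index` index name follow the general OpenMetadata API and are assumptions, not values verified against this deployment:

```python
from urllib.parse import urlencode

# Host taken from this page; the API path is an assumption (standard OpenMetadata layout).
BASE = "https://catalog.akko.local/api/v1/search/query"

def search_url(term: str, size: int = 10) -> str:
    """Build a catalog full-text search URL for tables matching `term`."""
    params = {"q": term, "index": "table_search_index", "size": size}
    return f"{BASE}?{urlencode(params)}"

print(search_url("customer"))
# → https://catalog.akko.local/api/v1/search/query?q=customer&index=table_search_index&size=10
```

In a real session the request would carry a bearer token for the bot or SSO user; the sketch only shows the query construction.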
### Lineage Visualization

Track how data flows through the platform.
Lineage is captured at the pipeline, table, and dashboard level, giving full visibility into upstream dependencies and downstream consumers.
### Glossary
A shared business vocabulary with standardized term definitions. Pre-configured glossary terms cover the banking demo domain (e.g., customer segment, account type, transaction category).
### Data Quality
Test suites and test cases that validate data integrity. Run automated checks on table freshness, row counts, column values, and custom SQL assertions.
### Domains and Data Products
Organize assets into business domains and publishable data products with defined SLOs and ownership.
## Pre-Configured Content
AKKO includes two enrichment scripts that populate OpenMetadata with demo content after ingestion:
### V1 Enrichment (`openmetadata/enrich_catalog.py`)
| Category | Count | Details |
|---|---|---|
| Classifications | 3 | PII, DataTier, Domain |
| Tags | 8 | Across all three classifications |
| Glossary terms | 8 | Banking domain vocabulary |
| Teams | 3 | With team-based ownership assignments |
| Users | 3 | Mapped to teams |
| Descriptions | -- | Table and column-level descriptions |
### V2 Enrichment (`openmetadata/enrich_catalog_v2.py`)
| Category | Count | Details |
|---|---|---|
| Domains | 3 | Business domains for asset organization |
| Data products | 2 | Publishable data product definitions |
| Dashboard ingestion | -- | Superset dashboards registered in catalog |
| Pipeline ingestion | -- | Airflow pipelines registered in catalog |
| Test suites | 3 | One per core table |
| Test cases | 7 | Freshness, row count, column value checks |
| Lineage | -- | Pipeline to table to dashboard connections |
### Running the Enrichment Scripts

After OpenMetadata is running and initial ingestion is complete:

```bash
# V1: Tags, glossary, owners, descriptions
kubectl exec -n akko deploy/openmetadata-ingestion -- python /opt/scripts/enrich_catalog.py

# V2: Domains, data products, quality, lineage
kubectl exec -n akko deploy/openmetadata-ingestion -- python /opt/scripts/enrich_catalog_v2.py
```
**Scripts are idempotent**
Both enrichment scripts check for existing resources before creating new ones. They can be run multiple times safely.
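The idempotency comes from a check-before-create pattern: look up each resource by name and create it only when absent. A minimal sketch of that logic, using a hypothetical in-memory catalog in place of the real OpenMetadata client:

```python
# Sketch of the check-before-create pattern the enrichment scripts rely on.
# `catalog` stands in for the OpenMetadata API client (names are hypothetical).

catalog: dict[str, dict] = {}  # existing resources, keyed by fully qualified name

def ensure_glossary_term(fqn: str, description: str) -> dict:
    """Create the term only if it does not already exist (idempotent)."""
    existing = catalog.get(fqn)
    if existing is not None:
        return existing  # safe to re-run: nothing is duplicated
    term = {"fullyQualifiedName": fqn, "description": description}
    catalog[fqn] = term
    return term

# Running the same step twice leaves the catalog unchanged
ensure_glossary_term("Banking.CustomerSegment", "Customer grouping for marketing")
ensure_glossary_term("Banking.CustomerSegment", "Customer grouping for marketing")
print(len(catalog))  # 1
```

The same guard applies to every resource type the scripts create (tags, teams, domains, test suites), which is why re-running after a partial failure is safe.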
## Components Architecture
```
          catalog.akko.local
                  |
            Traefik (TLS)
                  |
       openmetadata-server (:8585)
          |                |
akko-opensearch (:9200)   akko-postgresql (metadata DB)
                  |
       openmetadata-ingestion
                  |
           +------+------+
           |      |      |
         Trino  Airflow  Superset
```
OpenMetadata stores its metadata in the shared PostgreSQL instance (akko-postgresql, database: openmetadata_db) and uses OpenSearch (akko-opensearch) for full-text search and indexing.
## Authentication
OpenMetadata uses a confidential Keycloak client (server-side auth code flow, not SPA implicit flow). This avoids login loop issues with self-signed certificates in Safari.
Configuration is handled via environment variables in the Helm values:
- `AUTHENTICATION_CLIENT_TYPE: confidential`
- `OIDC_*` environment variables for the pac4j auth code flow
- Self-signed TLS certificate imported into the Java truststore at container startup
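As an illustrative fragment only: the wiring might look like the YAML below. Only `AUTHENTICATION_CLIENT_TYPE` and the `OIDC_*` naming come from this page; the `extraEnvVars` nesting, the client ID, and the Keycloak realm URL are assumptions.

```yaml
# Illustrative only -- how this chart nests these values is an assumption.
openmetadata:
  extraEnvVars:
    AUTHENTICATION_CLIENT_TYPE: confidential
    OIDC_CLIENT_ID: openmetadata          # hypothetical client name
    OIDC_DISCOVERY_URI: https://keycloak.akko.local/realms/akko/.well-known/openid-configuration  # assumed realm URL
```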
## Known Issues

**OOM on 8 GB Docker Desktop**
OpenMetadata Server (~1 GB) + OpenSearch (~512 MB heap + native memory) will cause OOM restart loops if Docker Desktop is configured with less than 16 GB of RAM. Always use the governance profile on a machine with sufficient resources.
**Migrate command syntax**

Migrations are not invoked via `./openmetadata.sh migrate` (that script does not exist). The `openmetadata-migrate` service runs the database migrations automatically at startup, so no manual migrate step is needed.
**OpenSearch memory**

OpenSearch requires at least `-Xms512m -Xmx512m` of JVM heap. Constraining it below that (e.g., with a 768m total container limit) causes restart loops, because OpenSearch needs both heap and native memory (~1 GB in total).
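A values fragment sized to that floor might look like the sketch below. `opensearchJavaOpts` is the key used by the upstream OpenSearch Helm chart; whether the custom `akko-opensearch` sub-chart exposes the same key is an assumption.

```yaml
# Illustrative sizing -- key names assume the upstream OpenSearch chart conventions.
akko-opensearch:
  opensearchJavaOpts: "-Xms512m -Xmx512m"  # minimum heap that avoids restart loops
  resources:
    limits:
      memory: 1Gi                          # heap plus native-memory headroom
```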
**API serialization quirks**

The OpenMetadata REST API has specific serialization requirements:

- PUT endpoints expect FQN strings for the `service`, `testSuite`, and `domain` fields, not `{id, type}` objects
- Data product assets must be added via PATCH after creation, not in the initial PUT
- Test suites: POST to `/testSuites/executable` requires both `basicEntityReference` and `executableEntityReference` (table FQN)
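The difference matters when building payloads by hand. A sketch of the two shapes and of the follow-up PATCH; only the field names come from the list above, and every value (FQNs, IDs) is an illustrative placeholder:

```python
import json

# Accepted by PUT endpoints: related entities referenced by FQN string.
test_case_ok = {
    "name": "customers_row_count",
    "testSuite": "trino.iceberg.bank.customers.testSuite",  # FQN string (illustrative)
}

# Rejected by PUT endpoints: {id, type} entity-reference objects.
test_case_bad = {
    "name": "customers_row_count",
    "testSuite": {"id": "0000-illustrative-id", "type": "testSuite"},
}

# Data product assets go in a follow-up JSON Patch, not the initial PUT.
add_assets_patch = [
    {"op": "add", "path": "/assets/0",
     "value": {"id": "0000-illustrative-id", "type": "table"}},
]

print(json.dumps(add_assets_patch, indent=2))
```

Sending the `test_case_bad` shape typically surfaces as a 400 deserialization error, which is easy to misread as a schema problem rather than a reference-format one.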
**Bot token encryption**

The OpenMetadata bot token is Fernet-encrypted in the `user_entity` table. The `fernet:` prefix must be stripped before decrypting. The database name is `openmetadata_db`.
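Stripping the prefix and decrypting can be sketched as follows, using the `cryptography` package. The key here is generated locally purely for the round-trip demo; the real key comes from the server's Fernet key configuration.

```python
from cryptography.fernet import Fernet

def decrypt_bot_token(stored: str, key: bytes) -> str:
    """Strip the 'fernet:' prefix OpenMetadata stores, then Fernet-decrypt."""
    token = stored.removeprefix("fernet:")
    return Fernet(key).decrypt(token.encode()).decode()

# Demo round-trip with a locally generated key (the real key lives in server config)
key = Fernet.generate_key()
stored = "fernet:" + Fernet(key).encrypt(b"jwt-bot-token").decode()
print(decrypt_bot_token(stored, key))  # jwt-bot-token
```

Forgetting to strip the prefix fails with an `InvalidToken` error, since the prefixed string is no longer valid base64url Fernet data.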