OpenMetadata

Data governance, catalog, lineage, and quality.

URL              https://catalog.akko.local
Authentication   Keycloak SSO (confidential client)
Helm sub-charts  openmetadata (community chart), akko-opensearch (custom)
Enabled          Enabled by default since Sprint 14 (requires 16 GB+ cluster memory)

Overview

OpenMetadata provides a centralized data governance layer for the AKKO platform. It serves as the metadata catalog where data engineers, analysts, and data owners can discover datasets, trace lineage, define glossaries, and monitor data quality.

Memory requirement -- 16 GB+ cluster memory

OpenMetadata and its OpenSearch dependency are memory-intensive. OpenMetadata is enabled by default in AKKO (decision 2026-03-14), but the cluster must have at least 16 GB of available memory: the Docker Desktop allocation for k3d, or node memory for k3s/EKS/AKS/GKE.


Deployment

OpenMetadata is enabled by default in helm/akko/values.yaml. To disable it on memory-constrained clusters:

# values-dev.yaml or your override file
openmetadata:
  enabled: false
akko-opensearch:
  enabled: false

Then:

helm upgrade akko helm/akko/ -n akko -f values-dev.yaml

When enabled, OpenMetadata runs four additional services:

Service                 Role                                             Memory
openmetadata-migrate    Runs database migrations on startup, then exits  Minimal
openmetadata-server     Main application server (API + UI)               ~1 GB
openmetadata-ingestion  Metadata ingestion workers                       ~512 MB
opensearch              Search engine backend (replaces Elasticsearch)   ~512 MB heap

Features

Data Catalog

Browse and search all datasets ingested from Trino, including Iceberg tables and PostgreSQL tables. Each asset includes:

  • Column-level schema with types
  • Tags and classifications
  • Ownership (team or individual)
  • Descriptions (table and column level)
  • Data profiling statistics

Lineage Visualization

Track how data flows through the platform:

Airflow DAG --> Iceberg tables --> Superset dashboard

Lineage is captured at the pipeline, table, and dashboard level, giving full visibility into upstream dependencies and downstream consumers.
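
To make "upstream dependencies" and "downstream consumers" concrete, lineage can be pictured as a directed graph. The sketch below is only an illustration, not OpenMetadata's API: the asset names and the `LINEAGE`, `downstream`, and `upstream` identifiers are hypothetical.

```python
from collections import deque

# Hypothetical lineage edges (source -> targets); not real AKKO asset names.
LINEAGE = {
    "pipeline:load_transactions": ["table:iceberg.transactions"],
    "table:iceberg.transactions": ["dashboard:transactions_overview"],
    "dashboard:transactions_overview": [],
}


def downstream(graph, start):
    """Return every node reachable from `start`, in breadth-first order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def upstream(graph, start):
    """Return every node `start` depends on, by walking the edges in reverse."""
    reverse = {}
    for src, targets in graph.items():
        for dst in targets:
            reverse.setdefault(dst, []).append(src)
    return downstream(reverse, start)
```

Calling `downstream(LINEAGE, "pipeline:load_transactions")` lists the table and then the dashboard; `upstream` on the dashboard walks the same chain backwards.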

Glossary

A shared business vocabulary with standardized term definitions. Pre-configured glossary terms cover the banking demo domain (e.g., customer segment, account type, transaction category).

Data Quality

Test suites and test cases that validate data integrity. Run automated checks on table freshness, row counts, column values, and custom SQL assertions.
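
As a rough illustration of what these checks evaluate, the logic behind a row count, freshness, and column value test could look like the following. This is a minimal sketch in plain Python, not OpenMetadata's test definitions; the function names and parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def check_row_count(row_count, min_rows, max_rows=None):
    """Pass when the row count falls inside the expected range."""
    if row_count < min_rows:
        return False
    return max_rows is None or row_count <= max_rows


def check_freshness(last_updated, max_age, now=None):
    """Pass when the table was updated within `max_age` of `now`."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated <= max_age


def check_column_values(values, allowed):
    """Pass when every non-null value belongs to the allowed set."""
    return all(v in allowed for v in values if v is not None)
```

In OpenMetadata itself, checks of this kind are declared as test cases attached to a test suite rather than written by hand.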

Domains and Data Products

Organize assets into business domains and publishable data products with defined SLOs and ownership.


Pre-Configured Content

AKKO includes two enrichment scripts that populate OpenMetadata with demo content after ingestion:

V1 Enrichment (openmetadata/enrich_catalog.py)

Category         Count  Details
Classifications  3      PII, DataTier, Domain
Tags             8      Across all three classifications
Glossary terms   8      Banking domain vocabulary
Teams            3      With team-based ownership assignments
Users            3      Mapped to teams
Descriptions     --     Table and column-level descriptions

V2 Enrichment (openmetadata/enrich_catalog_v2.py)

Category             Count  Details
Domains              3      Business domains for asset organization
Data products        2      Publishable data product definitions
Dashboard ingestion  --     Superset dashboards registered in catalog
Pipeline ingestion   --     Airflow pipelines registered in catalog
Test suites          3      One per core table
Test cases           7      Freshness, row count, column value checks
Lineage              --     Pipeline to table to dashboard connections

Running the Enrichment Scripts

After OpenMetadata is running and initial ingestion is complete:

# V1: Tags, glossary, owners, descriptions
kubectl exec -n akko deploy/openmetadata-ingestion -- python /opt/scripts/enrich_catalog.py

# V2: Domains, data products, quality, lineage
kubectl exec -n akko deploy/openmetadata-ingestion -- python /opt/scripts/enrich_catalog_v2.py

Scripts are idempotent

Both enrichment scripts check for existing resources before creating new ones. They can be run multiple times safely.
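
The check-before-create pattern described above can be sketched as follows. This is not the code of the enrichment scripts: `InMemoryCatalog` is a hypothetical stand-in for a catalog client, and `ensure_resource` is an illustrative helper name.

```python
class InMemoryCatalog:
    """Hypothetical stand-in for a catalog client, so the sketch is runnable."""

    def __init__(self):
        self._items = {}

    def get(self, name):
        # Return the stored resource, or None when it does not exist yet.
        return self._items.get(name)

    def create(self, name, payload):
        self._items[name] = payload
        return payload


def ensure_resource(catalog, name, payload):
    """Create `name` only if it is missing; return (resource, created)."""
    existing = catalog.get(name)
    if existing is not None:
        return existing, False
    return catalog.create(name, payload), True
```

Running `ensure_resource` twice with the same name creates the resource once and returns the existing one on the second call, which is what makes repeated runs safe.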


Components Architecture

                    catalog.akko.local
                          |
                     Traefik (TLS)
                          |
               openmetadata-server (:8585)
                    |           |
           akko-opensearch (:9200)   akko-postgresql (metadata DB)
                                |
               openmetadata-ingestion
                    |
              +-----+-----+
              |     |     |
           Trino  Airflow  Superset

OpenMetadata stores its metadata in the shared PostgreSQL instance (akko-postgresql, database: openmetadata_db) and uses OpenSearch (akko-opensearch) for full-text search and indexing.


Authentication

OpenMetadata uses a confidential Keycloak client (server-side auth code flow, not SPA implicit flow). This avoids login loop issues with self-signed certificates in Safari.

Configuration is handled via environment variables in the Helm values:

  • AUTHENTICATION_CLIENT_TYPE: confidential
  • 12 OIDC_* environment variables for pac4j auth code flow
  • Self-signed TLS certificate imported into Java truststore at container startup

Known Issues

OOM on 8 GB Docker Desktop

OpenMetadata Server (~1 GB) + OpenSearch (~512 MB heap + native memory) will cause OOM restart loops if Docker Desktop is configured with less than 16 GB of RAM. Always use the governance profile on a machine with sufficient resources.

Migrate command syntax

The OpenMetadata migration command is:

./bootstrap/openmetadata-ops.sh migrate

Do not use ./openmetadata.sh migrate, which does not exist. The openmetadata-migrate service runs the migration automatically at startup.

OpenSearch memory

OpenSearch requires a heap of at least -Xms512m -Xmx512m. A container memory limit that leaves too little room above the heap (e.g., 768m total) causes restart loops, because OpenSearch needs native memory in addition to the heap (~1 GB total).
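
The arithmetic behind that sizing can be written out as a small check. This is a sketch only: the 512 MB native overhead is an assumption inferred from the "~1 GB total" figure, not a measured value, and both function names are hypothetical.

```python
def min_container_memory_mb(heap_mb, native_overhead_mb=512):
    """Estimate the container limit needed: JVM heap plus native (off-heap) memory.

    The 512 MB default for native overhead is an assumption, not a measurement.
    """
    return heap_mb + native_overhead_mb


def limit_is_sufficient(container_limit_mb, heap_mb, native_overhead_mb=512):
    """Return True when the container limit covers heap plus native memory."""
    return container_limit_mb >= min_container_memory_mb(heap_mb, native_overhead_mb)
```

Under these assumptions a 768 MB limit with a 512 MB heap fails the check, while a 1024 MB limit passes.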

API serialization quirks

The OpenMetadata REST API has specific serialization requirements:

  • PUT endpoints expect FQN strings for service, testSuite, and domain fields -- not {id, type} objects
  • Data product assets must be added via PATCH after creation, not in the initial PUT
  • Test suites: POST to /testSuites/executable requires both basicEntityReference and executableEntityReference (table FQN)
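
The first two quirks can be handled by normalizing a payload before sending it. The sketch below only reshapes dictionaries and makes no HTTP calls. It assumes that a reference object carries a `fullyQualifiedName` key; the helper names `to_put_payload` and `split_data_product` are hypothetical.

```python
# Fields that PUT endpoints expect as FQN strings rather than {id, type} objects.
FQN_FIELDS = ("service", "testSuite", "domain")


def to_put_payload(entity):
    """Return a copy of `entity` with reference objects replaced by FQN strings."""
    payload = dict(entity)
    for field in FQN_FIELDS:
        value = payload.get(field)
        if isinstance(value, dict):
            # Assumption: the reference object includes its fullyQualifiedName.
            fqn = value.get("fullyQualifiedName")
            if fqn is None:
                raise ValueError(f"{field} reference has no fullyQualifiedName")
            payload[field] = fqn
    return payload


def split_data_product(data_product):
    """Separate the body for the initial PUT from the assets to add via PATCH."""
    body = {k: v for k, v in data_product.items() if k != "assets"}
    return to_put_payload(body), list(data_product.get("assets", []))
```

The PUT body returned by `split_data_product` omits `assets`; the returned asset list is what a later PATCH request would add.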

Bot token encryption

The OpenMetadata bot token is Fernet-encrypted in the user_entity table. The fernet: prefix must be stripped before decrypting. The database name is openmetadata_db.
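
A minimal sketch of the prefix handling, assuming the stored value has already been read from the database and the Fernet key is available. How the key is obtained is outside this sketch, and the function names are hypothetical. Decryption uses the third-party `cryptography` package, imported lazily so the prefix helper works without it.

```python
FERNET_PREFIX = "fernet:"


def strip_fernet_prefix(stored_value):
    """Remove the leading 'fernet:' marker, leaving the raw Fernet token."""
    if stored_value.startswith(FERNET_PREFIX):
        return stored_value[len(FERNET_PREFIX):]
    return stored_value


def decrypt_bot_token(stored_value, fernet_key):
    """Decrypt a stored bot token; requires the `cryptography` package."""
    # Imported here so the module loads even when cryptography is not installed.
    from cryptography.fernet import Fernet

    token = strip_fernet_prefix(stored_value)
    return Fernet(fernet_key).decrypt(token.encode("ascii")).decode("utf-8")
```

Passing the value to Fernet with the prefix still attached would fail, because the prefix is not part of the token.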