# OpenMetadata

Data governance, catalog, lineage, and quality.

| | |
|---|---|
| **URL** | https://catalog.akko.local |
| **Authentication** | Keycloak SSO (confidential client) |
| **Helm sub-charts** | `openmetadata` (community chart), `akko-opensearch` (custom) |
| **Enabled** | By default since Sprint 14 (requires 16 GB+ cluster memory) |
## Overview
OpenMetadata provides a centralized data governance layer for the AKKO platform. It serves as the metadata catalog where data engineers, analysts, and data owners can discover datasets, trace lineage, define glossaries, and monitor data quality.
**Memory requirement: 16 GB+ cluster memory**
OpenMetadata and its dependencies (OpenSearch) are memory-intensive. OpenMetadata is enabled by default in AKKO (decision 2026-03-14), but the cluster must have at least 16 GB of available memory (Docker Desktop for k3d, node memory for k3s/EKS/AKS/GKE).
## Deployment
OpenMetadata is enabled by default in `helm/akko/values.yaml`. To disable it on memory-constrained clusters:

```yaml
# values-dev.yaml or your override file
openmetadata:
  enabled: false
akko-opensearch:
  enabled: false
```

Then redeploy the Helm release with the override applied.

When enabled, OpenMetadata starts four additional services:
| Service | Role | Memory |
|---|---|---|
| `openmetadata-migrate` | Runs database migrations on startup, then exits | Minimal |
| `openmetadata-server` | Main application server (API + UI) | ~1 GB |
| `openmetadata-ingestion` | Metadata ingestion workers | ~512 MB |
| `opensearch` | Search engine backend (replaces Elasticsearch) | ~512 MB heap |
## Features

### Data Catalog
Browse and search all datasets ingested from Trino, including Iceberg tables and PostgreSQL tables. Each asset includes:
- Column-level schema with types
- Tags and classifications
- Ownership (team or individual)
- Descriptions (table and column level)
- Data profiling statistics
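Catalog search is also exposed over the REST API. As a quick sketch, a full-text search URL can be built with the standard library; note that the `/api/v1/search/query` path and the `table_search_index` index name follow the general OpenMetadata API and are assumptions, not values verified against this deployment:

```python
from urllib.parse import urlencode

# Host taken from this page; the API path is an assumption (standard OpenMetadata layout).
BASE = "https://catalog.akko.local/api/v1/search/query"

def search_url(term: str, size: int = 10) -> str:
    """Build a catalog full-text search URL for tables matching `term`."""
    params = {"q": term, "index": "table_search_index", "size": size}
    return f"{BASE}?{urlencode(params)}"

print(search_url("customer"))
# → https://catalog.akko.local/api/v1/search/query?q=customer&index=table_search_index&size=10
```

In a real session the request would carry a bearer token for the bot or SSO user; the sketch only shows the query construction.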
### Lineage Visualization

Track how data flows through the platform.
Lineage is captured at the pipeline, table, and dashboard level, giving full visibility into upstream dependencies and downstream consumers.
### Glossary
A shared business vocabulary with standardized term definitions. Pre-configured glossary terms cover the banking demo domain (e.g., customer segment, account type, transaction category).
### Data Quality
Test suites and test cases that validate data integrity. Run automated checks on table freshness, row counts, column values, and custom SQL assertions.
### Domains and Data Products
Organize assets into business domains and publishable data products with defined SLOs and ownership.
## Pre-Configured Content
AKKO includes two enrichment scripts that populate OpenMetadata with demo content after ingestion:
### V1 Enrichment (`openmetadata/enrich_catalog.py`)
| Category | Count | Details |
|---|---|---|
| Classifications | 3 | PII, DataTier, Domain |
| Tags | 8 | Across all three classifications |
| Glossary terms | 8 | Banking domain vocabulary |
| Teams | 3 | With team-based ownership assignments |
| Users | 3 | Mapped to teams |
| Descriptions | -- | Table and column-level descriptions |
### V2 Enrichment (`openmetadata/enrich_catalog_v2.py`)
| Category | Count | Details |
|---|---|---|
| Domains | 3 | Business domains for asset organization |
| Data products | 2 | Publishable data product definitions |
| Dashboard ingestion | -- | Superset dashboards registered in catalog |
| Pipeline ingestion | -- | Airflow pipelines registered in catalog |
| Test suites | 3 | One per core table |
| Test cases | 7 | Freshness, row count, column value checks |
| Lineage | -- | Pipeline to table to dashboard connections |
### Running the Enrichment Scripts

After OpenMetadata is running and initial ingestion is complete:

```bash
# V1: Tags, glossary, owners, descriptions
kubectl exec -n akko deploy/openmetadata-ingestion -- python /opt/scripts/enrich_catalog.py

# V2: Domains, data products, quality, lineage
kubectl exec -n akko deploy/openmetadata-ingestion -- python /opt/scripts/enrich_catalog_v2.py
```
**Scripts are idempotent**
Both enrichment scripts check for existing resources before creating new ones. They can be run multiple times safely.
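The idempotency comes from a check-before-create pattern: look up each resource by name and create it only when absent. A minimal sketch of that logic, using a hypothetical in-memory catalog in place of the real OpenMetadata client:

```python
# Sketch of the check-before-create pattern the enrichment scripts rely on.
# `catalog` stands in for the OpenMetadata API client (names are hypothetical).

catalog: dict[str, dict] = {}  # existing resources, keyed by fully qualified name

def ensure_glossary_term(fqn: str, description: str) -> dict:
    """Create the term only if it does not already exist (idempotent)."""
    existing = catalog.get(fqn)
    if existing is not None:
        return existing  # safe to re-run: nothing is duplicated
    term = {"fullyQualifiedName": fqn, "description": description}
    catalog[fqn] = term
    return term

# Running the same step twice leaves the catalog unchanged
ensure_glossary_term("Banking.CustomerSegment", "Customer grouping for marketing")
ensure_glossary_term("Banking.CustomerSegment", "Customer grouping for marketing")
print(len(catalog))  # 1
```

The same guard applies to every resource type the scripts create (tags, teams, domains, test suites), which is why re-running after a partial failure is safe.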
## Components Architecture
```
          catalog.akko.local
                  |
            Traefik (TLS)
                  |
       openmetadata-server (:8585)
          |                |
akko-opensearch (:9200)   akko-postgresql (metadata DB)
                  |
       openmetadata-ingestion
                  |
           +------+------+
           |      |      |
         Trino  Airflow  Superset
```
OpenMetadata stores its metadata in the shared PostgreSQL instance (akko-postgresql, database: openmetadata_db) and uses OpenSearch (akko-opensearch) for full-text search and indexing.
## Authentication
OpenMetadata uses a confidential Keycloak client (server-side auth code flow, not SPA implicit flow). This avoids login loop issues with self-signed certificates in Safari.
Configuration is handled via environment variables in the Helm values:
- `AUTHENTICATION_CLIENT_TYPE: confidential`
- `OIDC_*` environment variables for the pac4j auth code flow
- Self-signed TLS certificate imported into the Java truststore at container startup
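As an illustrative fragment only: the wiring might look like the YAML below. Only `AUTHENTICATION_CLIENT_TYPE` and the `OIDC_*` naming come from this page; the `extraEnvVars` nesting, the client ID, and the Keycloak realm URL are assumptions.

```yaml
# Illustrative only -- how this chart nests these values is an assumption.
openmetadata:
  extraEnvVars:
    AUTHENTICATION_CLIENT_TYPE: confidential
    OIDC_CLIENT_ID: openmetadata          # hypothetical client name
    OIDC_DISCOVERY_URI: https://keycloak.akko.local/realms/akko/.well-known/openid-configuration  # assumed realm URL
```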
## Known Issues

**OOM on 8 GB Docker Desktop**
OpenMetadata Server (~1 GB) + OpenSearch (~512 MB heap + native memory) will cause OOM restart loops if Docker Desktop is configured with less than 16 GB of RAM. Always use the governance profile on a machine with sufficient resources.
**Migrate command syntax**

Migrations are not invoked via `./openmetadata.sh migrate` (that script does not exist). The `openmetadata-migrate` service runs the database migrations automatically at startup, so no manual migrate step is needed.
**OpenSearch memory**

OpenSearch requires at least `-Xms512m -Xmx512m` of JVM heap. Constraining it below that (e.g., with a 768m total container limit) causes restart loops, because OpenSearch needs both heap and native memory (~1 GB in total).
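A values fragment sized to that floor might look like the sketch below. `opensearchJavaOpts` is the key used by the upstream OpenSearch Helm chart; whether the custom `akko-opensearch` sub-chart exposes the same key is an assumption.

```yaml
# Illustrative sizing -- key names assume the upstream OpenSearch chart conventions.
akko-opensearch:
  opensearchJavaOpts: "-Xms512m -Xmx512m"  # minimum heap that avoids restart loops
  resources:
    limits:
      memory: 1Gi                          # heap plus native-memory headroom
```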
**API serialization quirks**

The OpenMetadata REST API has specific serialization requirements:

- PUT endpoints expect FQN strings for the `service`, `testSuite`, and `domain` fields, not `{id, type}` objects
- Data product assets must be added via PATCH after creation, not in the initial PUT
- Test suites: POST to `/testSuites/executable` requires both `basicEntityReference` and `executableEntityReference` (table FQN)
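The difference matters when building payloads by hand. A sketch of the two shapes and of the follow-up PATCH; only the field names come from the list above, and every value (FQNs, IDs) is an illustrative placeholder:

```python
import json

# Accepted by PUT endpoints: related entities referenced by FQN string.
test_case_ok = {
    "name": "customers_row_count",
    "testSuite": "trino.iceberg.bank.customers.testSuite",  # FQN string (illustrative)
}

# Rejected by PUT endpoints: {id, type} entity-reference objects.
test_case_bad = {
    "name": "customers_row_count",
    "testSuite": {"id": "0000-illustrative-id", "type": "testSuite"},
}

# Data product assets go in a follow-up JSON Patch, not the initial PUT.
add_assets_patch = [
    {"op": "add", "path": "/assets/0",
     "value": {"id": "0000-illustrative-id", "type": "table"}},
]

print(json.dumps(add_assets_patch, indent=2))
```

Sending the `test_case_bad` shape typically surfaces as a 400 deserialization error, which is easy to misread as a schema problem rather than a reference-format one.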
**Bot token encryption**

The OpenMetadata bot token is Fernet-encrypted in the `user_entity` table. The `fernet:` prefix must be stripped before decrypting. The database name is `openmetadata_db`.
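Stripping the prefix and decrypting can be sketched as follows, using the `cryptography` package. The key here is generated locally purely for the round-trip demo; the real key comes from the server's Fernet key configuration.

```python
from cryptography.fernet import Fernet

def decrypt_bot_token(stored: str, key: bytes) -> str:
    """Strip the 'fernet:' prefix OpenMetadata stores, then Fernet-decrypt."""
    token = stored.removeprefix("fernet:")
    return Fernet(key).decrypt(token.encode()).decode()

# Demo round-trip with a locally generated key (the real key lives in server config)
key = Fernet.generate_key()
stored = "fernet:" + Fernet(key).encrypt(b"jwt-bot-token").decode()
print(decrypt_bot_token(stored, key))  # jwt-bot-token
```

Forgetting to strip the prefix fails with an `InvalidToken` error, since the prefixed string is no longer valid base64url Fernet data.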