Federation — Cloudera Hive cross-Lakehouse JOIN¶
Persona path: alice (setup) → carol (analyst) · Catalogs:
cloudera(external) andpostgres_oltp_banking(internal) · Duration: ~20 min · Difficulty: star star starStatus note: this demo expects the Cloudera catalog provisioned by the demo perimeter
akko-demo-cloudera. If the catalog is not yet live, follow Admin / Catalog Manager Pro to add it manually.
This demo proves that AKKO can federate an external Cloudera Hive cluster and run a single SQL that joins a Cloudera table with an internal AKKO source. Carol does not need to copy data or change tooling.
What this proves¶
- An external Cloudera Hive cluster can be added as a catalog without code changes.
- A single SQL query joins the external Hive table with an internal Postgres source.
- The Query Engine pushes predicates down to both sides.
- Carol's role determines what columns and rows she sees on both sides.
Pre-requisites¶
- Demo URL:
https://demo.akko-ai.com - An external Cloudera CDP cluster reachable from the AKKO cluster (Kerberos enabled).
- Catalog
postgres_oltp_bankingalready federated. - 2 personas provisioned:
alice,carol.
Step 1 — Alice federates the Cloudera catalog¶
Sign in as alice. Navigate to Administration → Catalog Manager Pro.
Click Add catalog → Hive (Cloudera).
Fill the form:
- Catalog name:
cloudera - Metastore URI:
thrift://hms.cloudera.akko-demo:9083 - Default file format:
PARQUET - Auth: Kerberos
- Principal:
akko-query/_HOST@AKKO-DEMO.LOCAL - Keytab: upload
akko-query.keytab(encrypted at rest)
Click Test connection. Expected: Connection OK, 4 databases discovered.
Click Save. The Catalog auto-syncs metadata in the background.
Screenshot: tests/e2e/playwright/artefacts/demos/federation-cloudera/01-add-catalog.png
Step 2 — Alice verifies catalog visibility¶
Sign out, sign in as carol. Navigate to Governance → Catalog explorer.
Expected: a new catalog cloudera appears with the discovered databases. Click cloudera → banking_external.
Expected tables:
banking_external.partner_transactions (12 400 000 rows)
banking_external.partner_merchants (38 000 rows)
banking_external.partner_countries (240 rows)
Screenshot: tests/e2e/playwright/artefacts/demos/federation-cloudera/02-cloudera-catalog.png
Step 3 — Carol writes the cross-catalog JOIN¶
Open DevHub → SQL editor. Run:
SELECT
i.country_code,
count(DISTINCT i.transaction_id) AS internal_tx,
count(DISTINCT e.partner_tx_id) AS external_tx,
avg(e.exchange_rate) AS avg_rate
FROM postgres_oltp_banking.public.transactions i
JOIN cloudera.banking_external.partner_transactions e
ON e.merchant_id = i.merchant_id
AND e.tx_date = date(i.transaction_ts)
WHERE i.transaction_ts >= current_date - INTERVAL '30' DAY
GROUP BY i.country_code
ORDER BY internal_tx DESC
LIMIT 20;
Expected: 20 rows in ~4 s. Top 5:
| country | internal_tx | external_tx | avg_rate |
| FR | 84 120 | 78 940 | 1.0000 |
| DE | 41 320 | 39 870 | 1.0000 |
| US | 28 410 | 27 600 | 1.0820 |
| GB | 19 240 | 18 990 | 0.8612 |
| CH | 12 410 | 12 100 | 0.9714 |
Screenshot: tests/e2e/playwright/artefacts/demos/federation-cloudera/03-join-result.png
Step 4 — Carol inspects predicate pushdown¶
In the SQL editor, click Explain → Distributed plan. Expected: the plan shows two sources, with the date filter pushed to the Hive side and the country filter pushed to the Postgres side.
Output [country_code, internal_tx, external_tx, avg_rate]
Aggregate
InnerJoin [on merchant_id, tx_date]
RemoteSource postgres_oltp_banking (predicate: ts >= now - 30 days)
RemoteSource cloudera (predicate: tx_date >= 2026-04-17)
Screenshot: tests/e2e/playwright/artefacts/demos/federation-cloudera/04-explain-plan.png
Step 5 — Carol promotes the dashboard¶
Click Promote to dashboard. ADEN proposes a 4-tile dashboard:
- KPI — Internal tx total (30 d)
- KPI — External tx total (30 d)
- Bar — Internal vs External per country
- Table — Per country breakdown
+-------------------- AKKO Federation — Banking cross-source ----------+
| 384 200 internal tx | 372 100 external tx |
+----------------------------------------------------------------------+
| [ Bar: internal vs external per country ] [ Table top 20 ] |
+----------------------------------------------------------------------+
Screenshot: tests/e2e/playwright/artefacts/demos/federation-cloudera/05-dashboard.png
Cleanup¶
- Sign out.
- Optional: keep the catalog for the next demo cycle.
What this proves¶
- An external Cloudera cluster is a 1-form integration.
- A single SQL JOINs internal and external data without ETL.
- The Query Engine pushes predicates to both sides.
- Carol's role applies on both catalogs (e.g. masked PII on the internal side).
Files in the repo¶
| File | Role |
|---|---|
helm/akko/charts/akko-catalog-manager/templates/secret-cloudera.yaml |
Kerberos keytab Secret template |
trino/catalog/cloudera.properties |
Generated catalog file |