akko-demo-cloudera

Perimeter 2 of the AKKO demo cluster simulates an on-prem Cloudera CDP environment (HDFS 3, Hive 3, Kerberos KDC). It runs in its own dedicated namespace, akko-demo-cloudera, isolated from the akko platform.

Why

Banking and insurance prospects often have a legacy Hadoop environment secured with Kerberos. AKKO must show that it can federate that legacy environment through Trino without forcing a migration: hive.cloudera_kerb.* tables show up in the OpenMetadata catalog next to native Iceberg tables.

Components

| Service | Role | Image |
|---|---|---|
| kdc | MIT Kerberos KDC | localhost:30500/akko/akko-cloudera-kdc:2026.05 |
| hdfs-namenode | HDFS 3 NameNode | localhost:30500/akko/akko-cloudera-hdfs:2026.05 |
| hdfs-datanode | DataNode (1 replica) | same image as hdfs-namenode |
| hive-metastore | Hive 3 Metastore (Postgres) | localhost:30500/akko/akko-cloudera-hive:2026.05 |
| hiveserver2 | HiveServer2 + Spark | same image as hive-metastore |

All components authenticate to each other with Kerberos principals (hive/_HOST@AKKO.LOCAL, hdfs/_HOST@AKKO.LOCAL). Ports are exposed as NodePorts inside the cluster only, with no Internet access.
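
The _HOST token in these principals is a Hadoop convention: each service replaces it with its own fully qualified hostname (lowercased) when it logs in. A minimal sketch of that substitution follows. The hostname is hypothetical, since the in-cluster DNS names are not stated on this page.

```python
def resolve_principal(pattern: str, hostname: str) -> str:
    """Expand a Hadoop-style principal pattern such as hive/_HOST@AKKO.LOCAL."""
    # Separate "service/host" from the realm (the part after "@").
    name, sep, realm = pattern.partition("@")
    parts = name.split("/")
    # Only a two-part name whose host component is the literal _HOST is rewritten.
    if len(parts) == 2 and parts[1] == "_HOST":
        # Hadoop lowercases the hostname before substituting it.
        parts[1] = hostname.lower()
    return "/".join(parts) + sep + realm


# Hypothetical in-cluster hostname, used for illustration only.
EXAMPLE_HOST = "hiveserver2.akko-demo-cloudera.svc.cluster.local"

print(resolve_principal("hive/_HOST@AKKO.LOCAL", EXAMPLE_HOST))
```

Because the host component is resolved per pod, a keytab has to contain an entry for the exact hostname the service runs under.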

Typical client flow

  1. The customer already has external Hive tables (PARTITIONED BY year).
  2. AKKO Trino receives a service keytab via the akko-trino-cloudera-keytab Secret (generated KDC-side).
  3. Trino federates hive.cloudera_kerb.*; ADEN then writes SQL that joins iceberg.banking_curated.transactions with hive.cloudera_kerb.kyc.
  4. OpenMetadata ingests the metastore as a separate Hive source.
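
Steps 2 and 3 can be sketched as a Trino catalog file plus a federated query. The property names below follow the Trino Hive connector's Kerberos settings and should be checked against the Trino version in use. The metastore URI, the trino@AKKO.LOCAL client principal, the keytab mount path, and the customer_id and amount columns are all assumptions made for illustration, not values taken from the demo.

```python
# Assumed values: none of these three appear in the demo documentation.
METASTORE_URI = "thrift://hive-metastore.akko-demo-cloudera.svc.cluster.local:9083"
TRINO_PRINCIPAL = "trino@AKKO.LOCAL"
KEYTAB_PATH = "/etc/trino/keytabs/trino.keytab"  # where the Secret would be mounted

HIVE_CATALOG = {
    "connector.name": "hive",
    "hive.metastore.uri": METASTORE_URI,
    # Kerberos authentication towards the Hive Metastore.
    "hive.metastore.authentication.type": "KERBEROS",
    "hive.metastore.service.principal": "hive/_HOST@AKKO.LOCAL",
    "hive.metastore.client.principal": TRINO_PRINCIPAL,
    "hive.metastore.client.keytab": KEYTAB_PATH,
    # Kerberos authentication towards HDFS.
    "hive.hdfs.authentication.type": "KERBEROS",
    "hive.hdfs.trino.principal": TRINO_PRINCIPAL,
    "hive.hdfs.trino.keytab": KEYTAB_PATH,
}


def render_catalog(props: dict[str, str]) -> str:
    """Render a properties mapping as the text of a Trino catalog file."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


# Federated join across the Iceberg and Hive catalogs.
# customer_id and amount are hypothetical column names.
FEDERATED_QUERY = """
SELECT t.customer_id, sum(t.amount) AS total_amount
FROM iceberg.banking_curated.transactions AS t
JOIN hive.cloudera_kerb.kyc AS k
  ON t.customer_id = k.customer_id
GROUP BY t.customer_id
"""

print(render_catalog(HIVE_CATALOG))
print(FEDERATED_QUERY)
```

The rendered text would be saved as hive.properties, so that the catalog name matches the hive. prefix used in the table names above.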

Repo + deployment

  • Repo: https://github.com/AKKO-p/akko-demo-cloudera (private).
  • Helm release: akko-demo-cloudera in its own namespace.
  • Deployment: helm install akko-demo-cloudera ./helm/akko-demo-cloudera/ (separate from the umbrella akko chart).
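
The install command above does not name a namespace. A minimal sketch of how a deployment script might pin the release to its dedicated namespace follows. The --namespace and --create-namespace flags are standard Helm 3 options, but whether the project's own tooling passes them is an assumption.

```python
import shlex


def helm_install_command(release: str, chart_path: str, namespace: str) -> list[str]:
    """Build the argv for a namespaced Helm 3 install."""
    return [
        "helm", "install", release, chart_path,
        "--namespace", namespace,
        "--create-namespace",  # create the namespace if it does not exist yet
    ]


cmd = helm_install_command(
    release="akko-demo-cloudera",
    chart_path="./helm/akko-demo-cloudera/",
    namespace="akko-demo-cloudera",
)
# Print the command for review; pass cmd to subprocess.run(cmd, check=True) to execute it.
print(shlex.join(cmd))
```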

Limitations

  • Demo only: no NameNode HA, and an HDFS replication factor of 1.
  • The KDC starts from scratch on every redeploy, so keytabs are regenerated.
  • No YARN: Spark runs in local mode inside the hiveserver2 pod.

Cross-references