First Notebook

This guide walks you through running the AKKO banking demo notebook, which creates Iceberg tables, generates synthetic data, and demonstrates Trino federation.

Open JupyterHub

  1. Navigate to https://lab.akko.local
  2. Log in as alice (password: alice123)
  3. Wait for your notebook server to spawn (first time takes ~30 seconds)

Open the Banking Demo

In the JupyterLab file browser, navigate to:

notebooks/getting-started/01-akko-banking-demo.ipynb

Notebooks are read-only

The notebooks/ directory is mounted read-only from the host. To edit a notebook, copy it to your home directory first:

!cp notebooks/getting-started/01-akko-banking-demo.ipynb ~/my-banking-demo.ipynb

Then open the copy from your home directory.

What the Notebook Does

The banking demo simulates a retail bank with 5 French branches, 200 customers, and 1000 transactions. It exercises the full AKKO stack in a single notebook:

Step 1 -- Spark Connect Session

Connects to Spark via the gRPC protocol (sc://spark-connect:15002). This is a remote Spark session -- no local Spark installation needed. The notebook creates the iceberg.analytics namespace.

spark = SparkSession.builder \
    .remote("sc://spark-connect:15002") \
    .getOrCreate()

Step 2 -- Create Iceberg Tables

Four tables are created in the iceberg.analytics namespace, stored as Iceberg format on object storage (S3-compatible storage), with the catalog managed by Apache Polaris:

| Table | Rows | Partitioned By | Description |
|---|---|---|---|
| advisors | 15 | -- | Bank advisors with specialty and branch |
| customers | 200 | segment | Customers (retail, business, premium) |
| accounts | ~350 | account_type | Checking, savings, investment accounts |
| transactions | 1000 | months(transaction_date) | 6 months of card, transfer, deposit operations |
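The notebook generates this data synthetically. As a rough sketch of how the customers rows might be produced (illustrative only -- the field names and generation logic here are assumptions, not the notebook's actual code):

```python
import random

SEGMENTS = ["retail", "business", "premium"]  # matches the segment partition values
BRANCH_IDS = [1, 2, 3, 4, 5]                  # 5 French branches

def make_customers(n=200, seed=42):
    """Build n synthetic customer rows with a segment and a home branch."""
    rng = random.Random(seed)
    return [
        {"customer_id": i,
         "segment": rng.choice(SEGMENTS),
         "branch_id": rng.choice(BRANCH_IDS)}
        for i in range(1, n + 1)
    ]

customers = make_customers()
```

In the notebook, rows like these are written to iceberg.analytics.customers through the Spark Connect session, partitioned by segment.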

Step 3 -- Verify Data

The notebook prints row counts for all four tables and displays sample data for each.
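A minimal sketch of that printout, assuming the counts were gathered beforehand with spark.table(name).count() (they are hardcoded here so the snippet runs without a Spark session):

```python
# Hardcoded counts stand in for spark.table(name).count() results.
counts = {"advisors": 15, "customers": 200, "accounts": 350, "transactions": 1000}

lines = ["=" * 40, "  AKKO Banking -- Iceberg Tables", "=" * 40]
for name, n in counts.items():
    lines.append(f"  {name:<20} : {n:>6} rows")
lines.append("=" * 40)
print("\n".join(lines))
```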

Step 4 -- Iceberg Time-Travel

Demonstrates Iceberg's snapshot history. Each INSERT creates a new snapshot, and you can query any historical version of the data:

SELECT snapshot_id, committed_at, operation
FROM iceberg.analytics.transactions.snapshots
ORDER BY committed_at

Spark Connect limitation

Iceberg metadata tables (like .snapshots) must be queried with .show() instead of .collect() in Spark Connect mode, due to a SerializedLambda serialization issue.

Step 5 -- Trino Federation

The flagship query: a federated join across two different data sources in a single SQL statement:

  • Iceberg tables (customers, accounts, transactions) stored on object storage
  • PostGIS table (branches with geospatial coordinates) stored in PostgreSQL

SELECT
    b.name AS branch, b.city,
    COUNT(DISTINCT c.customer_id) AS customer_count,
    COUNT(t.transaction_id) AS transaction_count,
    ROUND(SUM(ABS(t.amount)), 2) AS total_volume
FROM postgresql.geospatial.branches b
JOIN iceberg.analytics.customers c ON c.branch_id = b.id
JOIN iceberg.analytics.accounts a ON a.customer_id = c.customer_id
JOIN iceberg.analytics.transactions t ON t.account_id = a.account_id
WHERE a.status = 'active'
GROUP BY b.name, b.city
ORDER BY total_volume DESC

This query is executed by Trino, which federates across the Iceberg catalog (via Polaris REST) and the PostgreSQL catalog transparently.
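To see what the join computes, here is a pure-Python rendering of the same aggregation over a few toy rows (the data below is made up for illustration; Trino performs this across the real tables):

```python
from collections import defaultdict

# Toy stand-ins for the PostGIS and Iceberg tables.
branches = [{"id": 1, "name": "Agence Opera", "city": "Paris"}]
customers = [{"customer_id": 10, "branch_id": 1}]
accounts = [{"account_id": 100, "customer_id": 10, "status": "active"}]
transactions = [
    {"transaction_id": 1, "account_id": 100, "amount": -25.5},
    {"transaction_id": 2, "account_id": 100, "amount": 40.0},
]

# Join transactions -> accounts -> customers -> branches, keeping only
# active accounts, then aggregate per (branch, city) as the SQL does.
stats = defaultdict(lambda: {"customers": set(), "tx": 0, "volume": 0.0})
for t in transactions:
    acct = next(a for a in accounts
                if a["account_id"] == t["account_id"] and a["status"] == "active")
    cust = next(c for c in customers if c["customer_id"] == acct["customer_id"])
    br = next(b for b in branches if b["id"] == cust["branch_id"])
    s = stats[(br["name"], br["city"])]
    s["customers"].add(cust["customer_id"])
    s["tx"] += 1
    s["volume"] += abs(t["amount"])

for (name, city), s in stats.items():
    print(name, city, len(s["customers"]), s["tx"], round(s["volume"], 2))
```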

Step 6 -- SQL Queries for Superset

The notebook prints ready-to-use SQL queries that you can paste into Superset SQL Lab, including KPIs, monthly volume breakdowns, spending categories, and the federated branch revenue query.
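The exact SQL is printed by the notebook itself; as one illustrative example of the kind of query you might paste into SQL Lab (the column names are assumed from the demo schema, not copied from the notebook):

```python
# Hypothetical monthly-volume query against the demo's transactions table.
monthly_volume_sql = """
SELECT date_trunc('month', transaction_date) AS month,
       ROUND(SUM(ABS(amount)), 2) AS volume
FROM iceberg.analytics.transactions
GROUP BY 1
ORDER BY 1
""".strip()
print(monthly_volume_sql)
```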

Run It Cell by Cell

Select the first cell and press Shift+Enter to run it. Continue through each cell in order. The entire notebook takes about 2-3 minutes to complete.

Run cells in order

Each cell depends on the previous ones. Do not skip cells or run them out of order, as later cells reference tables created by earlier cells.

Expected output at the verification step:

========================================
  AKKO Banking -- Iceberg Tables
========================================
  advisors             :     15 rows
  customers            :    200 rows
  accounts             :   ~350 rows
  transactions         :   1000 rows
========================================

After Running the Notebook

Once the notebook completes successfully, the data is available everywhere in the platform:

  • Trino -- Query tables at iceberg.analytics.* via the Trino UI or any SQL client
  • Superset -- The auto-provisioned dashboard now displays live data. Navigate to Dashboards > AKKO Banking Overview and refresh
  • Other notebooks -- All notebooks that query iceberg.analytics.* will see the data
  • OpenMetadata -- If the governance profile is running, the catalog can ingest these tables for metadata management

Architecture Recap

  Notebook (Spark Connect)        PostgreSQL
         |                       +--------------------+
         | gRPC :15002           | geospatial         |
         v                       |   .branches (5)    |
  Spark Connect --> Polaris      +---------+----------+
         |              |                  |
         v              v                  |
  object storage (S3)                      |
    +----------------+                     |
    | analytics      |                     |
    |  .advisors     |                     |
    |  .customers    |                     |
    |  .accounts     |                     |
    |  .transactions |                     |
    +--------+-------+                     |
             |                             |
    +--------v-----------------------------v--+
    |           TRINO (federation)            |
    +-------------------+---------------------+
                        |
                 +------v------+
                 |   SUPERSET  |
                 |  Dashboard  |
                 +-------------+

Explore Other Notebooks

AKKO ships with 14 notebooks organized by category. After completing the banking demo, try these:

Getting Started

| # | Notebook | Description |
|---|---|---|
| 01 | akko-banking-demo | Banking data model, Spark Connect, Trino federation (this guide) |
| 03 | spark-iceberg-demo | Deep dive into Iceberg features (schema evolution, partitioning, time-travel) |

AI

| # | Notebook | Description |
|---|---|---|
| 02 | rag-pipeline-demo | RAG pipeline with Ollama, pgvector, and LangChain |
| 13 | akko-jupyter-ai-demo | Jupyter AI integration with local LLMs via Ollama |

Analytics

| # | Notebook | Description |
|---|---|---|
| 04 | akko-duckdb-analytics | In-process analytics with DuckDB on Iceberg data |
| 07 | akko-r-analytics | R kernel: tidyverse analytics on banking data |
| 08 | akko-julia-dataframes | Julia kernel: DataFrames.jl on banking data |
| 11 | akko-polars-analytics | Polars DataFrame library for fast analytics |

Engineering

| # | Notebook | Description |
|---|---|---|
| 05 | akko-dbt-transforms | dbt transformations on Iceberg tables |
| 06 | akko-data-quality | Data quality checks and validation |
| 10 | akko-polaris-catalog-admin | Polaris catalog administration via REST API |

Visualization

| # | Notebook | Description |
|---|---|---|
| 09 | akko-geospatial-analysis | PostGIS geospatial analysis with branch locations |
| 12 | akko-altair-visualization | Interactive Altair/Vega charts |

Reports

| File | Description |
|---|---|
| 04-akko-banking-report.qmd | Quarto report rendered to HTML, served at https://docs.akko.local/reports/ |

Notebook numbering

Notebooks are numbered in suggested reading order. Start with 01 (this guide), then try 02 (RAG) or 03 (Iceberg deep dive) depending on your interest.

Next Steps