First Notebook¶
This guide walks you through running the AKKO banking demo notebook, which creates Iceberg tables, generates synthetic data, and demonstrates Trino federation.
Open JupyterHub¶
- Navigate to https://lab.akko.local
- Log in as alice (password alice123)
- Wait for your notebook server to spawn (the first spawn takes ~30 seconds)
Open the Banking Demo¶
In the JupyterLab file browser, navigate to the notebooks/ directory and open the banking demo notebook (notebook 01).
Notebooks are read-only
The notebooks/ directory is mounted read-only from the host. To edit a
notebook, copy it to your home directory first:
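A minimal sketch of one way to do this from a notebook cell, using Python instead of a terminal (the filename is an assumption; check the file browser for the exact name):

```python
import shutil
from pathlib import Path

# Hypothetical filename, based on the notebook list at the end of this
# guide -- adjust to match what the file browser shows.
src = Path("notebooks") / "01-akko-banking-demo.ipynb"

# Copy the read-only notebook into your (writable) home directory.
shutil.copy(src, Path.home())
```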
Then open the copy from your home directory.
What the Notebook Does¶
The banking demo simulates a retail bank with 5 French branches, 200 customers, and 1000 transactions. It exercises the full AKKO stack in a single notebook:
Step 1 -- Spark Connect Session¶
Connects to Spark via the gRPC protocol (sc://spark-connect:15002). This is a remote Spark session -- no local Spark installation needed. The notebook creates the iceberg.analytics namespace.
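In PySpark, the session setup looks roughly like this (a sketch; the endpoint is the one quoted above, the variable name is illustrative):

```python
from pyspark.sql import SparkSession

# Attach to the remote Spark Connect server over gRPC -- no local
# Spark installation or JVM is needed inside the notebook.
spark = SparkSession.builder.remote("sc://spark-connect:15002").getOrCreate()

# Create the namespace that will hold the demo tables.
spark.sql("CREATE NAMESPACE IF NOT EXISTS iceberg.analytics")
```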
Step 2 -- Create Iceberg Tables¶
Four tables are created in the iceberg.analytics namespace, stored in Iceberg format on S3-compatible object storage, with the catalog managed by Apache Polaris:
| Table | Rows | Partitioned By | Description |
|---|---|---|---|
| advisors | 15 | -- | Bank advisors with specialty and branch |
| customers | 200 | segment | Customers (retail, business, premium) |
| accounts | ~350 | account_type | Checking, savings, investment accounts |
| transactions | 1000 | months(transaction_date) | 6 months of card, transfer, deposit operations |
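As an illustration, the transactions table could be declared as follows (a sketch: the column types are assumptions, and only the columns used later in the federation query are shown):

```python
# Iceberg DDL through the Spark session from Step 1. The months()
# transform produces the monthly partitioning listed in the table above.
spark.sql("""
    CREATE TABLE IF NOT EXISTS iceberg.analytics.transactions (
        transaction_id   BIGINT,
        account_id       BIGINT,
        amount           DECIMAL(12, 2),   -- type is an assumption
        transaction_date DATE
    )
    USING iceberg
    PARTITIONED BY (months(transaction_date))
""")
```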
Step 3 -- Verify Data¶
The notebook prints row counts for all four tables and displays sample data for each.
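Such a check can be as simple as this sketch (the table names match the expected output shown later in this guide):

```python
# Count rows in each demo table through the Spark Connect session.
for name in ["advisors", "customers", "accounts", "transactions"]:
    n = spark.table(f"iceberg.analytics.{name}").count()
    print(f"{name:<13}: {n} rows")
```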
Step 4 -- Iceberg Time-Travel¶
Demonstrates Iceberg's snapshot history. Each INSERT creates a new snapshot, and you can query any historical version of the data:
SELECT snapshot_id, committed_at, operation
FROM iceberg.analytics.transactions.snapshots
ORDER BY committed_at
Spark Connect limitation
Iceberg metadata tables (like .snapshots) must be queried with .show()
instead of .collect() in Spark Connect mode, due to a SerializedLambda
serialization issue.
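Putting both together, a time-travel round trip might look like this sketch (the snapshot ID is a placeholder you would copy from the snapshots listing):

```python
# List snapshots -- note .show() rather than .collect(), per the
# Spark Connect limitation above.
spark.sql("""
    SELECT snapshot_id, committed_at, operation
    FROM iceberg.analytics.transactions.snapshots
    ORDER BY committed_at
""").show(truncate=False)

# Query the table as it existed at an earlier snapshot
# (Iceberg's VERSION AS OF syntax, Spark 3.3+).
snapshot_id = 1234567890  # placeholder: use a real ID printed above
spark.sql(
    f"SELECT COUNT(*) AS n FROM iceberg.analytics.transactions "
    f"VERSION AS OF {snapshot_id}"
).show()
```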
Step 5 -- Trino Federation¶
The flagship query: a federated join across two different data sources in a single SQL statement:
- Iceberg tables (customers, accounts, transactions) stored on object storage
- PostGIS table (branches with geospatial coordinates) stored in PostgreSQL
SELECT
b.name AS branch, b.city,
COUNT(DISTINCT c.customer_id) AS customer_count,
COUNT(t.transaction_id) AS transaction_count,
ROUND(SUM(ABS(t.amount)), 2) AS total_volume
FROM postgresql.geospatial.branches b
JOIN iceberg.analytics.customers c ON c.branch_id = b.id
JOIN iceberg.analytics.accounts a ON a.customer_id = c.customer_id
JOIN iceberg.analytics.transactions t ON t.account_id = a.account_id
WHERE a.status = 'active'
GROUP BY b.name, b.city
ORDER BY total_volume DESC
This query is executed by Trino, which federates across the Iceberg catalog (via Polaris REST) and the PostgreSQL catalog transparently.
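To issue the same kind of federated SQL yourself from Python, a sketch with the trino client package could look like this (host, port, and user are assumptions about this deployment):

```python
import trino

# Connect to the Trino coordinator (connection details are assumptions).
conn = trino.dbapi.connect(host="trino", port=8080, user="alice")
cur = conn.cursor()

# Any statement spanning both catalogs works, e.g. a slimmed-down
# version of the branch-volume query above.
cur.execute("""
    SELECT b.city, COUNT(DISTINCT c.customer_id) AS customer_count
    FROM postgresql.geospatial.branches b
    JOIN iceberg.analytics.customers c ON c.branch_id = b.id
    GROUP BY b.city
""")
for row in cur.fetchall():
    print(row)
```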
Step 6 -- SQL Queries for Superset¶
The notebook prints ready-to-use SQL queries that you can paste into Superset SQL Lab, including KPIs, monthly volume breakdowns, spending categories, and the federated branch revenue query.
Run It Cell by Cell¶
Select the first cell and press Shift+Enter to run it. Continue through each cell in order. The entire notebook takes about 2-3 minutes to complete.
Run cells in order
Each cell depends on the previous ones. Do not skip cells or run them out of order, as later cells reference tables created by earlier cells.
Expected output at the verification step:
========================================
AKKO Banking -- Iceberg Tables
========================================
advisors : 15 rows
customers : 200 rows
accounts : ~350 rows
transactions : 1000 rows
========================================
After Running the Notebook¶
Once the notebook completes successfully, the data is available everywhere in the platform:
- Trino -- Query tables at iceberg.analytics.* via the Trino UI or any SQL client
- Superset -- The auto-provisioned dashboard now displays live data. Navigate to Dashboards > AKKO Banking Overview and refresh
- Other notebooks -- All notebooks that query iceberg.analytics.* will see the data
- OpenMetadata -- If the governance profile is running, the catalog can ingest these tables for metadata management
Architecture Recap¶
Notebook (Spark Connect)         PostgreSQL
    |                        +------------------+
    | gRPC :15002            | geospatial       |
    v                        |  .branches (5)   |
Spark Connect --> Polaris    +---------+--------+
    |             |                    |
    v             v                    |
object storage (S3)                    |
+------------------+                   |
| analytics        |                   |
|  .advisors       |                   |
|  .customers      |                   |
|  .accounts       |                   |
|  .transactions   |                   |
+---------+--------+                   |
          |                            |
+---------v----------------------------v---+
|            TRINO (federation)            |
+--------------------+---------------------+
                     |
              +------v------+
              |  SUPERSET   |
              |  Dashboard  |
              +-------------+
Explore Other Notebooks¶
AKKO ships with 14 notebooks organized by category. After completing the banking demo, try these:
Getting Started¶
| # | Notebook | Description |
|---|---|---|
| 01 | akko-banking-demo | Banking data model, Spark Connect, Trino federation (this guide) |
| 03 | spark-iceberg-demo | Deep dive into Iceberg features (schema evolution, partitioning, time-travel) |
AI¶
| # | Notebook | Description |
|---|---|---|
| 02 | rag-pipeline-demo | RAG pipeline with Ollama, pgvector, and LangChain |
| 13 | akko-jupyter-ai-demo | Jupyter AI integration with local LLMs via Ollama |
Analytics¶
| # | Notebook | Description |
|---|---|---|
| 04 | akko-duckdb-analytics | In-process analytics with DuckDB on Iceberg data |
| 07 | akko-r-analytics | R kernel: tidyverse analytics on banking data |
| 08 | akko-julia-dataframes | Julia kernel: DataFrames.jl on banking data |
| 11 | akko-polars-analytics | Polars DataFrame library for fast analytics |
Engineering¶
| # | Notebook | Description |
|---|---|---|
| 05 | akko-dbt-transforms | dbt transformations on Iceberg tables |
| 06 | akko-data-quality | Data quality checks and validation |
| 10 | akko-polaris-catalog-admin | Polaris catalog administration via REST API |
Visualization¶
| # | Notebook | Description |
|---|---|---|
| 09 | akko-geospatial-analysis | PostGIS geospatial analysis with branch locations |
| 12 | akko-altair-visualization | Interactive Altair/Vega charts |
Reports¶
| File | Description |
|---|---|
| 04-akko-banking-report.qmd | Quarto report rendered to HTML, served at https://docs.akko.local/reports/ |
Notebook numbering
Notebooks are numbered in suggested reading order. Start with 01 (this guide), then try 02 (RAG) or 03 (Iceberg deep dive) depending on your interest.
Next Steps¶
- Explore the Superset dashboard with live data from the banking demo
- Try the RAG pipeline notebook to build a retrieval-augmented generation system with Ollama and pgvector
- Learn about Trino federation and how to add your own data sources