Notebook Catalog

AKKO ships with 13 demo notebooks (plus a Quarto report and a SQL query pack) organized into 6 categories. Together, they demonstrate every layer of the platform: data ingestion, transformation, analytics, visualization, AI, and reporting.

Getting Started

Run 01 - Banking Demo first. It creates the Iceberg tables (customers, accounts, transactions, advisors) and the PostGIS branches table that most other notebooks depend on.

Seed / Core

These notebooks create the foundational data and demonstrate the core lakehouse architecture.

| #  | Notebook           | Components                                       | Kernel | Prerequisite |
|----|--------------------|--------------------------------------------------|--------|--------------|
| 01 | akko-banking-demo  | Spark Connect, Polaris, MinIO, Trino, PostgreSQL | Python | AKKO started |
| 03 | spark-iceberg-demo | Spark Connect, Iceberg, MinIO                    | Python | AKKO started |

01 - Banking Demo: Creates 4 Iceberg tables (advisors, customers, accounts, transactions) via Spark Connect with realistic synthetic data for a retail bank with 5 branches and 200 customers. Demonstrates Iceberg time-travel, Trino federation (Iceberg + PostgreSQL), and provides SQL queries ready for Superset dashboards.
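The synthetic-data half of the notebook can be sketched in plain Python. This is a minimal illustration of generating 200 customers across 5 branches; the field names (`customer_id`, `branch_id`, `segment`) and segment values are assumptions for illustration, not the notebook's actual schema:

```python
import random
from dataclasses import dataclass

random.seed(42)  # reproducible demo data

@dataclass
class Customer:
    customer_id: int
    branch_id: int
    segment: str

SEGMENTS = ["retail", "premium"]  # hypothetical segment labels

# 200 customers spread across 5 branches, mirroring the notebook's scale
customers = [
    Customer(
        customer_id=i,
        branch_id=random.randrange(1, 6),  # branches 1..5
        segment=random.choice(SEGMENTS),
    )
    for i in range(1, 201)
]

print(len(customers))
print(sorted({c.branch_id for c in customers}))
```

In the notebook itself, rows like these are written to Iceberg through Spark Connect rather than kept in memory.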

03 - Spark Iceberg Demo: A hands-on tutorial of the Spark + Iceberg + MinIO lakehouse architecture. Covers table CRUD (CREATE, INSERT, UPDATE, DELETE, MERGE INTO), schema evolution (ALTER TABLE ADD COLUMN), time-travel via snapshots, metadata table inspection, and Pandas/Plotly interop.
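Iceberg's time-travel can be pictured as an append-only list of immutable snapshots. The toy class below is a conceptual model of that idea, not the Iceberg or Spark API: every commit produces a new snapshot id, and reads can target any historical snapshot:

```python
# Toy model of Iceberg-style snapshot time-travel (conceptual, not the real API):
# each write commits a new immutable snapshot; reads can target any snapshot id.
class SnapshotTable:
    def __init__(self):
        self.snapshots = []  # list of (snapshot_id, rows) pairs

    def commit(self, rows):
        snapshot_id = len(self.snapshots) + 1
        self.snapshots.append((snapshot_id, list(rows)))
        return snapshot_id

    def read(self, snapshot_id=None):
        if not self.snapshots:
            return []
        if snapshot_id is None:          # latest snapshot by default
            return self.snapshots[-1][1]
        return dict(self.snapshots)[snapshot_id]

t = SnapshotTable()
s1 = t.commit([("alice", 100)])
s2 = t.commit([("alice", 100), ("bob", 50)])
print(t.read(s1))   # the table as of the first snapshot
print(t.read())     # the current table
```

The notebook does the real thing with Spark SQL over Iceberg's `snapshots` metadata table.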

Engineering

Notebooks focused on data transformation, quality validation, and catalog administration.

| #  | Notebook             | Components                          | Kernel | Prerequisite    |
|----|----------------------|-------------------------------------|--------|-----------------|
| 05 | dbt-transforms       | dbt-trino, Trino, Iceberg           | Python | 01 Banking Demo |
| 06 | data-quality         | Great Expectations, Trino, Plotly   | Python | 01 Banking Demo |
| 10 | polaris-catalog-admin | Polaris REST API, OAuth2           | Python | AKKO started    |

05 - dbt Transforms: Builds a complete dbt project programmatically, defining staging models (stg_customers, stg_transactions) and mart models (dim_customer_360, fct_monthly_revenue). Runs the full dbt lifecycle: dbt seed, dbt run, dbt test, and dbt docs generate.
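"Builds a dbt project programmatically" means writing the project files from Python. Here is a minimal sketch of that pattern; the file contents (project name, profile, model SQL) are illustrative placeholders, not the notebook's actual project:

```python
# Scaffolding a dbt project from Python, as the notebook does programmatically.
# Contents are placeholders: the real project defines stg_* and mart models.
import tempfile
from pathlib import Path

project = Path(tempfile.mkdtemp()) / "akko_dbt"
(project / "models" / "staging").mkdir(parents=True)

(project / "dbt_project.yml").write_text(
    "name: akko_dbt\nversion: '1.0'\nprofile: trino\n"
)
(project / "models" / "staging" / "stg_customers.sql").write_text(
    "select customer_id, branch_id "
    "from {{ source('banking', 'customers') }}\n"
)

print(sorted(p.name for p in project.rglob("*")))
```

With the files in place, the notebook shells out to `dbt seed`, `dbt run`, `dbt test`, and `dbt docs generate` against the Trino profile.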

06 - Data Quality: Validates the banking dataset using Great Expectations. Defines and runs expectations across customers, accounts, and transactions (not-null, unique, accepted values, ranges). Includes an intentional failure demo with injected bad data.
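The three expectation types the notebook relies on are simple to state. These are stdlib stand-ins to show the semantics, not the Great Expectations API (which wraps the same checks in suites, results, and data docs); the `accounts` rows are invented:

```python
# Stand-ins for three expectation types (not the Great Expectations API):
# not-null, unique, and value-in-range, evaluated over plain Python rows.
def expect_not_null(rows, col):
    return all(r[col] is not None for r in rows)

def expect_unique(rows, col):
    vals = [r[col] for r in rows]
    return len(vals) == len(set(vals))

def expect_between(rows, col, lo, hi):
    return all(lo <= r[col] <= hi for r in rows)

accounts = [
    {"account_id": 1, "balance": 1200.0},
    {"account_id": 2, "balance": 35.5},
]

print(expect_not_null(accounts, "account_id"))
print(expect_unique(accounts, "account_id"))
print(expect_between(accounts, "balance", 0, 1e9))

# the intentional-failure demo: a duplicated id breaks the uniqueness check
bad = accounts + [{"account_id": 1, "balance": -5.0}]
print(expect_unique(bad, "account_id"))
```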

10 - Polaris Catalog Admin: Explores the Polaris REST API directly from Python. Covers OAuth2 authentication (client_credentials flow), catalog and namespace listing, table metadata inspection (schema, partitions, storage locations), RBAC exploration (principals, principal roles, catalog roles), and health/metrics endpoints.
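The client_credentials flow boils down to one form-encoded POST. The sketch below builds (but does not send) that request with the stdlib; the host, token path, credentials, and scope are placeholders for whatever your Polaris deployment exposes:

```python
# Building (not sending) an OAuth2 client_credentials token request with the
# stdlib. URL, credentials, and scope below are placeholders, not AKKO's real ones.
from urllib import parse, request

TOKEN_URL = "http://polaris.local/api/catalog/v1/oauth/tokens"  # assumed path

body = parse.urlencode({
    "grant_type": "client_credentials",
    "client_id": "demo-client",
    "client_secret": "demo-secret",
    "scope": "PRINCIPAL_ROLE:ALL",
}).encode()

req = request.Request(
    TOKEN_URL,
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would return JSON containing an access_token,
# which the notebook then sends as a Bearer header on every catalog call.
```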

Analytics

In-process analytics using alternative engines and languages beyond Spark and Trino.

| #  | Notebook          | Components                                   | Kernel | Prerequisite          |
|----|-------------------|----------------------------------------------|--------|-----------------------|
| 04 | duckdb-analytics  | DuckDB, Trino, Arrow, Polars, PostgreSQL     | Python | 01 Banking Demo       |
| 07 | r-analytics       | tidyverse, ggplot2, sf, PostGIS, RPostgres   | R      | AKKO started          |
| 08 | julia-dataframes  | DataFrames.jl, CSV.jl, Statistics            | Julia  | None (self-contained) |
| 11 | polars-analytics  | Polars, Trino, DuckDB, Arrow                 | Python | 01 Banking Demo       |

04 - DuckDB Analytics: Demonstrates DuckDB as AKKO's lightweight in-process analytics engine. Loads Iceberg data via Trino into Arrow tables, then queries with DuckDB SQL. Covers cross-source joins, zero-copy DuckDB/Polars Arrow interop, and performance benchmarks vs Pandas.
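The "in-process" pattern is the point: load data into the process, then query it with SQL, no server round-trips. To keep the sketch dependency-free, stdlib `sqlite3` stands in for DuckDB here (DuckDB's API is different, and it adds columnar execution and Arrow interop on top):

```python
# sqlite3 (stdlib) standing in for DuckDB to illustrate in-process SQL:
# load Python data into the engine, then aggregate with SQL, all in one process.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table accounts (customer_id int, balance real)")
con.executemany(
    "insert into accounts values (?, ?)",
    [(1, 100.0), (1, 250.0), (2, 80.0)],
)

rows = con.execute(
    "select customer_id, sum(balance) "
    "from accounts group by customer_id order by 1"
).fetchall()
print(rows)   # [(1, 350.0), (2, 80.0)]
```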

07 - R Analytics: Proves that R is a first-class citizen in AKKO. Connects to PostgreSQL via DBI + RPostgres, wrangles data with tidyverse/dplyr, creates ggplot2 charts, and performs PostGIS geospatial analysis with the sf package.

08 - Julia DataFrames: A self-contained introduction to Julia's DataFrames.jl ecosystem. Covers split-apply-combine with groupby + combine, joins, comprehensions, broadcasting, and risk scoring. No external service dependencies.
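Split-apply-combine is language-agnostic; for readers without a Julia kernel handy, the same `groupby` + `combine` shape looks like this in stdlib Python (column names are illustrative):

```python
# Split-apply-combine with the stdlib, mirroring DataFrames.jl's
# groupby + combine pattern: split on a key, aggregate each group.
from itertools import groupby
from operator import itemgetter

txns = [
    {"branch": "north", "amount": 120.0},
    {"branch": "south", "amount": 75.0},
    {"branch": "north", "amount": 30.0},
]

# split: sort then group by key; apply/combine: sum each group's amounts
txns.sort(key=itemgetter("branch"))
totals = {
    branch: sum(t["amount"] for t in group)
    for branch, group in groupby(txns, key=itemgetter("branch"))
}
print(totals)   # {'north': 150.0, 'south': 75.0}
```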

11 - Polars Analytics: Showcases Polars' Rust-powered DataFrame engine. Demonstrates LazyFrame query plan optimization, window functions with over(), performance benchmarks vs Pandas, and zero-copy Arrow interop with DuckDB.
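The LazyFrame idea is that calling `filter`/`select` records a plan instead of executing it, and nothing runs until `collect()`, which is what lets the engine optimize the whole plan at once. The toy class below models only that deferral, loosely and without any of Polars' actual optimization or API:

```python
# A toy "lazy frame" (conceptual, not the Polars API): operations are recorded
# into a plan and only executed when collect() is called.
class LazyRows:
    def __init__(self, rows):
        self.rows = rows
        self.plan = []               # deferred operations

    def filter(self, pred):
        self.plan.append(("filter", pred))
        return self                  # chainable, still nothing executed

    def select(self, fn):
        self.plan.append(("select", fn))
        return self

    def collect(self):               # execution happens here, in plan order
        out = list(self.rows)
        for op, fn in self.plan:
            if op == "filter":
                out = [r for r in out if fn(r)]
            else:
                out = [fn(r) for r in out]
        return out

lazy = LazyRows([{"amt": 5}, {"amt": 50}]).filter(lambda r: r["amt"] > 10)
print(lazy.plan)        # the recorded plan; nothing has executed yet
print(lazy.collect())   # [{'amt': 50}]
```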

Visualization

Specialized notebooks for geospatial analysis and declarative charting.

| #  | Notebook             | Components                                         | Kernel | Prerequisite    |
|----|----------------------|----------------------------------------------------|--------|-----------------|
| 09 | geospatial-analysis  | PostGIS, GeoPandas, Folium, scikit-learn, Trino    | Python | 01 Banking Demo |
| 12 | altair-visualization | Altair, Vega-Lite, Trino                           | Python | 01 Banking Demo |

09 - Geospatial Analysis: Spatial analytics combining PostGIS queries (ST_Distance, ST_Buffer, ST_DWithin), GeoPandas, interactive Folium maps with revenue overlays, Trino federation, KMeans geographic clustering, and GeoJSON export.
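Distance functions like `ST_Distance` reduce, for geographic coordinates, to great-circle math. Here is the haversine formula in stdlib Python as a feel for what the database computes; the Paris/Lyon coordinates are just a worked example, not data from the notebook:

```python
# Great-circle distance via the haversine formula, the kind of computation
# behind ST_Distance / ST_DWithin on geographic coordinates.
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))   # 6371 km = mean Earth radius

# Paris -> Lyon, roughly 390 km as the crow flies
d = haversine_km(48.8566, 2.3522, 45.7640, 4.8357)
print(round(d), "km")
```

An `ST_DWithin(a, b, r)` predicate is then just `haversine_km(...) <= r` with the units handled by the database.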

12 - Altair Visualization: Explores Altair's declarative charting approach built on Vega-Lite. Covers bar charts, line trends, scatter plots, interactive linked selections, faceted small multiples, and composed multi-chart dashboards.

AI

Local LLM inference and retrieval-augmented generation with full data sovereignty.

| #  | Notebook         | Components                              | Kernel | Prerequisite                   |
|----|------------------|-----------------------------------------|--------|--------------------------------|
| 02 | rag-pipeline-demo | Ollama, pgvector, LangChain, PostgreSQL | Python | AKKO started                   |
| 13 | jupyter-ai-demo  | jupyter-ai, Ollama, LangChain, Trino    | Python | AKKO started + 01 Banking Demo |

02 - RAG Pipeline Demo: Implements a full Retrieval-Augmented Generation pipeline using 100% sovereign infrastructure. Generates embeddings with Ollama's nomic-embed-text model, stores them in PostgreSQL with pgvector and an HNSW index, and answers questions through LangChain.
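The retrieval half of RAG fits in a few lines: embed the query, rank stored documents by cosine similarity, and feed the best hits to the LLM. The sketch below does the ranking step in stdlib Python with toy 3-dimensional vectors; in the notebook, the vectors come from nomic-embed-text and the ranking is done by pgvector's HNSW index:

```python
# Cosine-similarity retrieval in miniature. Toy vectors stand in for real
# embeddings; pgvector performs this ranking server-side in the notebook.
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

docs = {                      # doc name -> toy embedding
    "fees": [0.9, 0.1, 0.0],
    "branches": [0.1, 0.8, 0.2],
}
query = [0.85, 0.15, 0.0]     # toy embedding of the user's question

best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)   # 'fees'
```

The generation half then stuffs the retrieved text into the prompt sent to the local Ollama model.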

13 - Jupyter AI Demo: Demonstrates AI-assisted data analysis with jupyter-ai and Ollama. Covers code generation from natural language, data-driven analysis, and SQL generation from schema descriptions. All inference runs locally.

Reporting

Quarto reports and pre-built SQL queries for Superset SQL Lab.

| #  | Notebook                    | Components            | Kernel | Prerequisite    |
|----|-----------------------------|-----------------------|--------|-----------------|
| -- | 04-akko-banking-report.qmd  | Quarto, Trino, Plotly | Python | 01 Banking Demo |
| -- | akko-banking-queries.sql    | Trino SQL             | SQL    | 01 Banking Demo |

04-akko-banking-report.qmd: A Quarto document that renders to a standalone HTML report. Queries the banking dataset via Trino and produces an executive summary with KPIs, charts, and segment analysis. Rendered output is served at https://docs.akko.local/reports/akko-banking-report.html.

akko-banking-queries.sql: Ready-to-use SQL queries for Superset SQL Lab. Includes premium customer filtering, balance aggregation by account type, monthly transaction volumes, and federated branch-level revenue queries.

Dependency Graph

01 Banking Demo (seed)
 |
 +--- 04 DuckDB Analytics
 +--- 05 dbt Transforms
 +--- 06 Data Quality
 +--- 09 Geospatial Analysis
 +--- 11 Polars Analytics
 +--- 12 Altair Visualization
 +--- 13 Jupyter AI Demo
 +--- 04-akko-banking-report.qmd
 +--- akko-banking-queries.sql

03 Spark Iceberg Demo .......... standalone
07 R Analytics ................. standalone (uses PostGIS only)
08 Julia DataFrames ............ standalone (fully self-contained)
10 Polaris Catalog Admin ....... standalone (uses Polaris API only)
02 RAG Pipeline Demo ........... standalone (uses Ollama + pgvector)

Kernels Available

All notebooks run inside akko-notebook containers spawned by JupyterHub. Three kernels are available: Python (default), R (IRkernel), and Julia (IJulia). code-server (VS Code in the browser) is also available for file editing and terminal access.