Notebook Catalog¶
AKKO ships with 13 demo notebooks, plus a Quarto report and a SQL query file, organized into 6 categories. Together, they demonstrate every layer of the platform: data ingestion, transformation, analytics, visualization, AI, and reporting.
Getting Started¶
Run 01 - Banking Demo first. It creates the Iceberg tables (customers, accounts, transactions, advisors) and the PostGIS branches table that most other notebooks depend on.
Seed / Core¶
These notebooks create the foundational data and demonstrate the core lakehouse architecture.
| # | Notebook | Components | Kernel | Prerequisite |
|---|---|---|---|---|
| 01 | akko-banking-demo | Spark Connect, Polaris, MinIO, Trino, PostgreSQL | Python | AKKO started |
| 03 | spark-iceberg-demo | Spark Connect, Iceberg, MinIO | Python | AKKO started |
01 - Banking Demo: Creates 4 Iceberg tables (advisors, customers, accounts, transactions) via Spark Connect with realistic synthetic data for a retail bank with 5 branches and 200 customers. Demonstrates Iceberg time-travel, Trino federation (Iceberg + PostgreSQL), and provides SQL queries ready for Superset dashboards.
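The synthetic-data step can be sketched in plain Python. This is a minimal stand-in, not the notebook's actual generator; the column names, branch IDs, and value ranges here are illustrative assumptions.

```python
# Sketch: synthetic banking rows in the spirit of the demo (names and
# column choices are illustrative assumptions, not the notebook's schema).
import random

random.seed(42)

BRANCHES = [f"BR-{i:02d}" for i in range(1, 6)]  # 5 branches, as in the demo
SEGMENTS = ["retail", "premium", "private"]

def make_customers(n=200):
    """Generate n synthetic customer dicts, one per customer."""
    return [
        {
            "customer_id": i,
            "branch_id": random.choice(BRANCHES),
            "segment": random.choice(SEGMENTS),
            "balance": round(random.uniform(100.0, 50_000.0), 2),
        }
        for i in range(1, n + 1)
    ]

customers = make_customers()
```

In the notebook, rows like these are written to Iceberg through a Spark Connect session rather than kept in memory.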
03 - Spark Iceberg Demo: A hands-on tutorial of the Spark + Iceberg + MinIO lakehouse architecture. Covers table CRUD (CREATE, INSERT, UPDATE, DELETE, MERGE INTO), schema evolution (ALTER TABLE ADD COLUMN), time-travel via snapshots, metadata table inspection, and Pandas/Plotly interop.
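The statements the tutorial walks through have roughly this shape. The SQL below is held as Python strings for illustration only; catalog, schema, and snapshot values are placeholder assumptions, and a live run would submit them through a Spark Connect session.

```python
# Illustrative Iceberg SQL of the kind the notebook runs via Spark Connect.
# Catalog/table names and the snapshot id are placeholder assumptions.
MERGE_SQL = """
MERGE INTO lakehouse.demo.accounts t
USING updates u
  ON t.account_id = u.account_id
WHEN MATCHED THEN UPDATE SET t.balance = u.balance
WHEN NOT MATCHED THEN INSERT *
"""

TIME_TRAVEL_SQL = """
SELECT * FROM lakehouse.demo.accounts
VERSION AS OF 1234567890123456789
"""

# With a live session this would be roughly:
#   spark = SparkSession.builder.remote("sc://spark-connect:15002").getOrCreate()
#   spark.sql(MERGE_SQL)
```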
Engineering¶
Notebooks focused on data transformation, quality validation, and catalog administration.
| # | Notebook | Components | Kernel | Prerequisite |
|---|---|---|---|---|
| 05 | dbt-transforms | dbt-trino, Trino, Iceberg | Python | 01 Banking Demo |
| 06 | data-quality | Great Expectations, Trino, Plotly | Python | 01 Banking Demo |
| 10 | polaris-catalog-admin | Polaris REST API, OAuth2 | Python | AKKO started |
05 - dbt Transforms: Builds a complete dbt project programmatically, defining staging models (stg_customers, stg_transactions) and mart models (dim_customer_360, fct_monthly_revenue). Runs the full dbt lifecycle: dbt seed, dbt run, dbt test, and dbt docs generate.
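A staging model of the kind the notebook writes out can be sketched as two strings: the model SQL and its schema tests. File layout, source names, and columns are illustrative assumptions, not the project's exact contents.

```python
# Sketch of a dbt staging model and its tests, of the kind the notebook
# generates programmatically. Names and columns are assumptions.
STG_CUSTOMERS_SQL = """
-- models/staging/stg_customers.sql
select
    customer_id,
    lower(trim(email)) as email,
    segment
from {{ source('banking', 'customers') }}
"""

SCHEMA_YML = """
version: 2
models:
  - name: stg_customers
    columns:
      - name: customer_id
        tests: [not_null, unique]
"""
```

`dbt run` would compile the Jinja `source()` reference into a fully qualified Trino table name, and `dbt test` would execute the `not_null` and `unique` checks.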
06 - Data Quality: Validates the banking dataset using Great Expectations. Defines and runs expectations across customers, accounts, and transactions (not-null, unique, accepted values, ranges). Includes an intentional failure demo with injected bad data.
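The four expectation styles can be shown in plain Python, without the Great Expectations API. This is a conceptual sketch only; the real notebook uses GE expectation suites, and the rows and column names here are made-up assumptions (including one injected bad row, mirroring the failure demo).

```python
# Plain-Python sketch of the expectation styles the notebook applies with
# Great Expectations (not the GE API; rows and columns are assumptions).
rows = [
    {"customer_id": 1, "segment": "retail",  "balance": 500.0},
    {"customer_id": 2, "segment": "premium", "balance": 12000.0},
    {"customer_id": 2, "segment": "gold",    "balance": -10.0},  # injected bad row
]

def expect_not_null(rows, col):
    return all(r[col] is not None for r in rows)

def expect_unique(rows, col):
    vals = [r[col] for r in rows]
    return len(vals) == len(set(vals))

def expect_in_set(rows, col, allowed):
    return all(r[col] in allowed for r in rows)

def expect_between(rows, col, lo, hi):
    return all(lo <= r[col] <= hi for r in rows)

results = {
    "customer_id not null": expect_not_null(rows, "customer_id"),
    "customer_id unique": expect_unique(rows, "customer_id"),  # fails: duplicate 2
    "segment in set": expect_in_set(rows, "segment", {"retail", "premium", "private"}),
    "balance in range": expect_between(rows, "balance", 0, 1_000_000),
}
```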
10 - Polaris Catalog Admin: Explores the Polaris REST API directly from Python. Covers OAuth2 authentication (client_credentials flow), catalog and namespace listing, table metadata inspection (schema, partitions, storage locations), RBAC exploration (principals, principal roles, catalog roles), and health/metrics endpoints.
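The first step, obtaining a token, can be sketched as follows. The endpoint path and scope are assumptions based on the Iceberg REST catalog convention, and no network call is made here; check the notebook for the exact values your deployment uses.

```python
# Sketch of the OAuth2 client_credentials token request the notebook sends
# to Polaris. Endpoint path, host, and scope are assumptions; nothing is
# sent over the network in this snippet.
from urllib.parse import urlencode

POLARIS_URL = "http://polaris:8181"  # assumed in-cluster address

def token_request(client_id, client_secret):
    """Return (url, urlencoded body) for a client_credentials grant."""
    body = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "PRINCIPAL_ROLE:ALL",
    }
    return f"{POLARIS_URL}/api/catalog/v1/oauth/tokens", urlencode(body)

url, body = token_request("demo-client", "demo-secret")
# A live call would then be roughly:
#   requests.post(url, data=body,
#                 headers={"Content-Type": "application/x-www-form-urlencoded"})
```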
Analytics¶
In-process analytics using alternative engines and languages beyond Spark and Trino.
| # | Notebook | Components | Kernel | Prerequisite |
|---|---|---|---|---|
| 04 | duckdb-analytics | DuckDB, Trino, Arrow, Polars, PostgreSQL | Python | 01 Banking Demo |
| 07 | r-analytics | tidyverse, ggplot2, sf, PostGIS, RPostgres | R | AKKO started |
| 08 | julia-dataframes | DataFrames.jl, CSV.jl, Statistics | Julia | None (self-contained) |
| 11 | polars-analytics | Polars, Trino, DuckDB, Arrow | Python | 01 Banking Demo |
04 - DuckDB Analytics: Demonstrates DuckDB as AKKO's lightweight in-process analytics engine. Loads Iceberg data via Trino into Arrow tables, then queries with DuckDB SQL. Covers cross-source joins, zero-copy DuckDB/Polars Arrow interop, and performance benchmarks vs Pandas.
07 - R Analytics: Proves that R is a first-class citizen in AKKO. Connects to PostgreSQL via DBI + RPostgres, wrangles data with tidyverse/dplyr, creates ggplot2 charts, and performs PostGIS geospatial analysis with the sf package.
08 - Julia DataFrames: A self-contained introduction to Julia's DataFrames.jl ecosystem. Covers split-apply-combine with groupby + combine, joins, comprehensions, broadcasting, and risk scoring. No external service dependencies.
11 - Polars Analytics: Showcases Polars' Rust-powered DataFrame engine. Demonstrates LazyFrame query plan optimization, window functions with over(), performance benchmarks vs Pandas, and zero-copy Arrow interop with DuckDB.
Visualization¶
Specialized notebooks for geospatial analysis and declarative charting.
| # | Notebook | Components | Kernel | Prerequisite |
|---|---|---|---|---|
| 09 | geospatial-analysis | PostGIS, GeoPandas, Folium, scikit-learn, Trino | Python | 01 Banking Demo |
| 12 | altair-visualization | Altair, Vega-Lite, Trino | Python | 01 Banking Demo |
09 - Geospatial Analysis: Spatial analytics combining PostGIS queries (ST_Distance, ST_Buffer, ST_DWithin), GeoPandas, interactive Folium maps with revenue overlays, Trino federation, KMeans geographic clustering, and GeoJSON export.
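The distance logic that PostGIS supplies via ST_Distance / ST_DWithin can be sketched in stdlib Python: great-circle distance plus a nearest-branch lookup. The coordinates below are made-up assumptions, and the notebook itself pushes this work down to PostGIS rather than computing it in Python.

```python
# Stdlib sketch of the distance logic PostGIS provides via ST_Distance /
# ST_DWithin: haversine distance and a nearest-branch lookup.
# Branch coordinates are made-up assumptions.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

branches = {"BR-01": (48.8566, 2.3522), "BR-02": (45.7640, 4.8357)}

def nearest_branch(lat, lon):
    return min(branches, key=lambda b: haversine_km(lat, lon, *branches[b]))
```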
12 - Altair Visualization: Explores Altair's declarative charting approach built on Vega-Lite. Covers bar charts, line trends, scatter plots, interactive linked selections, faceted small multiples, and composed multi-chart dashboards.
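The declarative idea is that a chart is just a Vega-Lite JSON spec. The stdlib sketch below builds such a spec by hand (field names and values are assumptions); in the notebook, `alt.Chart(...).mark_bar().encode(...)` emits the equivalent document.

```python
# Hand-built Vega-Lite spec illustrating what Altair emits under the hood.
# Data values and field names are illustrative assumptions.
import json

spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": [{"branch": "A", "revenue": 120},
                        {"branch": "B", "revenue": 90}]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "branch", "type": "nominal"},
        "y": {"field": "revenue", "type": "quantitative"},
    },
}
serialized = json.dumps(spec)  # what the renderer actually receives
```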
AI¶
Local LLM inference and retrieval-augmented generation with full data sovereignty.
| # | Notebook | Components | Kernel | Prerequisite |
|---|---|---|---|---|
| 02 | rag-pipeline-demo | Ollama, pgvector, LangChain, PostgreSQL | Python | AKKO started |
| 13 | jupyter-ai-demo | jupyter-ai, Ollama, LangChain, Trino | Python | AKKO started + 01 Banking Demo |
02 - RAG Pipeline Demo: Implements a full Retrieval-Augmented Generation pipeline using 100% sovereign infrastructure. Generates embeddings with Ollama's nomic-embed-text model, stores them in PostgreSQL via pgvector with an HNSW index, and answers questions through a LangChain chain.
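The retrieval step reduces to cosine similarity over stored embeddings, which pgvector's HNSW index accelerates at scale. The stdlib sketch below uses tiny made-up vectors as stand-ins for nomic-embed-text output; document names and dimensions are illustrative assumptions.

```python
# Stdlib sketch of the retrieval step in a RAG pipeline: rank documents by
# cosine similarity to the query embedding. Vectors are tiny made-up
# stand-ins for real nomic-embed-text embeddings.
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

docs = {
    "fees": [0.9, 0.1, 0.0],
    "loans": [0.1, 0.9, 0.1],
    "cards": [0.0, 0.2, 0.9],
}

def top_k(query_vec, k=1):
    return sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)[:k]
```

The retrieved documents are then packed into the LLM prompt; that generation step is what Ollama handles in the notebook.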
13 - Jupyter AI Demo: Demonstrates AI-assisted data analysis with jupyter-ai and Ollama. Covers code generation from natural language, data-driven analysis, and SQL generation from schema descriptions. All inference runs locally.
Reporting¶
Quarto reports and pre-built SQL queries for Superset SQL Lab.
| # | Notebook | Components | Kernel | Prerequisite |
|---|---|---|---|---|
| -- | 04-akko-banking-report.qmd | Quarto, Trino, Plotly | Python | 01 Banking Demo |
| -- | akko-banking-queries.sql | Trino SQL | SQL | 01 Banking Demo |
04-akko-banking-report.qmd: A Quarto document that renders to a standalone HTML report. Queries the banking dataset via Trino and produces an executive summary with KPIs, charts, and segment analysis. Rendered output is served at https://docs.akko.local/reports/akko-banking-report.html.
akko-banking-queries.sql: Ready-to-use SQL queries for Superset SQL Lab. Includes premium customer filtering, balance aggregation by account type, monthly transaction volumes, and federated branch-level revenue queries.
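A federated query from that file has roughly this shape, held here as a Python string for illustration. The catalog and schema names (`iceberg.banking`, `postgresql.public`) are assumptions based on the components listed above, not the file's exact identifiers.

```python
# Illustrative Trino SQL in the spirit of akko-banking-queries.sql: a
# federated join between an Iceberg table and a PostgreSQL table.
# Catalog and schema names are assumptions.
FEDERATED_REVENUE_SQL = """
SELECT b.branch_name, SUM(a.balance) AS total_balance
FROM iceberg.banking.accounts a
JOIN postgresql.public.branches b
  ON a.branch_id = b.branch_id
GROUP BY b.branch_name
ORDER BY total_balance DESC
"""
```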
Dependency Graph¶
01 Banking Demo (seed)
|
+--- 04 DuckDB Analytics
+--- 05 dbt Transforms
+--- 06 Data Quality
+--- 09 Geospatial Analysis
+--- 11 Polars Analytics
+--- 12 Altair Visualization
+--- 13 Jupyter AI Demo
+--- 04-akko-banking-report.qmd
+--- akko-banking-queries.sql
03 Spark Iceberg Demo .......... standalone
07 R Analytics ................. standalone (uses PostGIS only)
08 Julia DataFrames ............ standalone (fully self-contained)
10 Polaris Catalog Admin ....... standalone (uses Polaris API only)
02 RAG Pipeline Demo ........... standalone (uses Ollama + pgvector)
Kernels Available¶
All notebooks run inside akko-notebook containers spawned by JupyterHub. Three kernels are available: Python (default), R (IRkernel), and Julia (IJulia). code-server (VS Code in the browser) is also available for file editing and terminal access.