ADEN vs akko-rag: when to use which

The founder review of 2026-04-26 surfaced the question: "ADEN has an unstructured-data PDF part; how is that different from what is in RAG?"

Both services answer questions over documents, but they sit at different layers of the stack and serve different audiences. This page exists so a CISO/operator/prospect can read one paragraph and pick the right tool.

TL;DR

Capability	ADEN unstructured	akko-rag
Lifetime of the document	One conversation	Persistent collection
Who uploads	The chatting user, mid-question	An admin, ahead of time
Storage backend	In-process / ephemeral	Postgres with pgvector, plus originals in object storage (SeaweedFS)
Re-queryable	No — gone when the chat ends	Yes — every consumer can re-query
Multi-user shared knowledge	No	Yes (collection-level access via `allowed_roles`)
API surface	Embedded in ADEN's `/query` flow	Standalone FastAPI: `/collections`, `/collections/{slug}/documents`, `/collections/{slug}/query`
Cockpit page	Inside ADEN page	Dedicated RAG page
Best for	"Drop this PDF in, ask one question, move on"	"Build a corporate knowledge base, share with the team"

When to use ADEN unstructured

A user has a single PDF/CSV/DOCX in front of them, wants a quick answer, and won't need that document again.
The document is sensitive (a customer NDA, a draft proposal) and must not persist beyond the chat.
The conversation needs both structured SQL queries AND document context in the same flow — ADEN orchestrates the Query layer + the document together.
No admin overhead — the user just drops the file in the ADEN UI.

Example session:

User: uploads Q3-bank-loan-terms.pdf
User: "Compare these loan terms to our customers.contracts table. Are any covenants violated by current customers?"
ADEN: (extracts the terms from the PDF context, runs SQL on the query engine (Trino), joins the results, and produces a dashboard)

The PDF is gone when the user closes the conversation. The audit log keeps the question and the SQL, but not the PDF body.

When to use akko-rag

The team will ask many questions against the same set of documents over time (a product manual, an internal wiki, regulatory texts).
Multiple users / roles need to query the same collection — akko-rag enforces allowed_roles per collection.
The collection deserves its own lifecycle (create / upload many docs / re-index / delete via the cockpit RAG page, see PR #63).
A non-AKKO consumer (a notebook, an external service) needs to call the same retrieval API — akko-rag is a standalone FastAPI exposing /collections/{slug}/query with cosine similarity over pgvector.

Example session:

Admin: creates a collection internal-policies and uploads 30 HR policy PDFs.
Admin: assigns allowed_roles: [hr, akko-admin].
Employee Bob (role hr): asks ADEN or any RAG-aware tool "what's the parental leave policy?"; retrieval hits internal-policies and the answer cites the PDF.
Employee Eve (role analyst): asks the same question; retrieval finds 0 hits because the analyst role is not in allowed_roles. ADEN replies "no relevant document in the collections you can read".

The collection persists across sessions. Other consumers (a notebook, a Slack bot, an external service) can call the same API.
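The Bob and Eve outcome above comes down to a role-intersection check per collection. The following is a minimal sketch of that idea, not akko-rag's actual code: the `Collection` class and the `readable_collections` helper are hypothetical names, and the assumption is that a user may read a collection when they hold at least one of its `allowed_roles`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Collection:
    # Hypothetical in-memory stand-in for a row of the collections table.
    slug: str
    allowed_roles: frozenset = field(default_factory=frozenset)


def readable_collections(collections, user_roles):
    """Return the slugs of collections sharing at least one role with the user."""
    roles = set(user_roles)
    return [c.slug for c in collections if roles & c.allowed_roles]


collections = [Collection("internal-policies", frozenset({"hr", "akko-admin"}))]

print(readable_collections(collections, ["hr"]))       # Bob  -> ['internal-policies']
print(readable_collections(collections, ["analyst"]))  # Eve  -> []
```

Filtering before retrieval, rather than after, is what makes Eve's query return 0 hits instead of hits that are later hidden.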

Implementation under the hood

                        ┌──────────────────────────────────┐
                        │  ADEN (NL → SQL → Dashboard)     │
                        │  docker/aden/                    │
                        │                                  │
   uploads file ───────▶│  /query                          │
   (in-chat)            │     ├─ Trino SQL                 │
                        │     ├─ "unstructured" handler    │
                        │     │   └─ ephemeral text extract│
                        │     │      → context window      │
                        │     └─ akko-rag client (optional)│──┐
                        │        for persistent collections│  │
                        └──────────────────────────────────┘  │
                                                               │
                                                               ▼
                        ┌──────────────────────────────────┐
                        │  akko-rag (standalone)           │
                        │  docker/akko-rag/                │
                        │                                  │
   uploads doc ────────▶│  POST /collections/{slug}/documents
   (admin, ahead of     │       └─ pdf/csv/docx → chunks → embeddings
    time)               │           └─ pgvector + SeaweedFS │
                        │                                  │
   query ──────────────▶│  POST /collections/{slug}/query  │
                        │       └─ vector similarity → top-k│
                        └──────────────────────────────────┘

ADEN unstructured

Code lives in docker/aden/ next to the SQL pipeline.
The PDF/CSV/DOCX bytes get parsed in-process (pypdf, python-docx, pandas for CSV) and the extracted text becomes part of the LLM context window for THAT question.
No persistent table holds the document. The audit log keeps the question text + the SQL ADEN generated, not the document body.
Soft 50 MB upload limit (max_upload_bytes).
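The in-process path described above can be sketched roughly as follows. This is an illustration under assumptions, not ADEN's code: `build_prompt`, `EXTRACTORS`, and the prompt layout are hypothetical, 50 MB is taken to mean 50 × 1024 × 1024 bytes, and the limit is enforced by raising an error because the source does not say what "soft" means in practice. The CSV branch uses the standard library instead of pandas so the sketch runs without extra dependencies; the PDF and DOCX branches import pypdf and python-docx lazily.

```python
import csv
import io
from pathlib import Path

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # assumed value for max_upload_bytes


def extract_pdf(data: bytes) -> str:
    from pypdf import PdfReader  # lazy import: only needed for PDFs
    reader = PdfReader(io.BytesIO(data))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_docx(data: bytes) -> str:
    import docx  # python-docx, lazy import
    document = docx.Document(io.BytesIO(data))
    return "\n".join(p.text for p in document.paragraphs)


def extract_csv(data: bytes) -> str:
    # Stdlib stand-in for the pandas-based parsing mentioned above.
    rows = csv.reader(io.StringIO(data.decode("utf-8")))
    return "\n".join(" | ".join(row) for row in rows)


EXTRACTORS = {".pdf": extract_pdf, ".docx": extract_docx, ".csv": extract_csv}


def build_prompt(question: str, filename: str, data: bytes,
                 max_bytes: int = MAX_UPLOAD_BYTES) -> str:
    """Extract text in memory and place it next to the question; nothing is stored."""
    if len(data) > max_bytes:
        raise ValueError(f"{filename} exceeds the {max_bytes}-byte upload limit")
    suffix = Path(filename).suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"unsupported file type: {suffix or filename}")
    text = EXTRACTORS[suffix](data)
    return f"Document ({filename}):\n{text}\n\nQuestion: {question}"


prompt = build_prompt(
    "Which customer has the highest rate?",
    "loans.csv",
    b"customer,rate\nacme,4.2\nglobex,5.1\n",
)
print(prompt)
```

Because the extracted text only ever lives in the returned string, dropping the conversation state is enough to drop the document.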

akko-rag

Standalone FastAPI service (docker/akko-rag/).
Postgres schema akko_rag with collections, documents, chunks, query_log tables (see postgres/init/11-akko-rag.sql).
Each upload chunks the document, embeds the chunks via the configured embedder (the AI gateway, LiteLLM, by default; the on-prem local LLM runtime, Ollama, in air-gapped mode), and stores the embeddings in pgvector.
Originals are stored in the object storage (SeaweedFS) bucket akko-rag/originals/.
/query runs a cosine-similarity search and returns the top-k chunks with citation metadata (file, page).
Hard 50 MB upload limit (maxUploadBytes chart value).
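To make the chunk, embed, and top-k steps concrete, here is a toy version in pure Python. It is a sketch only: in akko-rag the similarity search runs inside Postgres via pgvector, the vectors come from the configured embedder, and the chunk size, overlap, and function names below are assumptions chosen for illustration.

```python
import math


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list:
    """Split text into fixed-size character windows overlapping by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k: int = 3):
    """chunks: iterable of (metadata, embedding). Returns the k best (score, metadata)."""
    scored = [(cosine_similarity(query_vec, emb), meta) for meta, emb in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real ones come from the embedder.
index = [
    ({"file": "leave-policy.pdf", "page": 3}, [0.9, 0.1, 0.0]),
    ({"file": "expenses.pdf", "page": 1}, [0.0, 0.2, 0.9]),
    ({"file": "leave-policy.pdf", "page": 4}, [0.7, 0.6, 0.1]),
]

hits = top_k([1.0, 0.0, 0.0], index, k=2)
for score, meta in hits:
    print(f"{score:.3f} {meta['file']} p.{meta['page']}")
```

Each hit carries its file and page, which is the citation metadata the query endpoint returns alongside the chunk.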

When to MERGE them (and why we haven't)

The founder asked whether to merge the two. The current judgment is no: keep them separate, but make ADEN's unstructured handler call akko-rag under the hood for the persistent path.

Reasons to keep them separate:

ADEN unstructured is the right UX for "drop a file, ask one question, move on". Forcing every drop-in PDF through a collection create + name + roles + index step would kill the conversational flow.
akko-rag has consumers beyond ADEN (notebooks, external services, a future Slack/Teams bot). It needs its own API and lifecycle.
The two services have different security models: ADEN unstructured trusts the chatting user implicitly (single session), while akko-rag enforces allowed_roles per collection.

The future merge is a single feature, "save this conversation's PDF to a collection": a button in ADEN that POSTs the ephemeral document to akko-rag with a chosen collection and roles. It is a Sprint 54 follow-up, not a P0.

Quick reference: endpoints and pages

Action	Where
Drop a PDF and ask one question	demo.akko-ai.com → ADEN page → upload widget
Create a persistent collection	demo.akko-ai.com → RAG page → "+ New collection"
Add a document to a collection	RAG page → select collection → upload doc
Delete a collection	RAG page → trash icon on the collection (PR #63)
Query a collection from a notebook	`POST http://akko-akko-rag:8080/collections/{slug}/query`
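For the notebook row, a call could look like the sketch below, using only the standard library. The URL comes from the table; everything else is an assumption: the JSON field names `query` and `top_k` are hypothetical, and authentication headers are omitted because this page does not describe how roles are passed to the service.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://akko-akko-rag:8080"  # in-cluster address from the table above


def build_query_request(slug: str, question: str, top_k: int = 5,
                        base_url: str = BASE_URL) -> urllib.request.Request:
    # Hypothetical payload shape: "query" and "top_k" are assumed field names.
    body = json.dumps({"query": question, "top_k": top_k}).encode("utf-8")
    safe_slug = urllib.parse.quote(slug, safe="")
    return urllib.request.Request(
        url=f"{base_url}/collections/{safe_slug}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_collection(slug: str, question: str, top_k: int = 5, timeout: float = 10.0):
    """Send the request and decode the JSON response (needs network access to akko-rag)."""
    request = build_query_request(slug, question, top_k)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_query_request("internal-policies", "What is the parental leave policy?")
print(request.get_method(), request.full_url)
# POST http://akko-akko-rag:8080/collections/internal-policies/query
```

Building the request separately from sending it keeps the example checkable outside the cluster; inside a notebook, `query_collection("internal-policies", "...")` would perform the actual call.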