# akko-rag — RAG document intelligence
AKKO's retrieval service. Clients upload documents into collections;
akko-rag extracts the text, chunks it, computes 768-dim embeddings via
the sovereign LiteLLM gateway, and stores everything in pgvector
inside akko-postgresql-data. Downstream callers (ADEN, cockpit, an
agent, a notebook) then issue semantic queries against a collection
and receive ranked chunks with full provenance.
> **Phase 0 — tier-1 backend only**
>
> This service is scoped to grow through three further phases, all behind the same API contract. Phase 0 (shipped) uses pgvector for collections of up to ~1k documents. Phases 2-3 swap in OpenSearch hybrid search and Spark/Iceberg for big-data tiers without breaking any caller. Full plan: rag-document-intelligence.
## Architecture
```mermaid
flowchart LR
  subgraph Clients
    CK[Cockpit /#rag]
    ADEN[ADEN]
    NB[Notebook]
    AG[Agent]
  end
  subgraph RAG["akko-rag (FastAPI)"]
    API[/collections<br/>/documents<br/>/query/]
    EX[Extractor<br/>pypdf / txt / md]
    CHK[Chunker<br/>word-window]
    EMB[Embedder<br/>3× retry]
  end
  subgraph Backends
    PG[(akko-postgresql-data<br/>akko_rag schema<br/>pgvector HNSW + GIN)]
    LL[akko-litellm<br/>embed + chat]
  end
  Clients --> API
  API --> EX --> CHK --> EMB --> LL
  EMB --> PG
  API <--> PG
```
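The word-window chunker shown in the diagram can be sketched as follows, using the documented defaults of 400 words per chunk with a 60-word overlap. This is a minimal illustration, not the service's actual implementation; `chunk_words` is a hypothetical name.

```python
def chunk_words(text: str, size: int = 400, overlap: int = 60) -> list[str]:
    """Split text into overlapping word windows of `size` words."""
    words = text.split()
    step = size - overlap                 # advance 340 words per chunk by default
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):    # last window already covers the tail
            break
    return chunks

# toy 1000-word document: yields windows 0-399, 340-739, 680-999
doc = " ".join(f"w{i}" for i in range(1000))
parts = chunk_words(doc)
print(len(parts))  # 3
```

Each chunk shares its last 60 words with the start of the next one, so a sentence cut by a window boundary still appears whole in one of the two neighbouring chunks.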
## API (Phase 0)
| Method | Path | Purpose |
|---|---|---|
| GET | /health | liveness |
| GET | /ready | DB reachability |
| GET | /metrics | Prometheus counters + histogram |
| POST | /collections | create logical grouping with allowed_roles |
| GET | /collections | list with document + chunk counts |
| POST | /collections/{slug}/documents | upload → extract → chunk → embed → store |
| GET | /collections/{slug}/documents | list documents |
| POST | /collections/{slug}/query | top-k retrieval by cosine similarity |
| GET | /audit/queries?limit=N | recent retrievals (audit trail) |
Identity: trust the upstream ingress header `X-Trino-User` (falling back to `X-User-Id`), the same convention as ADEN and the Trino AI plugin. Phase 1 replaces this with Keycloak JWT verification.
## Example
```bash
# 1. create a collection
curl -s -X POST https://rag.akko-ai.com/collections \
  -H "X-Trino-User: alice" -H "Content-Type: application/json" \
  -d '{"slug":"kb","name":"Knowledge base",
       "allowed_roles":["akko-admin","akko-engineer"]}'

# 2. upload a PDF
curl -s -X POST https://rag.akko-ai.com/collections/kb/documents \
  -H "X-Trino-User: alice" \
  -F "file=@/path/to/policy.pdf"

# 3. query
curl -s -X POST https://rag.akko-ai.com/collections/kb/query \
  -H "X-Trino-User: alice" -H "Content-Type: application/json" \
  -d '{"question":"how do I refund a charge?","top_k":5}'
```
Response:

```json
{
  "question": "how do I refund a charge?",
  "collection": "kb",
  "chunks": [
    {
      "chunk_id": "…",
      "document_id": "…",
      "filename": "policy.pdf",
      "page": 4,
      "text": "Refunds are issued within 14 days …",
      "score": 0.87
    }
  ],
  "latency_ms": 38
}
```
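The `score` field is the cosine similarity between the question's embedding and each chunk's embedding. A toy illustration of how the top-k ranking falls out of that, with 3-dimensional vectors standing in for the real 768-dim embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

question = [0.9, 0.1, 0.0]
chunks = {
    "refund-policy": [0.8, 0.2, 0.1],  # points roughly the same way as the question
    "onboarding":    [0.1, 0.9, 0.3],  # mostly orthogonal to it
}
ranked = sorted(chunks, key=lambda c: cosine(question, chunks[c]), reverse=True)
print(ranked)  # ['refund-policy', 'onboarding']
```

In production pgvector computes the same ordering server-side via its HNSW index, so the service never loads all chunk vectors into Python.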
## Configuration

Every setting resolves from either `AKKO_<NAME>` or the bare `<NAME>` environment variable, so values-dev, values-netcup, and local Docker runs share the same defaults. Full list:
| Env var | Default | Purpose |
|---|---|---|
| AKKO_PG_HOST | akko-postgresql-data | PostgreSQL host |
| AKKO_PG_PORT | 5432 | PostgreSQL port |
| AKKO_PG_DATABASE | akko | database name |
| AKKO_PG_USER | akko | database user |
| AKKO_PG_PASSWORD | (Secret) | from akko-postgresql-data Secret |
| AKKO_EMBED_URL | http://akko-akko-litellm:4000 | LiteLLM OpenAI-compat endpoint |
| AKKO_EMBED_MODEL | akko-embed | LiteLLM routing alias |
| AKKO_EMBED_DIM | 768 | matches nomic-embed-text |
| AKKO_CHAT_URL | http://akko-akko-litellm:4000 | Phase 1 answer generation |
| AKKO_CHAT_MODEL | akko-chat | Phase 1 |
| AKKO_CHUNK_SIZE_TOKENS | 400 | word-window size |
| AKKO_CHUNK_OVERLAP_TOKENS | 60 | overlap in words |
| AKKO_MAX_UPLOAD_BYTES | 52428800 | 50 MiB hard cap |
| AKKO_DEFAULT_TOP_K | 5 | default retrieval size |
| AKKO_MAX_TOP_K | 50 | hard ceiling |
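The `AKKO_<NAME>`-or-bare-`<NAME>` resolution can be sketched as a small helper (`setting` is a hypothetical name, not the service's actual code):

```python
import os

def setting(name: str, default: str) -> str:
    """Resolve AKKO_<NAME> first, then bare <NAME>, then the built-in default."""
    return os.environ.get(f"AKKO_{name}", os.environ.get(name, default))

os.environ["AKKO_PG_HOST"] = "pg.local"   # prefixed form wins ...
os.environ["PG_HOST"] = "ignored"         # ... even when the bare form is also set
os.environ["PG_PORT"] = "5433"            # bare form alone is honoured

print(setting("PG_HOST", "akko-postgresql-data"))  # pg.local
print(setting("PG_PORT", "5432"))                  # 5433
print(setting("PG_DATABASE", "akko"))              # akko (default)
```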
## Enabling the service
Disabled by default in the umbrella chart. Flip it on:
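A values override along these lines turns it on (the `akko-rag.enabled` key is an assumption based on common umbrella-chart conventions; check the chart's values file for the exact path):

```yaml
# values-dev.yaml / values-netcup.yaml — hypothetical key name
akko-rag:
  enabled: true
```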
The schema is provisioned by postgres/init/11-akko-rag.sql at first
boot of akko-postgresql-data. On an already-initialised cluster,
re-apply the file manually against akko-postgresql-data:
```bash
kubectl -n akko exec deploy/akko-akko-postgresql-data -- \
  psql -U akko -d akko -f /docker-entrypoint-initdb.d/11-akko-rag.sql
```
## Phases ahead
| Phase | Delivers | Backend |
|---|---|---|
| 0 (shipped) | Ingest + chunk + embed + pgvector top-k | pgvector HNSW |
| 1 | Chat answer with citations, JWT auth, cockpit UI, OPA filtering | pgvector |
| 2 | Hybrid BM25 + dense search, Airflow watchfolder, HPA, ServiceMonitor | OpenSearch knn |
| 3 | Spark embed UDF, Iceberg VECTOR column, Trino hybrid SQL | Spark + Iceberg |
## See also
- Architecture / Unified data + AI
- AI / Trino AI Functions — `akko_ai_search` covers in-catalog retrieval; `akko-rag` covers document-centric retrieval with citations and audit
- Services / AI Service — sibling service for text/multimodal generation