# akko-rag — RAG document intelligence
AKKO's retrieval service. Clients upload documents into collections;
akko-rag extracts the text, chunks it, computes 768-dim embeddings via
the sovereign LiteLLM gateway, and stores everything in pgvector
inside akko-postgresql-data. Downstream callers (ADEN, cockpit, an
agent, a notebook) then issue semantic queries against a collection
and receive ranked chunks with full provenance.
> **Phase 0 — tier-1 backend only**
>
> This service is scoped to grow through three further phases, all behind the same API contract. Phase 0 (shipped) uses pgvector for collections of up to ~1k documents. Phases 2-3 swap in OpenSearch hybrid search and Spark/Iceberg for big-data tiers without breaking any caller. Full plan: rag-document-intelligence.
## Architecture
```mermaid
flowchart LR
  subgraph Clients
    CK[Cockpit /#rag]
    ADEN[ADEN]
    NB[Notebook]
    AG[Agent]
  end
  subgraph RAG["akko-rag (FastAPI)"]
    API[/collections<br/>/documents<br/>/query/]
    EX[Extractor<br/>pypdf / txt / md]
    CHK[Chunker<br/>word-window]
    EMB[Embedder<br/>3× retry]
  end
  subgraph Backends
    PG[(akko-postgresql-data<br/>akko_rag schema<br/>pgvector HNSW + GIN)]
    LL[akko-litellm<br/>embed + chat]
  end
  Clients --> API
  API --> EX --> CHK --> EMB --> LL
  EMB --> PG
  API <--> PG
```
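The word-window chunker shown in the diagram can be sketched as follows, using the documented defaults of 400 words per chunk with a 60-word overlap. This is a minimal illustration, not the service's actual implementation; `chunk_words` is a hypothetical name.

```python
def chunk_words(text: str, size: int = 400, overlap: int = 60) -> list[str]:
    """Split text into overlapping word windows of `size` words."""
    words = text.split()
    step = size - overlap                 # advance 340 words per chunk by default
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):    # last window already covers the tail
            break
    return chunks

# toy 1000-word document: yields windows 0-399, 340-739, 680-999
doc = " ".join(f"w{i}" for i in range(1000))
parts = chunk_words(doc)
print(len(parts))  # 3
```

Each chunk shares its last 60 words with the start of the next one, so a sentence cut by a window boundary still appears whole in one of the two neighbouring chunks.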
## API (Phase 0)
| Method | Path | Purpose |
|---|---|---|
| GET | /health | liveness |
| GET | /ready | DB reachability |
| GET | /metrics | Prometheus counters + histogram |
| POST | /collections | create logical grouping with allowed_roles |
| GET | /collections | list with document + chunk counts |
| POST | /collections/{slug}/documents | upload → extract → chunk → embed → store |
| GET | /collections/{slug}/documents | list documents |
| POST | /collections/{slug}/query | top-k retrieval by cosine similarity |
| GET | /audit/queries?limit=N | recent retrievals (audit trail) |
Identity: trust the upstream ingress header `X-Trino-User` (falling back to `X-User-Id`), the same convention as ADEN and the Trino AI plugin. Phase 1 replaces this with Keycloak JWT verification.
## Example
```bash
# 1. create a collection
curl -s -X POST https://rag.akko-ai.com/collections \
  -H "X-Trino-User: alice" -H "Content-Type: application/json" \
  -d '{"slug":"kb","name":"Knowledge base",
       "allowed_roles":["akko-admin","akko-engineer"]}'

# 2. upload a PDF
curl -s -X POST https://rag.akko-ai.com/collections/kb/documents \
  -H "X-Trino-User: alice" \
  -F "file=@/path/to/policy.pdf"

# 3. query
curl -s -X POST https://rag.akko-ai.com/collections/kb/query \
  -H "X-Trino-User: alice" -H "Content-Type: application/json" \
  -d '{"question":"how do I refund a charge?","top_k":5}'
```
Response:

```json
{
  "question": "how do I refund a charge?",
  "collection": "kb",
  "chunks": [
    {
      "chunk_id": "…",
      "document_id": "…",
      "filename": "policy.pdf",
      "page": 4,
      "text": "Refunds are issued within 14 days …",
      "score": 0.87
    }
  ],
  "latency_ms": 38
}
```
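The `score` field is the cosine similarity between the question's embedding and each chunk's embedding. A toy illustration of how the top-k ranking falls out of that, with 3-dimensional vectors standing in for the real 768-dim embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

question = [0.9, 0.1, 0.0]
chunks = {
    "refund-policy": [0.8, 0.2, 0.1],  # points roughly the same way as the question
    "onboarding":    [0.1, 0.9, 0.3],  # mostly orthogonal to it
}
ranked = sorted(chunks, key=lambda c: cosine(question, chunks[c]), reverse=True)
print(ranked)  # ['refund-policy', 'onboarding']
```

In production pgvector computes the same ordering server-side via its HNSW index, so the service never loads all chunk vectors into Python.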
## Configuration

Every setting resolves from either `AKKO_<NAME>` or the bare `<NAME>` environment variable, so values-dev, values-netcup, and local Docker runs share the same defaults. Full list:
| Env var | Default | Purpose |
|---|---|---|
| AKKO_PG_HOST | akko-postgresql-data | PostgreSQL host |
| AKKO_PG_PORT | 5432 | PostgreSQL port |
| AKKO_PG_DATABASE | akko | database name |
| AKKO_PG_USER | akko | database user |
| AKKO_PG_PASSWORD | (Secret) | from akko-postgresql-data Secret |
| AKKO_EMBED_URL | http://akko-akko-litellm:4000 | LiteLLM OpenAI-compat endpoint |
| AKKO_EMBED_MODEL | akko-embed | LiteLLM routing alias |
| AKKO_EMBED_DIM | 768 | matches nomic-embed-text |
| AKKO_CHAT_URL | http://akko-akko-litellm:4000 | Phase 1 answer generation |
| AKKO_CHAT_MODEL | akko-chat | Phase 1 |
| AKKO_CHUNK_SIZE_TOKENS | 400 | word-window size |
| AKKO_CHUNK_OVERLAP_TOKENS | 60 | overlap in words |
| AKKO_MAX_UPLOAD_BYTES | 52428800 | 50 MiB hard cap |
| AKKO_DEFAULT_TOP_K | 5 | default retrieval size |
| AKKO_MAX_TOP_K | 50 | hard ceiling |
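The `AKKO_<NAME>`-or-bare-`<NAME>` resolution can be sketched as a small helper (`setting` is a hypothetical name, not the service's actual code):

```python
import os

def setting(name: str, default: str) -> str:
    """Resolve AKKO_<NAME> first, then bare <NAME>, then the built-in default."""
    return os.environ.get(f"AKKO_{name}", os.environ.get(name, default))

os.environ["AKKO_PG_HOST"] = "pg.local"   # prefixed form wins ...
os.environ["PG_HOST"] = "ignored"         # ... even when the bare form is also set
os.environ["PG_PORT"] = "5433"            # bare form alone is honoured

print(setting("PG_HOST", "akko-postgresql-data"))  # pg.local
print(setting("PG_PORT", "5432"))                  # 5433
print(setting("PG_DATABASE", "akko"))              # akko (default)
```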
## Enabling the service
Disabled by default in the umbrella chart. Flip it on:
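A values override along these lines turns it on (the `akko-rag.enabled` key is an assumption based on common umbrella-chart conventions; check the chart's values file for the exact path):

```yaml
# values-dev.yaml / values-netcup.yaml — hypothetical key name
akko-rag:
  enabled: true
```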
The schema is provisioned by postgres/init/11-akko-rag.sql at first
boot of akko-postgresql-data. On an already-initialised cluster,
re-apply the file manually against akko-postgresql-data:
```bash
kubectl -n akko exec deploy/akko-akko-postgresql-data -- \
  psql -U akko -d akko -f /docker-entrypoint-initdb.d/11-akko-rag.sql
```
## Phases ahead
| Phase | Delivers | Backend |
|---|---|---|
| 0 (shipped) | Ingest + chunk + embed + pgvector top-k | pgvector HNSW |
| 1 | Chat answer with citations, JWT auth, cockpit UI, OPA filtering | pgvector |
| 2 | Hybrid BM25 + dense search, Airflow watchfolder, HPA, ServiceMonitor | OpenSearch knn |
| 3 | Spark embed UDF, Iceberg VECTOR column, Trino hybrid SQL | Spark + Iceberg |
## See also
- Architecture / Unified data + AI
- AI / Trino AI Functions — `akko_ai_search` covers in-catalog retrieval; `akko-rag` covers document-centric retrieval with citations and audit
- Services / AI Service — sibling service for text/multimodal generation