
akko-rag — RAG document intelligence

AKKO's retrieval service. Clients upload documents into collections; akko-rag extracts the text, chunks it, computes 768-dim embeddings via the sovereign LiteLLM gateway, and stores everything in pgvector inside akko-postgresql-data. Downstream callers (ADEN, cockpit, an agent, a notebook) then issue semantic queries against a collection and receive ranked chunks with full provenance.

Phase 0 — tier-1 backend only

This service is scoped to grow through four phases, all behind the same API contract. Phase 0 (shipped) uses pgvector for collections of up to ~1k documents; Phase 1 stays on pgvector and adds chat answers and JWT auth. Phases 2-3 swap in OpenSearch hybrid search and Spark/Iceberg for big-data tiers without breaking any caller. Full plan: rag-document-intelligence.

Architecture

flowchart LR
    subgraph Clients
        CK[Cockpit /#rag]
        ADEN[ADEN]
        NB[Notebook]
        AG[Agent]
    end
    subgraph RAG["akko-rag (FastAPI)"]
        API[/collections<br/>/documents<br/>/query/]
        EX[Extractor<br/>pypdf / txt / md]
        CHK[Chunker<br/>word-window]
        EMB[Embedder<br/>3× retry]
    end
    subgraph Backends
        PG[(akko-postgresql-data<br/>akko_rag schema<br/>pgvector HNSW + GIN)]
        LL[akko-litellm<br/>embed + chat]
    end
    Clients --> API
    API --> EX --> CHK --> EMB --> LL
    EMB --> PG
    API <--> PG
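
The chunker is a plain word window: AKKO_CHUNK_SIZE_TOKENS words per chunk, overlapping by AKKO_CHUNK_OVERLAP_TOKENS (both counted in words despite the names; see Configuration). A minimal sketch of that windowing, with an illustrative function name rather than akko-rag's actual code:

# Word-window chunking: fixed-size windows that advance by (size - overlap)
# words, so consecutive chunks share `overlap` words. Illustrative sketch only.
def chunk_words(text: str, size: int = 400, overlap: int = 60) -> list[str]:
    words = text.split()
    if not words:
        return []
    step = max(size - overlap, 1)  # guard against overlap >= size
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # last window already covers the tail
            break
    return chunks

With the defaults (400/60), a sentence that straddles a chunk boundary stays retrievable from both neighbouring chunks.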

API (Phase 0)

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /health | liveness |
| GET | /ready | DB reachability |
| GET | /metrics | Prometheus counters + histogram |
| POST | /collections | create a logical grouping with allowed_roles |
| GET | /collections | list collections with document + chunk counts |
| POST | /collections/{slug}/documents | upload → extract → chunk → embed → store |
| GET | /collections/{slug}/documents | list documents |
| POST | /collections/{slug}/query | top-k retrieval by cosine similarity |
| GET | /audit/queries?limit=N | recent retrievals (audit trail) |

Identity: the service trusts the upstream ingress header X-Trino-User (falling back to X-User-Id), the same convention as ADEN and the Trino AI plugin. Phase 1 replaces this with Keycloak JWT verification.
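
In FastAPI terms that check fits in one dependency. A minimal sketch, assuming nothing beyond the two headers; current_user is an illustrative name, not the service's actual code:

# Resolve caller identity from the trusted ingress headers (Phase 0 convention).
# Illustrative sketch only; Phase 1 swaps this for Keycloak JWT verification.
from fastapi import Header, HTTPException

def current_user(
    x_trino_user: str | None = Header(default=None),  # FastAPI maps this to X-Trino-User
    x_user_id: str | None = Header(default=None),     # fallback header X-User-Id
) -> str:
    user = x_trino_user or x_user_id
    if not user:
        raise HTTPException(status_code=401, detail="missing identity header")
    return user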

Example

# 1. create a collection
curl -s -X POST https://rag.akko-ai.com/collections \
     -H "X-Trino-User: alice" -H "Content-Type: application/json" \
     -d '{"slug":"kb","name":"Knowledge base",
          "allowed_roles":["akko-admin","akko-engineer"]}'

# 2. upload a PDF
curl -s -X POST https://rag.akko-ai.com/collections/kb/documents \
     -H "X-Trino-User: alice" \
     -F "file=@/path/to/policy.pdf"

# 3. query
curl -s -X POST https://rag.akko-ai.com/collections/kb/query \
     -H "X-Trino-User: alice" -H "Content-Type: application/json" \
     -d '{"question":"how do I refund a charge?","top_k":5}'

Response:

{
  "question": "how do I refund a charge?",
  "collection": "kb",
  "chunks": [
    {
      "chunk_id": "…",
      "document_id": "…",
      "filename": "policy.pdf",
      "page": 4,
      "text": "Refunds are issued within 14 days …",
      "score": 0.87
    }
  ],
  "latency_ms": 38
}
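
Under the hood, Phase 0 retrieval is one embedding call plus one pgvector query. The sketch below is an assumption-heavy illustration: the akko_rag table and column names are guesses (the real layout lives in postgres/init/11-akko-rag.sql), while /v1/embeddings is LiteLLM's standard OpenAI-compatible endpoint.

# Sketch of Phase 0 retrieval: embed the question via LiteLLM, then rank
# chunks by cosine distance in pgvector. Schema names below are assumed
# for illustration; consult 11-akko-rag.sql for the real ones.
import psycopg2
import requests

EMBED_URL = "http://akko-akko-litellm:4000/v1/embeddings"

def query_collection(conn, slug: str, question: str, top_k: int = 5):
    resp = requests.post(EMBED_URL, json={"model": "akko-embed", "input": question})
    resp.raise_for_status()
    vec = resp.json()["data"][0]["embedding"]  # 768 floats (nomic-embed-text)
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT c.id, c.text, 1 - (c.embedding <=> %s::vector) AS score
            FROM akko_rag.chunks c
            JOIN akko_rag.documents d   ON d.id = c.document_id
            JOIN akko_rag.collections k ON k.id = d.collection_id
            WHERE k.slug = %s
            ORDER BY c.embedding <=> %s::vector  -- HNSW-accelerated cosine distance
            LIMIT %s
            """,
            (str(vec), slug, str(vec), top_k),
        )
        return cur.fetchall()

pgvector's <=> operator returns cosine distance, so 1 - distance is the cosine-similarity score reported in the response above.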

Configuration

Every setting resolves from either the AKKO_<NAME> or the bare <NAME> environment variable, so values-dev, values-netcup, and local Docker runs share the same defaults; a resolution sketch follows the table. Full list:

| Env var | Default | Purpose |
| --- | --- | --- |
| AKKO_PG_HOST | akko-postgresql-data | PostgreSQL host |
| AKKO_PG_PORT | 5432 | PostgreSQL port |
| AKKO_PG_DATABASE | akko | database name |
| AKKO_PG_USER | akko | database user |
| AKKO_PG_PASSWORD | (secret) | injected from the akko-postgresql-data Secret |
| AKKO_EMBED_URL | http://akko-akko-litellm:4000 | LiteLLM OpenAI-compatible endpoint |
| AKKO_EMBED_MODEL | akko-embed | LiteLLM routing alias |
| AKKO_EMBED_DIM | 768 | matches nomic-embed-text |
| AKKO_CHAT_URL | http://akko-akko-litellm:4000 | Phase 1 answer generation |
| AKKO_CHAT_MODEL | akko-chat | Phase 1 chat alias |
| AKKO_CHUNK_SIZE_TOKENS | 400 | chunk window, counted in words |
| AKKO_CHUNK_OVERLAP_TOKENS | 60 | chunk overlap, counted in words |
| AKKO_MAX_UPLOAD_BYTES | 52428800 | 50 MiB hard cap per upload |
| AKKO_DEFAULT_TOP_K | 5 | default retrieval size |
| AKKO_MAX_TOP_K | 50 | hard ceiling on top_k |
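
The prefixed-or-bare lookup fits in one helper. A sketch of the convention, not akko-rag's actual settings module:

import os

# Resolve a setting: AKKO_<NAME> wins, bare <NAME> is the fallback,
# then the built-in default. Illustrative only.
def setting(name: str, default: str | None = None) -> str | None:
    return os.environ.get(f"AKKO_{name}", os.environ.get(name, default))

# e.g. setting("PG_HOST", "akko-postgresql-data") -> "akko-postgresql-data"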

Enabling the service

Disabled by default in the umbrella chart. Flip it on:

# values-dev.yaml (or values-netcup.yaml)
akko-rag:
  enabled: true

The schema is provisioned by postgres/init/11-akko-rag.sql at first boot of akko-postgresql-data. On an already-initialised cluster the init script does not run again, so apply the file manually against akko-postgresql-data:

kubectl -n akko exec deploy/akko-akko-postgresql-data -- \
    psql -U akko -d akko -f /docker-entrypoint-initdb.d/11-akko-rag.sql
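
Once the pod is up, a quick sanity check against the liveness and readiness endpoints (same base URL as the examples above):

# Probe /health and /ready after enabling the chart.
import requests

for path in ("/health", "/ready"):
    r = requests.get(f"https://rag.akko-ai.com{path}", timeout=5)
    print(path, r.status_code)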

Phases ahead

| Phase | Delivers | Backend |
| --- | --- | --- |
| 0 (shipped) | ingest + chunk + embed + pgvector top-k | pgvector HNSW |
| 1 | chat answers with citations, JWT auth, cockpit UI, OPA filtering | pgvector |
| 2 | hybrid BM25 + dense search, Airflow watchfolder, HPA, ServiceMonitor | OpenSearch k-NN |
| 3 | Spark embed UDF, Iceberg VECTOR column, Trino hybrid SQL | Spark + Iceberg |

See also