
PostgreSQL Data — Analytics Database

PostgreSQL Data is the dedicated functional database for AKKO, separate from the infrastructure PostgreSQL instance. It runs the custom akko-postgres image with PostGIS (geospatial) and pgvector (vector embeddings for RAG), serving analytics workloads, geospatial queries, and AI/ML use cases.

Architecture

JupyterHub       Trino            Airflow DAGs
    \              |               /
     +-------------v--------------+
     |   PostgreSQL Data (:5432)   |
     |   akko-postgres image       |
     |   PostGIS + pgvector        |
     +-----------------------------+
     |  Databases:                 |
     |  - akko (analytics)         |
     |  - aden (NL→SQL engine)     |
     |  - geospatial data          |
     |  - RAG vector store         |
     +-----------------------------+
  • Separate from infra PostgreSQL — never mix business data with Keycloak/Airflow/Superset metadata
  • PostGIS enables geospatial queries (ST_Distance, ST_Contains, etc.)
  • pgvector enables vector similarity search for RAG pipelines (Ollama embeddings)
  • pgaudit provides audit logging for compliance
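
As a quick check that the extensions listed above are usable, the following sketch prints a small SQL smoke test that can be piped into psql. The helper name extension_smoke_sql and the sample coordinates and vectors are illustrative, and running psql as postgres inside the pod without a password prompt is an assumption about the image's local authentication.

```shell
# Print a smoke-test script for PostGIS and pgvector.
# extension_smoke_sql is an illustrative helper, not part of AKKO.
extension_smoke_sql() {
  cat <<'SQL'
-- Enable the extensions in the current database (no-op if already present).
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS vector;

-- PostGIS: distance in metres between two lon/lat points (geography type).
SELECT ST_Distance(
  'SRID=4326;POINT(2.3522 48.8566)'::geography,
  'SRID=4326;POINT(-0.1276 51.5072)'::geography
) AS metres;

-- pgvector: cosine distance (<=>) between two 3-dimensional vectors.
SELECT '[1,2,3]'::vector <=> '[3,2,1]'::vector AS cosine_distance;
SQL
}

# Print the script. To run it against the akko database (assumed invocation):
#   extension_smoke_sql | kubectl exec -i -n akko deploy/akko-postgresql-data -- \
#     psql -U postgres -d akko
extension_smoke_sql
```

CREATE EXTENSION applies per database, so the same script would need to be run again with -d aden if ADEN also needs these extensions.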

URLs

PostgreSQL Data is an internal-only service (no ingress). It is accessed by other services within the cluster on port 5432.

Service      Connection
Trino        jdbc:postgresql://akko-postgresql-data:5432/akko
JupyterHub   postgresql://akko-postgresql-data:5432/akko
ADEN         postgresql://akko-postgresql-data:5432/aden
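
The connection strings above all follow one pattern: scheme, Service name, port, database. The sketch below composes them and also shows the namespace-qualified form that a client in another namespace would need. The helper pg_data_url is hypothetical, the akko namespace is taken from the troubleshooting commands on this page, and cluster.local is assumed to be the cluster DNS domain.

```shell
# Build a connection URL for the in-cluster Service.
# pg_data_url is an illustrative helper; arguments: scheme, database, [namespace].
pg_data_url() {
  scheme=$1
  database=$2
  namespace=${3:-}
  host=akko-postgresql-data
  # From another namespace, use the namespace-qualified DNS name of the Service.
  if [ -n "$namespace" ]; then
    host="$host.$namespace.svc.cluster.local"
  fi
  printf '%s://%s:5432/%s\n' "$scheme" "$host" "$database"
}

pg_data_url jdbc:postgresql akko   # Trino
pg_data_url postgresql aden        # ADEN
pg_data_url postgresql akko akko   # JupyterHub, namespace-qualified host
```

Credentials are deliberately left out of the URLs; they are expected to come from Kubernetes secrets rather than being embedded in connection strings.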

Configuration (Helm values)

akko-postgresql-data:
  enabled: true
  image:
    repository: akko-postgres
    tag: "2026.03"
  service:
    port: 5432
  persistence:
    enabled: true
    size: 20Gi
  database: akko
  username: postgres
  config:
    maxConnections: 200
    sharedBuffers: 256MB
    effectiveCacheSize: 768MB
    workMem: 4MB
    sharedPreloadLibraries: pgaudit
  backup:
    enabled: false
    schedule: "0 2 * * *"
    retention: 7
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 1Gi
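
The memory settings above interact with the container limit, and a rough estimate makes the relationship visible. The sketch below is a back-of-the-envelope calculation only: it assumes each connection runs a single work_mem-sized operation at the same time, which is a simplification, since one query can use several.

```shell
# Values copied from the Helm block above. PostgreSQL memory units are
# 1024-based, so "256MB" is treated as 256 MiB here.
max_connections=200
shared_buffers_mib=256
work_mem_mib=4
memory_limit_mib=1024   # resources.limits.memory: 1Gi

# Rough bound: every connection runs one work_mem-sized operation at once.
worst_case_mib=$(( shared_buffers_mib + max_connections * work_mem_mib ))

echo "shared_buffers + max_connections * work_mem = ${worst_case_mib} MiB"
echo "container memory limit                      = ${memory_limit_mib} MiB"
if [ "$worst_case_mib" -gt "$memory_limit_mib" ]; then
  echo "estimate exceeds the limit by $(( worst_case_mib - memory_limit_mib )) MiB"
fi
```

With the defaults shown, the estimate comes to 1056 MiB against a 1024 MiB limit. That peak is unlikely in practice, but if the pod is OOM-killed under heavy concurrency, lowering maxConnections or raising the memory limit are the first settings to revisit. effectiveCacheSize is a planner hint and does not allocate memory, so it is not part of the sum.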

Health Check

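PostgreSQL accepts connections on port 5432. Both probes run pg_isready inside the container, which reports whether the server is accepting connections: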

livenessProbe:
  exec:
    command:
      - pg_isready
      - -U
      - postgres
  initialDelaySeconds: 15
  periodSeconds: 30
readinessProbe:
  exec:
    command:
      - pg_isready
      - -U
      - postgres
  initialDelaySeconds: 10
  periodSeconds: 10
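
Both probes treat any non-zero exit status from pg_isready as a failure. When debugging a failing probe by hand, it helps to know what each status means. The helper below is an illustrative sketch (pg_isready_meaning is not part of the image) that maps the exit statuses documented for pg_isready to text.

```shell
# Translate a pg_isready exit status into its documented meaning.
pg_isready_meaning() {
  case "$1" in
    0) echo "accepting connections" ;;
    1) echo "rejecting connections (for example, during startup)" ;;
    2) echo "no response" ;;
    3) echo "no attempt made (for example, invalid parameters)" ;;
    *) echo "unexpected exit status: $1" ;;
  esac
}

# Usage: run the same command as the probe, then decode its exit status.
#   kubectl exec -n akko deploy/akko-postgresql-data -- pg_isready -U postgres
#   pg_isready_meaning $?
```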

RBAC (who can access)

PostgreSQL Data is accessible only within the cluster (no ingress). Access is controlled by:

  • Database credentials — stored in Kubernetes secrets
  • NetworkPolicy — restricts which pods can connect to port 5432
  • pgaudit — logs all SQL statements for compliance auditing
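
What pgaudit actually records depends on its runtime settings, so it can be worth confirming them on a running instance. The sketch below wraps two SHOW statements in a hypothetical helper, audit_settings; it assumes psql can connect as postgres inside the pod without a password prompt.

```shell
# Show whether pgaudit is preloaded and which statement classes it logs.
# audit_settings is an illustrative helper, not part of AKKO.
audit_settings() {
  kubectl exec -n akko deploy/akko-postgresql-data -- \
    psql -U postgres -d akko -At \
      -c "SHOW shared_preload_libraries" \
      -c "SHOW pgaudit.log"
}

# Usage:
#   audit_settings
```

If pgaudit is not loaded, the second statement fails with an "unrecognized configuration parameter" error. A value of none for pgaudit.log means the library is loaded but no statement classes are being audited.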

Key Features

Feature            Description
PostGIS            Geospatial data types and functions (geometry, geography, raster)
pgvector           Vector similarity search for RAG/AI pipelines
pgaudit            SQL audit logging for compliance
Automated backup   Optional CronJob with pg_dumpall and configurable retention
Tunable config     ConfigMap-based postgresql.conf tuning
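
To make the backup retention idea concrete, here is a minimal sketch of a prune step that keeps only the newest dumps. It is not the chart's actual CronJob script, which is not shown on this page. The file naming scheme, the helper prune_dumps, and the reading of retention as a number of dumps to keep are all assumptions.

```shell
# Keep only the newest $2 dump files in directory $1.
# Assumes names like dumpall-YYYYMMDD.sql.gz, so reverse lexical order is newest first.
prune_dumps() {
  dir=$1
  keep=$2
  ls -1 "$dir" | grep '^dumpall-.*\.sql\.gz$' | sort -r | tail -n +"$(( keep + 1 ))" |
    while IFS= read -r name; do
      rm -f -- "$dir/$name"
    done
}

# Demonstration on a scratch directory with ten empty placeholder files.
# A real dump step would look roughly like:
#   pg_dumpall -U postgres | gzip > "dumpall-$(date +%Y%m%d).sql.gz"
scratch=$(mktemp -d)
for day in 01 02 03 04 05 06 07 08 09 10; do
  : > "$scratch/dumpall-202603$day.sql.gz"
done
prune_dumps "$scratch" 7
ls -1 "$scratch"
```

With the daily schedule "0 2 * * *" (02:00 every day) and a retention of 7, keeping seven files and keeping seven days of dumps amount to the same thing, so the two readings of retention coincide for the default values.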

Why Two PostgreSQL Instances?

AKKO uses two PostgreSQL instances by design:

Instance               Purpose          Data
akko-postgresql        Infrastructure   Keycloak, Airflow, Superset, Polaris, OpenMetadata, MLflow, JupyterHub
akko-postgresql-data   Functional       Analytics, geospatial, RAG vectors, ADEN, client data

This separation ensures:

  • Independent backup/restore strategies
  • Different scaling profiles (infra is lightweight, data can be large)
  • Security isolation (infra credentials never leak to analytics users)
  • Independent PostgreSQL tuning per workload

Resource Requirements

Component         Minimum RAM   Recommended
PostgreSQL Data   512 Mi        1 Gi

Troubleshooting

Pod Fails to Start (Permission Denied)

Symptoms: Pod enters CrashLoopBackOff. Logs show FATAL: data directory has wrong ownership.

Cause: The PVC has incorrect ownership. The akko-postgres image runs as UID 999.

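Solution: Confirm the diagnosis with the commands below, then make sure the data directory on the volume is owned by UID 999.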

# Check pod logs
kubectl logs -n akko deploy/akko-postgresql-data --tail=50

# Verify the PVC exists and is bound
kubectl get pvc -n akko | grep postgresql-data
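
Because the image runs as UID 999, it can also help to see which user and group settings the pod spec actually requests. The following sketch prints the pod-level securityContext for each matching pod. The helper name pod_security_context is illustrative, and the label selector is the one used elsewhere on this page.

```shell
# Print each PostgreSQL Data pod with its pod-level securityContext
# (runAsUser, fsGroup, and so on), one pod per line.
pod_security_context() {
  kubectl get pod -n akko -l app.kubernetes.io/name=akko-postgresql-data \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.securityContext}{"\n"}{end}'
}

# Usage:
#   pod_security_context
```

An empty securityContext means the pod relies on the image defaults and on whatever ownership the storage provisioner gave the volume.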

pgvector Extension Not Available

Symptoms: CREATE EXTENSION vector fails with could not open extension control file.

Cause: The pod is running the standard PostgreSQL image instead of the custom akko-postgres image.

Solution:

# Verify the correct image is running
kubectl describe pod -n akko -l app.kubernetes.io/name=akko-postgresql-data | grep Image

# The image should be akko-postgres:2026.03 (with PostGIS + pgvector)
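
A second check distinguishes "the image does not ship the extension" from "the extension has not been created in this database". The sketch below queries the pg_available_extensions view through a hypothetical helper, extension_availability, assuming psql can connect as postgres inside the pod.

```shell
# List whether each extension is shipped in the image and installed in akko.
# extension_availability is an illustrative helper, not part of AKKO.
extension_availability() {
  kubectl exec -n akko deploy/akko-postgresql-data -- \
    psql -U postgres -d akko -c \
    "SELECT name, default_version, installed_version
       FROM pg_available_extensions
      WHERE name IN ('vector', 'postgis', 'pgaudit')
      ORDER BY name"
}

# Usage:
#   extension_availability
```

No row for vector means the running image does not include pgvector. A row with an empty installed_version means the files are present and CREATE EXTENSION vector still needs to be run in that database.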

Connection Refused from Trino

Symptoms: Trino queries against the postgresql catalog fail with Connection refused.

Cause: NetworkPolicy blocking the connection, or PostgreSQL is not ready.

Solution:

# Check PostgreSQL readiness
kubectl get pods -n akko -l app.kubernetes.io/name=akko-postgresql-data

# Test connectivity from the Trino pod (requires pg_isready to be present in the Trino image)
kubectl exec -n akko deploy/akko-trino -- pg_isready -h akko-postgresql-data -p 5432

# Check NetworkPolicy
kubectl get networkpolicy -n akko | grep postgresql-data
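
Two further checks can narrow the cause down. The sketch below lists the Service endpoints and opens a plain TCP connection from the Trino pod without relying on PostgreSQL client tools. Both helper names are illustrative, and the TCP check assumes bash is available in the Trino image.

```shell
# List the pod IPs backing the Service.
service_endpoints() {
  kubectl get endpoints -n akko akko-postgresql-data
}

# Open a TCP connection to port 5432 from inside the Trino pod using
# bash's /dev/tcp redirection (assumes bash exists in that image).
trino_tcp_check() {
  kubectl exec -n akko deploy/akko-trino -- \
    bash -c 'exec 3<>/dev/tcp/akko-postgresql-data/5432 && echo "port 5432 reachable"'
}

# Usage:
#   service_endpoints
#   trino_tcp_check
```

An empty endpoints list means no ready pod currently backs the Service, which points at PostgreSQL readiness rather than at NetworkPolicy.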