PostgreSQL Data: Analytics Database
PostgreSQL Data is the dedicated functional database for AKKO, separate from the infrastructure PostgreSQL instance. It runs the custom akko-postgres image with PostGIS (geospatial) and pgvector (vector embeddings for RAG), serving analytics workloads, geospatial queries, and AI/ML use cases.
Architecture
JupyterHub   Trino   Airflow DAGs
           \   |   /
+--------------v--------------+
| PostgreSQL Data (:5432)     |
| akko-postgres image         |
| PostGIS + pgvector          |
+-----------------------------+
| Databases:                  |
| - akko (analytics)          |
| - aden (NL→SQL engine)      |
| - geospatial data           |
| - RAG vector store          |
+-----------------------------+
- Separate from infra PostgreSQL — never mix business data with Keycloak/Airflow/Superset metadata
- PostGIS enables geospatial queries (ST_Distance, ST_Contains, etc.)
- pgvector enables vector similarity search for RAG pipelines (Ollama embeddings)
- pgaudit provides audit logging for compliance
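To make the pgvector bullet concrete, the sketch below reproduces in plain Python what pgvector's cosine distance operator (`<=>`) computes, then ranks a few toy rows by it. The row ids, the three-dimensional vectors, and the `top_k` helper are hypothetical; in practice the embeddings would come from Ollama and the ranking would run inside PostgreSQL.

```python
import math


def cosine_distance(a, b):
    """Cosine distance as pgvector's <=> operator defines it: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, rows, k=2):
    """Return the ids of the k rows closest to the query (smallest distance first)."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [row_id for row_id, _ in ranked[:k]]


# Hypothetical toy embeddings; real models emit hundreds of dimensions.
ROWS = [
    ("doc-a", [1.0, 0.0, 0.0]),
    ("doc-b", [0.9, 0.1, 0.0]),
    ("doc-c", [0.0, 1.0, 0.0]),
]

if __name__ == "__main__":
    print(top_k([1.0, 0.0, 0.0], ROWS))
```

Inside the database the same ranking is usually written as an `ORDER BY embedding <=> :query LIMIT k` clause on a table with a `vector` column; the column name here is illustrative.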
URLs
PostgreSQL Data is an internal-only service (no ingress). It is accessed by other services within the cluster on port 5432.
| Service | Connection |
|---|---|
| Trino | jdbc:postgresql://akko-postgresql-data:5432/akko |
| JupyterHub | postgresql://akko-postgresql-data:5432/akko |
| ADEN | postgresql://akko-postgresql-data:5432/aden |
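The connection strings in the table share one host and port and differ only in scheme and database. A minimal sketch of building them programmatically follows; the helper names are hypothetical, and the credential handling is an assumption, since the table shows the URLs without a user or password.

```python
from urllib.parse import quote, urlsplit

HOST = "akko-postgresql-data"  # in-cluster service name from the table above
PORT = 5432


def jdbc_url(database):
    """JDBC form, as used by Trino. Credentials are supplied separately, not in the URL."""
    return f"jdbc:postgresql://{HOST}:{PORT}/{database}"


def libpq_url(database, user=None, password=None):
    """libpq-style URL. Credentials are percent-encoded so special characters survive."""
    auth = ""
    if user:
        auth = quote(user, safe="")
        if password:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"postgresql://{auth}{HOST}:{PORT}/{database}"


if __name__ == "__main__":
    print(jdbc_url("akko"))
    print(libpq_url("aden"))
    # Parsing the URL back confirms the host and port are intact.
    parts = urlsplit(libpq_url("akko", "postgres", "p@ss/word"))
    print(parts.hostname, parts.port)
```

Percent-encoding matters when a password read from a Kubernetes secret contains characters such as `@` or `/`, which would otherwise change how the URL is parsed.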
Configuration (Helm values)
akko-postgresql-data:
  enabled: true
  image:
    repository: akko-postgres
    tag: "2026.03"
  service:
    port: 5432
  persistence:
    enabled: true
    size: 20Gi
  database: akko
  username: postgres
  config:
    maxConnections: 200
    sharedBuffers: 256MB
    effectiveCacheSize: 768MB
    workMem: 4MB
    sharedPreloadLibraries: pgaudit
  backup:
    enabled: false
    schedule: "0 2 * * *"
    retention: 7
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 1Gi
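When tuning these values it can help to compare the PostgreSQL memory settings with the container memory limit. The sketch below is a rough upper-bound estimate and not a sizing rule: it assumes one `work_mem` allocation per connection, whereas a single query may use several and idle connections use none. The helper names are hypothetical.

```python
import re

# PostgreSQL memory units and Kubernetes binary suffixes are both powers of 1024.
UNITS = {
    "kB": 1024, "MB": 1024**2, "GB": 1024**3,
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
}


def to_bytes(value):
    """Convert a string such as '256MB' or '1Gi' to a byte count."""
    match = re.fullmatch(r"(\d+)\s*(kB|MB|GB|Ki|Mi|Gi)", value)
    if match is None:
        raise ValueError(f"unsupported size: {value!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def rough_peak_bytes(shared_buffers, work_mem, max_connections):
    """shared_buffers plus one work_mem allocation for every allowed connection."""
    return to_bytes(shared_buffers) + max_connections * to_bytes(work_mem)


if __name__ == "__main__":
    peak = rough_peak_bytes("256MB", "4MB", 200)  # values from the config above
    limit = to_bytes("1Gi")                        # resources.limits.memory
    print(peak // 1024**2, limit // 1024**2, peak > limit)
```

Under this assumption the values above give an estimate of 1056 MiB against a 1024 MiB limit, so a workload that really opens 200 busy connections would need a larger limit or a lower `maxConnections`. `effectiveCacheSize` is left out because it is a planner hint and not an allocation.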
Health Check
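PostgreSQL listens on port 5432. The liveness and readiness probes both run pg_isready inside the container to check that the server is accepting connections: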
livenessProbe:
  exec:
    command:
      - pg_isready
      - -U
      - postgres
  initialDelaySeconds: 15
  periodSeconds: 30
readinessProbe:
  exec:
    command:
      - pg_isready
      - -U
      - postgres
  initialDelaySeconds: 10
  periodSeconds: 10
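An exec probe succeeds only when the command exits with status 0, so the pg_isready exit code determines the probe result. The sketch below maps the documented pg_isready exit codes to their meaning. The helper names are hypothetical, and `run_pg_isready` assumes the PostgreSQL client tools are on the PATH.

```python
import subprocess

# Exit codes documented for pg_isready.
PG_ISREADY_STATUS = {
    0: "accepting connections",
    1: "rejecting connections (for example during startup)",
    2: "no response to the connection attempt",
    3: "no attempt made (for example invalid parameters)",
}


def describe(exit_code):
    """Translate a pg_isready exit code into a readable status."""
    return PG_ISREADY_STATUS.get(exit_code, f"unexpected exit code {exit_code}")


def probe_passes(exit_code):
    """Kubernetes counts an exec probe as successful only on exit code 0."""
    return exit_code == 0


def run_pg_isready(user="postgres"):
    """Run the same command as the probes and return its exit code."""
    result = subprocess.run(["pg_isready", "-U", user], capture_output=True)
    return result.returncode


if __name__ == "__main__":
    for code in (0, 1, 2, 3):
        print(code, probe_passes(code), describe(code))
```

Running `run_pg_isready()` from a shell inside the pod is one way to see which of the failing states a restarting container is in.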
RBAC (who can access)
PostgreSQL Data is accessible only within the cluster (no ingress). Access is controlled by:
- Database credentials — stored in Kubernetes secrets
- NetworkPolicy — restricts which pods can connect to port 5432
- pgaudit — logs all SQL statements for compliance auditing
Key Features
| Feature | Description |
|---|---|
| PostGIS | Geospatial data types and functions (geometry, geography, raster) |
| pgvector | Vector similarity search for RAG/AI pipelines |
| pgaudit | SQL audit logging for compliance |
| Automated backup | Optional CronJob with pg_dumpall and configurable retention |
| Tunable config | ConfigMap-based postgresql.conf tuning |
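The backup retention behaviour can be illustrated with a small sketch. It assumes the retention value counts days and that each dump file carries a creation time; neither the unit nor the file naming is stated in this page, so the names and dates below are hypothetical.

```python
from datetime import datetime, timedelta


def expired_dumps(dumps, retention_days, now):
    """Return the names of dumps created before the retention cutoff.

    dumps maps a file name to its creation time.
    """
    cutoff = now - timedelta(days=retention_days)
    return sorted(name for name, created in dumps.items() if created < cutoff)


# Hypothetical daily dumps taken at 02:00, matching a "0 2 * * *" schedule.
NOW = datetime(2026, 3, 10, 2, 0)
DUMPS = {
    f"dumpall-2026-03-{day:02d}.sql": datetime(2026, 3, day, 2, 0)
    for day in range(1, 11)
}

if __name__ == "__main__":
    print(expired_dumps(DUMPS, 7, NOW))
```

A dump created exactly at the cutoff is kept because the comparison is strict. If retention were a count of files instead of a number of days, the function would keep the newest N entries.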
Why Two PostgreSQL Instances?
AKKO uses two PostgreSQL instances by design:
| Instance | Purpose | Data |
|---|---|---|
| akko-postgresql | Infrastructure | Keycloak, Airflow, Superset, Polaris, OpenMetadata, MLflow, JupyterHub |
| akko-postgresql-data | Functional | Analytics, geospatial, RAG vectors, ADEN, client data |
This separation ensures:
- Independent backup/restore strategies
- Different scaling profiles (infra is lightweight, data can be large)
- Security isolation (infra credentials never leak to analytics users)
- Independent PostgreSQL tuning per workload
Resource Requirements
| Component | Minimum RAM | Recommended |
|---|---|---|
| PostgreSQL Data | 512 Mi | 1 Gi |
Troubleshooting
Pod Fails to Start (Permission Denied)
Symptoms: Pod enters CrashLoopBackOff. Logs show FATAL: data directory has wrong ownership.
Cause: The PVC has incorrect ownership. The akko-postgres image runs as UID 999.
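Solution: Confirm the error in the logs and check that the PVC is bound. If ownership is wrong, the volume must be made writable by UID 999 (in Kubernetes this is typically done with fsGroup in the pod securityContext).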
# Check pod logs
kubectl logs -n akko deploy/akko-postgresql-data --tail=50
# Verify the PVC exists and is bound
kubectl get pvc -n akko | grep postgresql-data
pgvector Extension Not Available
Symptoms: CREATE EXTENSION vector fails with could not open extension control file.
Cause: The pod is running the standard PostgreSQL image instead of the custom akko-postgres image.
Solution:
# Verify the correct image is running
kubectl describe pod -n akko -l app.kubernetes.io/name=akko-postgresql-data | grep Image
# The image should be akko-postgres:2026.03 (with PostGIS + pgvector)
Connection Refused from Trino
Symptoms: Trino queries against the postgresql catalog fail with Connection refused.
Cause: NetworkPolicy blocking the connection, or PostgreSQL is not ready.
Solution:
# Check PostgreSQL readiness
kubectl get pods -n akko -l app.kubernetes.io/name=akko-postgresql-data
# Test connectivity from the Trino pod (assumes pg_isready is available in the Trino image)
kubectl exec -n akko deploy/akko-trino -- pg_isready -h akko-postgresql-data -p 5432
# Check NetworkPolicy
kubectl get networkpolicy -n akko | grep postgresql-data
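If pg_isready is not available in the client pod, a plain TCP check from any pod with Python, such as a JupyterHub notebook, can help tell a blocked connection from a server that is not running. This is a minimal sketch with a hypothetical helper name. It tests only that the port accepts a TCP connection, not that PostgreSQL authenticates the client.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Calling `can_connect("akko-postgresql-data", 5432)` from a pod the NetworkPolicy allows should return True once PostgreSQL is ready. A False result from an allowed pod points at the database, and a False result from only some pods points at the NetworkPolicy.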