Runbook: Trino slow queries / high latency¶
Alerte : TrinoHighQueryLatency (severity warning)
Trino P95 query latency > 5s
Diagnostic¶
1. Identifier les slow queries¶
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
# Via Trino UI
kubectl port-forward -n akko svc/akko-trino 8080:8080
# Open http://localhost:8080 → Query History
Ou via SQL :
-- Top 10 queries les plus lentes des dernières 24h
SELECT
user, query,
date_diff('second', create_time, end_time) AS duration_s,
total_bytes / 1024.0 / 1024.0 AS mb_scanned,
total_rows
FROM system.runtime.queries
WHERE create_time > NOW() - INTERVAL '24' HOUR
AND state = 'FINISHED'
ORDER BY duration_s DESC
LIMIT 10;
2. Vérifier la mémoire Trino coordinator + workers¶
Si proches des limites, queries timeout ou queue build up.
3. Stats Iceberg metadata¶
-- Iceberg: fichiers orphelins, manifestes non compactés
SHOW STATS FOR iceberg.banking.customers;
SELECT * FROM iceberg.banking."customers$files" LIMIT 10;
SELECT * FROM iceberg.banking."customers$manifests" LIMIT 10;
Causes + remediation¶
Cas 1 : Table Iceberg fragmentée (beaucoup de petits fichiers)¶
Fix : compaction Iceberg
ALTER TABLE iceberg.banking.customers EXECUTE optimize;
ALTER TABLE iceberg.banking.customers EXECUTE expire_snapshots(retention_threshold => '7d');
ALTER TABLE iceberg.banking.customers EXECUTE remove_orphan_files;
Cas 2 : Pas assez de workers¶
Fix : bump trino.server.worker.count dans values
Cas 3 : Query inefficace (full scan, cross-join)¶
Fix : analyser via EXPLAIN ANALYZE
Conseiller partitioning / bucketing sur les tables souvent filtrées.
Cas 4 : OPA policy latency (chaque query check OPA)¶
Vérifier :
Si OPA ralentit, revoir les policies (cache, decision log).
Lessons learned¶
Pas encore de L spécifique pour Trino perf. Ajouter dans _RULES.md
après 1er incident résolu.
Voir aussi¶
- Trino service docs
- OPA service docs
- Dashboard Dashboards "Trino SLO" (Sprint 22.1.4)