08 — Troubleshooting¶
Time : reference · Persona : any · Path : A or B
A pragmatic catalogue of the issues you are most likely to hit during onboarding, with the proven fix.
Pre-requisites
- You have at least one platform handy (
https://demo.akko-ai.comor your localhttps://demo.akko.local).
Quick health check¶
Run this first whenever something feels off :
# Live demo
curl -sk https://demo.akko-ai.com/api/health/all | jq
# Local k3d
curl -sk https://demo.akko.local/api/health/all | jq
Expected : every layer returns { "status": "ok" }.
If any layer is degraded, the rest of this page tells you which knob to turn.
1. "Service shows down" in the cockpit¶
| Cause | Fix |
|---|---|
| Sub-chart still rolling | kubectl get pods -n akko and wait for Running. |
| Sub-chart crash loop | kubectl logs -n akko <pod> → look for CrashLoopBackOff root cause. |
| Network policy blocks the health probe | kubectl describe pod -n akko <pod> → "egress denied". Update the NP. |
| Health endpoint moved in a new release | kubectl describe svc -n akko <svc> then update the cockpit health route map. |
| External dependency down | If your catalog source is offline, the source health goes red, not the AKKO layer. |
First reflex. Look at the cockpit Logs page filtered by the failing service before diving into
kubectl.
2. SSO redirect loop¶
You sign in, you bounce back to the login screen, you sign in again, you bounce again…
| Cause | Fix |
|---|---|
| Third-party cookies blocked | Allow cookies for *.akko-ai.com (or your local domain). |
| Safari ITP cookie partitioning | Open the cockpit in a non-private window; ensure "Prevent cross-site tracking" doesn't strip the SSO cookie. |
| Clock skew between browser and identity provider | Sync your laptop clock. |
| Cookies left over from a previous deploy | Clear cookies for the AKKO domain, sign in again. |
| Identity provider client redirect-uri mismatch | Run bash helm/scripts/keycloak-rest-patch-redirect-uris.sh (local) or wait for the redirect-uri sync hook (live). |
3. ADEN result is empty¶
| Cause | Fix |
|---|---|
| Catalog just reseeded | Wait 30 sec, click Generate again. |
| Scope excludes your tables | Switch scope chip to one your persona owns. |
| Row filter masked all rows | Try a less restrictive question (e.g. drop the region filter). |
| Semantic-layer match score very low | Reword the question with explicit table names. |
| Engine warmup | Second run is < 3 sec — retry. |
4. Playwright tests skip¶
You ran pytest tests/e2e/playwright/ and most tests reported SKIPPED.
| Cause | Fix |
|---|---|
Identity provider FQDN drift (e.g. auth.<domain> instead of identity.<domain>) |
The suite auto-resolves identity / keycloak / auth. If all three fail, check the frontal routing. |
| Cluster not reachable | curl -sk https://demo.akko-ai.com/ should return 302. If 000, check DNS / VPN. |
| Persona not seeded | bash helm/scripts/seed-keycloak-personas.sh. |
| Wrong domain env | AKKO_DOMAIN=demo.akko-ai.com pytest …. |
| Cold start, identity provider returning 504 | Wait 30 sec then re-run. |
Skip-cleanly is by design : the suite refuses to assert green on a half-up platform. Read the skip reason in the pytest output.
5. Helm upgrade failed¶
| Cause | Fix |
|---|---|
You used --reset-values |
NEVER use --reset-values. Always --reuse-values -f helm/examples/values-netcup.yaml. |
Sub-chart image cached as docker.io/library/akko-X:tag |
Containerd cache shadow. Re-tag with the local registry prefix. |
Realm file not passed via --set-file |
Add --set-file akko-keycloak.realm.data=helm/examples/realm-akko-k3d.json. |
Half-migration : _required("AKKO_FOO") in code without the matching helm env: name: AKKO_FOO |
Update both code and helm in the same PR. |
| Pod waits for a Secret that the chart didn't create | Check kubectl describe pod for FailedMount. |
Golden rule. Helm upgrades should never need a CLI mutation. If you reached for
kubectl set image, capture the gap as a chart fix and reissue the upgrade.
6. The cockpit shows the previous build¶
| Cause | Fix |
|---|---|
Image tag reused (e.g. akko-cockpit:2026.05) |
Run bash scripts/deploy-cockpit.sh — it timestamps the tag. |
| Browser cache | Hard refresh (Cmd-Shift-R). |
| Service worker holding the old bundle | DevTools → Application → Service Workers → Unregister, refresh. |
| Pod rollout pending | kubectl rollout status deploy/akko-cockpit -n akko. |
7. Local k3d : pods stay Pending¶
| Cause | Fix |
|---|---|
| Out of memory on Docker Desktop | Raise to 16 GB in Docker Desktop preferences. |
| Out of disk | docker system prune -f. |
imagePullBackOff on a custom image |
Run bash helm/scripts/build-images.sh to push to the k3d local registry. |
| PVC unbound | k3d local-path provisioner missing; re-run bash helm/scripts/k3d-create.sh. |
Links to deeper runbooks¶
- Pod CrashLoopBackOff runbook
- ImagePullBackOff runbook
- Keycloak SSO broken runbook
- Helm upgrade stuck runbook
- Smoke test failing runbook
- Trino slow queries runbook
When in doubt¶
- Check the cockpit Alerts page — it bubbles up actionable alerts before anyone has to ask.
- Open
DEMO_HOST/docs/admin/troubleshooting/for the full troubleshooting handbook. - File an issue on github.com/AKKO-p/AKKO with the curl health output attached.
Next : 09 — Next steps.