
ADR-035 update — Caddy host-side reverse proxy on Netcup

Date: 2026-04-25 (Sprint 47 V2 close-out)

Discovery during implementation

After provisioning the wildcard cert via cert-manager + Let's Encrypt DNS-01, new sub-domains (lab, bi, federation, etc.) still returned tlsv1 alert internal error. Deep investigation revealed the actual TLS termination architecture on Netcup:

Internet → :443 → docker-proxy → Caddy 2 (host container) → :30080 → Traefik (k8s) → Service

The Caddy host-side container climscore-caddy-1 listens on port 443 of the node, handles its own ACME flow per-host (HTTP-01 to Let's Encrypt), and forwards the proxied traffic to Traefik on NodePort 30080 over plain HTTP.

This means:

  • Traefik's Ingress.tls.secretName and TLSStore.default are never reached for TLS termination on this cluster; they are only relevant if the operator hits Traefik directly (e.g. via another LoadBalancer in front).
  • The cert-manager wildcard cert (akko-wildcard-tls) is provisioned but currently unused on Netcup.
  • New sub-domains MUST be declared in /opt/climscore/Caddyfile for Caddy to obtain a Let's Encrypt cert and forward the traffic.
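One way to confirm which layer's certificate a client actually receives is to look at the SAN list of the served certificate. The sketch below is illustrative only: cert_kind and served_san are hypothetical helper names, `-ext subjectAltName` assumes OpenSSL 1.1.1 or newer, and the `*.akko-ai.com` SAN is an assumption about what the wildcard cert contains (the ADR does not state it).

```shell
#!/bin/sh
# Hypothetical helper: classify a certificate by its SAN list.
# Reads the text printed by `openssl x509 -noout -ext subjectAltName` on stdin.
# Assumption: the cert-manager wildcard cert carries a *.akko-ai.com SAN.
cert_kind() {
    if grep -q 'DNS:\*\.akko-ai\.com'; then
        echo "wildcard"
    else
        echo "per-host"
    fi
}

# Fetch the SAN list of the certificate served for one host on port 443.
# SNI is set explicitly so the front-end picks the cert for that host.
served_san() {
    host="$1"
    openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null |
        openssl x509 -noout -ext subjectAltName
}
```

Usage: `served_san lab.akko-ai.com | cert_kind`. Given the architecture described above, a "per-host" result would be consistent with Caddy terminating TLS rather than Traefik.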

Sprint 47 V2 actual fix

The 16 new functional FQDNs were appended to the AKKO services block in the host Caddyfile:

lab.akko-ai.com,
bi.akko-ai.com,
orchestrator.akko-ai.com,
federation.akko-ai.com,
compute.akko-ai.com,
experiments.akko-ai.com,
llm.akko-ai.com,
directory.akko-ai.com,
metrics.akko-ai.com,
storage.akko-ai.com,
logs.akko-ai.com,
identity.akko-ai.com,
alerts.akko-ai.com,
mcp.akko-ai.com,
mcp-catalog.akko-ai.com,
tempo.akko-ai.com {
    reverse_proxy http://host.docker.internal:30080 { ... }
}
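Since every new sub-domain has to be added to this address list by hand, a small generator can reduce typos. This is a minimal sketch, not part of the shipped tooling: fqdn_list is a hypothetical name, and it only prints the comma-separated site addresses; the reverse_proxy body is left exactly as it stands in the host Caddyfile.

```shell
#!/bin/sh
# Hypothetical helper: print the comma-separated site-address list for a
# Caddyfile block, one FQDN per line, from bare sub-domain labels.
# The domain akko-ai.com comes from the ADR; the function name is illustrative.
fqdn_list() {
    domain="akko-ai.com"
    sep=""
    for label in "$@"; do
        # Emit the separator before every entry except the first,
        # so the last FQDN carries no trailing comma.
        printf '%s%s.%s' "$sep" "$label" "$domain"
        sep=',
'
    done
    printf '\n'
}
```

Usage: `fqdn_list lab bi orchestrator` prints three lines that can be pasted in front of the opening brace of the site block.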

After caddy reload, Caddy auto-provisioned LE certs per-host on first HTTP hit. All 16 sub-domains now serve valid HTTPS.

Multi-infra portability

The cert-manager + wildcard cert infrastructure shipped in this sprint remains valid and durable for every other deploy target:

  • k3d / k3s lab: Traefik is the front-end → Ingress.tls or TLSStore picks up the wildcard cert directly.
  • EKS / AKS / GKE: NGINX-Ingress / GKE-LB → reads Ingress.tls.secretName natively.
  • OpenShift: Routes consume the wildcard Secret.
  • bare-metal: same as k3d.
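On the targets where the ingress controller terminates TLS itself, the wildcard Secret is referenced through the standard Ingress tls stanza. The sketch below prints such a manifest; render_ingress is a hypothetical helper, the host, Service name and port are caller-supplied placeholders, and ingressClassName is omitted because it depends on the cluster. It assumes the akko-wildcard-tls Secret lives in the same namespace as the Ingress, which Kubernetes requires.

```shell
#!/bin/sh
# Hypothetical helper: print an Ingress manifest that terminates TLS with the
# wildcard Secret. Host, Service name and port are caller-supplied placeholders.
render_ingress() {
    host="$1"; service="$2"; port="$3"
    cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${service}
spec:
  tls:
    - hosts:
        - ${host}
      secretName: akko-wildcard-tls
  rules:
    - host: ${host}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ${service}
                port:
                  number: ${port}
EOF
}
```

Usage: `render_ingress lab.akko-ai.com lab-svc 8080 | kubectl apply -f -`, where lab-svc and 8080 are made-up values for illustration.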

Only Netcup-with-host-Caddy has this special path. The Caddyfile update is documented as a Netcup-specific operational step in docs/admin/deploy-netcup-tls.md (created in this commit).

Long-term recommendation

Replace the host-side Caddy with a vanilla Traefik LoadBalancer exposing port 443 directly, so the cert-manager wildcard cert becomes the single source of truth on Netcup too. ADR follow-up (Sprint 48 candidate).

Verification (2026-04-25)

$ for h in lab bi orchestrator federation experiments metrics directory llm; do
    curl -sI -k -m 5 -o /dev/null -w "$h: HTTP %{http_code}\n" "https://$h.akko-ai.com/"
  done
lab:           HTTP 302
bi:            HTTP 302
orchestrator:  HTTP 405
federation:    HTTP 303
experiments:   HTTP 302
metrics:       HTTP 405
directory:     HTTP 302
llm:           HTTP 302

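All TLS handshakes succeed for the eight hosts sampled. Note that -k skips certificate verification, so this run confirms the handshake and the HTTP response, not chain trust. Sprint 47 closed.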
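A stricter variant of the check drops -k, so that curl also validates the certificate chain and host name, and can be run over all 16 sub-domains. This is a sketch under assumptions: classify_curl_exit and check_hosts are hypothetical names, and the labels are an informal mapping of curl's documented exit codes.

```shell
#!/bin/sh
# Map a curl exit status to a short TLS-oriented label.
# Exit codes are the documented curl ones: 0 ok, 6 DNS, 7 connect,
# 28 timeout, 35 TLS handshake failure, 60 certificate verification failure.
classify_curl_exit() {
    case "$1" in
        0)  echo "ok" ;;
        6)  echo "dns-failure" ;;
        7)  echo "connect-failure" ;;
        28) echo "timeout" ;;
        35) echo "tls-handshake-failure" ;;
        60) echo "cert-verify-failed" ;;
        *)  echo "curl-exit-$1" ;;
    esac
}

# Probe each sub-domain WITHOUT -k so curl verifies the certificate chain.
check_hosts() {
    for h in "$@"; do
        rc=0
        curl -sI -m 5 -o /dev/null "https://$h.akko-ai.com/" || rc=$?
        printf '%s: %s\n' "$h" "$(classify_curl_exit "$rc")"
    done
}
```

Usage: `check_hosts lab bi orchestrator federation compute experiments llm directory metrics storage logs identity alerts mcp mcp-catalog tempo`. A 4xx or 3xx HTTP status still counts as "ok" here, since only the transport and TLS layers are being checked.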