mTLS service mesh — Linkerd activation runbook¶
AKKO ships an akko-mtls sub-chart (ADR-037) that wraps Linkerd to
encrypt and mutually authenticate every pod-to-pod call inside the
akko namespace. Off by default — this page documents the
activation when you decide to flip it on.
Why mTLS¶
Today, every service-to-service call (Trino → Polaris, Airflow → PostgreSQL, ADEN → OPA, …) travels in plaintext over HTTP. NetworkPolicy gates who can talk to whom, but not what travels on the wire. mTLS adds two guarantees :
- Encryption: a compromised pod, or a malicious operator with kubectl exec, can no longer sniff sensitive traffic via tcpdump.
- Mutual authentication: each pod proves its identity via a short-lived certificate issued by Linkerd. No more "blind trust" between services that share a namespace.
Compliance benefit : SOC2 / ISO27001 / PCI-DSS controls on data-in-transit encryption are addressed mesh-wide, without per-service configuration.
Why Linkerd (and not Istio / Cilium)¶
See ADR-037. Short version :
- Linkerd is CNCF Graduated, Apache 2.0, Foundation-neutral
- ~10 MB RAM per sidecar (Istio is 15× larger; sinks the Netcup demo box)
- CNI-agnostic — works on existing k3s/EKS/GKE without disruption
- Single sub-chart, no Lego coupling explosion
Prerequisites¶
- Kubernetes 1.27+ (Linkerd 2.16 requirement)
- cert-manager already installed (re-used as the Linkerd identity issuer)
- Helm 3.13+
- ~500 MB free RAM cluster-wide for sidecars
- One-time pod restart cluster-wide (sidecar injection happens at admission)
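The version prerequisites above can be checked before touching the cluster. The following is a minimal bash sketch, not part of the AKKO tooling: `version_ge` and `preflight` are hypothetical helper names, and the parsing assumes the usual output of `helm version --short` and `kubectl version -o json`.

```shell
# version_ge MIN ACTUAL: succeeds when ACTUAL >= MIN (dotted numeric versions).
version_ge() {
  local min="$1" actual="$2"
  # sort -V orders version strings; if MIN sorts first, ACTUAL is new enough.
  [ "$(printf '%s\n%s\n' "$min" "$actual" | sort -V | head -n 1)" = "$min" ]
}

# preflight: compares the Helm and Kubernetes versions against the minimums listed above.
preflight() {
  local helm_v kube_v
  # "v3.14.2+gabc123" -> "3.14.2"
  helm_v="$(helm version --short | sed 's/^v//; s/+.*//')"
  # Last gitVersion in the JSON is normally the server version.
  kube_v="$(kubectl version -o json | sed -n 's/.*"gitVersion": *"v\([0-9.]*\).*/\1/p' | tail -n 1)"
  version_ge 3.13 "$helm_v" || { echo "Helm $helm_v is older than 3.13" >&2; return 1; }
  version_ge 1.27 "$kube_v" || { echo "Kubernetes $kube_v is older than 1.27" >&2; return 1; }
  # cert-manager is assumed present when its Certificate CRD exists.
  kubectl get crd certificates.cert-manager.io >/dev/null || { echo "cert-manager CRDs not found" >&2; return 1; }
  echo "preflight OK (helm $helm_v, kubernetes $kube_v)"
}
```

Run `preflight` from a shell that already points at the target cluster; it does not check free RAM.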
Activation¶
Step 1 — Install Linkerd CRDs (one-time per cluster)¶
helm repo add linkerd https://helm.linkerd.io/stable
helm repo update
helm install linkerd-crds linkerd/linkerd-crds \
  -n linkerd --create-namespace
Step 2 — Install Linkerd control plane¶
The trust anchor and issuer certificates are best provisioned via
cert-manager; AKKO ships an example Issuer aligned with the
existing wildcard ClusterIssuer. The commands below show the manual
alternative, generating both certificates with the step CLI :
# Generate a cluster-wide trust anchor (root CA, kept offline)
step certificate create root.linkerd.cluster.local \
  ca.crt ca.key --profile root-ca \
  --no-password --insecure
# Generate the per-cluster issuer cert
step certificate create identity.linkerd.cluster.local \
  issuer.crt issuer.key --profile intermediate-ca \
  --not-after 8760h --no-password --insecure \
  --ca ca.crt --ca-key ca.key
# Install Linkerd control plane with the certs
helm install linkerd-control-plane linkerd/linkerd-control-plane \
  -n linkerd \
  --set-file identityTrustAnchorsPEM=ca.crt \
  --set-file identity.issuer.tls.crtPEM=issuer.crt \
  --set-file identity.issuer.tls.keyPEM=issuer.key
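Before handing the certificates to Helm, it can be worth confirming that the issuer certificate really chains to the trust anchor. The following is a minimal bash sketch under the assumption that openssl is installed; `verify_issuer_chain` is a hypothetical helper name and the 30-day expiry margin is an arbitrary choice.

```shell
# verify_issuer_chain CA_CRT ISSUER_CRT
# Returns 0 when ISSUER_CRT is signed by CA_CRT and stays valid for at least 30 more days,
# 2 when a file is missing, and 1 on any verification failure.
verify_issuer_chain() {
  local ca="$1" issuer="$2"
  [ -r "$ca" ] && [ -r "$issuer" ] || { echo "missing certificate file" >&2; return 2; }
  # Chain check: the issuer must verify against the trust anchor.
  openssl verify -CAfile "$ca" "$issuer" >/dev/null || return 1
  # -checkend N fails when the certificate expires within N seconds.
  openssl x509 -in "$issuer" -noout -checkend $((30 * 24 * 3600)) >/dev/null || return 1
}
```

For the files generated above, the call would be `verify_issuer_chain ca.crt issuer.crt`.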
Step 3 — Activate the AKKO mTLS sub-chart¶
helm upgrade akko helm/akko -n akko \
  --set akko-mtls.enabled=true \
  --set akko-mtls.injectNamespace=true \
  -f helm/examples/values-domain.yaml \
  -f helm/examples/values-dev-secrets.yaml
Step 4 — Restart AKKO workloads to pick up sidecars¶
kubectl rollout restart deploy -l app.kubernetes.io/part-of=akko -n akko
kubectl rollout restart statefulset -l app.kubernetes.io/part-of=akko -n akko
This triggers a rolling restart of ~50 deployments; full sidecar pickup typically takes ~5 min on the Netcup demo box.
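The restart commands return immediately, so a small wait loop helps confirm that every workload actually came back. The following is a minimal bash sketch: `wait_for_rollouts` is a hypothetical helper, and the 300s per-workload timeout is an assumption based on the ~5 min figure above.

```shell
# wait_for_rollouts [NAMESPACE] [LABEL_SELECTOR]
# Blocks until every matching Deployment and StatefulSet has finished rolling out.
# Returns 1 if any rollout fails or times out.
wait_for_rollouts() {
  local ns="${1:-akko}" selector="${2:-app.kubernetes.io/part-of=akko}"
  local kind name failed=0
  for kind in deployment statefulset; do
    # -o name prints entries such as "deployment.apps/akko-akko-aden"
    for name in $(kubectl get "$kind" -n "$ns" -l "$selector" -o name); do
      kubectl rollout status "$name" -n "$ns" --timeout=300s || failed=1
    done
  done
  return "$failed"
}
```

Called with no arguments, it waits on the same label selector used by the restart commands.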
Step 5 — Verify mTLS is active¶
Install Linkerd CLI :
Run the cluster check :
Expected : every section ✓.
Verify mTLS handshake on a sample call :
Look for "tls": "true" in the output.
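A sketch of commands matching the three actions in this step, written as bash helpers. It assumes the edge-channel install script from the Linkerd getting-started guide and an installed viz extension (`linkerd viz install | kubectl apply -f -`), since `tap` lives there. `install_linkerd_cli` and `check_tls` are hypothetical names, and the 200-line sample size is arbitrary.

```shell
# Install the Linkerd CLI into ~/.linkerd2/bin (edge channel is an assumption).
install_linkerd_cli() {
  curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install-edge | sh
  export PATH="$HOME/.linkerd2/bin:$PATH"
}

# check_tls TARGET [NAMESPACE] [LINES]
# Samples the first LINES lines of tap JSON output and prints how many carry "tls": "true".
# Exit status is non-zero when none are found. Needs live traffic, otherwise tap keeps waiting.
check_tls() {
  local target="$1" ns="${2:-akko}" lines="${3:-200}"
  linkerd viz tap "$target" -n "$ns" -o json | head -n "$lines" | grep -c '"tls": *"true"'
}

# Typical sequence:
#   install_linkerd_cli
#   linkerd check
#   check_tls deploy/akko-akko-aden
```

A count of zero on a workload that is receiving requests suggests its callers are not meshed yet.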
Phased rollout (recommended)¶
- Sprint N (audit): injectNamespace=false. Pick one workload (e.g. akko-akko-aden), annotate its pod template with linkerd.io/inject: enabled, restart, and observe linkerd check for 24h.
- Sprint N+1 (cluster-wide audit): injectNamespace=true, restart all workloads, and watch the ratio of the Prometheus metric request_total{tls="true"} reach ~100%.
- Sprint N+2 (enforce): annotate each workload with config.linkerd.io/default-inbound-policy: cluster-authenticated so plaintext requests are REJECTED.
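The annotation steps in the first and third phases can be applied with a merge patch on the pod template. The following is a minimal bash sketch with hypothetical helper names (`annotation_patch`, `annotate_workload`). It patches the live object directly, so a later helm upgrade may revert the change unless the same annotation is also set in the chart values.

```shell
# annotation_patch KEY VALUE
# Prints a JSON merge patch that sets one annotation on the pod template.
annotation_patch() {
  printf '{"spec":{"template":{"metadata":{"annotations":{"%s":"%s"}}}}}' "$1" "$2"
}

# annotate_workload TARGET KEY VALUE [NAMESPACE]
# Changing the pod template triggers a rolling restart of the workload.
annotate_workload() {
  local target="$1" key="$2" value="$3" ns="${4:-akko}"
  kubectl patch "$target" -n "$ns" --type merge -p "$(annotation_patch "$key" "$value")"
}

# Sprint N:    annotate_workload deploy/akko-akko-aden linkerd.io/inject enabled
# Sprint N+2:  annotate_workload deploy/akko-akko-aden \
#                config.linkerd.io/default-inbound-policy cluster-authenticated
```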
Troubleshooting¶
Linkerd identity issuer cert expiry¶
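Linkerd issues workload (proxy) certificates with a 24h lifetime and
rotates them automatically. The issuer certificate is not rotated this
way: the one generated in Step 2 is valid for 8760h (one year). If
linkerd check shows the issuer cert is expiring, generate a new issuer
certificate from the same trust anchor and run :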
helm upgrade linkerd-control-plane linkerd/linkerd-control-plane \
  -n linkerd \
  --set-file identityTrustAnchorsPEM=ca.crt \
  --set-file identity.issuer.tls.crtPEM=issuer-renewed.crt \
  --set-file identity.issuer.tls.keyPEM=issuer-renewed.key \
  --reuse-values
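The upgrade command expects issuer-renewed.crt and issuer-renewed.key to exist already. One way to produce them, sketched here under the assumption that the offline ca.crt and ca.key from Step 2 are available, is to repeat the issuer generation with new output names. `renew_issuer` is a hypothetical helper name.

```shell
# renew_issuer [CA_CRT] [CA_KEY]
# Creates issuer-renewed.crt / issuer-renewed.key, signed by the existing trust anchor,
# with the same one-year lifetime used in Step 2.
renew_issuer() {
  local ca_crt="${1:-ca.crt}" ca_key="${2:-ca.key}"
  step certificate create identity.linkerd.cluster.local \
    issuer-renewed.crt issuer-renewed.key \
    --profile intermediate-ca --not-after 8760h \
    --no-password --insecure \
    --ca "$ca_crt" --ca-key "$ca_key"
}
```

Because the trust anchor is unchanged, proxies should keep trusting each other while the control plane picks up the new issuer.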
Pods stuck pending after restart¶
Sidecar injection adds a ~50 MB memory request per pod. If the Netcup
box hits its 16 GB ceiling, pods stay Pending; lower the proxy memory
request via the proxy.resources.memory.request value of the
linkerd-control-plane chart, then restart the affected workloads.
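A sketch of how this could look in practice, in bash. It assumes the upstream linkerd-control-plane chart exposes the proxy memory request as proxy.resources.memory.request and that the release was installed as in Step 2. `list_pending` and `set_proxy_memory_request` are hypothetical helper names, and 20Mi is only an example value.

```shell
# list_pending [NAMESPACE]: show pods the scheduler could not place.
list_pending() {
  kubectl get pods -n "${1:-akko}" --field-selector=status.phase=Pending
}

# set_proxy_memory_request SIZE: lower the default memory request injected with each proxy.
# Already-running pods keep their old request until they are restarted.
set_proxy_memory_request() {
  helm upgrade linkerd-control-plane linkerd/linkerd-control-plane \
    -n linkerd --reuse-values \
    --set proxy.resources.memory.request="$1"
}

# Example:
#   list_pending
#   set_proxy_memory_request 20Mi
```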
Spark / Trino federation breaks¶
Spark Connect and Trino federation use HTTP/2 + gRPC. Linkerd handles
both transparently — but the first call after enabling the sidecar
can fail with "EOF" while the proxy negotiates the TLS handshake.
Retry once. If it persists, check linkerd viz stat deploy/akko-akko-spark-master.
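The "retry once" advice can be wrapped in a small helper for smoke tests. This is a minimal bash sketch: `retry_once` and the `RETRY_DELAY` variable are hypothetical names, and the 2-second default pause is an arbitrary choice.

```shell
# retry_once COMMAND [ARGS...]
# Runs the command; if it fails, waits RETRY_DELAY seconds (default 2) and runs it one more time.
# The exit status is that of the last attempt.
retry_once() {
  "$@" && return 0
  sleep "${RETRY_DELAY:-2}"
  "$@"
}

# Example (the service URL is hypothetical):
#   retry_once curl -fsS http://akko-akko-trino:8080/v1/info
```

A second failure should be treated as a real problem and investigated rather than retried again.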
Disable mTLS (rollback)¶
helm upgrade akko helm/akko -n akko \
  --set akko-mtls.enabled=false \
  --set akko-mtls.injectNamespace=false
kubectl rollout restart deploy -l app.kubernetes.io/part-of=akko -n akko
kubectl rollout restart statefulset -l app.kubernetes.io/part-of=akko -n akko
Per-service rollout (Sprint 60.1)¶
Two activation strategies are supported. Pick one — never both.
Strategy A : namespace-wide¶
Inject every pod in the akko namespace. Simplest, but rolls every service over at once.
Strategy B : per-service opt-in (recommended)¶
Activate one service at a time, validate the sidecar, then move to the next. Defaults are off so a fresh helm upgrade adds nothing.
global:
  serviceMesh:
    linkerdInject:
      aden: true       # phase 1: internal API, easy rollback
      keycloak: false  # phase 2
      postgres: false  # phase 3 (StatefulSet, drain carefully)
      trino: false     # phase 4: wide blast radius
akko-mtls:
  enabled: true
  injectNamespace: false  # NOT namespace-wide, per-service only
After flipping a flag :
kubectl -n akko rollout restart deploy/akko-akko-aden # or sts/akko-akko-postgresql
kubectl -n akko get pods -l app.kubernetes.io/name=akko-aden -o jsonpath='{.items[*].spec.containers[*].name}'
# Expected output : both "aden" and "linkerd-proxy" are listed (sidecar present; container order may vary)
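The same check can be scripted so that it does not depend on container order. The following is a minimal bash sketch; `has_sidecar` and `check_service` are hypothetical helper names. With several replicas the jsonpath output concatenates all pods, so the check only proves that at least one pod carries the proxy.

```shell
# has_sidecar "NAME NAME ...": succeeds when the space-separated list contains linkerd-proxy.
has_sidecar() {
  case " $1 " in
    *" linkerd-proxy "*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_service NAME_LABEL [NAMESPACE]
# Looks up the containers of pods labelled app.kubernetes.io/name=NAME_LABEL.
check_service() {
  local label="$1" ns="${2:-akko}" names
  names="$(kubectl -n "$ns" get pods -l "app.kubernetes.io/name=$label" \
    -o jsonpath='{.items[*].spec.containers[*].name}')"
  has_sidecar "$names"
}

# Example:
#   check_service akko-aden && echo "sidecar present"
```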
End-to-end check : run the AKKO marathon spec. The founder's marathon (B1..B14) covers every cross-service call ADEN makes (Trino, OPA, OpenMetadata, LiteLLM); all of its checks must stay green post-injection.
Trino exception¶
Trino is a community Helm chart — its values surface dictates the annotation injection point. Flip via :
trino:
  coordinator:
    podAnnotations:
      linkerd.io/inject: enabled
  worker:
    podAnnotations:
      linkerd.io/inject: enabled
The umbrella global.serviceMesh.linkerdInject.trino is wired only
for documentation and parity with the other 3 services ; it does NOT
flow through to the community chart's pod template by itself.
Rollback¶
Flip the flag back to false and restart the workload; the proxy
disappears with that rollout. No certificate cleanup is needed, since
Linkerd's workload identity certificates are short-lived (24 h).
Reference¶
- ADR-037 — mTLS service mesh selection
- helm/akko/charts/akko-mtls/ : sub-chart source
- Linkerd docs : https://linkerd.io/2-edge/getting-started/