mTLS service mesh — Linkerd activation runbook

AKKO ships an akko-mtls sub-chart (ADR-037) that wraps Linkerd to encrypt and mutually authenticate every pod-to-pod call inside the akko namespace. Off by default — this page documents the activation when you decide to flip it on.

Why mTLS

Today, every service-to-service call (Trino → Polaris, Airflow → PostgreSQL, ADEN → OPA, …) travels in plaintext over HTTP. NetworkPolicy gates who can talk to whom, but not what travels on the wire. mTLS adds two guarantees :

  1. Encryption — A compromised pod or a malicious operator with kubectl exec can no longer sniff sensitive traffic via tcpdump.
  2. Mutual authentication — Each pod proves its identity via a short-lived certificate issued by Linkerd. No more "blind trust" between services that share a namespace.

Compliance benefit : SOC2 / ISO27001 / PCI-DSS controls on data-in-transit are satisfied without per-service configuration.

Why Linkerd (and not Istio / Cilium)

See ADR-037. Short version :

  • Linkerd is CNCF Graduated, Apache 2.0, Foundation-neutral
  • ~10 MB RAM per sidecar (Istio is 15× larger; sinks the Netcup demo box)
  • CNI-agnostic — works on existing k3s/EKS/GKE without disruption
  • Single sub-chart, no Lego coupling explosion

Prerequisites

  • Kubernetes 1.27+ (Linkerd 2.16 requirement)
  • cert-manager already installed (re-used as the Linkerd identity issuer)
  • Helm 3.13+
  • ~500 MB free RAM cluster-wide for sidecars
  • One-time pod restart cluster-wide (sidecar injection happens at admission)

Activation

Step 1 — Install Linkerd CRDs (one-time per cluster)

helm repo add linkerd https://helm.linkerd.io/stable
helm repo update
helm install linkerd-crds linkerd/linkerd-crds \
  -n linkerd --create-namespace

Step 2 — Install Linkerd control plane

The trust anchor and issuer certificates are best provisioned via cert-manager ; AKKO ships an example Issuer aligned with the existing wildcard ClusterIssuer. The commands below show the manual alternative, generating both certificates with the step CLI :

# Generate a cluster-wide trust anchor (root CA, kept offline)
step certificate create root.linkerd.cluster.local \
  ca.crt ca.key --profile root-ca \
  --no-password --insecure

# Generate the per-cluster issuer cert
step certificate create identity.linkerd.cluster.local \
  issuer.crt issuer.key --profile intermediate-ca \
  --not-after 8760h --no-password --insecure \
  --ca ca.crt --ca-key ca.key

# Install Linkerd control plane with the certs
helm install linkerd-control-plane linkerd/linkerd-control-plane \
  -n linkerd \
  --set-file identityTrustAnchorsPEM=ca.crt \
  --set-file identity.issuer.tls.crtPEM=issuer.crt \
  --set-file identity.issuer.tls.keyPEM=issuer.key
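
Before handing the files to Helm, it can be worth confirming that the issuer certificate really chains to the trust anchor and is not close to expiry. The sketch below is one way to do that, assuming openssl is available on the operator machine ; the function name check_issuer_cert and the 30-day default threshold are illustrative, not part of AKKO or Linkerd.

```shell
# Hypothetical helper: sanity-check the Step 2 certificates with openssl.
# $1 = trust anchor (ca.crt), $2 = issuer cert (issuer.crt),
# $3 = minimum remaining validity in days (assumed default: 30).
check_issuer_cert() {
  anchor="$1"; issuer="$2"; min_days="${3:-30}"

  # The issuer must verify against the trust anchor.
  if ! openssl verify -CAfile "$anchor" "$issuer" >/dev/null 2>&1; then
    echo "chain: issuer is not signed by the trust anchor"
    return 1
  fi

  # -checkend exits non-zero if the cert expires within the given seconds.
  if ! openssl x509 -checkend "$(( min_days * 86400 ))" -noout -in "$issuer" >/dev/null 2>&1; then
    echo "expiry: issuer expires within ${min_days} days"
    return 1
  fi

  echo "ok"
}

# Usage, from the directory holding the generated files:
# check_issuer_cert ca.crt issuer.crt
```

The same helper can be re-run later with a larger threshold to get early warning before the one-year issuer certificate lapses.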

Step 3 — Activate the AKKO mTLS sub-chart

helm upgrade akko helm/akko -n akko \
  --set akko-mtls.enabled=true \
  --set akko-mtls.injectNamespace=true \
  -f helm/examples/values-domain.yaml \
  -f helm/examples/values-dev-secrets.yaml

Step 4 — Restart AKKO workloads to pick up sidecars

kubectl rollout restart deploy -l app.kubernetes.io/part-of=akko -n akko
kubectl rollout restart statefulset -l app.kubernetes.io/part-of=akko -n akko

This triggers a rolling restart of ~50 deployments ; full pickup typically takes ~5 min on the Netcup demo box.
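
The restart commands return immediately, so a small wrapper that blocks until every rollout has finished can make Step 4 scriptable. This is a minimal sketch under the assumption that the same part-of label selects all AKKO workloads ; wait_for_rollouts is a hypothetical helper name.

```shell
# Hypothetical helper: wait until all labelled Deployments and StatefulSets
# have completed their rollout. Returns non-zero if any of them timed out.
# $1 = namespace, $2 = label selector, $3 = per-workload timeout (default 5m).
wait_for_rollouts() {
  ns="$1"; selector="$2"; timeout="${3:-5m}"
  failed=0
  for kind in deploy statefulset; do
    # "-o name" prints one "kind/name" reference per line.
    for name in $(kubectl -n "$ns" get "$kind" -l "$selector" -o name); do
      kubectl -n "$ns" rollout status "$name" --timeout="$timeout" || failed=1
    done
  done
  return "$failed"
}

# Usage:
# wait_for_rollouts akko app.kubernetes.io/part-of=akko 5m
```

A non-zero exit here is a reasonable signal to stop and look at the troubleshooting section before moving on to Step 5.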

Step 5 — Verify mTLS is active

Install Linkerd CLI :

curl -sL https://run.linkerd.io/install | sh
export PATH=$PATH:$HOME/.linkerd2/bin

Run the cluster check :

linkerd check

Expected : every section ✓.

Verify mTLS handshake on a sample call (linkerd viz tap requires the linkerd-viz extension, which the steps above do not install) :

linkerd viz tap deploy/akko-akko-cockpit -n akko --output json | head -5

Look for "tls": "true" in the output.

Recommended phased rollout, one phase per sprint :

  1. Sprint N (audit) : set injectNamespace=false. Pick one workload (e.g. akko-akko-aden), annotate its pod template with linkerd.io/inject: enabled, restart it, and watch linkerd check for 24 h.
  2. Sprint N+1 (cluster-wide audit) : set injectNamespace=true, restart all workloads, and watch the share of request_total samples carrying tls="true" in Prometheus approach ~100%.
  3. Sprint N+2 (enforce) : annotate each workload with config.linkerd.io/default-inbound-policy: cluster-authenticated so that plaintext requests are rejected.
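
The three phases above can be sketched as two small helpers : one that builds the pod-template annotation patch used in the audit and enforce phases, and one that builds a PromQL expression for the mTLS ratio watched in the cluster-wide audit. The helper names, the PROM_URL variable and the label set in the query (namespace, direction, tls) are assumptions ; adjust them to how your Prometheus scrapes the Linkerd proxies.

```shell
# Hypothetical helper: JSON merge patch that sets one pod-template annotation.
# $1 = annotation key, $2 = annotation value.
pod_annotation_patch() {
  printf '{"spec":{"template":{"metadata":{"annotations":{"%s":"%s"}}}}}\n' "$1" "$2"
}

# Hypothetical helper: PromQL for the share of inbound requests that used mTLS.
# $1 = namespace. Assumes proxy metrics carry namespace/direction/tls labels.
tls_ratio_query() {
  printf 'sum(rate(request_total{namespace="%s",direction="inbound",tls="true"}[5m])) / sum(rate(request_total{namespace="%s",direction="inbound"}[5m]))\n' "$1" "$1"
}

# Usage (needs cluster access; PROM_URL is a placeholder):
# Phase 1, mesh a single workload:
#   kubectl -n akko patch deploy akko-akko-aden --type merge \
#     -p "$(pod_annotation_patch linkerd.io/inject enabled)"
# Phase 2, watch the ratio:
#   curl -sG "$PROM_URL/api/v1/query" \
#     --data-urlencode "query=$(tls_ratio_query akko)"
# Phase 3, reject plaintext on that workload:
#   kubectl -n akko patch deploy akko-akko-aden --type merge \
#     -p "$(pod_annotation_patch config.linkerd.io/default-inbound-policy cluster-authenticated)"
```

Patching the pod template changes the Deployment spec, so Kubernetes starts a rollout by itself and no separate restart is needed for that workload.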

Troubleshooting

Linkerd identity issuer cert expiry

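Workload (proxy) certificates default to a 24 h lifetime and rotate automatically. The issuer certificate does not rotate on its own : the one created in Step 2 is valid for 8760 h (one year). If linkerd check warns that the issuer cert is expiring, issue a new one from the same trust anchor, then run :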

helm upgrade linkerd-control-plane linkerd/linkerd-control-plane \
  -n linkerd \
  --set-file identityTrustAnchorsPEM=ca.crt \
  --set-file identity.issuer.tls.crtPEM=issuer-renewed.crt \
  --set-file identity.issuer.tls.keyPEM=issuer-renewed.key \
  --reuse-values
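
The helm upgrade above expects issuer-renewed.crt and issuer-renewed.key to exist already. A minimal sketch for producing them, reusing the exact step flags from Step 2 and assuming the offline trust anchor key (ca.key) has been brought back for the operation ; renew_issuer_cert is a hypothetical wrapper name.

```shell
# Hypothetical helper: issue a fresh one-year issuer cert from the existing
# trust anchor. $1 = trust anchor cert, $2 = trust anchor key.
renew_issuer_cert() {
  ca_crt="${1:-ca.crt}"; ca_key="${2:-ca.key}"
  step certificate create identity.linkerd.cluster.local \
    issuer-renewed.crt issuer-renewed.key \
    --profile intermediate-ca --not-after 8760h \
    --no-password --insecure \
    --ca "$ca_crt" --ca-key "$ca_key"
}

# Usage:
# renew_issuer_cert ca.crt ca.key
```

Because the trust anchor is unchanged, already-running proxies keep trusting certificates signed by the new issuer.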

Pods stuck pending after restart

Sidecar injection adds ~50 MB memory request per pod. If Netcup hits its 16 GB ceiling, lower the proxy memory request via the linkerd-control-plane chart value proxy.resources.memory.request, or per workload with the config.linkerd.io/proxy-memory-request annotation.
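
To judge whether the sidecars alone can explain Pending pods, a back-of-the-envelope estimate is usually enough : meshed pods multiplied by the per-pod request. The sketch below does that arithmetic ; the pod count of 50 is only illustrative, and the Helm value path in the comment is an assumption to verify against the chart's values.yaml.

```shell
# Hypothetical helper: total extra memory requested by sidecars, in MB.
# $1 = number of meshed pods, $2 = per-pod proxy memory request in MB.
sidecar_overhead_mb() {
  echo $(( $1 * $2 ))
}

# 50 meshed pods at the ~50 MB request quoted above:
sidecar_overhead_mb 50 50   # prints 2500

# Against a live cluster (needs cluster access):
#   pods=$(kubectl -n akko get pods --no-headers | wc -l)
#   sidecar_overhead_mb "$pods" 50
# Lowering the request (assumed value path, check the chart first):
#   helm upgrade linkerd-control-plane linkerd/linkerd-control-plane \
#     -n linkerd --reuse-values --set proxy.resources.memory.request=20Mi
```

Pods created before the change keep their old request until they are restarted.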

Spark / Trino federation breaks

Spark Connect and Trino federation use HTTP/2 + gRPC. Linkerd handles both transparently, but the first call after enabling the sidecar can fail with "EOF" while the proxy negotiates the TLS handshake. Retry once. If it persists, check linkerd viz stat deploy/akko-akko-spark-master -n akko.

Disable mTLS (rollback)

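helm upgrade akko helm/akko -n akko \
  --set akko-mtls.enabled=false \
  --set akko-mtls.injectNamespace=false \
  -f helm/examples/values-domain.yaml \
  -f helm/examples/values-dev-secrets.yaml

kubectl rollout restart deploy -l app.kubernetes.io/part-of=akko -n akko
kubectl rollout restart statefulset -l app.kubernetes.io/part-of=akko -n akko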

Per-service rollout (Sprint 60.1)

Two activation strategies are supported. Pick one — never both.

Strategy A : namespace-wide

Inject every pod in the akko namespace. Simplest, but rolls every service over at once.

akko-mtls:
  enabled: true
  injectNamespace: true

Strategy B : per-service

Activate one service at a time, validate the sidecar, then move to the next. Defaults are off so a fresh helm upgrade adds nothing.

global:
  serviceMesh:
    linkerdInject:
      aden: true       # phase 1 — internal API, easy rollback
      keycloak: false  # phase 2
      postgres: false  # phase 3 (StatefulSet — drain carefully)
      trino: false     # phase 4 — wide blast radius

akko-mtls:
  enabled: true
  injectNamespace: false   # NOT namespace-wide — per-service only

After flipping a flag :

kubectl -n akko rollout restart deploy/akko-akko-aden    # or sts/akko-akko-postgresql
kubectl -n akko get pods -l app.kubernetes.io/name=akko-aden -o jsonpath='{.items[*].spec.containers[*].name}'
# Expected output : "aden linkerd-proxy"  ← sidecar present
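
The single-service check above generalises to the whole namespace by listing each pod with its container names and filtering for the ones that lack the proxy. A minimal sketch, assuming the proxy runs as a regular container named linkerd-proxy (with Linkerd's native-sidecar mode it would appear under initContainers instead) ; unmeshed_pods is a hypothetical helper name.

```shell
# Hypothetical helper: read "pod<TAB>space-separated container names" lines
# on stdin and print the pods that have no linkerd-proxy container.
unmeshed_pods() {
  awk -F '\t' '{
    n = split($2, names, " ")
    found = 0
    for (i = 1; i <= n; i++) if (names[i] == "linkerd-proxy") found = 1
    if (!found) print $1
  }'
}

# Usage (needs cluster access):
# kubectl -n akko get pods \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].name}{"\n"}{end}' \
#   | unmeshed_pods
```

During a per-service rollout the output is expected to be non-empty ; it lists exactly the services whose flag is still false.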

End-to-end check : run the AKKO founder's marathon spec (B1..B14), which covers every cross-service call ADEN makes (Trino, OPA, OpenMetadata, LiteLLM). Every check must stay green post-injection.

Trino exception

Trino is a community Helm chart — its values surface dictates the annotation injection point. Flip via :

trino:
  coordinator:
    podAnnotations:
      linkerd.io/inject: enabled
  worker:
    podAnnotations:
      linkerd.io/inject: enabled

The umbrella global.serviceMesh.linkerdInject.trino is wired only for documentation and parity with the other 3 services ; it does NOT flow through to the community chart's pod template by itself.

Rollback

Flip the flag back to false and restart the workload ; the proxy disappears with that rollout. No certificate cleanup is needed, since Linkerd's proxy identity certificates are short-lived (24 h) and expire on their own.

Reference

  • ADR-037 — mTLS service mesh selection
  • helm/akko/charts/akko-mtls/ — sub-chart source
  • Linkerd docs : https://linkerd.io/2-edge/getting-started/