DR drill execution log

Use this form after every quarterly DR drill (Drills 1–3 in dr-playbook.md). The signed log is the auditor's proof that recovery procedures were tested and the measured RTO/RPO match the targets. Without this log, the playbook is theatre.

Workflow:

1. The operator copies this template into a new file, dr-drill-YYYY-Q[1-4].md, in akko-confidential/dr-drills/ (a private repo, because drill outputs may contain operational details).
2. The operator and the on-call witness execute each drill and fill in the log.
3. The compliance officer co-signs at the bottom.
4. The signed PDF is attached to the audit trail (cold storage in the akko-audit-cold bucket, with object-lock retention ≥ 7 years).
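The file-naming rule in step 1 can be derived from the drill date. The following is a minimal sketch; the helper name `drill_log_name` is hypothetical and not part of the playbook.

```shell
# drill_log_name DATE -> prints the log file name for the quarter DATE falls in.
# DATE is expected as YYYY-MM-DD. The helper name is illustrative only.
drill_log_name() {
  local year rest month quarter
  year=${1%%-*}        # text before the first "-"
  rest=${1#*-}         # MM-DD
  month=${rest%%-*}    # MM
  month=${month#0}     # drop a leading zero so "08" is not read as octal
  quarter=$(( (month - 1) / 3 + 1 ))
  printf 'dr-drill-%s-Q%s.md\n' "$year" "$quarter"
}

drill_log_name 2025-05-14   # prints dr-drill-2025-Q2.md
```

The result is only the file name; copying the template into akko-confidential/dr-drills/ under that name is left to the operator.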

| Field | Value |
| --- | --- |
| Drill date | YYYY-MM-DD HH:MM TZ |
| Quarter | Q1 / Q2 / Q3 / Q4, YYYY |
| Cluster | netcup-prod / aks-prod / eks-prod / scratch |
| AKKO version | (from `helm list -n akko`) |
| Operator | name + role |
| Witness | name + role (must be a different person) |
| Compliance officer | name + role |
| Playbook revision | git SHA of `docs/docs/admin/dr-playbook.md` |
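Two of the header fields can be read from tooling rather than typed by hand. The sketch below assumes that "git SHA" means the last commit that touched the playbook file, and that the command runs from the root of the repository holding it; the function name `drill_header` is hypothetical.

```shell
# drill_header -> prints the header fields that can be derived automatically.
# Assumption: "git SHA" is the most recent commit that modified the playbook.
drill_header() {
  local sha
  sha=$(git log -1 --format=%H -- docs/docs/admin/dr-playbook.md)
  printf 'Drill date\t%s\n' "$(date -u '+%Y-%m-%d %H:%M UTC')"
  printf 'Playbook revision\t%s\n' "$sha"
}
```

If the team records the repository HEAD instead, `git rev-parse HEAD` would be the corresponding command.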

Drill 1 — single-database PITR

See playbook §4.1. Restore one PostgreSQL database to a point in time 5 min before "now" (T-5). Reference DB: superset_metadata.
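The T-5 recovery target can be computed once at the start of the drill and written into the log, so that the restore and the later RPO measurement refer to the same instant. This is a sketch assuming GNU `date`; `pitr_target` is a hypothetical helper, and how the playbook passes the value to PostgreSQL (for example via the `recovery_target_time` setting) is not specified here.

```shell
# pitr_target [NOW_EPOCH] -> prints the timestamp 5 minutes before NOW_EPOCH
# (default: the current time), in UTC, in a form PostgreSQL accepts.
pitr_target() {
  local now
  now=${1:-$(date +%s)}
  date -u -d "@$(( now - 300 ))" '+%Y-%m-%d %H:%M:%S+00'
}

echo "recovery_target_time = '$(pitr_target)'"
```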

| Step | Target | Measured | Pass / Fail |
| --- | --- | --- | --- |
| Locate latest base backup | < 5 min | __ min | ☐ ☐ |
| Restore to scratch PVC | < 15 min | __ min | ☐ ☐ |
| Replay WAL to T-5 | < 5 min | __ min | ☐ ☐ |
| Verify row counts match expected | < 1 min | __ s | ☐ ☐ |
| Cut-over Service to restored DB | < 5 min | __ min | ☐ ☐ |
| Total RTO | 20 min | __ min | ☐ ☐ |
| Measured RPO | 5 min | __ s | ☐ ☐ |
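The Measured and Pass / Fail columns can be filled from wall-clock timings instead of estimates. The sketch below wraps a step in a timer and compares the elapsed time with its target; `verdict` and `timed_step` are hypothetical helpers, and `sleep 1` stands in for a real restore command.

```shell
# verdict MEASURED_SECONDS TARGET_SECONDS -> prints "Pass" or "Fail".
# Step targets in the table are strict upper bounds ("< 5 min"), so equality fails.
verdict() {
  if [ "$1" -lt "$2" ]; then echo "Pass"; else echo "Fail"; fi
}

# timed_step LABEL TARGET_SECONDS COMMAND [ARGS...]
# Runs the command, then prints label, elapsed seconds and verdict, tab-separated.
timed_step() {
  local label target start elapsed
  label=$1; target=$2; shift 2
  start=$(date +%s)
  "$@" >&2                     # keep the step's own output off the result line
  elapsed=$(( $(date +%s) - start ))
  printf '%s\t%s s\t%s\n' "$label" "$elapsed" "$(verdict "$elapsed" "$target")"
}

# Placeholder command; replace with the actual step from the playbook.
timed_step "Locate latest base backup" 300 sleep 1
```

The result line is reported whether or not the wrapped command succeeds, so a failed step still needs to be marked Fail by hand.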

Issues encountered (if any):

(blank)

Action items raised (link to backlog tickets):

(blank)

Drill 2 — full-cluster restore on scratch hardware

See playbook §4.2. Starting from a fresh Kubernetes cluster, restore the entire AKKO namespace from Velero and object storage. Reference target RTO: 4 h.

| Step | Target | Measured | Pass / Fail |
| --- | --- | --- | --- |
| Provision scratch cluster (k3s on cold-spare hardware) | < 30 min | __ min | ☐ ☐ |
| Install cert-manager + Traefik + Velero | < 15 min | __ min | ☐ ☐ |
| `velero restore create` from latest backup | < 90 min | __ min | ☐ ☐ |
| Restore SeaweedFS volumes from snapshot | < 60 min | __ min | ☐ ☐ |
| `helm upgrade akko` to reconcile state | < 15 min | __ min | ☐ ☐ |
| Verify cockpit + Trino + ADEN respond | < 5 min | __ min | ☐ ☐ |
| Run banking-end-to-end smoke (read-only path) | < 10 min | __ min | ☐ ☐ |
| Total RTO | 4 h | __ h __ min | ☐ ☐ |
| Measured RPO | 24 h (last nightly) | __ h | ☐ ☐ |
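The measured RPO is the gap between the simulated failure time and the completion time of the backup that was actually restored. A minimal sketch of that arithmetic, assuming GNU `date` and ISO 8601 timestamps; the helper names are hypothetical, and the two timestamps must be taken from the backup tooling and the drill log.

```shell
# rpo_seconds LAST_GOOD_BACKUP FAILURE_TIME -> data-loss window in seconds.
rpo_seconds() {
  local last_good failure
  last_good=$(date -u -d "$1" +%s)
  failure=$(date -u -d "$2" +%s)
  echo $(( failure - last_good ))
}

# rpo_hours LAST_GOOD_BACKUP FAILURE_TIME -> the same window in hours,
# rounded up so that a partial hour is never under-reported.
rpo_hours() {
  echo $(( ( $(rpo_seconds "$1" "$2") + 3599 ) / 3600 ))
}

# Nightly backup finished at 02:00 UTC, failure simulated at 14:30 UTC.
rpo_hours 2025-01-01T02:00:00Z 2025-01-01T14:30:00Z   # prints 13
```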

Components that failed to restore cleanly (if any):

(blank)

Did the audit trail (decision_logs, query_log, admin_events) restore intact and tamper-free? ☐ Yes ☐ No — explain:

(blank)
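One way to support a "Yes" answer is to compare digests of the audit tables exported before the backup with digests of the same tables exported after the restore. This is a sketch under assumptions: the export step and file names are hypothetical, `sha256sum` from GNU coreutils is available, and the reference digests were stored somewhere the restored cluster could not modify.

```shell
# same_digest FILE_A FILE_B -> exit status 0 when both files have the same SHA-256.
same_digest() {
  local a b
  a=$(sha256sum "$1" | cut -d' ' -f1)
  b=$(sha256sum "$2" | cut -d' ' -f1)
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Hypothetical usage, one export per audit table:
#   same_digest before/decision_logs.csv after/decision_logs.csv && echo intact
```

A matching digest shows only that the restored copy equals the reference export; it says nothing about tampering that happened before the reference was taken.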

Drill 3 — tabletop incident response

See playbook §4.3. No infrastructure changes: this is a paper exercise in which the on-call team walks through the response to a named incident. The witness scores each participant's clarity and speed.

Scenario chosen (one of):

☐ A1: Active database lost (storage corruption)
☐ A2: Ransomware encrypts SeaweedFS volumes
☐ A3: Keycloak realm wiped (admin error)
☐ A4: Region-level cloud outage (provider down)
☐ A5: Insider threat — admin exfiltrates audit log
☐ A6: Custom — describe:

(blank)

| Phase | Owner | Target time-to-action | Observed |
| --- | --- | --- | --- |
| Detection (alert fires / user reports) | Monitoring | < 5 min | __ |
| Triage (severity + scope) | On-call | < 10 min | __ |
| Communication (Slack #incidents) | On-call | < 2 min | __ |
| Containment decision | Security lead | < 15 min | __ |
| Recovery start (which drill triggered) | Operator | < 30 min | __ |
| Status update cadence | All-hands | every 30 min | __ |
| All-clear declaration | Compliance | end of MTTR | __ |
| Post-mortem | Author TBD | within 7 d | __ |

Misses or hesitations observed (be transparent: the post-mortem is the output of the drill, not a verdict on the team):

(blank)

Process changes raised:

(blank)

Backup-system spot-checks (run BEFORE drill 1)

# PostgreSQL — last basebackup must be < 4 h old
kubectl -n akko exec svc/akko-postgresql -- ls -lt /backups/ | head

# Velero — last successful backup
velero backup get --selector akko.scope=full | head -5

# SeaweedFS volume snapshot — last snapshot
# (-i is required so that kubectl forwards the here-string on stdin to weed shell)
kubectl -n akko exec -i deploy/akko-seaweedfs-master -- \
  weed shell <<< 'volume.list' | grep -E 'snapshot.*[0-9]+'

# Audit cold storage — object-lock still in force
aws s3api get-object-lock-configuration \
  --bucket akko-audit-cold | jq '.ObjectLockConfiguration.Rule.DefaultRetention'

# cert-manager — wildcard cert must not expire within the next 30 d
kubectl -n cert-manager get certificate -o json | \
  jq '.items[] | {name: .metadata.name, expiry: .status.notAfter}'

Record any anomalies in the log above the Drill 1 section, so that a degraded backup state is captured even if the drills themselves pass.
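The certificate check prints an expiry timestamp per certificate but leaves the 30-day comparison to the reader. A minimal sketch of that comparison, assuming GNU `date` and RFC 3339 timestamps as found in `.status.notAfter`; `days_until` and `cert_ok` are hypothetical helpers.

```shell
# days_until NOT_AFTER [NOW] -> whole days from NOW (default: current time)
# until NOT_AFTER. Negative when the certificate has already expired.
days_until() {
  local now
  now=${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
  echo $(( ( $(date -u -d "$1" +%s) - $(date -u -d "$now" +%s) ) / 86400 ))
}

# cert_ok NOT_AFTER [NOW] -> exit status 0 when at least 30 days remain.
cert_ok() {
  [ "$(days_until "$@")" -ge 30 ]
}

# Example with fixed timestamps:
days_until 2025-03-01T00:00:00Z 2025-01-30T00:00:00Z   # prints 30
```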

Sign-off

| Role | Name | Signature (initials) | Date |
| --- | --- | --- | --- |
| Operator | | | |
| Witness | | | |
| Compliance officer | | | |

Retention: store the signed PDF in cold storage with object-lock retention ≥ 7 years (DORA Art. 11 + GDPR Art. 30 records of processing). The git revision of the playbook used for this drill (recorded in the Playbook revision field of the header table) lets future auditors verify that the procedures executed match the procedures documented at the time.
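The retention requirement can be applied per object at upload time. The sketch below computes a retain-until date seven years after the drill date and passes it to `aws s3api put-object`. Assumptions: GNU `date`, COMPLIANCE as the lock mode, and a `dr-drills/` key prefix; none of these is confirmed by the playbook, and the helper names are hypothetical.

```shell
# retain_until DRILL_DATE -> timestamp 7 years after DRILL_DATE (YYYY-MM-DD), UTC.
retain_until() {
  date -u -d "$1 +7 years" '+%Y-%m-%dT%H:%M:%SZ'
}

# upload_signed_log PDF_PATH DRILL_DATE
# Uploads the signed PDF with an explicit per-object retention date.
# Lock mode and key prefix are assumptions, not values taken from the playbook.
upload_signed_log() {
  aws s3api put-object \
    --bucket akko-audit-cold \
    --key "dr-drills/$(basename "$1")" \
    --body "$1" \
    --object-lock-mode COMPLIANCE \
    --object-lock-retain-until-date "$(retain_until "$2")"
}

retain_until 2025-01-15   # prints 2032-01-15T00:00:00Z
```

If the bucket's default retention already covers seven years, the explicit per-object date is redundant but harmless, and it makes the intended expiry visible in the object metadata.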