Canary Deploy
Safe system changes with pre-flight checks, validation, and automatic rollback.
The Problem
A bad system change on a remote host can lock you out:
- SSH hardening breaks remote access
- Firewall rules block needed ports
- Kernel parameter changes cause instability
- Service restarts break dependencies
Recovery without physical access is painful or impossible.
Quick Start
Before any critical change
```bash
# Capture baseline (connectivity, services, ports)
bash scripts/canary-test.sh baseline

# Make your change
sudo nano /etc/ssh/sshd_config

# Validate change didn't break anything
bash scripts/canary-test.sh validate

# If validation fails:
bash scripts/canary-test.sh rollback
```
For automated changes
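```bash
# Full pipeline: baseline → apply → validate → rollback-if-failed
# sshd -t checks the config syntax before the reload; BatchMode=yes makes ssh
# fail instead of hanging on a password prompt.
bash scripts/critical-update.sh \
  --name "SSH hardening" \
  --backup "/etc/ssh/sshd_config" \
  --command "sudo sed -i 's/PermitRootLogin yes/PermitRootLogin no/' /etc/ssh/sshd_config && sudo sshd -t && sudo systemctl reload sshd" \
  --validate "ssh -o BatchMode=yes -o ConnectTimeout=5 localhost echo ok"
```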
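The pipeline above can be sketched as a single shell function. This is a hypothetical illustration, not the contents of scripts/critical-update.sh: the function name `critical_update`, its positional arguments, and the optional reload command that runs after a restore are all assumptions. It snapshots one file, runs the change and then the validation command, and restores the snapshot if either step fails.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of baseline -> apply -> validate -> rollback-if-failed.
# Not the real scripts/critical-update.sh; names and arguments are assumptions.

# critical_update TARGET_FILE APPLY_CMD VALIDATE_CMD [RELOAD_CMD]
# Returns 0 if the change validated, 1 if it was rolled back, 2 on setup errors.
critical_update() {
  local target="$1" apply_cmd="$2" validate_cmd="$3" reload_cmd="${4:-}"
  local backup
  backup="$(mktemp "${TMPDIR:-/tmp}/canary-backup.XXXXXX")" || return 2

  # Baseline: snapshot the file before touching it (-p keeps mode and timestamps).
  if ! cp -p -- "$target" "$backup"; then
    rm -f -- "$backup"
    return 2
  fi

  # Apply, then validate. If either step fails, fall through to rollback.
  if bash -c "$apply_cmd" && bash -c "$validate_cmd"; then
    rm -f -- "$backup"
    return 0
  fi

  # Rollback: restore the snapshot, then run the optional reload command
  # so the service picks up the restored file (e.g. reloading sshd).
  echo "canary: validation failed, restoring $target" >&2
  cp -p -- "$backup" "$target"
  [ -n "$reload_cmd" ] && bash -c "$reload_cmd"
  rm -f -- "$backup"
  return 1
}
```

When the target is a root-owned file such as /etc/ssh/sshd_config, the function has to run as root so the restore can write it. In that case point the validation at a non-root account, because after this particular change an SSH login as root is expected to fail.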
Protocol A+B (Manual Workflow)
For interactive sessions where you want a human in the loop:
Protocol A: Test interactively
- Tell the human: "Open a second SSH session as backup"
- Apply change in the first session
- Ask: "Test connectivity from the second session"
- If it works → confirm
- If it fails → rollback from the backup session
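The "confirm" step in Protocol A can be made fail-safe with a timed prompt: unless someone explicitly confirms within a time limit, the change is treated as failed. The sketch below assumes bash (it relies on `read -t`); the function name `await_confirmation` and the literal answer "yes" are illustrative choices, not part of the canary scripts.

```bash
#!/usr/bin/env bash
# await_confirmation TIMEOUT_SECONDS
# Hypothetical helper: returns 0 only if the operator types "yes" before the timeout.
await_confirmation() {
  local timeout="$1" answer
  printf 'Type "yes" within %ss to keep the change: ' "$timeout" >&2
  if read -r -t "$timeout" answer && [ "$answer" = "yes" ]; then
    return 0
  fi
  # Timeout, EOF, or any other answer counts as "not confirmed".
  return 1
}
```

For example, `await_confirmation 60 || bash scripts/canary-test.sh rollback`. Run it inside tmux or screen: if the change drops the SSH session, a plain login shell would typically be killed by the hangup and the rollback would never run.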
Protocol B: Backup first
- Run `bash scripts/canary-test.sh baseline`
- Verify backup is valid
- Apply change
- Run `bash scripts/canary-test.sh validate`
- If validation fails → `bash scripts/canary-test.sh rollback`
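The "Verify backup is valid" step can be as simple as checking that the backup exists and matches the live file byte for byte. A minimal sketch, assuming the backup is a plain file copy; the function name `verify_backup` is hypothetical, and treating an empty backup as invalid is an assumption.

```bash
#!/usr/bin/env bash
# verify_backup ORIGINAL BACKUP
# Hypothetical check: succeeds only if BACKUP is non-empty and has the same
# SHA-256 checksum as ORIGINAL.
verify_backup() {
  local original="$1" backup="$2" a b
  [ -s "$backup" ] || { echo "backup missing or empty: $backup" >&2; return 1; }
  a="$(sha256sum < "$original" | cut -d' ' -f1)"
  b="$(sha256sum < "$backup" | cut -d' ' -f1)"
  [ "$a" = "$b" ] || { echo "checksum mismatch: $backup" >&2; return 1; }
}
```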
For maximum safety, always use A and B together.
What Gets Checked
Baseline capture
- SSH connectivity (local + remote)
- Open ports (ss -tlnp)
- Running services (systemctl)
- Firewall rules (ufw/iptables)
- Network routes
- DNS resolution
- Config file checksums
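A baseline capture can be sketched as a set of small check functions whose output is saved, sorted, to one file per check. This is a hypothetical sketch covering only a subset of the list above (ports, services, routes, DNS); the function names, the one-file-per-check layout, and the example.com lookup are assumptions. Ports are recorded without the process column (`-p`) because PIDs change on every restart and would show up as spurious differences.

```bash
#!/usr/bin/env bash
# Hypothetical baseline checks; each prints one item per line.
check_ports()    { ss -tln | awk 'NR > 1 { print $4 }'; }   # listening addr:port, no PIDs
check_services() { systemctl list-units --type=service --state=running --no-legend --plain | awk '{ print $1 }'; }
check_routes()   { ip route show; }
check_dns()      { if getent hosts example.com >/dev/null; then echo "dns ok"; else echo "dns FAIL"; fi; }

# capture_check DIR NAME CMD [ARGS...]
# Runs one check and saves its output, sorted in the C locale, to DIR/NAME.txt.
capture_check() {
  local dir="$1" name="$2"
  shift 2
  mkdir -p -- "$dir"
  # stderr is captured too, so a check that starts erroring shows up as a change.
  "$@" 2>&1 | LC_ALL=C sort > "$dir/$name.txt"
}

# capture_all DIR: run every check into DIR (a baseline run or a validation run).
capture_all() {
  local dir="$1"
  capture_check "$dir" ports    check_ports
  capture_check "$dir" services check_services
  capture_check "$dir" routes   check_routes
  capture_check "$dir" dns      check_dns
}
```

Running `capture_all` once into a baseline directory before the change and again into a second directory afterwards gives two sets of files that can be compared check by check.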
Validation
- All baseline checks re-run
- Diff against baseline
- Any regression = FAIL
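The "diff against baseline" step can be sketched with `comm`, which compares two sorted files line by line. The sketch assumes each check was saved as a `NAME.txt` file, sorted with `LC_ALL=C`, in both a baseline directory and a current directory; the function name `find_regressions` and its output format are hypothetical. It counts only losses (a port, service, or route present at baseline but gone now) as regressions; a stricter policy could fail on any difference with `diff -q` instead.

```bash
#!/usr/bin/env bash
# find_regressions BASELINE_DIR CURRENT_DIR
# Prints lines that were present at baseline but are missing now, per check.
# Returns 1 (FAIL) if any check regressed, 0 (PASS) otherwise.
find_regressions() {
  local base_dir="$1" cur_dir="$2" status=0 f name missing
  for f in "$base_dir"/*.txt; do
    [ -e "$f" ] || continue
    name="$(basename "$f")"
    if [ ! -f "$cur_dir/$name" ]; then
      echo "FAIL $name: check missing from current run"
      status=1
      continue
    fi
    # comm -23 keeps only lines unique to the first (baseline) file.
    missing="$(LC_ALL=C comm -23 "$f" "$cur_dir/$name")"
    if [ -n "$missing" ]; then
      printf 'FAIL %s: lost since baseline:\n%s\n' "$name" "$missing"
      status=1
    fi
  done
  return "$status"
}
```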
Critical Change Categories
| Category | Risk | Example | Recovery |
|---|---|---|---|
| SSH config | 🔴 HIGH | sshd_config changes | Backup session |
| Firewall | 🔴 HIGH | UFW/iptables rules | Pre-change snapshot |
| Network | 🔴 HIGH | Interface/routing changes | Console access |
| Services | 🟡 MEDIUM | systemd unit changes | Restore unit file, systemctl daemon-reload, restart |
| Kernel params | 🟡 MEDIUM | sysctl changes | Reset with sysctl -w, or reboot if not persisted |
| Packages | 🟢 LOW | apt install/upgrade | Downgrade with apt install pkg=version |
References
See references/incident-report.md for the real incident that inspired this skill.