incident-commander

Guide incident response with structured communication, timeline tracking, severity assessment, stakeholder updates, and post-incident review — a virtual incident commander for on-call teams.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy the following and send it to your AI assistant to install the skill:

Install skill "incident-commander" with this command: npx skills add charlie-morrison/cm-incident-commander

Incident Commander

Guide teams through incident response with structured communication templates, timeline tracking, severity assessment, stakeholder updates, and post-incident review facilitation. Acts as a virtual incident commander to keep response organized and effective.

Usage

"We have a production incident — help me manage it"
"Draft an incident communication for stakeholders"
"Create a timeline for the current incident"
"Help me run a post-incident review"
"Assess the severity of this outage"

How It Works

1. Incident Declaration

When an incident is reported, establish structure:

Severity assessment:

  • SEV-1 (Critical): Complete service outage, data loss, security breach
    • Response: All-hands, exec notification, 15-min status updates
  • SEV-2 (High): Major feature degraded, significant user impact
    • Response: On-call team + leads, 30-min updates
  • SEV-3 (Medium): Minor feature degraded, workaround available
    • Response: On-call team, hourly updates
  • SEV-4 (Low): Cosmetic issue, minimal impact
    • Response: Normal ticket workflow
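The severity tiers above can be encoded as a small lookup table so tooling can derive the update cadence automatically. A minimal sketch in Python — the names and values are illustrative, taken from the tiers listed here, and should be adjusted to your organization's policy:

```python
# Severity tiers and their response expectations (values mirror the
# tiers above; adjust to your org's policy).
SEVERITY_POLICY = {
    "SEV-1": {"responders": "all-hands + exec notification", "update_minutes": 15},
    "SEV-2": {"responders": "on-call team + leads", "update_minutes": 30},
    "SEV-3": {"responders": "on-call team", "update_minutes": 60},
    "SEV-4": {"responders": "normal ticket workflow", "update_minutes": None},
}

def update_cadence(severity: str) -> str:
    """Return a human-readable update cadence for a severity tier."""
    minutes = SEVERITY_POLICY[severity]["update_minutes"]
    return f"every {minutes} min" if minutes else "no live updates"
```

Keeping the policy in one structure makes it easy to reference from both notifications and escalation checks.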

Information gathering:

  • What's broken? (symptom description)
  • When did it start? (first alert or user report)
  • Who's affected? (users, regions, services)
  • What changed recently? (deployments, config, infrastructure)
  • What's the blast radius? (single service vs cascading)
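The triage questions above map naturally onto a structured intake record, so gaps are visible at a glance. A hypothetical schema in Python (field names are assumptions, not part of any standard):

```python
from dataclasses import dataclass, field

@dataclass
class IncidentIntake:
    """Structured answers to the initial triage questions (hypothetical schema)."""
    symptoms: str                     # What's broken?
    started_at: str                   # When did it start? (first alert or report)
    affected: str                     # Who's affected? (users, regions, services)
    recent_changes: list[str] = field(default_factory=list)  # deployments, config
    blast_radius: str = "unknown"     # single service vs cascading

    def missing(self) -> list[str]:
        """List triage questions still unanswered."""
        gaps = []
        if not self.recent_changes:
            gaps.append("recent_changes")
        if self.blast_radius == "unknown":
            gaps.append("blast_radius")
        return gaps
```

An intake object with unanswered questions flags them, prompting the IC to keep asking.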

2. Role Assignment

Establish incident roles:

  • Incident Commander (IC): Coordinates response, makes decisions
  • Technical Lead: Drives investigation and fix
  • Communications Lead: Handles stakeholder updates
  • Scribe: Maintains timeline and documents actions

3. Communication Templates

Initial notification:

🔴 INCIDENT DECLARED — SEV-[1/2/3]

Impact: [what's broken, who's affected]
Started: [timestamp]
Status: Investigating
IC: [name]

Next update: [time]
War room: [link]

Status update:

🔄 INCIDENT UPDATE — SEV-[X] — [duration]

Status: [Investigating / Identified / Fixing / Monitoring]
Impact: [current impact description]
Root cause: [if identified]
Actions:
- [what's being done]
- [what's next]

Next update: [time]

Resolution:

✅ INCIDENT RESOLVED — SEV-[X] — Total duration: [time]

Root cause: [brief description]
Fix: [what was done]
Impact: [final impact assessment]
Monitoring: [what we're watching]

Post-incident review scheduled: [date]
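These templates can be filled programmatically so updates stay consistent under pressure. A minimal sketch for the status-update template, assuming plain string formatting (function and field names are illustrative):

```python
# Status-update template, mirroring the fields shown above.
STATUS_UPDATE = """\
🔄 INCIDENT UPDATE — {severity} — {duration}

Status: {status}
Impact: {impact}
Root cause: {root_cause}
Actions:
{actions}

Next update: {next_update}
"""

def render_update(severity, duration, status, impact,
                  root_cause, actions, next_update):
    """Fill the status-update template; `actions` is a list of strings."""
    return STATUS_UPDATE.format(
        severity=severity, duration=duration, status=status, impact=impact,
        root_cause=root_cause or "not yet identified",
        actions="\n".join(f"- {a}" for a in actions),
        next_update=next_update,
    )
```

Defaulting the root cause to "not yet identified" keeps early updates honest without leaving a blank field.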

4. Timeline Management

Maintain a precise incident timeline:

## Incident Timeline — INC-2026-0430

14:23 UTC — First alert: API latency >5s (PagerDuty)
14:25 UTC — On-call acknowledged, began investigation
14:28 UTC — Identified: database connection pool exhausted
14:30 UTC — IC declared SEV-2, notified engineering leads
14:32 UTC — Root cause: migration running without connection limit
14:35 UTC — Action: killed migration, restarted connection pools
14:38 UTC — API latency returning to normal
14:45 UTC — Confirmed: all services healthy
14:50 UTC — SEV-2 resolved, total duration: 27 minutes

5. Escalation Decisions

Guide escalation based on:

  • Duration exceeding SLA thresholds
  • Impact expanding to new services
  • Investigation stalled (>30 min without progress)
  • Customer-facing impact increasing
  • Data integrity concerns emerging
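The escalation criteria above can be checked mechanically. A sketch of such a check, with illustrative thresholds (the 60-minute SLA default is an assumption; the 30-minute stall threshold comes from the list above):

```python
def should_escalate(duration_min: int, stalled_min: int,
                    impact_expanding: bool, data_integrity_risk: bool,
                    sla_threshold_min: int = 60) -> list[str]:
    """Return the escalation triggers that currently apply (thresholds illustrative)."""
    reasons = []
    if duration_min > sla_threshold_min:
        reasons.append("duration exceeds SLA threshold")
    if stalled_min > 30:
        reasons.append("investigation stalled >30 min")
    if impact_expanding:
        reasons.append("impact expanding to new services")
    if data_integrity_risk:
        reasons.append("data integrity concerns")
    return reasons
```

Returning the list of triggered reasons (rather than a bare boolean) gives the IC the justification to cite when escalating.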

6. Post-Incident Review

Facilitate blameless post-incident review:

Template:

## Post-Incident Review — INC-2026-0430

### Summary
[1-2 sentence description of what happened]

### Timeline
[Key events with timestamps]

### Root Cause
[Technical root cause]
[Contributing factors]

### Impact
- Duration: [time]
- Users affected: [count/percentage]
- Revenue impact: [if applicable]
- SLA impact: [remaining budget]

### What Went Well
- [Fast detection, good communication, etc.]

### What Could Be Improved
- [Gaps in monitoring, slow escalation, etc.]

### Action Items
- [ ] [Specific action] — Owner: [name] — Due: [date]
- [ ] [Specific action] — Owner: [name] — Due: [date]

### Lessons Learned
[Key takeaways for the team]

7. Runbook Suggestions

Based on incident type, suggest relevant runbooks:

  • Database issues → connection pool reset, failover, backup restore
  • Deployment issues → rollback steps, canary analysis
  • Infrastructure → scaling, failover, DNS changes
  • Security → containment, communication, forensics
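The incident-type-to-runbook mapping above is a simple lookup. A minimal sketch (the keys and runbook names mirror the list above; real runbook links would come from your own knowledge base):

```python
# Incident type → candidate runbooks (names mirror the list above).
RUNBOOKS = {
    "database": ["connection pool reset", "failover", "backup restore"],
    "deployment": ["rollback steps", "canary analysis"],
    "infrastructure": ["scaling", "failover", "DNS changes"],
    "security": ["containment", "communication", "forensics"],
}

def suggest_runbooks(incident_type: str) -> list[str]:
    """Look up candidate runbooks by incident type; empty list if unknown."""
    return RUNBOOKS.get(incident_type.lower(), [])
```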

Output

Provides real-time incident management guidance tailored to severity and context.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

Observability & Reliability Engineering

Complete observability & reliability engineering system. Use when designing monitoring, implementing structured logging, setting up distributed tracing, buil...

Automation

Delx Ops Guardian

Automatically detects, assesses, and safely mitigates incidents in OpenClaw production agents, providing detailed reports and verified recovery.

Automation

AI Agent Observability

Evaluate and monitor AI agent fleets across six key dimensions to score health, identify issues, and optimize performance for ops teams managing 1-100+ agents.

Coding

ITIL Ops

ITIL-aligned incident, problem, and change management for AI agents. Use when: detecting service crashes, analyzing recurring failures, tracking incidents to...
