incident-hotfix

Use this skill for incident response and hotfix deployment. Invoke it when a production issue requires immediate attention.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "incident-hotfix" with this command: npx skills add gentamura/dotfiles/gentamura-dotfiles-incident-hotfix

Incident Hotfix

Structured incident response, hotfix, and postmortem process.

Severity Levels

| Level | Impact | Response Time | Examples |
|-------|--------|---------------|----------|
| P0 | System down | Immediate | Complete outage, data loss |
| P1 | Major feature broken | < 1 hour | Auth broken, payments failing |
| P2 | Feature degraded | < 4 hours | Slow performance, partial outage |
| P3 | Minor issue | < 24 hours | UI bug, non-critical error |
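
The severity levels can be encoded as a small lookup so that scripts and alerts use the same response targets. This is a minimal sketch; `severity_response_time` is a hypothetical helper name, and the values are copied from the severity levels above.

```shell
# Map a severity level to its target response time.
# Hypothetical helper; the values mirror the severity levels listed above.
severity_response_time() {
  case "$1" in
    P0) echo "Immediate" ;;
    P1) echo "< 1 hour" ;;
    P2) echo "< 4 hours" ;;
    P3) echo "< 24 hours" ;;
    *)  echo "unknown severity: $1" >&2; return 1 ;;
  esac
}
```

For example, `severity_response_time P1` prints `< 1 hour`, and an unrecognized level returns a non-zero status so a calling script can stop.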

Incident Response

1. Assess

  • Identify the symptom
  • Determine severity level
  • Check monitoring dashboards
  • Review recent deployments
  • Communicate status

2. Isolate

  • Identify affected components
  • Check error logs
  • Review recent changes
  • Determine blast radius
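
When checking error logs, a quick count of matching lines helps size the blast radius. The sketch below assumes plain-text logs with one event per line; the helper name `count_errors` and the default pattern `ERROR` are assumptions, not part of the skill.

```shell
# count_errors FILE [PATTERN]
# Prints how many lines in FILE match PATTERN (default: ERROR).
# Hypothetical helper; adjust the pattern to your log format.
count_errors() {
  # grep -c prints the count but exits 1 when the count is 0,
  # so the status is ignored to keep callers under "set -e" running.
  grep -c -- "${2:-ERROR}" "$1" || true
}
```

Running it against the log of each suspected component, before and after the most recent deployment, gives a rough signal of where the errors started.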

3. Mitigate

Choose one:

Rollback (safest)

  • Revert to last known good state
  • Apply when cause is unclear

Hotfix (targeted)

  • Minimal change to fix issue
  • Apply when cause is clear and fix is simple

Feature Flag (quick)

  • Disable problematic feature
  • Apply when feature is isolatable
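
The three options can be written as a small decision helper. This is only a sketch: the process above does not rank the options, so the precedence used here (feature flag first, then hotfix, rollback as the default) is an assumption, as is the helper name `choose_mitigation`.

```shell
# choose_mitigation CAUSE_CLEAR FIX_SIMPLE ISOLATABLE   (each "yes" or "no")
# Prints one of: feature-flag, hotfix, rollback.
# The precedence below is an assumption, not stated by the process.
choose_mitigation() {
  cause_clear=$1
  fix_simple=$2
  isolatable=$3
  if [ "$isolatable" = yes ]; then
    echo feature-flag      # quick: disable the problematic feature
  elif [ "$cause_clear" = yes ] && [ "$fix_simple" = yes ]; then
    echo hotfix            # targeted: minimal change
  else
    echo rollback          # safest: revert to last known good state
  fi
}
```

Rollback is the fallback because it is the only option that does not require knowing the cause.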

4. Verify

  • Issue resolved
  • Error rates normalized
  • Performance restored
  • No side effects
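
"Error rates normalized" is easier to confirm against an explicit threshold. The sketch below assumes integer error counts per minute and a pre-incident baseline taken from your monitoring; the helper name `error_rate_normalized` and the 10% default tolerance are assumptions.

```shell
# error_rate_normalized CURRENT BASELINE [TOLERANCE_PERCENT]
# Exits 0 when CURRENT <= BASELINE * (1 + TOLERANCE_PERCENT / 100).
# Inputs are integers (for example, errors per minute); tolerance defaults to 10.
error_rate_normalized() {
  current=$1
  baseline=$2
  tolerance=${3:-10}
  # Scale by 100 to stay in integer arithmetic.
  [ $((current * 100)) -le $((baseline * (100 + tolerance))) ]
}
```

A typical use is `error_rate_normalized 11 10 && echo "back to normal"`, repeated over several minutes before the incident is declared resolved.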

5. Communicate

  • Update status page
  • Notify stakeholders
  • Document timeline

Hotfix Process

1. Create Hotfix Branch

# From production/main branch
git checkout main
git pull origin main
git checkout -b hotfix/issue-description
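
To keep branch names consistent under pressure, the `hotfix/issue-description` name can be derived from a free-text description. This is a sketch; `hotfix_branch_name` is a hypothetical helper, and the lowercase, hyphen-separated slug format is an assumption.

```shell
# hotfix_branch_name "Free-text description"
# Prints hotfix/<slug>, where <slug> is lowercase with runs of
# non-alphanumeric characters collapsed into single hyphens.
hotfix_branch_name() {
  slug=$(printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '-' \
    | sed 's/^-*//; s/-*$//')
  printf 'hotfix/%s\n' "$slug"
}
```

It can be combined with the branch command, for example `git checkout -b "$(hotfix_branch_name "Login returns 500")"`.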

2. Minimal Fix

Rules for hotfix code:

  • Smallest possible change
  • No refactoring
  • No unrelated changes
  • Must pass tests

3. Verify

bun run lint:fix
bun run build
bun run test

# Test the specific fix
# Verify in staging if time permits

4. Deploy

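# Merge to main
git checkout main
git merge hotfix/issue-description

# Tag and deploy (v1.2.4 is an example; use the next patch version)
git tag -a v1.2.4 -m "Hotfix: issue description"
# --follow-tags pushes only annotated tags reachable from main,
# not every local tag as --tags would
git push origin main --follow-tags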
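
The `v1.2.4` tag above is an example of a patch bump. Assuming tags follow the `vMAJOR.MINOR.PATCH` pattern, the next hotfix tag can be computed from the previous one. `next_patch_tag` is a hypothetical helper name.

```shell
# next_patch_tag vMAJOR.MINOR.PATCH
# Prints the same version with PATCH incremented by one.
# Assumes plain numeric components without leading zeros.
next_patch_tag() {
  version=${1#v}          # drop the leading "v"
  major=${version%%.*}
  rest=${version#*.}
  minor=${rest%%.*}
  patch=${rest#*.}
  printf 'v%s.%s.%s\n' "$major" "$minor" "$((patch + 1))"
}
```

For instance, `next_patch_tag "$(git describe --tags --abbrev=0)"` prints the tag to use for the hotfix, provided the most recent reachable tag follows the assumed pattern.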

5. Backport

# Merge hotfix to develop branch so the next release keeps the fix
git checkout develop
git pull origin develop
git merge hotfix/issue-description
git push origin develop

Postmortem Template

Write a postmortem within 48 hours of resolution.

# Incident Postmortem: [Title]

## Summary

| Field | Value |
|-------|-------|
| Date | YYYY-MM-DD |
| Duration | X hours Y minutes |
| Severity | P0/P1/P2/P3 |
| Author | [Name] |

## Impact

- [Number of users affected]
- [Revenue impact if applicable]
- [Other business impact]

## Timeline (UTC)

| Time | Event |
|------|-------|
| HH:MM | Issue first detected |
| HH:MM | Team alerted |
| HH:MM | Root cause identified |
| HH:MM | Fix deployed |
| HH:MM | Issue resolved |

## Root Cause

[Clear, technical explanation of what caused the incident]

## Detection

How was the incident detected?
- [ ] Monitoring alert
- [ ] Customer report
- [ ] Internal discovery

Could we have detected it earlier?
- [Analysis]

## Resolution

What fixed the issue?
- [Description of fix]

Was it a rollback or hotfix?
- [Details]

## Lessons Learned

### What Went Well
- [Point 1]
- [Point 2]

### What Went Wrong
- [Point 1]
- [Point 2]

### Where We Got Lucky
- [Point 1]

## Action Items

| Action | Owner | Due Date | Status |
|--------|-------|----------|--------|
| [Action 1] | [Name] | YYYY-MM-DD | [ ] |
| [Action 2] | [Name] | YYYY-MM-DD | [ ] |

## Prevention

How do we prevent this class of issue?

- [ ] Add monitoring for [X]
- [ ] Add test for [Y]
- [ ] Improve process for [Z]

Communication Templates

Initial Alert

🚨 [P1] Investigating issues with [service]

Impact: [Brief description]
Status: Investigating
ETA: Unknown

Updates to follow.
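
The initial alert can be filled in from a script so the wording stays consistent across incidents. This is a sketch; `initial_alert` is a hypothetical helper, and how the message is delivered (chat, email, status page) is left to your own tooling.

```shell
# initial_alert SEVERITY SERVICE IMPACT
# Prints the initial alert message with the placeholders filled in.
initial_alert() {
  printf '🚨 [%s] Investigating issues with %s\n\n' "$1" "$2"
  printf 'Impact: %s\nStatus: Investigating\nETA: Unknown\n\n' "$3"
  printf 'Updates to follow.\n'
}
```

For example, `initial_alert P1 checkout "Payments failing for some users"` produces a message ready to paste into the incident channel.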

Update

🔄 [P1] Update on [service] issue

Status: Root cause identified, fix in progress
Impact: [Updated impact]
ETA: [Time estimate]

Next update in [X] minutes.

Resolution

✅ [P1] Resolved: [service] issue

Duration: [X hours Y minutes]
Resolution: [Brief description]

Postmortem to follow within 48 hours.
