# Skill: Documentation (ADR & Runbook)

**Category:** Documentation | **Version:** 1.0.0 | **Used By:** All agents, Phase 8

## Overview

Create Architecture Decision Records (ADRs) to capture significant design decisions, and runbooks to document operational procedures.

## Part 1: Architecture Decision Records (ADR)

### When to Create ADR

- Choosing between technologies
- Significant architectural changes
- New patterns or conventions
- Deprecating existing approaches

### ADR Template

```markdown
# ADR-[NUMBER]: [TITLE]

**Status:** [Proposed | Accepted | Deprecated | Superseded by ADR-XXX]
**Date:** YYYY-MM-DD
**Deciders:** [Names/Teams]

## Context

[What is the issue? Why do we need to make a decision?]

## Decision

[What is the change being proposed/decided?]

## Options Considered

### Option 1: [Name]

- **Pros:** [Benefits]
- **Cons:** [Drawbacks]

### Option 2: [Name]

- **Pros:** [Benefits]
- **Cons:** [Drawbacks]

### Option 3: [Name]

- **Pros:** [Benefits]
- **Cons:** [Drawbacks]

## Consequences

### Positive

- [Benefit 1]
- [Benefit 2]

### Negative

- [Tradeoff 1]
- [Tradeoff 2]

### Risks

- [Risk 1] - Mitigation: [How to handle]

## References

- [Link to relevant docs/discussions]
```

### ADR Example

```markdown
# ADR-001: Use PostgreSQL for Primary Database

**Status:** Accepted
**Date:** 2025-01-15
**Deciders:** Backend Team, DevOps

## Context

We need a relational database for our new application. The application requires ACID compliance, complex queries, and JSON support.

## Decision

Use PostgreSQL 16 as the primary database.

## Options Considered

### Option 1: PostgreSQL

- **Pros:** ACID, JSON support, excellent performance, open source
- **Cons:** Requires more ops expertise than managed solutions

### Option 2: MySQL

- **Pros:** Familiar, widely supported
- **Cons:** Weaker JSON support, licensing concerns

### Option 3: MongoDB

- **Pros:** Flexible schema, easy scaling
- **Cons:** Not ideal for relational data, eventual consistency

## Consequences

### Positive

- Full ACID compliance
- Native JSON/JSONB support
- Strong ecosystem and tooling

### Negative

- Team needs PostgreSQL training
- More complex backup strategy
```

### ADR Naming Convention

```
docs/adr/
├── ADR-001-database-selection.md
├── ADR-002-authentication-strategy.md
├── ADR-003-api-versioning.md
└── README.md (index)
```
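The numbering in this layout can be derived from the files already present instead of being tracked by hand. The following is a minimal sketch, assuming three-digit zero-padded numbers as in the listing above; the function name `next_adr_path` and the slug argument are hypothetical, not part of any existing tool.

```shell
#!/usr/bin/env bash
# Sketch: print the path of the next ADR file in a directory that follows
# the ADR-NNN-short-title.md convention. next_adr_path is a hypothetical
# helper; the three-digit padding is an assumption taken from the listing.
next_adr_path() {
  local dir="$1" slug="$2" max=0 file num
  for file in "$dir"/ADR-[0-9][0-9][0-9]-*.md; do
    [ -e "$file" ] || continue      # the glob matched nothing
    num="${file##*/}"               # drop the directory part
    num="${num#ADR-}"               # drop the "ADR-" prefix
    num="${num%%-*}"                # keep the digits before the first dash
    num=$(expr "$num" + 0)          # expr reads "008" as decimal 8, not octal
    if [ "$num" -gt "$max" ]; then
      max="$num"
    fi
  done
  printf '%s/ADR-%03d-%s.md\n' "$dir" "$((max + 1))" "$slug"
}
```

With the directory listing above, `next_adr_path docs/adr caching-strategy` prints `docs/adr/ADR-004-caching-strategy.md`. The index file `README.md` is ignored because it does not match the `ADR-NNN-` pattern.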

## Part 2: Runbook

### When to Create Runbook

- New service deployment
- Common operational tasks
- Incident response procedures
- On-call handoff documentation

### Runbook Template

````markdown
# Runbook: [Service/Task Name]

**Service:** [Service name]
**Owner:** [Team/Person]
**Last Updated:** YYYY-MM-DD
**On-Call:** [Rotation/Contact]

## Overview

[Brief description of what this runbook covers]

## Prerequisites

- Access to [system/tool]
- Credentials for [service]
- VPN connected (if applicable)

## Common Operations

### Start Service

```shell
# Command to start
systemctl start service-name

# Verify running
systemctl status service-name
```

### Stop Service

```shell
# Graceful shutdown
systemctl stop service-name

# Force stop (if graceful fails); without --signal, systemctl kill sends SIGTERM
systemctl kill --signal=SIGKILL service-name
```

### Check Logs

```shell
# Recent logs
journalctl -u service-name -n 100

# Follow logs
journalctl -u service-name -f

# Search for errors
journalctl -u service-name | grep -i error
```

### Health Check

```shell
# Endpoint check
curl -s http://localhost:8080/health | jq .

# Expected response
# { "status": "healthy", "version": "1.0.0" }
```

## Troubleshooting

### Issue: Service Won't Start

**Symptoms:** Service fails to start, exits immediately

**Diagnosis:**

```shell
journalctl -u service-name -n 50
```

**Common Causes:**

- Missing environment variables → Check `.env` file
- Port already in use → `lsof -i :8080`
- Database connection failed → Check DB connectivity

**Resolution:**

```shell
# Fix env vars: correct the environment file the unit loads (placeholder path).
# Sourcing the file in your own shell does not change the service's environment.
sudoedit /etc/service-name/env

# Restart
systemctl restart service-name
```

### Issue: High Memory Usage

**Symptoms:** Memory usage above the 80% threshold

**Diagnosis:**

```shell
# Check memory
free -h
ps aux --sort=-%mem | head -10
```

**Resolution:**

```shell
# Restart service (temporary)
systemctl restart service-name

# Scale if needed (only applies when the service runs on Kubernetes)
kubectl scale deployment service-name --replicas=3
```

## Alerts & Escalation

| Alert | Severity | Action | Escalate After |
|-------|----------|--------|----------------|
| Service Down | Critical | Restart, check logs | 5 min |
| High CPU | Warning | Monitor, scale if needed | 15 min |
| High Memory | Warning | Restart if > 90% | 10 min |
| Error Rate > 5% | Critical | Check logs, rollback | 5 min |

## Contacts

| Role | Name | Contact |
|------|------|---------|
| Primary On-Call | [Name] | [Slack/Phone] |
| Secondary | [Name] | [Slack/Phone] |
| Team Lead | [Name] | [Slack/Phone] |

## Related Documentation

- Service Architecture
- Deployment Guide
- Monitoring Dashboard
````
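After a restart, a service often needs a few seconds before its health endpoint responds, so a single `curl` can report a false failure. The sketch below retries an arbitrary check command a fixed number of times. The function name `wait_healthy` and its argument order are assumptions for illustration; the URL in the usage comment is the placeholder endpoint from the template.

```shell
#!/usr/bin/env bash
# Sketch: retry a check command until it succeeds or the attempts run out.
# wait_healthy is a hypothetical helper, not part of systemd or curl.
# Usage: wait_healthy ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_healthy() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Run the check quietly; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      printf 'healthy after %s attempt(s)\n' "$i"
      return 0
    fi
    # Pause between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  printf 'still unhealthy after %s attempt(s)\n' "$attempts" >&2
  return 1
}

# Example (placeholder endpoint from the template):
#   systemctl restart service-name
#   wait_healthy 10 3 curl -fsS http://localhost:8080/health
```

Passing the check as a command keeps the helper independent of `curl`, so the same loop can wrap a database ping or a `systemctl is-active` call.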

### Runbook Naming Convention

```
docs/runbooks/
├── api-service.md
├── database-maintenance.md
├── deployment-rollback.md
├── incident-response.md
└── README.md (index)
```

---

## Documentation Checklist

### ADR Checklist
- [ ] Clear problem statement
- [ ] Options evaluated objectively
- [ ] Decision clearly stated
- [ ] Consequences documented
- [ ] Numbered and indexed

### Runbook Checklist
- [ ] Prerequisites listed
- [ ] Commands are copy-paste ready
- [ ] Common issues documented
- [ ] Escalation path defined
- [ ] Contacts current
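
Part of these checklists can be verified mechanically, for example that a document contains the section headings its template defines. This is a minimal sketch, assuming each heading sits on its own line, with or without leading `#` markers; the function name `missing_sections` is hypothetical, and the heading names in the usage comment come from the templates in this document.

```shell
#!/usr/bin/env bash
# Sketch: report which expected section headings a document lacks.
# missing_sections is a hypothetical helper. Headings are used as regular
# expressions, so names containing regex metacharacters need escaping.
# Usage: missing_sections FILE HEADING...
missing_sections() {
  local file="$1" heading status=0
  shift
  for heading in "$@"; do
    # -x requires the whole line to match: optional '#' markers, optional
    # spaces, then the heading text.
    if ! grep -Eqx -- "#* *$heading" "$file"; then
      printf 'missing: %s\n' "$heading"
      status=1
    fi
  done
  return "$status"
}

# Example:
#   missing_sections docs/adr/ADR-001-database-selection.md \
#     Context Decision "Options Considered" Consequences
#   missing_sections docs/runbooks/api-service.md \
#     Prerequisites "Common Operations" Troubleshooting Contacts
```

The function returns a non-zero status when any heading is missing, so it can gate a commit hook or CI step. It cannot judge whether the content under a heading is adequate, which still needs review.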

---

## Best Practices

### Do's
- Keep ADRs immutable (supersede, don't edit)
- Test runbook commands before documenting
- Include "why" not just "what"
- Review runbooks after incidents
- Index all documentation

### Don'ts
- Delete old ADRs (mark them superseded instead)
- Write runbooks without testing the commands
- Assume the reader knows the context
- Let docs become stale
- Skip the troubleshooting section
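
Superseding an ADR usually means writing a new record and changing only the status line of the old one to `Superseded by ADR-XXX`, as the template's status options suggest. The sketch below does that one-line change. It assumes the status is on its own line, written either as `**Status:** ...` or `Status: ...`; the function name `supersede_adr` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: mark an existing ADR as superseded by rewriting only its status
# line. supersede_adr is a hypothetical helper; it assumes the status sits
# on its own line as "**Status:** ..." or "Status: ...".
# Usage: supersede_adr OLD_ADR_FILE NEW_ADR_ID
supersede_adr() {
  local old="$1" new_id="$2" tmp
  tmp=$(mktemp) || return 1
  # Write to a temporary file first so a failed sed cannot truncate the ADR.
  if sed -E "s/^(\*\*Status:\*\*|Status:) .*/\1 Superseded by ${new_id}/" "$old" > "$tmp"; then
    cat "$tmp" > "$old"
  else
    rm -f "$tmp"
    return 1
  fi
  rm -f "$tmp"
}

# Example:
#   supersede_adr docs/adr/ADR-001-database-selection.md ADR-007
```

Context, decision, and consequences in the old record stay exactly as written, which preserves the history of why the earlier choice was made.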

---

**Version:** 1.0.0 | **Last Updated:** 2025-11-28
