toil-tracker

Identify, measure, and reduce operational toil — repetitive manual work that scales linearly with service growth. Categorize toil by type, estimate engineering time lost, prioritize automation candidates, and track reduction over time.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "toil-tracker" with this command: npx skills add charlie-morrison/toil-tracker

Toil Tracker

Find the manual work that's eating your engineering time. Toil is repetitive, automatable, tactical work that scales with service size and has no lasting value. Identify it, measure it, prioritize what to automate first, and track reduction over time.

Use when: "how much toil do we have", "what should we automate", "toil budget", "manual operational work", "repetitive tasks", "SRE toil reduction", or during quarterly planning to justify automation projects.

Commands

1. survey — Catalog Toil Sources

Step 1: Identify Toil Categories

Interview the team or analyze work tracking systems. Common toil categories:

| Category | Examples | Signal |
|----------|----------|--------|
| Deploys | Manual deploy steps, config changes, rollbacks | "Someone has to click..." |
| Tickets | Password resets, access requests, cert renewals | "Every week we get..." |
| Monitoring | False alerts, manual alert triage, dashboard watching | "We page about this but..." |
| Scaling | Manual capacity adjustments, resource provisioning | "When traffic spikes we..." |
| Data | Manual data fixes, migrations, backfills | "Users file tickets to..." |
| Maintenance | Dependency updates, cert rotations, key rotations | "Every quarter we have to..." |
| Onboarding | Setting up dev environments, granting access | "New hire setup takes..." |

Step 2: Quantify Each Toil Source

# Analyze ticket systems for repetitive patterns
# Jira/Linear: find recurring ticket types
# Example: count tickets by label/type in the last quarter

# Analyze on-call alerts for noise.
# --globoff stops curl from treating the [] in statuses[] as a glob pattern.
# The API paginates (limit is capped at 100), so page through offset/more for full counts.
curl -s --globoff "https://api.pagerduty.com/incidents?since=2026-01-01&until=2026-04-01&statuses[]=resolved&limit=100" \
  -H "Authorization: Token token=$PD_TOKEN" \
  -H "Accept: application/vnd.pagerduty+json;version=2" | python3 -c "
import json, sys, collections
incidents = json.load(sys.stdin)['incidents']
by_service = collections.Counter(i['service']['summary'] for i in incidents)
print('Incidents by service (potential toil):')
for service, count in by_service.most_common(10):
    print(f'  {count:>4}x  {service}')
"
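The ticket-system comments above leave the counting step open. A minimal sketch, assuming last quarter's tickets have already been exported to a list of dicts with a `labels` field; the field name, the sample data, and the `min_count` threshold are hypothetical and do not come from any Jira or Linear API:

```python
from collections import Counter


def recurring_ticket_labels(tickets, min_count=3):
    """Return (label, count) pairs for labels seen at least min_count times."""
    counts = Counter(label for t in tickets for label in t.get("labels", []))
    return [(label, n) for label, n in counts.most_common() if n >= min_count]


# Hypothetical export of last quarter's tickets.
sample_tickets = [
    {"id": 1, "labels": ["access-request"]},
    {"id": 2, "labels": ["access-request", "vpn"]},
    {"id": 3, "labels": ["cert-renewal"]},
    {"id": 4, "labels": ["access-request"]},
]

print("Recurring ticket labels (potential toil):")
for label, n in recurring_ticket_labels(sample_tickets, min_count=2):
    print(f"  {n:>4}x  {label}")
```

Labels that recur every week are the ones worth carrying into the estimate step that follows.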

For each toil source, estimate:

  • Frequency: How often does this happen? (daily, weekly, per-deploy)
  • Duration: How long does it take each time? (minutes, hours)
  • People involved: How many engineers touch this?
  • Scaling: Does it grow with service count, traffic, or team size?
  • Risk: What happens if someone does it wrong?
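One way to capture these estimates is a small record type that normalizes every frequency to a per-quarter count. This is a sketch under stated assumptions: the `ToilEstimate` class, its field names, and the conversion table (13-week quarter, 5 working days per week) are illustrative and not part of the skill. The output dict uses the same keys that the Step 3 function reads.

```python
from dataclasses import dataclass

# Assumed conversions: 13-week quarter, 5 working days per week.
OCCURRENCES_PER_QUARTER = {"day": 65, "week": 13, "month": 3, "quarter": 1}


@dataclass
class ToilEstimate:
    task: str
    times_per_unit: float          # e.g. 20 for "20 per week"
    unit: str                      # "day", "week", "month", or "quarter"
    minutes_per_occurrence: float
    people_involved: int = 1
    scaling: str = ""              # free text: what the toil grows with
    risk: str = ""                 # free text: what happens if done wrong

    def to_budget_item(self):
        """Convert to a dict with per-quarter frequency and hours per occurrence."""
        return {
            "task": self.task,
            "frequency_per_quarter": self.times_per_unit * OCCURRENCES_PER_QUARTER[self.unit],
            "hours_per_occurrence": self.minutes_per_occurrence / 60,
            "people_involved": self.people_involved,
        }


item = ToilEstimate("Access requests", 20, "week", 15).to_budget_item()
hours = item["frequency_per_quarter"] * item["hours_per_occurrence"] * item["people_involved"]
print(hours)  # 65.0
```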

Step 3: Calculate Toil Budget

def calculate_toil_budget(toil_items, team_size, hours_per_quarter=520):
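    """
    Sum quarterly toil hours and compare them with team capacity.

    Google SRE recommends: max 50% of SRE time on toil.
    hours_per_quarter defaults to 520 (13 weeks x 40 hours).
    Adds a 'quarterly_hours' key to each item in toil_items.
    """
    if team_size <= 0 or hours_per_quarter <= 0:
        raise ValueError('team_size and hours_per_quarter must be positive')

    total_toil_hours = 0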

    for item in toil_items:
        quarterly_hours = item['frequency_per_quarter'] * item['hours_per_occurrence'] * item['people_involved']
        total_toil_hours += quarterly_hours
        item['quarterly_hours'] = quarterly_hours

    team_capacity = team_size * hours_per_quarter
    toil_percentage = (total_toil_hours / team_capacity) * 100

    return {
        'total_toil_hours': total_toil_hours,
        'team_capacity_hours': team_capacity,
        'toil_percentage': toil_percentage,
        'status': '🟢 Healthy' if toil_percentage < 30 else '🟡 Watch' if toil_percentage < 50 else '🔴 Over budget',
        'items_ranked': sorted(toil_items, key=lambda x: -x['quarterly_hours']),
    }

Step 4: Generate Report

# Toil Report — Q2 2026

## Summary
- Team size: 6 SREs
- Total toil: 420h/quarter (about 5.4h/person/week)
- Toil budget: 13.5% of capacity 🟢 (target: <30%)

## Top Toil Sources (ranked by hours)
| Rank | Category | Task | Freq | Duration | Hours/Q | Automatable? |
|------|----------|------|------|----------|---------|-------------|
| 1 | Tickets | Access requests | 20/week | 15 min | 65h | ✅ Self-serve portal |
| 2 | Deploys | Manual prod deploy | 3/week | 45 min x 2 engineers | 58.5h | ✅ CI/CD pipeline |
| 3 | Monitoring | False alert triage | 10/week | 20 min | 43h | ✅ Tune thresholds |
| 4 | Data | Customer data fixes | 5/week | 30 min | 32.5h | ✅ Admin tool |
| 5 | Maintenance | Cert renewals | 12/quarter | 2h | 24h | ✅ auto-renew |

## Automation ROI
| Project | Est. Effort | Toil Saved/Q | Payback |
|---------|------------|-------------|---------|
| Self-serve access portal | 80h | 65h | 1.2 quarters |
| CD pipeline | 120h | 58.5h | 2.1 quarters |
| Alert tuning sprint | 20h | 43h | 0.5 quarters |
| Admin data tool | 60h | 32.5h | 1.8 quarters |
| Auto cert renewal | 8h | 24h | 0.3 quarters |

## Recommendation
Start with auto cert renewal (fastest payback and lowest effort) and alert tuning (pays back within one quarter). Then tackle the self-serve access portal. Defer the CD pipeline to Q3 (high effort but high payoff).
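The ranked table in a report like the one above can be rendered directly from the item list. A minimal sketch, assuming each item carries `category`, `task`, and `quarterly_hours` keys; `render_toil_table` is a hypothetical helper, and the two sample rows reuse figures from the example report:

```python
def render_toil_table(items):
    """Render toil items as a Markdown table, largest quarterly hours first."""
    lines = [
        "| Rank | Category | Task | Hours/Q |",
        "|------|----------|------|---------|",
    ]
    ranked = sorted(items, key=lambda x: -x["quarterly_hours"])
    for rank, item in enumerate(ranked, start=1):
        lines.append(
            f"| {rank} | {item['category']} | {item['task']} | {item['quarterly_hours']:g}h |"
        )
    return "\n".join(lines)


# Sample rows taken from the example report, deliberately out of order.
sample_items = [
    {"category": "Deploys", "task": "Manual prod deploy", "quarterly_hours": 58.5},
    {"category": "Tickets", "task": "Access requests", "quarterly_hours": 65.0},
]

table = render_toil_table(sample_items)
print(table)
```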

2. prioritize — Rank Automation Candidates

Score each toil source by:

  • Hours saved per quarter (impact)
  • Automation effort (cost)
  • Risk of manual error (safety)
  • Growth rate (will it get worse?)

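Calculate ROI = hours_saved_per_quarter / automation_hours. Higher is better. The Payback column in the report is the inverse: automation_hours / hours_saved_per_quarter, measured in quarters.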
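The ROI formula covers only impact and cost. One possible way to fold in the other two factors is to scale ROI by risk and growth ratings. This is a sketch, not the skill's own method: the `priority_score` function, the 0 to 1 `risk` and `growth` ratings, and the 0.25 weights are all hypothetical. Hours and effort are taken from the example report.

```python
def roi(hours_saved_per_quarter, automation_hours):
    """Quarterly hours saved per hour of automation effort."""
    return hours_saved_per_quarter / automation_hours


def priority_score(candidate, risk_weight=0.25, growth_weight=0.25):
    """Hypothetical composite: ROI scaled up by 0-1 risk and growth ratings."""
    base = roi(candidate["hours_saved_per_quarter"], candidate["automation_hours"])
    return base * (1 + risk_weight * candidate["risk"] + growth_weight * candidate["growth"])


# risk and growth values below are made-up ratings for illustration.
candidates = [
    {"name": "Self-serve access portal", "hours_saved_per_quarter": 65,
     "automation_hours": 80, "risk": 0.5, "growth": 1.0},
    {"name": "Alert tuning sprint", "hours_saved_per_quarter": 43,
     "automation_hours": 20, "risk": 0.0, "growth": 0.5},
    {"name": "Auto cert renewal", "hours_saved_per_quarter": 24,
     "automation_hours": 8, "risk": 1.0, "growth": 0.0},
]

ranked = sorted(candidates, key=priority_score, reverse=True)
for c in ranked:
    payback = c["automation_hours"] / c["hours_saved_per_quarter"]
    print(f"{c['name']}: ROI {roi(c['hours_saved_per_quarter'], c['automation_hours']):.2f}, "
          f"payback {payback:.1f} quarters, score {priority_score(c):.2f}")
```

A multiplicative form keeps ROI as the dominant term, so risk and growth mostly reorder candidates whose ROI is already close.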

3. track — Monitor Toil Reduction Over Time

Compare toil hours quarter-over-quarter:

  • Total toil hours trending up or down?
  • Which automation projects delivered expected savings?
  • New toil sources appearing?
  • Toil percentage within SRE budget (< 50%)?
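These checks can be sketched as a comparison of per-task hours between two quarters. The function names and the sample figures below are hypothetical, chosen only to show a reduced task, an unchanged task, and a newly appearing one:

```python
def compare_quarters(previous, current):
    """Compare per-task toil hours between two quarters (dicts of task -> hours)."""
    rows = []
    for task in sorted(set(previous) | set(current)):
        before = previous.get(task, 0.0)
        after = current.get(task, 0.0)
        rows.append({
            "task": task,
            "before": before,
            "after": after,
            "delta": after - before,
            "new": task not in previous,  # toil source that did not exist last quarter
        })
    return rows


def savings_delivered(expected_saved, before, after):
    """Fraction of the expected savings actually realized (1.0 = fully delivered)."""
    return (before - after) / expected_saved


# Made-up figures for illustration.
q1 = {"Access requests": 65.0, "Cert renewals": 24.0}
q2 = {"Access requests": 65.0, "Cert renewals": 4.0, "Log cleanup": 10.0}

rows = compare_quarters(q1, q2)
total_delta = sum(r["delta"] for r in rows)
print(f"Total toil change: {total_delta:+.1f}h")
for r in rows:
    flag = " (new)" if r["new"] else ""
    print(f"  {r['task']}: {r['before']:g}h -> {r['after']:g}h{flag}")
```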
