agent-estimation

Accurately estimate AI agent work effort using the agent's own operational units (tool-call rounds) instead of human time. Use when asked to estimate, scope, plan, or evaluate how long a coding task will take. Prevents the common failure mode where agents anchor to human developer timelines and massively overestimate. Outputs a structured breakdown with round counts, risk factors, and a final wallclock conversion.


Install skill "agent-estimation" with this command: npx skills add zhanghandong/agent-estimation/zhanghandong-agent-estimation-agent-estimation

Agent Work Estimation Skill

Problem

AI coding agents systematically overestimate task duration because they anchor to human developer timelines absorbed from training data. A task an agent can complete in 30 minutes gets estimated as "2-3 days" because that's what a human developer forum post would say.

Solution

Force the agent to estimate from its own operational units — tool-call rounds — and only convert to human wallclock time at the very end.

Core Units

| Unit | Definition | Scale |
|------|------------|-------|
| Round | One tool-call cycle: think → write code → execute → verify → fix | ~2-4 min wallclock |
| Module | A functional unit built from multiple rounds until usable | 2-15 rounds |
| Wave | A batch of modules with no mutual dependencies, executable in parallel | 1-N modules |
| Project | All waves sequentially + integration + debugging | Sum of waves |

A Round is the atomic unit. It maps directly to one iteration of:

  1. Agent reasons about what to do
  2. Agent writes/edits code
  3. Agent runs the code or a test
  4. Agent reads the output
  5. Agent decides if it needs to fix something (if yes → next round)
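The five-step cycle above can be sketched as a loop in which each iteration counts as one round. This is a toy simulation, not real agent code: `verify` stands in for whatever check closes a round, and the 50% pass probability is an arbitrary placeholder.

```python
import random

def verify(output: str) -> bool:
    """Stand-in for step 5: did the run look correct? (arbitrary 50% odds)"""
    return random.random() < 0.5

def run_module(max_rounds: int = 15, seed: int = 0) -> int:
    """Count rounds until the module's check passes (or the cap is hit)."""
    random.seed(seed)
    rounds = 0
    while rounds < max_rounds:
        rounds += 1                # steps 1-2: reason, then write/edit code
        output = f"run #{rounds}"  # steps 3-4: execute and read the output
        if verify(output):         # step 5: no fix needed ends the module
            break
    return rounds
```

For a fixed seed the simulated round count is deterministic, which makes the unit easy to reason about: a module is simply "however many cycles until verification passes."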

Estimation Procedure

When asked to estimate a task, follow these steps in order:

Step 1: Decompose into Modules

Break the task into functional modules. Each module should be independently buildable and testable. Ask yourself: "What are the distinct pieces I would build one at a time?"

Step 2: Estimate Rounds per Module

For each module, estimate the number of rounds using these anchors:

| Pattern | Typical Rounds | Examples |
|---------|----------------|----------|
| Boilerplate / known pattern | 1-2 | CRUD endpoint, config file, standard API client |
| Moderate complexity | 3-5 | Custom UI layout, state management, data pipeline |
| Exploratory / under-documented | 5-10 | Unfamiliar framework, platform-specific APIs, complex integrations |
| High uncertainty | 8-15 | Undocumented behavior, novel algorithms, multi-system debugging |

Key calibration rules:

  • If you can generate the code in one shot and it will likely run → 1 round
  • If you'll need to generate, run, see an error, and fix → 2-3 rounds
  • If the library/framework has sparse docs and you'll be guessing → 5+ rounds
  • If it involves platform permissions, OS-level APIs, or environment-specific behavior the user must manually verify → add 2-3 rounds
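The anchor table and calibration rules can be folded into a small lookup helper. A hedged sketch: the pattern keys and the `manual_verification` flag are names invented here for illustration, not part of the skill itself.

```python
# Round anchors from the table above, keyed by illustrative pattern names.
ROUND_ANCHORS: dict[str, tuple[int, int]] = {
    "boilerplate": (1, 2),
    "moderate": (3, 5),
    "exploratory": (5, 10),
    "high_uncertainty": (8, 15),
}

def estimate_rounds(pattern: str, manual_verification: bool = False) -> tuple[int, int]:
    """Return a (low, high) round range for a module; widen it when the user
    must manually verify permissions, OS-level APIs, or environment behavior."""
    lo, hi = ROUND_ANCHORS[pattern]
    if manual_verification:
        lo, hi = lo + 2, hi + 3  # the "add 2-3 rounds" rule above
    return lo, hi
```

For instance, an under-documented module that also needs manual verification lands at roughly 7-13 rounds.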

Step 3: Assign Risk Coefficients

Each module gets a risk coefficient that inflates its round count:

| Risk Level | Coefficient | When to Apply |
|------------|-------------|---------------|
| Low | 1.0 | Mature ecosystem, clear docs, agent has strong pattern match |
| Medium | 1.3 | Minor unknowns, may need 1-2 extra debug rounds |
| High | 1.5 | Sparse docs, platform quirks, integration unknowns |
| Very High | 2.0 | Possible dead ends, may need to change approach entirely |
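A minimal sketch of the risk adjustment. Rounding up to a whole round is an assumption on my part; the source leaves fractional rounds unspecified.

```python
import math

RISK_COEFFICIENT = {"low": 1.0, "medium": 1.3, "high": 1.5, "very_high": 2.0}

def effective_rounds(base_rounds: int, risk: str) -> int:
    """Inflate a module's base round count by its risk coefficient,
    rounding up so partial rounds still cost a full round."""
    return math.ceil(base_rounds * RISK_COEFFICIENT[risk])
```

A 5-round module at High risk becomes ceil(5 × 1.5) = 8 effective rounds, while a Low-risk module keeps its base count unchanged.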

Step 3.5: Construct Waves (Optional — for parallel / multi-agent scenarios)

If the task will be executed by multiple agents or the user asks for fastest completion:

  1. Map dependencies: For each module, list which other modules it depends on
  2. Group into waves:
    • Wave 1: All modules with zero dependencies
    • Wave 2: Modules whose dependencies are all in Wave 1
    • Wave N: Modules whose dependencies are all in previous waves
  3. Note agent count: How many agents can run in parallel within each wave

Skip this step for:

  • Single-agent sequential execution
  • Projects with < 3 modules
  • Tightly coupled modules where parallelism gains are negligible
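The wave-construction rule above is a topological layering of the dependency graph. A sketch, with all module names hypothetical:

```python
def build_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group modules into waves where each wave depends only on earlier waves."""
    remaining = {m: set(d) for m, d in deps.items()}
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        # Wave N: every module whose dependencies are all already completed.
        wave = sorted(m for m, d in remaining.items() if d <= done)
        if not wave:
            raise ValueError("dependency cycle: no module is ready to run")
        waves.append(wave)
        done.update(wave)
        for m in wave:
            del remaining[m]
    return waves
```

With hypothetical modules where `ui` depends on `api` and `e2e` depends on `ui` and `db`, this yields the waves `[["api", "db"], ["ui"], ["e2e"]]`; the agent count per wave is simply each wave's length.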

Step 4: Calculate Totals

Sequential mode (single agent, default):

Module effective rounds = base rounds × risk coefficient
Project rounds = Σ(module effective rounds) + integration rounds
Integration rounds = 10-20% of base total (for wiring modules together)

Wave mode (multi-agent):

Wave duration = max(effective rounds of modules in wave)
Project rounds = Σ(wave durations) + coordination rounds + integration rounds
Coordination rounds = 2-3 rounds upfront (contract definition between agents)
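Both totals can be sketched as functions. The 15% integration share and 3 coordination rounds are mid-range picks from the bands above, and for simplicity integration here is computed from effective rather than base rounds — both are assumptions, not part of the skill.

```python
import math

def sequential_rounds(effective: list[int], integration_pct: float = 0.15) -> int:
    """Single agent: sum every module's effective rounds, plus integration."""
    total = sum(effective)
    return total + math.ceil(total * integration_pct)

def wave_rounds(waves: list[list[int]], coordination: int = 3,
                integration_pct: float = 0.15) -> int:
    """Multi-agent: each wave costs its slowest module; add coordination
    (contract definition between agents) and integration overhead."""
    critical_path = sum(max(wave) for wave in waves)
    integration = math.ceil(sum(sum(wave) for wave in waves) * integration_pct)
    return critical_path + coordination + integration
```

For three modules at 8, 4, and 3 effective rounds, the sequential total is 15 + 3 = 18 rounds; run as two waves ([8, 4] then [3]), the parallel total is 11 + 3 + 3 = 17 rounds.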

Step 5: Convert to Wallclock Time

Only at the very end, convert to human time:

Wallclock time = project rounds × minutes_per_round

Default minutes_per_round = 3 minutes (includes agent generation time + user review time).

Adjust this parameter based on context:

  • Fast iteration, user barely reviews → 2 min/round
  • Complex domain, user carefully reviews each step → 4 min/round
  • User needs to manually test (mobile, hardware, permissions) → 5 min/round
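The final conversion is plain arithmetic; a sketch using the default of 3 min/round:

```python
def wallclock_minutes(project_rounds: int, minutes_per_round: float = 3.0) -> float:
    """Convert rounds to human time only as the very last step."""
    return project_rounds * minutes_per_round
```

An 18-round project at the default rate comes out to 54 minutes; at 5 min/round (manual device testing) the same project takes 90 minutes.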

Output Format

Always output the estimation in this exact structure:

### Task: [task name]

#### Module Breakdown

| # | Module | Base Rounds | Risk | Effective Rounds | Notes |
|---|--------|------------|------|-----------------|-------|
| 1 | ...    | N          | 1.x  | M               | why   |
| 2 | ...    | N          | 1.x  | M               | why   |

#### Summary

- **Base rounds**: X
- **Integration**: +Y rounds
- **Risk-adjusted total**: Z rounds
- **Sequential wallclock**: A – B minutes (at N min/round)

**Wave Execution** (if applicable):
- Wave 1: [modules] → max M rounds
- Wave 2: [modules] → max M rounds
- Coordination overhead: +C rounds
- **Parallel wallclock**: A – B minutes (at N min/round, K agents)
- **Speedup vs sequential**: ~X%

#### Biggest Risks
1. [specific risk and what could blow up the estimate]
2. [...]

Anti-Patterns to Avoid

These are the failure modes this skill exists to prevent:

  1. Human-time anchoring: "A developer would take about 2 weeks..." → NO. Start from rounds.
  2. Padding by vibes: Adding time "just to be safe" without specific risk rationale → NO. Use risk coefficients.
  3. Confusing complexity with volume: 500 lines of boilerplate ≠ hard. One line of CGEvent API ≠ easy. Estimate by uncertainty, not line count.
  4. Forgetting integration cost: Modules work alone but break together. Always add integration rounds.
  5. Ignoring user-side bottlenecks: If the user must manually grant permissions, restart an app, or test on a device, that's extra round time. Adjust minutes_per_round, don't add phantom rounds.
  6. Assuming parallelism is free: Multi-agent wave execution has coordination cost (contract definition, conflict resolution). Always add coordination rounds.

Calibration Reference

For example projects with known round counts across a range of project types, see references/calibration-examples.md.

Eval Prompts

See evals/evals.json for test cases to validate estimation accuracy.
