Adaptive Subagent Routing
FIRST: Decompose Before Acting
Do NOT start working on the task directly. Before reading files or making edits, decompose the task into subtasks and delegate each one to the cheapest sufficient model via the Agent tool. The only exception is if the entire task is under ~100 tokens of output (a one-liner fix).
Quick Decision (apply to EVERY delegation)
Default: Haiku. Escalate only with reason.
- Is this a mechanical lookup? (find files, grep for a string, list structure, check if something exists) → Haiku
- Does it require comprehension or writing? (implement, refactor, test, debug, fix, analyze code, explain logic, summarize complex systems) → Sonnet
- Is it architectural, security-sensitive, or ambiguous? → Opus
Haiku boundary: Haiku can find things and format things. It cannot understand or reason about code. If the task requires reading code and making a judgment, that's Sonnet.
If unsure between two tiers, pick the cheaper one. Escalate after failure, not before.
Before each delegation, output **Routing to {Model}:** {brief reason} on its own line.
When handling work inline, output **Staying on {Model}:** {brief reason} instead.
When to Delegate vs Handle Inline
Always delegate to a cheaper tier — the 3x–15x cost savings always outweigh subagent overhead.
Skip delegation only when:
- The output is under ~100 tokens (one-liner answers, quick fixes already in context)
- The overhead of spawning a subagent exceeds the work itself
- The subtask needs the same model tier you're already on and context is already available
Routing Rules
| Haiku (1x) | Sonnet (3x) | Opus (15x) |
|---|---|---|
| File search, grep, glob | Implementation, bug fix | Architecture, migration |
| Format/lint existing text | Refactor (< 10 files) | Security review |
| Check if file/string exists | Test writing | Ambiguous/conflicting scope |
| List files, directory structure | Debugging with stack traces | Multi-system design |
| Literal string replacements | Code generation (< 500 lines) | Performance analysis |
| Summarize/analyze code logic | ||
| Rename across multiple files | ||
| Code review, explain code |
Escalate +1 tier: ambiguous requirements, public-facing output, security-sensitive, production-critical, or 2 consecutive failures. Downgrade: after planning completes, or remaining work is formatting/summarization.
Cost Ratios
| Model | Relative Cost | Pricing |
|---|---|---|
| Haiku (1x) | Baseline | ~$1/M input, $5/M output |
| Sonnet (3x) | 3x Haiku | ~$3/M input, $15/M output |
| Opus (15x) | 15x Haiku | ~$15/M input, $75/M output |
A 10-delegation task routed to Haiku instead of Opus saves ~93% on those calls.
Per-Subtask Routing
When a request involves multiple steps:
- Decompose first. Break the request into a todo list of subtasks.
- Route each subtask independently. Most items are cheaper than the overall task.
- Downgrade after planning. If Opus produces a plan, implementation goes to Sonnet. Lookups and cleanup go to Haiku.
Example — "design and implement a caching layer":
- Plan the architecture → Opus
- Search for existing cache usage → Haiku
- Implement the cache module → Sonnet
- Write tests → Sonnet
- Update internal dev notes → Haiku
- Write public API docs → Sonnet (escalated: public-facing)
Delegation
Choose agent type by task and model by complexity independently.
| Agent Type | Use When | Tools Available |
|---|---|---|
Explore | Search, grep, read-only exploration | Read, Glob, Grep (no Edit/Write) |
general-purpose | Implementation, edits, multi-step work | All tools |
Plan | Architecture design, implementation planning | Read, Glob, Grep (no Edit/Write) |
Combine freely — Explore + haiku for file lookups, general-purpose + sonnet for implementation, Explore + sonnet for complex investigation, Plan + sonnet for straightforward design.
Agent(subagent_type: "Explore", model: "haiku", description: "...", prompt: "...")
Agent(subagent_type: "general-purpose", model: "sonnet", description: "...", prompt: "...")
Agent(subagent_type: "Explore", model: "sonnet", description: "...", prompt: "...")
Agent(subagent_type: "Plan", model: "sonnet", description: "...", prompt: "...")
Keep prompts scoped: relevant files, constraints, expected output format. The description parameter is required — use a short 3-5 word summary.
Expected Output Formats
| Tier | Expected Output |
|---|---|
| Haiku | File paths, grep results, existence checks, formatted text |
| Sonnet | Code diffs, implementation files, test files, error analysis |
| Opus | Numbered plan steps, architecture decisions with rationale, risk assessment |
Parallel Fan-Out
When a request contains independent subtasks, launch multiple subagents in a single message.
Before splitting, build a dependency graph. Does any subtask need another's output, or do they modify the same file? If yes, run them sequentially.
| Pattern | Parallel? | Why |
|---|---|---|
| Separate files, separate concerns | Yes | No shared state |
| Research + unrelated implementation | Yes | Read-only doesn't conflict with writes |
| Implementation + its tests | No | Tests depend on the implementation |
| Feature + docs describing that feature | No | Docs need to reflect what was built |
| Two features touching different modules | Yes | Independent code paths |
| Refactor + anything in same files | No | Refactor changes the baseline |
After all phases return, review outputs for conflicts before applying.
User Overrides
If the user says any of the following, respect it for the current request:
- "no routing" / "don't delegate" — handle everything inline on the current model.
- "use Opus" / "use Sonnet" / "use Haiku" — force that tier for all delegations.
Log the override with **Override:** {what the user requested} and resume normal routing on the next request.
Routing Log
After each routing decision, append a line to routing.log in the project root using the Bash tool:
echo "{model} | {agent_type or inline} | {estimated savings vs Opus, e.g. ~93%} | {brief task description}" >> routing.log
Log every **Routing to** and **Staying on** decision.
Guardrails
- Max 2 retries per subagent, then escalate or ask the user.
- Max 5 sequential delegations per user request. No limit on parallel.
- Always review subagent outputs before applying — check for conflicts, hallucinated paths, and incomplete work.
- Do NOT modify project files: Never write routing rules to CLAUDE.md. Never add hooks to settings.json. This skill is loaded automatically by the plugin.