# Model Hierarchy
Route tasks to the cheapest model that can handle them. Most agent work is routine.
## Core Principle

80% of agent tasks are janitorial: file reads, status checks, formatting, simple Q&A. These don't need expensive models. Reserve premium models for problems that actually require deep reasoning.
## Model Tiers

### Tier 1: Cheap ($0.10-0.50/M tokens)

| Model | Input | Output | Best For |
|---|---|---|---|
| DeepSeek V3 | $0.14 | $0.28 | General routine work |
| GPT-4o-mini | $0.15 | $0.60 | Quick responses |
| Claude Haiku | $0.25 | $1.25 | Fast tool use |
| Gemini Flash | $0.075 | $0.30 | High volume |
| GLM 5 (Zhipu) | (see OpenRouter Z.AI) | (see OpenRouter Z.AI) | Routine + moderate text; 200K context; text-only, no image/vision |
| Kimi K2.5 (Moonshot) | $0.45 | $2.25 | Routine + moderate; 262K context; multimodal (text + image + video) |

- **Text-only models (e.g. GLM 5):** do not use for any task that requires image input or vision: no photo analysis, screenshots, image-generation tools, or document/chart vision. Route to a vision-capable model instead (e.g. Kimi K2.5, GPT-4o, Gemini, Claude with vision, GLM-4.5V/4.6V).
- **Vision-capable Tier 1/2 models (e.g. Kimi K2.5):** use for routine or moderate tasks that may involve images (screenshots, photo analysis, docs, image-generation orchestration) without moving to premium vision models.
### Tier 2: Mid ($1-5/M tokens)

| Model | Input | Output | Best For |
|---|---|---|---|
| Claude Sonnet | $3.00 | $15.00 | Balanced performance |
| GPT-4o | $2.50 | $10.00 | Multimodal tasks |
| Gemini Pro | $1.25 | $5.00 | Long context |
### Tier 3: Premium ($10-75/M tokens)

| Model | Input | Output | Best For |
|---|---|---|---|
| Claude Opus | $15.00 | $75.00 | Complex reasoning |
| GPT-4.5 | $75.00 | $150.00 | Frontier tasks |
| o1 | $15.00 | $60.00 | Multi-step reasoning |
| o3-mini | $1.10 | $4.40 | Reasoning on budget |

*Prices as of Feb 2026. Check provider docs for current rates.*
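The tier tables translate directly into a quick cost estimator. A minimal sketch: the prices mirror the tables above, and the model keys are illustrative shorthand, not exact provider IDs.

```python
# Per-model pricing in USD per million tokens, mirroring the tier tables
# above. Model keys are illustrative shorthand, not exact provider IDs.
PRICES = {
    "deepseek-v3":   {"input": 0.14,  "output": 0.28},
    "gpt-4o-mini":   {"input": 0.15,  "output": 0.60},
    "claude-sonnet": {"input": 3.00,  "output": 15.00},
    "claude-opus":   {"input": 15.00, "output": 75.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single call."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
```

For a 10K-in / 2K-out call, Sonnet costs about $0.06 versus about $0.002 on DeepSeek: a ~30x gap on identical routine work.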
## Task Classification

Before executing any task, classify it:

### ROUTINE → Use Tier 1

**Vision check first:** if the task requires image input or vision, do not assign it to a text-only model (GLM 5, etc.). Use a vision-capable model from Tier 1/2 or 3 (e.g. Kimi K2.5, GPT-4o, Gemini, Claude, GLM-4.5V).
Characteristics:

- Single-step operations
- Clear, unambiguous instructions
- No judgment required
- Deterministic output expected

Examples:

- File read/write operations
- Status checks and health monitoring
- Simple lookups (time, weather, definitions)
- Formatting and restructuring text
- List operations (filter, sort, transform)
- API calls with known parameters
- Heartbeat and cron tasks
- URL fetching and basic parsing
### MODERATE → Use Tier 2

Characteristics:

- Multi-step but well-defined
- Some synthesis required
- Standard patterns apply
- Quality matters but isn't critical

Examples:

- Code generation (standard patterns)
- Summarization and synthesis
- Draft writing (emails, docs, messages)
- Data analysis and transformation
- Multi-file operations
- Tool orchestration
- Code review (non-security)
- Search and research tasks
### COMPLEX → Use Tier 3

Characteristics:

- Novel problem solving required
- Multiple valid approaches
- Nuanced judgment calls
- High stakes or irreversible
- Previous attempts failed

Examples:

- Multi-step debugging
- Architecture and design decisions
- Security-sensitive code review
- Tasks where a cheaper model already failed
- Ambiguous requirements needing interpretation
- Long-context reasoning (>50K tokens)
- Creative work requiring originality
- Adversarial or edge-case handling
## Decision Algorithm

```
function selectModel(task):
    # Rule 1: Vision override (Tier 1/2 includes text-only models)
    if task.requiresImageInput or task.requiresVision:
        # e.g. Kimi K2.5, GPT-4o, Gemini, Claude; never GLM 5 or other text-only models
        return VISION_CAPABLE_MODEL

    # Rule 2: Escalation override
    if task.previousAttemptFailed:
        return nextTierUp(task.previousModel)

    # Rule 3: Explicit complexity signals
    if task.hasSignal("debug", "architect", "design", "security"):
        return TIER_3
    if task.hasSignal("write", "code", "summarize", "analyze"):
        return TIER_2

    # Rule 4: Default classification
    complexity = classifyTask(task)
    if complexity == ROUTINE:
        return TIER_1
    elif complexity == MODERATE:
        return TIER_2
    else:
        return TIER_3
```
## Behavioral Rules

### For Main Session

- Default to Tier 2 for interactive work
- Suggest downgrade when doing routine work: "This is routine - I can handle this on a cheaper model or spawn a sub-agent."
- Request upgrade when stuck: "This needs more reasoning power. Switching to [premium model]."

### For Sub-Agents

- Default to Tier 1 unless task is clearly moderate+
- Batch similar tasks to amortize overhead
- Report failures back to parent for escalation

### For Automated Tasks

- Heartbeats/monitoring → Always Tier 1
- Scheduled reports → Tier 1 or 2 based on complexity
- Alert responses → Start Tier 2, escalate if needed
## Communication Patterns

When suggesting model changes, use clear language:

**Downgrade suggestion:**

> "This looks like routine file work. Want me to spawn a sub-agent on DeepSeek for this? Same result, fraction of the cost."

**Upgrade request:**

> "I'm hitting the limits of what I can figure out here. This needs Opus-level reasoning. Switching up."

**Explaining the hierarchy:**

> "I'm running the heavy analysis on Sonnet while sub-agents fetch the data on DeepSeek. Keeps costs down without sacrificing quality where it matters."
## Cost Impact

Assuming 100K tokens/day average usage:

| Strategy | Monthly Cost | Notes |
|---|---|---|
| Pure Opus | ~$225 | Maximum capability, maximum spend |
| Pure Sonnet | ~$45 | Good default for most work |
| Pure DeepSeek | ~$8 | Cheap but limited on hard problems |
| Hierarchy (80/15/5) | ~$19 | Best of all worlds |
The 80/15/5 split:

- 80% routine tasks on Tier 1 (~$6)
- 15% moderate tasks on Tier 2 (~$7)
- 5% complex tasks on Tier 3 (~$6)
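The split can be sanity-checked with back-of-envelope arithmetic. The blended per-million rates below ($2.50 / $15 / $40) are assumed tier averages consistent with the tables above, not exact provider prices.

```python
# Back-of-envelope check of the 80/15/5 numbers. The blended per-million
# rates ($2.50 / $15 / $40) are assumed tier averages, not provider prices.
tokens_per_month = 100_000 * 30  # 100K tokens/day -> 3M tokens/month

split = {
    "tier1": (0.80, 2.50),   # (share of tokens, blended $/M tokens)
    "tier2": (0.15, 15.00),
    "tier3": (0.05, 40.00),
}

costs = {tier: tokens_per_month * share * rate / 1_000_000
         for tier, (share, rate) in split.items()}
total = sum(costs.values())  # ~$18.75, i.e. the ~$19 in the table above
```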
Result: roughly a 12x cost reduction vs pure premium ($225 → $19), with equivalent quality on complex tasks.
## Integration Examples

### OpenClaw

```yaml
# config.yml - set default model
model: anthropic/claude-sonnet-4
```

In session, switch models:

```
/model opus      # upgrade for complex task
/model deepseek  # downgrade for routine
```

Spawn a sub-agent on a cheap model:

```yaml
sessions_spawn:
  task: "Fetch and parse these 50 URLs"
  model: deepseek
```
### OpenRouter (Tier 1 with vision or text-only)

```yaml
# Tier 1 with vision - Kimi K2.5 (multimodal)
model: openrouter/moonshotai/kimi-k2.5
```

Heartbeats, cron, and image-involving tasks: K2.5 handles both text and vision.

```yaml
# Tier 1 text-only - GLM 5 (no vision)
model: openrouter/z-ai/glm-5  # exact ID TBD on OpenRouter Z.AI
```

Routine text-only work only; for image tasks use Kimi K2.5 or another vision-capable model.
### Claude Code

In CLAUDE.md or project instructions:

```
When spawning background agents, use claude-3-haiku for:
- File operations
- Simple searches
- Status checks

Reserve claude-sonnet-4 for:
- Code generation
- Analysis tasks
```
General Agent Systems
def get_model_for_task(task_description: str) -> str: routine_signals = ['read', 'fetch', 'check', 'list', 'format', 'status'] complex_signals = ['debug', 'architect', 'design', 'security', 'why']
desc_lower = task_description.lower()
if any(signal in desc_lower for signal in complex_signals):
return "claude-opus-4"
elif any(signal in desc_lower for signal in routine_signals):
return "deepseek-v3"
else:
return "claude-sonnet-4"
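A failure-driven escalation loop can sit on top of a router like this. A sketch under the assumption that your executor returns a result object with an `ok` flag; `run_task` and the model IDs are hypothetical stand-ins.

```python
# Hypothetical escalation map: each model's next tier up.
ESCALATION = {"deepseek-v3": "claude-sonnet-4", "claude-sonnet-4": "claude-opus-4"}

def run_with_escalation(task, run_task, model="deepseek-v3"):
    """Retry a failed task one tier up until it succeeds or tops out."""
    while True:
        result = run_task(task, model)           # run_task is your executor
        if result.ok or model not in ESCALATION:
            return result                        # success, or nowhere left to go
        model = ESCALATION[model]                # escalate one tier and retry
```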
## Anti-Patterns

DON'T:

- Run heartbeats on Opus
- Use premium models for file I/O
- Keep an expensive model when the task is clearly routine
- Spawn sub-agents on premium models by default
- Use GLM 5 (or any text-only Tier 1/2 model) for image/vision tasks, e.g. photo analysis, screenshot understanding, image-generation skills, or any tool that takes image input
DO:

- Start mid-tier, adjust based on task
- Spawn helpers on the cheapest viable model
- Escalate explicitly when stuck
- Track cost per task type to optimize further
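Tracking cost per task type can start as something this small. An in-memory sketch; in practice you'd wire `record` into your agent's own logging, and the task-type names and amounts here are made up for illustration.

```python
from collections import defaultdict

# Running spend per task type, in USD. In-memory only; persist as needed.
spend = defaultdict(float)

def record(task_type: str, cost_usd: float) -> None:
    spend[task_type] += cost_usd

# Illustrative entries: two cheap heartbeats and one mid-tier code review.
record("heartbeat", 0.001)
record("heartbeat", 0.001)
record("code-review", 0.32)
```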
## Extending This Skill

To customize for your use case:

- Adjust tier definitions based on your provider/budget
- Add domain-specific signals to classification rules
- Track actual complexity vs predicted to improve heuristics
- Set budget alerts to catch runaway premium usage