
Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install the skill with: npx skills add aaaaqwq/claude-code-skills/aaaaqwq-claude-code-skills-model-hierarchy

Model Hierarchy

Route tasks to the cheapest model that can handle them. Most agent work is routine.

Core Principle

80% of agent tasks are janitorial. File reads, status checks, formatting, simple Q&A. These don't need expensive models. Reserve premium models for problems that actually require deep reasoning.

Model Tiers

Tier 1: Cheap ($0.10-0.50/M tokens)

| Model | Input ($/M) | Output ($/M) | Best For |
| --- | --- | --- | --- |
| DeepSeek V3 | $0.14 | $0.28 | General routine work |
| GPT-4o-mini | $0.15 | $0.60 | Quick responses |
| Claude Haiku | $0.25 | $1.25 | Fast tool use |
| Gemini Flash | $0.075 | $0.30 | High volume |
| GLM 5 (Zhipu) | via OpenRouter Z.AI | via OpenRouter Z.AI | Routine + moderate text; 200K context; text-only (no image/vision) |
| Kimi K2.5 (Moonshot) | $0.45 | $2.25 | Routine + moderate; 262K context; multimodal (text + image + video) |

Text-only models (e.g. GLM 5): Do not use for any task that requires image input or vision — no photo analysis, screenshots, image-generation tools, or document/chart vision. Route to a vision-capable model (e.g. Kimi K2.5, GPT-4o, Gemini, Claude with vision, GLM-4.5V/4.6V).

Vision-capable Tier 1/2 (e.g. Kimi K2.5): Use for routine or moderate tasks that may involve images — screenshots, photo analysis, docs, image-generation orchestration — without moving to premium vision models.
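The text-only vs. vision-capable split can be enforced mechanically before any tokens are spent. A minimal sketch in Python, assuming an illustrative capability table (the model IDs here are examples, not an official registry):

```python
# Illustrative capability sets; populate from your provider's docs, not this sketch.
TEXT_ONLY = {"glm-5"}
VISION_CAPABLE = {"kimi-k2.5", "gpt-4o", "gemini-flash", "claude-haiku"}

def validate_model_for_task(model: str, needs_vision: bool) -> str:
    """Return the model if compatible with the task, otherwise fail fast."""
    if needs_vision and model in TEXT_ONLY:
        raise ValueError(
            f"{model} is text-only; route vision tasks to one of {sorted(VISION_CAPABLE)}"
        )
    return model
```

Failing fast here is cheaper than discovering mid-task that a text-only model silently ignored an image.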

Tier 2: Mid ($1-5/M tokens)

| Model | Input ($/M) | Output ($/M) | Best For |
| --- | --- | --- | --- |
| Claude Sonnet | $3.00 | $15.00 | Balanced performance |
| GPT-4o | $2.50 | $10.00 | Multimodal tasks |
| Gemini Pro | $1.25 | $5.00 | Long context |

Tier 3: Premium ($10-75/M tokens)

| Model | Input ($/M) | Output ($/M) | Best For |
| --- | --- | --- | --- |
| Claude Opus | $15.00 | $75.00 | Complex reasoning |
| GPT-4.5 | $75.00 | $150.00 | Frontier tasks |
| o1 | $15.00 | $60.00 | Multi-step reasoning |
| o3-mini | $1.10 | $4.40 | Reasoning on budget |

Prices as of Feb 2026. Check provider docs for current rates.

Task Classification

Before executing any task, classify it:

ROUTINE → Use Tier 1

Requires image/vision → Do not assign to text-only models (GLM 5, etc.). Use a vision-capable model from any tier (e.g. Kimi K2.5, GPT-4o, Gemini, Claude, GLM-4.5V).

Characteristics:

  • Single-step operations

  • Clear, unambiguous instructions

  • No judgment required

  • Deterministic output expected

Examples:

  • File read/write operations

  • Status checks and health monitoring

  • Simple lookups (time, weather, definitions)

  • Formatting and restructuring text

  • List operations (filter, sort, transform)

  • API calls with known parameters

  • Heartbeat and cron tasks

  • URL fetching and basic parsing

MODERATE → Use Tier 2

Characteristics:

  • Multi-step but well-defined

  • Some synthesis required

  • Standard patterns apply

  • Quality matters but isn't critical

Examples:

  • Code generation (standard patterns)

  • Summarization and synthesis

  • Draft writing (emails, docs, messages)

  • Data analysis and transformation

  • Multi-file operations

  • Tool orchestration

  • Code review (non-security)

  • Search and research tasks

COMPLEX → Use Tier 3

Characteristics:

  • Novel problem solving required

  • Multiple valid approaches

  • Nuanced judgment calls

  • High stakes or irreversible

  • Previous attempts failed

Examples:

  • Multi-step debugging

  • Architecture and design decisions

  • Security-sensitive code review

  • Tasks where cheaper model already failed

  • Ambiguous requirements needing interpretation

  • Long-context reasoning (>50K tokens)

  • Creative work requiring originality

  • Adversarial or edge-case handling

Decision Algorithm

function selectModel(task):
    # Rule 1: Vision override (Tier 1/2 includes text-only models)
    if task.requiresImageInput or task.requiresVision:
        # e.g. Kimi K2.5, GPT-4o, Gemini, Claude; do not use GLM 5 or other text-only models
        return VISION_CAPABLE_MODEL

    # Rule 2: Escalation override
    if task.previousAttemptFailed:
        return nextTierUp(task.previousModel)

    # Rule 3: Explicit complexity signals
    if task.hasSignal("debug", "architect", "design", "security"):
        return TIER_3

    if task.hasSignal("write", "code", "summarize", "analyze"):
        return TIER_2

    # Rule 4: Default classification
    complexity = classifyTask(task)

    if complexity == ROUTINE:
        return TIER_1
    elif complexity == MODERATE:
        return TIER_2
    else:
        return TIER_3

Behavioral Rules

For Main Session

  • Default to Tier 2 for interactive work

  • Suggest downgrade when doing routine work: "This is routine - I can handle this on a cheaper model or spawn a sub-agent."

  • Request upgrade when stuck: "This needs more reasoning power. Switching to [premium model]."

For Sub-Agents

  • Default to Tier 1 unless task is clearly moderate+

  • Batch similar tasks to amortize overhead

  • Report failures back to parent for escalation

For Automated Tasks

  • Heartbeats/monitoring → Always Tier 1

  • Scheduled reports → Tier 1 or 2 based on complexity

  • Alert responses → Start Tier 2, escalate if needed
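These defaults can live in a small dispatch table so automated jobs never accidentally start on a premium model. A sketch; the task-type names are illustrative:

```python
# Starting tier per automated task type (names are illustrative).
AUTOMATED_TASK_TIER = {
    "heartbeat": "tier1",
    "monitoring": "tier1",
    "scheduled_report_simple": "tier1",
    "scheduled_report_complex": "tier2",
    "alert_response": "tier2",  # start mid-tier, escalate on failure
}

def tier_for_automated(task_type: str) -> str:
    # Unknown automated tasks default to the cheap tier; escalation covers misses.
    return AUTOMATED_TASK_TIER.get(task_type, "tier1")
```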

Communication Patterns

When suggesting model changes, use clear language:

Downgrade suggestion:

"This looks like routine file work. Want me to spawn a sub-agent on DeepSeek for this? Same result, fraction of the cost."

Upgrade request:

"I'm hitting the limits of what I can figure out here. This needs Opus-level reasoning. Switching up."

Explaining hierarchy:

"I'm running the heavy analysis on Sonnet while sub-agents fetch the data on DeepSeek. Keeps costs down without sacrificing quality where it matters."

Cost Impact

Assuming 100K tokens/day average usage:

| Strategy | Monthly Cost | Notes |
| --- | --- | --- |
| Pure Opus | ~$225 | Maximum capability, maximum spend |
| Pure Sonnet | ~$45 | Good default for most work |
| Pure DeepSeek | ~$8 | Cheap but limited on hard problems |
| Hierarchy (80/15/5) | ~$19 | Best of all worlds |

The 80/15/5 split:

  • 80% routine tasks on Tier 1 (~$6)

  • 15% moderate tasks on Tier 2 (~$7)

  • 5% complex tasks on Tier 3 (~$6)

Result: 10x cost reduction vs pure premium, with equivalent quality on complex tasks.
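The split's arithmetic can be reproduced with blended rates (input and output averaged into one $/M figure, which is an approximation; the $2.50/$15/$40 tier rates here are illustrative assumptions, not quotes):

```python
# Assumed blended $/M-token rates per tier (illustrative, not provider pricing).
BLENDED_RATE = {"tier1": 2.50, "tier2": 15.00, "tier3": 40.00}

def monthly_cost(tokens_per_day: int, split: dict, days: int = 30) -> float:
    """Cost of routing a daily token budget across tiers for a month.
    `split` maps tier name -> fraction of traffic (fractions should sum to 1)."""
    millions = tokens_per_day * days / 1_000_000
    return sum(millions * share * BLENDED_RATE[tier] for tier, share in split.items())

# 100K tokens/day under the 80/15/5 split lands near the ~$19 figure above.
hierarchy = monthly_cost(100_000, {"tier1": 0.80, "tier2": 0.15, "tier3": 0.05})
```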

Integration Examples

OpenClaw

# config.yml - set default model
model: anthropic/claude-sonnet-4

# In session, switch models
/model opus      # upgrade for complex task
/model deepseek  # downgrade for routine

# Spawn sub-agent on cheap model
sessions_spawn:
  task: "Fetch and parse these 50 URLs"
  model: deepseek

OpenRouter (Tier 1 with vision or text-only):

# Tier 1 with vision - Kimi K2.5 (multimodal)
model: openrouter/moonshotai/kimi-k2.5
# Heartbeats, cron, image-involving tasks: K2.5 handles text and vision.

# Tier 1 text-only - GLM 5 (no vision)
model: openrouter/z-ai/glm-5  # exact ID TBD on OpenRouter Z.AI
# Routine text-only work; for image tasks use Kimi K2.5 or another vision-capable model.

Claude Code

In CLAUDE.md or project instructions

When spawning background agents, use claude-3-haiku for:

  • File operations
  • Simple searches
  • Status checks

Reserve claude-sonnet-4 for:

  • Code generation
  • Analysis tasks

General Agent Systems

def get_model_for_task(task_description: str) -> str:
    routine_signals = ['read', 'fetch', 'check', 'list', 'format', 'status']
    complex_signals = ['debug', 'architect', 'design', 'security', 'why']

    desc_lower = task_description.lower()

    if any(signal in desc_lower for signal in complex_signals):
        return "claude-opus-4"
    elif any(signal in desc_lower for signal in routine_signals):
        return "deepseek-v3"
    else:
        return "claude-sonnet-4"
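One caveat with naive substring matching: 'read' also fires inside 'thread' or 'spreadsheet'. A word-boundary variant, as a sketch, avoids such false positives:

```python
import re

ROUTINE_SIGNALS = ['read', 'fetch', 'check', 'list', 'format', 'status']
COMPLEX_SIGNALS = ['debug', 'architect', 'design', 'security', 'why']

def get_model_for_task(task_description: str) -> str:
    # \b anchors match whole words only, so 'thread' no longer triggers 'read'.
    def has_any(signals):
        return any(
            re.search(rf"\b{re.escape(s)}\b", task_description, re.IGNORECASE)
            for s in signals
        )
    if has_any(COMPLEX_SIGNALS):
        return "claude-opus-4"
    if has_any(ROUTINE_SIGNALS):
        return "deepseek-v3"
    return "claude-sonnet-4"
```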

Anti-Patterns

DON'T:

  • Run heartbeats on Opus

  • Use premium models for file I/O

  • Keep expensive model when task is clearly routine

  • Spawn sub-agents on premium models by default

  • Use GLM 5 (or any text-only Tier 1/2 model) for image/vision tasks — e.g. photo analysis, screenshot understanding, image-generation skills, or any tool that takes image input

DO:

  • Start mid-tier, adjust based on task

  • Spawn helpers on cheapest viable model

  • Escalate explicitly when stuck

  • Track cost per task type to optimize further

Extending This Skill

To customize for your use case:

  • Adjust tier definitions based on your provider/budget

  • Add domain-specific signals to classification rules

  • Track actual complexity vs predicted to improve heuristics

  • Set budget alerts to catch runaway premium usage
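The last two bullets can start life as a plain counter. A minimal sketch, with an illustrative alert threshold:

```python
from collections import defaultdict

class CostTracker:
    """Accumulate per-task-type spend so runaway premium usage becomes visible."""

    def __init__(self, budget_alert_usd: float = 50.0):
        self.spend = defaultdict(float)
        self.budget_alert_usd = budget_alert_usd

    def record(self, task_type: str, tokens: int, rate_per_million: float) -> None:
        # rate_per_million is the blended $/M rate actually billed for this call.
        self.spend[task_type] += tokens * rate_per_million / 1_000_000

    def over_budget(self):
        # Task types whose accumulated spend has crossed the alert threshold.
        return [t for t, usd in self.spend.items() if usd >= self.budget_alert_usd]
```

Comparing recorded spend per task type against predicted tier assignments is the feedback loop that sharpens the classification heuristics over time.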

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
