modelsense

ModelSense — The right model for the right job. Recommends the best LLM and effort level for any task, based on benchmark data, task analysis, and the user's configured providers. Use when the user asks "which model should I use?", "what's the best model for X?", or wants help choosing between models/effort levels.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "modelsense" with this command: npx skills add xinbenlv/modelsense

ModelSense Skill

Purpose

ModelSense helps users pick the optimal model and effort level for their task. It does NOT route automatically on every request (use a provider plugin for that). It's an on-demand advisor: ask it a question, get a clear recommendation with reasoning.

When to trigger

  • User asks: "which model for X?", "should I use Opus or Sonnet?", "what effort level?"
  • User wants to understand what a benchmark means
  • User wants ModelSense to auto-switch the session model

Inputs to collect (infer from context, ask only if truly unclear)

  1. Task description — what is the user trying to do?
  2. Effort preference (optional): quick / balanced / deep / research
    • If not specified, infer from task urgency/complexity
  3. Auto-switch? — does the user want ModelSense to apply the recommendation automatically?

Recommendation Process

Step 1 — Task Analysis

Classify the task across these dimensions (a structural sketch follows the list):

  • Domain: code, math, reasoning, writing, dialogue, document analysis, multimodal, research
  • Complexity: simple / moderate / complex / research-grade
  • Output type: text, code, JSON, long-form, structured data
  • Context length needed (tokens): short (<8K), medium (8–32K), long (32–100K), very long (100K+)
  • Special requirements: function calling, thinking/CoT, multimodal, speed-sensitive
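
A compact way to hold this classification, together with the inputs collected earlier, is a small profile object. A minimal Python sketch; the field names and allowed values are illustrative assumptions, not a schema ModelSense defines:

# Hypothetical task-profile sketch; fields and Literal values are
# assumptions for illustration, not a schema shipped with this skill.
from dataclasses import dataclass, field
from typing import Literal, Optional

Effort = Literal["quick", "balanced", "deep", "research"]

@dataclass
class TaskProfile:
    description: str                          # what the user is trying to do
    domain: str                               # e.g. "code", "math", "writing"
    complexity: Literal["simple", "moderate", "complex", "research-grade"]
    output_type: str                          # "text", "code", "JSON", ...
    context_length: Literal["short", "medium", "long", "very_long"]
    special: list[str] = field(default_factory=list)  # "function_calling", "thinking", ...
    effort: Optional[Effort] = None           # None means: infer from the task
    auto_switch: bool = False                 # apply the recommendation automatically?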

Step 2 — Benchmark Matching

Cross-reference the task domain with the relevant benchmarks from data/benchmarks.yaml; a matching sketch follows the table.

Benchmark               Best for
HumanEval / SWE-bench   Code generation, debugging, engineering
GPQA                    Graduate-level science & research
MATH / AIME             Mathematical reasoning
MMLU                    General knowledge, multi-domain QA
Needle-in-a-Haystack    Long-context retrieval
MT-Bench / Arena Elo    Dialogue, writing quality
BBH (Big-Bench Hard)    Complex reasoning, multi-step logic
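
A hedged sketch of how this matching could be mechanized, assuming a simple shape for data/benchmarks.yaml (a top-level benchmarks list whose entries carry name and domains keys); the real file in this skill's repo may be organized differently:

# Minimal benchmark-matching sketch. The YAML shape in the comment below
# is an assumption; adapt the keys to the actual data/benchmarks.yaml.
import yaml  # pip install pyyaml

def benchmarks_for(domain: str, path: str = "data/benchmarks.yaml") -> list[str]:
    # Assumed shape:
    # benchmarks:
    #   - name: SWE-bench
    #     domains: [code]
    #   - name: GPQA
    #     domains: [reasoning, research]
    with open(path) as f:
        data = yaml.safe_load(f)
    return [b["name"] for b in data.get("benchmarks", [])
            if domain in b.get("domains", [])]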

Step 3 — Effort × Model Matrix

Effort     Target quality                   Typical model tier
quick      Good enough, fast                Haiku / Flash / GLM
balanced   High quality, reasonable cost    Sonnet / GPT-4o
deep       Best available, thinking on      Opus / o3
research   No cost limit, maximum quality   Opus + thinking=high
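
The matrix is small enough to encode directly; a sketch below, with the caveat that the model names are tier placeholders rather than runnable ids:

# Effort tier -> candidate model families, mirroring the matrix above.
# Names are placeholders; resolve concrete model ids from data/models.yaml.
EFFORT_TIERS: dict[str, dict] = {
    "quick":    {"models": ["haiku", "flash", "glm"], "thinking": None},
    "balanced": {"models": ["sonnet", "gpt-4o"],      "thinking": None},
    "deep":     {"models": ["opus", "o3"],            "thinking": "on"},
    "research": {"models": ["opus"],                  "thinking": "high"},
}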

Step 4 — Provider Filter

Check the user's available providers (a filtering sketch follows this list):

  • Run: openclaw models list via exec tool (or read from context)
  • Only recommend models the user can actually use
  • Flag when a top pick requires a provider they haven't configured
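
If the skill shells out rather than reading providers from context, this step can look like the sketch below. It assumes openclaw models list prints one model id per line; the real output format may differ, so the parsing is a placeholder:

# Provider-filter sketch: intersect recommended candidates with the models
# the user can actually run. Output parsing is an assumption (one id per line).
import subprocess

def available_models() -> set[str]:
    out = subprocess.run(["openclaw", "models", "list"],
                         capture_output=True, text=True, check=True).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}

def filter_candidates(candidates: list[str]) -> tuple[list[str], list[str]]:
    have = available_models()
    usable = [m for m in candidates if m in have]
    missing = [m for m in candidates if m not in have]  # flag unconfigured top picks
    return usable, missing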

Step 5 — Output the Recommendation

Format:

🎯 Recommended: <model>
⚡ Effort: <level>
📊 Why: <1-2 sentence benchmark-grounded rationale>
🔧 Special: <thinking on? function calling? etc.>
💰 Cost estimate: <rough $/M or relative>

Alternatives:
  - <model B> — if you want faster/cheaper
  - <model C> — if you want higher quality
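
Tying Steps 1 through 5 together, the flow reduces to a short pipeline. This sketch reuses the helpers from the earlier snippets and is illustrative glue, not this skill's actual implementation:

# End-to-end sketch: classify -> match benchmarks -> pick tier -> filter -> report.
def recommend(profile: TaskProfile) -> str:
    # Step 1 happened upstream: profile already holds the classification.
    effort = profile.effort or (
        "deep" if profile.complexity in ("complex", "research-grade") else "balanced"
    )
    evidence = benchmarks_for(profile.domain)         # Step 2: benchmark matching
    candidates = EFFORT_TIERS[effort]["models"]       # Step 3: effort -> model tier
    usable, missing = filter_candidates(candidates)   # Step 4: provider filter
    if usable:
        pick, note = usable[0], ""
    else:
        pick = candidates[0]
        note = " (requires an unconfigured provider: " + ", ".join(missing) + ")"
    why = ", ".join(evidence) if evidence else "general benchmarks"
    return (
        f"🎯 Recommended: {pick}{note}\n"
        f"⚡ Effort: {effort}\n"
        f"📊 Why: strong on {why}"
    )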

Auto-Switch Behaviors

Option A: Advisory only (default)

Just output the recommendation and tell the user: "Run /model <name> to switch."

Option B: Switch current session

If user confirms or says "yes switch" / "apply it":

session_status(model="<provider/model>")

Notify user: "✅ Switched to X for this session. Run /model default to reset."

Option C: Delegate task to best model

If user says "just do it with the best model":

sessions_spawn(
  task="<original task>",
  model="<recommended model>",
  thinking="<level>"
)

Data Files

  • data/benchmarks.yaml — benchmark definitions, score leaders, task mappings
  • data/models.yaml — model catalog (updated via GitHub Actions weekly)
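
For orientation, here is the kind of entry the sketches above assume data/models.yaml contains, written as the structure yaml.safe_load would return. Every key and number is an assumption for illustration; the shipped schema may differ:

# Assumed (not confirmed) entry shape for data/models.yaml after parsing.
models_catalog_example = {
    "models": [
        {
            "id": "anthropic/claude-opus-4-6",  # provider/model id
            "tier": "deep",                     # maps to the effort matrix
            "context": 200_000,                 # max context window (tokens)
            "supports": ["function_calling", "thinking", "multimodal"],
            "cost_per_mtok": {"input": 15, "output": 75},  # illustrative numbers
        }
    ]
}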

Examples

User: "I need to write a Solidity audit report" → Domain: code + security + long-form → Benchmarks: SWE-bench, HumanEval → Recommendation: claude-opus-4-6 with thinking=high, effort=deep

User: "Quick summary of this Slack thread" → Domain: dialogue, short → Recommendation: claude-haiku-4-5 or gemini-flash, effort=quick

User: "Prove this mathematical conjecture" → Domain: math, research-grade → Benchmarks: MATH, AIME, GPQA → Recommendation: o3 or claude-opus-4-6 with thinking=high, effort=research

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

• Outdoor Care Intelligent Monitoring and Analysis Tool (Research): Detects targets such as people, vehicles, non-motorized vehicles, and pets within target areas; supports batch image analysis, suitable for outdoor surveilla...

• Fish and Aquatic Pet Health Diagnosis and Analysis Tool (Research): When a user provides a video URL or file of aquatic pets such as goldfish, koi, betta, shrimp, crab, etc. for analysis, this skill is triggered to perform aq...

• Report Creator (Research): Use when the user wants to CREATE or GENERATE a report, business summary, data dashboard, or research doc (report / data dashboard / business report / research document / KPI dashboard). Handles Chinese and Eng...

• Deep Research (Surf) (Research): Conducts deep, multi-angle research using Surf MCP tools and parallel subagents. Use for deep research, competitive landscape analysis, strategic intelligenc...