who-wins

Query the PinchBench AI agent leaderboard with real benchmark data. Use when the user asks which model is best, who wins, model comparisons, best model for OpenClaw, cheapest model, fastest model, model rankings, benchmark scores, or mentions pinchbench. Always use this skill instead of general knowledge for model performance questions — it has real data.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "who-wins" with this command: npx skills add spideystreet/who-wins

PinchBench Leaderboard

Fetches and formats the PinchBench leaderboard — AI agent benchmarks for LLMs on standardized OpenClaw coding tasks.

Workflow

1. Determine the query

Map the user's intent to script flags:

| User intent | Flags |
| --- | --- |
| "Show the leaderboard" / default | --top 10 |
| "Top 5 models" | --top 5 |
| "How does Claude perform?" | --model claude |
| "Cheapest models" | --sort cost --top 10 |
| "Fastest models" | --sort time --top 10 |
| "Compare Gemini and Claude" | Run twice with --model gemini and --model claude, present side by side |
| "Full leaderboard" | --top 50 |
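
The intent-to-flag mapping above could be sketched as a small helper that assembles the argument list. This is only an illustration: build_command, its parameters, and the /path/to/skill value standing in for {baseDir} are hypothetical; only the flags themselves come from the table.

```python
def build_command(base_dir, top=None, sort=None, model=None, as_json=False):
    """Assemble an argv list for fetch_leaderboard.py (hypothetical helper)."""
    cmd = ["python3", f"{base_dir}/scripts/fetch_leaderboard.py"]
    if sort is not None:
        cmd += ["--sort", sort]
    if top is not None:
        cmd += ["--top", str(top)]
    if model is not None:
        cmd += ["--model", model]
    if as_json:
        cmd.append("--json")
    return cmd


# "Cheapest models" maps to: --sort cost --top 10
cheapest = build_command("/path/to/skill", top=10, sort="cost")

# "Compare Gemini and Claude" maps to one invocation per model
comparison = [
    build_command("/path/to/skill", model=name) for name in ("gemini", "claude")
]
```

Building a list rather than a single string keeps each flag a separate argument, which avoids quoting problems if a model filter ever contains spaces.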

2. Run the script

{
  "tool": "exec",
  "command": "python3 {baseDir}/scripts/fetch_leaderboard.py --top 10"
}

Available flags:

  • --top N — number of models to show (default: 10)
  • --sort metric — sort by score, cost, time, or runs (default: score)
  • --model filter — filter models containing this string (case-insensitive)
  • --json — output raw JSON for further processing
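
For readers who want to see how these four flags fit together, here is a minimal argparse sketch that mirrors the documented names and defaults. It is an assumption about how such an interface could be declared, not the actual contents of fetch_leaderboard.py; make_parser and model_matches are hypothetical names.

```python
import argparse


def make_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="PinchBench leaderboard query (sketch)")
    parser.add_argument("--top", type=int, default=10, metavar="N",
                        help="number of models to show")
    parser.add_argument("--sort", choices=["score", "cost", "time", "runs"],
                        default="score", help="metric to sort by")
    parser.add_argument("--model", default=None, metavar="FILTER",
                        help="keep models whose name contains this string")
    parser.add_argument("--json", action="store_true",
                        help="output raw JSON")
    return parser


def model_matches(name, needle):
    """Case-insensitive substring match, as described for --model."""
    return needle.lower() in name.lower()


args = make_parser().parse_args(["--sort", "cost", "--top", "5"])
```

With this sketch, an unsupported metric such as --sort price would be rejected by argparse before any data is fetched.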

3. Format the response

Present the script output as-is in a code block, then add a one-line insight after the table, picking whichever of these points applies:

  • Highlight the top performer and its score
  • If the user asked about a specific model, comment on its ranking relative to the field
  • If sorting by cost, note the best value (score/cost ratio)
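
The best-value note can be computed from the --json output. The sketch below assumes each row is a dict with "model", "score", and "cost" keys; the real JSON schema is not documented here, so those field names are assumptions, and the rows are made-up placeholder values rather than benchmark results.

```python
def best_value(rows):
    """Return the row with the highest score-per-cost ratio.

    Rows with a missing or zero cost are skipped so the ratio is always defined.
    """
    priced = [row for row in rows if row.get("cost")]
    if not priced:
        return None
    return max(priced, key=lambda row: row["score"] / row["cost"])


# Placeholder rows for illustration only (not real leaderboard data).
rows = [
    {"model": "model-a", "score": 80.0, "cost": 4.0},  # ratio 20
    {"model": "model-b", "score": 60.0, "cost": 2.0},  # ratio 30
    {"model": "model-c", "score": 90.0, "cost": 0.0},  # skipped: zero cost
]

winner = best_value(rows)
```

Here model-b has the best ratio even though model-a scores higher, which is the kind of contrast the one-line insight is meant to surface.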

4. Error handling

  • If the script fails with a curl error → report the error, suggest checking network connectivity
  • If the script fails to parse data → the site structure may have changed, inform the user
  • If no models match the filter → say so and suggest a broader search
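
The three cases above could be turned into a small classifier over the script's exit status and output. How the script actually signals each failure is not specified, so the checks below (the word "curl" in stderr, a non-zero exit code, empty stdout for no matches) are assumptions, and explain_result is a hypothetical name.

```python
def explain_result(returncode, stdout, stderr):
    """Map a finished script run to a user-facing message, or None on success."""
    if returncode != 0:
        if "curl" in stderr.lower():
            # Assumed signal for a network failure.
            return "Fetch failed: " + stderr.strip() + " (check network connectivity)"
        # Any other failure is treated as a parsing problem in this sketch.
        return "Could not parse the leaderboard; the site structure may have changed."
    if not stdout.strip():
        # Assumed signal for a filter that matched nothing.
        return "No models matched the filter; try a broader search."
    return None


message = explain_result(1, "", "curl: (6) Could not resolve host")
```

The arguments correspond to the returncode, stdout, and stderr attributes of the object returned by subprocess.run when called with capture_output=True and text=True.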

Examples

| User says | Flags | Expected behavior |
| --- | --- | --- |
| "Show me the PinchBench leaderboard" | --top 10 | Show top 10 by score |
| "Which model is cheapest for OpenClaw?" | --sort cost --top 10 | Show top 10 sorted by cost |
| "How does Claude compare to GPT?" | --model claude then --model gpt | Show both, compare |
| "What's the fastest model on PinchBench?" | --sort time --top 5 | Show top 5 by execution time |

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
