skillbench

Track skill versions, benchmark performance, compare improvements, and get self-improvement signals. Integrates with tasktime and ClawVault.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "skillbench" with this command: npx skills add g9pedro/skillbench

skillbench Skill

Self-improving skill ecosystem for AI agents.

Track skill versions, benchmark performance, compare improvements, and get signals on what to fix next.

Part of the ClawVault ecosystem | tasktime | ClawHub

Installation

npm install -g @versatly/skillbench

The Loop

1. Use a skill    → skillbench use github@1.0.0
2. Do the task    → tt start "Create PR" && ... && tt stop
3. Record result  → skillbench record "Create PR" --success
4. Check scores   → skillbench score github
5. Improve skill  → Update skill, bump version
6. Repeat         → Compare v1.0.0 vs v1.1.0

Commands

Track Skills

skillbench use github@1.2.0            # Set active skill version
skillbench skills                       # List tracked skills + signals

Record Benchmarks

# Auto-pulls duration from tasktime
skillbench record "Create PR" --success

# Manual duration
skillbench record "Create PR" --duration 45s --success

# Record failures
skillbench record "Create PR" --fail --error-type "auth-error"

Score & Compare

skillbench score                        # All skills with grades
skillbench score github                 # Single skill
skillbench compare github@1.0.0 github@1.1.0

Export & Dashboard

skillbench export --format markdown
skillbench export --format json
skillbench dashboard                    # Generate HTML dashboard
skillbench dashboard --open             # Generate and open in browser

Automated Testing

skillbench test tasktime@1.1.0          # Run smoke test
skillbench test tasktime@1.1.0 --suite full  # Run named suite
skillbench test tasktime@1.1.0 --dry-run     # Test without recording

Sync

skillbench sync --clawhub               # Import installed skills
skillbench sync --vault                 # Sync to ClawVault
skillbench sync --all                   # Everything

Health & Monitoring

skillbench health                       # Overall health report with alerts
skillbench watch --once                 # Run all test suites once
skillbench watch --interval 300         # Continuous monitoring every 5 min

Analysis & Improvement

skillbench improve                      # Get suggestions for weakest skill
skillbench improve github               # Improvement plan for specific skill
skillbench trend tasktime --days 30     # Performance trend over time
skillbench leaderboard                  # Compare agents (multi-agent setups)
skillbench schedule --interval 60       # Generate cron config for auto-testing

Baselines & Regression Detection

skillbench baseline tasktime --set      # Set baseline from current performance
skillbench baseline --list              # List all baselines
skillbench baseline --check             # Check all baselines (CI-friendly, exits 1 if failing)
skillbench baseline tasktime --remove   # Remove a baseline

CI/CD Integration

skillbench ci                           # Run all tests + baseline checks
skillbench ci --json                    # JSON output for automation
skillbench badge                        # Generate shields.io badges for README

Copy examples/github-action.yml for ready-to-use GitHub Actions workflow.

Grading System

GradeScoreMeaning
🏆 A+95-100Elite performance
✅ A85-94Excellent
👍 B70-84Good
⚠️ C50-69Needs work
❌ D<50Broken

Based on: Success Rate (40%), Avg Duration (30%), Consistency (20%), Trend (10%)

tasktime Integration

When you omit --duration, skillbench pulls from tasktime:

tt start "Create PR" -c git
# ... do work ...
tt stop
skillbench record --success   # Duration auto-pulled

ClawVault Integration

Benchmarks sync to ClawVault automatically.

Improvement Signals

skillbench skills shows:

  • ⚠️ needs work — Success rate below 70%
  • 🕐 stale — No benchmarks in 7+ days
  • ↘️ declining — Getting worse over time

Related

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

qwencloud-model-selector

[QwenCloud] Recommend the best Qwen model and parameters. TRIGGER when: choosing between Qwen models, comparing Qwen model pricing, understanding Qwen model...

Registry SourceRecently Updated
General

deployment-manager

You are a deployment manager with expertise in release orchestration, deployment strategies, and production reliability. Use when: release orchestration and...

Registry SourceRecently Updated
General

Hk Stock Morning Report

Generate HK stock market morning report (股市晨報) for bank trading desks. Triggers: "生成晨报", "股市晨报", "今日股市", "港股晨報" 報告結構(5部分): 1. 市場回顧(恒指/科指/國指 + 強弱勢股) 2. 南下資金(總...

Registry SourceRecently Updated
General

Story Long Scan

长篇网文扫榜。分析起点、番茄、晋江等平台排行榜数据,提炼市场趋势与热门题材。 触发方式:/story-long-scan、/长篇扫榜、「长篇什么火」「起点排行」

Registry SourceRecently Updated