skills-coach

Explores the capability boundaries of a target Skill, analyzes its optimization potential, generates an optimized version using Training-Free GRPO, and compiles the results into a structured report

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "skills-coach" with this command: npx skills add t1ans1r/skills-coach

Skills-Coach v2.3.1

Systematically analyze and optimize OpenClaw skills through automated task generation, Training-Free GRPO optimization, real command execution, comprehensive failure analysis, and detailed evaluation reporting.

What's New in v2.3.1

  • 📝 Documentation Consistency — Unified version numbers across all files
  • 🗂️ File Organization — Cleaned up archive directory and removed duplicates
  • 🔧 Maintenance Release — Bug fixes and documentation improvements

Previous updates (v2.3.0):

  • 🔧 Auto-Fix Integration — Automatically fixes common issues
  • 🔄 Iterative Improvement — Fix → Test → Reanalyze loop (max 2 iterations)
  • 🤖 LLM-Powered Fixes — Uses Claude API to intelligently add missing parameters
  • Optimized Performance — Disabled LLM summaries to prevent API timeouts
  • 🔧 Better Stability — Improved API timeout handling and retry mechanisms

Previous updates (v2.0.0):

  • 🚀 Training-Free GRPO — Revolutionary optimization method based on arXiv:2510.08191
  • 🧠 Experience Library — Learns from optimization attempts
  • 📊 Group Relative Semantic Advantage — Compares rollouts to extract insights
  • 💰 Cost-Effective — Minimal training data, no fine-tuning required

Training-Free GRPO vs Vanilla GRPO

| Feature | Training-Free GRPO (v2.0) | Vanilla GRPO (v1.x) |
| --- | --- | --- |
| Parameter Updates | ❌ None | ✅ Gradient-based |
| Advantage Type | Semantic (natural language) | Numerical (scores) |
| Knowledge Storage | External experience library | Model weights |
| Generalization | Excellent (frozen model) | Limited (overfitting risk) |
| Data Requirements | Minimal (dozens of samples) | Large (thousands) |
| Cost | Very low (~$20) | High ($10,000+) |
| Speed | Fast (inference only) | Slow (training required) |
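
The "external experience library" row is the key difference: instead of updating model weights, insights learned from rollouts are stored as natural-language entries and injected into prompts at inference time. A minimal sketch of that idea (a hypothetical `ExperienceLibrary` class, not the actual implementation; the per-domain cap mirrors the `max_experiences` setting in config.yaml):

```python
class ExperienceLibrary:
    """Stores natural-language insights per domain, capped in size."""

    def __init__(self, max_experiences=10):
        self.max_experiences = max_experiences
        self.domains = {}  # domain name -> list of insight strings

    def add(self, domain, insight):
        entries = self.domains.setdefault(domain, [])
        if insight not in entries:
            entries.append(insight)
        # Keep only the most recent insights for this domain
        self.domains[domain] = entries[-self.max_experiences:]

    def context_for(self, domain):
        # Injected into the prompt at inference time, never into weights
        return "\n".join(f"- {e}" for e in self.domains.get(domain, []))

lib = ExperienceLibrary(max_experiences=2)
lib.add("markdown", "Prefer explicit section headings")
lib.add("markdown", "Always include a usage example")
lib.add("markdown", "State required parameters up front")
```

Because the model stays frozen, "learning" is just appending and pruning these entries, which is why data requirements and cost stay low.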

Configuration Options

Key settings in config.yaml:

# Optimization Method Selection (NEW v2.0.0)
optimization:
  method: "training_free_grpo"  # training_free_grpo | vanilla_grpo

# Training-Free GRPO Parameters
training_free_grpo:
  group_size: 5                  # Number of rollouts per group
  num_epochs: 3                  # Number of optimization epochs
  temperature_learning: 0.7      # Temperature during learning
  temperature_eval: 0.3          # Temperature during evaluation
  
  # Experience Library Management
  max_experiences: 10            # Max experiences per domain
  
  # Domain-Specific Optimization
  markdown_optimization:
    enabled: true
    focus_areas: [clarity, structure, examples, completeness]
  
  code_optimization:
    enabled: true
    focus_areas: [bug_fixes, error_handling, performance, code_quality]
  
  # LLM Configuration
  llm_model: "claude-sonnet-4-6"

Usage

python orchestrator.py <target-skill-path>

Or via Claude:

Use skills-coach on <target-skill-path>

Parameters

  • target-skill-path (required): Path to the directory containing the Skill to analyze and optimize. Must contain a valid SKILL.md.

Execution Flow

This skill performs a pre-flight immutability setup followed by six steps that execute sequentially:

immutability (pre-flight) → code-capability → sample-agent → optimize-agent → exec-agent → failure-analyzer → evaluate-agent

CRITICAL IMMUTABILITY RULE:

  • The original {target-skill} is NEVER modified
  • All changes are made to {target-skill}-optimized
  • This ensures the original skill remains intact for comparison

Do not proceed to the next step until the current one has fully completed and its outputs are verified.
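
The gating rule above can be sketched as a small helper (hypothetical `run_step`, not the actual orchestrator.py logic): run one subskill, then refuse to continue unless its declared outputs exist.

```python
import subprocess
import sys
from pathlib import Path

def run_step(name, cmd, expected_outputs):
    """Run one subskill command and verify its declared outputs exist."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        raise RuntimeError(f"step {name} exited with code {result.returncode}")
    missing = [p for p in expected_outputs if not Path(p).exists()]
    if missing:
        raise RuntimeError(f"step {name} did not produce: {missing}")

# Demo: a command that succeeds but produces no expected output must raise
try:
    run_step("demo", [sys.executable, "-c", "pass"], ["does_not_exist.txt"])
    gate_tripped = False
except RuntimeError:
    gate_tripped = True
```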

Step-by-Step Instructions

Pre-flight Checks

  1. Validate that target-skill-path exists and contains a SKILL.md file
  2. If validation fails, abort and report the error to the user
  3. Initialize run manager (if versioned runs enabled):
    # subskills/run-manager has a hyphen in its name, so it cannot be
    # imported as a package; put the directory on sys.path instead
    import sys
    sys.path.append("subskills/run-manager")
    from run_manager import RunManager

    manager = RunManager()
    run_dir = manager.create_run(target_skill_path, config)
    
  4. Create the working directory structure:
    # If versioned runs enabled:
    skills-coach-runs/run_YYYY-MM-DD_HH-MM-SS/
      ├── tasks/{train,test}
      ├── exec_results/{original,optimized}
      ├── optimization/
      ├── code_capabilities.json
      ├── failure_analysis_{original,optimized}.json
      └── {target-skill}-optimized/
    
  5. IMMUTABILITY: Create optimized copy
    cp -r {target-skill} {work-dir}/{target-skill}-optimized
    
    All subsequent modifications will ONLY affect the optimized copy.
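
The pre-flight checks can be sketched in a few lines (a hypothetical `preflight` helper assuming the layout described above; the demo uses throwaway temp directories):

```python
import shutil
import tempfile
from pathlib import Path

def preflight(target_skill: Path, work_dir: Path) -> Path:
    """Validate the target skill and create the immutable optimized copy."""
    if not (target_skill / "SKILL.md").is_file():
        raise ValueError(f"{target_skill} does not contain a SKILL.md")
    optimized = work_dir / f"{target_skill.name}-optimized"
    # IMMUTABILITY: all later edits go to this copy, never the original
    shutil.copytree(target_skill, optimized)
    return optimized

# Demo with throwaway directories
tmp = Path(tempfile.mkdtemp())
skill = tmp / "demo-skill"
skill.mkdir()
(skill / "SKILL.md").write_text("demo")
work = tmp / "work"
work.mkdir()
copy = preflight(skill, work)
```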

Step 0: Code Capability Detection (NEW v1.5.0)

Analyze scripts to detect their actual capabilities:

cd subskills/code-capability-detector
python code_capability_detector.py <target-skill-path> <work-dir>

This analyzes:

  • Command-line parameters supported by scripts
  • Input/output formats
  • Dependencies
  • Error handling and validation presence

Expected outputs:

  • code_capabilities.json - Machine-readable capability data
  • code_capabilities.md - Human-readable report

Purpose: Ensures generated test tasks only use features the scripts actually support.
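
One capability check of this kind can be illustrated statically: listing the command-line flags a script registers via argparse. This is a hypothetical sketch (`detect_cli_flags` is illustrative; the real code_capability_detector.py may use different methods):

```python
import ast

def detect_cli_flags(source: str) -> list[str]:
    """Return argparse flag names found in add_argument(...) calls."""
    flags = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "add_argument"):
            for arg in node.args:
                if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                    flags.append(arg.value)
    return flags

demo_script = """
import argparse
p = argparse.ArgumentParser()
p.add_argument("--input")
p.add_argument("--format", choices=["json", "md"])
"""
found = detect_cli_flags(demo_script)
```

A task generator with this information would know, for example, not to emit tasks that pass a `--verbose` flag the script never registers.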

Verification: Confirm capability files exist before proceeding.

Step 1: Generate Test Tasks (sample-agent)

Execute the task generator:

cd subskills/sample-agent
python task_generator.py <target-skill-path> ../..

The script generates:

  • 12 base training tasks (6 standard + 6 advanced)
  • 8 base test tasks (4 standard + 4 advanced)
  • If boundary probing is enabled and generates boundary tasks:
    • Training: 6 standard + 4 advanced + 6 boundary = 16 total
    • Test: 4 standard + 3 advanced + 3 boundary = 10 total

Expected outputs:

  • tasks/train/task_001/ through tasks/train/task_012/ (or task_016 with boundary tasks)
  • tasks/test/task_001/ through tasks/test/task_008/ (or task_010 with boundary tasks)
  • Each task directory contains: task.md, speccheck.md, and workspace/

Verification: Confirm all task directories exist before proceeding.
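
The verification can be sketched as a per-task completeness check (hypothetical helper, assuming the task.md / speccheck.md / workspace layout described above):

```python
import tempfile
from pathlib import Path

REQUIRED = ["task.md", "speccheck.md", "workspace"]

def missing_task_parts(task_dir: Path) -> list[str]:
    """Return the names of required entries missing from a task directory."""
    return [name for name in REQUIRED if not (task_dir / name).exists()]

# Demo: one complete and one incomplete task directory
root = Path(tempfile.mkdtemp())
good = root / "task_001"
good.mkdir()
(good / "task.md").write_text("t")
(good / "speccheck.md").write_text("s")
(good / "workspace").mkdir()
bad = root / "task_002"
bad.mkdir()
(bad / "task.md").write_text("t")
```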

Step 2: Optimize the Skill (optimize-agent)

IMPORTANT: This step works on {target-skill}-optimized, NOT the original.

Execute the GRPO optimizer:

cd subskills/optimize-agent
python grpo_optimizer.py <work-dir>/{target-skill}-optimized ../..

The script runs GRPO optimization with:

  • 4 candidate variants per iteration
  • 3-10 iterations with early stopping
  • SKILL.md and optional code-level optimization
  • All changes applied to the optimized copy only
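
The iteration schedule above (4 variants per round, 3-10 rounds, early stop after 2 rounds without improvement) can be sketched generically. `mutate_fn` and `score_fn` are hypothetical stand-ins; the real grpo_optimizer.py generates variants with an LLM and scores them semantically:

```python
def grpo_loop(baseline, mutate_fn, score_fn,
              variants_per_iter=4, min_iters=3, max_iters=10, patience=2):
    """Keep the best-scoring variant; stop early once progress stalls."""
    best, best_score = baseline, score_fn(baseline)
    stale = 0
    for i in range(max_iters):
        candidates = [mutate_fn(best) for _ in range(variants_per_iter)]
        top = max(candidates, key=score_fn)
        if score_fn(top) > best_score:
            best, best_score, stale = top, score_fn(top), 0
        else:
            stale += 1
        if i + 1 >= min_iters and stale >= patience:
            break  # no improvement for `patience` consecutive iterations
    return best, best_score

# Demo: the "skill" is an integer; mutation nudges it toward 5, score peaks at 5
result, score = grpo_loop(0, lambda s: min(s + 1, 5), lambda s: -abs(s - 5))
```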

Expected outputs:

  • {target-skill-name}-optimized/ directory containing the optimized SKILL.md
  • optimization_log.md documenting the GRPO optimization process

Verification: Confirm the optimized skill directory and log file exist before proceeding.

Step 3: Execute Both Skill Versions (exec-agent + Claude)

Part A: Generate Task Manifest

Execute the executor to generate task manifest:

cd subskills/exec-agent
python executor.py <target-skill-path> ../..

Expected outputs:

  • task_manifest.json containing all tasks to execute

Part B: Execute Tasks via Skill Tool

Claude reads the manifest and executes each task using the Skill tool:

import json

with open('task_manifest.json') as f:
    manifest = json.load(f)

for task in manifest['tasks']:
    # Execute the original skill (Skill tool invocation, not Python):
    #   Use skill at manifest['target_skill_path'] with task['task_content']
    #   Save output to task['original_result_dir']/output/

    # Execute the optimized skill:
    #   Use skill at manifest['optimized_skill_path'] with task['task_content']
    #   Save output to task['optimized_result_dir']/output/
    ...

Expected outputs:

  • exec_results/original/task_001/ through exec_results/original/task_008/ (or task_010 with boundary tasks)
  • exec_results/optimized/task_001/ through exec_results/optimized/task_008/ (or task_010 with boundary tasks)
  • Each result directory contains: output/ with real skill execution results and run_log.md

Verification: Confirm all result directories exist with real outputs before proceeding.

Step 4: Failure Analysis (NEW v1.5.0)

Analyze failed tasks to identify root causes and suggest fixes:

cd subskills/failure-analyzer
python failure_analyzer.py <work-dir>/exec_results/original <work-dir>
python failure_analyzer.py <work-dir>/exec_results/optimized <work-dir>

This analyzes:

  • Error messages and categorizes them (missing_parameter, missing_dependency, etc.)
  • Root causes of failures
  • Specific fix suggestions with code examples
  • Affected files and estimated fix difficulty
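
Error categorization of this kind is often rule-based; a minimal sketch (hypothetical patterns and category names beyond the `missing_parameter` / `missing_dependency` labels mentioned above; the real failure_analyzer.py may categorize differently):

```python
import re

# Ordered list of (category, pattern); first match wins
CATEGORIES = [
    ("missing_parameter",
     re.compile(r"unrecognized arguments|arguments are required|missing.*parameter", re.I)),
    ("missing_dependency",
     re.compile(r"ModuleNotFoundError|No module named|command not found", re.I)),
    ("file_not_found",
     re.compile(r"FileNotFoundError|No such file", re.I)),
]

def categorize(error_message: str) -> str:
    for name, pattern in CATEGORIES:
        if pattern.search(error_message):
            return name
    return "unknown"
```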

Expected outputs:

  • failure_analysis_original.json - Machine-readable failure data
  • failure_analysis_original.md - Human-readable report
  • failure_analysis_optimized.json - Optimized version failures
  • failure_analysis_optimized.md - Optimized version report

Verification: Confirm failure analysis files exist before proceeding.

Step 5: Evaluate and Report (evaluate-agent)

Execute the evaluator to analyze results:

cd subskills/evaluate-agent
python evaluator.py <target-skill-path> <work-dir>

This script:

  1. Analyzes execution results from both skill versions
  2. Generates the comprehensive report
  3. Makes retention decision based on performance comparison
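
The retention decision can be sketched as a simple pass-rate comparison (a hypothetical rule: keep the optimized copy only if its test pass rate strictly beats the original's; the real evaluator.py may weigh more metrics):

```python
def retention_decision(original_results, optimized_results):
    """Each argument is a list of booleans, one per executed test task."""
    orig_rate = sum(original_results) / len(original_results)
    opt_rate = sum(optimized_results) / len(optimized_results)
    decision = "keep" if opt_rate > orig_rate else "delete"
    return decision, orig_rate, opt_rate

decision, orig, opt = retention_decision(
    [True, False, True, False],   # original: 2/4 tasks passed
    [True, True, True, False],    # optimized: 3/4 tasks passed
)
```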

Expected outputs:

  • results_report.md containing comprehensive evaluation metrics and analysis
  • Retention decision: either keep or delete {target-skill-name}-optimized/

Verification: Confirm results_report.md exists.

Final Step: Present Results to User

Read and present the contents of results_report.md to the user, highlighting:

  • Overall performance comparison (original vs. optimized)
  • Key strengths and weaknesses identified
  • Retention decision and rationale
  • Recommendations for further improvement

Output Structure

Versioned Runs (Default):

skills-coach-runs/
├── run_2026-04-13_14-30-00/
│   ├── config.yaml                    # Config used for this run
│   ├── metadata.json                  # Run metadata (duration, scores, decision)
│   ├── tasks/
│   │   ├── train/                     # 12-16 training tasks (depends on boundary probing)
│   │   └── test/                      # 8-10 test tasks (depends on boundary probing)
│   ├── optimization/
│   │   ├── iteration_001/
│   │   │   ├── variant_a/
│   │   │   ├── variant_b/
│   │   │   ├── variant_c/
│   │   │   └── variant_d/
│   │   └── iteration_002/
│   ├── exec_results/
│   │   ├── original/                  # 8-10 tasks
│   │   └── optimized/                 # 8-10 tasks
│   ├── optimization_log.md
│   ├── results_report.md
│   └── {target-skill}-optimized/      # If retained
│
├── run_2026-04-13_15-45-00/
│   └── ... (same structure)
│
└── latest -> run_2026-04-13_15-45-00/ # Symlink to latest run
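
Creating a timestamped run directory and repointing the `latest` symlink can be sketched as follows (hypothetical `create_run` helper mirroring the layout above; symlink behavior assumes a POSIX filesystem):

```python
import tempfile
from datetime import datetime
from pathlib import Path

def create_run(runs_root: Path, now: datetime) -> Path:
    """Create run_YYYY-MM-DD_HH-MM-SS and point `latest` at it."""
    run_dir = runs_root / now.strftime("run_%Y-%m-%d_%H-%M-%S")
    run_dir.mkdir(parents=True)
    latest = runs_root / "latest"
    if latest.is_symlink():
        latest.unlink()
    latest.symlink_to(run_dir.name)  # relative symlink to the newest run
    return run_dir

root = Path(tempfile.mkdtemp())
run = create_run(root, datetime(2026, 4, 13, 15, 45, 0))
```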

Legacy Flat Structure (if versioned runs disabled):

./
├── tasks/
│   ├── train/          # 12-16 training tasks (depends on boundary probing)
│   └── test/           # 8-10 test tasks (depends on boundary probing)
├── exec_results/
│   ├── original/       # 8-10 tasks
│   └── optimized/      # 8-10 tasks
├── {target-skill}-optimized/  # If retained
├── optimization_log.md
└── results_report.md

Configuration

Features can be controlled via config.yaml:

# Task generation
task_generation:
  num_training_tasks: 16          # 12 for legacy mode
  num_test_tasks: 10              # 8 for legacy mode
  probe_boundaries: true          # Set to false for legacy 20-task mode
  boundary_types:
    - input_minimal
    - input_maximal
    - input_invalid
    - resource_limits
    - failure_modes
    - combinations

# GRPO optimization
grpo:
  optimization_levels:
    - skill_md                    # Always enabled
    - code                        # Remove to disable code optimization
    - config                      # Remove to disable config optimization
  code_mutations:
    - add_caching
    - add_validation
    - add_error_handling
    - optimize_algorithm

# Output structure
output:
  use_versioned_runs: true        # Set to false for legacy flat structure
  runs_directory: "skills-coach-runs"
  keep_latest_symlink: true
  max_runs_to_keep: 10            # Auto-cleanup old runs
  save_intermediate_variants: true
  save_execution_logs: true
  save_metadata: true

# Run comparison
comparison:
  enable_comparison_tool: true
  auto_compare_with_previous: true
  comparison_metrics:
    - baseline_score
    - final_score
    - improvement
    - duration
    - iterations

Run Management Commands

Use run-manager CLI for analysis:

# List all runs
python subskills/run-manager/run_manager.py list

# Compare two runs
python subskills/run-manager/run_manager.py compare run_2026-04-13_14-30-00 run_2026-04-13_15-45-00

# Cleanup old runs (keep latest 10)
python subskills/run-manager/run_manager.py cleanup 10
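
The cleanup policy can be sketched without touching the filesystem: because the `run_YYYY-MM-DD_HH-MM-SS` names sort lexicographically by timestamp, keeping the newest N is a sort and a slice (hypothetical `runs_to_delete` helper; run_manager.py's actual cleanup may differ):

```python
def runs_to_delete(run_names, keep=10):
    """Return the run directories to remove, oldest first."""
    ordered = sorted(run_names)  # timestamp names sort chronologically
    return ordered[:-keep] if len(ordered) > keep else []

# Demo: 12 daily runs, keep the latest 10 -> the 2 oldest are stale
stale = runs_to_delete(
    [f"run_2026-04-{day:02d}_12-00-00" for day in range(1, 13)], keep=10
)
```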

Error Handling

  • If any subskill fails, stop execution and report the error to the user
  • If sample-agent cannot parse the target SKILL.md, abort before task generation
  • If optimize-agent fails to improve scores after 10 iterations, proceed with the best variant found
  • If exec-agent encounters runtime errors, log them in run_log.md and continue with remaining tasks
  • If evaluate-agent determines the optimized skill performs worse, delete the optimized directory

Constraints

  • All subskills operate autonomously without user input between steps
  • The original target Skill is never modified in place
  • SpecCheck evaluation must be deterministic
  • No data leakage between train and test task sets
  • GRPO optimization runs 3-10 iterations, stopping early if no improvement for 2 consecutive iterations
  • v1.2.0: Generates 20-26 tasks depending on boundary probing:
    • Without boundary probing: 12 training + 8 test = 20 tasks
    • With boundary probing (if boundaries detected): 16 training + 10 test = 26 tasks
  • Can optimize code files in addition to SKILL.md (if enabled in config)
  • Creates versioned run directories (if enabled in config)

Notes

  • This is a meta-skill that operates on other skills
  • Execution may take significant time depending on the complexity of the target skill
  • The GRPO approach is training-free and does not require gradient computation
  • All intermediate outputs are preserved for transparency and debugging
  • Boundary probing tests capability limits with 6 types of edge cases
  • Code optimization can modify Python/shell scripts in addition to SKILL.md
  • Versioned runs preserve all optimization attempts for historical tracking
  • Run comparison tool enables analysis of optimization strategies over time

