# /evaluate:improve

Analyze evaluation results and suggest concrete improvements to a skill.
## When to Use This Skill

| Use this skill when... | Use alternative when... |
|---|---|
| You have eval results and want to improve the skill | Need to run evals first -> `/evaluate:skill` |
| You want to improve the skill description for better triggering | Want to view raw results -> `/evaluate:report` |
| Iterating on a skill to increase pass rate | Want to file a bug -> `/feedback:session` |
| Optimizing skill instructions after benchmarking | Need structural fixes -> `plugin-compliance-check.sh` |
## Parameters

Parse these from `$ARGUMENTS`:

| Parameter | Default | Description |
|---|---|---|
| `<plugin/skill-name>` | required | Path as `plugin-name/skill-name` |
| `--apply` | false | Apply approved changes to SKILL.md |
| `--description-only` | false | Focus on description improvements only |
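The parsing above can be sketched as a small shell helper. This is illustrative only: `parse_args` and the variable names are assumptions, not part of the real command runtime; only the flag names come from the table.

```shell
# Hypothetical parser for the flags in the table above.
# parse_args sets TARGET, APPLY, and DESCRIPTION_ONLY from its arguments.
parse_args() {
  APPLY=false
  DESCRIPTION_ONLY=false
  TARGET=""
  for arg in "$@"; do
    case "$arg" in
      --apply) APPLY=true ;;
      --description-only) DESCRIPTION_ONLY=true ;;
      *) TARGET="$arg" ;;   # the positional plugin-name/skill-name
    esac
  done
}
```

A caller would invoke it as `parse_args $ARGUMENTS` and then branch on `$APPLY` and `$DESCRIPTION_ONLY`.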
## Execution

### Step 1: Load eval results

Read the most recent benchmark from `<plugin-name>/skills/<skill-name>/eval-results/benchmark.json`.

If no results exist, suggest running `/evaluate:skill` first and stop.

Also read the current SKILL.md to understand the skill.
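The load-and-check step can be sketched as a shell helper. The `load_benchmark` name is illustrative; the path layout follows the convention described above.

```shell
# Print the benchmark summary for a skill directory, or fail with a hint
# if no eval results exist yet.
load_benchmark() {
  local skill_dir="$1"
  local benchmark="$skill_dir/eval-results/benchmark.json"
  if [ ! -f "$benchmark" ]; then
    echo "No eval results found. Run /evaluate:skill first." >&2
    return 1
  fi
  jq .summary "$benchmark"
}
```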
### Step 2: Analyze results

Delegate analysis to the eval-analyzer agent via Task:

    Task subagent_type: eval-analyzer
    Prompt: Analyze these evaluation results and identify improvement opportunities.
      Skill: <path to SKILL.md>
      Benchmark: <benchmark.json contents>
      Mode: comparison (if baseline data exists) or benchmark (otherwise)
The analyzer produces categorized suggestions:

- `instructions`: Execution flow improvements
- `description`: Better intent-matching text
- `examples`: Missing or insufficient examples
- `error_handling`: Missing edge cases
- `tools`: Better tool configurations
- `structure`: Organizational improvements
### Step 3: Filter suggestions

If `--description-only` is set, keep only suggestions in the `description` category.

Sort the remaining suggestions by priority (high > medium > low).
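The filter-and-sort step can be sketched with jq. The suggestion fields shown here are assumptions about the analyzer's output, not a documented schema.

```shell
# Hypothetical analyzer output; field names are illustrative.
suggestions='[
  {"category":"structure",    "priority":"low",    "text":"Move flag reference"},
  {"category":"description",  "priority":"high",   "text":"Add trigger phrase"},
  {"category":"instructions", "priority":"medium", "text":"Check git user.name"}
]'

# Keep only description suggestions (--description-only), then sort high > medium > low.
echo "$suggestions" | jq '
  map(select(.category == "description"))
  | sort_by(if .priority == "high" then 0
            elif .priority == "medium" then 1
            else 2 end)'
```

Without the `--description-only` filter, drop the `map(select(...))` stage and sort the full list the same way.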
### Step 4: Present suggestions

Present the categorized suggestions to the user:

    Improvement Suggestions: <plugin/skill-name>

    Current pass rate: 72%

    High Priority

    - [instructions] Add explicit error handling for missing git config
      Evidence: eval-003 fails because the skill doesn't check for git user.name
    - [description] Add "conventional commit" as a trigger phrase
      Evidence: Skill not selected when user says "make a conventional commit"

    Medium Priority

    - [examples] Add breaking change example to execution steps
      Evidence: eval-004 inconsistently handles breaking changes

    Low Priority

    - [structure] Move flag reference to Quick Reference table
      Evidence: Flags scattered across multiple sections

If `--apply` is NOT set, stop here.
### Step 5: Apply changes (if --apply)

Use AskUserQuestion to let the user select which suggestions to apply:

    Which improvements should I apply?
    [x] Add error handling for missing git config
    [x] Add trigger phrases to description
    [ ] Add breaking change example
    [ ] Restructure flag reference

For each approved suggestion:

1. Read the current SKILL.md
2. Apply the change using Edit
3. Update the modified date in the frontmatter
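The frontmatter date bump can be sketched with sed. This assumes the frontmatter contains a `modified: YYYY-MM-DD` line; the sample SKILL.md below is created only for illustration.

```shell
# Create a sample SKILL.md; in practice, operate on the real skill file.
cat > SKILL.md <<'EOF'
---
name: commit-helper
modified: 2024-01-01
---
# Commit Helper
EOF

# Replace the modified date with today's date, keeping a .bak backup.
today=$(date +%F)
sed -i.bak "s/^modified: .*/modified: $today/" SKILL.md
grep '^modified:' SKILL.md
```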
After applying changes, update (or create) the history file at
`<plugin-name>/skills/<skill-name>/eval-results/history.json`.

Add a new iteration entry recording:

- Version number (incremented from the previous entry)
- Timestamp
- Pass rate from the current benchmark
- Summary of the changes made
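The history update above can be sketched with jq. The `{"iterations": [...]}` layout and field names are assumptions about the schema, inferred from the `jq '.iterations[-1]'` command later in this document.

```shell
# Placeholder path; in practice this lives under the skill's eval-results dir.
HISTORY=eval-results/history.json
mkdir -p eval-results
[ -f "$HISTORY" ] || echo '{"iterations":[]}' > "$HISTORY"

# Append an iteration entry, incrementing the version from the list length.
jq --arg ts "$(date -u +%FT%TZ)" \
   --argjson rate 0.72 \
   --arg summary "Added error handling; expanded description triggers" \
   '.iterations += [{
      version: ((.iterations | length) + 1),
      timestamp: $ts,
      pass_rate: $rate,
      changes: $summary
    }]' "$HISTORY" > "$HISTORY.tmp" && mv "$HISTORY.tmp" "$HISTORY"
```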
### Step 6: Suggest re-evaluation

After applying changes, suggest:

    Changes applied. Run /evaluate:skill <plugin/skill-name> to measure improvement.
## Agentic Optimizations

| Context | Command |
|---|---|
| Read benchmark | `cat <plugin>/skills/<skill>/eval-results/benchmark.json \| jq .summary` |
| Read skill | `cat <plugin>/skills/<skill>/SKILL.md` |
| Read history | `cat <plugin>/skills/<skill>/eval-results/history.json \| jq '.iterations[-1]'` |
| Check pass rate | `cat <plugin>/skills/<skill>/eval-results/benchmark.json \| jq '.summary.with_skill.mean_pass_rate'` |
## Quick Reference

| Flag | Description |
|---|---|
| `--apply` | Apply approved changes to SKILL.md |
| `--description-only` | Focus on description improvements only |