evaluate-improve

Analyze evaluation results and suggest concrete improvements to a skill.


Install this skill with:

```
npx skills add laurigates/claude-plugins/laurigates-claude-plugins-evaluate-improve
```

/evaluate:improve


When to Use This Skill

| Use this skill when... | Use alternative when... |
| --- | --- |
| You have eval results and want to improve the skill | Need to run evals first -> `/evaluate:skill` |
| You want to improve the skill description for better triggering | Want to view raw results -> `/evaluate:report` |
| You are iterating on a skill to increase its pass rate | Want to file a bug -> `/feedback:session` |
| You are optimizing skill instructions after benchmarking | Need structural fixes -> `plugin-compliance-check.sh` |

Parameters

Parse these from `$ARGUMENTS`:

| Parameter | Default | Description |
| --- | --- | --- |
| `<plugin/skill-name>` | required | Path as `plugin-name/skill-name` |
| `--apply` | false | Apply approved changes to SKILL.md |
| `--description-only` | false | Focus on description improvements only |

Execution

Step 1: Load eval results

Read the most recent benchmark from:

<plugin-name>/skills/<skill-name>/eval-results/benchmark.json

If no results exist, suggest running /evaluate:skill first and stop.

Also read the current SKILL.md to understand the skill.

Step 2: Analyze results

Delegate analysis to the eval-analyzer agent via Task:

```
Task
  subagent_type: eval-analyzer
  Prompt: Analyze these evaluation results and identify improvement opportunities.
    Skill: <path to SKILL.md>
    Benchmark: <benchmark.json contents>
    Mode: comparison (if baseline data exists) or benchmark (otherwise)
```

The analyzer produces categorized suggestions:

  • instructions: Execution flow improvements

  • description: Better intent-matching text

  • examples: Missing or insufficient examples

  • error_handling: Missing edge cases

  • tools: Better tool configurations

  • structure: Organizational improvements

Step 3: Filter suggestions

If `--description-only` is set, filter to suggestions in the description category only.

Sort remaining suggestions by priority (high > medium > low).
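The filter and sort in this step can be sketched with jq. The suggestion object shape (`category` and `priority` fields) is an assumption for illustration, not the analyzer's confirmed schema:

```shell
# Hypothetical suggestion shape; the analyzer's actual JSON schema may differ.
SUGGESTIONS='[
  {"category": "structure",   "priority": "low",    "text": "Move flag reference"},
  {"category": "description", "priority": "high",   "text": "Add trigger phrase"},
  {"category": "examples",    "priority": "medium", "text": "Add breaking change example"}
]'

# --description-only: keep only description-category suggestions.
echo "$SUGGESTIONS" | jq 'map(select(.category == "description"))'

# Sort by priority, high > medium > low.
echo "$SUGGESTIONS" |
  jq 'sort_by(if .priority == "high" then 0 elif .priority == "medium" then 1 else 2 end)'
```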

Step 4: Present suggestions

Present the categorized suggestions to the user:

```
Improvement Suggestions: <plugin/skill-name>

Current pass rate: 72%

High Priority
  1. [instructions] Add explicit error handling for missing git config
     Evidence: eval-003 fails because the skill doesn't check for git user.name
  2. [description] Add "conventional commit" as trigger phrase
     Evidence: Skill not selected when user says "make a conventional commit"

Medium Priority
  1. [examples] Add breaking change example to execution steps
     Evidence: eval-004 inconsistently handles breaking changes

Low Priority
  1. [structure] Move flag reference to Quick Reference table
     Evidence: Flags scattered across multiple sections
```

If `--apply` is NOT set, stop here.

Step 5: Apply changes (if --apply)

Use AskUserQuestion to let the user select which suggestions to apply:

```
Which improvements should I apply?
[x] Add error handling for missing git config
[x] Add trigger phrases to description
[ ] Add breaking change example
[ ] Restructure flag reference
```

For each approved suggestion:

  • Read the current SKILL.md

  • Apply the change using Edit

  • Update the modified date in frontmatter

After applying changes, update (or create) the history file at:

<plugin-name>/skills/<skill-name>/eval-results/history.json

Add a new iteration entry recording:

  • Version number (increment from previous)

  • Timestamp

  • Pass rate from current benchmark

  • Summary of changes made
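The history update above can be sketched with jq. The history.json schema below is assumed from the listed fields, not confirmed by the upstream skill:

```shell
#!/bin/sh
# Sketch of the Step 5 history update; the iteration-entry schema is an
# assumption based on the fields listed above.
HISTORY=eval-results/history.json
mkdir -p eval-results
[ -f "$HISTORY" ] || echo '{"iterations": []}' > "$HISTORY"

# Pull the pass rate from the current benchmark; default to 0 if missing.
PASS_RATE=$(jq '.summary.with_skill.mean_pass_rate // 0' \
  eval-results/benchmark.json 2>/dev/null || echo 0)

# Append an iteration entry, incrementing the version from the previous one.
jq --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
   --argjson rate "$PASS_RATE" \
   '.iterations += [{
      version: ((.iterations[-1].version // 0) + 1),
      timestamp: $ts,
      pass_rate: $rate,
      changes: "Applied approved suggestions"
    }]' "$HISTORY" > "$HISTORY.tmp" && mv "$HISTORY.tmp" "$HISTORY"
```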

Step 6: Suggest re-evaluation

After applying changes, suggest:

```
Changes applied. Run /evaluate:skill <plugin/skill-name> to measure improvement.
```

Agentic Optimizations

| Context | Command |
| --- | --- |
| Read benchmark | `cat <plugin>/skills/<skill>/eval-results/benchmark.json \| jq .summary` |
| Read skill | `cat <plugin>/skills/<skill>/SKILL.md` |
| Read history | `cat <plugin>/skills/<skill>/eval-results/history.json \| jq '.iterations[-1]'` |
| Check pass rate | `cat <plugin>/skills/<skill>/eval-results/benchmark.json \| jq '.summary.with_skill.mean_pass_rate'` |

Quick Reference

| Flag | Description |
| --- | --- |
| `--apply` | Apply approved changes to SKILL.md |
| `--description-only` | Focus on description improvements only |
