# evaluate-plugin-batch

Batch evaluate all skills in a plugin. Runs /evaluate:skill for each skill, then produces a plugin-level quality report.

> **Safety notice:** this listing is imported from the skills.sh public index metadata. Review the upstream SKILL.md and repository scripts before running.

Install: `npx skills add laurigates/claude-plugins/laurigates-claude-plugins-evaluate-plugin-batch`

## /evaluate:plugin

## When to Use This Skill

| Use this skill when... | Use alternative when... |
| --- | --- |
| Auditing all skills in a plugin before release | Evaluating a single skill -> `/evaluate:skill` |
| Establishing quality baselines across a plugin | Viewing past results -> `/evaluate:report` |
| Checking overall plugin quality after refactoring | Need structural compliance -> `plugin-compliance-check.sh` |

## Context

- Plugin skills: `!find $1/skills -maxdepth 3 -name "SKILL.md"`
- Existing evals: `!find $1/skills -maxdepth 3 -name "evals.json"`

## Parameters

Parse these from `$ARGUMENTS`:

| Parameter | Default | Description |
| --- | --- | --- |
| `<plugin-name>` | required | Name of the plugin to evaluate |
| `--create-missing-evals` | `false` | Generate evals for skills that lack them |
| `--parallel N` | `1` | Max concurrent skill evaluations |

## Execution

### Step 1: Discover skills

Find all skills in the plugin:

```
<plugin-name>/skills/*/SKILL.md
```

List them and count the total.
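The discovery step can be sketched in plain shell. The toy plugin layout below (plugin and skill names) is invented for illustration:

```shell
# Create a toy plugin layout for illustration (all names are hypothetical).
plugin="demo-plugin"
mkdir -p "$plugin/skills/skill-a" "$plugin/skills/skill-b"
touch "$plugin/skills/skill-a/SKILL.md" "$plugin/skills/skill-b/SKILL.md"

# Discover every SKILL.md up to three levels deep, then count them.
skills=$(find "$plugin/skills" -maxdepth 3 -name "SKILL.md" | sort)
count=$(printf '%s\n' "$skills" | grep -c .)
echo "Found $count skills in $plugin"
```

Using `grep -c .` to count non-empty lines avoids the leading whitespace some `wc -l` implementations emit.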

### Step 2: Filter and prepare

For each skill, check whether `evals.json` exists:

- Has evals: include in the evaluation
- No evals + `--create-missing-evals`: include; evals will be created during evaluation
- No evals, no flag: skip with a note

Report the breakdown:

```
Found N skills in <plugin-name>:
  - M with eval cases
  - K without eval cases (skipped | will create)
```
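A minimal shell sketch of this filtering logic, using an invented two-skill layout; the flag handling is simplified to a boolean:

```shell
plugin="filter-demo"
create_missing=false   # would be true when --create-missing-evals is passed

# Toy layout: one skill with evals, one without (names are hypothetical).
mkdir -p "$plugin/skills/with-evals" "$plugin/skills/without-evals"
touch "$plugin/skills/with-evals/SKILL.md" "$plugin/skills/without-evals/SKILL.md"
echo '[]' > "$plugin/skills/with-evals/evals.json"

included=0; skipped=0
for skill_md in "$plugin"/skills/*/SKILL.md; do
  dir=$(dirname "$skill_md")
  if [ -f "$dir/evals.json" ] || [ "$create_missing" = true ]; then
    included=$((included + 1))
  else
    skipped=$((skipped + 1))
  fi
done
echo "$included with eval cases, $skipped without (skipped)"
```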

### Step 3: Run evaluations

For each included skill, invoke `/evaluate:skill` via the SlashCommand tool:

```
SlashCommand: /evaluate:skill <plugin-name>/<skill-name> [--create-evals]
```

If `--parallel N` is set and N > 1, batch evaluations into groups of N. Otherwise, run sequentially.

Track progress with TodoWrite, marking each skill as it completes.
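The batching behavior can be sketched with `xargs -P`; `evaluate_one` below is a hypothetical stand-in for the real `/evaluate:skill` invocation:

```shell
N=2   # value of --parallel

# Stand-in for the real per-skill evaluation (hypothetical function).
evaluate_one() { echo "evaluated $1"; }
export -f evaluate_one

# Run at most N evaluations concurrently, one skill per invocation.
printf '%s\n' skill-a skill-b skill-c \
  | xargs -P "$N" -I {} bash -c 'evaluate_one "$1"' _ {}
```

With `-P 1` this degrades to the sequential default; output order is not guaranteed when N > 1.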

### Step 4: Aggregate plugin report

After all skill evaluations complete, read each skill's `benchmark.json` and aggregate:

```
bash evaluate-plugin/scripts/aggregate_benchmark.sh <plugin-name>
```

Write the aggregated results to `<plugin-name>/eval-results/plugin-benchmark.json`.
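The aggregation itself is handled by the plugin's script, but the shape of the computation can be sketched with `jq`. The `passed`/`total` fields below are an assumed schema, not the real `benchmark.json` format:

```shell
# Fake per-skill results under an assumed schema (passed/total are invented fields).
mkdir -p eval-demo
echo '{"skill":"skill-a","passed":4,"total":4}' > eval-demo/a.json
echo '{"skill":"skill-b","passed":2,"total":3}' > eval-demo/b.json

# Slurp all per-skill files and compute plugin-level totals and a rounded pass rate.
jq -s '{
  skills: map(.skill),
  passed: (map(.passed) | add),
  total:  (map(.total)  | add),
  pass_rate_pct: ((map(.passed) | add) * 100 / (map(.total) | add) | round)
}' eval-demo/*.json > eval-demo/plugin-benchmark.json

jq -r '.pass_rate_pct' eval-demo/plugin-benchmark.json
```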

### Step 5: Report

Print a plugin-level summary table:

```
Plugin Evaluation: <plugin-name>

| Skill   | Evals | Pass Rate | Status  |
| ------- | ----- | --------- | ------- |
| skill-a | 4     | 100%      | PASS    |
| skill-b | 3     | 67%       | PARTIAL |
| skill-c | 5     | 80%       | PASS    |

Overall: 82% pass rate across N eval cases
```

Rank skills by pass rate. Flag any below 50% as needing attention.
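The ranking and flagging can be sketched with `jq` over an assumed per-skill rate list (the `skill`/`rate` schema is invented for illustration):

```shell
# Assumed shape: one object per skill with a precomputed pass rate.
echo '[{"skill":"skill-a","rate":100},{"skill":"skill-b","rate":40},{"skill":"skill-c","rate":80}]' > rates.json

# Sort descending by rate; append a flag for anything below 50%.
jq -r 'sort_by(-.rate)[]
       | "\(.skill)\t\(.rate)%" + (if .rate < 50 then "\tNEEDS ATTENTION" else "" end)' \
  rates.json
```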

## Agentic Optimizations

| Context | Command |
| --- | --- |
| List plugin skills | `ls -d <plugin>/skills/*/SKILL.md` |
| Check for evals | `find <plugin>/skills -name evals.json` |
| Count skills | `ls -d <plugin>/skills/*/SKILL.md \| wc -l` |
| Aggregate results | `bash evaluate-plugin/scripts/aggregate_benchmark.sh <plugin>` |

## Quick Reference

| Flag | Description |
| --- | --- |
| `--create-missing-evals` | Generate eval cases for skills without them |
| `--parallel N` | Max concurrent evaluations (default: 1) |
