evaluate-report

View evaluation results and benchmark reports for a skill or plugin.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "evaluate-report" with this command: npx skills add laurigates/claude-plugins/laurigates-claude-plugins-evaluate-report

/evaluate:report

View evaluation results and benchmark reports for a skill or plugin.

When to Use This Skill

Use this skill when... Use alternative when...

Want to see results from a past evaluation Need to run new evaluations -> /evaluate:skill

Comparing benchmark runs over time Want improvement suggestions -> /evaluate:improve

Reviewing quality trends for a plugin Need structural compliance -> plugin-compliance-check.sh

Parameters

Parse these from $ARGUMENTS :

Parameter Default Description

<target>

required plugin/skill-name for skill or plugin-name for plugin

--latest

true Show most recent benchmark only

--history

false Show all benchmark iterations

--compare

false Compare current vs previous benchmark

Execution

Step 1: Resolve target

Determine if $ARGUMENTS refers to a skill or plugin:

  • Contains / : treat as plugin-name/skill-name , look for <plugin>/skills/<skill>/eval-results/benchmark.json

  • No / : treat as plugin-name , look for <plugin>/eval-results/plugin-benchmark.json

If the target file does not exist, report that no evaluation results are available and suggest running /evaluate:skill or /evaluate:plugin .

Step 2: Load results

Skill-level report (--latest or default): Read benchmark.json and display:

Evaluation Report: <plugin/skill-name>

Last run: <timestamp> Runs per eval: N

MetricWith SkillBaselineDelta
Pass Rate85%42%+43%
Duration14s12s+2s

Per-Eval Results

Eval IDDescriptionPass RateFailures
eval-001Basic usage100%
eval-002Edge case67%assertion X

Skill-level history (--history ): Read history.json and display improvement iterations:

Evaluation History: <plugin/skill-name>

VersionDatePass RateChanges
v3 (current)2026-03-0485%Improved step 3 instructions
v22026-03-0172%Added error handling
v12026-02-2855%Initial evaluation

Skill-level comparison (--compare ): Read current and previous benchmarks, show delta:

Comparison: <plugin/skill-name>

MetricPreviousCurrentDelta
Pass Rate72%85%+13%
Duration16s14s-2s

Improved evals: eval-002, eval-004 Regressed evals: none

Plugin-level report: Read plugin-benchmark.json and display:

Plugin Report: <plugin-name>

SkillEvalsPass RateStatus
skill-a4100%PASS
skill-b367%NEEDS WORK

Overall: 82% pass rate Skills evaluated: N / M total

Agentic Optimizations

Context Command

Find skill benchmark cat <plugin>/skills/<skill>/eval-results/benchmark.json | jq .summary

Find plugin benchmark cat <plugin>/eval-results/plugin-benchmark.json | jq .summary

Check for history cat <plugin>/skills/<skill>/eval-results/history.json | jq '.iterations | length'

List available results find <plugin> -name benchmark.json -o -name plugin-benchmark.json

Quick Reference

Flag Description

--latest

Most recent benchmark (default)

--history

Show all iterations

--compare

Compare current vs previous

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

ruff linting

No summary provided by upstream source.

Repository SourceNeeds Review
General

imagemagick-conversion

No summary provided by upstream source.

Repository SourceNeeds Review
General

jq json processing

No summary provided by upstream source.

Repository SourceNeeds Review
General

api-testing

No summary provided by upstream source.

Repository SourceNeeds Review