skill-audit
Scans all skill locations (global, workspace, project) and produces a structured audit report.
What It Checks
Structural Quality (per skill)
- Description quality: Is the `description` field trigger-oriented (tells the model when to use it) vs a vague summary?
- Gotchas section: Does the SKILL.md include a Gotchas/Pitfalls/Common Issues section? (Highest-signal content per Anthropic)
- Progressive disclosure: Does the skill use subdirectories (scripts/, references/, assets/, examples/) or is it a flat SKILL.md?
- File structure: Are there scripts, templates, or reference files the agent can discover?
- YAML frontmatter: Does it have `name`, `description`, and optionally `compatibility`?
- Category fit: Does it map cleanly to one of the 9 skill categories (Library/API, Verification, Data, Automation, Scaffolding, Code Quality, CI/CD, Runbooks, Infrastructure)?
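The frontmatter and description checks above could be approximated as follows. This is a minimal sketch, not the logic of `scripts/audit.sh`: the function names, the naive `key: value` parser (a stand-in for a real YAML parser), and every cue phrase other than "use when" and "trigger" are assumptions made for illustration.

```python
import re

# "use when" and "trigger" come from the scoring table in this document;
# the other cues are assumptions added for illustration.
TRIGGER_CUES = ("use when", "trigger", "use this", "invoke when")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse top-level `key: value` pairs from a leading '---' block.

    Deliberately naive: ignores nesting and multi-line values, and only
    strips surrounding quotes. Returns {} when no frontmatter is found.
    """
    match = re.match(r"\A---\s*\n(.*?)\n---\s*(?:\n|\Z)", text, re.DOTALL)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        if line[:1].isspace() or ":" not in line:
            continue  # skip indented (nested) or malformed lines
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip("\"'")
    return fields


def is_trigger_oriented(description: str) -> bool:
    """Return True if the description contains any known trigger cue."""
    lowered = description.lower()
    return any(cue in lowered for cue in TRIGGER_CUES)


# Hypothetical SKILL.md content, invented for this example.
sample = """---
name: skill-audit
description: Use when the user asks to audit or review installed skills.
---
Body text.
"""
fields = parse_frontmatter(sample)
print(fields["name"], is_trigger_oriented(fields["description"]))
```

A substring match is a crude proxy for "trigger-oriented"; it will miss descriptions that state their trigger in other words, so treat a negative result as a prompt for manual review.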
Cross-Skill Issues
- Duplicates: Same skill name or overlapping functionality across global/workspace/project dirs
- Orphan files: Stale `.skill` files, empty dirs, leftover copies
- Category gaps: Which of the 9 categories have no skills at all?
- Stale skills: Skills that reference missing tools, dead paths, or deprecated APIs
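Two of the cross-skill checks above, duplicate names and category gaps, reduce to simple set operations once each skill has a scope, a name, and a category. The sketch below assumes that metadata has already been collected; the tuple layout, function names, and sample inventory are hypothetical.

```python
from collections import defaultdict

# The nine categories named in this document.
CATEGORIES = (
    "Library/API", "Verification", "Data", "Automation", "Scaffolding",
    "Code Quality", "CI/CD", "Runbooks", "Infrastructure",
)


def find_duplicates(skills):
    """Return {skill name: [scopes]} for names that appear more than once.

    `skills` is a list of (scope, name, category) tuples. Names are
    compared case-insensitively, which is an assumption.
    """
    scopes_by_name = defaultdict(list)
    for scope, name, _category in skills:
        scopes_by_name[name.lower()].append(scope)
    return {n: s for n, s in scopes_by_name.items() if len(s) > 1}


def category_gaps(skills):
    """Return the categories, in the order listed above, with no skills."""
    covered = {category for _scope, _name, category in skills}
    return [c for c in CATEGORIES if c not in covered]


# Hypothetical inventory, invented for this example.
inventory = [
    ("global", "pdf-export", "Automation"),
    ("project", "pdf-export", "Automation"),
    ("workspace", "lint-fix", "Code Quality"),
]
print(find_duplicates(inventory))
print(category_gaps(inventory))
```

This only catches skills that share a name. Overlapping functionality under different names needs a judgment call by the agent and is not covered here.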
How to Run
Tell the agent: "audit my skills" or "run skill-audit"
The agent will:
- Run `scripts/audit.sh` to scan all skill locations and collect metadata
- Score each skill (0-10) based on the checks above
- Produce a summary report with:
  - Per-skill scorecard
  - Top issues to fix (sorted by impact)
  - Category coverage map
  - Duplicate/orphan findings
Output
Results are written to .sub-agent-results/skill-audit-report.md and summarized in chat.
Scoring
| Points | Criteria |
|---|---|
| +2 | Has YAML frontmatter with name + description |
| +2 | Description is trigger-oriented (contains "use when", "trigger", action verbs) |
| +2 | Has a Gotchas/Pitfalls/Common Issues section |
| +2 | Uses progressive disclosure (has subdirs with scripts/references/assets) |
| +1 | Has at least one script or executable file |
| +1 | SKILL.md is between 200 and 5000 chars (not too sparse, not bloated) |
Scores: 8-10 = Good, 5-7 = Needs work, 0-4 = Poor
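The rubric above can be read as a plain additive score. The following sketch shows one way to compute it, assuming each criterion has already been evaluated to a boolean or a character count; the `SkillFacts` type and its field names are hypothetical, and treating the 200 and 5000 character bounds as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SkillFacts:
    """Pre-computed check results for one skill (hypothetical structure)."""
    has_frontmatter: bool         # YAML frontmatter with name + description
    trigger_oriented: bool        # description tells the model when to use it
    has_gotchas: bool             # Gotchas/Pitfalls/Common Issues section
    progressive_disclosure: bool  # subdirs with scripts/references/assets
    has_script: bool              # at least one script or executable file
    skill_md_chars: int           # length of SKILL.md in characters


def score(facts: SkillFacts) -> int:
    """Sum the points from the scoring table (maximum 10)."""
    points = 0
    points += 2 if facts.has_frontmatter else 0
    points += 2 if facts.trigger_oriented else 0
    points += 2 if facts.has_gotchas else 0
    points += 2 if facts.progressive_disclosure else 0
    points += 1 if facts.has_script else 0
    # Bounds assumed inclusive; the table does not say either way.
    points += 1 if 200 <= facts.skill_md_chars <= 5000 else 0
    return points


def band(points: int) -> str:
    """Map a 0-10 score to the rating bands listed above."""
    if points >= 8:
        return "Good"
    if points >= 5:
        return "Needs work"
    return "Poor"


# Hypothetical skill: good frontmatter and description, nothing else.
example = SkillFacts(True, True, False, False, False, 1200)
print(score(example), band(score(example)))
```

For the example skill, frontmatter (+2), a trigger-oriented description (+2), and an in-range SKILL.md (+1) give 5 points, which falls in the "Needs work" band.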
References
- Anthropic: Lessons from Building Claude Code Skills — Thariq's 9 categories, gotchas sections, progressive disclosure
- Ole Lehmann: Auto-improve Skills — Autoresearch loop (future enhance mode)