Improve Skill — The Meta-Skill
Evaluate an existing Factory Droid skill against the current project codebase and produce concrete, prioritized improvement recommendations.
When to Use
- After the first invocation of a skill — "that worked, but it missed X"
- When a project evolves and a skill hasn't kept up
- When onboarding a skill from another project into a new codebase
- When the user says "make this skill better" or "this skill doesn't handle Y"
- Auto-invocable: the droid can suggest improvements after observing skill friction
Inputs
When invoked, determine which skill to evaluate:
- Explicit: `/improve-skill ghost-catalog` — the user names the skill
- Contextual: If the user just invoked a skill and says "improve it" — use the last invoked skill
- Discovery: If no skill is named, list all installed skills and ask which one to evaluate
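A minimal sketch of this resolution order, assuming the droid runtime exposes the invocation argument, the last invoked skill, and the installed-skill list (all three parameter names here are hypothetical):

```python
def resolve_target_skill(arg: str | None, last_invoked: str | None,
                         installed: list[str]) -> str:
    """Pick the skill to evaluate, following the priority order above."""
    if arg:
        return arg  # Explicit: the user named the skill
    if last_invoked:
        return last_invoked  # Contextual: "improve it" right after a skill ran
    # Discovery: list installed skills and ask; a console prompt stands in here
    print("Installed skills:", ", ".join(installed))
    return input("Which skill should I evaluate? ").strip()
```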
Evaluation Framework
Run these 6 analyses in order. For each, produce findings and a verdict (STRONG / ADEQUATE / WEAK / MISSING).
1. Skill Definition Audit
Read the skill's SKILL.md and all supporting files.
Check for:
- Clarity: Are instructions unambiguous? Could a droid follow them without guessing?
- Completeness: Are all commands documented with inputs, outputs, and edge cases?
- Structured inputs: Does the skill define what it expects (paths, flags, formats)?
- Success criteria: Does the skill define what "done" looks like?
- Error handling: Does the skill say what to do when things fail?
- Verification step: Does the skill require confirmation that it worked?
Report gaps as specific line-level observations, not vague suggestions.
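As a sketch of how the audit can produce line-level evidence rather than vague impressions, assuming the skill directory contains a SKILL.md at its root (the topic keywords below are illustrative, not a fixed schema):

```python
from pathlib import Path

# Topics a complete skill definition should cover (illustrative keyword list)
EXPECTED_TOPICS = ("input", "output", "success", "error", "verif")

def audit_skill_definition(skill_dir: str) -> dict[str, int | None]:
    """Map each expected topic to the first SKILL.md line that mentions it."""
    lines = Path(skill_dir, "SKILL.md").read_text(encoding="utf-8").splitlines()
    found: dict[str, int | None] = {topic: None for topic in EXPECTED_TOPICS}
    for lineno, line in enumerate(lines, start=1):
        lowered = line.lower()
        for topic in EXPECTED_TOPICS:
            if found[topic] is None and topic in lowered:
                found[topic] = lineno
    return found

# Topics left as None become MISSING verdicts with concrete evidence:
# "SKILL.md never mentions error handling" beats "improve error handling".
```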
2. Codebase Fit Analysis
Scan the current project to understand its shape:
- File types: What extensions exist? Does the skill handle all of them?
- Directory structure: How deep? How many top-level dirs? Does the skill's path logic match?
- Naming conventions: camelCase, kebab-case, PascalCase? Does the skill respect them?
- Framework patterns: Next.js App Router, Flask blueprints, monorepo? Does the skill understand the framework?
- Scale: How many files? Does the skill handle the volume efficiently (batch vs one-by-one)?
Identify blind spots — file types, directories, or patterns the skill doesn't account for but the project uses.
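A sketch of the scan, assuming a filesystem walk is acceptable and that a handful of vendored directories should be skipped (the skip list is an assumption; adjust it per project):

```python
from collections import Counter
from pathlib import Path

SKIP_DIRS = {".git", "node_modules", ".next", "dist", "build"}  # assumed ignore list

def project_shape(root: str = ".") -> None:
    """Print extension counts, file total, and maximum directory depth."""
    base = Path(root)
    extensions: Counter[str] = Counter()
    total = max_depth = 0
    for path in base.rglob("*"):
        if any(part in SKIP_DIRS for part in path.parts):
            continue
        if path.is_file():
            total += 1
            extensions[path.suffix or "<none>"] += 1
            max_depth = max(max_depth, len(path.relative_to(base).parts) - 1)
    print(f"{total} files, max depth {max_depth}")
    for ext, count in extensions.most_common(10):
        print(f"{ext:>10}  {count}")
```

Diffing the extension list against whatever the skill claims to handle surfaces blind spots directly.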
3. Edge Case Discovery
Look for things that would break or confuse the skill:
- Unusual files: Files with no extension, dotfiles, generated files, symlinks
- Encoding issues: Non-UTF-8 files, BOM markers, mixed line endings
- Directive conflicts: `'use client'`, shebangs (`#!/usr/bin/env`), pragma comments (`@ts-nocheck`)
- Framework magic: Auto-generated files (`.next/`, `__generated__/`), barrel exports (`index.ts`)
- Binary in source dirs: Images in `public/`, fonts in `assets/`, SQLite in `data/`
- Monorepo patterns: Multiple `package.json` files, nested `.gitignore` files, workspace configs
For each edge case found in the current project, note whether the skill handles it.
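A sketch of an edge-case sweep covering the checks above, assuming that reading the first few KB of each file is enough to detect BOMs, binary content, and CRLF line endings:

```python
from pathlib import Path

def find_edge_cases(root: str = ".", sample_bytes: int = 4096) -> list[tuple[str, str]]:
    """Return (reason, path) pairs for files that commonly confuse skills."""
    flagged: list[tuple[str, str]] = []
    for path in Path(root).rglob("*"):
        if ".git" in path.parts:
            continue
        if path.is_symlink():
            flagged.append(("symlink", str(path)))
            continue
        if not path.is_file():
            continue
        if path.name.startswith("."):
            flagged.append(("dotfile", str(path)))
        if not path.suffix:
            flagged.append(("no extension", str(path)))
        with path.open("rb") as fh:
            head = fh.read(sample_bytes)
        if head.startswith(b"\xef\xbb\xbf"):
            flagged.append(("UTF-8 BOM", str(path)))
        if b"\x00" in head:
            flagged.append(("binary content", str(path)))
        elif b"\r\n" in head:
            flagged.append(("CRLF line endings", str(path)))
    return flagged
```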
4. Output Quality Assessment
If the skill has been invoked before and produced artifacts (files, DB entries, reports), evaluate them:
- Accuracy: Are generated descriptions meaningful or generic ("Module for foo")?
- Consistency: Do the same patterns get the same treatment across the codebase?
- Completeness: Did it miss any files it should have caught?
- Formatting: Headers well-formed? DB entries normalized? Reports readable?
If no prior invocation exists, simulate a dry run by analyzing what the skill WOULD produce for 5-10 representative files.
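When no prior output exists, a stratified sample keeps the dry run representative instead of clustering on one file type. A sketch, assuming up to two files per extension and a cap of ten:

```python
import random
from collections import defaultdict
from pathlib import Path

def representative_sample(root: str = ".", per_type: int = 2, cap: int = 10) -> list[Path]:
    """Pick up to `cap` files spread across extensions for a dry-run evaluation."""
    by_ext: dict[str, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and ".git" not in path.parts:
            by_ext[path.suffix or "<none>"].append(path)
    sample: list[Path] = []
    # Most common extensions first, so the sample reflects the project's weight
    for _ext, files in sorted(by_ext.items(), key=lambda kv: -len(kv[1])):
        sample.extend(random.sample(files, min(per_type, len(files))))
        if len(sample) >= cap:
            break
    return sample[:cap]
```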
5. Composability Check
How well does this skill play with others?
- Data output: Does it produce machine-readable output other skills can consume?
- Idempotency: Safe to run twice? Does it detect prior runs?
- State management: Where does it store state? Is that documented?
- Cross-skill hooks: Could other skills trigger this one? Could this one feed into others?
- Token efficiency: Does the skill load too much context? Could supporting files be trimmed?
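One way to satisfy both the machine-readable-output and idempotency checks at once is a JSON state file that records each run. A sketch, where the `.skill-state/` location and payload shape are assumptions, not a Factory convention:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

STATE_FILE = Path(".skill-state/ghost-catalog.json")  # hypothetical state location

def record_run(findings: list[dict]) -> None:
    """Write results as JSON other skills can consume; keep evidence of prior runs."""
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    previous = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else None
    payload = {
        "run_at": datetime.now(timezone.utc).isoformat(),
        "previous_run": previous["run_at"] if previous else None,  # rerun detection
        "findings": findings,
    }
    STATE_FILE.write_text(json.dumps(payload, indent=2))
```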
6. User Experience Review
From the user's perspective:
- Invocation friction: Is `/skill-name` enough, or does the user need to remember flags?
- Feedback quality: Does the skill report what it did clearly?
- Progressive disclosure: Can a user start simple and go deeper?
- Recovery: If the skill makes a mistake, how hard is it to undo?
- Documentation: Could a new user understand the skill from SKILL.md alone?
Output Format
After running all 6 analyses, produce a structured improvement report:
SKILL IMPROVEMENT REPORT
========================
Skill: [name]
Project: [current project]
Date: [today]
Agent: [agent_id]
SCORECARD
---------
1. Definition Clarity: [STRONG/ADEQUATE/WEAK/MISSING]
2. Codebase Fit: [STRONG/ADEQUATE/WEAK/MISSING]
3. Edge Case Coverage: [STRONG/ADEQUATE/WEAK/MISSING]
4. Output Quality: [STRONG/ADEQUATE/WEAK/MISSING]
5. Composability: [STRONG/ADEQUATE/WEAK/MISSING]
6. User Experience: [STRONG/ADEQUATE/WEAK/MISSING]
Overall: [X/6 STRONG]
TOP IMPROVEMENTS (prioritized by impact)
-----------------------------------------
1. [HIGH] Title
Problem: ...
Fix: ...
Effort: [trivial / small / medium / large]
2. [HIGH] Title
Problem: ...
Fix: ...
Effort: ...
3. [MEDIUM] Title
...
EDGE CASES FOUND IN THIS PROJECT
----------------------------------
- [handled/unhandled] Description
SUGGESTED SKILL.MD PATCHES
----------------------------
[Specific text additions/changes to the skill definition]
After the Report
Ask the user which improvements to implement. Then:
- Edit the skill's `SKILL.md` with the approved changes
- Add or update supporting files if needed
- Re-run the skill on a sample of files to verify the improvement
- Update the improvement report with results
Auto-Invocation Guidance
This skill should be considered when:
- A user says "that didn't work right" or "it missed something" after a skill runs
- A skill produces generic or inaccurate output
- The user asks "how can we make this better"
- The droid observes repeated manual corrections after a skill invocation
- A skill is being used in a project type it wasn't originally designed for
Principles
- Be specific, not vague: "Add `.vue` to the CMP category on line 47 of `SKILL.md`", not "consider more file types"
- Prioritize by impact: What change helps the most files / saves the most tokens / prevents the most errors?
- Show evidence: Every finding should reference a specific file, line, or pattern in the project
- Respect the user's design: Improve the skill's execution of its stated purpose, don't redesign it
- Measure before and after: If possible, show what changes quantitatively (files caught, accuracy %, etc.)