BGSkillz
Build high-quality, portable Claude skills that trigger reliably and deliver real value.
Core Philosophy
- Skills are prompts — SKILL.md is a prompt document. Everything in it shapes Claude's behavior when the skill activates.
- Explain the why, not just the what — LLMs are smart. They respond better to understood rationale than rigid rules. Instead of "ALWAYS use 4-space indentation," explain why consistent indentation matters. If you find yourself writing MUST or NEVER in all caps, that's a yellow flag — reframe as reasoning.
- Progressive disclosure — Keep SKILL.md lean (<500 lines). Put deep reference material in
references/, agent instructions inagents/, and link to them. Claude reads these when referenced explicitly. - Composability — Skills should do one thing well. Combine multiple skills for complex workflows rather than building monoliths.
- Portability — Skills work across Claude.ai, Claude Code, and the API. Write for all surfaces unless you have a reason not to.
- Specificity wins — Vague skills don't trigger. Specific skills with clear use cases and trigger phrases activate reliably. Make descriptions slightly "pushy" — Claude tends to undertrigger rather than overtrigger.
- Generalize, don't overfit — A skill that works only for your test examples is useless. It will be invoked by diverse users with diverse needs. When iterating, resist fiddly overfitty changes. Instead, try different metaphors and explain reasoning. Lean toward fewer, higher-impact improvements.
Skill Anatomy
my-skill/
├── SKILL.md # Required. Main entry point. Contains frontmatter + instructions.
├── agents/ # Optional. Sub-agent instruction files for multi-agent workflows.
├── scripts/ # Optional. Executable helpers (Python, Bash, etc.)
├── references/ # Optional. Deep reference docs linked from SKILL.md.
└── assets/ # Optional. Templates, configs, examples bundled with the skill.
SKILL.md is the only required file. It must be exactly SKILL.md (not skill.md, not README.md).
Frontmatter fields:
name(required): kebab-case, no spaces, no capitals, no "claude" or "anthropic"description(required): Under 1024 chars. Determines when the skill triggers.license,metadata,compatibility: Optional but recommended for distribution.
Creation Workflow
Follow these 7 steps to build a skill from scratch.
Step 1: Define 2-3 Concrete Use Cases
Before writing anything, articulate exactly who will use this skill and for what. Pick a category:
- Document/Asset Creation — Generate reports, code, configs, designs
- Workflow Automation — Multi-step processes with tools and decisions
- MCP Enhancement — Add intelligence on top of MCP server capabilities
Write 2-3 specific use case sentences: "A developer wants to... so they can..."
Step 2: Define Success Criteria
Set measurable goals before building:
- Quantitative: Triggers correctly >90% of the time, reduces task time by X%, output matches template Y% of the time
- Qualitative: Users find output useful without heavy editing, skill integrates naturally into existing workflow
Step 3: Choose Your Approach
- Problem-first: You have a pain point. Design the skill around solving it.
- Tool-first: You have an MCP server or API. Design the skill to make it more useful.
Problem-first skills tend to have better descriptions because the pain point is the trigger.
Step 4: Plan Reusable Contents
Decide what goes into each directory:
- agents/: Sub-agent instruction files loaded only when spawning specialized agents (graders, comparators, analyzers). These keep SKILL.md lean while enabling multi-agent workflows.
- scripts/: Anything Claude should execute (scaffolders, validators, eval runners, build tools)
- references/: Deep knowledge Claude should read when needed (style guides, API docs, patterns, schemas)
- assets/: Templates, example files, configs that get copied into user projects
Rule of thumb: If it's >50 lines and not needed on every invocation, it belongs in references/. If it's instructions for a sub-agent, it belongs in agents/.
Step 5: Initialize the Skill
Run the scaffolder to create your skill directory:
python ~/.claude/skills/bgskillz/scripts/init_skill.py my-skill-name --path ~/target/directory
This creates a well-structured starting point with TODO prompts to guide you.
Step 6: Write the Skill
This is where quality is made or lost. Follow these rules:
Description (most critical field):
Use the formula: [What it does] + [When to use it] + [Key capabilities]
Good: "Generate production-ready database migrations from natural language descriptions. Use when adding tables, columns, indexes, or modifying schema. Handles rollbacks, data preservation, and index optimization."
Bad: "Helps with database stuff."
See references/description-crafting.md for 15+ examples and anti-patterns.
Naming rules:
- kebab-case only:
my-cool-skillnotMyCoolSkill - No spaces or capital letters
- Never include "claude" or "anthropic" in the name
- Skill folder name must match the
name:field in frontmatter
Writing instructions:
- Use imperative voice: "Generate a report" not "You should generate a report"
- Be specific and actionable: "Use 4-space indentation" not "Format code nicely"
- Explain reasoning over rigid rules: "Use early returns because deeply nested code is harder to debug" is more effective than "ALWAYS use early returns"
- Front-load critical instructions — Claude may skim long documents
- Include examples of good output when possible — Claude mimics examples more reliably than it follows abstract rules
- Use markdown headings (##, ###) to organize sections — NOT XML tags
- Provide a "degrees of freedom" principle: tell Claude what it CAN vary, not just constraints
- Set defaults with escape hatches: "Use TypeScript by default. If the user specifies another language, use that instead."
- Calibrate your tone to your audience — users range from non-technical to expert developers. Use context cues from the user's message to adapt jargon level.
Error handling:
- Tell Claude what to do when things go wrong
- Include fallback behaviors for missing tools or failed API calls
- Specify how to communicate errors to the user
Security rules:
- Never instruct Claude to bypass safety measures
- Don't hardcode credentials or API keys
- Don't reference external URLs that could change or be compromised
- Scripts should validate inputs before executing
Step 7: Package and Distribute
Run the packager to validate and create a distributable zip:
python ~/.claude/skills/bgskillz/scripts/package_skill.py /path/to/my-skill
This runs full validation, then creates a zip ready for upload or sharing. See references/distribution-guide.md for hosting and positioning guidance.
Critical Rules
These are hard requirements. Violating them causes failures.
- File must be named
SKILL.md— Exact casing. Notskill.md, notSkill.md. - No
README.mdinside the skill folder — It confuses the system. README goes in your GitHub repo root, outside the skill folder. - Name must be kebab-case —
my-skillnotmy_skillorMySkillormy skill - No XML tags in frontmatter — No
<or>characters anywhere in the YAML block. - Name must match folder — If folder is
my-skill/, frontmatter name must bemy-skill. - No "claude" or "anthropic" in name — Reserved terms.
- Description under 1024 characters — Hard limit.
- SKILL.md under 5000 words — Beyond this, Claude's attention degrades. Use references for depth.
- One level of nesting — One level deep is fine. Nested subdirectories like bar/baz/ inside references are not.
- Forward slashes only — Even on Windows. No backslash paths.
Best Practices
Be specific and actionable. Every instruction should pass the "what would Claude actually do?" test. "Write good code" fails. "Use early returns to reduce nesting. Limit functions to 20 lines. Name variables descriptively." passes.
Progressive disclosure. Put the 20% of instructions that cover 80% of use cases in SKILL.md. Put the remaining detail in references. Link clearly: "For advanced configuration patterns, see references/workflow-patterns.md."
Reference bundled resources clearly. When pointing to a reference file, use the exact relative path. Claude will read the file when you reference it this way.
Include error handling. Tell Claude what to do when: the user's request is ambiguous, a required tool is missing, an API call fails, the output doesn't match expectations.
Consistent terminology. Pick one term and stick with it. Don't alternate between "skill", "plugin", and "extension" in the same document.
Default + escape hatch. "Generate TypeScript by default. If the user requests JavaScript or another language, adapt accordingly." This gives Claude a clear default while preserving flexibility.
Show, don't just tell. Include 1-2 examples of ideal output in your SKILL.md. Claude mimics examples more reliably than it follows abstract rules.
Look for repeated work. If you run tests and notice Claude independently writes similar boilerplate or setup code each time, bundle that code into the skill as a script or template. Don't make Claude reinvent the wheel on every invocation.
Keep the prompt lean. After each iteration, review the full SKILL.md and remove instructions that aren't pulling their weight. Read transcripts to identify instructions that Claude ignores or that cause unproductive behavior. A shorter, focused skill outperforms a comprehensive but bloated one.
Testing and Iteration
Test your skill in three ways, from manual to fully automated:
-
Trigger testing — Does it activate when it should? Does it stay quiet when it shouldn't?
- Test with exact phrases: "Create a new skill"
- Test with paraphrases: "I want to build a plugin for Claude"
- Test with non-triggers: "Write a Python script" (should NOT trigger a skill-building skill)
-
Functional testing — When triggered, does it produce correct output?
- Test the happy path with a straightforward request
- Test edge cases (empty input, unusual formats, missing context)
- Test with different Claude models if possible (Haiku may need more explicit instructions)
-
Baseline comparison — Is the skill actually better than Claude without it?
- Run the same task with and without the skill
- The skill should produce noticeably better results
Automated Evaluation Pipeline
For rigorous testing, use the automated eval pipeline:
# Run evaluation with baseline comparison
python ~/.claude/skills/bgskillz/scripts/run_eval.py /path/to/skill --prompts tests/prompts.json
# Run automated improvement loop (eval -> grade -> analyze -> improve -> repeat)
python ~/.claude/skills/bgskillz/scripts/run_loop.py /path/to/skill --prompts tests/prompts.json --iterations 3 --auto-apply
# Optimize description triggering
python ~/.claude/skills/bgskillz/scripts/improve_description.py /path/to/skill
# Generate a self-contained HTML review page
python ~/.claude/skills/bgskillz/eval-viewer/generate_review.py /path/to/workspace/iteration-1/evals.json
The eval pipeline runs each test prompt through Claude with and without the skill, computing benchmark statistics (mean, stddev, min, max) and saving outputs for grading. Use the sub-agents in agents/ to grade outputs, blind-compare them, and analyze patterns:
agents/grader.md— Grades outputs against assertions with evidence and meta-evaluationagents/comparator.md— Blind A/B comparison (doesn't know which output is skill vs. baseline)agents/analyzer.md— Unblinded pattern analysis with prioritized improvement suggestions
The run_loop.py script automates the full cycle: eval → grade → analyze → apply suggestions → re-eval. Use --auto-apply to let it modify SKILL.md between iterations (backups are saved).
Review results visually with eval-viewer/viewer.html, or generate a self-contained review page with eval-viewer/generate_review.py. See references/schemas.md for all data formats.
Iteration Signals
Undertriggering: Users have to explicitly invoke the skill; paraphrased requests don't activate it. Fix: Add more trigger phrases to the description. Be more specific about use cases.
Overtriggering: Skill activates on unrelated tasks. Fix: Add negative triggers. Narrow the description scope. Use more specific terminology.
Anti-overfitting warning: When iterating, read the actual transcripts. Look for unproductive behavior the skill causes and look for repeated work across test runs (if all runs independently write similar scripts, bundle that script into the skill). Resist adding fiddly constraints — generalize from the feedback instead.
For comprehensive testing methodology, read references/testing-methodology.md.
Troubleshooting
Skill won't upload: Check that the file is named exactly SKILL.md. Verify frontmatter is valid YAML with name and description. Ensure the name is kebab-case with no spaces.
Skill doesn't trigger: Your description likely doesn't match how users phrase requests. Add more specific trigger phrases. Include the exact verbs and nouns users would say.
Instructions not followed: SKILL.md may be too long or instructions are buried. Front-load critical rules. Use bold for must-follow constraints. Reduce total word count.
For all troubleshooting scenarios, read references/troubleshooting.md.
Audit Checklist
Quick pre-flight check before publishing:
-
SKILL.mdexists with exact casing - Frontmatter has
name(kebab-case) anddescription(under 1024 chars) - Name matches folder name
- Description follows
[What] + [When] + [Capabilities]formula - No README.md inside the skill folder
- SKILL.md is under 500 lines / 5000 words
- All referenced files actually exist
- Scripts are executable and handle errors
- Tested with 3+ trigger phrases and 2+ non-trigger phrases
- Baseline comparison shows improvement over vanilla Claude
For the full audit rubric with scoring, read references/quality-checklist.md.
What To Do
Choose what you need help with:
Create a New Skill
"I want to create a new skill" — Walk through the 7-step creation workflow. Start by defining use cases and end with a packaged, validated skill.
Audit an Existing Skill
"Audit my skill" or "Review this skill" — Run the full quality checklist against an existing skill. Identify issues and suggest fixes.
Improve a Description
"Help me write a better description" — Apply the description formula and test trigger phrases. Rewrite for maximum activation reliability.
Add a Component
"Add a script/reference/asset to my skill" — Help plan and implement a new component (validator, reference doc, template, etc.) for an existing skill.
Validate a Skill
"Validate my skill" — Run the validation script to check structural correctness:
python ~/.claude/skills/bgskillz/scripts/validate_skill.py /path/to/skill
Package for Distribution
"Package my skill" — Run validation + create a distributable zip:
python ~/.claude/skills/bgskillz/scripts/package_skill.py /path/to/skill
Evaluate a Skill
"Evaluate my skill" or "Run evals" — Run the automated evaluation pipeline with baseline comparison:
python ~/.claude/skills/bgskillz/scripts/run_eval.py /path/to/skill --prompts tests/prompts.json
Then grade and analyze results using the agents in agents/. Review visually with eval-viewer/viewer.html or generate a self-contained review page with eval-viewer/generate_review.py.
Run Improvement Loop
"Iterate on my skill" or "Auto-improve my skill" — Run the full automated cycle (eval → grade → analyze → improve → re-eval):
python ~/.claude/skills/bgskillz/scripts/run_loop.py /path/to/skill --prompts tests/prompts.json --iterations 3 --auto-apply
Optimize Triggering
"Improve my skill's triggering" — Run the description optimization pipeline:
python ~/.claude/skills/bgskillz/scripts/improve_description.py /path/to/skill
Get Guidance
"How do I..." — Answer questions about skill building using the reference library. Topics: descriptions, workflows, testing, evaluation, troubleshooting, distribution, quality.