# Skill Improver

Process user feedback from skill retrospectives and update skill files to improve them over time.
## When to Use

- User asks to "review skill feedback" or "improve skills based on usage"
- You notice feedback files in `.claude/feedback/`
- User mentions a skill didn't work well or missed something
- Periodic review (e.g., monthly) to incorporate learnings
## How It Works

### Step 1: Gather Feedback

Read all feedback files in `.claude/feedback/`:

```bash
ls -la .claude/feedback/retro-*.md
```
Look for patterns (a quick frequency-count sketch follows this list):

- Multiple users reporting the same missing step → add it to the skill
- Benchmarks that don't match users' contexts → add context-specific ranges
- A confusing workflow → restructure it or add clarifications
- An incomplete skill → add the missing sections
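Spotting these patterns is ultimately a judgment call, but a rough frequency count can surface candidates. A minimal sketch, assuming each retro records its complaint on an `Improvements needed:` line (the format used in the example workflow below):

```bash
#!/bin/bash
# Collect every "Improvements needed:" line across retros and rank
# them by frequency. Assumes the retro format shown in the example
# workflow below.

grep -h "Improvements needed:" .claude/feedback/retro-*.md 2>/dev/null \
  | sort \
  | uniq -c \
  | sort -rn   # most-reported issues first
```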
### Step 2: Identify High-Impact Changes

Prioritize updates based on the tiers below; a keyword-triage sketch follows the lists.

**High priority (do first):**

- Missing critical steps that users had to figure out themselves
- Incorrect benchmarks or numbers
- Confusing workflow that requires clarification
- Safety issues or errors

**Medium priority:**

- Additional examples or templates
- Better explanations of existing steps
- Alternative approaches for different contexts

**Low priority:**

- Nice-to-have additions
- Stylistic improvements
- Minor clarifications
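A first-pass triage can be scripted. A minimal sketch, where the keyword lists are illustrative assumptions rather than a fixed taxonomy; tune them to the language your feedback actually uses:

```bash
#!/bin/bash
# Bucket retro files into the priority tiers above by keyword.
# The trigger keywords are assumptions; adjust as needed.

for file in .claude/feedback/retro-*.md; do
  [ -e "$file" ] || continue   # glob matched nothing; skip
  if grep -qiE "missing step|wrong|incorrect|unsafe|error" "$file"; then
    echo "HIGH:   $file"
  elif grep -qiE "example|template|explain|alternative" "$file"; then
    echo "MEDIUM: $file"
  else
    echo "LOW:    $file"
  fi
done
```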
### Step 3: Update Skill Files

For each skill needing updates:

#### 3a. Add a "Learnings" Section

If the skill doesn't have one, add it at the end (a small append script follows the template):

```markdown
## Learnings from Use

### [Date]: [Brief description of what was learned]
- Feedback: [What users reported]
- Update: [What we changed]
- Result: [Expected improvement]
```
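The boilerplate can be scripted. A minimal sketch that appends a dated skeleton entry; the skill path is hypothetical, and the bracketed placeholders are filled in by hand afterward:

```bash
#!/bin/bash
# Append a dated Learnings entry skeleton to a skill file.
# The path is a hypothetical example.

SKILL=".claude/skills/skill-name/SKILL.md"

cat >> "$SKILL" <<EOF

### $(date +%F): [Brief description of what was learned]
- Feedback: [What users reported]
- Update: [What we changed]
- Result: [Expected improvement]
EOF
```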
#### 3b. Update Main Content

If feedback suggests core changes:

- Add missing steps to checklists
- Update benchmarks with ranges (e.g., "20-30% for B2C, 50-70% for B2B")
- Restructure the workflow if it's confusing
- Add a "Common Pitfalls" section if users keep making the same mistake
#### 3c. Version the Change

At the top of the skill, track versions (a bump script follows):

```yaml
name: skill-name
version: 1.2.0
last_updated: 2026-01-22
changelog:
  - v1.2.0 (2026-01-22): Added missing step for X based on user feedback
  - v1.1.0 (2026-01-15): Updated benchmarks for Y context
  - v1.0.0 (2026-01-01): Initial release
```
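Bumping those fields can be scripted. A sketch using GNU sed (on macOS, use `sed -i ''`), assuming the frontmatter fields look exactly like the example above:

```bash
#!/bin/bash
# Bump the version and last_updated fields in a skill's frontmatter.
# Assumes GNU sed and the exact field names shown above.

SKILL=".claude/skills/skill-name/SKILL.md"   # hypothetical path
NEW_VERSION="1.2.0"

sed -i \
  -e "s/^version: .*/version: $NEW_VERSION/" \
  -e "s/^last_updated: .*/last_updated: $(date +%F)/" \
  "$SKILL"
```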
### Step 4: Archive Processed Feedback

Move processed feedback to an archive:

```bash
mkdir -p .claude/feedback/archive
mv .claude/feedback/retro-2026-01-22-*.md .claude/feedback/archive/
```

Keep a summary of learnings in `.claude/feedback/SUMMARY.md`:
```markdown
# Feedback Summary

## [Skill Name]

Total feedback sessions: 12
Last updated: 2026-01-22

Key learnings:
- Added step for X (reported by 3 users)
- Updated benchmarks for B2B context (reported by 5 users)
- Clarified workflow around Y (reported by 2 users)

Patterns:
- Users in enterprise contexts need higher benchmarks
- Early-stage startups need more examples
- Non-technical users need clearer explanations of jargon
```
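The session count doesn't need to be maintained by hand. A one-liner sketch, assuming archived retros keep the `retro-*.md` naming convention:

```bash
# Count archived feedback sessions for the SUMMARY.md total.
ls .claude/feedback/archive/retro-*.md 2>/dev/null | wc -l
```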
## Example Workflow

Scenario: the product-market-fit skill needs improvement.

### Step 1: Review Feedback

Read `.claude/feedback/retro-2026-01-22-143022.md`:

```markdown
## Feedback

Missed important steps? Yes.

Improvements needed: The Sean Ellis test threshold of 40% seems high for B2B
enterprise products. We're at 32% "very disappointed", but our retention is
85% D30, which is excellent. Should the skill mention that thresholds vary
by product type?
```
### Step 2: Identify the Pattern

Check the other feedback files → 3 more users report that B2B contexts need different benchmarks.
### Step 3: Update the Skill

Edit `.claude/skills/product-market-fit/SKILL.md`.

Before:

```markdown
## Sean Ellis Test (40% Rule)

"How would you feel if you could no longer use [product]?"

- ≥40% "Very disappointed" = Strong PMF
```

After:

```markdown
## Sean Ellis Test (Context-Dependent Thresholds)

"How would you feel if you could no longer use [product]?"

Thresholds by product type:
- Consumer B2C: ≥40% "Very disappointed" = Strong PMF
- SMB B2B: ≥35% "Very disappointed" = Strong PMF
- Enterprise B2B: ≥30% "Very disappointed" = Strong PMF (longer sales
  cycles, different buying psychology)

Why the difference?
- Enterprise buying decisions are more rational than emotional
- Switching costs are higher (contracts, integrations)
- Retention is a better PMF signal for B2B (see Step 2)
```
Add to the Learnings section:

```markdown
## Learnings from Use

### 2026-01-22: Refined Sean Ellis thresholds by product type
- Feedback: 4 users reported the 40% threshold was too high for B2B enterprise
- Update: Added context-specific thresholds (B2C 40%, SMB 35%, Enterprise 30%)
- Result: More accurate PMF diagnosis for different product types
```
### Step 4: Archive & Track

```bash
mv .claude/feedback/retro-2026-01-22-*.md .claude/feedback/archive/
```

Update `.claude/feedback/SUMMARY.md`:

```markdown
## product-market-fit

Total feedback sessions: 4
Last updated: 2026-01-22

Key learnings:
- Added context-specific Sean Ellis thresholds (reported by 4 users)
- B2B needs different benchmarks than B2C

Next improvements to consider:
- Add industry-specific retention benchmarks
- Include examples from different verticals
```
## Quality Checklist

Before updating any skill, ensure (a scriptable spot-check follows the list):

- Feedback is from multiple users (a pattern, not an outlier)
- Change makes the skill more accurate, not just more complex
- Benchmarks are sourced or validated (not anecdotal)
- Update is backward compatible (doesn't break existing workflows)
- Learnings section documents why the change was made
- Version number is incremented appropriately (semver)
- Processed feedback is archived, not deleted
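Only some of these items are mechanically checkable, but those can be wrapped in a quick pre-update script. A sketch, intended as a starting point rather than a complete gate:

```bash
#!/bin/bash
# Sanity-check a skill update: version field present, Learnings
# section present, feedback archive kept. Usage: ./check.sh SKILL.md

SKILL="$1"   # path to the SKILL.md being updated

grep -q "^version:" "$SKILL" || echo "WARN: no version field in $SKILL"
grep -q "Learnings from Use" "$SKILL" || echo "WARN: no Learnings section in $SKILL"
[ -d .claude/feedback/archive ] || echo "WARN: feedback archive directory missing"
```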
## Feedback Categories

Track feedback by type to identify systemic issues:

### Category 1: Missing Steps

- Example: "Skill forgot to mention we need to segment cohorts by acquisition channel"
- Action: Add the step to the checklist

### Category 2: Incorrect Benchmarks

- Example: "40% D30 retention is not 'strong' for our B2B SaaS; it's average"
- Action: Update benchmarks with context (B2C vs. B2B vs. Enterprise)

### Category 3: Confusing Workflow

- Example: "I didn't know whether to do cohort analysis before or after the Sean Ellis test"
- Action: Number the steps clearly and add a workflow diagram

### Category 4: Missing Context

- Example: "Skill assumes I have 1000+ users; what if I only have 50?"
- Action: Add an "Early Stage Adaptation" section

### Category 5: Tool-Specific Issues

- Example: "How do I calculate D30 retention in Google Analytics?"
- Action: Add an "Implementation in Common Tools" section
## Best Practices

Do:

- ✅ Look for patterns across multiple feedback sessions
- ✅ Update skills incrementally (small, tested changes)
- ✅ Document why changes were made (Learnings section)
- ✅ Preserve feedback history (archive, don't delete)
- ✅ Version skills so users know what changed

Don't:

- ❌ Update based on a single piece of feedback (it might be an outlier)
- ❌ Make skills overly complex by trying to cover every edge case
- ❌ Remove content without understanding why it was there
- ❌ Ignore feedback for more than 30 days (patterns emerge over time)
- ❌ Update without testing the new version
## Automation Ideas

### Weekly Digest (optional)

Create a script to summarize new feedback:

```bash
#!/bin/bash
# .claude/hooks/learning/weekly-feedback-digest.sh

echo "📊 Feedback Digest (Last 7 Days)"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"

# List retro files modified in the last 7 days and pull the key fields.
find .claude/feedback -name "retro-*.md" -mtime -7 | while read -r file; do
  echo ""
  echo "File: $(basename "$file")"
  grep -A 5 "Skills Used" "$file"
  grep -A 3 "Improvements needed:" "$file"
done
```
### Auto-Tag for Review

When feedback mentions specific issues, auto-tag it (a sketch follows the list):

- "missing step" → tag for immediate review
- "wrong number" → tag for fact-check
- "confusing" → tag for clarity rewrite
## Success Metrics

Track improvement over time (a per-month counting sketch follows the list):

- Feedback frequency: decreasing = skills are getting better
- Repeated issues: should approach zero over time
- User satisfaction: track "Did this skill help?" responses
- Skill usage: updated skills should see increased usage
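Feedback frequency is easy to chart from the date-stamped filenames. A sketch that counts archived retros per month, relying on the `retro-YYYY-MM-DD-*` naming convention used throughout this skill:

```bash
#!/bin/bash
# Count archived retros per month from their filenames.

ls .claude/feedback/archive/retro-*.md 2>/dev/null \
  | sed -E 's/.*retro-([0-9]{4}-[0-9]{2}).*/\1/' \
  | sort | uniq -c    # sessions per month; a downward trend is good
```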
## Meta: This Skill Improves Itself

This skill should follow its own advice.

Learnings from Use:

[To be filled in as this skill gets used and improved]

Version History:

- v1.0.0 (2026-01-22): Initial release; framework for skill improvement

Next Steps:

1. Review feedback in `.claude/feedback/`
2. Identify patterns and prioritize updates
3. Update skill files with improvements
4. Document learnings
5. Archive processed feedback
6. Commit changes with a clear message (see the sketch below)
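For that final step, a commit sketch; the message format is a suggestion, not a convention this repo enforces:

```bash
# Stage the updated skill and the archived feedback together, so the
# change and its motivation land in one commit.
git add .claude/skills/ .claude/feedback/
git commit -m "skills: product-market-fit v1.2.0 - context-specific Sean Ellis thresholds"
```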
The more you use this system, the better your skills become. It's a continuous improvement loop.