Skill Evolver
Analyze skill execution traces to discover issues, identify improvement opportunities, and apply fixes to skill files.
Trace Format
Traces are JSON with this structure (success is a boolean, shown here as true):
{
  "id": "uuid",
  "request": "user's original request",
  "skills_used": ["skill-name"],
  "success": true,
  "total_turns": 2,
  "total_input_tokens": 5000,
  "total_output_tokens": 200,
  "duration_ms": 7000,
  "steps": [
    {"role": "assistant", "content": "...", "tool_name": null},
    {"role": "tool", "tool_name": "...", "tool_input": {}, "tool_result": "..."}
  ],
  "llm_calls": [
    {"turn": 1, "stop_reason": "tool_use", "input_tokens": 2500, "output_tokens": 50}
  ]
}
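As a rough illustration of consuming this format, the sketch below checks that a trace carries the top-level fields shown above with the expected JSON types. The `validate_trace` helper, and the choice to treat every field as required, are assumptions made for this example and are not part of the skill's scripts.

```python
import json

# Top-level fields from the trace structure and the Python types they decode to.
# Treating all of them as required is an assumption of this sketch.
EXPECTED_FIELDS = {
    "id": str,
    "request": str,
    "skills_used": list,
    "success": bool,
    "total_turns": int,
    "total_input_tokens": int,
    "total_output_tokens": int,
    "duration_ms": int,
    "steps": list,
    "llm_calls": list,
}


def validate_trace(trace):
    """Return a list of problems found in one trace; an empty list means none."""
    problems = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in trace:
            problems.append(f"missing field: {name}")
        elif not isinstance(trace[name], expected):
            actual = type(trace[name]).__name__
            problems.append(f"unexpected type for {name}: {actual}")
    return problems


sample = json.loads("""
{
  "id": "uuid",
  "request": "user's original request",
  "skills_used": ["skill-name"],
  "success": true,
  "total_turns": 2,
  "total_input_tokens": 5000,
  "total_output_tokens": 200,
  "duration_ms": 7000,
  "steps": [
    {"role": "assistant", "content": "...", "tool_name": null},
    {"role": "tool", "tool_name": "...", "tool_input": {}, "tool_result": "..."}
  ],
  "llm_calls": [
    {"turn": 1, "stop_reason": "tool_use", "input_tokens": 2500, "output_tokens": 50}
  ]
}
""")

# An empty list means the sample matched every expected field.
print(validate_trace(sample))
```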
Workflow
This skill can receive two types of input (at least one required):
- Traces: execution trace data from real skill runs, which supports data-driven problem discovery
- Feedback: user-written improvement suggestions, which provide directed guidance for changes
When both are provided, combine insights: use traces to validate/discover issues and feedback to prioritize and guide fixes.
Step 1: Analyze Inputs
If traces are provided, run the analysis script:
scripts/analyze_traces.py <traces.json> [--skill <name>] [--format json|text]
Output includes:
- Success rate
- Average turns, duration, tokens
- Common issues and warnings
- Recommendations
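For orientation, the headline numbers in that output could be derived roughly as follows. This is a minimal sketch and not the code in scripts/analyze_traces.py; the `summarize` function is hypothetical, and counting input plus output tokens together for the token average is an assumption.

```python
from statistics import mean


def summarize(traces):
    """Compute headline metrics over a non-empty list of trace dicts."""
    if not traces:
        raise ValueError("no traces to summarize")
    return {
        "success_rate": sum(1 for t in traces if t["success"]) / len(traces),
        "avg_turns": mean([t["total_turns"] for t in traces]),
        "avg_duration_ms": mean([t["duration_ms"] for t in traces]),
        # Assumption: "tokens" means input and output tokens added together.
        "avg_tokens": mean(
            [t["total_input_tokens"] + t["total_output_tokens"] for t in traces]
        ),
    }


# Two made-up traces, trimmed to the fields this sketch reads.
traces = [
    {"success": True, "total_turns": 2, "total_input_tokens": 5000,
     "total_output_tokens": 200, "duration_ms": 7000},
    {"success": False, "total_turns": 6, "total_input_tokens": 9000,
     "total_output_tokens": 800, "duration_ms": 13000},
]

summary = summarize(traces)
print(summary)
```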
If feedback is provided, identify the user's improvement goals and map them to actionable changes.
If both are provided, cross-reference: does the feedback align with trace-discovered issues? Use feedback to prioritize which trace-identified problems to fix first.
Step 2: Extract Issue Details
For failed or problematic traces, extract full context:
scripts/extract_issue_context.py <traces.json> --failed
scripts/extract_issue_context.py <traces.json> --trace-id <id> --show-llm
scripts/extract_issue_context.py <traces.json> --high-turns
Skip this step if only feedback was provided (no traces).
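To make the extraction concrete, here is a minimal sketch of the kind of filtering that the --failed and --high-turns options suggest. It is not the code in scripts/extract_issue_context.py; the helper names and the turn cutoff of 4 are assumptions chosen for illustration.

```python
def failed_traces(traces):
    """Traces whose success flag is false."""
    return [t for t in traces if not t["success"]]


def high_turn_traces(traces, max_turns=4):
    """Traces that took more turns than the cutoff (cutoff is an assumption)."""
    return [t for t in traces if t["total_turns"] > max_turns]


def tool_calls(trace):
    """(tool_name, tool_result) pairs for every tool step in a trace."""
    return [
        (step.get("tool_name"), step.get("tool_result"))
        for step in trace["steps"]
        if step["role"] == "tool"
    ]


# Made-up traces, trimmed to the fields this sketch reads.
traces = [
    {"id": "a", "success": True, "total_turns": 2, "steps": []},
    {"id": "b", "success": False, "total_turns": 7, "steps": [
        {"role": "assistant", "content": "running script", "tool_name": None},
        {"role": "tool", "tool_name": "bash", "tool_input": {},
         "tool_result": "FileNotFoundError"},
    ]},
]

for trace in failed_traces(traces):
    print(trace["id"], tool_calls(trace))
```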
Step 3: Identify Root Causes
Map issues to skill components using references/issue-patterns.md:
| Issue Type | Likely Fix Location |
| --- | --- |
| execution_failure | scripts/, error handling |
| high_turn_count | SKILL.md clarity, add examples |
| tool_errors | scripts/, input validation |
| high_token_usage | SKILL.md verbosity, progressive disclosure |
| repeated_tool_calls | SKILL.md decision trees |
For feedback-only input, map the user's suggestions directly to the appropriate skill components.
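The issue-to-location mapping above can be held as a simple lookup table. The sketch below is one possible encoding; the `fix_plan` helper and the fallback text for unknown issue types are hypothetical.

```python
# Issue types and fix locations copied from the mapping above.
FIX_LOCATIONS = {
    "execution_failure": ["scripts/", "error handling"],
    "high_turn_count": ["SKILL.md clarity", "add examples"],
    "tool_errors": ["scripts/", "input validation"],
    "high_token_usage": ["SKILL.md verbosity", "progressive disclosure"],
    "repeated_tool_calls": ["SKILL.md decision trees"],
}


def fix_plan(issue_types):
    """Map each observed issue type to its likely fix locations."""
    plan = {}
    for issue in issue_types:
        # Unknown issue types get a placeholder so they are not silently dropped.
        plan[issue] = FIX_LOCATIONS.get(issue, ["unmapped: review manually"])
    return plan


plan = fix_plan(["tool_errors", "high_turn_count", "something_new"])
for issue, locations in plan.items():
    print(f"{issue}: {', '.join(locations)}")
```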
Step 4: Apply Fixes
Read the target skill and apply changes based on analysis:
- For script errors: Fix scripts, add validation, improve error messages
- For efficiency issues: Add examples, decision trees, clearer instructions
- For token issues: Reduce SKILL.md, move content to references/
- For trigger issues: Update frontmatter description
- For feedback-guided changes: Apply the user's specific suggestions
Scope constraints — strictly follow:
- Only modify the target skill's existing files (SKILL.md, scripts/, references/)
- Do NOT create new reference files, templates, or guides
- Do NOT search the web for domain-specific content
- Do NOT generate CHANGELOG, improvement reports, or other extra deliverables
- The evolved skill files themselves are the sole deliverable
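One way to sanity-check a planned edit against the file-related constraints is sketched below. The `in_scope` helper is hypothetical, and it reads the constraints strictly: a path qualifies only if it already exists in the skill and is either SKILL.md or located under scripts/ or references/.

```python
from pathlib import PurePosixPath

# Directories named in the scope constraints.
ALLOWED_DIRS = ("scripts", "references")


def in_scope(relative_path, existing_files):
    """True if editing this path stays within the scope constraints.

    relative_path is relative to the skill root; existing_files is the set of
    relative paths already present in the skill (both are assumptions about
    how a caller would supply this information).
    """
    path = PurePosixPath(relative_path)
    if path.is_absolute() or ".." in path.parts:
        return False  # reaches outside the target skill
    if str(path) not in existing_files:
        return False  # would create a new file
    in_allowed_dir = len(path.parts) > 1 and path.parts[0] in ALLOWED_DIRS
    return str(path) == "SKILL.md" or in_allowed_dir


existing = {"SKILL.md", "scripts/run.py", "references/issue-patterns.md"}
for candidate in ["SKILL.md", "scripts/run.py", "references/new-guide.md", "CHANGELOG.md"]:
    print(candidate, in_scope(candidate, existing))
```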
Quick Reference
Issue Severity Levels
- high: Failures, max_tokens, tool errors → Fix immediately
- medium: High turns, high tokens, retries → Optimize
- low: Long duration → Consider optimization
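The severity levels lend themselves to a small ordering helper, sketched here. The issue labels are assumptions: `long_duration` is a hypothetical name, and `repeated_tool_calls` is treated as the "retries" case.

```python
# Severity per issue label, following the levels listed above.
SEVERITY = {
    "execution_failure": "high",
    "max_tokens": "high",
    "tool_errors": "high",
    "high_turn_count": "medium",
    "high_token_usage": "medium",
    "repeated_tool_calls": "medium",  # assumption: stands in for "retries"
    "long_duration": "low",           # hypothetical label
}

RANK = {"high": 0, "medium": 1, "low": 2}


def prioritize(issues):
    """Order issues from most to least severe; unknown labels sort last."""
    return sorted(issues, key=lambda issue: RANK[SEVERITY.get(issue, "low")])


ordered = prioritize(["long_duration", "high_turn_count", "tool_errors"])
print(ordered)
```

Because `sorted` is stable, issues that share a severity keep the order in which they were reported.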
Key Metrics Thresholds
| Metric | Warning | Action |
| --- | --- | --- |
| success_rate | <90% | Review failures |
| avg_turns | >4 | Simplify workflow |
| avg_tokens | >30000 | Reduce context |
| duration_ms | >60000 | Optimize scripts |
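A threshold check along these lines could look like the sketch below. It assumes the warnings fire when success_rate falls below 90% and when the other metrics exceed their listed values, and that success_rate is supplied as a fraction between 0 and 1; the `check_thresholds` helper is hypothetical.

```python
# (metric, predicate that signals a warning, suggested action)
THRESHOLDS = [
    ("success_rate", lambda value: value < 0.90, "Review failures"),
    ("avg_turns", lambda value: value > 4, "Simplify workflow"),
    ("avg_tokens", lambda value: value > 30000, "Reduce context"),
    ("duration_ms", lambda value: value > 60000, "Optimize scripts"),
]


def check_thresholds(metrics):
    """Return (metric, action) pairs for every threshold the metrics breach."""
    return [
        (name, action)
        for name, breached, action in THRESHOLDS
        if name in metrics and breached(metrics[name])
    ]


# Made-up metrics for illustration.
metrics = {
    "success_rate": 0.85,
    "avg_turns": 3,
    "avg_tokens": 42000,
    "duration_ms": 7000,
}

warnings = check_thresholds(metrics)
for name, action in warnings:
    print(f"{name}: {action}")
```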
Common Fixes
Low success rate:
- Add error handling in scripts
- Add input validation
- Clarify ambiguous instructions
High turn count:
- Add decision tree
- Provide more examples
- Use scripts for multi-step operations
High token usage:
- Keep SKILL.md under 500 lines
- Move details to references/
- Remove redundant examples
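The line budget for SKILL.md is easy to check mechanically. The sketch below works on the file's text; the function names are hypothetical, and treating exactly 500 lines as over budget follows from reading the limit as "fewer than 500".

```python
def line_count(text):
    """Number of lines in the given text."""
    return len(text.splitlines())


def over_budget(text, limit=500):
    """True when the text has `limit` lines or more (budget is fewer than limit)."""
    return line_count(text) >= limit


# Synthetic content standing in for a SKILL.md file.
short_doc = "line\n" * 499
long_doc = "line\n" * 500

print(line_count(short_doc), over_budget(short_doc))
print(line_count(long_doc), over_budget(long_doc))
```

In practice the text would come from reading the target skill's SKILL.md, for example with `pathlib.Path("SKILL.md").read_text()`.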