skill-iter-tune

Skill Iter Tune

Iterative skill refinement through execute-evaluate-improve feedback loops. Each iteration runs the skill via Claude, evaluates output via Gemini, and applies improvements via Agent.

Architecture Overview

┌──────────────────────────────────────────────────────────────────────────┐ │ Skill Iter Tune Orchestrator (SKILL.md) │ │ → Parse input → Setup workspace → Iteration Loop → Final Report │ └────────────────────────────┬─────────────────────────────────────────────┘ │ ┌───────────────────┼───────────────────────────────────┐ ↓ ↓ ↓ ┌──────────┐ ┌─────────────────────────────┐ ┌──────────┐ │ Phase 1 │ │ Iteration Loop (2→3→4) │ │ Phase 5 │ │ Setup │ │ ┌─────┐ ┌─────┐ ┌─────┐ │ │ Report │ │ │─────→│ │ P2 │→ │ P3 │→ │ P4 │ │────→│ │ │ Backup + │ │ │Exec │ │Eval │ │Impr │ │ │ History │ │ Init │ │ └─────┘ └─────┘ └─────┘ │ │ Summary │ └──────────┘ │ ↑ │ │ └──────────┘ │ └───────────────┘ │ │ (if score < threshold │ │ AND iter < max) │ └─────────────────────────────┘

Chain Mode Extension

Chain Mode (execution_mode === "chain"):

Phase 2 runs per-skill in chain_order: Skill A → ccw cli → artifacts/skill-A/ ↓ (artifacts as input) Skill B → ccw cli → artifacts/skill-B/ ↓ (artifacts as input) Skill C → ccw cli → artifacts/skill-C/

Phase 3 evaluates entire chain output + per-skill scores Phase 4 improves weakest skill(s) in chain

Key Design Principles

Iteration Loop: Phases 2-3-4 repeat until quality threshold, max iterations, or convergence
Two-Tool Pipeline: Claude (write/execute) + Gemini (analyze/evaluate) = complementary perspectives
Pure Orchestrator: SKILL.md coordinates only — execution detail lives in phase files
Progressive Phase Loading: Phase docs read only when that phase executes
Skill Versioning: Each iteration snapshots skill state before execution
Convergence Detection: Stop early if score stalls (no improvement in 2 consecutive iterations)

Interactive Preference Collection

// ★ Auto mode detection const autoYes = /\b(-y|--yes)\b/.test($ARGUMENTS)

if (autoYes) { workflowPreferences = { autoYes: true, maxIterations: 5, qualityThreshold: 80, executionMode: 'single' } } else { const prefResponse = AskUserQuestion({ questions: [ { question: "选择迭代调优配置：", header: "Tune Config", multiSelect: false, options: [ { label: "Quick (3 iter, 70)", description: "快速迭代，适合小幅改进" }, { label: "Standard (5 iter, 80) (Recommended)", description: "平衡方案，适合多数场景" }, { label: "Thorough (8 iter, 90)", description: "深度优化，适合生产级 skill" } ] } ] })

const configMap = { "Quick": { maxIterations: 3, qualityThreshold: 70 }, "Standard": { maxIterations: 5, qualityThreshold: 80 }, "Thorough": { maxIterations: 8, qualityThreshold: 90 } } const selected = Object.keys(configMap).find(k => prefResponse["Tune Config"].startsWith(k) ) || "Standard" workflowPreferences = { autoYes: false, ...configMap[selected] }

// ★ Mode selection: chain vs single const modeResponse = AskUserQuestion({ questions: [{ question: "选择调优模式：", header: "Tune Mode", multiSelect: false, options: [ { label: "Single Skill (Recommended)", description: "独立调优每个 skill，适合单一 skill 优化" }, { label: "Skill Chain", description: "按链序执行，前一个 skill 的产出作为后一个的输入" } ] }] }); workflowPreferences.executionMode = modeResponse["Tune Mode"].startsWith("Skill Chain") ? "chain" : "single"; }

Input Processing

$ARGUMENTS → Parse: ├─ Skill path(s): first arg, comma-separated for multiple │ e.g., ".claude/skills/my-skill" or "my-skill" (auto-prefixed) │ Chain mode: order preserved as chain_order ├─ Test scenario: --scenario "description" or remaining text └─ Flags: --max-iterations=N, --threshold=N, -y/--yes

Execution Flow

⚠️ COMPACT DIRECTIVE: Context compression MUST check TodoWrite phase status. The phase currently marked in_progress is the active execution phase — preserve its FULL content. Only compress phases marked completed or pending .

Phase 1: Setup (one-time)

Read and execute: Ref: phases/01-setup.md

Parse skill paths, validate existence
Create workspace at .workflow/.scratchpad/skill-iter-tune-{ts}/
Backup original skill files
Initialize iteration-state.json

Output: workDir , targetSkills[] , testScenario , initialized state

Iteration Loop

// Orchestrator iteration loop while (true) { // Increment iteration state.current_iteration++; state.iterations.push({ round: state.current_iteration, status: 'pending', execution: null, evaluation: null, improvement: null });

// Update TodoWrite TaskUpdate(iterationTask, { subject: Iteration ${state.current_iteration}/${state.max_iterations}, status: 'in_progress', activeForm: Running iteration ${state.current_iteration} });

// === Phase 2: Execute === // Read: phases/02-execute.md // Single mode: one ccw cli call for all skills // Chain mode: sequential ccw cli per skill in chain_order, passing artifacts // Snapshot skill → construct prompt → ccw cli --tool claude --mode write // Collect artifacts

// === Phase 3: Evaluate === // Read: phases/03-evaluate.md // Construct eval prompt → ccw cli --tool gemini --mode analysis // Parse score → write iteration-N-eval.md → check termination

// Check termination if (shouldTerminate(state)) { break; // → Phase 5 }

// === Phase 4: Improve === // Read: phases/04-improve.md // Agent applies suggestions → write iteration-N-changes.md

// Update TodoWrite with score // Continue loop }

Phase 2: Execute Skill (per iteration)

Read and execute: Ref: phases/02-execute.md

Snapshot skill → iteration-{N}/skill-snapshot/
Build execution prompt from skill content + test scenario
Execute: ccw cli -p "..." --tool claude --mode write --cd "${iterDir}/artifacts"
Collect artifacts

Phase 3: Evaluate Quality (per iteration)

Read and execute: Ref: phases/03-evaluate.md

Build evaluation prompt with skill + artifacts + criteria + history
Execute: ccw cli -p "..." --tool gemini --mode analysis
Parse 5-dimension score (Clarity, Completeness, Correctness, Effectiveness, Efficiency)
Write iteration-{N}-eval.md
Check termination: score >= threshold | iter >= max | convergence | error limit

Phase 4: Apply Improvements (per iteration, skipped on termination)

Read and execute: Ref: phases/04-improve.md

Read evaluation suggestions
Launch general-purpose Agent to apply changes
Write iteration-{N}-changes.md
Update state

Phase 5: Final Report (one-time)

Read and execute: Ref: phases/05-report.md

Generate comprehensive report with score progression table
Write final-report.md
Display summary to user

Phase Reference Documents (read on-demand when phase executes):

Phase Document Purpose Compact

1 phases/01-setup.md Initialize workspace and state TodoWrite 驱动

2 phases/02-execute.md Execute skill via ccw cli Claude TodoWrite 驱动 + 🔄 sentinel

3 phases/03-evaluate.md Evaluate via ccw cli Gemini TodoWrite 驱动 + 🔄 sentinel

4 phases/04-improve.md Apply improvements via Agent TodoWrite 驱动 + 🔄 sentinel

5 phases/05-report.md Generate final report TodoWrite 驱动

Compact Rules:

TodoWrite in_progress → 保留完整内容，禁止压缩
TodoWrite completed → 可压缩为摘要
🔄 sentinel fallback → 若 compact 后仅存 sentinel 而无完整 Step 协议，立即 Read() 恢复

Core Rules

Start Immediately: First action is preference collection → Phase 1 setup
Progressive Loading: Read phase doc ONLY when that phase is about to execute
Snapshot Before Execute: Always snapshot skill state before each iteration
Background CLI: ccw cli runs in background, wait for hook callback before proceeding
Parse Every Output: Extract structured JSON from CLI outputs for state updates
DO NOT STOP: Continuous iteration until termination condition met
Single State Source: iteration-state.json is the only source of truth

Data Flow

User Input (skill paths + test scenario) ↓ (+ execution_mode + chain_order if chain mode) ↓ Phase 1: Setup ↓ workDir, targetSkills[], testScenario, iteration-state.json ↓ ┌─→ Phase 2: Execute (ccw cli claude) │ ↓ artifacts/ (skill execution output) │ ↓ │ Phase 3: Evaluate (ccw cli gemini) │ ↓ score, dimensions[], suggestions[], iteration-N-eval.md │ ↓ │ [Terminate?]─── YES ──→ Phase 5: Report → final-report.md │ ↓ NO │ ↓ │ Phase 4: Improve (Agent) │ ↓ modified skill files, iteration-N-changes.md │ ↓ └───┘ next iteration

TodoWrite Pattern

// Initial state TaskCreate({ subject: "Phase 1: Setup workspace", activeForm: "Setting up workspace" }) TaskCreate({ subject: "Iteration Loop", activeForm: "Running iterations" }) TaskCreate({ subject: "Phase 5: Final Report", activeForm: "Generating report" })

// Chain mode: create per-skill tracking tasks if (state.execution_mode === 'chain') { for (const skillName of state.chain_order) { TaskCreate({ subject: Chain: ${skillName}, activeForm: Tracking ${skillName}, description: Skill chain member position ${state.chain_order.indexOf(skillName) + 1} }) } }

// During iteration N // Single mode: one score per iteration (existing behavior) // Chain mode: per-skill status updates if (state.execution_mode === 'chain') { // After each skill executes in Phase 2: TaskUpdate(chainSkillTask, { subject: Chain: ${skillName} — Iter ${N} executed, activeForm: ${skillName} iteration ${N} }) // After Phase 3 evaluates: TaskUpdate(chainSkillTask, { subject: Chain: ${skillName} — Score ${chainScores[skillName]}/100, activeForm: ${skillName} scored }) } else { // Single mode (existing) TaskCreate({ subject: Iteration ${N}: Score ${score}/100, activeForm: Iteration ${N} complete, description: Strengths: ... | Weaknesses: ... | Suggestions: ${count} }) }

// Completed — collapse TaskUpdate(iterLoop, { subject: Iteration Loop (${totalIters} iters, final: ${finalScore}), status: 'completed' })

Termination Logic

Error Handling

Phase Error Recovery

2: Execute CLI timeout/crash Retry once with simplified prompt, then skip

3: Evaluate CLI fails Retry once, then use score 50 with warning

3: Evaluate JSON parse fails Extract score heuristically, save raw output

4: Improve Agent fails Rollback from iteration-{N}/skill-snapshot/

Any 3+ consecutive errors Terminate with error report

Error Budget: Each phase gets 1 retry. 3 consecutive failed iterations triggers termination.

Coordinator Checklist

Pre-Phase Actions

Read iteration-state.json for current state
Verify workspace directory exists
Check error count hasn't exceeded limit

Per-Iteration Actions

Increment current_iteration in state
Create iteration-{N} subdirectory
Update TodoWrite with iteration status
After Phase 3: check termination before Phase 4
After Phase 4: write state, proceed to next iteration

Post-Workflow Actions

Execute Phase 5 (Report)
Display final summary to user
Update all TodoWrite tasks to completed

skill-iter-tune

Safety Notice

Copy this and send it to your AI assistant to learn

Source Transparency

Related Skills

skill-generator

review-code

ccw-help

skill-tuning