Recovery Skill

When to Use

Context window exhausted mid-workflow
Session interrupted or lost
Need to resume from last completed step
Workflow state needs reconstruction

Step 1: Identify Last Completed Step

Check gate files for last successful validation:

Location: .claude/context/history/gates/{workflow_id}/
Find highest step number with validation_status: "pass"
This is the last successfully completed step
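The gate check above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the source does not specify the gate file format, so the `*.json` naming and the integer `step` field are hypothetical. Only `validation_status: "pass"` comes from the text.

```python
import json
from pathlib import Path


def load_gate_records(gates_dir):
    """Parse every *.json file in a gate directory, skipping unreadable ones."""
    records = []
    for path in sorted(Path(gates_dir).glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # an unreadable gate cannot prove that a step passed
        if isinstance(data, dict):
            records.append(data)
    return records


def last_passed_step(records):
    """Return the highest step number with validation_status "pass", else None."""
    passed = [
        r["step"]  # "step" is an assumed field name, not confirmed by the source
        for r in records
        if r.get("validation_status") == "pass" and isinstance(r.get("step"), int)
    ]
    return max(passed, default=None)
```

Calling `last_passed_step(load_gate_records(".claude/context/history/gates/<workflow_id>"))` would then give the step to resume after, or `None` if no gate has passed yet.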

Review reasoning files for progress:

Location: .claude/context/history/reasoning/{workflow_id}/
Read reasoning files up to last completed step
Extract context and decisions made

Identify artifacts created:

Check artifact registry: .claude/context/artifacts/registry-{workflow_id}.json
List all artifacts created up to last step
Verify artifact files exist
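One way to verify that registered artifacts still exist on disk is sketched below. The registry layout is an assumption made for illustration (a top-level `artifacts` list whose entries carry `path` and `step`). The real registry schema is not given in this document.

```python
from pathlib import Path


def missing_artifacts(registry, last_step, root="."):
    """Return paths of artifacts registered up to last_step whose files are absent."""
    missing = []
    for entry in registry.get("artifacts", []):  # assumed registry layout
        if entry.get("step", 0) > last_step:
            continue  # created after the recovery point, so not required yet
        if not (Path(root) / entry["path"]).is_file():
            missing.append(entry["path"])
    return missing
```

An empty result means every artifact up to the last completed step is present. Anything else is a candidate for the "Missing artifacts" handling described under Error Handling.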

Step 2: Load Plan Documents

Read plan document (stateless):

Load plan-{workflow_id}.json from artifact registry
Extract current workflow state
Identify completed vs pending tasks

Load relevant phase plan (if multi-phase):

Check if project is multi-phase (exceeds phase_size_max_lines threshold)
Load active phase plan: plan-{workflow_id}-phase-{n}.json
Understand phase boundaries and dependencies

Understand current state:

Map completed tasks to plan
Identify next steps
Check for dependencies
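Mapping completed tasks to the plan and checking dependencies might look like the sketch below. The plan schema is hypothetical (a `tasks` list with `id`, `status`, and `depends_on` fields), since the source does not describe the contents of plan-{workflow_id}.json.

```python
def next_runnable_tasks(plan):
    """Return ids of pending tasks whose dependencies are all completed."""
    tasks = plan.get("tasks", [])  # assumed plan layout
    completed = {t["id"] for t in tasks if t.get("status") == "completed"}
    return [
        t["id"]
        for t in tasks
        if t.get("status") != "completed"
        and all(dep in completed for dep in t.get("depends_on", []))
    ]
```

A task that is pending but blocked by an unfinished dependency is left out, which keeps recovery from resuming at a step whose inputs do not exist yet.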

Step 3: Context Recovery

Load artifacts from last completed step:

Read artifact registry
Load all artifacts with validation_status: "pass"
Verify artifact integrity

Read reasoning files for context:

Load reasoning files from completed steps
Extract key decisions and context
Understand workflow progression

Reconstruct workflow state:

Combine plan, artifacts, and reasoning
Create recovery state document
Validate state consistency
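As a rough illustration of combining these sources, the sketch below builds a recovery state dictionary and runs one consistency check. Every field name here is an assumption. The source names neither the format of the recovery state document nor the checks that make up "state consistency".

```python
def build_recovery_state(workflow_id, last_step, plan, registry):
    """Combine plan and registry data into one recovery state dictionary."""
    passed = [
        a for a in registry.get("artifacts", [])
        if a.get("validation_status") == "pass"
    ]
    # Assumed consistency rule: no artifact may be marked "pass" for a step
    # later than the last step whose gate passed.
    issues = [
        f"artifact {a.get('name')!r} passed at step {a['step']}, "
        f"after last completed step {last_step}"
        for a in passed
        if isinstance(a.get("step"), int) and a["step"] > last_step
    ]
    pending = [
        t.get("id") for t in plan.get("tasks", [])
        if t.get("status") != "completed"
    ]
    return {
        "workflow_id": workflow_id,
        "last_completed_step": last_step,
        "next_step": last_step + 1,
        "artifacts": [a.get("name") for a in passed],
        "pending_tasks": pending,
        "issues": issues,
        "consistent": not issues,
    }
```

If `consistent` is false, the listed issues point to where the gate files and the registry disagree, and that should be resolved before resuming.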

Step 4: Resume Execution

Continue from next step:

Identify next step after last completed
Load step requirements from plan
Prepare inputs for next step

Planner updates plan status (stateless):

Update plan-{workflow_id}.json with current status
Mark completed steps
Update progress tracking

Orchestrator coordinates next agents:

Pass recovered artifacts to next step
Resume workflow execution
Monitor for additional interruptions

</execution_process>

Failure Classification

When a task fails, classify the failure type:

| Failure Type | Indicators | Recovery Action |
| --- | --- | --- |
| BROKEN_BUILD | Build errors, syntax errors, module not found | ROLLBACK + fix |
| VERIFICATION_FAILED | Test failures, validation errors, assertion errors | RETRY with fix (max 3 attempts) |
| CIRCULAR_FIX | Same error 3+ times, similar approaches repeated | SKIP or ESCALATE |
| CONTEXT_EXHAUSTED | Token limit reached, maximum length exceeded | Compress context, continue |
| UNKNOWN | No pattern match | RETRY once, then ESCALATE |
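A simple pattern-based classifier for the failure types above could be sketched as follows. The indicator phrases come from the table, but the regular expressions, the function name, and the `same_error_count` parameter are illustrative assumptions, not the skill's actual implementation.

```python
import re

# Regexes are illustrative approximations of the indicators listed above.
PATTERNS = [
    ("CONTEXT_EXHAUSTED", re.compile(r"token limit|maximum length", re.I)),
    ("BROKEN_BUILD", re.compile(
        r"build (error|failed)|syntax ?error|module ?not ?found|no module named",
        re.I)),
    ("VERIFICATION_FAILED", re.compile(
        r"tests? failed|test failure|validation error|assertion ?error", re.I)),
]


def classify_failure(error_text, same_error_count=0):
    """Map an error message to one of the failure types in the table."""
    if same_error_count >= 3:  # "Same error 3+ times"
        return "CIRCULAR_FIX"
    for label, pattern in PATTERNS:
        if pattern.search(error_text):
            return label
    return "UNKNOWN"  # no pattern match
```

The repeat count is checked first so that a recurring build or test error is treated as circular rather than retried again.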

Circular Fix Detection

Iron Law: If the same approach has been tried 3+ times without success, STOP.

When circular fix is detected:

Stop the current approach immediately
Document what was tried (approaches, errors, files)
Try a fundamentally different approach (different library, different pattern, simpler implementation)
If it still fails, ESCALATE to human intervention

Detection Algorithm:

Extract keywords from the description of the current approach (excluding stop words)
Compare them with the keywords from each of the last 3 attempts
If the Jaccard similarity exceeds 30% for 2 or more of those attempts, flag as circular

Example:

Attempt 1: "Using async await for fetch"
Attempt 2: "Using async/await with try-catch"
Attempt 3: "Trying async await pattern again"
=> CIRCULAR FIX DETECTED - Stop and try callback pattern instead
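The detection algorithm can be sketched as below. The stop-word list and the tokenizer are assumptions chosen for illustration. The 30% threshold, the window of 3 attempts, and the "2+ attempts" rule come from the text. Jaccard similarity is the size of the intersection of two keyword sets divided by the size of their union.

```python
import re

# Hypothetical stop-word list; the source does not say which words are excluded.
STOP_WORDS = frozenset({
    "a", "an", "the", "for", "with", "to", "of", "and", "in", "on",
    "using", "trying", "again",
})
SIMILARITY_THRESHOLD = 0.30   # "Jaccard similarity > 30%"
MIN_SIMILAR_ATTEMPTS = 2      # "for 2+ attempts"
HISTORY_WINDOW = 3            # "last 3 attempts"


def keywords(text):
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def jaccard(a, b):
    """|a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_circular(current, previous_attempts):
    """Flag the current approach if it resembles 2+ of the last 3 attempts."""
    current_kw = keywords(current)
    recent = previous_attempts[-HISTORY_WINDOW:]
    similar = sum(
        1 for p in recent
        if jaccard(current_kw, keywords(p)) > SIMILARITY_THRESHOLD
    )
    return similar >= MIN_SIMILAR_ATTEMPTS
```

With these assumptions, the third attempt in the example above shares `async` and `await` with both earlier attempts, giving similarities of 0.5 and 0.4, so it is flagged. A description such as "Switch to callback pattern" shares no keywords with either and is not.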

Attempt Count Thresholds

| Failure Type | Max Attempts | Then Action |
| --- | --- | --- |
| VERIFICATION_FAILED | 3 | SKIP + ESCALATE |
| UNKNOWN | 2 | ESCALATE |
| BROKEN_BUILD | 1 | ROLLBACK (if good commit exists) |
| CIRCULAR_FIX | 0 | Immediately SKIP |
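The thresholds above translate naturally into a lookup table. The sketch below is one possible reading, with `attempts_so_far` counting failed attempts including the one just classified. Two behaviors are assumptions the source does not cover: escalating when a BROKEN_BUILD has no good commit to roll back to, and treating unrecognized failure types as UNKNOWN.

```python
MAX_ATTEMPTS = {
    "VERIFICATION_FAILED": 3,
    "UNKNOWN": 2,
    "BROKEN_BUILD": 1,
    "CIRCULAR_FIX": 0,
}
THEN_ACTION = {
    "VERIFICATION_FAILED": "SKIP + ESCALATE",
    "UNKNOWN": "ESCALATE",
    "BROKEN_BUILD": "ROLLBACK",
    "CIRCULAR_FIX": "SKIP",
}


def next_action(failure_type, attempts_so_far, good_commit_exists=True):
    """Return "RETRY" while under the limit, otherwise the table's follow-up action."""
    if failure_type not in MAX_ATTEMPTS:
        failure_type = "UNKNOWN"  # assumption: unrecognized types are handled as UNKNOWN
    if attempts_so_far < MAX_ATTEMPTS[failure_type]:
        return "RETRY"
    if failure_type == "BROKEN_BUILD" and not good_commit_exists:
        return "ESCALATE"  # assumption: nothing to roll back to, so hand off
    return THEN_ACTION[failure_type]
```

Under this reading, a VERIFICATION_FAILED task is retried after its first and second failures and skipped and escalated after the third, while a CIRCULAR_FIX is skipped without any retry.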

References

See references/ for detailed patterns:

failure-types.md: Failure classification details and indicators
recovery-actions.md: Recovery action decision tree and execution
merge-strategies.md: File merge strategies for multi-agent scenarios

<best_practices>

Recovery Validation Checklist

Last completed step identified correctly
Plan document loaded and validated
All artifacts from completed steps available
Reasoning files reviewed for context
Workflow state reconstructed accurately
No duplicate work will be performed
Next step inputs prepared
Recovery logged in reasoning file

</best_practices>

<error_handling>

Error Handling

Missing plan document: Request planner to recreate plan from requirements
Missing artifacts: Request artifact recreation from source agent
Corrupted artifacts: Request artifact recreation with validation
Incomplete reasoning: Use artifact registry and gate files to reconstruct state

</error_handling>

<usage_example>

1. Check gate files for last completed step

ls .claude/context/history/gates/{workflow_id}/

2. Load plan document

cat .claude/context/artifacts/plan-{workflow_id}.json

3. Review reasoning files

cat .claude/context/history/reasoning/{workflow_id}/*.json

4. Resume from next step

</usage_example>

<usage_example> Natural language invocation:

"Resume the workflow from where we left off"
"Recover the workflow state and continue"
"What was the last completed step?"

</usage_example>

Planner Agent: .claude/agents/core/planner.md
Memory files: .claude/context/memory/

Memory Protocol (MANDATORY)

Before starting:

cat .claude/context/memory/learnings.md

After completing:

New pattern -> .claude/context/memory/learnings.md
Issue found -> .claude/context/memory/issues.md
Decision made -> .claude/context/memory/decisions.md

ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.

