Agent Resilience
Patterns for surviving context loss, capturing corrections, and continuously improving.
WAL Protocol (Write-Ahead Logging)
The Law: Chat history is a buffer, not storage. Files survive; context doesn't.
Trigger — scan every message for:
- ✏️ Corrections — "It's X, not Y" / "Actually..." / "No, I meant..."
- 📍 Proper nouns — names, places, companies, products
- 🎨 Preferences — styles, approaches, "I like/don't like"
- 📋 Decisions — "Let's do X" / "Go with Y"
- 🔢 Specific values — numbers, dates, IDs, URLs
If any appear:
- WRITE FIRST → update
memory/SESSION-STATE.md - THEN respond
The urge to respond is the enemy. Write before replying.
SESSION-STATE.md
Active working memory for the current task. Create at memory/SESSION-STATE.md:
# Session State
**Task:** [what we're working on]
**Key decisions:** [decisions made]
**Details:** [corrections, names, values captured via WAL]
**Next step:** [what happens next]
Reset when starting a new unrelated task.
Working Buffer (Danger Zone)
When context reaches ~60%, start logging every exchange to memory/working-buffer.md:
# Working Buffer
**Status:** ACTIVE — started [timestamp]
## [time] Human
[their message]
## [time] Agent
[1-2 sentence summary + key details]
Clear the buffer at the START of the next 60% threshold (not continuously).
Compaction Recovery
Auto-trigger when session starts with a summary tag, or human says "where were we?":
- Read
memory/working-buffer.md— raw danger-zone exchanges - Read
memory/SESSION-STATE.md— active task state - Read today's + yesterday's daily notes
- Extract key context back into SESSION-STATE.md
- Respond: "Recovered from buffer. Last task was X. Continue?"
Never ask "what were we discussing?" — read the buffer first.
Verify Before Reporting
Before saying "done", "complete", "finished":
- STOP
- Actually test from the user's perspective
- Verify the outcome, not just that code exists
- Only THEN report complete
Text changes ≠ behavior changes. When changing how something works, identify the architectural component and change the actual mechanism.
Relentless Resourcefulness
Try 10 approaches before asking for help or saying "can't":
- Different CLI flags, tool, API endpoint
- Check memory: "Have I done this before?"
- Spawn a research sub-agent
- Grep logs for past successes
"Can't" = exhausted all options. Not "first try failed."
Self-Improvement Guardrails
When updating behavior/config based on a lesson:
Score the change first (skip if < 50 weighted points):
- High frequency (daily use?) → 3×
- Reduces failures → 3×
- Saves user effort → 2×
- Saves future-agent tokens/time → 2×
Ask: "Does this let future-me solve more problems with less cost?" If no, skip it.
Forbidden: complexity for its own sake, changes you can't verify worked, vague justifications.
Quick Start Checklist
For long/complex tasks:
- Create
memory/SESSION-STATE.mdwith task + context - Apply WAL: write corrections/decisions before responding
- At ~60% context: start working buffer
- After any compaction: read buffer before asking questions
- Before reporting done: verify actual outcome