Agent Prompt Patterns
Battle-tested patterns for agents that ship, not agents that demo. If your agent works in a notebook but breaks in production, you have a demo, not an agent.
When to Use
- Designing a new agent's behavioral rules and operating manual
- An agent is hallucinating completions, skipping steps, or claiming work it didn't do
- Building multi-agent pipelines where output quality compounds (or collapses)
- Setting up human-in-the-loop approval tiers for different risk levels
- Enforcing reliability in automated workflows (cron jobs, scheduled tasks, pipelines)
- Writing AGENTS.md or operating manuals for production agent workspaces
- Debugging why an agent keeps violating rules you've already stated
- Evaluating whether an agent should exist at all (deletion test)
- Building harnesses that make autonomy safe and useful
When NOT to Use
- One-shot prompts with no agent persistence — these patterns assume continuity
- Pure chatbot / conversational UX with no action-taking capability
- Academic prompt engineering research — these are production patterns, not benchmarks
- Agents with no filesystem, no tool access, and no side effects — nothing to harness
- You're still in the "make it work at all" phase — get basic functionality first, then harden
1. Consumer-First Design
Principle: Every agent output must have a named consumer. If nobody uses the output, the agent shouldn't exist.
This is the most important pattern because it kills bloat before it starts. Agents proliferate. Each one feels useful when you build it. Six months later you have 14 agents and can't remember what half of them do.
The Deletion Test
Ask: If I delete this agent, whose work breaks (human or agent)?
If the answer is "nothing" or "I'm not sure," the agent is a vanity project.
# Agent Registry (in AGENTS.md)
## daily-digest
- **Consumers:** Sam (morning briefing), weekly-report agent (aggregation)
- **Deletion impact:** Sam loses morning summary, weekly-report loses daily inputs
- **Verdict:** KEEP
## inbox-sorter
- **Consumers:** None identified
- **Deletion impact:** Unknown
- **Verdict:** CANDIDATE FOR REMOVAL — validate or kill within 7 days
How to Apply
Every agent entry in your operating manual should answer:
- Who consumes this output? (name the human or agent)
- What format do they need? (not what's convenient to produce)
- What breaks if this stops? (the deletion test)
- What's the feedback loop? (how does the consumer signal quality issues?)
If an agent produces beautiful summaries that nobody reads, it's burning tokens for nothing.
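The questions above can be checked mechanically if the registry is kept as data. The following is a minimal sketch, not a prescribed implementation: `AgentEntry` and `deletion_verdict` are hypothetical names, and the two entries mirror the registry example earlier in this section.

```python
from dataclasses import dataclass, field


@dataclass
class AgentEntry:
    """Hypothetical registry record; field names are illustrative."""
    name: str
    consumers: list = field(default_factory=list)  # named humans or agents
    deletion_impact: str = ""                      # what breaks if the agent stops


def deletion_verdict(entry: AgentEntry) -> str:
    """Apply the deletion test: no named consumer or no known impact means removal candidate."""
    if entry.consumers and entry.deletion_impact:
        return "KEEP"
    return "CANDIDATE FOR REMOVAL"


registry = [
    AgentEntry("daily-digest", ["Sam", "weekly-report"], "Sam loses morning summary"),
    AgentEntry("inbox-sorter"),
]
verdicts = {entry.name: deletion_verdict(entry) for entry in registry}
```

Running a check like this on a schedule would surface agents that have quietly lost their consumers.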
Anti-Pattern: The "Nice to Have" Agent
# BAD: No consumer, no deletion impact
## sentiment-tracker
Monitors social media sentiment about our brand.
Runs daily. Outputs to sentiment-log.md.
# GOOD: Named consumer, clear dependency
## sentiment-tracker
Monitors social media sentiment for weekly-report.
Consumer: weekly-report agent (pulls sentiment delta for executive summary)
Deletion impact: weekly-report loses sentiment section; Sam must manually check socials
Format: JSON with {platform, score_delta, top_mentions[3]}
2. Proof-of-Work Enforcement
Principle: Never claim "started" unless the action actually started, and never claim "done" unless it actually finished. Every status update needs proof: a PID, file path, URL, or command output. No proof = didn't happen. Write first, speak second.
This pattern exists because LLMs are pathological completers. They want to say "Done!" because that's the satisfying end of a sequence. The problem is they'll say "Done!" before doing anything, or after attempting something that silently failed.
The Rule
STATUS UPDATE FORMAT:
- "Started X" → must include: PID, command, or file path
- "Completed X" → must include: output snippet, file path, or URL
- "Failed X" → must include: error message, what was tried
- "Skipped X" → must include: reason with evidence
Examples
# BAD: No proof
✅ Backed up database
✅ Sent daily digest email
✅ Rotated API keys
# GOOD: Every claim has evidence
✅ Backed up database → /backups/2026-03-15-db.sql.gz (43MB, sha256: a1b2c3...)
✅ Sent daily digest → Message-ID: <abc123@mail.example.com>, 3 recipients
✅ Rotated API keys → new key fingerprint: sk-...x4f2, old key revoked at 14:32 UTC
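The difference between the bad and good examples can be detected automatically. Here is a rough sketch, assuming status lines use the `→` separator shown above; the evidence patterns and the `has_proof` name are illustrative assumptions, and a match only shows that evidence is present, not that it is true.

```python
import re

# Hypothetical evidence shapes: file path, URL, PID, mail Message-ID, hash or fingerprint.
EVIDENCE_PATTERNS = [
    re.compile(r"(?:^|\s)/[\w./-]+"),                       # absolute file path
    re.compile(r"https?://\S+"),                            # URL
    re.compile(r"\bPID[:= ]\s*\d+", re.I),                  # process ID
    re.compile(r"Message-ID:\s*<[^>]+>"),                   # mail message ID
    re.compile(r"\b(?:sha256|fingerprint):\s*\S+", re.I),   # hash or key fingerprint
]


def has_proof(status_line: str) -> bool:
    """Return True if the text after the '→' separator contains some evidence."""
    _, sep, evidence = status_line.partition("→")
    if not sep:
        return False  # no evidence section at all
    return any(p.search(evidence) for p in EVIDENCE_PATTERNS)
```

A wrapper could reject any agent status update for which `has_proof` returns False and ask the agent to resubmit with evidence.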
Implementation Pattern
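# In a script gate or agent wrapper:
run_with_proof() {
  local task="$1"
  shift
  local output
  # Capture stdout and stderr together so failures keep their error text
  output=$("$@" 2>&1)
  local exit_code=$?
  if [ $exit_code -eq 0 ]; then
    echo "DONE: $task | proof: $(echo "$output" | tail -3)"
  else
    echo "FAIL: $task | exit=$exit_code | error: $(echo "$output" | tail -5)"
  fi
  return $exit_code
}
# Usage:
# Note: commands that print nothing on success (pg_dump with -f is one) leave the
# proof empty, so pair them with a check on the output file.
run_with_proof "database backup" pg_dump -Fc mydb -f /backups/latest.dump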
Agent Operating Manual Rule
## Proof-of-Work (AGENTS.md entry)
NEVER say "done" without evidence. For every completed action, include at least one of:
- File path of output produced
- PID of process started
- URL of resource created/modified
- Command output (truncated to last 5 lines)
- Screenshot or hash of artifact
If you cannot produce proof, say "ATTEMPTED but cannot verify" and explain why.
3. Cascading Validation
Principle: Dependent sequential steps — each task validates the previous output before starting its own work. Failures loop back with fix instructions, not silent continuations.
Cascading validation prevents the "garbage in, garbage out" problem in multi-step pipelines. Without it, step 3 happily processes the corrupt output of step 2, and you don't discover the problem until step 7.
The Pattern
Step 1: Produce output A
Step 2: Validate A meets spec → if invalid, return to Step 1 with fix instructions
Step 3: Use validated A to produce B
Step 4: Validate B meets spec → if invalid, return to Step 3 with fix instructions
...
Example: Content Pipeline
## Newsletter Pipeline (cascading validation)
### Step 1: Research
- Output: research-notes.md
- Validation: must contain ≥ 3 sources, each with URL and date
- Failure: "Research incomplete — need 3+ sourced items. Currently have {n}. Add more."
### Step 2: Draft
- Input: validated research-notes.md
- Pre-check: verify research-notes.md passes Step 1 validation (don't trust upstream)
- Output: draft.md
- Validation: 400-800 words, includes all research items, no placeholder text
- Failure: "Draft {issue}. Fix and resubmit. Do not proceed to editing."
### Step 3: Edit
- Input: validated draft.md
- Pre-check: verify draft.md passes Step 2 validation
- Output: final.md
- Validation: grammar check passes, links resolve, formatting correct
- Failure: "Edit issues found: {list}. Return to editing. Do not publish."
### Step 4: Publish
- Input: validated final.md
- Pre-check: verify final.md passes Step 3 validation
- Gate: HUMAN APPROVAL REQUIRED before publish
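As an illustration of a single validation step, the Step 1 rule (at least 3 sources, each with a URL and a date) could be checked along these lines. This is a sketch under assumptions: one source per line, ISO-formatted dates, and a hypothetical function name `validate_research_notes`.

```python
import re

URL_RE = re.compile(r"https?://\S+")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")  # assumes ISO dates, e.g. 2026-03-15


def validate_research_notes(text: str, min_sources: int = 3):
    """Return (is_valid, errors). A line counts as a source only if it has a URL and a date."""
    sourced = [
        line for line in text.splitlines()
        if URL_RE.search(line) and DATE_RE.search(line)
    ]
    if len(sourced) >= min_sources:
        return True, []
    # Failure message doubles as fix instructions for the upstream step
    return False, [
        f"Research incomplete: need {min_sources}+ sourced items. "
        f"Currently have {len(sourced)}. Add more."
    ]
```

Returning the errors as text lets the pipeline hand them straight back to the research step as fix instructions.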
Key Rule: Never Trust Upstream
Even if Step 1 "passed," Step 2 should re-validate Step 1's output before proceeding. This catches:
- Race conditions (output modified between steps)
- Silent corruption (file written but content wrong)
- Upstream validation bugs (Step 1's validator had a gap)
Implementation
def cascading_step(input_path, input_validator, processor, output_validator, max_retries=3):
    """Each step validates its input AND its output.

    Validators return (is_valid, errors). Assumption: the processor accepts an
    optional `fix_instructions` argument carrying the previous attempt's errors.
    """
    # Validate input (don't trust upstream)
    input_valid, input_errors = input_validator(input_path)
    if not input_valid:
        return {"status": "BLOCKED", "reason": f"Input validation failed: {input_errors}"}
    output_errors = None
    for attempt in range(max_retries):
        # Loop back with fix instructions from the previous attempt (None on the first)
        output = processor(input_path, fix_instructions=output_errors)
        output_valid, output_errors = output_validator(output)
        if output_valid:
            return {"status": "DONE", "output": output, "attempts": attempt + 1}
    return {"status": "FAILED", "reason": f"Failed after {max_retries} attempts", "last_errors": output_errors}
4. Advisory Mode Tiers
Principle: Not all actions carry the same risk. Categorize agent capabilities into tiers with different autonomy levels and approval requirements.
The mistake people make is binary: either the agent can do everything, or it can do nothing. Tiers let you give autonomy where it's safe and require approval where it's not.
The Four Tiers
| Tier | Risk | Probation | Graduation | Example |
|---|---|---|---|---|
| Low | Reversible, internal only | 3 days | Self-promote after clean streak | Read files, search, summarize |
| Medium | Visible to user, recoverable | 2 weeks | Human approves promotion | Create files, edit code, run tests |
| High | Visible to others, hard to reverse | 2 weeks minimum | Never fully unsupervised | Git push, create PRs, post to Slack |
| Restricted | Irreversible or impersonation risk | Permanent | Always draft-only | Send email from user's account, delete data, financial transactions |
Critical Rule: Email = Restricted
Sending email from a user's account is always Restricted tier. No exceptions. No graduation. Always draft-only with human send.
Why: Email is identity. An AI sending email "as you" creates legal, professional, and trust risks that no amount of testing eliminates.
## Advisory Mode Configuration (AGENTS.md)
### Tier: Low (auto-approve after 3-day probation)
- Read any file in workspace
- Search codebase
- Generate summaries to memory files
- Run read-only API calls
### Tier: Medium (human approves after 2-week probation)
- Create/edit files in workspace
- Run test suites
- Generate reports
- Schedule cron jobs (read-only actions only)
### Tier: High (2-week probation, never fully autonomous)
- Git commit and push
- Create pull requests
- Post to Slack channels
- Modify cron jobs
### Tier: Restricted (always draft-only, human executes)
- Send email from user's account
- Delete files/data outside workspace
- Financial transactions (invoice, payment)
- Modify access controls or permissions
- Post to social media as user
Probation Protocol
## Probation Rules
1. New capability starts at its tier's probation period
2. During probation: agent proposes action, human approves/denies
3. Clean streak = no denials or corrections for full probation period
4. After clean streak:
- Low: auto-promotes, agent logs the promotion
- Medium: agent requests promotion, human approves
- High: agent requests promotion, human approves, but spot-checks continue
- Restricted: never promotes — always draft-only
5. Any denial or correction during probation resets the probation clock
6. Graduated capability can be demoted if quality degrades
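The probation rules can be expressed as a small state check. The sketch below is one possible reading of them, with a hypothetical `Capability` record; the day counts come from the tier table, and everything else (names, return strings) is illustrative.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Probation length per tier, from the tier table; Restricted never graduates.
PROBATION_DAYS = {"low": 3, "medium": 14, "high": 14, "restricted": None}


@dataclass
class Capability:
    """Hypothetical record of one capability under probation."""
    name: str
    tier: str
    clock_start: date

    def record_denial(self, today: date) -> None:
        """A denial resets the probation clock (rule 5)."""
        self.clock_start = today

    def promotion_action(self, today: date) -> str:
        """Say what should happen now, following rule 4."""
        days = PROBATION_DAYS[self.tier]
        if days is None:
            return "never: draft-only"
        if today - self.clock_start < timedelta(days=days):
            return "wait"
        return "auto-promote" if self.tier == "low" else "request human approval"
```

For High-tier capabilities, a human approval here would still be followed by ongoing spot-checks, which this sketch does not model.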
5. Completion Contracts
Principle: Every automated workflow needs binary done-criteria, observable evidence, staged approval, and timeout bounds. No "probably done" — it's done or it's not.
The Contract Template
## Completion Contract: {workflow name}
### Done Criteria (all must be true)
- [ ] {criterion 1} — verified by: {method}
- [ ] {criterion 2} — verified by: {method}
- [ ] {criterion 3} — verified by: {method}
### Evidence Required
- {artifact 1}: {location/format}
- {artifact 2}: {location/format}
### Approval Stages
1. Automated validation passes (criteria above)
2. Agent self-review (checklist)
3. Human approval (if tier requires it)
### Timeout
- Maximum duration: {time}
- On timeout: {action — alert human, retry once, abort}
- Escalation: {who gets notified}
### Rollback
- Revert procedure: {steps}
- Rollback trigger: {conditions}
Example: Deployment Contract
## Completion Contract: Production Deploy
### Done Criteria
- [ ] All tests pass on deploy branch — verified by: CI green check
- [ ] Docker image builds successfully — verified by: image SHA in registry
- [ ] Health check returns 200 — verified by: curl to /health within 60s
- [ ] No error spike in first 5 minutes — verified by: error rate < 0.1%
### Evidence Required
- CI run URL with green status
- Docker image SHA256
- Health check response (timestamp + status code)
- Error rate dashboard screenshot at T+5min
### Approval Stages
1. CI passes automatically
2. Agent verifies health check and error rate
3. Human confirms "deploy complete" in Slack
### Timeout
- Maximum duration: 15 minutes from deploy start
- On timeout: auto-rollback to previous version, alert #ops channel
- Escalation: page on-call engineer if rollback also fails
### Rollback
- Revert: deploy previous Docker image SHA
- Trigger: health check fails OR error rate > 0.5% OR human says "rollback"
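A contract like this reduces to a binary evaluation: every criterion passes within the time bound, or the workflow is not done. A minimal sketch follows, with hypothetical names (`Criterion`, `evaluate_contract`); the real checks (CI status, health endpoint, error rate) would be supplied as callables.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Criterion:
    description: str
    check: Callable[[], bool]  # returns True only when the criterion is verifiably met


def evaluate_contract(criteria, started_at: float, max_seconds: float,
                      now: Optional[float] = None) -> dict:
    """Binary outcome: DONE only if every criterion passes within the time bound."""
    now = time.monotonic() if now is None else now
    if now - started_at > max_seconds:
        # Timeout wins over everything else; the caller decides on rollback or escalation
        return {"status": "TIMEOUT", "failed": []}
    failed = [c.description for c in criteria if not c.check()]
    return {"status": "DONE" if not failed else "NOT_DONE", "failed": failed}
```

For the deployment example, `max_seconds` would be 900 (15 minutes), and a `TIMEOUT` result would trigger the rollback and the alert to the ops channel.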
6. Cross-Validation
Principle: Generate with one model, review with another. Different architectures catch different blind spots.
Single-model pipelines have correlated failure modes. If Claude hallucinates a fact, Claude reviewing its own work will often confirm the hallucination. A different model (or a human) makes different mistakes, so its errors are less likely to coincide with the original model's.
The Sub-Agent QC Workflow
Produce (Sonnet) → Review (Sam) → Cross-check (GPT) → Incorporate → Deliver
This isn't about which model is "better." It's about error decorrelation. Each reviewer catches things the others miss.
Implementation
## Cross-Validation Protocol
### Step 1: Produce (Primary Model)
- Model: Claude Sonnet (fast, cost-effective for drafts)
- Output: first draft with citations
### Step 2: Review (Human)
- Reviewer: Sam
- Focus: factual accuracy, tone, strategic alignment
- Output: annotated draft with corrections
### Step 3: Cross-Check (Secondary Model)
- Model: GPT-4 or different Claude variant
- Prompt: "Review this document for factual errors, logical inconsistencies,
and unsupported claims. Do not rewrite — only flag issues with explanations."
- Focus: catch blind spots the primary model and human missed
- Output: issue list with severity ratings
### Step 4: Incorporate
- Primary model incorporates human + cross-check feedback
- Changes tracked and justified
### Step 5: Deliver
- Final version with revision history
- Confidence rating based on number of issues found and fixed
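The protocol can be wired together without committing to any particular model API. The sketch below is a simplification under stated assumptions: producers and reviewers are plain callables (stand-ins for model calls or a human review step), every reviewer sees the same original draft, and the function name `cross_validate` is hypothetical.

```python
from typing import Callable, List


def cross_validate(
    produce: Callable[[], str],
    reviewers: List[Callable[[str], List[str]]],
    incorporate: Callable[[str, List[str]], str],
) -> dict:
    """Produce a draft, collect issues from independent reviewers, then revise once."""
    draft = produce()
    issues: List[str] = []
    for review in reviewers:
        # Every reviewer sees the same original draft, so their findings stay independent
        for issue in review(draft):
            if issue not in issues:
                issues.append(issue)
    final = incorporate(draft, issues) if issues else draft
    return {"final": final, "issues_found": issues}
```

The length of `issues_found` could feed the confidence rating mentioned in Step 5.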
When Cross-Validation Matters Most
- Legal or compliance content — different models interpret regulations differently
- Financial calculations — arithmetic errors are model-specific
- Factual claims — hallucination patterns differ across architectures
- Security reviews — different models catch different vulnerability classes
When It's Overkill
- Internal notes nobody else will read
- Ephemeral content (daily logs, scratch work)
- Tasks where speed matters more than correctness
- Outputs with automated validation (tests, linters) that catch errors mechanically
7. Rule Escalation Ladder
Principle: Rules start as prose. If violated, they escalate to loaded rules. If violated again, they become script gates. Critical rules skip the ladder entirely.
The problem with prose rules is enforcement. An agent "knows" the rule but still violates it under pressure (long context, competing instructions, ambiguous situations). The escalation ladder adds mechanical enforcement for rules that matter.
The Three Levels
Level 1: Prose Rule (in AGENTS.md)
"Don't send emails without approval"
→ Relies on agent reading and following the rule
→ Appropriate for: new rules, low-risk guidelines
Level 2: Loaded Rule (in decisions.md, checked at session start)
"EMAIL_SENDING: RESTRICTED — always draft-only, never auto-send"
→ Agent must load and acknowledge before acting
→ Appropriate for: rules violated once, medium-risk operations
Level 3: Script Gate (mechanical enforcement)
Pre-send hook checks for human approval token
→ Agent literally cannot bypass the rule
→ Appropriate for: rules violated twice, high-risk operations, critical rules
Escalation Protocol
## Rule Escalation (AGENTS.md)
### Escalation Triggers
- First violation of a prose rule → add to decisions.md as loaded rule
- Second violation (now a loaded rule) → implement as script gate
- Any violation of a critical rule → skip to script gate immediately
### Critical Rules (always script-gated)
- Sending email from user's account
- Deleting files outside workspace
- Financial transactions
- Modifying access controls
- Publishing to external platforms
### Currently Loaded Rules (decisions.md)
- See decisions.md for the current set — these are checked every session start
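The escalation triggers amount to a lookup from violation count to enforcement level. A minimal sketch, assuming violations are counted per rule; the rule identifiers and the `enforcement_level` function are hypothetical names chosen for illustration.

```python
LEVELS = ["prose", "loaded", "script_gate"]

# Hypothetical IDs for the critical rules listed above; these are always script-gated.
CRITICAL_RULES = {
    "email_send",
    "delete_outside_workspace",
    "financial_transaction",
    "modify_access_controls",
    "external_publish",
}


def enforcement_level(rule_id: str, violations: int) -> str:
    """Map a rule and its violation count to an enforcement level."""
    if rule_id in CRITICAL_RULES:
        return "script_gate"  # critical rules skip the ladder
    # 0 violations: prose, 1: loaded rule, 2 or more: script gate
    return LEVELS[min(violations, len(LEVELS) - 1)]
```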
Script Gate Example
#!/bin/bash
# scripts/gate-email-send.sh: mechanical enforcement of email restriction
# Assumes the agent cannot write to the token path; /tmp is used here for illustration only.
APPROVAL_TOKEN_FILE="/tmp/.email-approval-$(date +%Y%m%d)"
if [ ! -f "$APPROVAL_TOKEN_FILE" ]; then
  echo "BLOCKED: Email sending requires human approval."
  echo "Human: run 'echo APPROVED > $APPROVAL_TOKEN_FILE' to authorize."
  exit 1
fi
APPROVAL=$(cat "$APPROVAL_TOKEN_FILE")
if [ "$APPROVAL" != "APPROVED" ]; then
  echo "BLOCKED: Approval token invalid."
  exit 1
fi
echo "GATE PASSED: Email send authorized (one-time token)."
# Proceed with email send and remember its exit status
"$@"
SEND_STATUS=$?
# Consume the token (one-time use), even if the send failed
rm -f "$APPROVAL_TOKEN_FILE"
exit $SEND_STATUS
8. Heartbeat Protocol
Principle: Periodic health checks batched together. Context monitor, system health, memory maintenance — all in one scheduled pulse, not scattered across individual crons.
Heartbeat vs. Cron Decision
| Use Heartbeat When | Use Individual Cron When |
|---|---|
| Check is lightweight (< 30 seconds) | Task is heavyweight (minutes) |
| Multiple checks share context | Task is completely independent |
| Failure in one check should inform others | Task has its own retry/error handling |
| You want a single "system status" view | Task needs its own schedule (not aligned) |
Heartbeat Structure
## Heartbeat Protocol (runs every 4 hours)
### Phase 1: Context Monitor (5 seconds)
- Check MEMORY.md size (warn if > 200 lines)
- Check daily note exists for today
- Verify SOUL.md and AGENTS.md haven't been modified unexpectedly
### Phase 2: System Health (10 seconds)
- Disk space check (warn if < 10% free)
- Check if critical services are running (by PID file)
- Verify cron jobs are registered and last-ran within expected windows
### Phase 3: Memory Maintenance (15 seconds)
- Scan for contradictions (see Pattern 9)
- Archive daily notes older than 7 days
- Update system-health.json with current status
### Output
- Write to HEARTBEAT.md: timestamp, all-clear or issues found
- If issues found: list them with severity and suggested fix
- If critical issue: alert human immediately (don't wait for next heartbeat)
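The Phase 1 check that SOUL.md and AGENTS.md haven't been modified unexpectedly can be done by comparing file digests against a stored baseline. A sketch of that idea, assuming the baseline is kept as a `{filename: sha256}` mapping; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def unexpected_changes(baseline: dict, root: Path) -> list:
    """Compare current digests against a stored baseline {filename: digest}."""
    changed = []
    for name, expected in baseline.items():
        path = root / name
        if not path.exists():
            changed.append(f"{name}: missing")
        elif file_digest(path) != expected:
            changed.append(f"{name}: modified")
    return changed
```

The baseline would need to be refreshed whenever a human intentionally edits one of these files; otherwise every later heartbeat reports a change.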
Implementation
#!/bin/bash
# scripts/heartbeat.sh
HEARTBEAT_FILE="HEARTBEAT.md"
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
STATUS="ALL CLEAR"
ISSUES=""
# Phase 1: Context Monitor
MEMORY_LINES=$(wc -l < MEMORY.md 2>/dev/null || echo "0")
if [ "$MEMORY_LINES" -gt 200 ]; then
  ISSUES="$ISSUES\n- WARN: MEMORY.md is $MEMORY_LINES lines (limit: 200)"
  STATUS="ISSUES FOUND"
fi
if [ ! -f "memory/$(date +%Y-%m-%d).md" ]; then
  ISSUES="$ISSUES\n- INFO: No daily note for today"
fi
# Phase 2: System Health
# df -P prints one line per filesystem; column 5 is the percentage USED, not free
DISK_USED=$(df -P . | tail -1 | awk '{print $5}' | tr -d '%')
if [ "$DISK_USED" -gt 90 ]; then
  ISSUES="$ISSUES\n- CRITICAL: Disk usage at ${DISK_USED}% (less than 10% free)"
  STATUS="CRITICAL"
fi
# Phase 3: Memory Maintenance
# (Contradiction detection delegated to agent, see Pattern 9)
# Write heartbeat
cat > "$HEARTBEAT_FILE" << EOF
# Heartbeat
Last check: $TIMESTAMP
Status: $STATUS
$(if [ -n "$ISSUES" ]; then echo -e "\n## Issues\n$ISSUES"; fi)
EOF
echo "Heartbeat complete: $STATUS"
9. Contradiction Detection
Principle: Actively scan for conflicts between memory entries, between memory and SOUL, stale facts, and decision reversals. Don't wait for contradictions to cause errors — find them during maintenance.
Contradiction Types
| Type | Description | Example |
|---|---|---|
| Memory-Memory | Two memory entries say opposite things | "Client prefers email" vs "Client prefers Slack" |
| Memory-SOUL | Memory contradicts core identity/rules | SOUL says "never auto-send email" but memory says "auto-send enabled for digest" |
| Stale Facts | Memory entry is outdated | "API endpoint: api.v1.example.com" when v1 was deprecated |
| Decision Reversal | decisions.md contradicts earlier decision without noting the change | "Use PostgreSQL" then later "Use SQLite" with no migration note |
Scan Protocol
## Contradiction Detection (run during heartbeat Phase 3)
### Scan Checklist
1. Load all memory entries with type=project and type=reference
2. For each pair, check for semantic conflicts:
- Same topic, different conclusions
- Same entity, different attributes
- Same process, different steps
3. Load SOUL.md rules and check memory entries against each rule
4. Check decisions.md for entries that reverse previous decisions without rationale
5. Flag entries older than 30 days for staleness review
### Output Format
- If contradictions found: write to memory/contradictions-{date}.md
- Each entry: the two conflicting sources, the conflict, suggested resolution
- Critical contradictions (SOUL violations): alert immediately
### Resolution
- Human reviews contradictions list
- For each: keep A, keep B, merge, or delete both
- Update affected files
- Log resolution in decisions.md
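Two parts of the scan checklist are mechanical enough to script: exact conflicts on the same key, and the 30-day staleness flag. The sketch below assumes memory entries have already been parsed into dicts with `source`, `key`, `value`, and `updated` fields, which is a hypothetical structure; semantic conflicts (same topic, different wording) still need the agent or a human.

```python
from datetime import date


def find_conflicts(entries):
    """Same key with a different value is reported as (first_source, later_source, key)."""
    seen = {}
    conflicts = []
    for e in entries:
        first = seen.setdefault(e["key"], e)  # remember the first entry seen per key
        if first is not e and first["value"] != e["value"]:
            conflicts.append((first["source"], e["source"], e["key"]))
    return conflicts


def find_stale(entries, today: date, max_age_days: int = 30):
    """Flag entries whose 'updated' date is older than the staleness window."""
    return [e["source"] for e in entries if (today - e["updated"]).days > max_age_days]
```

The results of both functions could be written to the contradictions file in the output format described above.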
Example Contradiction Report
# Contradictions Found — 2026-03-15
## CRITICAL: Memory-SOUL Conflict
- **SOUL.md line 23:** "Never auto-send email from user's account"
- **memory/gmail-daily-summary.md:** "Auto-send daily digest at 7am"
- **Resolution needed:** Either update SOUL or disable auto-send
- **Severity:** CRITICAL — active violation of core rule
## WARN: Memory-Memory Conflict
- **memory/wallets-onchain-identity.md:** "Primary wallet: 0xABC..."
- **memory/2026-03-12.md:** "Migrated primary wallet to 0xDEF..."
- **Resolution needed:** Update wallets file with new primary address
- **Severity:** MEDIUM — stale reference may cause wrong wallet usage
## INFO: Stale Entry
- **memory/starred-repos.md:** Last updated 45 days ago
- **Suggestion:** Review and refresh or archive
- **Severity:** LOW
10. Tight Harness Principle
Principle: Autonomy gets useful when the harness is tight. Don't sell agents — sell harnesses. An agent without a harness is a liability. A harness without an agent is just a script.
The Five Harness Components
Every autonomous agent operation needs all five:
| Component | Question | Example |
|---|---|---|
| Objective Metric | How do we know it worked? | "Test suite passes" not "code looks good" |
| Bounded Scope | What can it touch? | "Only files in /src/api/" not "any file" |
| Time Budget | When does it stop? | "15 minutes max" not "when it's done" |
| Reversibility | Can we undo it? | "Git branch, not direct commit to main" |
| Observability | Can we see what it did? | "Full command log" not "trust me" |
The Key Insight
Most agent failures aren't capability failures — they're harness failures. The agent could do the task, but:
- Nobody defined "done" objectively (no metric)
- It modified files it shouldn't have (no scope bound)
- It ran for 3 hours burning tokens (no time budget)
- It pushed directly to main (no reversibility)
- Nobody could tell what it did (no observability)
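Three of these failures (scope, time budget, observability) can be caught by a thin wrapper around the agent's write actions. The following is a sketch of that idea, not a complete harness: the `Harness` class, its method names, and the injectable `clock` parameter are all assumptions made for illustration.

```python
import time
from pathlib import Path


class HarnessViolation(Exception):
    """Raised when an action falls outside the harness."""


class Harness:
    """Hypothetical harness enforcing a write scope and a time budget."""

    def __init__(self, write_root: str, max_seconds: float, clock=time.monotonic):
        self.write_root = Path(write_root).resolve()
        self.max_seconds = max_seconds
        self.clock = clock
        self.started = clock()
        self.log = []  # observability: every checked action is recorded

    def check_write(self, target: str) -> None:
        """Allow a write only inside write_root and only while budget remains."""
        if self.clock() - self.started > self.max_seconds:
            self.log.append(("DENIED time budget", target))
            raise HarnessViolation("time budget exceeded")
        # resolve() collapses ".." segments so path tricks cannot escape the scope
        resolved = Path(target).resolve()
        if resolved != self.write_root and self.write_root not in resolved.parents:
            self.log.append(("DENIED scope", target))
            raise HarnessViolation(f"{target} is outside {self.write_root}")
        self.log.append(("ALLOWED", target))
```

Calling `check_write` before every file write gives a bounded scope and a hard stop, and the `log` list provides the audit trail. Reversibility and the objective metric still have to come from elsewhere, for example a git branch and a test suite.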
Harness Configuration Example
## Harness: Automated PR Review Agent
### Objective Metric
- All review comments reference specific code lines
- False positive rate at or below 10% (tracked over 2-week window)
- Review completed within 5 minutes of PR open
### Bounded Scope
- READ: any file in the repository
- WRITE: only PR comments via GitHub API
- CANNOT: approve PRs, merge PRs, modify code, close PRs
### Time Budget
- Maximum 5 minutes per PR
- Maximum 20 PRs per day
- On budget exceeded: skip PR, log reason, alert human
### Reversibility
- All comments can be deleted
- No permanent actions taken
- Human can dismiss any comment
### Observability
- Every review logged to reviews/{date}-{pr-number}.md
- Includes: files reviewed, issues found, comments posted, time taken
- Weekly accuracy report generated automatically
Selling Harnesses, Not Agents
When someone asks "can your agent do X?" — the right answer is "here's the harness that makes X safe":
BAD: "Yes, our agent can deploy to production!"
GOOD: "Yes, with this harness: deploys only to staging first, requires health
check pass, auto-rollback on error spike, human approval for prod
promotion, full audit log, 15-minute timeout."
The harness IS the product. The agent is just the engine inside it.
Quick Reference: Pattern Selection Guide
| Situation | Pattern |
|---|---|
| "Should this agent exist?" | Consumer-First Design (#1) |
| "Agent says it's done but I don't believe it" | Proof-of-Work (#2) |
| "Multi-step pipeline keeps producing garbage" | Cascading Validation (#3) |
| "How much autonomy should this agent have?" | Advisory Mode Tiers (#4) |
| "When is this workflow actually done?" | Completion Contracts (#5) |
| "Agent keeps making the same kind of error" | Cross-Validation (#6) |
| "Agent keeps violating a rule" | Rule Escalation Ladder (#7) |
| "How do I monitor agent health?" | Heartbeat Protocol (#8) |
| "Agent's memory is inconsistent" | Contradiction Detection (#9) |
| "How do I make autonomy safe?" | Tight Harness Principle (#10) |
Combining Patterns
These patterns are composable. A production agent typically uses several together:
Consumer-First Design → Does this agent need to exist?
↓ yes
Advisory Mode Tiers → What can it do autonomously?
↓ configured
Completion Contracts → How do we know each task is done?
↓ defined
Cascading Validation → How do multi-step tasks flow?
↓ piped
Proof-of-Work → How do we verify claims?
↓ enforced
Cross-Validation → How do we catch blind spots?
↓ reviewed
Rule Escalation Ladder → How do we handle violations?
↓ gated
Heartbeat Protocol → How do we monitor ongoing health?
↓ pulsing
Contradiction Detection → How do we keep memory consistent?
↓ clean
Tight Harness → How do we keep all of this safe?
Start with #1 (does this agent need to exist?) and #10 (is the harness tight?). Add the others as complexity demands.