Agent Debugger

Systematic debugging for AI agent issues. When your agent misbehaves, this skill helps identify and fix the problem.

Common Agent Problems

1. Infinite Loops

Symptoms:

- Agent repeats same action
- Gets stuck in a pattern
- Never completes task

Diagnosis:

Agent log shows:
- Same tool called 10+ times
- Same output format repeated
- No progress between iterations

Fixes:

Add iteration limit:

{
  "maxIterations": 5,
  "onLimit": "ask_user"
}
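As a rough illustration of how such a limit might be enforced in code, the sketch below stops a loop when it exceeds an iteration budget or when the same tool call recurs with identical arguments. The `LoopGuard` class, its method names, and the `max_repeats` parameter are hypothetical and not part of any agent framework; what to do with the returned reason (for example, ask the user) is left to the caller.

```python
import json
from collections import Counter


class LoopGuard:
    """Hypothetical helper: flags a loop that runs too long or repeats a call."""

    def __init__(self, max_iterations=5, max_repeats=3):
        self.max_iterations = max_iterations
        self.max_repeats = max_repeats
        self.iterations = 0
        self.seen = Counter()

    def check(self, tool_name, args):
        """Record one tool call; return a stop reason, or None to continue."""
        self.iterations += 1
        # Canonical JSON so that key order in args does not matter.
        signature = (tool_name, json.dumps(args, sort_keys=True))
        self.seen[signature] += 1
        if self.seen[signature] >= self.max_repeats:
            return f"repeated call: {tool_name} x{self.seen[signature]}"
        if self.iterations >= self.max_iterations:
            return "iteration limit reached"
        return None
```

Call `check()` after each tool call and hand control back to the user as soon as it returns a reason.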

Add explicit stop condition:

In your instructions, add:
"If you've tried the same approach 3 times without success, stop and ask the user for guidance."

2. Tool Failures

Symptoms:

- Tool returns error
- Tool times out
- Tool not found

Diagnosis:

Check:
- Tool exists in available_tools
- Parameters match tool schema
- Tool has required permissions
- Rate limits not exceeded

Fixes:

Validate parameters first:

# Before calling tool
required_params = tool.get("required", [])
for param in required_params:
    if param not in args:
        raise ValueError(f"Missing required parameter: {param}")

Add retry logic:

{
  "retries": 3,
  "retryDelay": 1000,
  "retryOn": ["rate_limit", "timeout", "5xx"]
}
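If your runtime does not apply a retry config like this for you, the same policy can be approximated in code. The sketch below is a minimal illustration under stated assumptions: `retryDelay` is taken to be in milliseconds, and `ToolError` with its `kind` attribute is a hypothetical error type standing in for whatever your tools raise.

```python
import time


class ToolError(Exception):
    """Hypothetical error type carrying a category such as "timeout"."""

    def __init__(self, kind, message=""):
        super().__init__(message or kind)
        self.kind = kind


def run_with_retries(call, config, sleep=time.sleep):
    """Run call(), retrying errors whose kind is listed in config["retryOn"]."""
    retries = config.get("retries", 0)
    delay_seconds = config.get("retryDelay", 0) / 1000  # assumed to be ms
    retry_on = set(config.get("retryOn", []))
    for attempt in range(retries + 1):  # first try plus `retries` retries
        try:
            return call()
        except ToolError as err:
            # Give up on non-retryable errors or when retries are used up.
            if err.kind not in retry_on or attempt == retries:
                raise
            sleep(delay_seconds)
```

Errors outside `retryOn`, such as a permission failure, are raised immediately, since retrying them would only repeat the failure.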

3. Context Overflow

Symptoms:

- "Context length exceeded" error
- Agent forgets earlier conversation
- Truncated outputs

Diagnosis:

Check context window:
- Current tokens vs max tokens
- Number of messages in history
- Size of file contents loaded

Fixes:

Use memory efficiently:

- Load only relevant files
- Use offset/limit for large files
- Summarize long conversations
- Clear old context periodically

Compress context:

# Instead of full file
content = read("file.txt", offset=1, limit=100)

# Use memory_search for specific info
results = memory_search("important decision")
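One way to clear old context is to drop the oldest messages until the history fits a token budget. The sketch below assumes messages are dicts with `role` and `content` keys and uses a crude estimate of about 4 characters per token; both are assumptions, and a real tokenizer would give more accurate counts. The function names are hypothetical.

```python
def estimate_tokens(text):
    """Rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_history(messages, max_tokens):
    """Keep the most recent messages that fit within max_tokens.

    messages is a list of {"role": ..., "content": ...} dicts, oldest first.
    A leading system message, if present, is always kept.
    """
    system = messages[:1] if messages and messages[0]["role"] == "system" else []
    rest = messages[len(system):]
    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + kept[::-1]
```

Summarizing the dropped messages before discarding them preserves more information than trimming alone.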

4. Rate Limiting

Symptoms:

- "Rate limit exceeded" error
- Requests blocked
- 429 status codes

Diagnosis:

Check:
- API rate limits (requests per minute/hour)
- Token limits (tokens per minute)
- Concurrent request limits
- Time until reset

Fixes:

Add backoff:

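import random
import time

class RateLimitError(Exception):
    """Placeholder: substitute the rate-limit exception your client raises."""

def call_with_backoff(func, max_retries=5):
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries - 1:
                break  # out of retries; don't sleep again
            # Exponential backoff with jitter: about 1s, 2s, 4s, ...
            wait = (2 ** attempt) + random.random()
            time.sleep(wait)
    raise RuntimeError("Max retries exceeded")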

Queue requests:

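import time
from queue import Queue
from threading import Thread

request_queue = Queue()

def process_queue():
    while True:
        task = request_queue.get()
        try:
            execute(task)  # execute() is your own request function
        finally:
            request_queue.task_done()
        time.sleep(0.1)  # Rate limit: at most 10 req/s

# Run the worker in the background; it exits with the main program
Thread(target=process_queue, daemon=True).start()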
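When the limit is expressed as requests per minute or hour, it can help to compute the time until the next request is allowed instead of sleeping a fixed interval. The sketch below is one possible sliding-window limiter; the class and method names are hypothetical, and the injectable `clock` parameter exists only to make the logic easy to test.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allows at most max_calls within any window of `period` seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self):
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def record(self):
        """Note that a call was made just now."""
        self.calls.append(self.clock())
```

Before each request, sleep for `wait_time()` seconds, then call `record()`.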

5. Memory Issues

Symptoms:

- Agent doesn't remember previous context
- MEMORY.md not loaded
- Memory files not found

Diagnosis:

Check:
- MEMORY.md exists
- memory/ directory exists
- Files have correct permissions
- Memory loaded at startup

Fixes:

Verify memory setup:

ls -la ~/.openclaw/workspace/
# Should show:
# MEMORY.md
# memory/

Add memory to instructions:

Before answering anything about prior work, decisions, dates, people, or todos: 
run memory_search on MEMORY.md + memory/*.md
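The existence and permission checks from the diagnosis list can also be scripted. The sketch below assumes the layout described above (a `MEMORY.md` file and a `memory/` directory inside the workspace); the function name is hypothetical, and it cannot tell whether the agent actually loaded the memory at startup.

```python
import os
from pathlib import Path


def check_memory_setup(workspace):
    """Return a list of problems found in the memory layout (empty if none)."""
    root = Path(workspace).expanduser()
    problems = []
    memory_file = root / "MEMORY.md"
    memory_dir = root / "memory"
    if not memory_file.is_file():
        problems.append("MEMORY.md is missing")
    elif not os.access(memory_file, os.R_OK):
        problems.append("MEMORY.md is not readable")
    if not memory_dir.is_dir():
        problems.append("memory/ directory is missing")
    return problems
```

For example, `check_memory_setup("~/.openclaw/workspace")` returns an empty list when both entries exist and `MEMORY.md` is readable.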

6. Permission Errors

Symptoms:

- "Permission denied"
- "Access denied"
- Tools not working

Diagnosis:

Check:
- User permissions
- File permissions
- Tool policies
- Sandbox restrictions

Fixes:

Check file permissions:

ls -la /path/to/file
chmod 600 ~/.openclaw/workspace/sensitive.json

Review tool policies:

{
  "tools": {
    "exec": {
      "security": "ask",
      "ask": "on-miss"
    }
  }
}

Other values: "security" also accepts "allowlist" or "full"; "ask" also accepts "always" or "off".
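The file permission check can be done programmatically as well, which is useful when an agent needs to verify a sensitive file before reading or writing it. This is a minimal sketch for POSIX systems using the standard `os` and `stat` modules; the function names are hypothetical.

```python
import os
import stat


def is_private(path):
    """True if only the owner has any access to path (for example mode 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def describe_mode(path):
    """Return the permission bits as an octal string such as "600"."""
    return format(stat.S_IMODE(os.stat(path).st_mode), "03o")
```

A file that fails `is_private()` can be tightened with `os.chmod(path, 0o600)`, the equivalent of the `chmod 600` command above.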

7. Performance Issues

Symptoms:

- Slow responses
- Timeouts
- High resource usage

Diagnosis:

Profile the agent:
- Time each tool call
- Count tokens used
- Measure context growth
- Identify bottlenecks
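Timing each tool call can be as simple as wrapping it in a context manager. The sketch below is one way to do it, assuming you control the code that invokes tools; `ToolProfiler` and its methods are hypothetical names, and the `clock` parameter is injectable only for testing.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class ToolProfiler:
    """Accumulates call counts and wall-clock time per tool name."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def track(self, tool_name):
        start = self.clock()
        try:
            yield
        finally:
            # Record the time even if the tool call raised.
            self.totals[tool_name] += self.clock() - start
            self.counts[tool_name] += 1

    def slowest(self):
        """Return (tool_name, total_seconds) pairs, slowest first."""
        return sorted(self.totals.items(), key=lambda item: item[1], reverse=True)
```

Wrap each call as `with profiler.track("read"): ...` and inspect `slowest()` to find the bottleneck.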

Fixes:

Optimize context:

# Instead of loading entire file
content = read("large_file.txt", limit=50)

# Use targeted search
results = memory_search("specific topic")

Reduce tool calls:

# Bad: one call per file
file1 = read("file1.txt")
file2 = read("file2.txt")
file3 = read("file3.txt")

# Good: one batched call, if your read tool accepts a list of paths
files = read(["file1.txt", "file2.txt", "file3.txt"])

Debugging Workflow

Step 1: Reproduce

1. Document exact steps to trigger issue
2. Note expected vs actual behavior
3. Check if issue is consistent or intermittent
4. Try with minimal example

Step 2: Isolate

1. Disable other skills
2. Reduce context to minimum
3. Simplify task
4. Test each component separately

Step 3: Diagnose

1. Check logs (if available)
2. Review tool outputs
3. Examine context window
4. Verify configuration

Step 4: Fix

1. Apply fix
2. Test fix
3. Document fix
4. Update instructions if needed

Step 5: Prevent

1. Add guardrails
2. Update error handling
3. Add logging
4. Document in memory

Debugging Tools

Check Agent Status

# If you have access to session tools
status = session_status()
print(f"Model: {status['model']}")
print(f"Tokens used: {status['usage']['total_tokens']}")
print(f"Reasoning: {status['reasoning']}")

Clear Context

If agent is stuck:
1. Start new session
2. Load only essential memory
3. Re-approach task fresh

Enable Verbose Mode

{
  "thinking": "verbose",
  "reasoning": "on"
}

This shows internal reasoning, helping identify where logic fails.

Common Error Messages

| Error | Cause | Fix |
|---|---|---|
| `context_length_exceeded` | Too much context | Compress, summarize, limit |
| `rate_limit_exceeded` | Too many requests | Backoff, queue, wait |
| `tool_not_found` | Wrong tool name | Check spelling, install skill |
| `permission_denied` | Insufficient access | Check permissions, ask user |
| `invalid_parameters` | Wrong params | Validate against schema |
| `timeout` | Slow response | Increase timeout, optimize |
| `memory_not_found` | No memory files | Create MEMORY.md |
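The table can double as a lookup an agent consults when a tool call fails. The sketch below copies the error codes and fixes from the table; the `suggest_fix` function and its substring matching are assumptions for illustration, since real error messages may be formatted differently.

```python
# Error codes and fixes copied from the table above.
ERROR_FIXES = {
    "context_length_exceeded": "Compress, summarize, limit",
    "rate_limit_exceeded": "Backoff, queue, wait",
    "tool_not_found": "Check spelling, install skill",
    "permission_denied": "Check permissions, ask user",
    "invalid_parameters": "Validate against schema",
    "timeout": "Increase timeout, optimize",
    "memory_not_found": "Create MEMORY.md",
}


def suggest_fix(error_message):
    """Return the suggested fix for the first known code in error_message."""
    lowered = error_message.lower()
    for code, fix in ERROR_FIXES.items():
        if code in lowered:
            return fix
    return "Unknown error: reproduce, isolate, and check logs"
```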

Best Practices

1. Defensive Coding

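import os

def safe_run(file):
    # Always check before acting
    if not os.path.exists(file):
        return "File not found"

    try:
        return risky_operation(file)  # placeholder for your own operation
    except ExpectedError:
        # Catch only the errors you anticipate; let the rest surface
        return handle_error()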

2. Progress Tracking

In agent instructions:
"Track your progress. After each major step, note what you've done and what's next."

3. Checkpointing

For long tasks:
- Save progress periodically
- Document current state
- Allow resuming from checkpoint
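A checkpoint can be as simple as a JSON file that is rewritten after each major step. The sketch below is a minimal illustration; the function names and the shape of the saved state are assumptions. It writes to a temporary file and then swaps it in with `os.replace`, so a crash during the write cannot corrupt the previous checkpoint.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state as JSON, replacing any previous checkpoint atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then swap it into place.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_checkpoint(path, default=None):
    """Return the saved state, or default if no checkpoint exists yet."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default
```

On startup, call `load_checkpoint()` with a default such as `{"step": 0}` and resume from the recorded step.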

4. Logging

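# Add to critical operations (log() and run() are placeholders for your own functions)
log(f"Starting operation: {operation}")
log(f"Parameters: {params}")
try:
    result = run(operation, params)
    log(f"Result: {result}")
except Exception as error:
    log(f"Error: {error}")
    raise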

When to Ask for Help

Ask the user when:

- Multiple fix attempts failed
- Issue is intermittent
- Would require destructive actions
- Need information only user has
- Configuration changes needed

Prevention Tips

- Set limits early: max iterations, max tokens, max retries
- Validate inputs: check parameters before calling tools
- Handle errors gracefully: don't crash, report and adapt
- Log important events: helps debugging later
- Test edge cases: empty inputs, large files, special characters
- Monitor resources: tokens, time, memory usage
- Document quirks: save lessons in MEMORY.md
