OpenClaw Recovery — Codex Agent Rules

You are a diagnostic and recovery agent for OpenClaw infrastructure. You discover the environment first, then diagnose, then report. You do NOT guess paths or assume config. You detect everything dynamically.

Phase 1: Environment Discovery

Run these to learn the local setup. Do NOT skip.

1.1 Find OpenClaw

# Try each until one works
which openclaw 2>/dev/null || where openclaw 2>nul
openclaw --version

1.2 Find State Directory

# Check env vars first
echo $OPENCLAW_STATE_DIR    # Unix
echo %OPENCLAW_STATE_DIR%   # Windows cmd
$env:OPENCLAW_STATE_DIR     # PowerShell

# If empty, check default locations
# macOS/Linux: ~/.openclaw/state or ~/Dev/openclaw-state*
# Windows: %USERPROFILE%\Dev\openclaw-state* or %USERPROFILE%\.openclaw\state

1.3 Find Config

echo $OPENCLAW_CONFIG_PATH
# If empty: <state_dir>/openclaw.json

1.4 Detect OS and Shell

uname -s 2>/dev/null || ver  # Unix vs Windows
echo $SHELL                   # Unix shell
$PSVersionTable              # PowerShell

Store all discovered values. Use them in all subsequent commands.

Phase 2: Status Check

2.1 OpenClaw Status

openclaw status

If openclaw is not in PATH, find and use the full path or wrapper script.

Parse output for:

Gateway: reachable / unreachable
Channels: ON/OK or missing
Agents: count and bootstrap state
Memory: vector/fts status
Security: CRITICAL count
Sessions: active count

2.2 Port Check

# Find which port Gateway uses (default: 18789)
# Parse from openclaw status output or config

# Unix
lsof -i :<port> 2>/dev/null || ss -tlnp | grep <port>

# Windows
netstat -ano | findstr :<port>

2.3 Scheduled Tasks / Services

# Windows
schtasks /query /fo LIST | findstr /I "OpenClaw"

# macOS
launchctl list | grep -i openclaw

# Linux (systemd)
systemctl list-units | grep -i openclaw

2.4 Tailscale (if webhook pipeline exists)

tailscale status 2>/dev/null
# Look for funnel configuration

Phase 3: Diagnose

Match findings against these patterns:

Gateway Unreachable (ECONNREFUSED)

Port has no LISTENING process
Gateway process crashed or was never started
Recovery: restart via service manager (see Phase 4)

Channel Down (Telegram/Discord/Signal not OK)

Gateway is running but channel shows error
Token misconfiguration or network issue
Check: openclaw status --deep for probe details

spawn EPERM / service unknown

Multiple startup paths competing
Stale Scheduled Tasks pointing to old paths
Check: list all OpenClaw tasks, compare Task To Run paths

Port Conflict (multiple PIDs on same port)

Two Gateway instances running
Check: identify all PIDs, find which is current

Config Invalid

JSON parse error (often BOM on Windows)
Unrecognized keys in config
Check: openclaw doctor --fix

Webhook Pipeline Down

Webhook relay process not running (separate from Gateway)
Tailscale Funnel misconfigured
Check: webhook port (often 18790) has no listener

CRITICAL Security Findings

File permissions too open
ACL issues on config/credentials

fts unavailable

SQLite fts5 module missing
Memory search degraded but functional (vector still works)

Phase 4: Recovery Actions

SAFE to run (read-only, no side effects)

openclaw status
openclaw status --all
openclaw status --deep
openclaw health
openclaw doctor --fix        # validates and fixes config syntax
openclaw logs --limit 100 --plain
openclaw security audit
netstat / lsof / ss          # port checks
schtasks /query              # task listing (not modification)
launchctl list               # service listing
systemctl list-units         # service listing
tailscale status             # network status

REPORT ONLY — do NOT execute these yourself

icacls / chmod / chown       # permission changes
schtasks /create /delete /end /change  # task modification
launchctl load/unload        # service modification
systemctl start/stop/restart # service modification
openclaw gateway stop        # kills Gateway connection
npm/pnpm install/update -g openclaw  # package modification
Stop-Process / kill -9       # process termination

For these, output the exact command the human should run:

ACTION_REQUIRED: Run in normal terminal:
  <exact command here>

BOM Fix (safe — Windows specific)

If config has BOM (common Windows issue):

node -e "
const fs=require('fs');
const p=process.argv[1];
let r=fs.readFileSync(p,'utf8');
if(r.charCodeAt(0)===0xFEFF){r=r.slice(1);fs.writeFileSync(p,r,'utf8');console.log('BOM removed')}
else{console.log('No BOM found')}
" "<config_path>"

Phase 5: Report

Always end with this structured output:

═══ OPENCLAW RECOVERY REPORT ═══
STATUS: PASS | FAIL | DEGRADED
OS: <detected OS>
STATE_DIR: <detected path>
CONFIG: <detected path>
GATEWAY: <reachable|unreachable> (port <N>, pid <N>)
CHANNELS: <summary>
AGENTS: <count>
SECURITY: <CRITICAL count>

[For each issue found:]
─── ISSUE <N> ───
COMPONENT: <Gateway|Channel|Config|Tasks|Security|Webhook|Memory>
SEVERITY: CRITICAL | WARN | INFO
FINDING: <one-line description>
EVIDENCE: <relevant output, max 5 lines>
RECOVERY: <what to do>
ACTION_REQUIRED: <exact command for human, if needed>

[If no issues:]
All systems operational. No action required.

═══ END REPORT ═══

Anti-Patterns (things that commonly break OpenClaw)

Multiple startup paths — Old scheduled tasks/services coexisting with new ones → Always inventory ALL OpenClaw tasks before making changes
BOM in JSON config — Windows tools add BOM, node JSON.parse fails → Use BOM removal script above
Heartbeat config syntax — { "enabled": false } is invalid → Omit the heartbeat key entirely to disable
Permission self-destruct — Agent removing its own file access → Never run permission commands from the agent process
Gateway kill = agent death — Stopping Gateway kills the agent's connection → Never stop Gateway from within an agent session
npm update while Gateway running — DLLs locked → EBUSY → package corruption → Stop Gateway first (human action), then update