openclaw-recovery-codex

OpenClaw Gateway recovery and infrastructure diagnostics for Codex agents. Use when Gateway is unreachable, Telegram/Discord/Signal channels are down, Scheduled Tasks are broken, webhook pipelines stopped working, or openclaw status shows errors. Works on Windows, macOS, and Linux. No prior OpenClaw knowledge required.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "openclaw-recovery-codex" with this command: npx skills add openclaw-recovery-codex

OpenClaw Recovery — Codex Agent Rules

You are a diagnostic and recovery agent for OpenClaw infrastructure. You discover the environment first, then diagnose, then report. You do NOT guess paths or assume config. You detect everything dynamically.

Phase 1: Environment Discovery

Run these to learn the local setup. Do NOT skip.

1.1 Find OpenClaw

# Run the line for your platform
which openclaw 2>/dev/null   # macOS/Linux
where.exe openclaw           # Windows (cmd or PowerShell)
openclaw --version

1.2 Find State Directory

# Check env vars first
echo $OPENCLAW_STATE_DIR    # Unix
echo %OPENCLAW_STATE_DIR%   # Windows cmd
$env:OPENCLAW_STATE_DIR     # PowerShell

# If empty, check default locations
# macOS/Linux: ~/.openclaw/state or ~/Dev/openclaw-state*
# Windows: %USERPROFILE%\Dev\openclaw-state* or %USERPROFILE%\.openclaw\state

1.3 Find Config

echo $OPENCLAW_CONFIG_PATH
# If empty: <state_dir>/openclaw.json
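The lookup order in 1.2 and 1.3 can be sketched in Node.js as below. This is a minimal sketch, not OpenClaw's own resolution logic: the function names `resolveStateDir` and `resolveConfigPath` are hypothetical, and the precedence among the default locations (`~/.openclaw/state` before `~/Dev/openclaw-state*`) is an assumption, since the text lists them without ranking.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Resolve the state directory: env var first, then the default locations
// listed in 1.2. `env`, `home`, and `fsApi` are injectable so the lookup
// can be exercised without touching the real machine.
function resolveStateDir(env = process.env, home = os.homedir(), fsApi = fs) {
  if (env.OPENCLAW_STATE_DIR) return env.OPENCLAW_STATE_DIR;

  // Assumption: ~/.openclaw/state is checked before ~/Dev/openclaw-state*.
  const candidates = [path.join(home, '.openclaw', 'state')];
  const devDir = path.join(home, 'Dev');
  try {
    // Expand the "openclaw-state*" wildcard by listing ~/Dev.
    for (const name of fsApi.readdirSync(devDir).sort()) {
      if (name.startsWith('openclaw-state')) {
        candidates.push(path.join(devDir, name));
      }
    }
  } catch {
    // ~/Dev does not exist or is unreadable: nothing to add.
  }
  return candidates.find((dir) => fsApi.existsSync(dir)) || null;
}

// Resolve the config path: env var first, else <state_dir>/openclaw.json.
function resolveConfigPath(env = process.env, stateDir = resolveStateDir(env)) {
  if (env.OPENCLAW_CONFIG_PATH) return env.OPENCLAW_CONFIG_PATH;
  return stateDir ? path.join(stateDir, 'openclaw.json') : null;
}
```

Calling `resolveConfigPath()` with no arguments reads the real environment. A `null` result means nothing was found and the path should be reported as unknown rather than guessed.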

1.4 Detect OS and Shell

uname -s 2>/dev/null || ver  # Unix vs Windows
echo $SHELL                   # Unix shell
$PSVersionTable              # PowerShell

Store all discovered values. Use them in all subsequent commands.

Phase 2: Status Check

2.1 OpenClaw Status

openclaw status

If openclaw is not in PATH, find and use the full path or wrapper script.

Parse output for:

  • Gateway: reachable / unreachable
  • Channels: ON/OK or missing
  • Agents: count and bootstrap state
  • Memory: vector/fts status
  • Security: CRITICAL count
  • Sessions: active count

2.2 Port Check

# Find which port Gateway uses (default: 18789)
# Parse from openclaw status output or config

# Unix
lsof -i :<port> 2>/dev/null || ss -tlnp | grep <port>

# Windows
netstat -ano | findstr :<port>
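To turn the Windows port check into a decision, the `netstat -ano` output can be parsed for listening PIDs. The sketch below assumes the usual English-locale column layout (protocol, local address, foreign address, state, PID); the function name `listeningPids` and the sample PIDs are hypothetical.

```javascript
// Return the unique PIDs listening on `port`, given text printed by
// `netstat -ano` on Windows. Expected line shape (English locale assumed):
//   TCP    0.0.0.0:18789    0.0.0.0:0    LISTENING    4312
function listeningPids(netstatOutput, port) {
  const pids = new Set();
  for (const line of netstatOutput.split(/\r?\n/)) {
    const cols = line.trim().split(/\s+/);
    if (cols.length < 5 || cols[0] !== 'TCP' || cols[3] !== 'LISTENING') continue;
    // Local address is "host:port"; IPv6 looks like "[::]:18789".
    const localPort = cols[1].slice(cols[1].lastIndexOf(':') + 1);
    if (localPort === String(port)) pids.add(Number(cols[4]));
  }
  return [...pids];
}

// Made-up sample output for illustration only.
const sample = [
  '  TCP    0.0.0.0:18789          0.0.0.0:0              LISTENING       4312',
  '  TCP    [::]:18789             [::]:0                 LISTENING       4312',
  '  TCP    127.0.0.1:18790        0.0.0.0:0              LISTENING       5120',
  '  TCP    10.0.0.5:51234         10.0.0.9:18789         ESTABLISHED     7001',
].join('\n');

console.log(listeningPids(sample, 18789)); // [ 4312 ]
console.log(listeningPids(sample, 18791)); // []
```

An empty result corresponds to the "Gateway Unreachable" pattern and more than one PID to the "Port Conflict" pattern in Phase 3.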

2.3 Scheduled Tasks / Services

# Windows
schtasks /query /fo LIST | findstr /I "OpenClaw"

# macOS
launchctl list | grep -i openclaw

# Linux (systemd)
systemctl list-units --type=service --all | grep -i openclaw

2.4 Tailscale (if webhook pipeline exists)

tailscale status 2>/dev/null
# Look for funnel configuration

Phase 3: Diagnose

Match findings against these patterns:

Gateway Unreachable (ECONNREFUSED)

  • Port has no LISTENING process
  • Gateway process crashed or was never started
  • Recovery: report the service-manager restart command for the human to run (restarts are REPORT ONLY; see Phase 4)

Channel Down (Telegram/Discord/Signal not OK)

  • Gateway is running but channel shows error
  • Token misconfiguration or network issue
  • Check: openclaw status --deep for probe details

spawn EPERM / service unknown

  • Multiple startup paths competing
  • Stale Scheduled Tasks pointing to old paths
  • Check: list all OpenClaw tasks (on Windows, schtasks /query /fo LIST /v, since Task To Run only appears in verbose output), compare Task To Run paths
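The comparison of Task To Run paths can be sketched as a small parser over `schtasks /query /fo LIST /v` output. It assumes the English field labels `TaskName` and `Task To Run`; the function name `openclawTasks`, the task names, and the paths in the sample are hypothetical.

```javascript
// Extract OpenClaw-related tasks from `schtasks /query /fo LIST /v` output.
// Each record contributes a "TaskName:" line followed later by "Task To Run:".
function openclawTasks(schtasksOutput) {
  const tasks = [];
  let current = null;
  for (const line of schtasksOutput.split(/\r?\n/)) {
    // Split on the first colon only, so "C:\..." values stay intact.
    const m = /^([^:]+):\s*(.*)$/.exec(line.trim());
    if (!m) continue;
    const [, key, value] = m;
    if (key === 'TaskName') {
      current = { name: value, run: null };
      tasks.push(current);
    } else if (key === 'Task To Run' && current) {
      current.run = value;
    }
  }
  return tasks.filter((t) => /openclaw/i.test(`${t.name} ${t.run || ''}`));
}

// Made-up sample output for illustration only.
const sampleTasks = [
  'HostName:      DESKTOP-1',
  'TaskName:      \\OpenClaw Gateway',
  'Status:        Ready',
  'Task To Run:   C:\\old\\openclaw\\gateway.cmd',
  '',
  'HostName:      DESKTOP-1',
  'TaskName:      \\OpenClaw Gateway v2',
  'Status:        Ready',
  'Task To Run:   C:\\new\\openclaw\\gateway.cmd',
  '',
  'HostName:      DESKTOP-1',
  'TaskName:      \\Backup',
  'Task To Run:   C:\\tools\\backup.exe',
].join('\n');

const found = openclawTasks(sampleTasks);
const distinctPaths = new Set(found.map((t) => t.run));
console.log(found.length, distinctPaths.size); // 2 2
```

More than one distinct path among OpenClaw tasks suggests competing startup paths. Which one is stale still has to be judged against the install location discovered in Phase 1.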

Port Conflict (multiple PIDs on same port)

  • Two Gateway instances running
  • Check: identify all PIDs, find which is current

Config Invalid

  • JSON parse error (often BOM on Windows)
  • Unrecognized keys in config
  • Check: openclaw doctor --fix
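Before changing anything, the two causes above (BOM versus a genuine parse error) can be told apart with a read-only check. The sketch below assumes the config is strict JSON, as the "JSON parse error" wording suggests; `checkConfigText` is a hypothetical helper, not part of the OpenClaw CLI.

```javascript
// Inspect config text without modifying the file.
// Reports whether a UTF-8 BOM is present and whether the remainder parses.
function checkConfigText(text) {
  const hasBom = text.charCodeAt(0) === 0xfeff;
  const body = hasBom ? text.slice(1) : text;
  try {
    JSON.parse(body);
    return { hasBom, parseError: null };
  } catch (err) {
    return { hasBom, parseError: err.message };
  }
}

console.log(checkConfigText('\uFEFF{"port": 18789}')); // BOM present, body parses
console.log(checkConfigText('{port: 18789}').parseError !== null); // true
```

In practice the input would come from `fs.readFileSync(configPath, 'utf8')`, which keeps a leading BOM in the returned string. Unrecognized keys are not covered here, since the list of valid keys is not given in this document.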

Webhook Pipeline Down

  • Webhook relay process not running (separate from Gateway)
  • Tailscale Funnel misconfigured
  • Check: webhook port (often 18790) has no listener

CRITICAL Security Findings

  • File permissions too open
  • ACL issues on config/credentials

fts unavailable

  • SQLite fts5 module missing
  • Memory search degraded but functional (vector still works)

Phase 4: Recovery Actions

SAFE to run (read-only, except openclaw doctor --fix, which may rewrite the config)

openclaw status
openclaw status --all
openclaw status --deep
openclaw health
openclaw doctor --fix        # validates config and fixes syntax (may modify the config file)
openclaw logs --limit 100 --plain
openclaw security audit
netstat / lsof / ss          # port checks
schtasks /query              # task listing (not modification)
launchctl list               # service listing
systemctl list-units         # service listing
tailscale status             # network status

REPORT ONLY — do NOT execute these yourself

icacls / chmod / chown       # permission changes
schtasks /create /delete /end /change  # task modification
launchctl load/unload        # service modification
systemctl start/stop/restart # service modification
openclaw gateway stop        # kills Gateway connection
npm/pnpm install/update -g openclaw  # package modification
Stop-Process / kill -9       # process termination

For these, output the exact command the human should run:

ACTION_REQUIRED: Run in normal terminal:
  <exact command here>
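The split between the two lists can be sketched as an allowlist check that an agent consults before running anything. The patterns below transcribe the lists above. Two things are assumptions of this sketch, not stated rules: the names `classifyCommand` and `actionRequired`, and the choice to treat any unlisted command as REPORT ONLY.

```javascript
// Commands the agent must never execute itself (from the REPORT ONLY list).
const REPORT_ONLY_PATTERNS = [
  /^(icacls|chmod|chown)\b/,
  /^schtasks\s.*\/(create|delete|end|change)\b/i,
  /^launchctl\s+(load|unload)\b/,
  /^systemctl\s+(start|stop|restart)\b/,
  /^openclaw\s+gateway\s+stop\b/,
  /^(npm|pnpm)\s+(install|update)\b.*\bopenclaw\b/,
  /^(Stop-Process|kill)\b/i,
];

// Commands from the SAFE list.
const SAFE_PATTERNS = [
  /^openclaw\s+(status|health)\b/,
  /^openclaw\s+doctor\s+--fix$/,
  /^openclaw\s+logs\b/,
  /^openclaw\s+security\s+audit\b/,
  /^(netstat|lsof|ss)\b/,
  /^schtasks\s+\/query\b/i,
  /^launchctl\s+list\b/,
  /^systemctl\s+list-units\b/,
  /^tailscale\s+status\b/,
];

// Only the leading command is inspected; pipelines and chained commands
// would need to be split and checked segment by segment first.
function classifyCommand(command) {
  const cmd = command.trim();
  if (REPORT_ONLY_PATTERNS.some((re) => re.test(cmd))) return 'REPORT_ONLY';
  if (SAFE_PATTERNS.some((re) => re.test(cmd))) return 'SAFE';
  return 'REPORT_ONLY'; // assumption: unknown commands default to caution
}

// Format a REPORT ONLY command in the ACTION_REQUIRED shape shown above.
function actionRequired(command) {
  return `ACTION_REQUIRED: Run in normal terminal:\n  ${command}`;
}

console.log(classifyCommand('openclaw status --deep')); // SAFE
console.log(classifyCommand('systemctl restart openclaw')); // REPORT_ONLY
```

Checking the REPORT ONLY patterns first means a command that matches both lists is never executed by the agent.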

BOM Fix (safe — Windows specific)

If config has BOM (common Windows issue):

node -e "
const fs=require('fs');
const p=process.argv[1];
let r=fs.readFileSync(p,'utf8');
if(r.charCodeAt(0)===0xFEFF){r=r.slice(1);fs.writeFileSync(p,r,'utf8');console.log('BOM removed')}
else{console.log('No BOM found')}
" "<config_path>"

Phase 5: Report

Always end with this structured output:

═══ OPENCLAW RECOVERY REPORT ═══
STATUS: PASS | FAIL | DEGRADED
OS: <detected OS>
STATE_DIR: <detected path>
CONFIG: <detected path>
GATEWAY: <reachable|unreachable> (port <N>, pid <N>)
CHANNELS: <summary>
AGENTS: <count>
SECURITY: <CRITICAL count>

[For each issue found:]
─── ISSUE <N> ───
COMPONENT: <Gateway|Channel|Config|Tasks|Security|Webhook|Memory>
SEVERITY: CRITICAL | WARN | INFO
FINDING: <one-line description>
EVIDENCE: <relevant output, max 5 lines>
RECOVERY: <what to do>
ACTION_REQUIRED: <exact command for human, if needed>

[If no issues:]
All systems operational. No action required.

═══ END REPORT ═══
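One way to keep the report shape consistent is to render it from a plain object. The sketch below follows the template above. The input field names (`stateDir`, `securityCritical`, `issues`, and so on) are hypothetical, the `n/a` placeholder for a missing PID is this sketch's own choice, and `status` is passed in by the caller because the rule for choosing PASS, FAIL, or DEGRADED is not spelled out here.

```javascript
// Render the structured report from a findings object.
function renderReport(r) {
  const gatewayState = r.gateway.reachable ? 'reachable' : 'unreachable';
  const lines = [
    '═══ OPENCLAW RECOVERY REPORT ═══',
    `STATUS: ${r.status}`,
    `OS: ${r.os}`,
    `STATE_DIR: ${r.stateDir}`,
    `CONFIG: ${r.configPath}`,
    `GATEWAY: ${gatewayState} (port ${r.gateway.port}, pid ${r.gateway.pid ?? 'n/a'})`,
    `CHANNELS: ${r.channels}`,
    `AGENTS: ${r.agents}`,
    `SECURITY: ${r.securityCritical}`,
    '',
  ];

  r.issues.forEach((issue, i) => {
    // The template caps evidence at 5 lines.
    const evidence = issue.evidence.split('\n').slice(0, 5).join('\n');
    lines.push(
      `─── ISSUE ${i + 1} ───`,
      `COMPONENT: ${issue.component}`,
      `SEVERITY: ${issue.severity}`,
      `FINDING: ${issue.finding}`,
      `EVIDENCE: ${evidence}`,
      `RECOVERY: ${issue.recovery}`,
    );
    if (issue.action) lines.push(`ACTION_REQUIRED: ${issue.action}`);
    lines.push('');
  });

  if (r.issues.length === 0) {
    lines.push('All systems operational. No action required.', '');
  }
  lines.push('═══ END REPORT ═══');
  return lines.join('\n');
}
```

Building the text in one place makes it harder to drop a required field or forget the closing marker when several issues are reported.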

Anti-Patterns (things that commonly break OpenClaw)

  1. Multiple startup paths — Old scheduled tasks/services coexisting with new ones → Always inventory ALL OpenClaw tasks before making changes

  2. BOM in JSON config — Windows tools add BOM, node JSON.parse fails → Use BOM removal script above

  3. Heartbeat config syntax: { "enabled": false } is invalid → Omit the heartbeat key entirely to disable

  4. Permission self-destruct — Agent removing its own file access → Never run permission commands from the agent process

  5. Gateway kill = agent death — Stopping Gateway kills the agent's connection → Never stop Gateway from within an agent session

  6. npm update while Gateway running — DLLs locked → EBUSY → package corruption → Stop Gateway first (human action), then update

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
