swarm-self-heal

Swarm reliability watchdog for OpenClaw: validates the gateway, its channels, and every lane, performs bounded recovery, and emits auditable receipts.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "swarm-self-heal" with this command: npx skills add tkuehnl/swarm-self-heal

When to use this skill

Use this skill when the user wants to:

  • Diagnose why a multi-agent swarm feels "stuck" or partially offline
  • Check gateway + channel + lane liveness in one run
  • Perform bounded auto-recovery (restart + retry only)
  • Capture auditable receipts for incident timelines
  • Keep a primary watchdog lane plus a backup lane in place

Commands

# Install/refresh watchdog scripts + cron wiring
bash skills/swarm-self-heal/scripts/setup.sh

# Run an immediate canary check
bash skills/swarm-self-heal/scripts/check.sh

# Run the watchdog directly (uses the deployed workspace path)
bash ~/.openclaw/workspace-studio/scripts/anvil_watchdog.sh

# Optional: raise the lane ping timeout (in seconds) for slower providers
PING_TIMEOUT_SECONDS=180 bash ~/.openclaw/workspace-studio/scripts/anvil_watchdog.sh
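The watchdog script itself is not reproduced on this page. The snippet below is only a sketch of how such a script could honor `PING_TIMEOUT_SECONDS`. It assumes the script reads the variable from the environment and enforces it with GNU coreutils `timeout`. The 60-second default and the helper names `resolve_ping_timeout` and `run_with_ping_timeout` are hypothetical and are not taken from `anvil_watchdog.sh`.

```bash
#!/usr/bin/env bash
# Sketch only: one way a watchdog could honor PING_TIMEOUT_SECONDS.
# Hypothetical: the 60-second default and both function names.

# Print the effective ping timeout, or fail if the override is not a positive integer.
resolve_ping_timeout() {
  local t="${PING_TIMEOUT_SECONDS:-60}"   # unset or empty -> hypothetical default
  if [[ ! "$t" =~ ^[1-9][0-9]*$ ]]; then
    echo "invalid PING_TIMEOUT_SECONDS: $t" >&2
    return 1
  fi
  printf '%s\n' "$t"
}

# Run a probe command under the resolved timeout.
# GNU `timeout` exits with status 124 when it kills a command for running too long.
run_with_ping_timeout() {
  local t
  t="$(resolve_ping_timeout)" || return 2
  timeout "$t" "$@"
}
```

For example, `PING_TIMEOUT_SECONDS=180 run_with_ping_timeout <probe command>` gives a slow provider three minutes before the sketch treats the probe as timed out.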

What it checks

  • Gateway health via openclaw health
  • Channel readiness via openclaw channels status --json --probe
  • Passive lane recency via openclaw status --json (compatible with the latest OpenClaw)
  • Active lane probes for main, builder-1, builder-2, reviewer, and designer, run only when a lane's passive recency is stale
  • Bounded recovery: a single restart pass, then a targeted re-probe of infrastructure failures
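The "probe only when stale" rule can be pictured as a small decision function. This sketch makes three assumptions:

  • The caller has already turned the passive `openclaw status --json` data into `lane=epoch_seconds` pairs. That JSON's schema is not documented here, so no parsing is shown.
  • The staleness threshold is a hypothetical 900-second `STALE_AFTER_SECONDS`.
  • A lane with no recency data at all counts as stale.

The lane names come from the list above.

```bash
#!/usr/bin/env bash
# Lanes named in the skill description.
LANES=(main builder-1 builder-2 reviewer designer)

# Hypothetical threshold: a lane idle longer than this gets an active probe.
STALE_AFTER_SECONDS="${STALE_AFTER_SECONDS:-900}"

# lane_decision NOW LAST_SEEN -> prints "fresh" or "probe" (both in epoch seconds).
lane_decision() {
  local now="$1" last_seen="$2"
  # Missing recency data counts as stale, so the lane is probed rather than assumed healthy.
  if [[ -z "$last_seen" ]] || (( now - last_seen > STALE_AFTER_SECONDS )); then
    echo probe
  else
    echo fresh
  fi
}

# last_seen_for LANE PAIR... -> prints the epoch from the first "LANE=epoch" pair, or nothing.
last_seen_for() {
  local lane="$1" pair
  shift
  for pair in "$@"; do
    if [[ "${pair%%=*}" == "$lane" ]]; then
      printf '%s\n' "${pair#*=}"
      return 0
    fi
  done
}

# plan_probes NOW PAIR... -> prints, one per line, the lanes that need an active probe.
plan_probes() {
  local now="$1" lane
  shift
  for lane in "${LANES[@]}"; do
    if [[ "$(lane_decision "$now" "$(last_seen_for "$lane" "$@")")" == probe ]]; then
      printf '%s\n' "$lane"
    fi
  done
}
```

Lanes judged `fresh` are reported from passive data alone. Only the printed lanes would be handed to an active probe, which keeps healthy runs cheap.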

Output contract

The watchdog output includes:

  • timestamp
  • targets
  • ok_agents
  • failed_agents
  • actions
  • VERDICT
  • RECEIPT
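The field list names what a run reports but not the exact layout. Below is a minimal sketch of how those fields could be assembled. It assumes the following, any of which may differ from the real output:

  • Fields are printed as `key: value` lines.
  • The verdict is a hypothetical `HEALTHY`/`DEGRADED` value.
  • The receipt is a hypothetical POSIX `cksum` CRC of the report body.

```bash
#!/usr/bin/env bash
# emit_report TIMESTAMP TARGETS OK_AGENTS FAILED_AGENTS ACTIONS
# List arguments are space-separated strings; an empty string is printed as "none"
# (except TARGETS, which is printed as given).
emit_report() {
  local ts="$1" targets="$2" ok="$3" failed="$4" actions="$5"
  local verdict body receipt

  # Hypothetical verdict rule: any failed agent downgrades the run.
  if [[ -z "$failed" ]]; then
    verdict=HEALTHY
  else
    verdict=DEGRADED
  fi

  body="$(printf 'timestamp: %s\ntargets: %s\nok_agents: %s\nfailed_agents: %s\nactions: %s\nVERDICT: %s' \
    "$ts" "$targets" "${ok:-none}" "${failed:-none}" "${actions:-none}" "$verdict")"

  # Hypothetical receipt: a CRC of the body, so an identical report always yields the same receipt.
  receipt="$(printf '%s' "$body" | cksum | cut -d ' ' -f 1)"
  printf '%s\nRECEIPT: %s\n' "$body" "$receipt"
}
```

A caller would typically stamp the run with `emit_report "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "main reviewer" "main" "reviewer" "restart"`. Because the receipt is derived from the body, anyone holding the report can recompute it to check that the recorded fields were not edited afterwards.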

Safety model

  • Bounded recovery only (single restart pass per run)
  • No destructive state wipes
  • No blind reinstall behavior
  • Recovery actions are explicit in output
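The bounded-recovery rule can be sketched as a function that restarts at most once per run and then re-probes only what failed. It performs no retry loop, no reinstall, and no state wipe. `PROBE_CMD` and `RESTART_CMD` are hypothetical hooks supplied by the caller. The restart command the skill actually runs is not shown on this page.

```bash
#!/usr/bin/env bash
# bounded_recover PROBE_CMD RESTART_CMD LANE...
# PROBE_CMD is called as "PROBE_CMD LANE" and must return 0 when the lane is healthy.
# RESTART_CMD is called at most once, with no arguments, and only if some lane failed.
# Prints every recovery action; returns 1 if any lane is still failing afterwards.
bounded_recover() {
  local probe="$1" restart="$2"
  shift 2
  local lane failed=() still=()

  # First pass: probe every lane and remember the failures.
  for lane in "$@"; do
    "$probe" "$lane" || failed+=("$lane")
  done

  # Nothing failed: take no recovery action at all.
  (( ${#failed[@]} == 0 )) && return 0

  # Exactly one restart pass per run, announced explicitly.
  echo "action: restart (failed: ${failed[*]})"
  "$restart" || echo "action: restart returned non-zero"

  # Targeted re-probe: only the lanes that failed the first pass.
  for lane in "${failed[@]}"; do
    if "$probe" "$lane"; then
      echo "action: recovered $lane"
    else
      still+=("$lane")
      echo "still_failed: $lane"
    fi
  done

  (( ${#still[@]} == 0 ))
}
```

Lanes that are still failing after the single pass are reported rather than retried, so a persistent fault surfaces to the operator instead of triggering a restart storm.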

Notes

  • The cron wiring sets both the primary and the backup watchdog lanes to the xhigh thinking level.
  • The Telegram target is derived from config when available; otherwise a safe fallback is used.
  • Healthy runs can be summarized as a single line to reduce operator noise.
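Two of these notes, the Telegram fallback and the one-line healthy summary, can be illustrated with small helpers. This is a sketch under stated assumptions:

  • The caller passes whatever Telegram target it found in config, possibly an empty string.
  • `local-log` is a hypothetical placeholder meaning "log instead of sending".
  • A report is made of `key: value` lines including `timestamp:`, `VERDICT:` and `RECEIPT:`, with `HEALTHY` as a hypothetical verdict value.

The target check accepts numeric chat ids (negative for groups) and `@channelusername` names, the two forms the Telegram Bot API takes for `chat_id`.

```bash
#!/usr/bin/env bash
# resolve_telegram_target CONFIGURED_VALUE
# Keeps a plausible Telegram chat id or @channel name; anything else falls back to the
# hypothetical "local-log" placeholder, so a bad config value never sends alerts astray.
resolve_telegram_target() {
  local v="$1"
  if [[ "$v" =~ ^-?[0-9]+$ || "$v" =~ ^@[A-Za-z0-9_]{5,}$ ]]; then
    printf '%s\n' "$v"
  else
    printf '%s\n' local-log
  fi
}

# summarize_run REPORT
# Collapses a HEALTHY report to one line; any other verdict passes through unchanged
# so operators still see every field when something is wrong.
summarize_run() {
  local report="$1" ts receipt
  if grep -q '^VERDICT: HEALTHY$' <<<"$report"; then
    ts="$(sed -n 's/^timestamp: //p' <<<"$report")"
    receipt="$(sed -n 's/^RECEIPT: //p' <<<"$report")"
    printf 'HEALTHY %s receipt=%s\n' "$ts" "$receipt"
  else
    printf '%s\n' "$report"
  fi
}
```

Keeping the receipt in the one-line form means a quiet healthy run can still be matched to its full record later.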

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills


Web3

Alephnet Node

A complete social/economic network for AI agents. Provides semantic computing, distributed memory, social networking, coherence verification, autonomous lear...

Automation

Keep Protocol

Signed Protobuf packets over TCP for AI agent-to-agent communication. Now with MCP tools for sub-second latency! Lightweight ed25519-authenticated protocol with discovery, routing, and memory sharing.

General

Gateway Watchdog Lite

Installs a macOS or Linux service that probes the OpenClaw gateway every 2 minutes and auto-recovers it on failure, sending Telegram alerts.
