failover-gateway

Set up an active-passive failover gateway for OpenClaw. Deploy a standby node that auto-promotes when your primary goes down and auto-demotes when it recovers. Includes health monitor script, systemd services, channel splitting strategy, and step-by-step deployment guide. Use when you need high availability, disaster recovery, or redundancy for your OpenClaw instance.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "failover-gateway" with this command: npx skills add ember-claw/failover-gateway-pub

Failover Gateway for OpenClaw

Deploy a standby OpenClaw gateway that automatically takes over when your primary goes down. Active-passive design with auto-promotion and auto-demotion.

What You Get

  • ~30 second detection: the health monitor promotes the standby after three failed checks, and you are reachable again in about 60 seconds
  • Auto-recovery: when the primary comes back, the standby demotes itself
  • Zero split-brain: primary and standby use different channels (no duplicate messages)
  • Git-synced workspace: the standby pulls the latest workspace on promotion
  • $6-12/month: runs on a minimal VPS

Architecture

PRIMARY (your main VPS)          STANDBY (failover VPS)
├─ Full stack (all channels)     ├─ Single channel only (e.g., Discord DM)
├─ All cron jobs                 ├─ No crons (recovery mode)
├─ Gateway active ✅              ├─ Gateway stopped 💤
└─ Pushes workspace to git       └─ Health monitor watches primary
                                      │
                                      ├─ Primary healthy → sleep
                                      ├─ Primary down 30s → PROMOTE
                                      └─ Primary back → DEMOTE
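
The decision logic in the diagram can be sketched as a small pure function. This is an illustration only, not the bundled health-monitor.sh: the function name next_action and its argument order are hypothetical, and FAIL_THRESHOLD=3 is taken from the "3 failed checks" used later in this guide.

```shell
#!/usr/bin/env bash
# Sketch of the promote/demote decision; the real script may differ.
FAIL_THRESHOLD=3

# next_action <consecutive_failures> <promoted: yes|no> <check: up|down>
# Prints "<action> <new_failure_count>", where action is sleep, promote,
# or demote.
next_action() {
  local fails=$1 promoted=$2 check=$3
  if [ "$check" = "up" ]; then
    # Primary is healthy: reset the counter, demote if we had taken over.
    if [ "$promoted" = "yes" ]; then echo "demote 0"; else echo "sleep 0"; fi
    return
  fi
  fails=$((fails + 1))
  if [ "$promoted" = "no" ] && [ "$fails" -ge "$FAIL_THRESHOLD" ]; then
    echo "promote $fails"
  else
    echo "sleep $fails"
  fi
}

next_action 2 no down    # third consecutive failure: prints "promote 3"
next_action 3 yes up     # primary is back: prints "demote 0"
```

Keeping the decision separate from the side effects (starting or stopping the gateway) makes the thresholds easy to check without touching a live node.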

The key insight: split your channels between primary and standby. Don't share credentials — give each node exclusive ownership of different channels. This eliminates split-brain entirely.

Channel Split Examples

Setup               Primary              Standby
RC + Discord        Rocket.Chat (full)   Discord DM only
Discord + Telegram  Discord (full)       Telegram DM only
Slack + Discord     Slack (full)         Discord DM only

Your primary handles everything except the standby's channel. The standby is minimal recovery: just enough to stay reachable.
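
Before going live, it can help to confirm that no channel is enabled on both nodes. The sketch below assumes you list each node's enabled channels by hand as space-separated names; the function shared_channels and the two example lists are hypothetical and are not read from any OpenClaw config.

```shell
#!/usr/bin/env bash
# shared_channels "<primary channels>" "<standby channels>"
# Prints every channel name that appears in both lists, one per line.
shared_channels() {
  local p s
  for p in $1; do
    for s in $2; do
      if [ "$p" = "$s" ]; then echo "$p"; fi
    done
  done
}

# Example lists matching the "Discord + Telegram" row of the table.
primary="discord"
standby="telegram"

if [ -z "$(shared_channels "$primary" "$standby")" ]; then
  echo "ok: no shared channels"
else
  echo "conflict: a channel is enabled on both nodes"
fi
```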

Prerequisites

  • Primary OpenClaw instance running on a VPS
  • A second VPS for the standby ($6-12/mo, any provider)
  • Tailscale mesh network (or any VPN/private network)
  • Git repository for workspace sync (GitHub, GitLab, etc.)
  • A second messaging channel for the standby (different from primary)

Step-by-Step Deployment

Phase 1: Provision the Standby VPS

Any cheap VPS works. Recommended: 2GB RAM, Ubuntu 24.04.

# Harden the box
ufw allow 22/tcp
ufw enable
apt update && apt install -y fail2ban unattended-upgrades

# Create openclaw user
adduser openclaw --disabled-password
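usermod -aG sudo openclaw
# Note: the account has no password, so sudo will not authenticate for it
# until you set one (passwd openclaw) or add a sudoers rule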
# Copy your SSH key to openclaw user

# Install Tailscale
curl -fsSL https://tailscale.com/install.sh | sh
tailscale up --hostname=your-failover-name

Phase 2: Install OpenClaw

# As openclaw user
curl -fsSL https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
source ~/.bashrc
nvm install --lts
npm install -g openclaw

# Clone workspace
git clone <your-workspace-repo> ~/.openclaw/workspace

Phase 3: Failover Config

Create a minimal OpenClaw config on the standby. Only enable the standby channel:

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-opus-4-6",
        "fallbacks": ["anthropic/claude-sonnet-4-5"]
      },
      "workspace": "/home/openclaw/.openclaw/workspace"
    },
    "list": [{ "id": "main", "default": true }]
  },
  "channels": {
    "discord": {
      "enabled": true,
      "token": "<YOUR_DISCORD_BOT_TOKEN>",
      "dm": {
        "policy": "allowlist",
        "allowFrom": ["<YOUR_DISCORD_USER_ID>"]
      }
    }
  },
  "gateway": {
    "port": 18789,
    "mode": "local",
    "bind": "tailnet"
  }
}

Important: Disable this channel on your primary to avoid conflicts.

Test it works: openclaw gateway run — verify the bot connects and responds, then stop it.

Phase 4: Deploy Health Monitor

Copy the included scripts/health-monitor.sh to the standby:

sudo cp health-monitor.sh /usr/local/bin/openclaw-health-monitor.sh
sudo chmod +x /usr/local/bin/openclaw-health-monitor.sh

Edit the variables at the top:

  • PRIMARY_IP — your primary's Tailscale IP
  • PRIMARY_PORT — your primary's gateway port (default: 18789)
  • SECRETS_HOST — (optional) host to rsync secrets from on promotion
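
The bundled script is not reproduced on this page, so how it probes the primary is not specified here. One plausible check, sketched under the assumption that a successful TCP connect to PRIMARY_IP:PRIMARY_PORT counts as healthy, is shown below. The function name primary_reachable, the 3-second timeout, and the 127.0.0.1 placeholder are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Placeholder values; set these to your primary's Tailscale IP and gateway port.
PRIMARY_IP="${PRIMARY_IP:-127.0.0.1}"
PRIMARY_PORT="${PRIMARY_PORT:-18789}"

# primary_reachable <host> <port>
# Returns 0 if a TCP connection opens within 3 seconds, non-zero otherwise.
# Uses bash's /dev/tcp redirection, so neither curl nor nc is required.
primary_reachable() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

if primary_reachable "$PRIMARY_IP" "$PRIMARY_PORT"; then
  echo "primary reachable"
else
  echo "primary unreachable"
fi
```

A TCP connect only shows that something is listening on the port. If the gateway exposes an HTTP health endpoint, probing that would be a stronger signal.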

Create the systemd services:

/etc/systemd/system/openclaw-health-monitor.service

[Unit]
Description=OpenClaw Failover Health Monitor
After=network-online.target tailscaled.service
Wants=network-online.target

[Service]
Type=simple
ExecStart=/usr/local/bin/openclaw-health-monitor.sh
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

/etc/systemd/system/openclaw.service

[Unit]
Description=OpenClaw Gateway (Failover)
After=network-online.target tailscaled.service
Wants=network-online.target

[Service]
Type=simple
User=openclaw
Group=openclaw
WorkingDirectory=/home/openclaw/.openclaw/workspace
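# Phase 2 installs openclaw through nvm, so the binary lives under ~/.nvm, not
# /usr/bin. Replace <version> with the directory shown by `which openclaw`
# when run as the openclaw user.
ExecStart=/home/openclaw/.nvm/versions/node/<version>/bin/openclaw gateway run
Restart=on-failure
RestartSec=5
Environment=HOME=/home/openclaw
Environment=NODE_ENV=production
# npm launchers typically call node via PATH, which systemd does not pick up from nvm
Environment=PATH=/home/openclaw/.nvm/versions/node/<version>/bin:/usr/bin:/bin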

[Install]
WantedBy=multi-user.target

Enable the monitor (but NOT the gateway — the monitor starts it on promotion):

sudo systemctl daemon-reload
sudo systemctl enable openclaw-health-monitor
sudo systemctl start openclaw-health-monitor
# Do NOT enable openclaw.service — the monitor controls it

Phase 5: Disable Standby Channel on Primary

This is critical. Remove or disable the standby's channel from your primary config:

{
  "channels": {
    "discord": { "enabled": false }
  }
}

Each node owns its channels exclusively. No sharing, no conflicts.

Phase 6: Test

# On primary — simulate failure
sudo systemctl stop openclaw-gateway  # or kill the process

# Watch the standby logs
journalctl -u openclaw-health-monitor -f

# Expected: 3 failed checks → PROMOTE → gateway starts → standby channel live

# On primary — recover
sudo systemctl start openclaw-gateway

# Expected: standby detects primary → DEMOTE → gateway stops

Failover Timeline

Time   Event
0s     Primary goes down
10s    First health check fails
20s    Second check fails
30s    Third check fails → PROMOTE
35s    Git pull, secrets sync
40s    Gateway starting
45s    Standby channel active
~60s   You're reachable again
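
The 30-second promote point in the timeline is the check interval multiplied by the failure threshold. A minimal worked calculation, assuming the 10-second spacing implied by the timeline (the variable name CHECK_INTERVAL and the function detection_seconds are hypothetical; FAIL_THRESHOLD is the name this guide uses):

```shell
#!/usr/bin/env bash
# detection_seconds <check_interval_s> <fail_threshold>
# Time from the outage to the promote decision, assuming the first failed
# check lands one full interval after the primary goes down.
detection_seconds() {
  echo $(( $1 * $2 ))
}

CHECK_INTERVAL=10   # seconds between checks, as implied by the timeline
FAIL_THRESHOLD=3    # three failed checks trigger promotion

echo "promote after ~$(detection_seconds "$CHECK_INTERVAL" "$FAIL_THRESHOLD")s"
echo "with threshold 5: ~$(detection_seconds "$CHECK_INTERVAL" 5)s"
```

The remaining time to about 60 seconds is the git pull, secrets sync, and gateway startup, which depend on your VPS and repository size.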

Edge Cases

Scenario               Result
Primary dies           Standby promotes in ~30-60s
Primary + standby die  You're offline (add a third node?)
Network partition      Standby may promote while primary is still running, but since they use different channels, no conflicts
Standby reboots        Health monitor auto-restarts (systemd), resumes watching
Primary flaps          Promote/demote cycles; health monitor handles it, but consider increasing FAIL_THRESHOLD
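
To see why a higher FAIL_THRESHOLD dampens flapping, the sketch below replays a made-up sequence of health-check results and counts how often the standby would promote. It assumes the monitor demotes on the first healthy check, as described in this guide; the function count_promotions and the sample sequence are hypothetical.

```shell
#!/usr/bin/env bash
# count_promotions <fail_threshold> <check results...>
# Replays health-check results (up/down) and prints the number of promotions.
count_promotions() {
  local threshold=$1; shift
  local fails=0 promoted=no promotions=0 result
  for result in "$@"; do
    if [ "$result" = "up" ]; then
      fails=0
      promoted=no          # primary is back: demote
    else
      fails=$((fails + 1))
      if [ "$promoted" = "no" ] && [ "$fails" -ge "$threshold" ]; then
        promoted=yes
        promotions=$((promotions + 1))
      fi
    fi
  done
  echo "$promotions"
}

# A primary that drops for three checks, recovers for one, and repeats.
flappy="down down down up down down down up"
count_promotions 3 $flappy   # prints 2
count_promotions 5 $flappy   # prints 0
```

The trade-off is that a larger threshold also delays promotion during a real outage by one check interval per extra failure required.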

Failback

Recovery is automatic. When the primary comes back:

  1. Health monitor detects primary healthy
  2. Stops the standby gateway
  3. Primary resumes serving its own channels
  4. Standby returns to watching

No manual intervention needed.

Cost

Component      Cost
VPS (2GB RAM)  $6-12/mo
Tailscale      Free (personal)
Git repo       Free
Total          $6-12/mo

Tips

  • Test monthly. Kill your primary, verify failover works. Trust but verify.
  • Keep the standby minimal. No crons, no extra channels. It's recovery mode.
  • Git push frequently. The standby's workspace is only as fresh as your last push.
  • Use Tailscale. It makes cross-VPS networking trivial. No firewall rules, no port forwarding.
  • Different bot tokens. If using Discord on both, you need two bot applications. Same bot token = last-connect-wins.
  • Monitor the monitor. Check journalctl -u openclaw-health-monitor occasionally to make sure it's running.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
