pilot-heartbeat-monitor

Detect agent failures and trigger automatic task redistribution or re-election. Use this skill when: 1. You need to detect when swarm members become unreachable 2. You want to trigger failover actions on agent failure 3. You need health monitoring for load balancing or leader election Do NOT use this skill when: - Agents can safely fail without recovery (fire-and-forget tasks) - Network partitions are rare and acceptable (use simpler ping checks)

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "pilot-heartbeat-monitor" with this command: npx skills add vulture-labs/pilot-heartbeat-monitor

pilot-heartbeat-monitor

Monitor agent health via periodic heartbeats and detect failures using timeout-based detection.

Commands

Send heartbeat to peers

pilotctl --json publish "registry-hostname" "heartbeat:$SWARM_NAME" \
  --data "{\"agent\":\"$AGENT_ID\",\"timestamp\":\"$(date -u +%s)\",\"status\":\"alive\"}"

Detect failed agents

TIMEOUT=30
CURRENT_TIME=$(date -u +%s)

FAILED_AGENTS=$(pilotctl --json inbox \
  | jq --arg now "$CURRENT_TIME" --arg timeout "$TIMEOUT" \
    '[.messages[] | select(.topic == "heartbeat:'$SWARM_NAME'") | {agent: .payload.agent, last_seen: .payload.timestamp}]
    | group_by(.agent)
    | map(select(($now | tonumber) - (map(.last_seen) | max) > ($timeout | tonumber)))
    | .[].agent')

Verify failure with direct ping

for agent in $FAILED_AGENTS; do
  AGENT_ADDR=$(pilotctl --json peers | jq -r '.[] | select(.node_id == "'$agent'") | .address')

  PING_RESULT=$(pilotctl --json ping "$AGENT_ADDR" --count 3 --timeout 2s)
  LOSS=$(echo "$PING_RESULT" | jq -r '.packet_loss_pct')

  if [ "$LOSS" = "100" ]; then
    echo "Agent $agent CONFIRMED DOWN"
  fi
done

Workflow Example

Health monitor for worker pool with automatic task redistribution:

#!/bin/bash
SWARM_NAME="worker-pool"
HEARTBEAT_INTERVAL=5
FAILURE_TIMEOUT=15
REGISTRY_HOST="registry.example.com"

# Background: Send own heartbeats
(
  while true; do
    pilotctl --json publish "$REGISTRY_HOST" "heartbeat:$SWARM_NAME" \
      --data "{\"agent\":\"$AGENT_ID\",\"timestamp\":\"$(date -u +%s)\"}"
    sleep $HEARTBEAT_INTERVAL
  done
) &

# Monitor peer heartbeats
while true; do
  CURRENT_TIME=$(date -u +%s)

  # Detect timeouts and trigger recovery
  # ...

  sleep $HEARTBEAT_INTERVAL
done

Dependencies

Requires pilot-protocol skill, jq, and bc.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

Jarvis Core

主动智能助手核心技能 v3.0,从工具型助手升级为有灵魂的伙伴。整合人格一致性、主动思考、情绪感知、置信度透明、心跳跟进、记忆学习闭环,以及策略追踪、关系模式命名、多角色交叉分析和情绪历史感知。

Registry SourceRecently Updated
1.7K1Profile unavailable
Automation

Agent Heartbeat

Unified heartbeat system for OpenClaw agents. Runs parallel health checks, data collectors, and state monitors in one command. Returns a single actionable su...

Registry SourceRecently Updated
1910Profile unavailable
Automation

一步完成进化

Use when you need to stand up or standardize a fresh OpenClaw setup as the Fire Dragon Fruit Architecture: one strong main, one isolated rescue, layered file...

Registry SourceRecently Updated
2650Profile unavailable
Automation

Ghosting-Free Dating. 消失不回。Anti-ghosting.

Ghosting-free dating for AI agents — no ghosting, no disappearing, no silence. Anti-ghosting connections with presence tracking, ghosting-proof conversations...

Registry SourceRecently Updated
950Profile unavailable