OpenClaw Network Diagnostics
Overview
Run a pure network diagnostic worker from the CLI to continuously monitor connectivity between:
- the OpenClaw runtime host
- the Telegram Bot API (`api.telegram.org`)
- a personal Telegram client approximation via delivery verification cycles
Keep diagnostics isolated from OpenClaw LLM flow:
- Use no LLM calls.
- Consume no AI tokens.
- Run in independent async worker loops.
Skill Files
- `scripts/netdiag.py`: standalone CLI worker (run/start/stop/status/validate-config)
- `references/config.example.json`: complete example configuration
- `references/example-log-entries.jsonl`: sample structured JSON logs
- `references/openclaw-integration.md`: integration patterns with pros/cons
- `references/ai-log-analysis.md`: workflow for later AI-based log analysis
Prerequisites
Install and verify:
- Python 3.11+
- macOS networking tools: `dig`, `ping`, `traceroute`
- Telegram bot token and personal chat id
Install
From skill root:
cd /Users/ivanbelugin/Documents/Connection\ Monitoring\ System/openclaw-network-diagnostics
python3 scripts/netdiag.py validate-config --config references/config.example.json
Create a real config file from the example and set real credentials:
cp references/config.example.json config.json
Then edit config.json:
- `telegram.bot_token`
- `telegram.personal_chat_id`
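A minimal `config.json` shape consistent with the keys referenced throughout this document; values other than the documented defaults (ping interval, timeouts, retries, anomaly threshold, 500 MB total cap) are placeholder assumptions, and `references/config.example.json` is the authoritative schema:

```json
{
  "telegram": {
    "bot_token": "123456:REPLACE_ME",
    "personal_chat_id": 123456789
  },
  "intervals_sec": { "ping": 30, "traceroute": 600, "mtu_test": 900, "dns_reresolve": 300 },
  "timeouts_ms": { "connect": 4000, "request": 10000 },
  "retry": { "max_retries": 2, "backoff_base_ms": 500 },
  "delivery_verification": { "mode": "bot_api_ack" },
  "diagnostics": { "latency_anomaly_threshold_ms": 1200 },
  "logging": {
    "max_file_size_mb": 50,
    "max_total_size_mb": 500,
    "summary_file_path": "./logs/summary.json",
    "redact_sensitive_fields": true
  }
}
```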
Run Model
Foreground mode (manual stop via Ctrl+C)
python3 scripts/netdiag.py run --config config.json --pid-file ./logs/netdiag.pid
Behavior:
- Start manually from CLI.
- Run continuously until manual stop.
- Print JSON summary to stdout on stop.
- Save summary to `logging.summary_file_path`.
Background mode (non-blocking service)
python3 scripts/netdiag.py start --config config.json --pid-file ./logs/netdiag.pid
python3 scripts/netdiag.py status --pid-file ./logs/netdiag.pid
python3 scripts/netdiag.py stop --pid-file ./logs/netdiag.pid
Use this mode to avoid blocking the OpenClaw main thread.
Monitoring Behavior
Every `intervals_sec.ping` seconds (default 30), perform an active cycle:
- Resolve DNS with a TTL snapshot (system + public resolvers).
- Send a Bot API probe (`getMe`) and measure round-trip latency.
- Run a delivery verification cycle (`sendMessage` + selected ack mode).
- Run a packet-loss probe (`ping`) and log packet-loss indicators.
- Update outage/recovery and anomaly counters.
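The active cycle above can be sketched as a loop over pluggable probes. The probe names, injection pattern, and result shape here are assumptions of this sketch, not the internals of `scripts/netdiag.py`:

```python
import time

def active_cycle(probes, anomaly_threshold_ms=1200):
    """Run one diagnostic cycle: call each probe, time it, flag anomalies.

    `probes` maps a probe name to a zero-arg callable returning True on
    success; real probes (DNS resolve, Bot API getMe, packet-loss ping)
    would be injected by the worker.
    """
    results = {}
    anomalies = 0
    for name, probe in probes.items():
        start = time.monotonic()
        try:
            ok = probe()
        except Exception:
            # Socket errors, timeouts, etc. count as a failed probe.
            ok = False
        latency_ms = (time.monotonic() - start) * 1000
        if ok and latency_ms > anomaly_threshold_ms:
            anomalies += 1
        results[name] = {"ok": ok, "latency_ms": round(latency_ms, 1)}
    return results, anomalies
```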
Additional periodic diagnostics:
- Traceroute (`intervals_sec.traceroute`)
- MTU discovery via DF ping binary search (`intervals_sec.mtu_test`)
- DNS re-resolution (`intervals_sec.dns_reresolve`)
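The DF ping binary search can be sketched as follows. `df_ping` is a hypothetical callable (e.g. wrapping macOS `ping -D -s <size>`), and the 576/1500 bounds are conventional assumptions, not values taken from the shipped worker:

```python
def discover_mtu(df_ping, lo=576, hi=1500):
    """Binary-search the largest payload that passes a don't-fragment ping.

    `df_ping(size)` returns True when a DF-flagged ping of `size` bytes
    succeeds. Returns None when even the floor size fails.
    """
    if not df_ping(lo):
        return None  # even the minimum fails; path is broken
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if df_ping(mid):
            lo = mid  # mid fits; search upward
        else:
            hi = mid - 1  # mid fragments; search downward
    return lo
```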
Delivery Verification Modes
Set `delivery_verification.mode`:
`bot_api_ack` (default)
- Confirm only Bot API acceptance (`sendMessage` success).
- Lowest overhead.
- Does not prove handset render/read.
`user_reply_ack`
- Wait for a user reply via `getUpdates`.
- Better approximation of “message reached client and user interacted”.
- Requires manual interaction.
`callback_ack`
- Send an inline button and wait for a callback query.
- Structured acknowledgement event.
- Requires a button tap.
Read confirmation note:
- Telegram Bot API does not expose direct read receipts for bot messages.
- `user_reply_ack`/`callback_ack` are practical approximations.
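One way the ack modes could map to `sendMessage` payloads. The field names follow the public Telegram Bot API (inline keyboards via `reply_markup`), while `build_probe_message` and the `probe_id` wiring are hypothetical, for illustration only:

```python
import json

def build_probe_message(mode, chat_id, probe_id):
    """Build a sendMessage payload for a delivery verification probe.

    For `callback_ack`, attach an inline button whose callback_data
    carries the probe id, so the callback query can be correlated.
    """
    payload = {"chat_id": chat_id, "text": f"netdiag probe {probe_id}"}
    if mode == "callback_ack":
        payload["reply_markup"] = json.dumps({
            "inline_keyboard": [[
                {"text": "ack", "callback_data": f"ack:{probe_id}"}
            ]]
        })
    return payload
```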
Default Tuning (Recommended)
- `timeouts_ms.connect`: 4000
- `timeouts_ms.request`: 10000
- `retry.max_retries`: 2
- `retry.backoff_base_ms`: 500
- `diagnostics.latency_anomaly_threshold_ms`: 1200
Rationale:
- Catch transient failures without hiding persistent outages.
- Limit retry storm risk during throttling/rate-limit events.
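With these defaults, a failing request is retried twice, after delays of 500 ms and then 1000 ms. A minimal sketch of that exponential backoff, assuming a pluggable `sleep` and retry on `OSError` only:

```python
import time

def with_retries(op, max_retries=2, backoff_base_ms=500, sleep=time.sleep):
    """Call `op`, retrying on OSError with exponential backoff.

    Delays double per attempt: base, 2*base, ... Capping retries at 2
    limits retry-storm risk during throttling events.
    """
    for attempt in range(max_retries + 1):
        try:
            return op()
        except OSError:
            if attempt == max_retries:
                raise  # persistent outage: surface the error
            sleep(backoff_base_ms * (2 ** attempt) / 1000)
```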
Logging Model
Write JSON lines to rotating files with total budget cap.
Every record contains these required fields:
- millisecond UTC timestamp
- source/destination ip + ports
- dns result snapshot (with TTL)
- tls metadata (version, cipher, handshake duration, session reuse heuristic)
- http request/response headers and status
- payload bytes sent/received
- round-trip latency
- tcp state
- retries/timeouts/socket errors
- packet-loss indicator (when probe executed)
- connection reset flag
- rate-limit metadata
- exception stacktrace
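A hedged sketch of assembling one such JSONL record. The field names here are illustrative placeholders; `references/example-log-entries.jsonl` is the authoritative reference for the real schema:

```python
import json
from datetime import datetime, timezone

def make_log_record(**fields):
    """Build one JSON line with a millisecond UTC timestamp.

    Unset fields default to None/0 so every record carries the full
    required-field set even when a given probe did not run.
    """
    record = {
        "ts": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "src_ip": None, "dst_ip": None, "dst_port": None,
        "dns": None, "tls": None, "http_status": None,
        "bytes_sent": 0, "bytes_received": 0,
        "latency_ms": None, "tcp_state": None,
        "retries": 0, "timeouts": 0, "socket_errors": 0,
        "packet_loss_pct": None, "connection_reset": False,
        "rate_limit": None, "stacktrace": None,
    }
    record.update(fields)
    return json.dumps(record)
```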
Log rotation:
- per-file size: `logging.max_file_size_mb`
- total cap: `logging.max_total_size_mb` (set to `500` for your requirement)
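The total budget cap can be enforced by deleting the oldest rotated files first. A sketch only, assuming mtime order reflects rotation order; the shipped worker may implement this differently:

```python
import os

def enforce_total_budget(log_dir, max_total_size_mb):
    """Delete oldest files in `log_dir` until total size fits the cap."""
    cap = max_total_size_mb * 1024 * 1024
    files = sorted(
        (os.path.join(log_dir, f) for f in os.listdir(log_dir)),
        key=os.path.getmtime,  # oldest first
    )
    total = sum(os.path.getsize(f) for f in files)
    while files and total > cap:
        oldest = files.pop(0)
        total -= os.path.getsize(oldest)
        os.remove(oldest)
```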
Sensitive data handling:
- enable/disable redaction via `logging.redact_sensitive_fields`
OpenClaw Integration Options
Option A: External process (recommended)
Use the `start`/`stop`/`status` commands from OpenClaw task hooks.
Pros:
- Strong isolation from OpenClaw runtime
- Non-blocking by design
- Independent restart and fault boundaries
Cons:
- Requires pid-file lifecycle management
Option B: In-process task
Import and run the worker inside the OpenClaw loop.
Pros:
- Single-process deployment
Cons:
- Faults can impact OpenClaw main runtime
- Weaker isolation for network-heavy diagnostics
Use Option A by default for production monitoring.
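A task hook might build and spawn the CLI like this. The flag-per-subcommand mapping is inferred from the commands shown earlier, and the paths are examples:

```python
import subprocess

def netdiag_cmd(action, config="config.json", pid_file="./logs/netdiag.pid"):
    """Build the CLI invocation an OpenClaw task hook would spawn."""
    cmd = ["python3", "scripts/netdiag.py", action]
    if action in ("run", "start", "validate-config"):
        cmd += ["--config", config]
    if action != "validate-config":
        cmd += ["--pid-file", pid_file]
    return cmd

def start_diagnostics():
    # `start` daemonizes, so this returns promptly without blocking.
    return subprocess.run(netdiag_cmd("start"), check=True)
```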
Stop and Summary
On manual stop (SIGINT/SIGTERM) the worker:
- Flushes final metrics
- Prints summary JSON to stdout
- Saves summary JSON to `logging.summary_file_path`
Summary fields:
- total runtime
- total pings
- failed pings
- average latency
- max latency
- connection drops
- dns changes detected
- mtu changes detected
- anomaly count
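A sketch of how the latency-related summary fields could be derived from collected samples. The `(ok, latency_ms)` sample shape and field names are assumptions; counters such as connection drops and DNS/MTU changes would be tracked separately by the worker:

```python
def build_summary(samples, started_at, stopped_at):
    """Compute stop-time summary stats from (ok, latency_ms) samples."""
    latencies = [ms for ok, ms in samples if ok]
    failed = sum(1 for ok, _ in samples if not ok)
    avg = round(sum(latencies) / len(latencies), 1) if latencies else None
    return {
        "runtime_sec": round(stopped_at - started_at, 3),
        "total_pings": len(samples),
        "failed_pings": failed,
        "avg_latency_ms": avg,
        "max_latency_ms": max(latencies) if latencies else None,
    }
```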
Analyze Logs Later with AI Tools
Use `references/ai-log-analysis.md`.
Recommended flow:
- Slice the incident window from `logs/netdiag.jsonl`
- Compute quick counters locally
- Feed focused window + summary into ChatGPT Codex with structured prompts
- Ask for timeline, root-cause segmentation, anomaly clusters, and config recommendations
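Slicing the incident window can be as simple as filtering by the record timestamp. This sketch assumes the `ts` field format described in the logging model; same-format ISO-8601 strings compare correctly as plain strings:

```python
import json

def slice_window(jsonl_lines, start_ts, end_ts):
    """Return parsed records whose `ts` falls inside [start_ts, end_ts]."""
    out = []
    for line in jsonl_lines:
        rec = json.loads(line)
        if start_ts <= rec["ts"] <= end_ts:
            out.append(rec)
    return out
```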