OpenClaw Network Diagnostics
Overview
Run a pure network diagnostic worker from the CLI to continuously monitor connectivity between:
- the OpenClaw runtime host
- the Telegram Bot API (`api.telegram.org`)
- a personal Telegram client approximation via delivery verification cycles
Keep diagnostics isolated from OpenClaw LLM flow:
- Use no LLM calls.
- Consume no AI tokens.
- Run in independent async worker loops.
Skill Files
- `scripts/netdiag.py`: standalone CLI worker (run/start/stop/status/validate-config)
- `references/config.example.json`: complete example configuration
- `references/example-log-entries.jsonl`: sample structured JSON logs
- `references/openclaw-integration.md`: integration patterns with pros/cons
- `references/ai-log-analysis.md`: workflow for later AI-based log analysis
Prerequisites
Install and verify:
- Python 3.11+
- macOS networking tools: `dig`, `ping`, `traceroute`
- Telegram bot token and personal chat id
Install
From skill root:
cd /Users/ivanbelugin/Documents/Connection\ Monitoring\ System/openclaw-network-diagnostics
python3 scripts/netdiag.py validate-config --config references/config.example.json
Create a real config file from the example and set real credentials:
cp references/config.example.json config.json
Then edit config.json:
- `telegram.bot_token`
- `telegram.personal_chat_id`
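A minimal `config.json` shape consistent with the keys referenced throughout this document; values other than the documented defaults (ping interval, timeouts, retries, anomaly threshold, 500 MB total cap) are placeholder assumptions, and `references/config.example.json` is the authoritative schema:

```json
{
  "telegram": {
    "bot_token": "123456:REPLACE_ME",
    "personal_chat_id": 123456789
  },
  "intervals_sec": { "ping": 30, "traceroute": 600, "mtu_test": 900, "dns_reresolve": 300 },
  "timeouts_ms": { "connect": 4000, "request": 10000 },
  "retry": { "max_retries": 2, "backoff_base_ms": 500 },
  "delivery_verification": { "mode": "bot_api_ack" },
  "diagnostics": { "latency_anomaly_threshold_ms": 1200 },
  "logging": {
    "max_file_size_mb": 50,
    "max_total_size_mb": 500,
    "summary_file_path": "./logs/summary.json",
    "redact_sensitive_fields": true
  }
}
```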
Run Model
Foreground mode (manual stop via Ctrl+C)
python3 scripts/netdiag.py run --config config.json --pid-file ./logs/netdiag.pid
Behavior:
- Start manually from CLI.
- Run continuously until manual stop.
- Print JSON summary to stdout on stop.
- Save summary to `logging.summary_file_path`.
Background mode (non-blocking service)
python3 scripts/netdiag.py start --config config.json --pid-file ./logs/netdiag.pid
python3 scripts/netdiag.py status --pid-file ./logs/netdiag.pid
python3 scripts/netdiag.py stop --pid-file ./logs/netdiag.pid
Use this mode to avoid blocking the OpenClaw main thread.
Monitoring Behavior
Every `intervals_sec.ping` seconds (default 30), perform an active cycle:
- Resolve DNS with a TTL snapshot (system + public resolvers).
- Send a Bot API probe (`getMe`) and measure round-trip latency.
- Run a delivery verification cycle (`sendMessage` + selected ack mode).
- Run a packet-loss probe (`ping`) and log packet-loss indicators.
- Update outage/recovery and anomaly counters.
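The active cycle above can be sketched as a loop over pluggable probes. The probe names, injection pattern, and result shape here are assumptions of this sketch, not the internals of `scripts/netdiag.py`:

```python
import time

def active_cycle(probes, anomaly_threshold_ms=1200):
    """Run one diagnostic cycle: call each probe, time it, flag anomalies.

    `probes` maps a probe name to a zero-arg callable returning True on
    success; real probes (DNS resolve, Bot API getMe, packet-loss ping)
    would be injected by the worker.
    """
    results = {}
    anomalies = 0
    for name, probe in probes.items():
        start = time.monotonic()
        try:
            ok = probe()
        except Exception:
            # Socket errors, timeouts, etc. count as a failed probe.
            ok = False
        latency_ms = (time.monotonic() - start) * 1000
        if ok and latency_ms > anomaly_threshold_ms:
            anomalies += 1
        results[name] = {"ok": ok, "latency_ms": round(latency_ms, 1)}
    return results, anomalies
```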
Additional periodic diagnostics:
- Traceroute (`intervals_sec.traceroute`)
- MTU discovery via DF ping binary search (`intervals_sec.mtu_test`)
- DNS re-resolution (`intervals_sec.dns_reresolve`)
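The DF ping binary search can be sketched as follows. `df_ping` is a hypothetical callable (e.g. wrapping macOS `ping -D -s <size>`), and the 576/1500 bounds are conventional assumptions, not values taken from the shipped worker:

```python
def discover_mtu(df_ping, lo=576, hi=1500):
    """Binary-search the largest payload that passes a don't-fragment ping.

    `df_ping(size)` returns True when a DF-flagged ping of `size` bytes
    succeeds. Returns None when even the floor size fails.
    """
    if not df_ping(lo):
        return None  # even the minimum fails; path is broken
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if df_ping(mid):
            lo = mid  # mid fits; search upward
        else:
            hi = mid - 1  # mid fragments; search downward
    return lo
```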
Delivery Verification Modes
Set `delivery_verification.mode`:
`bot_api_ack` (default)
- Confirm only Bot API acceptance (`sendMessage` success).
- Lowest overhead.
- Does not prove handset render/read.
`user_reply_ack`
- Wait for a user reply via `getUpdates`.
- Better approximation of “message reached client and user interacted”.
- Requires manual interaction.
`callback_ack`
- Send an inline button and wait for a callback query.
- Structured acknowledgement event.
- Requires a button tap.
Read confirmation note:
- Telegram Bot API does not expose direct read receipts for bot messages.
- `user_reply_ack`/`callback_ack` are practical approximations.
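One way the ack modes could map to `sendMessage` payloads. The field names follow the public Telegram Bot API (inline keyboards via `reply_markup`), while `build_probe_message` and the `probe_id` wiring are hypothetical, for illustration only:

```python
import json

def build_probe_message(mode, chat_id, probe_id):
    """Build a sendMessage payload for a delivery verification probe.

    For `callback_ack`, attach an inline button whose callback_data
    carries the probe id, so the callback query can be correlated.
    """
    payload = {"chat_id": chat_id, "text": f"netdiag probe {probe_id}"}
    if mode == "callback_ack":
        payload["reply_markup"] = json.dumps({
            "inline_keyboard": [[
                {"text": "ack", "callback_data": f"ack:{probe_id}"}
            ]]
        })
    return payload
```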
Default Tuning (Recommended)
- `timeouts_ms.connect`: 4000
- `timeouts_ms.request`: 10000
- `retry.max_retries`: 2
- `retry.backoff_base_ms`: 500
- `diagnostics.latency_anomaly_threshold_ms`: 1200
Rationale:
- Catch transient failures without hiding persistent outages.
- Limit retry storm risk during throttling/rate-limit events.
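With these defaults, a failing request is retried twice, after delays of 500 ms and then 1000 ms. A minimal sketch of that exponential backoff, assuming a pluggable `sleep` and retry on `OSError` only:

```python
import time

def with_retries(op, max_retries=2, backoff_base_ms=500, sleep=time.sleep):
    """Call `op`, retrying on OSError with exponential backoff.

    Delays double per attempt: base, 2*base, ... Capping retries at 2
    limits retry-storm risk during throttling events.
    """
    for attempt in range(max_retries + 1):
        try:
            return op()
        except OSError:
            if attempt == max_retries:
                raise  # persistent outage: surface the error
            sleep(backoff_base_ms * (2 ** attempt) / 1000)
```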
Logging Model
Write JSON lines to rotating files with total budget cap.
Every record contains these required fields:
- millisecond UTC timestamp
- source/destination ip + ports
- dns result snapshot (with TTL)
- tls metadata (version, cipher, handshake duration, session reuse heuristic)
- http request/response headers and status
- payload bytes sent/received
- round-trip latency
- tcp state
- retries/timeouts/socket errors
- packet-loss indicator (when probe executed)
- connection reset flag
- rate-limit metadata
- exception stacktrace
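A hedged sketch of assembling one such JSONL record. The field names here are illustrative placeholders; `references/example-log-entries.jsonl` is the authoritative reference for the real schema:

```python
import json
from datetime import datetime, timezone

def make_log_record(**fields):
    """Build one JSON line with a millisecond UTC timestamp.

    Unset fields default to None/0 so every record carries the full
    required-field set even when a given probe did not run.
    """
    record = {
        "ts": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "src_ip": None, "dst_ip": None, "dst_port": None,
        "dns": None, "tls": None, "http_status": None,
        "bytes_sent": 0, "bytes_received": 0,
        "latency_ms": None, "tcp_state": None,
        "retries": 0, "timeouts": 0, "socket_errors": 0,
        "packet_loss_pct": None, "connection_reset": False,
        "rate_limit": None, "stacktrace": None,
    }
    record.update(fields)
    return json.dumps(record)
```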
Log rotation:
- per-file size: `logging.max_file_size_mb`
- total cap: `logging.max_total_size_mb` (set to `500` for your requirement)
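The total budget cap can be enforced by deleting the oldest rotated files first. A sketch only, assuming mtime order reflects rotation order; the shipped worker may implement this differently:

```python
import os

def enforce_total_budget(log_dir, max_total_size_mb):
    """Delete oldest files in `log_dir` until total size fits the cap."""
    cap = max_total_size_mb * 1024 * 1024
    files = sorted(
        (os.path.join(log_dir, f) for f in os.listdir(log_dir)),
        key=os.path.getmtime,  # oldest first
    )
    total = sum(os.path.getsize(f) for f in files)
    while files and total > cap:
        oldest = files.pop(0)
        total -= os.path.getsize(oldest)
        os.remove(oldest)
```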
Sensitive data handling:
- enable/disable redaction via `logging.redact_sensitive_fields`
OpenClaw Integration Options
Option A: External process (recommended)
Use the `start`/`stop`/`status` commands from OpenClaw task hooks.
Pros:
- Strong isolation from OpenClaw runtime
- Non-blocking by design
- Independent restart and fault boundaries
Cons:
- Requires pid-file lifecycle management
Option B: In-process task
Import and run the worker inside the OpenClaw loop.
Pros:
- Single-process deployment
Cons:
- Faults can impact OpenClaw main runtime
- Weaker isolation for network-heavy diagnostics
Use Option A by default for production monitoring.
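A task hook might build and spawn the CLI like this. The flag-per-subcommand mapping is inferred from the commands shown earlier, and the paths are examples:

```python
import subprocess

def netdiag_cmd(action, config="config.json", pid_file="./logs/netdiag.pid"):
    """Build the CLI invocation an OpenClaw task hook would spawn."""
    cmd = ["python3", "scripts/netdiag.py", action]
    if action in ("run", "start", "validate-config"):
        cmd += ["--config", config]
    if action != "validate-config":
        cmd += ["--pid-file", pid_file]
    return cmd

def start_diagnostics():
    # `start` daemonizes, so this returns promptly without blocking.
    return subprocess.run(netdiag_cmd("start"), check=True)
```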
Stop and Summary
On manual stop (SIGINT/SIGTERM) the worker:
- Flushes final metrics
- Prints summary JSON to stdout
- Saves summary JSON to `logging.summary_file_path`
Summary fields:
- total runtime
- total pings
- failed pings
- average latency
- max latency
- connection drops
- dns changes detected
- mtu changes detected
- anomaly count
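A sketch of how the latency-related summary fields could be derived from collected samples. The `(ok, latency_ms)` sample shape and field names are assumptions; counters such as connection drops and DNS/MTU changes would be tracked separately by the worker:

```python
def build_summary(samples, started_at, stopped_at):
    """Compute stop-time summary stats from (ok, latency_ms) samples."""
    latencies = [ms for ok, ms in samples if ok]
    failed = sum(1 for ok, _ in samples if not ok)
    avg = round(sum(latencies) / len(latencies), 1) if latencies else None
    return {
        "runtime_sec": round(stopped_at - started_at, 3),
        "total_pings": len(samples),
        "failed_pings": failed,
        "avg_latency_ms": avg,
        "max_latency_ms": max(latencies) if latencies else None,
    }
```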
Analyze Logs Later with AI Tools
Use `references/ai-log-analysis.md`.
Recommended flow:
- Slice the incident window from `logs/netdiag.jsonl`
- Compute quick counters locally
- Feed focused window + summary into ChatGPT Codex with structured prompts
- Ask for timeline, root-cause segmentation, anomaly clusters, and config recommendations
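Slicing the incident window can be as simple as filtering by the record timestamp. This sketch assumes the `ts` field format described in the logging model; same-format ISO-8601 strings compare correctly as plain strings:

```python
import json

def slice_window(jsonl_lines, start_ts, end_ts):
    """Return parsed records whose `ts` falls inside [start_ts, end_ts]."""
    out = []
    for line in jsonl_lines:
        rec = json.loads(line)
        if start_ts <= rec["ts"] <= end_ts:
            out.append(rec)
    return out
```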