Job Execution Monitor

# Job Execution Monitor

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "Job Execution Monitor" with this command: npx skills add tradmangh/job-execution-monitor

Job Execution Monitor

Monitor scheduled jobs (cron) and alert when they fail or miss their schedule.

Install / Update (ClawHub)

Install:

clawhub install job-execution-monitor

Update:

clawhub update job-execution-monitor

When to Use

Use when the user asks to:

  • "monitor cron jobs"
  • "alert on job failures"
  • "check if jobs ran"
  • "job healthcheck"
  • "task surveillance"
  • "why didn't my scheduled job run?"

What It Does

Heartbeat-based monitoring:

  • Agent checks jobs during periodic heartbeat polls (~every 60min)
  • Uses cheapest LLM (Gemini Flash or Haiku) to minimize cost
  • Only sends alerts when problems detected
  • Tracks alert state to avoid spam

Cost: ~100k tokens/day (~48 checks × 2k tokens)

How It Works

  1. Configuration: Jobs to monitor are listed in job-execution-monitor.json
  2. Heartbeat check: Agent wakes every 15-30min, checks jobs every ~4th wake
  3. Detection: Compares last run time vs. expected schedule + tolerance
  4. Alert: Wake event with context (expected schedule vs last run) → you decide next action
  5. Recovery: Clears alert when job runs successfully again

Disable / Uninstall

If installed via systemd user timer

systemctl --user stop openclaw-job-execution-monitor.timer
systemctl --user disable openclaw-job-execution-monitor.timer
systemctl --user daemon-reload

# Optional: remove unit files
rm -f ~/.config/systemd/user/openclaw-job-execution-monitor.service \
      ~/.config/systemd/user/openclaw-job-execution-monitor.timer

If installed via cron

crontab -l | sed '/job-execution-monitor\/scripts\/healthcheck\.sh/d' | crontab -

Optional cleanup (config/state/log)

rm -f ~/.openclaw/workspace/job-execution-monitor.json
rm -f ~/.openclaw/workspace/.job-execution-monitor-state.json
rm -f ~/.openclaw/workspace/job-execution-monitor.log

Configuration

File: ~/.openclaw/workspace/job-execution-monitor.json

{
  "checkIntervalMin": 60,
  "jobs": {
    "Daily 21:00 journaling (projects + accomplishments + next day plan)": {
      "schedule": "0 22 * * *",
      "tolerance": 600,
      "critical": true,
      "expectedMinLength": 200,
      "errorPatterns": ["error", "failed", "Pong", "token overflow"]
    }
  }
}

Fields:

  • schedule: cron expression for expected run time
  • tolerance: grace period in seconds (default 600 = 10min)
  • critical: if true, alerts immediately (future: could escalate)
  • expectedMinLength: minimum response length (phase 2)
  • errorPatterns: text patterns that indicate failure (phase 2)

State Tracking

File: ~/.openclaw/workspace/.job-execution-monitor-state.json

{
  "lastCheck": 1771025000,
  "alerts": {
    "daily-wrap-up_missed": 1771024500
  }
}
  • lastCheck: unix timestamp of last check
  • alerts: map of alert_key → timestamp (prevents spam)

Instructions (for Agent)

In HEARTBEAT.md:

## Job Execution Monitor (every ~60min, rotate)

Check cron jobs for missed schedules. Only alert if problem found.

**Instructions:**
1. Load `job-execution-monitor.json` config
2. Call `cron list` 
3. For each job in config:
   - Extract `state.lastRunAtMs` and `state.lastStatus`
   - Parse schedule (e.g., "0 22 * * *" = 22:00 daily)
   - If last run > (expected time + tolerance): **ALERT**
   - If last run recent: **SILENT**
4. On alert: send wake event with job name, expected time, last run time
5. On recovery (was alerting before, now OK): send recovery wake event

**State tracking:** `~/.openclaw/workspace/.job-execution-monitor-state.json`
- Track which jobs already alerted (don't spam)
- Clear alert flag when job recovers

**Rotate check:** Only run every ~4th heartbeat (once/hour if heartbeat is 15min)

Examples

Scenario 1: Job missed

🔴 Job Execution Monitor: "daily-wrap-up" missed schedule
Expected: 22:00 ±10min
Last run: 5h 32m ago
Checking logs...

Scenario 2: Job recovered

✅ Job Execution Monitor: "daily-wrap-up" recovered
Last run: 22:02 (2min ago)

Scenario 3: All OK

(silent - no wake event, no alert)

Cost Analysis

Per check (~2k tokens):

  • Load config: ~200 tokens
  • Call cron list: ~500 tokens
  • Parse + compare: ~500 tokens
  • Decision + response: ~800 tokens

Daily (~48 checks):

  • 48 × 2k = ~100k tokens/day
  • Using Gemini Flash: ~$0.01/day

Compared to alternatives:

  • Bash script every 10min: 0 tokens, but complex + fragile
  • Cron job every 10min: 144 × 2k = ~300k tokens/day
  • Heartbeat every 60min: ~100k tokens/day ← chosen ✅

Files

  • SKILL.md - This documentation
  • README.md - Quick start
  • config/job-execution-monitor.example.json - Config template
  • scripts/patterns.json - Error patterns (phase 2)
  • ~/workspace/job-execution-monitor.json - User config
  • ~/workspace/.job-execution-monitor-state.json - Alert state

Philosophy

Smart monitoring, minimal cost.

  • Check frequency: ~1/hour (good enough for daily jobs)
  • Use cheapest LLM: Gemini Flash or Haiku
  • Only wake main session when real problems occur
  • State tracking prevents alert spam
  • Agent-based = leverages existing tools, no auth/API hassles

Phase 1 complete.
Phase 2 (pattern matching): Coming soon.
Phase 3 (structured validation): Future.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

地藏经药师经智慧

地藏经药师经智慧 - 佛家孝道与救度思想,涵盖地藏本愿、药师十二愿、因果报应、消灾延寿等核心智慧,适用于道德修养、慈悲精神、身心健康

Registry SourceRecently Updated
General

Precision Oncology Zhcn

综合学术文献、流行病学报告、临床与药物指南及临床试验报告,提供关于癌症及其治疗的报告。 基于癌变机制进行详细的分子生物学和组织学分析。 当查询涉及以下内容时加载本技能: - 癌症或肿瘤 - 癌变机制 - 癌症或肿瘤的治疗 典型查询 - 乳腺癌是如何发生的? - 白血病的一线和二线治疗 - CAR-T 疗法治疗胰腺...

Registry SourceRecently Updated
General

hermes-traffic-guardian

Hermes runtime traffic monitoring baseline for opt-in proxy inspection, egress detection, and attestation-aware traffic posture.

Registry SourceRecently Updated
General

Scp Paradigm

Use when analyzing how industry structure drives firm behavior and market performance, assessing market concentration, entry barriers, or competitive dynamic...

Registry SourceRecently Updated