Business Heartbeat Monitor

Your agent runs heartbeats on a schedule. Each cycle is an opportunity to check on everything that matters and fix what's broken — before anyone notices.

This skill provides a complete HEARTBEAT.md framework for business operations monitoring.

Core HEARTBEAT.md Template

## Production Site Health (every heartbeat)
1. Check all production URLs for 200 status with expected content:
   - yoursite.com — expect 200 + contains "Your Brand"
   - app.yoursite.com — expect 200
   - api.yoursite.com/health — expect 200 + {"status":"ok"}
2. If any fail:
   - Attempt restart (via deployment redeploy, service restart, etc.)
   - If auto-recovery works: log to daily notes, move on
   - If auto-recovery fails: alert immediately on primary channel
3. Log all results to daily notes

## Service & Infrastructure Health (every heartbeat)
1. Check critical background services are running
   - Database connections responding
   - Queue workers processing
   - Cron jobs firing on schedule
2. If a service is down:
   - Attempt restart
   - If restart fails: alert with error details
3. If a process has been stalled for 2+ consecutive heartbeats:
   - Kill and restart automatically
   - Log the stall duration and any error output

## Support SLA Check (every heartbeat)
1. Scan inbox for unanswered customer messages
2. Flag any message older than your SLA threshold (e.g., 4 hours)
3. For routine questions (order status, how-to, account access):
   - Draft a response
   - Send if within Tier 1 autonomy, otherwise flag for review
4. For complex issues:
   - Summarize the problem
   - Flag for human review with recommended action

## Payment & Revenue Check (every heartbeat, summarize nightly)
1. Check for failed payments or declined charges
2. Flag any for retry or customer follow-up
3. Track daily revenue running total
4. Note any anomalies (unusual spikes, drops, large transactions)

## Background Process Health (every heartbeat)
1. Check for any listed active processes (tmux sessions, workers, agents)
2. For each: verify it's still running
3. If dead: restart automatically, log what happened
4. If stalled (same output for 2+ checks): kill and restart
5. If completed successfully: report completion, clean up

Nightly Deep Dive Template

Run once per day (e.g., 3 AM):

## Nightly Deep Dive (~3 AM)

### Revenue Review
1. Pull yesterday's complete revenue data
2. Break down by product/service
3. Compare to trailing 7-day and 30-day averages
4. Note what sold, what didn't, any patterns

### Day Review
1. What got done from today's plan?
2. What didn't get done and why?
3. What blocked progress?

### Tomorrow's Plan
1. Propose 3-5 concrete priorities ranked by expected impact
2. Each item should connect to a clear business goal
3. Include execution tasks and growth experiments
4. Write to tomorrow's daily notes file

### Health Summary
1. Uptime: percentage across all monitored services
2. Incidents: any outages or degradations, with resolution
3. Support: messages handled, average response time
4. Revenue: daily total, trend direction

Integration with Autonomy Ladder

The monitoring system works best when paired with clear autonomy tiers:

Monitor Finding	Autonomy Tier	Action
Site returning 500	Tier 1	Fix + log
Background process crashed	Tier 1	Restart + log
Support email >4 hours	Tier 1/2	Draft response, send if routine
Payment failed	Tier 2	Retry + report
Revenue anomaly	Tier 2	Investigate + report
Service architecture issue	Tier 3	Diagnose + propose fix
Customer escalation	Tier 3	Summarize + flag

Daily Notes Format

Each heartbeat should append to the day's notes:

## Heartbeat — 3:15 PM

### Health
- ✅ All sites returning 200
- ✅ All services healthy
- ⚠️ Worker queue backed up (47 pending, normally <10)
  - Cause: spike in API requests from new integration
  - Action: scaled workers from 2 → 4, queue clearing

### Support
- 2 new messages since last check
- Replied to order status inquiry (Tier 1, auto-sent)
- Flagged billing dispute for review (Tier 3)

### Revenue
- Running total: $342 (vs $289 yesterday at this time)

Setup Checklist

Copy the HEARTBEAT.md template above
Replace example URLs with your actual production endpoints
Set your SLA threshold (4 hours is a good default)
Configure alerting channel (Telegram, Discord, Slack, etc.)
Define which background processes to monitor
Set your nightly deep dive time (2-4 AM recommended)
Pair with the Autonomy Ladder for escalation rules

Tips

Start with site health only. Get that working reliably before adding inbox triage and revenue tracking. Each layer of monitoring adds complexity.
Don't alert on everything. The goal is signal, not noise. If your agent sends 15 "all clear" messages a day, you'll start ignoring them — and miss the one that matters.
Log everything, alert selectively. Every heartbeat result goes in daily notes. Only failures and anomalies get sent to your channel.
Review weekly. Check the daily notes for patterns. If the same service keeps restarting, that's not a monitoring success — it's an infrastructure problem to fix.

business-heartbeat-monitor

Safety Notice

Copy this and send it to your AI assistant to learn

Business Heartbeat Monitor

Core HEARTBEAT.md Template

Nightly Deep Dive Template

Integration with Autonomy Ladder

Daily Notes Format

Setup Checklist

Tips

Source Transparency

Related Skills

openEuler RPM Packaging

Tianyi Cloud Game

Ugc Ad Script Maker

Campaign Angle Spark