Monitoring

Set up observability for applications and infrastructure with metrics, logs, traces, and alerts.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "Monitoring" with this command: npx skills add ivangdavila/monitoring

Complexity Levels

Level        | Tools                                   | Setup Time | Best For
Minimal      | UptimeRobot, Healthchecks.io            | 15 min     | Side projects, MVPs
Standard     | Uptime Kuma, Sentry, basic Grafana      | 1-2 hours  | Small teams, startups
Professional | Prometheus, Grafana, Loki, Alertmanager | 1-2 days   | Production systems
Enterprise   | Datadog, New Relic, or full OSS stack   | Ongoing    | Large-scale operations

The Three Pillars

Pillar  | What It Answers                 | Tools
Metrics | "How is the system performing?" | Prometheus, Grafana, Datadog
Logs    | "What happened?"                | Loki, ELK, CloudWatch
Traces  | "Why is this request slow?"     | Jaeger, Tempo, Sentry

Quick Start by Use Case

"I just want to know if it's down" → UptimeRobot (free) or Uptime Kuma (self-hosted). See simple.md.

"I need to debug production errors" → Sentry with your framework SDK. 5-minute setup. See apm.md.

"I want real observability" → Prometheus + Grafana + Loki. See prometheus.md.

"I need to centralize logs" → Loki for simple, ELK for complex queries. See logs.md.

What to Monitor

Applications (RED Method)

  • Rate — requests per second
  • Errors — error rate by endpoint
  • Duration — latency (p50, p95, p99)
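
The three RED signals can be derived from a plain list of request records. The following is a minimal sketch, not a replacement for a metrics library: the `Request` fields, the `red_summary` name, the choice to count only 5xx responses as errors, and the nearest-rank percentile method are all assumptions made for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Request:
    endpoint: str
    status: int          # HTTP status code
    duration_ms: float   # time taken to serve the request


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a sorted, non-empty list (pct in 1..100)."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def red_summary(requests, window_seconds):
    """Summarize Rate, Errors, and Duration for one time window."""
    if not requests:
        return {"rate_rps": 0.0, "error_rate_by_endpoint": {}, "latency_ms": {}}

    totals = Counter(r.endpoint for r in requests)
    # Assumption: only 5xx responses count as errors.
    errors = Counter(r.endpoint for r in requests if r.status >= 500)
    durations = sorted(r.duration_ms for r in requests)

    return {
        "rate_rps": len(requests) / window_seconds,
        "error_rate_by_endpoint": {e: errors[e] / n for e, n in totals.items()},
        "latency_ms": {f"p{p}": percentile(durations, p) for p in (50, 95, 99)},
    }
```

Reporting percentiles rather than an average keeps slow outliers visible: a handful of very slow requests barely moves the mean but shows up clearly in p99.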

Infrastructure (USE Method)

  • Utilization — CPU, memory, disk usage
  • Saturation — queue depth, load average
  • Errors — hardware/system errors
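
As a rough illustration of the first two USE signals, the sketch below reads disk utilization and load average using only the Python standard library. The function names and the `load_limit` threshold are hypothetical, `os.getloadavg` is available on Unix-like systems only, and hardware or system error counts are not covered because they come from platform-specific sources.

```python
import os
import shutil


def use_snapshot(path="/"):
    """Collect one utilization and one saturation signal for this host."""
    disk = shutil.disk_usage(path)
    load_1m, _, _ = os.getloadavg()   # Unix only
    cpus = os.cpu_count() or 1
    return {
        "disk_utilization": disk.used / disk.total,  # fraction, 0.0 to 1.0
        "load_per_cpu": load_1m / cpus,              # saturation proxy
    }


def is_saturated(snapshot, load_limit=1.0):
    """True when runnable work per CPU exceeds the (assumed) limit."""
    return snapshot["load_per_cpu"] > load_limit
```

Dividing the load average by the CPU count makes the number comparable across machines of different sizes; a value above 1.0 suggests that work is queuing.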

Alerting Principles

Do                              | Don't
Alert on symptoms (user impact) | Alert on causes (CPU high)
Include runbook link            | Require investigation to understand
Set appropriate severity        | Make everything P1
Require action                  | Alert on "interesting" metrics

Alert fatigue kills monitoring. If alerts are ignored, you have no monitoring.

For alert configuration, severities, and on-call setup, see alerting.md.
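
The principles above can be checked mechanically before an alert rule goes live. This is a minimal sketch under stated assumptions: `AlertRule`, `lint_rule`, `too_many_p1`, the P1 to P3 scale, and the 50% limit are hypothetical names and values, not part of any alerting tool's API.

```python
from dataclasses import dataclass

# Hypothetical severity scale; only "P1" is named in the principles above.
SEVERITIES = ("P1", "P2", "P3")


@dataclass(frozen=True)
class AlertRule:
    name: str
    severity: str
    runbook_url: str
    action: str   # what the responder is expected to do


def lint_rule(rule):
    """Return a list of principle violations for a single rule."""
    problems = []
    if rule.severity not in SEVERITIES:
        problems.append("unknown severity")
    if not rule.runbook_url.strip():
        problems.append("missing runbook link")
    if not rule.action.strip():
        problems.append("no required action")
    return problems


def too_many_p1(rules, max_share=0.5):
    """Flag rule sets where more than max_share of the alerts are P1."""
    if not rules:
        return False
    p1_count = sum(1 for r in rules if r.severity == "P1")
    return p1_count / len(rules) > max_share
```

Running a check like this in code review would catch an alert that has no runbook or no defined action before it ever pages anyone.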

Cost Comparison

Solution          | Monthly Cost (small) | Monthly Cost (medium)
UptimeRobot       | Free                 | $7
Uptime Kuma       | $5 (VPS)             | $5 (VPS)
Sentry            | Free / $26           | $80
Grafana Cloud     | Free tier            | $50+
Datadog           | $15/host             | $23/host + features
Self-hosted stack | $10-20 (VPS)         | $50-100 (VPS)
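
Per-host pricing scales with fleet size while a VPS is a flat cost, so the comparison depends on host count. The sketch below works through that arithmetic using only the figures from the table; the function names are hypothetical, and the calculation ignores the operator time a self-hosted stack requires.

```python
DATADOG_PER_HOST = 15   # USD per host per month, "small" column of the table


def datadog_monthly(hosts, per_host=DATADOG_PER_HOST):
    """Monthly cost in USD under simple per-host pricing."""
    return hosts * per_host


def break_even_hosts(flat_cost, per_host=DATADOG_PER_HOST):
    """Smallest host count at which per-host pricing exceeds flat_cost."""
    return flat_cost // per_host + 1
```

For example, `datadog_monthly(5)` gives 75, and `break_even_hosts(20)` gives 2: at the small-tier rate, two hosts already cost more than the upper end of the $10-20 self-hosted VPS range.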

Common Mistakes

  • Starting with Prometheus/Grafana when Uptime Kuma would suffice
  • No alerting (dashboards nobody watches)
  • Too many alerts (alert fatigue → ignored)
  • Missing runbooks (alert fires, nobody knows what to do)
  • Not monitoring from outside (only internal checks)
  • Storing logs forever (cost explodes)
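
To illustrate the "not monitoring from outside" item, here is a minimal sketch of an external HTTP probe, the same idea that hosted uptime checkers implement. The `is_up` name, the injectable `fetch` parameter, and the rule that any 2xx or 3xx status counts as "up" are assumptions made for this example.

```python
import urllib.request


def is_up(url, timeout=5.0, fetch=urllib.request.urlopen):
    """Return True if url answers with a 2xx/3xx status within timeout seconds.

    `fetch` is injectable so the check can be tested without network access.
    """
    try:
        with fetch(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # Covers URLError, HTTPError (4xx/5xx), timeouts, and refused connections.
        return False
```

The probe is only meaningful when it runs on a machine outside the network it checks, for example from a scheduled job on a separate host, so that DNS, TLS, and routing failures are seen the way users would see them.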
