observability-alert-manager

Configure Grafana alerts for Claude Code anomalies and thresholds. Use when setting up monitoring alerts for sessions, errors, context usage, or subagents.

Install skill "observability-alert-manager" with this command: npx skills add adaptationio/skrillz/adaptationio-skrillz-observability-alert-manager

Observability Alert Manager

Configure and manage Grafana alerts for Claude Code monitoring using enhanced telemetry.

Data Source

Primary: {job="claude_code_enhanced"} in Loki
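
To confirm the stream actually carries data before building alerts on it, the selector can be sent to Loki's HTTP query endpoint. The following is a minimal sketch that only builds the request URL; the address `http://localhost:3100` and the helper name `build_query_range_url` are assumptions, not part of the skill.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: Loki is reachable at this address; the skill does not state one.
LOKI_URL = "http://localhost:3100"

def build_query_range_url(selector: str, start_ns: int, end_ns: int, limit: int = 100) -> str:
    """Build a GET URL for Loki's /loki/api/v1/query_range endpoint.

    start_ns and end_ns are Unix timestamps in nanoseconds.
    """
    params = {"query": selector, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{LOKI_URL}/loki/api/v1/query_range?{urlencode(params)}"

# One hour of data starting at the epoch, purely for illustration.
url = build_query_range_url('{job="claude_code_enhanced"}', 0, 3_600_000_000_000)
print(url)
```

Fetching the printed URL (for example with `urllib.request.urlopen`) should return matching log streams if telemetry is flowing.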

Operations

create-alert

Define a new alert rule. Parameters: name, query (LogQL), threshold, duration, severity, notification.

list-alerts

Show all configured alerts and their status.

test-alert

Simulate alert conditions.

delete-alert

Remove alert rule.
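
The create-alert parameters can be pictured as a single record. The sketch below is a hypothetical container, not the skill's own data model: the class name `AlertRule` and the validation step are assumptions, while the field names and severity values come from this document.

```python
from dataclasses import dataclass

# Severity names as listed under "Alert Severity Levels" in this document.
SEVERITIES = ("critical", "warning", "info")

@dataclass(frozen=True)
class AlertRule:
    """Hypothetical record mirroring the create-alert parameters."""
    name: str
    query: str         # LogQL expression
    threshold: float
    duration: str      # how long the condition must hold, e.g. "5m"
    severity: str
    notification: str  # e.g. "slack", "email", "dashboard"

    def __post_init__(self) -> None:
        # Reject severities the document does not define.
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")

rule = AlertRule(
    name="High Error Rate",
    query='count_over_time({job="claude_code_enhanced", event_type="tool_result", status="error"} [1h])',
    threshold=5,
    duration="5m",
    severity="warning",
    notification="slack",
)
```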

Pre-built Alert Templates

Session Alerts

  1. Long Session Duration: Session >1 hour

    {job="claude_code_enhanced", event_type="session_end"} | json | duration_seconds > 3600
    
  2. High Turn Count: Session >50 turns

    {job="claude_code_enhanced", event_type="session_end"} | json | turn_count > 50
    
  3. Session Error Spike: >5 errors in session

    {job="claude_code_enhanced", event_type="session_end"} | json | error_count > 5
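
The three session templates parse each `session_end` line as JSON and compare one field against a limit. A minimal sketch of the same check in Python, useful for dry-running thresholds against captured log lines; the sample payload is hypothetical and real events may carry more fields.

```python
import json

# Limits taken from the three session templates above.
SESSION_LIMITS = {"duration_seconds": 3600, "turn_count": 50, "error_count": 5}

def breached_limits(log_line: str) -> list[str]:
    """Return the fields of a session_end JSON line that exceed their limit."""
    event = json.loads(log_line)
    return [
        field
        for field, limit in SESSION_LIMITS.items()
        if float(event.get(field, 0)) > limit  # missing fields count as 0
    ]

# Hypothetical log line for illustration only.
sample = json.dumps({"duration_seconds": 4200, "turn_count": 12, "error_count": 6})
print(breached_limits(sample))  # ['duration_seconds', 'error_count']
```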
    

Error Alerts

  1. High Error Rate: >5 errors/hour

    count_over_time({job="claude_code_enhanced", event_type="tool_result", status="error"} [1h]) > 5
    
  2. Specific Tool Failures: Bash errors

    count_over_time({job="claude_code_enhanced", event_type="tool_result", status="error", tool="Bash"} [1h]) > 3
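
Both error templates count matching log lines in a trailing one-hour window and compare the count with a threshold. The sketch below is a simplified model of that evaluation, not Loki's implementation; the function name and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

def count_over_window(timestamps: list[datetime], now: datetime, window: timedelta) -> int:
    """Count events whose timestamp falls in the trailing window (now - window, now]."""
    return sum(1 for ts in timestamps if now - window < ts <= now)

now = datetime(2024, 1, 1, 12, 0)
# Hypothetical error timestamps: six within the last hour, one 90 minutes old.
errors = [now - timedelta(minutes=m) for m in (1, 5, 10, 20, 40, 59, 90)]

count = count_over_window(errors, now, timedelta(hours=1))
fires = count > 5  # same comparison as the High Error Rate template
print(count, fires)
```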
    

Context Alerts

  1. High Context Usage: >80% context window

    {job="claude_code_enhanced", event_type="context_utilization"} | json | context_percentage > 80
    
  2. Auto Compaction Triggered: Context full

    {job="claude_code_enhanced", event_type="context_compact", trigger="auto"}
    

Subagent Alerts

  1. Excessive Subagent Spawning: >10 subagents/session

    {job="claude_code_enhanced", event_type="session_end"} | json | subagents_spawned > 10
    

Activity Alerts

  1. Telemetry Staleness: No data >10min

    absent_over_time({job="claude_code_enhanced"} [10m])
    
  2. Unusual Activity Spike: >100 tool calls/hour

    count_over_time({job="claude_code_enhanced", event_type="tool_call"} [1h]) > 100
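
The staleness template inverts the usual logic: it fires when nothing arrives, rather than when too much does. A minimal sketch of that condition, assuming the caller tracks the timestamp of the most recent event; `is_stale` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import Optional

def is_stale(last_seen: Optional[datetime], now: datetime,
             max_gap: timedelta = timedelta(minutes=10)) -> bool:
    """True when no event has been seen within max_gap (or none at all)."""
    return last_seen is None or now - last_seen > max_gap

now = datetime(2024, 1, 1, 12, 0)
print(is_stale(now - timedelta(minutes=11), now))
print(is_stale(now - timedelta(minutes=5), now))
```

Treating "never seen" as stale matters for fresh installs, where a misconfigured exporter would otherwise go unnoticed.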
    

Prompt Pattern Alerts

  1. Debugging Session Spike: Many debugging prompts

    count_over_time({job="claude_code_enhanced", event_type="user_prompt", pattern="debugging"} [1h]) > 10
    

Example Alert Configurations

Create High Error Rate Alert

create-alert \
  --name "High Error Rate" \
  --query 'count_over_time({job="claude_code_enhanced", event_type="tool_result", status="error"} [1h]) > 5' \
  --severity warning \
  --notification slack

Create Context Usage Alert

create-alert \
  --name "High Context Usage" \
  --query '{job="claude_code_enhanced", event_type="context_utilization"} | json | context_percentage > 80' \
  --severity info \
  --notification email

Create Session Duration Alert

create-alert \
  --name "Long Session Warning" \
  --query '{job="claude_code_enhanced", event_type="session_end"} | json | duration_seconds > 3600' \
  --severity info \
  --notification dashboard
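
LogQL queries contain braces, pipes, and double quotes, so quoting is the easiest thing to get wrong when these commands are generated by a script. The sketch below assembles the same invocation with `shlex.join`, which quotes each argument for a POSIX shell; the helper name is hypothetical and only the flags shown in the examples above are used.

```python
import shlex

def create_alert_command(name: str, query: str, severity: str, notification: str) -> str:
    """Return a shell-safe create-alert command line."""
    args = [
        "create-alert",
        "--name", name,
        "--query", query,
        "--severity", severity,
        "--notification", notification,
    ]
    return shlex.join(args)

cmd = create_alert_command(
    "Long Session Warning",
    '{job="claude_code_enhanced", event_type="session_end"} | json | duration_seconds > 3600',
    "info",
    "dashboard",
)
print(cmd)
```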

Grafana Alert Setup

Via Grafana UI

  1. Navigate to Alerting → Alert rules
  2. Create new rule with Loki data source
  3. Enter LogQL query from templates above
  4. Configure conditions and notifications

Via API

curl -X POST http://localhost:3000/api/ruler/grafana/api/v1/rules/claude-code \
  -H "Content-Type: application/json" \
  -u admin:admin \
  -d '{
    "name": "claude-code-alerts",
    "rules": [
      {
        "alert": "HighErrorRate",
        "expr": "count_over_time({job=\"claude_code_enhanced\", event_type=\"tool_result\", status=\"error\"} [1h]) > 5",
        "for": "5m",
        "labels": {"severity": "warning"},
        "annotations": {"summary": "High error rate detected"}
      }
    ]
  }'
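
The same request can be prepared from Python using only the standard library. This is a sketch under stated assumptions: the payload mirrors the curl body above as-is, the schema the endpoint accepts can differ between Grafana versions, and `build_rule_request` is a hypothetical helper. The request is built but not sent.

```python
import base64
import json
from urllib.request import Request

def build_rule_request(base_url: str, namespace: str, group: dict,
                       user: str, password: str) -> Request:
    """Prepare a POST to the ruler endpoint with HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        url=f"{base_url}/api/ruler/grafana/api/v1/rules/{namespace}",
        data=json.dumps(group).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )

group = {
    "name": "claude-code-alerts",
    "rules": [{
        "alert": "HighErrorRate",
        "expr": 'count_over_time({job="claude_code_enhanced", status="error"} [1h]) > 5',
        "for": "5m",
        "labels": {"severity": "warning"},
        "annotations": {"summary": "High error rate detected"},
    }],
}

# Credentials copied from the curl example; replace them outside a local test setup.
req = build_rule_request("http://localhost:3000", "claude-code", group, "admin", "admin")
```

Passing `req` to `urllib.request.urlopen` would send it; building `group` as a dict avoids the backslash escaping that the shell version needs.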

Notification Channels

  • Slack: Webhook integration
  • Email: SMTP configuration
  • PagerDuty: Incident management
  • Dashboard: On-screen annotations
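
Grafana normally delivers notifications itself, but a script that posts to Slack directly needs only a small JSON body. A minimal sketch, assuming a Slack incoming webhook, which accepts a JSON object with a `text` field; the message layout and the helper name are illustrative choices, not something the skill prescribes.

```python
import json

def slack_payload(rule_name: str, severity: str, summary: str) -> bytes:
    """Encode an alert as the JSON body of a Slack incoming-webhook POST."""
    text = f"[{severity.upper()}] {rule_name}: {summary}"
    return json.dumps({"text": text}).encode("utf-8")

body = slack_payload("High Error Rate", "warning", "High error rate detected")
print(body.decode())
```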

Alert Severity Levels

Level     Use Case
critical  Immediate action required
warning   Needs attention soon
info      Informational, no action needed

Scripts

  • scripts/create-alert.sh - Create new alert
  • scripts/list-alerts.sh - List all alerts
  • scripts/test-alerts.sh - Test alert conditions
  • scripts/import-alert-templates.sh - Import all pre-built templates
