monitoring

Monitoring - Complete API Reference

Monitor system health, track errors, and receive alerts when issues occur.

Chat Commands

Service Control

/monitor start     # Start monitoring
/monitor stop      # Stop monitoring
/monitor status    # Check monitoring status

Health Checks

/monitor health             # Run health check
/monitor health --verbose   # Detailed health info
/monitor providers          # Check LLM provider status

Alerts

/monitor alerts                             # View recent alerts
/monitor alerts --unread                    # Unread alerts only
/monitor alert-targets                      # View alert destinations
/monitor alert-targets add email <addr>     # Add email target
/monitor alert-targets add webhook <url>    # Add webhook target
/monitor alert-targets remove <id>          # Remove target

Configuration

/monitor config                 # View config
/monitor cooldown 300           # Set alert cooldown (seconds)
/monitor threshold cpu 80       # Set CPU alert threshold
/monitor threshold memory 90    # Set memory threshold

TypeScript API Reference

Create Monitoring Service

import { createMonitoringService } from 'clodds/monitoring';

const monitor = createMonitoringService({
  // Health check interval
  intervalMs: 60000, // 1 minute

  // Alert targets
  alertTargets: [
    { type: 'email', address: 'alerts@example.com' },
    { type: 'webhook', url: 'https://hooks.example.com/alerts' },
  ],

  // Alert cooldown (prevent spam)
  alertCooldownMs: 300000, // 5 minutes

  // Thresholds
  thresholds: {
    cpu: 80, // Alert at 80% CPU
    memory: 90, // Alert at 90% memory
    errorRate: 10, // Alert at 10% error rate
  },
});
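The `alertCooldownMs` setting suggests that repeat alerts are held back for a window after one has been sent. The following is a minimal sketch of how such a gate could work, assuming suppression is tracked per alert type. `AlertCooldown` and `shouldSend` are hypothetical names used for illustration and are not part of the clodds API.

```typescript
// Hypothetical helper (not part of clodds): suppresses repeat alerts of the
// same type until the cooldown window has elapsed.
class AlertCooldown {
  private readonly cooldownMs: number;
  private readonly lastSent = new Map<string, number>();

  constructor(cooldownMs: number) {
    this.cooldownMs = cooldownMs;
  }

  // Returns true if an alert of this type may be sent at time `now` (ms),
  // and records the send time when it does.
  shouldSend(alertType: string, now: number = Date.now()): boolean {
    const last = this.lastSent.get(alertType);
    if (last !== undefined && now - last < this.cooldownMs) {
      return false; // still inside the cooldown window
    }
    this.lastSent.set(alertType, now);
    return true;
  }
}

const cooldown = new AlertCooldown(300000); // 5 minutes
const first = cooldown.shouldSend('high_cpu', 0);
const repeat = cooldown.shouldSend('high_cpu', 60000);
const afterWindow = cooldown.shouldSend('high_cpu', 300000);
const otherType = cooldown.shouldSend('high_memory', 60000);
console.log(first, repeat, afterWindow, otherType); // true false true true
```

Whether the real service applies the cooldown per alert type, per target, or globally is not stated here, so treat the per-type keying as an assumption.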

Start/Stop Monitoring

// Start monitoring
await monitor.start();

// Check if running
const isRunning = monitor.isRunning();

// Stop monitoring
await monitor.stop();

Health Checks

// Run health check
const health = await monitor.runHealthCheck();

console.log(`Overall: ${health.status}`); // 'healthy' | 'degraded' | 'unhealthy'

console.log('\nSystem:');
console.log(`  CPU: ${health.system.cpu}%`);
console.log(`  Memory: ${health.system.memory}%`);
console.log(`  Disk: ${health.system.disk}%`);

console.log('\nProviders:');
for (const [name, status] of Object.entries(health.providers)) {
  console.log(`  ${name}: ${status.status} (${status.latencyMs}ms)`);
}

console.log('\nServices:');
for (const [name, status] of Object.entries(health.services)) {
  console.log(`  ${name}: ${status.status}`);
}
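When a health check reports `degraded` or `unhealthy`, it helps to know which component is responsible. The sketch below lists every provider and service whose status is not `'healthy'`. The `HealthReport` shape is inferred from the fields read in the example above, and `findProblems` is a hypothetical helper; the real types may differ, and component status values other than `'healthy'` are assumed here.

```typescript
// Shapes inferred from the fields used in the health check example
// (assumption; the actual clodds types are not shown in this reference).
interface ComponentStatus {
  status: string;
  latencyMs?: number;
}

interface HealthReport {
  status: 'healthy' | 'degraded' | 'unhealthy';
  system: { cpu: number; memory: number; disk: number };
  providers: Record<string, ComponentStatus>;
  services: Record<string, ComponentStatus>;
}

// Hypothetical helper: describes each component that is not 'healthy'.
function findProblems(health: HealthReport): string[] {
  const found: string[] = [];
  for (const [name, provider] of Object.entries(health.providers)) {
    if (provider.status !== 'healthy') {
      found.push(`provider ${name}: ${provider.status}`);
    }
  }
  for (const [name, service] of Object.entries(health.services)) {
    if (service.status !== 'healthy') {
      found.push(`service ${name}: ${service.status}`);
    }
  }
  return found;
}

// Sample data for illustration only.
const sampleReport: HealthReport = {
  status: 'degraded',
  system: { cpu: 42, memory: 63, disk: 71 },
  providers: {
    'provider-a': { status: 'healthy', latencyMs: 120 },
    'provider-b': { status: 'unhealthy', latencyMs: 5000 },
  },
  services: { scheduler: { status: 'healthy' } },
};

const problems = findProblems(sampleReport);
console.log(problems); // [ 'provider provider-b: unhealthy' ]
```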

Provider Health

// Check LLM provider status
const providers = await monitor.checkProviders();

for (const provider of providers) {
  console.log(`${provider.name}:`);
  console.log(`  Status: ${provider.status}`);
  console.log(`  Latency: ${provider.latencyMs}ms`);
  console.log(`  Last error: ${provider.lastError || 'none'}`);
  console.log(`  Error rate: ${provider.errorRate}%`);
}

Alert Management

// Get recent alerts
const alerts = await monitor.getAlerts({ limit: 10 });

for (const alert of alerts) {
  console.log(`[${alert.severity}] ${alert.title}`);
  console.log(`  ${alert.message}`);
  console.log(`  Time: ${alert.timestamp}`);
  console.log(`  Acknowledged: ${alert.acknowledged}`);
}

// Acknowledge alert
await monitor.acknowledgeAlert(alertId);

// Get unread count
const unread = await monitor.getUnreadAlertCount();
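Once alerts are fetched, a common next step is to decide which ones to handle first. This sketch keeps only unacknowledged alerts and orders them from most to least severe. The `AlertSummary` shape uses a subset of the fields printed above, the severity values are the four documented for manual alerts, and `triage` and `severityRank` are hypothetical names, not part of the clodds API.

```typescript
type Severity = 'info' | 'warning' | 'error' | 'critical';

// Subset of the alert fields used in the example above (assumed shape).
interface AlertSummary {
  severity: Severity;
  title: string;
  acknowledged: boolean;
}

// Higher number means more urgent.
const severityRank: Record<Severity, number> = {
  info: 0,
  warning: 1,
  error: 2,
  critical: 3,
};

// Hypothetical helper: unacknowledged alerts, most severe first.
// filter() returns a new array, so the input is not reordered.
function triage(items: AlertSummary[]): AlertSummary[] {
  return items
    .filter((a) => !a.acknowledged)
    .sort((a, b) => severityRank[b.severity] - severityRank[a.severity]);
}

// Sample data for illustration only.
const sampleAlerts: AlertSummary[] = [
  { severity: 'warning', title: 'High CPU', acknowledged: false },
  { severity: 'critical', title: 'Provider down', acknowledged: false },
  { severity: 'error', title: 'High error rate', acknowledged: true },
];

const pending = triage(sampleAlerts);
console.log(pending.map((a) => a.title)); // [ 'Provider down', 'High CPU' ]
```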

Alert Targets

// Add alert target
await monitor.addAlertTarget({
  type: 'email',
  address: 'team@example.com',
});

await monitor.addAlertTarget({
  type: 'webhook',
  url: 'https://hooks.slack.com/...',
});

// List targets
const targets = monitor.getAlertTargets();

// Remove target
await monitor.removeAlertTarget(targetId);
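A mistyped address or URL would silently drop alerts, so it can be worth checking a target before registering it. The sketch below models the two target kinds shown above as a discriminated union and applies deliberately simple checks. `AlertTarget` and `validateTarget` are hypothetical names, and the HTTPS requirement is an assumption rather than a documented rule.

```typescript
// The two target kinds shown in the examples above, as a discriminated union.
type AlertTarget =
  | { type: 'email'; address: string }
  | { type: 'webhook'; url: string };

// Hypothetical helper: returns null when the target looks usable,
// otherwise a short description of the problem.
function validateTarget(target: AlertTarget): string | null {
  if (target.type === 'email') {
    // Very loose check: something before and after a single '@' position.
    const at = target.address.indexOf('@');
    const ok = at > 0 && at < target.address.length - 1;
    return ok ? null : `invalid email address: ${target.address}`;
  }
  // Assumption: webhooks should be delivered over HTTPS.
  return target.url.startsWith('https://')
    ? null
    : `webhook URL must use https: ${target.url}`;
}

const emailCheck = validateTarget({ type: 'email', address: 'team@example.com' });
const badEmailCheck = validateTarget({ type: 'email', address: 'not-an-address' });
const webhookCheck = validateTarget({ type: 'webhook', url: 'https://hooks.example.com/alerts' });
const insecureCheck = validateTarget({ type: 'webhook', url: 'http://hooks.example.com/alerts' });
console.log(emailCheck, badEmailCheck, webhookCheck, insecureCheck);
```

A target that passes could then be handed to `monitor.addAlertTarget(...)` as shown above.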

Event Handlers

// Listen for events
monitor.on('alert', (alert) => {
  console.log(`🚨 Alert: ${alert.title}`);
});

monitor.on('healthCheck', (health) => {
  if (health.status !== 'healthy') {
    console.log(`⚠️ System ${health.status}`);
  }
});

monitor.on('providerDown', (provider) => {
  console.log(`❌ Provider down: ${provider.name}`);
});

monitor.on('providerRecovered', (provider) => {
  console.log(`✅ Provider recovered: ${provider.name}`);
});
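The `providerDown` and `providerRecovered` events come in pairs, which makes it possible to measure how long an outage lasted. The sketch below is a small standalone tracker that such handlers could call. `DowntimeTracker`, `markDown`, and `markRecovered` are hypothetical names, and timestamps are passed in explicitly so the example is deterministic.

```typescript
// Hypothetical helper (not part of clodds): records when each provider
// went down and reports the outage length on recovery.
class DowntimeTracker {
  private readonly downSince = new Map<string, number>();

  // Records the first time a provider is seen down; repeats are ignored.
  markDown(providerName: string, at: number = Date.now()): void {
    if (!this.downSince.has(providerName)) {
      this.downSince.set(providerName, at);
    }
  }

  // Returns the outage length in ms, or null if the provider
  // was never marked down.
  markRecovered(providerName: string, at: number = Date.now()): number | null {
    const since = this.downSince.get(providerName);
    if (since === undefined) {
      return null;
    }
    this.downSince.delete(providerName);
    return at - since;
  }
}

const tracker = new DowntimeTracker();
tracker.markDown('provider-a', 1000);
tracker.markDown('provider-a', 2000); // duplicate event, first timestamp kept
const outageMs = tracker.markRecovered('provider-a', 31000);
const noOutage = tracker.markRecovered('provider-b', 31000);
console.log(outageMs, noOutage); // 30000 null
```

To wire it up, the `providerDown` handler would call `tracker.markDown(provider.name)` and the `providerRecovered` handler would call `tracker.markRecovered(provider.name)`.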

Manual Alerts

// Send manual alert
await monitor.sendAlert({
  severity: 'warning', // 'info' | 'warning' | 'error' | 'critical'
  title: 'Custom Alert',
  message: 'Something important happened',
  metadata: { key: 'value' },
});

Alert Types

Type                   Trigger
provider_down          LLM provider not responding
high_cpu               CPU usage above threshold
high_memory            Memory usage above threshold
high_error_rate        Error rate above threshold
unhandled_exception    Uncaught exception
unhandled_rejection    Unhandled promise rejection
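Three of these alert types are driven by the configured thresholds. The sketch below shows one way the comparison could be expressed, mapping current readings to the alert types they would trigger. `thresholdAlerts` is a hypothetical function, and the strict "greater than" comparison is an assumption; the service may treat a reading equal to the threshold differently.

```typescript
// Same keys as the `thresholds` config option; values are percentages.
interface Thresholds {
  cpu: number;
  memory: number;
  errorRate: number;
}

type Metrics = Thresholds; // current readings use the same keys

type ThresholdAlertType = 'high_cpu' | 'high_memory' | 'high_error_rate';

// Hypothetical helper: returns the alert types whose threshold is exceeded.
// Assumption: an alert fires only when the reading is strictly above the limit.
function thresholdAlerts(metrics: Metrics, limits: Thresholds): ThresholdAlertType[] {
  const triggered: ThresholdAlertType[] = [];
  if (metrics.cpu > limits.cpu) triggered.push('high_cpu');
  if (metrics.memory > limits.memory) triggered.push('high_memory');
  if (metrics.errorRate > limits.errorRate) triggered.push('high_error_rate');
  return triggered;
}

const limits: Thresholds = { cpu: 80, memory: 90, errorRate: 10 };
const fired = thresholdAlerts({ cpu: 91, memory: 72, errorRate: 12 }, limits);
console.log(fired); // [ 'high_cpu', 'high_error_rate' ]
```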

Configuration

// Update config
monitor.configure({
  intervalMs: 30000,
  alertCooldownMs: 600000,
  thresholds: {
    cpu: 85,
    memory: 95,
    errorRate: 5,
  },
});

Best Practices

Set appropriate thresholds - Avoid alert fatigue
Use cooldowns - Prevent alert spam
Multiple targets - Email + webhook for redundancy
Acknowledge alerts - Track what's been handled
Monitor providers - Know when APIs are down
Check health regularly - Don't just rely on alerts
