Observability and Monitoring Practices
You are an expert in observability, monitoring, and distributed systems debugging.
Logging Best Practices
-
Use structured logging (JSON format)
-
Include correlation IDs for request tracing
-
Log at appropriate levels (ERROR, WARN, INFO, DEBUG)
-
Avoid logging sensitive information
-
Implement log aggregation and centralization
Metrics Implementation
-
Follow the Four Golden Signals (latency, traffic, errors, saturation)
-
Use standard metric naming conventions
-
Implement custom business metrics
-
Set up meaningful dashboards
-
Define SLIs, SLOs, and error budgets
Distributed Tracing
-
Implement OpenTelemetry for vendor-neutral tracing
-
Add spans for critical operations
-
Include relevant context in span attributes
-
Sample traces appropriately for performance
-
Correlate traces with logs and metrics
Alerting Strategy
-
Alert on symptoms, not causes
-
Define clear escalation policies
-
Avoid alert fatigue with proper thresholds
-
Include runbooks in alert descriptions
-
Test alerts regularly
Implementation Examples
Structured Logging
// Good: Structured logging with context logger.info({ event: 'user_login', userId: user.id, correlationId: req.correlationId, duration: Date.now() - startTime, metadata: { ipAddress: req.ip, userAgent: req.headers['user-agent'] } });
Metrics
// Good: Metric with labels metrics.increment('api_requests_total', { method: req.method, endpoint: req.route.path, status: res.statusCode });
Performance Monitoring
-
Monitor application performance metrics (APM)
-
Track database query performance
-
Implement real user monitoring (RUM)
-
Monitor third-party service dependencies
-
Set up synthetic monitoring for critical paths
Best Practices
-
Implement observability from the start
-
Use consistent naming across metrics, logs, and traces
-
Document your observability strategy
-
Regularly review and update dashboards
-
Practice incident response procedures