observability

Query metrics, logs, and dashboards for diagnostics and incident response.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "observability" with this command: npx skills add 5dlabs/cto/5dlabs-cto-observability

Observability Tools

Prometheus (Metrics)

Query metrics for performance analysis and alerting.

CPU usage by pod

prometheus_query({ query: 'rate(container_cpu_usage_seconds_total{namespace="my-service"}[5m])' })

Memory usage

prometheus_query({ query: 'container_memory_usage_bytes{namespace="my-service"}' })

HTTP request rate

prometheus_query({ query: 'rate(http_requests_total{namespace="my-service"}[5m])' })

Error rate

prometheus_query({ query: 'sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))' })
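A ratio such as the error rate is usually aggregated on both sides of the division, so the result is one value for the service rather than a per-series match. The following is a minimal sketch of building such a query string, assuming the `http_requests_total` metric and `status` label from the examples above; the helper name `errorRatioQuery` and its parameters are hypothetical and not part of the skill.

```typescript
// Hypothetical helper: builds a PromQL error-ratio expression as a string.
// The metric name and `status` label come from the examples in this document;
// the function name and parameters are illustrative assumptions.
function errorRatioQuery(namespace: string, rangeWindow: string = "5m"): string {
  const selector = `namespace="${namespace}"`;
  const errors = `sum(rate(http_requests_total{${selector}, status=~"5.."}[${rangeWindow}]))`;
  const total = `sum(rate(http_requests_total{${selector}}[${rangeWindow}]))`;
  // Aggregating both sides with sum() yields a single ratio for the namespace
  // instead of dividing series that happen to share identical labels.
  return `${errors} / ${total}`;
}

const exampleErrorQuery = errorRatioQuery("my-service");
console.log(exampleErrorQuery);
```

The resulting string could then be passed as the `query` field of `prometheus_query`.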

Loki (Logs)

Query logs for debugging and incident investigation.

Application logs

loki_query({ query: '{namespace="my-service", app="api"} |= "error"', limit: 100 })

Structured log parsing

loki_query({ query: '{namespace="my-service"} | json | level="error"' })

Time-based filtering

loki_query({ query: '{namespace="my-service"}', start: "2024-01-01T00:00:00Z", end: "2024-01-01T01:00:00Z" })
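When investigating an incident, the `start` and `end` values are typically derived from the time the alert fired. The sketch below computes a symmetric window around a timestamp. The helper name `timeWindow` is hypothetical, and it assumes the tool accepts ISO 8601 timestamps with fractional seconds, as produced by `Date.prototype.toISOString`.

```typescript
// Hypothetical helper: returns a start/end pair centred on an incident
// timestamp, for use as the `start` and `end` fields of a log query.
function timeWindow(center: Date, minutesEachSide: number): { start: string; end: string } {
  const offsetMs = minutesEachSide * 60 * 1000;
  return {
    start: new Date(center.getTime() - offsetMs).toISOString(),
    end: new Date(center.getTime() + offsetMs).toISOString(),
  };
}

// Example: 30 minutes either side of 00:30 UTC on 2024-01-01.
const incidentWindow = timeWindow(new Date("2024-01-01T00:30:00Z"), 30);
console.log(incidentWindow);
```

The two fields can be passed alongside the query, for example `loki_query({ query: '{namespace="my-service"}', start: incidentWindow.start, end: incidentWindow.end })`.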

Common Queries

Scenario          Query Type    Example
High latency      Prometheus    histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
Error spike       Loki          sum by (error_type) (count_over_time({app="api"} |= "error" | json [5m]))
Memory leak       Prometheus    container_memory_usage_bytes{pod=~"api.*"}
Failed requests   Loki          {app="api"} | json | status >= 500

Incident Response Flow

  • Check alerts - What triggered?

  • Query metrics - Is it resource exhaustion?

  • Query logs - What errors are occurring?

  • Correlate - Match timestamps across metrics and logs

  • Identify root cause - Database? Network? Code bug?
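The correlation step in the flow above can be sketched as a small filter that keeps only the log lines close in time to a metric spike. The `MetricSample` and `LogEntry` shapes, the function name, and the sample data are all assumptions for illustration; actual tool responses may be structured differently.

```typescript
// Illustrative shapes only; real tool responses may look different.
interface MetricSample { timestampMs: number; value: number; }
interface LogEntry { timestampMs: number; line: string; }

// Returns the log entries that fall within `toleranceMs` of any metric
// sample whose value exceeds `threshold`.
function logsNearSpikes(
  samples: MetricSample[],
  logs: LogEntry[],
  threshold: number,
  toleranceMs: number
): LogEntry[] {
  const spikes = samples.filter((s) => s.value > threshold);
  return logs.filter((entry) =>
    spikes.some((s) => Math.abs(s.timestampMs - entry.timestampMs) <= toleranceMs)
  );
}

// Made-up data: one spike at t=61s, one error log half a second earlier.
const exampleSamples: MetricSample[] = [
  { timestampMs: 1000, value: 0.2 },
  { timestampMs: 61000, value: 0.95 },
];
const exampleLogs: LogEntry[] = [
  { timestampMs: 2000, line: "request ok" },
  { timestampMs: 60500, line: "error: upstream timeout" },
];
console.log(logsNearSpikes(exampleSamples, exampleLogs, 0.9, 5000));
```

With the data shown, only the "upstream timeout" entry is returned, since it is the only log line within five seconds of the sample above the threshold.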

Best Practices

  • Start broad, then narrow - Filter down to specific pods

  • Use time ranges - Avoid unbounded queries

  • Correlate metrics + logs - Same time window

  • Check dashboard first - Grafana may have pre-built views
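"Start broad, then narrow" can be made concrete by building the label selector from whichever labels are known so far. This is a sketch only: `buildSelector` is a hypothetical helper, the label names follow the examples in this document, and the pod name `api-0` is invented for illustration.

```typescript
// Hypothetical helper: builds a label selector from whichever labels are set.
// Labels whose value is undefined are skipped.
function buildSelector(labels: { [label: string]: string | undefined }): string {
  const parts = Object.keys(labels)
    .filter((label) => labels[label] !== undefined)
    .map((label) => `${label}="${labels[label]}"`);
  return `{${parts.join(", ")}}`;
}

// Broad: everything in the namespace.
const broadSelector = buildSelector({ namespace: "my-service" });
// Narrower: one application, then one pod, within the same namespace.
const appSelector = buildSelector({ namespace: "my-service", app: "api" });
const podSelector = buildSelector({ namespace: "my-service", app: "api", pod: "api-0" });
console.log(broadSelector, appSelector, podSelector);
```

The same selector string works as a Loki stream selector or as the label matcher part of a Prometheus query, so one set of labels can drive both sides of an investigation.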

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
