observability

Query metrics, logs, and dashboards for diagnostics and incident response.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "observability" with this command: npx skills add 5dlabs/cto/5dlabs-cto-observability

Observability Tools


Prometheus (Metrics)

Query metrics for performance analysis and alerting.
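The implementation of `prometheus_query` isn't shown here. If it wraps Prometheus's HTTP API (an assumption), a call like the ones in this section corresponds to a GET on `/api/v1/query` with a `query` parameter and an optional evaluation `time`. The following sketch builds that URL and unpacks a vector result. The base URL and the helper names `buildInstantQueryUrl` and `parseVector` are hypothetical.

```typescript
// Minimal sketch of a Prometheus instant query over the HTTP API.
// Assumption: buildInstantQueryUrl, parseVector, and the base URL are hypothetical;
// /api/v1/query with `query` and optional `time` is Prometheus's documented endpoint.

interface PromVectorResponse {
  status: "success" | "error";
  data: {
    resultType: "vector";
    // Each sample is [unix_seconds, value_as_string].
    result: { metric: { [label: string]: string }; value: [number, string] }[];
  };
}

function buildInstantQueryUrl(baseUrl: string, promql: string, time?: Date): string {
  const url = new URL("/api/v1/query", baseUrl);
  url.searchParams.set("query", promql); // URL-encodes braces, quotes, and brackets
  if (time !== undefined) {
    // Prometheus accepts RFC3339 or Unix seconds for the evaluation time.
    url.searchParams.set("time", String(time.getTime() / 1000));
  }
  return url.toString();
}

function parseVector(
  resp: PromVectorResponse,
): { labels: { [label: string]: string }; value: number }[] {
  if (resp.status !== "success") {
    throw new Error("Prometheus query failed");
  }
  // Sample values arrive as strings, so convert them to numbers.
  return resp.data.result.map((r) => ({ labels: r.metric, value: Number(r.value[1]) }));
}

// Usage: build the request for the HTTP request-rate query (hypothetical host).
const requestRateUrl = buildInstantQueryUrl(
  "http://prometheus.example:9090",
  'rate(http_requests_total{namespace="my-service"}[5m])',
);
console.log(requestRateUrl);
```

Prometheus returns sample values as strings, which is why `parseVector` converts them explicitly.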

CPU usage by pod

prometheus_query({ query: 'sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="my-service", container!=""}[5m]))' })

Memory usage

prometheus_query({ query: 'container_memory_usage_bytes{namespace="my-service"}' })

HTTP request rate

prometheus_query({ query: 'rate(http_requests_total{namespace="my-service"}[5m])' })

Error rate

prometheus_query({ query: 'sum(rate(http_requests_total{namespace="my-service", status=~"5.."}[5m])) / sum(rate(http_requests_total{namespace="my-service"}[5m]))' })
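The error rate is a ratio of two per-second rates: 5xx requests divided by all requests, each summed across series. The sketch below reproduces that arithmetic for a single window from two counter readings per series. It is a simplification that assumes no counter resets and skips the window-boundary extrapolation Prometheus's `rate()` performs. `CounterSeries` and `errorRatio` are hypothetical names.

```typescript
// Simplified model of sum(rate(5xx)) / sum(rate(all)) for one window.
// Assumptions: exactly two counter readings per series, no counter resets, and no
// window-boundary extrapolation (Prometheus's rate() handles both). Names are hypothetical.

interface CounterSeries {
  status: string; // value of the `status` label, e.g. "200" or "503"
  startValue: number; // counter reading at the start of the window
  endValue: number; // counter reading at the end of the window
}

function perSecondRate(s: CounterSeries, windowSeconds: number): number {
  return (s.endValue - s.startValue) / windowSeconds;
}

function errorRatio(series: CounterSeries[], windowSeconds: number): number {
  // PromQL regex matchers are fully anchored, so status=~"5.." means "5" plus exactly two characters.
  const serverError = /^5..$/;
  let errors = 0;
  let total = 0;
  for (const s of series) {
    const r = perSecondRate(s, windowSeconds);
    total += r;
    if (serverError.test(s.status)) {
      errors += r;
    }
  }
  // If every rate is zero this is 0 / 0 = NaN, matching the PromQL division.
  return errors / total;
}

const ratio = errorRatio(
  [
    { status: "200", startValue: 0, endValue: 2700 }, // 9 req/s over 300 s
    { status: "503", startValue: 0, endValue: 300 }, // 1 req/s over 300 s
  ],
  300,
);
console.log(ratio);
```

Here the 200 series contributes 2700 / 300 = 9 req/s and the 503 series contributes 300 / 300 = 1 req/s, so the ratio is 1 / (9 + 1) = 0.1. Summing both sides before dividing matters. Without the sums, PromQL matches each 5xx series only with itself and yields 1 for each one.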

Loki (Logs)

Query logs for debugging and incident investigation.

Application logs

loki_query({ query: '{namespace="my-service", app="api"} |= "error"', limit: 100 })

Structured log parsing

loki_query({ query: '{namespace="my-service"} | json | level="error"' })
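Structured parsing works line by line: `| json` turns a JSON log line into labels, and `level="error"` then filters on one of them. The sketch below approximates those two stages locally and adds a per-field count of the kind an error-type breakdown needs. It assumes flat, top-level fields. Loki's json parser also flattens nested objects into underscore-joined labels and tags unparsable lines with an `__error__` label, whereas this sketch simply drops such lines. All names are hypothetical.

```typescript
// Rough model of the LogQL stages `| json | level="error"`, plus a per-field count.
// Simplifications: only top-level fields are used, and lines that fail to parse are
// dropped rather than tagged with Loki's __error__ label. Names are hypothetical.

type LogFields = { [field: string]: unknown };

function parseJsonLine(line: string): LogFields | null {
  try {
    const parsed: unknown = JSON.parse(line);
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return parsed as LogFields;
    }
    return null; // valid JSON, but not an object
  } catch {
    return null; // not JSON at all
  }
}

// Keep parsed lines whose `level` field equals the wanted value.
function filterByLevel(lines: string[], level: string): LogFields[] {
  const matches: LogFields[] = [];
  for (const line of lines) {
    const fields = parseJsonLine(line);
    if (fields !== null && fields["level"] === level) {
      matches.push(fields);
    }
  }
  return matches;
}

// Count records per value of one field (a missing field counts under "").
function countBy(records: LogFields[], field: string): { [value: string]: number } {
  // Prototype-free object, so keys like "toString" can't collide with built-ins.
  const counts: { [value: string]: number } = Object.create(null);
  for (const r of records) {
    const key = r[field] === undefined ? "" : String(r[field]);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

const sample = [
  '{"level":"error","error_type":"timeout","msg":"upstream timed out"}',
  '{"level":"info","msg":"request served"}',
  "plain text line",
  '{"level":"error","error_type":"db","msg":"connection refused"}',
  '{"level":"error","error_type":"timeout","msg":"upstream timed out"}',
];
console.log(countBy(filterByLevel(sample, "error"), "error_type"));
```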

Time-based filtering

loki_query({ query: '{namespace="my-service"}', start: "2024-01-01T00:00:00Z", end: "2024-01-01T01:00:00Z" })
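If `loki_query` forwards to Loki's HTTP API (an assumption), a time-bounded query like the one above corresponds to `GET /loki/api/v1/query_range` with `query`, `start`, `end`, and `limit` parameters. Loki documents `start` and `end` as nanosecond Unix epochs and also accepts RFC3339. The sketch below builds such a URL. The base URL and `buildLokiRangeUrl` are hypothetical.

```typescript
// Minimal sketch of a time-bounded Loki query over the HTTP API.
// Assumptions: buildLokiRangeUrl and the base URL are hypothetical; the endpoint
// /loki/api/v1/query_range and its query/start/end/limit parameters are Loki's.

interface LokiRangeOptions {
  query: string; // LogQL, e.g. {namespace="my-service"}
  start: Date;
  end: Date;
  limit?: number; // maximum number of log lines to return
}

// Loki takes nanosecond Unix epochs. Appending six zeros to the integer millisecond
// timestamp avoids the precision loss of multiplying past 2^53 in floating point.
function toUnixNanos(d: Date): string {
  return `${d.getTime()}000000`;
}

function buildLokiRangeUrl(baseUrl: string, opts: LokiRangeOptions): string {
  if (opts.end.getTime() <= opts.start.getTime()) {
    throw new Error("end must be after start");
  }
  const url = new URL("/loki/api/v1/query_range", baseUrl);
  url.searchParams.set("query", opts.query);
  url.searchParams.set("start", toUnixNanos(opts.start));
  url.searchParams.set("end", toUnixNanos(opts.end));
  if (opts.limit !== undefined) {
    url.searchParams.set("limit", String(opts.limit));
  }
  return url.toString();
}

const lokiUrl = buildLokiRangeUrl("http://loki.example:3100", {
  query: '{namespace="my-service"}',
  start: new Date("2024-01-01T00:00:00Z"),
  end: new Date("2024-01-01T01:00:00Z"),
  limit: 100,
});
console.log(lokiUrl);
```

Rejecting an empty or inverted window up front keeps the "don't query unbounded" rule enforceable in code rather than by convention.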

Common Queries

High latency (Prometheus): histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

Errors spike (Loki): sum by (error_type) (count_over_time({app="api"} |= "error" | json [5m]))

Memory leak (Prometheus): container_memory_usage_bytes{pod=~"api.*"}

Failed requests (Loki): {app="api"} | json | status >= 500
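The high-latency query uses `histogram_quantile`, which estimates a quantile from cumulative `le` buckets by locating the bucket that contains the target rank and interpolating linearly inside it. The sketch below is a simplified re-implementation of that documented behavior and does not reproduce newer Prometheus edge-case handling. It assumes buckets sorted by `le` with `+Inf` last, and its names are hypothetical.

```typescript
// Simplified re-implementation of histogram_quantile's bucket interpolation.
// Assumes buckets sorted by `le` with +Inf last; names are hypothetical.

interface Bucket {
  le: number; // upper bound of the bucket
  count: number; // cumulative count of observations <= le
}

function histogramQuantile(q: number, buckets: Bucket[]): number {
  if (q < 0) return -Infinity;
  if (q > 1) return Infinity;
  const last = buckets.length - 1;
  if (buckets.length < 2 || buckets[last].le !== Infinity) {
    throw new Error("need at least one finite bucket plus a +Inf bucket");
  }
  const total = buckets[last].count;
  if (total === 0) return NaN;
  let rank = q * total;

  // First finite bucket whose cumulative count reaches the rank.
  let b = -1;
  for (let i = 0; i < last; i++) {
    if (buckets[i].count >= rank) {
      b = i;
      break;
    }
  }
  // Rank falls in the +Inf bucket: return the highest finite upper bound.
  if (b === -1) return buckets[last - 1].le;
  // A first bucket with a non-positive upper bound has no usable lower bound.
  if (b === 0 && buckets[0].le <= 0) return buckets[0].le;

  // The lowest bucket is assumed to start at 0.
  let lower = 0;
  let inBucket = buckets[b].count;
  if (b > 0) {
    lower = buckets[b - 1].le;
    inBucket -= buckets[b - 1].count;
    rank -= buckets[b - 1].count;
  }
  return lower + (buckets[b].le - lower) * (rank / inBucket);
}

const latency: Bucket[] = [
  { le: 0.1, count: 50 },
  { le: 0.5, count: 90 },
  { le: 1, count: 99 },
  { le: Infinity, count: 100 },
];
// 70th of 100 observations: 20 of the 40 in the 0.1 to 0.5 s bucket, so about 0.3 s.
console.log(histogramQuantile(0.7, latency));
```

The `sum by (le)` step aggregates buckets across series before this interpolation runs, so the estimate covers the whole service rather than each instance separately.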

Incident Response Flow

1. Check alerts - What triggered?
2. Query metrics - Is it resource exhaustion?
3. Query logs - What errors are occurring?
4. Correlate - Match timestamps across metrics and logs
5. Identify root cause - Database? Network? Code bug?
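The correlation step depends on querying metrics and logs over the same window and then lining up timestamps. The sketch below derives one window around an alert and pairs metric spikes with nearby log entries. The 15-minute and 5-minute margins, the tolerance, and all function names are hypothetical choices.

```typescript
// Sketch of the correlate step: one shared window around the alert for both
// Prometheus and Loki, plus a tolerance-based match between metric spikes and log
// lines. The 15/5-minute defaults and all names here are hypothetical choices.

interface TimeWindow {
  start: Date;
  end: Date;
}

function correlationWindow(alertTime: Date, minutesBefore = 15, minutesAfter = 5): TimeWindow {
  const t = alertTime.getTime();
  return {
    start: new Date(t - minutesBefore * 60 * 1000),
    end: new Date(t + minutesAfter * 60 * 1000),
  };
}

// Prometheus range queries and Loki both accept RFC3339 timestamps, so one pair of
// strings can bound both queries to exactly the same window.
function toRfc3339Range(w: TimeWindow): { start: string; end: string } {
  return { start: w.start.toISOString(), end: w.end.toISOString() };
}

// Pair each metric spike with log entries no more than toleranceSeconds away.
function matchByTime(spikes: Date[], logTimes: Date[], toleranceSeconds: number): [Date, Date][] {
  const pairs: [Date, Date][] = [];
  for (const spike of spikes) {
    for (const log of logTimes) {
      if (Math.abs(spike.getTime() - log.getTime()) <= toleranceSeconds * 1000) {
        pairs.push([spike, log]);
      }
    }
  }
  return pairs;
}

const incident = correlationWindow(new Date("2024-01-01T00:30:00Z"));
console.log(toRfc3339Range(incident));
```

The nested loop is quadratic, which is fine for the handful of spikes and a limited log page. Sorting both lists would be the next step for larger inputs.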

Best Practices

Start broad, then narrow - Filter down to specific pods
Use time ranges - Don't query unbounded
Correlate metrics + logs - Same time window
Check dashboard first - Grafana may have pre-built views
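For range queries, "don't query unbounded" also means choosing a sensible `step`. Prometheus rejects range queries that would return more than 11,000 points per series. The sketch below refuses overly long ranges and picks a step that keeps the point count modest. The 24-hour cap, the roughly 250-point target, and the function name are hypothetical defaults.

```typescript
// Sketch of bounding a range query: refuse overly long ranges and pick a step that
// keeps the point count modest. The 24 h cap and ~250-point target are hypothetical
// defaults; the 11,000-point ceiling is Prometheus's per-series range-query limit.

const PROMETHEUS_MAX_POINTS = 11000;

function chooseStepSeconds(start: Date, end: Date, targetPoints = 250, maxRangeHours = 24): number {
  const rangeSeconds = (end.getTime() - start.getTime()) / 1000;
  if (rangeSeconds <= 0) {
    throw new Error("end must be after start");
  }
  if (rangeSeconds > maxRangeHours * 3600) {
    throw new Error(`range exceeds ${maxRangeHours}h; narrow the time window first`);
  }
  // Round up so (range / step) stays within both the target and the server limit.
  return Math.max(
    1,
    Math.ceil(rangeSeconds / targetPoints),
    Math.ceil(rangeSeconds / PROMETHEUS_MAX_POINTS),
  );
}

const oneHourStep = chooseStepSeconds(
  new Date("2024-01-01T00:00:00Z"),
  new Date("2024-01-01T01:00:00Z"),
);
console.log(oneHourStep); // 3600 s / 250 points = 14.4, rounded up to 15
```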

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
