log-dive

Unified log search across Loki, Elasticsearch, and CloudWatch. Natural-language queries are translated to LogQL, Elasticsearch Query DSL, or CloudWatch filter patterns. Read-only: never modifies or deletes logs.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "log-dive" with this command: npx skills add tkuehnl/log-dive

Log Dive — Unified Log Search 🤿

Search logs across Loki, Elasticsearch/OpenSearch, and AWS CloudWatch from a single interface. Ask in plain English; the skill translates to the right query language.

⚠️ Sensitive Data Warning: Logs frequently contain PII, secrets, tokens, passwords, and other sensitive data. Never cache, store, or repeat raw log content beyond the current conversation. Treat all log output as confidential.

Activation

This skill activates when the user mentions:

  • "search logs", "find in logs", "log search", "check the logs"
  • "Loki", "LogQL", "logcli"
  • "Elasticsearch logs", "Kibana", "OpenSearch"
  • "CloudWatch logs", "AWS logs", "log groups"
  • "error logs", "find errors", "what happened in [service]"
  • "tail logs", "follow logs", "live logs"
  • "log backends", "which log sources", "log indices", "log labels"
  • Incident triage involving log analysis
  • "log-dive" explicitly

Permissions

permissions:
  exec: true          # Required to run backend scripts
  read: true          # Read script files
  write: false        # Never writes files — logs may contain secrets
  network: true       # Queries remote log backends

Example Prompts

  1. "Find error logs from the checkout service in the last 30 minutes"
  2. "Search for timeout exceptions across all services"
  3. "What log backends do I have configured?"
  4. "List available log indices in Elasticsearch"
  5. "Show me the labels available in Loki"
  6. "Tail the payment-service logs"
  7. "Find all 5xx errors in CloudWatch for api-gateway"
  8. "Correlate errors between user-service and payment-service"
  9. "What happened in production between 2pm and 3pm today?"

Backend Configuration

Each backend uses environment variables. Users may have one, two, or all three configured.

Loki

| Variable | Required | Description |
|----------|----------|-------------|
| LOKI_ADDR | Yes | Loki server URL (e.g., http://loki.internal:3100) |
| LOKI_TOKEN | No | Bearer token for authentication |
| LOKI_TENANT_ID | No | Multi-tenant header (X-Scope-OrgID) |

Elasticsearch / OpenSearch

| Variable | Required | Description |
|----------|----------|-------------|
| ELASTICSEARCH_URL | Yes | Base URL (e.g., https://es.internal:9200) |
| ELASTICSEARCH_TOKEN | No | Basic <base64> or Bearer <token> for auth |

AWS CloudWatch Logs

| Variable | Required | Description |
|----------|----------|-------------|
| AWS_PROFILE or AWS_ACCESS_KEY_ID | Yes | Standard AWS credentials |
| AWS_REGION | Yes | AWS region for CloudWatch |
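
The required variables above imply a simple presence check. The following is a minimal sketch of that check, not the logic of `log-dive.sh backends` itself (which is not shown here); `detect_backends` is a hypothetical helper name.

```shell
# Hypothetical helper: print the backends whose required variables are set.
# It only checks presence, not that the endpoint or credentials work.
detect_backends() {
  found=""
  if [ -n "${LOKI_ADDR:-}" ]; then
    found="$found loki"
  fi
  if [ -n "${ELASTICSEARCH_URL:-}" ]; then
    found="$found elasticsearch"
  fi
  # CloudWatch needs a region plus either a profile or an access key.
  if [ -n "${AWS_REGION:-}" ]; then
    if [ -n "${AWS_PROFILE:-}" ] || [ -n "${AWS_ACCESS_KEY_ID:-}" ]; then
      found="$found cloudwatch"
    fi
  fi
  # Drop the leading space left by the first append.
  printf '%s\n' "${found# }"
}
```

An empty result means none of the three backends has its required variables set.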

Agent Workflow

Follow this sequence:

Step 1: Check Backends

Run the backends check to see what's configured:

bash <skill_dir>/scripts/log-dive.sh backends

Parse the JSON output. If no backends are configured, tell the user which environment variables to set.

Step 2: Translate the User's Query

This is the critical step. Convert the user's natural language request into the appropriate backend-specific query. Use the query language reference below.

For ALL backends, pass the query through the dispatcher:

# Search across all configured backends
bash <skill_dir>/scripts/log-dive.sh search --query '<QUERY>' [OPTIONS]

# Search a specific backend
bash <skill_dir>/scripts/log-dive.sh search --backend loki --query '{app="checkout"} |= "error"' --since 30m --limit 200

bash <skill_dir>/scripts/log-dive.sh search --backend elasticsearch --query '{"query":{"bool":{"must":[{"match":{"message":"error"}},{"match":{"service":"checkout"}}]}}}' --index 'app-logs-*' --since 30m --limit 200

bash <skill_dir>/scripts/log-dive.sh search --backend cloudwatch --query '"ERROR" "checkout"' --log-group '/ecs/checkout-service' --since 30m --limit 200
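
To make the translation step concrete, here is a sketch that maps one intent ("errors in service X") to the three query forms used in the commands above. `build_error_query` is a hypothetical helper; the `app` label and the `service` field are assumptions taken from those examples and may not match your labels or index mapping.

```shell
# Hypothetical helper: emit an "errors in <service>" query for one backend.
# Usage: build_error_query <loki|elasticsearch|cloudwatch> <service>
build_error_query() {
  backend=$1
  service=$2
  # Accept only a conservative character set so the value can be embedded
  # in LogQL, JSON, and filter patterns without any escaping.
  case $service in
    ''|*[!A-Za-z0-9._-]*)
      echo "invalid service name" >&2
      return 1
      ;;
  esac
  case $backend in
    loki)
      printf '{app="%s"} |= "error"\n' "$service"
      ;;
    elasticsearch)
      printf '{"query":{"bool":{"must":[{"match":{"message":"error"}},{"match":{"service":"%s"}}]}}}\n' "$service"
      ;;
    cloudwatch)
      printf '"ERROR" "%s"\n' "$service"
      ;;
    *)
      echo "unknown backend: $backend" >&2
      return 1
      ;;
  esac
}
```

The output can be passed as the `--query` value, for example `--query "$(build_error_query loki checkout)"`. Rejecting unusual service names up front is a deliberate choice: it avoids quoting bugs across three different query syntaxes.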

Step 3: List Available Targets

If you don't know which labels, indices, or log groups exist, run discovery before the search in Step 2:

# Loki: list labels and label values
bash <skill_dir>/scripts/log-dive.sh labels --backend loki
bash <skill_dir>/scripts/log-dive.sh labels --backend loki --label app

# Elasticsearch: list indices
bash <skill_dir>/scripts/log-dive.sh indices --backend elasticsearch

# CloudWatch: list log groups
bash <skill_dir>/scripts/log-dive.sh indices --backend cloudwatch

Step 4: Tail Logs (Live Follow)

bash <skill_dir>/scripts/log-dive.sh tail --backend loki --query '{app="checkout"}'
bash <skill_dir>/scripts/log-dive.sh tail --backend cloudwatch --log-group '/ecs/checkout-service'

Tail runs for a limited time (default 30s) and streams results.

Step 5: Analyze Results

After receiving log output, you MUST:

  1. Identify unique error types — group similar errors, count occurrences
  2. Find the root cause — look for the earliest error, trace dependency chains
  3. Correlate across services — if errors in service A mention service B, note the dependency
  4. Build a timeline — order events chronologically
  5. Summarize actionably — "The checkout service started returning 500s at 14:23 because the database connection pool was exhausted (max 10 connections, 10 in use). The pool exhaustion was triggered by a slow query in the inventory service."

NEVER dump raw log output to the user. Always summarize, extract patterns, and present structured findings.
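
Grouping and counting (item 1 above) can be approximated with standard text tools. This is a rough sketch that assumes one log message per line on stdin; `summarize_errors` is a hypothetical name, and collapsing digit runs to `N` is just one simple way to make similar messages compare equal.

```shell
# Hypothetical helper: group similar lines and count them, most frequent first.
# Reads log messages on stdin, writes "<count><TAB><pattern>" lines.
summarize_errors() {
  tab=$(printf '\t')
  # Replace every run of digits with N so IDs, durations, and timestamps
  # do not split otherwise identical messages into separate groups.
  sed -E 's/[0-9]+/N/g' |
    awk '{ count[$0]++ } END { for (k in count) printf "%d\t%s\n", count[k], k }' |
    sort -t "$tab" -k1,1nr -k2
}
```

For example, the made-up lines `timeout after 30ms` and `timeout after 45ms` collapse into a single `timeout after Nms` group with a count of 2. Only these grouped patterns, not the raw lines, should be carried into the summary.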

Discord v2 Delivery Mode (OpenClaw v2026.2.14+)

When the conversation is happening in a Discord channel:

  • Send a compact incident summary first (backend, query intent, top error types, root-cause hypothesis), then ask if the user wants full detail.
  • Keep the first response under ~1200 characters and avoid dumping raw log lines in the first message.
  • If Discord components are available, include quick actions:
    • Show Error Timeline
    • Show Top Error Patterns
    • Run Related Service Query
  • If components are not available, provide the same follow-ups as a numbered list.
  • Prefer short follow-up chunks (<=15 lines per message) when sharing timelines or grouped findings.
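
The 15-line guideline can be applied with a small filter. The sketch below is illustrative only: `chunk_lines` and the `---` separator are made-up names, not part of log-dive or of any Discord API.

```shell
# Hypothetical helper: split stdin into chunks of at most <n> lines,
# writing a "---" separator line between consecutive chunks.
# Usage: chunk_lines <n>
chunk_lines() {
  awk -v n="$1" '
    NR > 1 && (NR - 1) % n == 0 { print "---" }
    { print }
  '
}
```

Piping a summarized timeline through `chunk_lines 15` gives pieces that can each be sent as a separate message.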

Query Language Reference

LogQL (Loki)

LogQL has two parts: a stream selector and a filter pipeline.

Stream selectors:

{app="myapp"}                          # exact match
{namespace="prod", app=~"api-.*"}      # regex match
{app!="debug"}                         # negative match

Filter pipeline (chained after selector):

{app="myapp"} |= "error"              # line contains "error"
{app="myapp"} != "healthcheck"         # line does NOT contain
{app="myapp"} |~ "error|warn"          # regex match on line
{app="myapp"} !~ "DEBUG|TRACE"         # negative regex

Parser expressions (structured logs):

{app="myapp"} | json                   # parse JSON logs
{app="myapp"} | json | status >= 500   # filter by parsed field
{app="myapp"} | logfmt                 # parse logfmt
{app="myapp"} | regexp `(?P<ip>\d+\.\d+\.\d+\.\d+)` # regex extract

Common patterns:

  • Errors in service: {app="checkout"} |= "error" | json | level="error"
  • HTTP 5xx: {app="api"} | json | status >= 500
  • Slow requests: {app="api"} | json | duration > 5s
  • Stack traces: {app="myapp"} |= "Exception" |= "at "
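
When search terms come from user input, embedded quotes can break the query string. The following sketch builds a selector plus chained `|=` filters with escaping applied; `logql_escape` and `logql_contains` are hypothetical helpers, and the `app` label is an assumption carried over from the examples above.

```shell
# Escape backslashes and double quotes for use inside a LogQL "..." string.
logql_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Usage: logql_contains <app> <term>...
# Emits {app="<app>"} followed by one |= filter per term (all must match).
logql_contains() {
  query=$(printf '{app="%s"}' "$(logql_escape "$1")")
  shift
  for term in "$@"; do
    query="$query |= \"$(logql_escape "$term")\""
  done
  printf '%s\n' "$query"
}
```

Calling `logql_contains myapp Exception 'at '` reproduces the stack-trace pattern listed above.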

Elasticsearch Query DSL

Simple match:

{"query": {"match": {"message": "error"}}}

Boolean query (AND/OR):

{
  "query": {
    "bool": {
      "must": [
        {"match": {"message": "error"}},
        {"match": {"service.name": "checkout"}}
      ],
      "must_not": [
        {"match": {"message": "healthcheck"}}
      ]
    }
  },
  "sort": [{"@timestamp": "desc"}],
  "size": 200
}

Time range filter:

{
  "query": {
    "bool": {
      "must": [{"match": {"message": "timeout"}}],
      "filter": [
        {"range": {"@timestamp": {"gte": "now-30m", "lte": "now"}}}
      ]
    }
  }
}
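
The `range` clause above follows directly from a relative duration. As a sketch, assuming durations are written like the `--since 30m` values used earlier (how the script itself applies `--since` is not documented here), a hypothetical `es_since_filter` helper could produce it:

```shell
# Hypothetical helper: turn a duration such as 30m into an Elasticsearch
# range clause on @timestamp using date math (now-30m .. now).
# Usage: es_since_filter <number><s|m|h|d>
es_since_filter() {
  since=$1
  # Accept only digits followed by a single unit, so the value is safe to
  # embed in JSON and is valid date math.
  if ! printf '%s' "$since" | grep -Eq '^[0-9]+[smhd]$'; then
    echo "invalid duration: $since" >&2
    return 1
  fi
  printf '{"range":{"@timestamp":{"gte":"now-%s","lte":"now"}}}\n' "$since"
}
```

The result belongs in the `filter` array of a `bool` query, as in the example above.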

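Wildcard / regex (term-level: the pattern must match an entire indexed term, so on an analyzed text field it is tested against single tokens, not the whole message):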

{"query": {"regexp": {"message": "error.*timeout"}}}

Common patterns:

  • Errors in service: {"query":{"bool":{"must":[{"match":{"message":"error"}},{"match":{"service.name":"checkout"}}]}}}
  • HTTP 5xx: {"query":{"range":{"http.status_code":{"gte":500}}}}
  • Aggregate by field: Use "aggs" — but prefer simple queries for agent use

CloudWatch Filter Patterns

Simple text match:

"ERROR"                              # contains ERROR
"ERROR" "checkout"                   # contains ERROR AND checkout

JSON filter patterns:

{ $.level = "error" }               # JSON field match
{ $.statusCode >= 500 }             # numeric comparison
{ $.duration > 5000 }               # duration threshold
{ $.level = "error" && $.service = "checkout" }  # compound

OR terms and exclusion:

?"ERROR" ?"timeout"                  # ERROR OR timeout (any term)
-"healthcheck"                       # does NOT contain (use with other terms)

Common patterns:

  • Errors: "ERROR"
  • Errors in service: { $.level = "error" && $.service = "checkout" }
  • HTTP 5xx: { $.statusCode >= 500 }
  • Exceptions: "Exception" "at "
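
Compound JSON patterns like the one above can be assembled from field/value pairs. This is a minimal sketch with a hypothetical `cw_json_filter` helper; it handles string equality only and does not escape quotes inside values.

```shell
# Hypothetical helper: build a CloudWatch JSON filter pattern that ANDs
# together string-equality conditions.
# Usage: cw_json_filter <field> <value> [<field> <value>...]
cw_json_filter() {
  if [ "$#" -lt 2 ]; then
    echo "usage: cw_json_filter <field> <value>..." >&2
    return 1
  fi
  pattern=""
  while [ "$#" -ge 2 ]; do
    cond="\$.$1 = \"$2\""
    if [ -z "$pattern" ]; then
      pattern=$cond
    else
      pattern="$pattern && $cond"
    fi
    shift 2
  done
  printf '{ %s }\n' "$pattern"
}
```

Numeric comparisons such as `$.statusCode >= 500` are left out on purpose; they need unquoted values and a different operator.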

Output Format

When presenting search results, use this structure:

## Log Search Results

**Backend:** Loki | **Query:** `{app="checkout"} |= "error"`
**Time range:** Last 30 minutes | **Results:** 47 entries

### Error Summary

| Error Type | Count | First Seen | Last Seen | Service |
|-----------|-------|------------|-----------|---------|
| NullPointerException | 23 | 14:02:31 | 14:28:45 | checkout |
| ConnectionTimeout | 18 | 14:05:12 | 14:29:01 | checkout → db |
| HTTP 503 | 6 | 14:06:00 | 14:27:33 | checkout → payment |

### Root Cause Analysis

1. **14:02:31** — First `NullPointerException` in checkout service...
2. **14:05:12** — Database connection timeouts begin...

### Recommended Actions

- [ ] Check database connection pool settings
- [ ] Review recent deployments to checkout service

---
*Powered by Anvil AI 🤿*

Common Workflows

Incident Triage

  1. Check backends → search for errors in affected service → search upstream/downstream services → correlate → build timeline → recommend actions.

Performance Investigation

  1. Search for slow requests (duration > 5s) → identify common patterns → check for database slow queries → check for external service timeouts.

Deployment Verification

  1. Search for errors in the deployed service since deploy time → compare error rate with pre-deploy period → flag new error types.

Limitations

  • Read-only: This skill can only search and read logs. It cannot delete, modify, or create log entries.
  • Output size: Default limit is 200 entries. Log output is pre-filtered to reduce token consumption. For larger investigations, use multiple targeted queries rather than one broad query.
  • Network access: Log backends must be reachable from the machine running OpenClaw.
  • No complex aggregations: For percentiles, rates, and similar aggregations, use your backend's native UI (Grafana, Kibana, CloudWatch Logs Insights).

Troubleshooting

| Error | Cause | Fix |
|-------|-------|-----|
| "No backends configured" | No env vars set | Set LOKI_ADDR, ELASTICSEARCH_URL, or configure AWS CLI |
| "logcli not found" | logcli not installed | Install from https://grafana.com/docs/loki/latest/tools/logcli/ |
| "aws: command not found" | AWS CLI not installed | Install from https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html |
| "curl: command not found" | curl not installed | apt install curl or brew install curl |
| "jq: command not found" | jq not installed | apt install jq or brew install jq |
| "connection refused" | Backend unreachable | Check URL, VPN, firewall rules |
| "401 Unauthorized" | Bad credentials | Check LOKI_TOKEN, ELASTICSEARCH_TOKEN, or AWS credentials |

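Several of the errors above come down to a missing command-line tool. A quick pre-flight check can surface them all at once; `missing_tools` is a hypothetical helper, and which tools a given backend actually needs is inferred from the error messages listed here.

```shell
# Hypothetical helper: print each named tool that is not on PATH.
# Usage: missing_tools curl jq logcli aws
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

No output means every listed tool was found.
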
Powered by Anvil AI 🤿

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
