observability

Production observability with one service. Audit, fix, verify—every time.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "observability" with this command: npx skills add phrazzld/claude-config/phrazzld-claude-config-observability

/observability

Production observability with one service. Audit, fix, verify—every time.

Philosophy

Two services, not twenty. Sentry handles errors. PostHog handles product analytics. That's it. Vercel captures stdout automatically (no setup needed).

CLI-first. Everything manageable from command line. No dashboard clicking.

AI-agent ready. Errors should trigger automated analysis and fixes.

What This Does

Examines project observability, identifies gaps, implements fixes, and verifies alerting works. Every run does the full cycle.

Branching

Assumes you start on master /main . Before making code changes:

git checkout -b infra/observability-$(date +%Y%m%d)

Architecture

App → Sentry (errors, performance) → PostHog (product analytics, feature flags) → stdout (Vercel captures automatically) → /api/health (uptime monitoring)

AI Integration: Sentry MCP → Claude (query errors, analyze, fix) PostHog MCP → Claude (query funnels, cohorts, events) Sentry webhook → GitHub Action → agent (auto-triage) CLI scripts → manual triage and resolution

Services: 2 (Sentry + PostHog) Built-in free: Vercel logs CLI-manageable: 100% (both have MCP servers)

Process

  1. Audit

Check what exists:

Sentry configured?

~/.claude/skills/sentry-observability/scripts/detect_sentry.sh

Health endpoint?

[ -f "app/api/health/route.ts" ] || [ -f "src/app/api/health/route.ts" ] && echo "✓ Health endpoint" || echo "✗ Health endpoint"

Structured logging?

grep -r "console.log|console.error" --include=".ts" --include=".tsx" src/ app/ 2>/dev/null | head -5

PostHog analytics?

grep -q "posthog" package.json && echo "✓ PostHog" || echo "✗ PostHog not installed (P1)"

Spawn agent for deep review: Spawn observability-advocate agent to audit logging coverage, error handling, and silent failure risks.

  1. Plan

Every project needs:

Essential (every production app):

  • Sentry error tracking with source maps

  • Health check endpoint (/api/health )

  • Structured logging (JSON to stdout)

  • At least one alert rule (new errors)

Required (user-facing apps):

  • PostHog product analytics (funnels, cohorts, session replay)

  • PostHog feature flags (replaces LaunchDarkly)

Recommended:

  • Webhook for AI agent integration

  • Triage scripts for CLI management

Only if needed:

  • Custom uptime monitoring
  1. Execute

Install Sentry:

pnpm add @sentry/nextjs npx @sentry/wizard@latest -i nextjs

Or use init script:

~/.claude/skills/sentry-observability/scripts/init_sentry.sh

Configure PII redaction:

// sentry.client.config.ts Sentry.init({ dsn: process.env.NEXT_PUBLIC_SENTRY_DSN, beforeSend(event) { // Scrub PII if (event.extra) delete event.extra.password; if (event.user) delete event.user.email; return event; }, });

Create health endpoint:

// app/api/health/route.ts export async function GET() { const checks = { app: 'ok', timestamp: new Date().toISOString(), };

// Add service checks as needed // checks.database = await checkDb(); // checks.stripe = await checkStripe();

return Response.json(checks); }

Set up structured logging: Use JSON logs that Vercel can parse:

// lib/logger.ts export function log(level: 'info' | 'warn' | 'error', message: string, data?: Record<string, unknown>) { const entry = { level, message, timestamp: new Date().toISOString(), ...data, }; consolelevel === 'error' ? 'error' : 'log'; }

Create alert rule:

~/.claude/skills/sentry-observability/scripts/create_alert.sh --name "New Errors" --type issue

Set up webhook for AI integration (optional): In Sentry Dashboard → Settings → Integrations → Internal Integrations:

  • Create integration with webhook URL

  • Subscribe to issue events

  • Point to GitHub Action or custom endpoint

  1. Verify

Verify Sentry setup:

~/.claude/skills/sentry-observability/scripts/verify_setup.sh

Test error tracking:

// Trigger test error throw new Error('Test error for Sentry verification');

Then check Sentry dashboard or:

~/.claude/skills/sentry-observability/scripts/list_issues.sh --limit 1

Test health endpoint:

curl -s http://localhost:3000/api/health | jq

Test alerting:

  • Trigger an error

  • Verify alert fires (check email/Slack/webhook)

If any verification fails, go back and fix it.

AI Agent Integration

Option A: Sentry MCP Server

For direct Claude integration, use the Sentry MCP server:

// claude_desktop_config.json { "mcpServers": { "sentry": { "command": "npx", "args": ["-y", "@anthropic/sentry-mcp"], "env": { "SENTRY_AUTH_TOKEN": "your-token", "SENTRY_ORG": "your-org" } } } }

Claude can then:

  • Query recent errors

  • Get full error context

  • Analyze root causes

  • Propose fixes

Option B: Webhook → GitHub Action → Agent

For automated triage:

.github/workflows/sentry-triage.yml

name: Sentry Auto-Triage

on: repository_dispatch: types: [sentry-issue]

jobs: triage: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - name: Analyze with Claude run: | # Query issue details and spawn analysis agent claude --print "Analyze Sentry issue ${{ github.event.client_payload.issue_id }}"

Option C: CLI Scripts (Already Exist)

List and triage issues

~/.claude/skills/sentry-observability/scripts/list_issues.sh --env production ~/.claude/skills/sentry-observability/scripts/triage_score.sh --json

Get issue details for analysis

~/.claude/skills/sentry-observability/scripts/issue_detail.sh PROJ-123

Resolve after fixing

~/.claude/skills/sentry-observability/scripts/resolve_issue.sh PROJ-123

Tool Choices

Sentry over alternatives. Best error tracking, mature CLI, AI-first roadmap (Seer webhooks, auto-fix features), excellent free tier.

Vercel logs over log services. stdout is captured automatically. No additional service needed. Query with vercel logs .

PostHog for ALL analytics. Official MCP server, Terraform provider, all-in-one platform. 1M events/month free.

NOT Vercel Analytics. It has no API, no CLI, no MCP server. Completely unusable for AI-assisted workflows. Do not install it.

Environment Variables

.env.example

Sentry (required for error tracking)

NEXT_PUBLIC_SENTRY_DSN= SENTRY_AUTH_TOKEN= SENTRY_ORG= SENTRY_PROJECT=

What You Get

When complete:

  • Sentry capturing all errors with source maps

  • Health check endpoint at /api/health

  • Structured JSON logging (captured by Vercel)

  • At least one alert rule configured

  • PostHog for product analytics (user-facing apps)

  • AI agent integration ready (MCP or webhooks)

User can:

  • See errors in Sentry immediately when they occur

  • Get alerted on new/critical errors

  • Query errors via CLI (list_issues.sh , triage_score.sh )

  • Trigger AI analysis of errors

  • Monitor app health via /api/health

  • View logs via vercel logs

Active Alerting (MANDATORY)

Passive observability is not observability. Every production app needs:

Email Alerts

  • Sentry alert rule: new issue with level:error → email notification

  • Zero tolerance: fire on FIRST occurrence (not frequency-based)

  • Include: error message, stack trace, affected users count, URL

GitHub Issue Auto-Creation

  • Sentry-GitHub integration: new Sentry issue → GitHub issue

  • Labels: bug , auto-detected , P0

  • Body: stack trace, breadcrumbs, user impact, Sentry link, affected file paths

  • Recent commits to affected files (for causal analysis)

Error Boundaries Must Be Loud

Every error boundary MUST call Sentry.captureException(error) :

// ❌ BAD: Silent error boundary export default function Error({ error }: { error: Error }) { return <p>Something went wrong</p>; }

// ✅ GOOD: Error boundary that screams 'use client'; import * as Sentry from '@sentry/nextjs'; import { useEffect } from 'react';

export default function Error({ error, reset }: { error: Error; reset: () => void }) { useEffect(() => { Sentry.captureException(error); }, [error]); return ( <div> <h2>Something went wrong</h2> <button onClick={reset}>Try again</button> </div> ); }

Root Error Boundary

app/global-error.tsx is REQUIRED for any Next.js app with Sentry:

'use client'; import * as Sentry from '@sentry/nextjs'; import { useEffect } from 'react';

export default function GlobalError({ error, reset }: { error: Error; reset: () => void }) { useEffect(() => { Sentry.captureException(error); }, [error]); return ( <html><body> <h1>Application Error</h1> <button onClick={reset}>Reload</button> </body></html> ); }

PostHog Exception Autocapture

Enable $exception autocapture in PostHog config for client-side coverage that complements Sentry's server-side tracking.

Uptime Monitoring

External ping to /api/health every 5 min. Options:

  • GitHub Actions cron job

  • UptimeRobot (free tier)

  • Simple curl script on a separate machine

9 Observability Gaps (Incident Checklist)

Gap Fix

1 No alerting pipeline Sentry → email + auto-create GitHub issue

2 No global-error.tsx

Add root error boundary with captureException

3 Silent error boundaries Add captureException to ALL error.tsx files

4 Missing route error boundaries Add error.tsx to every app route

5 No Sentry release tracking sentry-cli releases in CI

6 Backend errors invisible Forward function errors to Sentry via wrapper

7 No uptime monitoring External cron pings /api/health every 5 min

8 Logger disconnected from Sentry logError() calls captureException()

9 Exception autocapture off Enable $exception in PostHog config

Related Skills

  • sentry-observability — Detailed Sentry setup and scripts

  • observability-stack — PostHog/analytics integration patterns

  • observability-advocate — Agent for auditing observability coverage

  • check-production — Production health checks

  • flywheel-qa — Agentic QA for preview deployments

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Security

design-audit

No summary provided by upstream source.

Repository SourceNeeds Review
Security

changelog-audit

No summary provided by upstream source.

Repository SourceNeeds Review
Security

billing-security

No summary provided by upstream source.

Repository SourceNeeds Review