# Langfuse Observability

Query traces, prompts, and metrics from Langfuse. Requires env vars:

- `LANGFUSE_SECRET_KEY`
- `LANGFUSE_PUBLIC_KEY`
- `LANGFUSE_HOST` (e.g., `https://us.cloud.langfuse.com`)
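Since every script depends on these variables, it can help to fail fast when one is missing. A minimal sketch (the `missingLangfuseEnv` helper is illustrative, not part of this skill):

```typescript
// Return the names of any required Langfuse env vars that are unset.
// Hypothetical helper, not shipped with the skill.
function missingLangfuseEnv(
  env: Record<string, string | undefined> = process.env
): string[] {
  return ["LANGFUSE_SECRET_KEY", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_HOST"]
    .filter((key) => !env[key]);
}

// Example: warn before any script call instead of failing mid-run.
const missing = missingLangfuseEnv();
if (missing.length > 0) {
  console.error(`Missing Langfuse env vars: ${missing.join(", ")}`);
}
```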
## Quick Start

All commands run from the skill directory:

```bash
cd ~/.claude/skills/langfuse-observability
```
### List Recent Traces

```bash
# Last 10 traces
npx tsx scripts/fetch-traces.ts --limit 10

# Filter by name pattern
npx tsx scripts/fetch-traces.ts --name "quiz-generation" --limit 5

# Filter by user
npx tsx scripts/fetch-traces.ts --user-id "user_abc123" --limit 10
```
### Get Single Trace Details

```bash
# Full trace with spans and generations
npx tsx scripts/fetch-trace.ts <trace-id>
```
### Get Prompt

```bash
# Fetch a specific prompt
npx tsx scripts/list-prompts.ts --name scry-intent-extraction

# With label
npx tsx scripts/list-prompts.ts --name scry-intent-extraction --label production
```
### Get Metrics Summary

```bash
# Summary for recent traces
npx tsx scripts/get-metrics.ts --limit 50

# Filter by trace name
npx tsx scripts/get-metrics.ts --name "quiz-generation" --limit 100
```
## Output Formats

All scripts output JSON to stdout for easy parsing.

### Trace List Output

```json
[
  {
    "id": "trace-abc123",
    "name": "quiz-generation",
    "userId": "user_xyz",
    "input": { "prompt": "..." },
    "output": { "concepts": [...] },
    "latencyMs": 3200,
    "createdAt": "2025-12-09T..."
  }
]
```
### Single Trace Output

Includes the full nested structure: trace → observations (spans + generations) with token usage.

### Metrics Output

```json
{
  "totalTraces": 50,
  "successCount": 48,
  "errorCount": 2,
  "avgLatencyMs": 2850,
  "totalTokens": 125000,
  "byName": { "quiz-generation": 30, "phrasing-generation": 20 }
}
```
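Because the summary is plain JSON, derived stats are a few lines of code once it is parsed. A sketch, assuming the field names shown in the example above (they are not a guaranteed schema):

```typescript
// Assumed shape of the metrics summary emitted by get-metrics.ts.
interface MetricsSummary {
  totalTraces: number;
  successCount: number;
  errorCount: number;
  avgLatencyMs: number;
  totalTokens: number;
  byName: Record<string, number>;
}

// Derive an error rate, tokens per trace, and the busiest trace name.
function deriveStats(m: MetricsSummary) {
  const busiest = Object.entries(m.byName).sort((a, b) => b[1] - a[1])[0];
  return {
    errorRate: m.errorCount / m.totalTraces,
    avgTokensPerTrace: m.totalTokens / m.totalTraces,
    busiestTrace: busiest ? busiest[0] : null,
  };
}

// Using the example summary above:
const stats = deriveStats({
  totalTraces: 50, successCount: 48, errorCount: 2,
  avgLatencyMs: 2850, totalTokens: 125000,
  byName: { "quiz-generation": 30, "phrasing-generation": 20 },
});
// stats.avgTokensPerTrace === 2500, stats.busiestTrace === "quiz-generation"
```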
## Common Workflows

### Debug Failed Generation

```bash
cd ~/.claude/skills/langfuse-observability

# 1. Find recent traces
npx tsx scripts/fetch-traces.ts --limit 10

# 2. Get details of a specific trace
npx tsx scripts/fetch-trace.ts <trace-id>
```
### Monitor Token Usage

```bash
# Get metrics for cost analysis
npx tsx scripts/get-metrics.ts --limit 100
```

### Check Prompt Configuration

```bash
npx tsx scripts/list-prompts.ts --name scry-concept-synthesis --label production
```
## Cost Tracking

### Calculate Costs

```typescript
// Get metrics with cost calculation
const metrics = await langfuse.getMetrics({ limit: 100 });

// Pricing per 1M tokens (update as needed)
const pricing: Record<string, { input: number; output: number }> = {
  "claude-3-5-sonnet": { input: 3.0, output: 15.0 },
  "gpt-4o": { input: 2.5, output: 10.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

function calculateCost(model: string, inputTokens: number, outputTokens: number) {
  const p = pricing[model] || { input: 1, output: 1 };
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}
```
### Daily/Monthly Spend

```bash
# Get traces for a date range
npx tsx scripts/fetch-traces.ts --from "2025-12-01" --to "2025-12-07" --limit 1000

# Calculate spend (parse output and sum costs)
```
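The "parse and sum" step can look like the sketch below, reusing the per-1M-token pricing table from Calculate Costs. The per-trace `model`, `inputTokens`, and `outputTokens` fields are assumptions about the script's output shape; adjust them to whatever `fetch-traces.ts` actually emits.

```typescript
// Per-1M-token pricing (same table as above; update as needed).
const pricing: Record<string, { input: number; output: number }> = {
  "claude-3-5-sonnet": { input: 3.0, output: 15.0 },
  "gpt-4o": { input: 2.5, output: 10.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

// Assumed per-trace fields; verify against the real trace-list output.
interface TraceRow {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// Sum estimated spend across a fetched trace list.
function totalSpend(traces: TraceRow[]): number {
  return traces.reduce((sum, t) => {
    const p = pricing[t.model] ?? { input: 1, output: 1 };
    return sum + (t.inputTokens * p.input + t.outputTokens * p.output) / 1_000_000;
  }, 0);
}

const spend = totalSpend([
  { model: "gpt-4o", inputTokens: 1_000_000, outputTokens: 100_000 },
  { model: "gpt-4o-mini", inputTokens: 2_000_000, outputTokens: 0 },
]);
// ≈ 3.8 USD (2.5 + 1.0 for gpt-4o, 0.3 for gpt-4o-mini)
```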
### Cost Alerts

Set up alerts in the Langfuse dashboard:

- Go to Dashboard → Alerts
- Create an alert for `daily_cost > X` or `cost_per_trace > Y`
- Configure notification (email, Slack webhook)

Or implement in code:

```typescript
async function checkCostBudget() {
  const dailyMetrics = await langfuse.getMetrics({ since: "24h" });
  const dailyCost = calculateTotalCost(dailyMetrics);

  if (dailyCost > DAILY_BUDGET) {
    await notifySlack(`⚠️ LLM daily spend ($${dailyCost}) exceeded budget ($${DAILY_BUDGET})`);
  }
}
```
## Production Best Practices

### Trace Everything

```typescript
import { Langfuse } from "langfuse";

const langfuse = new Langfuse({
  publicKey: process.env.LANGFUSE_PUBLIC_KEY,
  secretKey: process.env.LANGFUSE_SECRET_KEY,
});

// Wrap every LLM call
async function tracedLLMCall(name: string, messages: Message[]) {
  const trace = langfuse.trace({
    name,
    userId: currentUser.id,
    metadata: { environment: process.env.NODE_ENV },
  });

  const generation = trace.generation({
    name: "chat",
    model: selectedModel,
    input: messages,
  });

  try {
    const response = await llm.chat({ model: selectedModel, messages });

    generation.end({
      output: response.choices[0].message,
      usage: {
        promptTokens: response.usage.prompt_tokens,
        completionTokens: response.usage.completion_tokens,
      },
    });

    return response;
  } catch (error) {
    generation.end({ level: "ERROR", statusMessage: error.message });
    throw error;
  }
}
```
### Add Context

```typescript
// Include useful metadata for debugging
const trace = langfuse.trace({
  name: "user-query",
  userId: user.id,
  sessionId: session.id, // Group related traces
  metadata: {
    userPlan: user.plan,
    feature: "chat",
    version: "v2.1",
  },
  tags: ["production", "chat-feature"],
});
```
### Score Outputs

```typescript
// Track quality metrics
generation.score({
  name: "user-feedback",
  value: userRating, // 1-5
});

// Or automated scoring
generation.score({
  name: "response-length",
  value: response.content.length < 500 ? 1 : 0,
});
```
### Flush Before Exit

```typescript
// Important for serverless environments
await langfuse.flushAsync();
```
## Promptfoo Integration

### Trace → Eval Case Workflow

- Find interesting traces in Langfuse (failures, edge cases)
- Export them as test cases for Promptfoo
- Add them to the regression suite to prevent future issues

```typescript
// Export failed traces as test cases
const failedTraces = await langfuse.getTraces({ level: "ERROR", limit: 50 });

const testCases = failedTraces.map((trace) => ({
  vars: trace.input,
  assert: [
    { type: "not-contains", value: "error" },
    { type: "llm-rubric", value: "Response should address the user's question" },
  ],
}));

// Add to promptfooconfig.yaml
```
### Langfuse Callback in Promptfoo

```yaml
# promptfooconfig.yaml
defaultTest:
  options:
    callback: langfuse
    callbackConfig:
      publicKey: ${LANGFUSE_PUBLIC_KEY}
      secretKey: ${LANGFUSE_SECRET_KEY}
```
## Alternatives Comparison

| Feature     | Langfuse    | Helicone  | LangSmith  |
|-------------|-------------|-----------|------------|
| Open Source | ✅          | ✅        | ❌         |
| Self-Host   | ✅          | ✅        | ❌         |
| Free Tier   | ✅ Generous | ✅ 10K/mo | ⚠️ Limited |
| Prompt Mgmt | ✅          | ❌        | ✅         |
| Tracing     | ✅          | ✅        | ✅         |
| Cost Track  | ✅          | ✅        | ✅         |
| A/B Testing | ⚠️          | ❌        | ✅         |
- **Choose Langfuse when:** you need self-hosting, are cost-conscious, or want prompt management.
- **Choose Helicone when:** you prefer a proxy-based setup and simple integration.
- **Choose LangSmith when:** you are in the LangChain ecosystem or need enterprise support.
## Related Skills

- `llm-evaluation`: Promptfoo for testing; pairs well with Langfuse for observability
- `llm-gateway-routing`: OpenRouter/LiteLLM for model routing
- `ai-llm-development`: overall LLM development patterns
## Related Commands

- `/llm-gates`: audit LLM infrastructure, including observability gaps
- `/observe`: general observability audit