langfuse-observability

Langfuse Observability

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "langfuse-observability" with this command: npx skills add phrazzld/claude-config/phrazzld-claude-config-langfuse-observability

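Query traces, prompts, and metrics from Langfuse. Requires Langfuse credentials as env vars; the client setup under Production Best Practices reads LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY.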

Quick Start

All commands run from the skill directory:

cd ~/.claude/skills/langfuse-observability

List Recent Traces

Last 10 traces

npx tsx scripts/fetch-traces.ts --limit 10

Filter by name pattern

npx tsx scripts/fetch-traces.ts --name "quiz-generation" --limit 5

Filter by user

npx tsx scripts/fetch-traces.ts --user-id "user_abc123" --limit 10

Get Single Trace Details

Full trace with spans and generations

npx tsx scripts/fetch-trace.ts <trace-id>

Get Prompt

Fetch specific prompt

npx tsx scripts/list-prompts.ts --name scry-intent-extraction

With label

npx tsx scripts/list-prompts.ts --name scry-intent-extraction --label production

Get Metrics Summary

Summary for recent traces

npx tsx scripts/get-metrics.ts --limit 50

Filter by trace name

npx tsx scripts/get-metrics.ts --name "quiz-generation" --limit 100

Output Formats

All scripts output JSON to stdout for easy parsing.

Trace List Output

[
  {
    "id": "trace-abc123",
    "name": "quiz-generation",
    "userId": "user_xyz",
    "input": {"prompt": "..."},
    "output": {"concepts": [...]},
    "latencyMs": 3200,
    "createdAt": "2025-12-09T..."
  }
]
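Because the scripts print JSON, their output can be consumed from TypeScript. The sketch below types one entry of the trace list shown above and picks out slow traces. `TraceSummary` and `findSlowTraces` are hypothetical names introduced for illustration, and the sample data is made up; only the field names come from the output format above.

```typescript
// Shape of one entry in the trace list output (fields as documented above).
interface TraceSummary {
  id: string;
  name: string;
  userId?: string;
  input?: unknown;
  output?: unknown;
  latencyMs: number;
  createdAt: string;
}

// Hypothetical helper: parse the JSON array and keep traces slower than
// thresholdMs, slowest first.
function findSlowTraces(json: string, thresholdMs: number): TraceSummary[] {
  const traces = JSON.parse(json) as TraceSummary[];
  return traces
    .filter((t) => t.latencyMs > thresholdMs)
    .sort((a, b) => b.latencyMs - a.latencyMs);
}

// Made-up sample standing in for the stdout of fetch-traces.ts.
const sampleTraceJson = JSON.stringify([
  { id: "trace-a", name: "quiz-generation", latencyMs: 3200, createdAt: "2025-12-09T10:00:00Z" },
  { id: "trace-b", name: "quiz-generation", latencyMs: 900, createdAt: "2025-12-09T10:05:00Z" },
  { id: "trace-c", name: "phrasing-generation", latencyMs: 5100, createdAt: "2025-12-09T10:10:00Z" },
]);

// Logs the ids of traces above 3000 ms: trace-c first, then trace-a.
console.log(findSlowTraces(sampleTraceJson, 3000).map((t) => t.id));
```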

Single Trace Output

Includes full nested structure: trace → observations (spans + generations) with token usage.

Metrics Output

{
  "totalTraces": 50,
  "successCount": 48,
  "errorCount": 2,
  "avgLatencyMs": 2850,
  "totalTokens": 125000,
  "byName": {"quiz-generation": 30, "phrasing-generation": 20}
}
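A couple of derived figures are often more useful than the raw counts. The following sketch computes an error rate and average tokens per trace from the metrics object above. `MetricsSummary` and `deriveRates` are hypothetical names, not part of the skill's scripts.

```typescript
// Shape of the metrics output (fields as documented above).
interface MetricsSummary {
  totalTraces: number;
  successCount: number;
  errorCount: number;
  avgLatencyMs: number;
  totalTokens: number;
  byName: Record<string, number>;
}

// Hypothetical helper: derive per-trace rates, guarding against an empty sample.
function deriveRates(m: MetricsSummary): { errorRate: number; tokensPerTrace: number } {
  if (m.totalTraces === 0) return { errorRate: 0, tokensPerTrace: 0 };
  return {
    errorRate: m.errorCount / m.totalTraces,
    tokensPerTrace: m.totalTokens / m.totalTraces,
  };
}

// The example values from the metrics output above.
const exampleMetrics: MetricsSummary = {
  totalTraces: 50,
  successCount: 48,
  errorCount: 2,
  avgLatencyMs: 2850,
  totalTokens: 125000,
  byName: { "quiz-generation": 30, "phrasing-generation": 20 },
};

// 2 / 50 = 0.04 error rate, 125000 / 50 = 2500 tokens per trace.
console.log(deriveRates(exampleMetrics));
```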

Common Workflows

Debug Failed Generation

cd ~/.claude/skills/langfuse-observability

1. Find recent traces

npx tsx scripts/fetch-traces.ts --limit 10

2. Get details of specific trace

npx tsx scripts/fetch-trace.ts <trace-id>

Monitor Token Usage

Get metrics for cost analysis

npx tsx scripts/get-metrics.ts --limit 100

Check Prompt Configuration

npx tsx scripts/list-prompts.ts --name scry-concept-synthesis --label production

Cost Tracking

Calculate Costs

// Get metrics with cost calculation
const metrics = await langfuse.getMetrics({ limit: 100 });

// Pricing per 1M tokens (update as needed)
const pricing: Record<string, { input: number; output: number }> = {
  "claude-3-5-sonnet": { input: 3.0, output: 15.0 },
  "gpt-4o": { input: 2.5, output: 10.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

function calculateCost(model: string, inputTokens: number, outputTokens: number) {
  // Models missing from the table fall back to a placeholder rate of 1 per 1M tokens
  const p = pricing[model] || { input: 1, output: 1 };
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}
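To make the arithmetic in calculateCost concrete, here is a self-contained worked example using two of the rates listed above. The token counts are invented for illustration, and `costFor` is a renamed copy of the same formula so the block runs on its own.

```typescript
// Rates per 1M tokens, copied from the pricing table above.
const ratesPerMillion: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

// Same formula as calculateCost: tokens times rate, scaled down by 1M.
function costFor(model: string, inputTokens: number, outputTokens: number): number {
  const p = ratesPerMillion[model] ?? { input: 1, output: 1 };
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// 1,200 input and 300 output tokens on gpt-4o:
// (1200 * 2.5 + 300 * 10.0) / 1_000_000 = 6000 / 1_000_000 = 0.006
console.log(costFor("gpt-4o", 1200, 300));

// A model missing from the table uses the fallback rate of 1 per 1M tokens:
// (1000 * 1 + 1000 * 1) / 1_000_000 = 0.002
console.log(costFor("some-unlisted-model", 1000, 1000));
```

Note that the fallback rate silently prices unknown models; depending on your needs, throwing on an unknown model may be the safer choice.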

Daily/Monthly Spend

Get traces for date range

npx tsx scripts/fetch-traces.ts --from "2025-12-01" --to "2025-12-07" --limit 1000

Calculate spend (parse output and sum costs)
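The "parse output and sum costs" step could look roughly like the sketch below, which groups spend by calendar day. It assumes each trace has already been given a `costUsd` value (for example by running the pricing formula over its token usage); that field is an assumption and is not part of the documented trace list output. `CostedTrace` and `spendByDay` are hypothetical names.

```typescript
interface CostedTrace {
  createdAt: string; // ISO 8601 timestamp, as in the trace list output
  costUsd: number;   // ASSUMPTION: computed separately, not returned by the scripts
}

// Sum cost per day, keyed by the "YYYY-MM-DD" prefix of createdAt.
function spendByDay(traces: CostedTrace[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const t of traces) {
    const day = t.createdAt.slice(0, 10);
    totals.set(day, (totals.get(day) ?? 0) + t.costUsd);
  }
  return totals;
}

// Made-up sample data.
const costedSample: CostedTrace[] = [
  { createdAt: "2025-12-01T08:00:00Z", costUsd: 0.25 },
  { createdAt: "2025-12-01T17:30:00Z", costUsd: 0.5 },
  { createdAt: "2025-12-02T09:15:00Z", costUsd: 0.125 },
];

// 2025-12-01 totals 0.75 and 2025-12-02 totals 0.125.
console.log(Array.from(spendByDay(costedSample).entries()));
```

Summing the map's values gives the total for the whole date range.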

Cost Alerts

Set up alerts in Langfuse dashboard:

  • Go to Dashboard → Alerts

  • Create alert for: daily_cost > X or cost_per_trace > Y

  • Configure notification (email, Slack webhook)

Or implement in code:

async function checkCostBudget() {
  const dailyMetrics = await langfuse.getMetrics({ since: "24h" });
  const dailyCost = calculateTotalCost(dailyMetrics);

  if (dailyCost > DAILY_BUDGET) {
    await notifySlack(`⚠️ LLM daily spend ($${dailyCost}) exceeded budget ($${DAILY_BUDGET})`);
  }
}
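The snippet above leaves calculateTotalCost and notifySlack undefined. One possible shape for them is sketched below, with the usage data and the notifier passed in so the logic can be exercised without Langfuse or Slack. `UsageRecord`, `totalCost`, and `checkBudget` are hypothetical names, and the usage record layout is an assumption rather than a documented output.

```typescript
// ASSUMPTION: per-call usage records you have already extracted from traces.
interface UsageRecord {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

type PriceTable = Record<string, { input: number; output: number }>;

// Sum cost across records using per-1M-token rates.
function totalCost(records: UsageRecord[], prices: PriceTable): number {
  return records.reduce((sum, r) => {
    const p = prices[r.model] ?? { input: 1, output: 1 };
    return sum + (r.inputTokens * p.input + r.outputTokens * p.output) / 1_000_000;
  }, 0);
}

// Calls notify and resolves to true only when spend exceeds the budget.
async function checkBudget(
  records: UsageRecord[],
  prices: PriceTable,
  dailyBudget: number,
  notify: (message: string) => Promise<void>,
): Promise<boolean> {
  const cost = totalCost(records, prices);
  if (cost <= dailyBudget) return false;
  await notify(`LLM daily spend ($${cost.toFixed(2)}) exceeded budget ($${dailyBudget.toFixed(2)})`);
  return true;
}

// Demo: 2M input + 1M output tokens on gpt-4o costs 2 * 2.5 + 1 * 10 = 15.
const demoPrices: PriceTable = { "gpt-4o": { input: 2.5, output: 10.0 } };
const demoUsage: UsageRecord[] = [{ model: "gpt-4o", inputTokens: 2_000_000, outputTokens: 1_000_000 }];
const sentMessages: string[] = [];

checkBudget(demoUsage, demoPrices, 10, async (msg) => { sentMessages.push(msg); })
  .then((alerted) => console.log(alerted, sentMessages));
```

Injecting the notifier keeps the budget check testable; in production the callback would post to your Slack webhook or email service.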

Production Best Practices

  1. Trace Everything

import { Langfuse } from "langfuse";

const langfuse = new Langfuse({
  publicKey: process.env.LANGFUSE_PUBLIC_KEY,
  secretKey: process.env.LANGFUSE_SECRET_KEY,
});

// Wrap every LLM call
async function tracedLLMCall(name: string, messages: Message[]) {
  const trace = langfuse.trace({
    name,
    userId: currentUser.id,
    metadata: { environment: process.env.NODE_ENV },
  });

  const generation = trace.generation({
    name: "chat",
    model: selectedModel,
    input: messages,
  });

  try {
    const response = await llm.chat({ model: selectedModel, messages });

    generation.end({
      output: response.choices[0].message,
      usage: {
        promptTokens: response.usage.prompt_tokens,
        completionTokens: response.usage.completion_tokens,
      },
    });

    return response;
  } catch (error) {
    // Record the failure on the generation, then rethrow for the caller
    generation.end({
      level: "ERROR",
      statusMessage: error instanceof Error ? error.message : String(error),
    });
    throw error;
  }
}

  2. Add Context

// Include useful metadata for debugging
const trace = langfuse.trace({
  name: "user-query",
  userId: user.id,
  sessionId: session.id, // Group related traces
  metadata: {
    userPlan: user.plan,
    feature: "chat",
    version: "v2.1",
  },
  tags: ["production", "chat-feature"],
});

  3. Score Outputs

// Track quality metrics
generation.score({
  name: "user-feedback",
  value: userRating, // 1-5
});

// Or automated scoring
generation.score({
  name: "response-length",
  value: response.content.length < 500 ? 1 : 0,
});
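The score values themselves can be computed by small pure functions that are easy to test in isolation. The sketch below shows one for each case above: mapping a 1-5 rating onto a 0-1 scale (an optional convention, not something the source requires) and the length check. `normalizeRating` and `lengthScore` are hypothetical helper names.

```typescript
// Map a 1-5 user rating onto 0..1, clamping out-of-range input.
function normalizeRating(rating: number): number {
  const clamped = Math.min(5, Math.max(1, rating));
  return (clamped - 1) / 4;
}

// Automated check: 1 when the response is shorter than maxChars, else 0.
function lengthScore(content: string, maxChars = 500): 0 | 1 {
  return content.length < maxChars ? 1 : 0;
}

console.log(normalizeRating(4));           // (4 - 1) / 4 = 0.75
console.log(lengthScore("short answer"));  // 12 characters is below 500, so 1
console.log(lengthScore("x".repeat(500))); // exactly 500 is not below 500, so 0
```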

  4. Flush Before Exit

// Important for serverless environments
await langfuse.flushAsync();
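In a serverless handler it is easy to skip the flush on an error path. One way to guard against that is a wrapper that flushes in a finally block, sketched below. `Flushable` and `withFlush` are hypothetical names; the only thing assumed of the client is the flushAsync() method used above, and the demo uses a fake client so it runs without Langfuse.

```typescript
// Minimal view of the client: anything exposing flushAsync().
interface Flushable {
  flushAsync(): Promise<void>;
}

// Run the handler, then flush pending events whether it succeeded or threw.
async function withFlush<T>(client: Flushable, handler: () => Promise<T>): Promise<T> {
  try {
    return await handler();
  } finally {
    await client.flushAsync();
  }
}

// Demo with a fake client that records call order.
const callOrder: string[] = [];
const fakeClient: Flushable = {
  flushAsync: async () => { callOrder.push("flush"); },
};

withFlush(fakeClient, async () => {
  callOrder.push("handler");
  return "ok";
}).then((result) => {
  // Logs "ok" with the call order: handler first, then flush.
  console.log(result, callOrder);
});
```

With the real client, the call would be withFlush(langfuse, () => tracedLLMCall(...)).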

Promptfoo Integration

Trace → Eval Case Workflow

  • Find interesting traces in Langfuse (failures, edge cases)

  • Export as test cases for Promptfoo

  • Add to regression suite to prevent future issues

// Export failed traces as test cases
const failedTraces = await langfuse.getTraces({ level: "ERROR", limit: 50 });

const testCases = failedTraces.map(trace => ({
  vars: trace.input,
  assert: [
    { type: "not-contains", value: "error" },
    { type: "llm-rubric", value: "Response should address the user's question" },
  ],
}));

// Add to promptfooconfig.yaml
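The mapping step above can be isolated into a typed, testable function. This sketch skips traces whose input is not a plain object, since the vars field is a key-value map, and tags each case with a description derived from the trace id. `TraceLike`, `EvalCase`, and `tracesToTestCases` are hypothetical names; how the traces are fetched is left out.

```typescript
interface TraceLike {
  id: string;
  input: unknown;
}

interface EvalCase {
  description: string;
  vars: Record<string, unknown>;
  assert: { type: string; value: string }[];
}

// True for plain key-value objects, false for null, arrays, and primitives.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Convert traces into test cases, dropping traces without object-shaped input.
function tracesToTestCases(traces: TraceLike[]): EvalCase[] {
  const cases: EvalCase[] = [];
  for (const t of traces) {
    if (!isRecord(t.input)) continue;
    cases.push({
      description: `Regression for trace ${t.id}`,
      vars: t.input,
      assert: [
        { type: "not-contains", value: "error" },
        { type: "llm-rubric", value: "Response should address the user's question" },
      ],
    });
  }
  return cases;
}

// Made-up traces: the second is skipped because its input is a bare string.
const exampleTraces: TraceLike[] = [
  { id: "trace-abc123", input: { prompt: "Explain photosynthesis" } },
  { id: "trace-def456", input: "raw string input" },
];

// Serialize for pasting into, or loading from, your Promptfoo setup.
console.log(JSON.stringify(tracesToTestCases(exampleTraces), null, 2));
```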

Langfuse Callback in Promptfoo

# promptfooconfig.yaml
defaultTest:
  options:
    callback: langfuse
    callbackConfig:
      publicKey: ${LANGFUSE_PUBLIC_KEY}
      secretKey: ${LANGFUSE_SECRET_KEY}

Alternatives Comparison

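| Feature | Langfuse | Helicone | LangSmith |
| --- | --- | --- | --- |
| Open Source | ✅ | ✅ | ❌ |
| Self-Host | ✅ | ✅ | ❌ |
| Free Tier | ✅ Generous | ✅ 10K/mo | ⚠️ Limited |
| Prompt Mgmt | ✅ | ❌ | ✅ |
| Tracing | ✅ | ✅ | ✅ |
| Cost Track | ✅ | ✅ | ✅ |
| A/B Testing | ⚠️ | ❌ | ✅ |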

Choose Langfuse when: Self-hosting needed, cost-conscious, want prompt management.

Choose Helicone when: Proxy-based setup preferred, simple integration.

Choose LangSmith when: LangChain ecosystem, enterprise support needed.

Related Skills

  • llm-evaluation: Promptfoo for testing, pairs well with Langfuse for observability

  • llm-gateway-routing: OpenRouter/LiteLLM for model routing

  • ai-llm-development: Overall LLM development patterns

Related Commands

  • /llm-gates: Audit LLM infrastructure including observability gaps

  • /observe: General observability audit
