Langfuse Observability
Langfuse is the LLM observability layer for joelclaw. Every LLM call produces a Langfuse trace with nested hierarchy, I/O, usage, cost, and attribution.
Architecture
joelclaw has two Langfuse integration points:
1. Pi-session extension (langfuse-cost)
- Source: `pi/extensions/langfuse-cost/index.ts` (canonical, git-tracked in this repo)
- Runtime: loaded as a pi extension from the same source tree
- What it traces: Every gateway + interactive pi session LLM call
- How: Hooks into pi session events (`session_start`, `message_start`, `message_end`, `tool_call`, `tool_result`, `session_shutdown`)
- Dedup: `globalThis.__langfuse_cost_loaded__` guard prevents duplicate extension instances
- Optional dependency behavior: `langfuse` is lazily loaded (no top-level hard import). Missing module must disable telemetry, not crash extension import. Regression test: `pi/extensions/langfuse-cost/index.test.ts`
- Runtime dependency location: because the extension is loaded from `pi/extensions/` at repo root instead of a workspace package, the `langfuse` npm package must be available from the repo root `package.json`. If root install drift drops it, gateway/session telemetry silently degrades to the optional-dependency warning again.
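The dedup guard and the lazy optional load can be pictured with a minimal sketch. Only the `__langfuse_cost_loaded__` key comes from this document; `claimExtensionSlot`, `lazyOptional`, and the injected `load` callback are hypothetical names, and the real extension may resolve the module differently (for example via a dynamic import).

```typescript
// Hypothetical sketch: only the guard key name is taken from the extension docs.
const GUARD_KEY = "__langfuse_cost_loaded__";

/** Returns true for the first caller in this process, false for later ones. */
function claimExtensionSlot(): boolean {
  const g = globalThis as unknown as Record<string, unknown>;
  if (g[GUARD_KEY]) return false; // another instance already registered
  g[GUARD_KEY] = true;
  return true;
}

/**
 * Wraps a loader so the dependency is resolved on first use, not at import time.
 * A throwing loader (for example, module not installed) yields null and one warning.
 */
function lazyOptional<T>(load: () => T, warn: (msg: string) => void): () => T | null {
  let attempted = false;
  let cached: T | null = null;
  return () => {
    if (!attempted) {
      attempted = true;
      try {
        cached = load();
      } catch (err) {
        warn(`langfuse unavailable, telemetry disabled: ${String(err)}`);
      }
    }
    return cached;
  };
}
```

Callers check for `null` and skip tracing, so a missing package disables telemetry without breaking extension import.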
2. System-bus OTEL bridge (langfuse.ts)
- Source: `packages/system-bus/src/lib/langfuse.ts`
- What it traces: All Inngest function LLM calls (reflect, triage, email cleanup, docs ingest)
- How: `@langfuse/otel` `LangfuseSpanProcessor` + `@langfuse/tracing` `startObservation()`
- Produces: `joelclaw.inference` traces with generation children
Current Trace Hierarchy (pi-session)
The langfuse-cost extension produces a 4-level nested span hierarchy:
joelclaw.session (trace)
└── session (span): entire session lifetime
    ├── turn-1 (span): user message → final assistant response
    │   ├── tool:bash (span): individual tool execution
    │   ├── tool:read (span)
    │   └── llm.call (generation): the LLM API call with usage/cost
    └── turn-2 (span)
        ├── tool:edit (span)
        ├── tool:bash (span)
        └── llm.call (generation)
What each level captures
| Level | Created on | Ended on | Contains |
|---|---|---|---|
| `joelclaw.session` trace | `session_start` | `session_shutdown` | `userId`, `sessionId`, tags, turn count |
| `session` span | `session_start` | `session_shutdown` | Channel, session type, turn count |
| `turn-N` span | `message_start[user]` | `message_end[assistant]` with text output | User input (clean), `sourceChannel` metadata |
| `tool:name` span | `tool_call` event | `tool_result` event | Tool input, output (truncated to 500 chars) |
| `llm.call` generation | `message_end[assistant]` | immediate | Model, usage, cache tokens, cost, I/O |
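The open/close rules in the table can be modelled as a small in-memory state machine. This is a sketch under stated assumptions, not the extension's code: the `PiEvent` payload shapes, `SpanRecord`, and `TurnTracker` are hypothetical, and no Langfuse client is involved.

```typescript
// Hypothetical event shapes; the real pi event payloads may differ.
type PiEvent =
  | { type: "message_start"; role: "user" | "assistant" }
  | { type: "message_end"; role: "assistant"; text: string }
  | { type: "tool_call"; toolCallId: string; toolName: string }
  | { type: "tool_result"; toolCallId: string };

interface SpanRecord {
  name: string;
  kind: "span" | "generation";
  open: boolean;
  children: SpanRecord[];
}

class TurnTracker {
  readonly turns: SpanRecord[] = [];
  private current: SpanRecord | null = null;
  private openTools = new Map<string, SpanRecord>();

  handle(ev: PiEvent): void {
    if (ev.type === "message_start" && ev.role === "user") {
      // A user message opens the next turn-N span.
      this.current = { name: `turn-${this.turns.length + 1}`, kind: "span", open: true, children: [] };
      this.turns.push(this.current);
      return;
    }
    const turn = this.current;
    if (!turn) return; // events outside a turn are ignored in this sketch
    if (ev.type === "tool_call") {
      const span: SpanRecord = { name: `tool:${ev.toolName}`, kind: "span", open: true, children: [] };
      this.openTools.set(ev.toolCallId, span);
      turn.children.push(span);
    } else if (ev.type === "tool_result") {
      // Tool spans are matched by toolCallId; an unknown ID leaves spans untouched.
      const span = this.openTools.get(ev.toolCallId);
      if (span) {
        span.open = false;
        this.openTools.delete(ev.toolCallId);
      }
    } else if (ev.type === "message_end") {
      // The generation is recorded and ended immediately.
      turn.children.push({ name: "llm.call", kind: "generation", open: false, children: [] });
      // The turn closes only when the assistant message carries text output.
      if (ev.text.length > 0) {
        turn.open = false;
        this.current = null;
      }
    }
  }
}
```

In this model a tool-use round trip inside one turn yields more than one `llm.call` child, because each assistant `message_end` records its own generation.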
Channel header stripping
User messages from Telegram arrive with a `---\nChannel:...\n---` header. The extension:
- Strips the header from trace `input` (clean user text only)
- Parses known keys (`channel`, `date`, `platform_capabilities`) into `sourceChannel` metadata
- Skips multi-line values (e.g. `formatting_guide`)
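A minimal sketch of the stripping step follows. The exact header layout is an assumption (one `Key: value` pair per line between two `---` lines, keys matched case-insensitively), and `stripChannelHeader` is a hypothetical name rather than the extension's actual function.

```typescript
const KNOWN_KEYS = new Set(["channel", "date", "platform_capabilities"]);

interface StrippedMessage {
  input: string;
  sourceChannel: Record<string, string> | undefined;
}

function stripChannelHeader(raw: string): StrippedMessage {
  const lines = raw.split("\n");
  // Only treat the message as headed when it opens with `---` followed by `Channel:`.
  if ((lines[0] ?? "").trim() !== "---" || !/^channel:/i.test(lines[1] ?? "")) {
    return { input: raw, sourceChannel: undefined };
  }
  const close = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (close === -1) return { input: raw, sourceChannel: undefined }; // unterminated header

  const sourceChannel: Record<string, string> = {};
  for (const line of lines.slice(1, close)) {
    const m = /^([A-Za-z_]+):\s*(.*)$/.exec(line);
    if (!m) continue; // continuation line of a multi-line value
    const key = (m[1] ?? "").toLowerCase();
    const value = (m[2] ?? "").trim();
    // Unknown keys and keys whose value starts on the next line are skipped.
    if (KNOWN_KEYS.has(key) && value !== "") sourceChannel[key] = value;
  }
  return { input: lines.slice(close + 1).join("\n").trim(), sourceChannel };
}
```

Messages without a header pass through unchanged, so interactive sessions are unaffected.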
Credentials
Langfuse creds in agent-secrets:
- `langfuse_public_key`: `pk-lf-cb8b...`
- `langfuse_secret_key`: `sk-lf-c86f...`
- `langfuse_base_url`: `https://us.cloud.langfuse.com`
Gateway gets them via gateway-start.sh env exports. System-bus resolves via env → secrets lease fallback.
Trace Conventions
Naming
- Pi-session: `joelclaw.session` (trace) → `session` → `turn-N` → `tool:name` / `llm.call` (siblings under the turn)
- System-bus: `joelclaw.inference` (trace) → generation children
Required Attributes
Every trace MUST have:
- `userId: "joel"`
- `sessionId`: pi session ID for grouping
- `tags`: minimum `["joelclaw", "pi-session"]`
- Dynamic tags: `provider:anthropic`, `model:anthropic/claude-opus-4-6`, `channel:central`, `session:central`
Metadata Shape (flat, filterable)
{
  channel: "central",              // GATEWAY_ROLE env
  sessionType: "central",          // "gateway" | "interactive" | "codex" | "central"
  component: "pi-session",
  model: "anthropic/claude-opus-4-6",
  provider: "anthropic",
  stopReason: "toolUse",           // or "endTurn"
  turnCount: 5,                    // Updated on each turn
  sourceChannel: {                 // Only on first user message per turn
    channel: "telegram",
    date: "...",
    platform_capabilities: "..."
  },
  tools: ["bash", "read"],         // Tool names used this turn
}
Generation usageDetails
{
  input: 1,                          // Non-cached input tokens
  output: 97,                        // Output tokens
  total: 68195,                      // Total tokens (sum of the other four fields)
  cache_read_input_tokens: 67877,    // 90% discount
  cache_write_input_tokens: 220,     // 25% premium (NOT priced by Langfuse; known gap)
}
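In the example above, `total` equals `input + output + cache_read_input_tokens + cache_write_input_tokens` (1 + 97 + 67877 + 220 = 68195). The sketch below reproduces that sum and shows one way to estimate cost, including the cache-write portion that Langfuse does not price. The `Rates` values are hypothetical placeholders, and applying the discount and premium to the base input rate is an assumption.

```typescript
interface UsageDetails {
  input: number;
  output: number;
  total: number;
  cache_read_input_tokens: number;
  cache_write_input_tokens: number;
}

// Hypothetical per-million-token base rates; substitute the real model prices.
interface Rates {
  inputPerMTok: number;
  outputPerMTok: number;
}

function usageTotal(u: Omit<UsageDetails, "total">): number {
  return u.input + u.output + u.cache_read_input_tokens + u.cache_write_input_tokens;
}

function estimateCostUsd(u: UsageDetails, r: Rates): number {
  const fresh = u.input * r.inputPerMTok;
  const cacheRead = u.cache_read_input_tokens * r.inputPerMTok * 0.1; // 90% discount
  const cacheWrite = u.cache_write_input_tokens * r.inputPerMTok * 1.25; // 25% premium
  const out = u.output * r.outputPerMTok;
  return (fresh + cacheRead + cacheWrite + out) / 1_000_000;
}
```

Comparing an estimate like this with the cost Langfuse reports gives a rough size for the cache-write pricing gap on a given trace.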
Pi session guardrails (alert-only)
Long-running pi sessions can dominate Langfuse spend. The extension now tracks per-session totals and emits warnings only on first threshold breach per guardrail type:
- `JOELCLAW_LANGFUSE_ALERT_MAX_LLM_CALLS` (default: `120`)
- `JOELCLAW_LANGFUSE_ALERT_MAX_TOTAL_TOKENS` (default: `1200000`)
- `JOELCLAW_LANGFUSE_ALERT_MAX_COST_USD` (default: `20`)
Behavior:
- no automatic model switch
- no forced compaction
- no stop/interruption
- emits `console.warn(...)` with session ID + current counters
- records breach flags and first breach turn index in trace metadata (`guardrails`)
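The alert-only behaviour can be sketched as a per-session counter that warns once per guardrail type. The environment variable names and defaults come from the list above; `readLimits`, `SessionGuardrails`, the `firstBreachTurn` shape, and the strict "greater than" comparison are assumptions for illustration.

```typescript
type GuardrailKind = "llmCalls" | "totalTokens" | "costUsd";
type Limits = Record<GuardrailKind, number>;

const DEFAULT_LIMITS: Limits = { llmCalls: 120, totalTokens: 1_200_000, costUsd: 20 };
const ENV_NAMES: Record<GuardrailKind, string> = {
  llmCalls: "JOELCLAW_LANGFUSE_ALERT_MAX_LLM_CALLS",
  totalTokens: "JOELCLAW_LANGFUSE_ALERT_MAX_TOTAL_TOKENS",
  costUsd: "JOELCLAW_LANGFUSE_ALERT_MAX_COST_USD",
};

/** Reads thresholds from an env-like object, keeping the default on missing or invalid values. */
function readLimits(env: Record<string, string | undefined>): Limits {
  const limits: Limits = { ...DEFAULT_LIMITS };
  for (const kind of Object.keys(ENV_NAMES) as GuardrailKind[]) {
    const parsed = Number(env[ENV_NAMES[kind]]);
    if (Number.isFinite(parsed) && parsed > 0) limits[kind] = parsed;
  }
  return limits;
}

class SessionGuardrails {
  readonly totals: Limits = { llmCalls: 0, totalTokens: 0, costUsd: 0 };
  /** Turn index of the first breach per guardrail type (illustrative metadata shape). */
  readonly firstBreachTurn: Partial<Limits> = {};
  private readonly sessionId: string;
  private readonly limits: Limits;
  private readonly warn: (msg: string) => void;

  constructor(sessionId: string, limits: Limits, warn: (msg: string) => void = console.warn) {
    this.sessionId = sessionId;
    this.limits = limits;
    this.warn = warn;
  }

  recordCall(turnIndex: number, tokens: number, costUsd: number): void {
    this.totals.llmCalls += 1;
    this.totals.totalTokens += tokens;
    this.totals.costUsd += costUsd;
    for (const kind of Object.keys(this.limits) as GuardrailKind[]) {
      if (this.totals[kind] > this.limits[kind] && this.firstBreachTurn[kind] === undefined) {
        this.firstBreachTurn[kind] = turnIndex; // alert once per guardrail type
        this.warn(`[langfuse-cost] ${kind} guardrail breached in session ${this.sessionId}: ${JSON.stringify(this.totals)}`);
      }
    }
  }
}
```

In a Node process the limits would typically come from `readLimits(process.env)`. Nothing in this sketch interrupts the session; it only records the breach and warns.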
Model/provider normalization
Both the pi-session extension and system-bus Langfuse bridge normalize provider/model before writing tags, trace metadata, and generation model fields. This keeps provider:* + model:* tags aligned with metadata after model switches and for provider-prefixed IDs such as:
- `anthropic/claude-opus-4-6`
- `openai-codex/gpt-5.4`
Normalization is fail-open: tracing continues even if normalization cannot resolve a value.
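One possible shape for fail-open normalization is sketched below. The rule used here (take the provider from the prefix before the first `/`, and keep the model ID provider-prefixed) is an assumption inferred from the example tags in this document; `normalizeModel` is a hypothetical name, not the shared helper itself.

```typescript
interface NormalizedModel {
  provider: string | undefined;
  model: string | undefined;
  tags: string[];
}

/** Never throws: unresolvable input yields undefined fields and no tags. */
function normalizeModel(rawModel: unknown, rawProvider?: unknown): NormalizedModel {
  const model = typeof rawModel === "string" ? rawModel.trim() : "";
  let provider = typeof rawProvider === "string" ? rawProvider.trim() : "";
  // Assumption: a provider-prefixed ID carries the provider before the first slash.
  if (!provider && model.includes("/")) provider = model.slice(0, model.indexOf("/"));
  const fullModel =
    model && provider && !model.startsWith(`${provider}/`) ? `${provider}/${model}` : model;

  const tags: string[] = [];
  if (provider) tags.push(`provider:${provider}`);
  if (fullModel) tags.push(`model:${fullModel}`);
  return { provider: provider || undefined, model: fullModel || undefined, tags };
}
```

Because tags and metadata are derived from the same normalized pair, they stay aligned after a model switch.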
Output-contract + usage-coverage signals (2026-03-02)
System-bus inference now emits explicit coverage/output-contract metadata so low-yield calls are queryable:
- `usageCoverage: "present"|"missing"`
- `usageCaptured: boolean`
- `jsonRequested`, `jsonParsed`, `outputChars`
- warning OTEL event: `model_router.usage_missing`
For strict machine-readable paths, callers can require output contracts:
- `requireJson: true`: parse failure becomes inference failure
- `requireTextOutput: true`: empty text becomes inference failure
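A rough sketch of how the coverage metadata and the two contract flags might fit together is shown below. The field names `usageCoverage`, `usageCaptured`, `jsonRequested`, `jsonParsed`, `outputChars`, `requireJson`, and `requireTextOutput` come from this section; `checkOutputContract`, the `failure` strings, and the usage shape are hypothetical.

```typescript
interface ContractOptions {
  jsonRequested?: boolean;
  requireJson?: boolean;
  requireTextOutput?: boolean;
}

interface ContractResult {
  ok: boolean;
  failure: string | null; // hypothetical failure codes
  metadata: {
    usageCoverage: "present" | "missing";
    usageCaptured: boolean;
    jsonRequested: boolean;
    jsonParsed: boolean;
    outputChars: number;
  };
}

function checkOutputContract(
  text: string,
  usage: { total: number } | undefined,
  opts: ContractOptions,
): ContractResult {
  let jsonParsed = false;
  try {
    JSON.parse(text);
    jsonParsed = true;
  } catch {
    jsonParsed = false;
  }
  const usageCaptured = usage !== undefined;

  let failure: string | null = null;
  if (opts.requireJson && !jsonParsed) failure = "json_parse_failed";
  else if (opts.requireTextOutput && text.trim() === "") failure = "empty_text_output";

  return {
    ok: failure === null,
    failure,
    metadata: {
      usageCoverage: usageCaptured ? "present" : "missing",
      usageCaptured,
      jsonRequested: Boolean(opts.jsonRequested || opts.requireJson),
      jsonParsed,
      outputChars: text.length,
    },
  };
}
```

Without either `require*` flag the call still succeeds, but the metadata records that JSON parsing failed or usage was missing, which keeps low-yield calls queryable.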
Recall rewrite traces now include `rewriteReason` in addition to `strategy` (`disabled|skipped|haiku|openai|fallback`) to separate deliberate skips from failure fallbacks.
Known Gaps
| Issue | Severity | Notes |
|---|---|---|
| `cache_write_input_tokens` not priced | Medium | Langfuse platform limitation: no cache write rate in their pricing table |
| No `completionStartTime` on first turn | Low | `lastAssistantStartTime` not set before first `message_start[assistant]` |
| `tool_result` matching | Low | Relies on `toolCallId`; if pi changes the field name, spans won't close |
Debugging
Check recent traces
LF_PK=$(secrets lease langfuse_public_key --ttl 5m)
LF_SK=$(secrets lease langfuse_secret_key --ttl 5m)
curl -s -u "$LF_PK:$LF_SK" "https://us.cloud.langfuse.com/api/public/traces?limit=5" \
| jq '[.data[] | {name, ts: .timestamp[:19], obs: (.observations | length), output: (.output // "" | tostring | .[0:60])}]'
Check nested observations on a trace
TRACE_ID="<id>"
curl -s -u "$LF_PK:$LF_SK" "https://us.cloud.langfuse.com/api/public/observations?traceId=$TRACE_ID" \
| jq '[.data[] | {name, type, model, startTime: .startTime[:19], endTime: .endTime[:19]}]'
Common Issues
| Symptom | Cause | Fix |
|---|---|---|
| Double traces | Extension loaded twice via symlink/realpath split | `globalThis` dedup guard (already fixed) |
| `[toolUse]` output instead of tool names | `tool_call` events not firing | Check pi version, verify `toolName` field on event |
| No traces at all | Langfuse creds missing | Check `LANGFUSE_PUBLIC_KEY`/`LANGFUSE_SECRET_KEY` env |
| `channel:interactive` on gateway | `GATEWAY_ROLE` not set | Must be in `gateway-start.sh` |
| Stale extension code | Gateway/interactive session not reloaded after change | Restart gateway and start a fresh interactive session |
| OTEL emit errors in gateway | system-bus-worker port-forward down | `kubectl port-forward -n joelclaw svc/system-bus-worker 3111:3111` |
Key Files
- Pi extension: `pi/extensions/langfuse-cost/index.ts`
- Pi extension tests: `pi/extensions/langfuse-cost/index.test.ts`
- System-bus bridge: `packages/system-bus/src/lib/langfuse.ts`
- Gateway ops notes: `docs/gateway.md`
Deployment Workflow
After editing the pi extension:
- Commit changes in this repo (source of truth).
- Restart gateway so the updated extension is loaded.
- Start a new interactive pi session (or reload) so per-session tracing uses the new code.
ADRs
- ADR-0146: Inference Cost Monitoring and Control (status: `shipped`)
- ADR-0147: Named Agent Profiles (trace attribution by role)