Langfuse Observability

Langfuse is the LLM observability layer for joelclaw. Every LLM call produces a Langfuse trace with nested hierarchy, I/O, usage, cost, and attribution.

Architecture

joelclaw has two Langfuse integration points:

1. Pi-session extension (`langfuse-cost`)

Source: pi/extensions/langfuse-cost/index.ts (canonical, git-tracked in this repo)
Runtime: loaded as a pi extension from the same source tree
What it traces: Every gateway + interactive pi session LLM call
How: Hooks into pi session events (session_start, message_start, message_end, tool_call, tool_result, session_shutdown)
Dedup: globalThis.__langfuse_cost_loaded__ guard prevents duplicate extension instances
Optional dependency behavior: langfuse is lazily loaded (no top-level hard import). A missing module must disable telemetry, not crash the extension import. Regression test: pi/extensions/langfuse-cost/index.test.ts
Runtime dependency location: because the extension is loaded from pi/extensions/ at repo root instead of a workspace package, the langfuse npm package must be resolvable from the repo root package.json. If install drift at the root drops it, gateway/session telemetry silently falls back to the optional-dependency warning and no traces are sent.
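
The dedup guard and the lazy optional import described above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: `claimExtensionSlot` and `loadOptional` are hypothetical helper names, and only the `__langfuse_cost_loaded__` flag name comes from this document.

```typescript
// Flag name from the doc; everything else here is illustrative.
const GUARD_KEY = "__langfuse_cost_loaded__";

type GlobalFlags = Record<string, unknown>;

/** Returns true the first time it is called for a scope, false afterwards. */
function claimExtensionSlot(
  scope: GlobalFlags = globalThis as unknown as GlobalFlags,
): boolean {
  if (scope[GUARD_KEY]) return false; // another instance already registered
  scope[GUARD_KEY] = true;
  return true;
}

/**
 * Resolves an optional module lazily. A failed import returns null so the
 * caller can disable telemetry instead of crashing at extension load time.
 */
async function loadOptional<T>(
  importer: () => Promise<T>,
  label: string,
): Promise<T | null> {
  try {
    return await importer();
  } catch (err) {
    console.warn(`[langfuse-cost] optional dependency "${label}" unavailable; telemetry disabled`, err);
    return null;
  }
}
```

A caller would pass the dynamic import as the `importer` argument (for example a function that returns `import("langfuse")`) and skip all tracing when the result is `null`.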

2. System-bus OTEL bridge (`langfuse.ts`)

Source: packages/system-bus/src/lib/langfuse.ts
What it traces: All Inngest function LLM calls (reflect, triage, email cleanup, docs ingest)
How: @langfuse/otel LangfuseSpanProcessor + @langfuse/tracing startObservation()
Produces: joelclaw.inference traces with generation children

Current Trace Hierarchy (pi-session)

The langfuse-cost extension produces a 4-level nested span hierarchy:

joelclaw.session (trace)
  └── session (span) — entire session lifetime
        └── turn-1 (span) — user message → final assistant response
        │     ├── tool:bash (span) — individual tool execution
        │     ├── tool:read (span)
        │     └── llm.call (generation) — the LLM API call with usage/cost
        └── turn-2 (span)
              ├── tool:edit (span)
              ├── tool:bash (span)
              └── llm.call (generation)

What each level captures

Level	Created on	Ended on	Contains
`joelclaw.session` trace	`session_start`	`session_shutdown`	userId, sessionId, tags, turn count
`session` span	`session_start`	`session_shutdown`	Channel, session type, turn count
`turn-N` span	`message_start[user]`	`message_end[assistant]` with text output	User input (clean), sourceChannel metadata
`tool:name` span	`tool_call` event	`tool_result` event	Tool input, output (truncated 500 chars)
`llm.call` generation	`message_end[assistant]`	immediate	Model, usage, cache tokens, cost, I/O
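
To make the lifecycle in the table concrete, here is a small in-memory model of how events could open and close turn and tool spans. It is a sketch under assumptions: the event payload shapes and the `TurnTracker` class are hypothetical, and only the event names, the `toolCallId` matching, and the 500-character truncation come from this document.

```typescript
// Illustrative event shapes; the real pi event payloads may differ.
type SessionEvent =
  | { type: "message_start"; role: "user" | "assistant"; text?: string }
  | { type: "message_end"; role: "assistant"; text: string }
  | { type: "tool_call"; toolCallId: string; toolName: string }
  | { type: "tool_result"; toolCallId: string; output: string };

interface ToolSpan { name: string; output?: string; closed: boolean }
interface TurnSpan {
  name: string;
  input: string;
  tools: ToolSpan[];
  generations: number;
  closed: boolean;
}

class TurnTracker {
  readonly turns: TurnSpan[] = [];
  private current: TurnSpan | undefined;
  private openTools = new Map<string, ToolSpan>();

  handle(event: SessionEvent): void {
    switch (event.type) {
      case "message_start":
        if (event.role === "user") {
          // A user message opens turn-N.
          this.current = {
            name: `turn-${this.turns.length + 1}`,
            input: event.text ?? "",
            tools: [],
            generations: 0,
            closed: false,
          };
          this.turns.push(this.current);
        }
        break;
      case "tool_call": {
        if (!this.current) break;
        const span: ToolSpan = { name: `tool:${event.toolName}`, closed: false };
        this.current.tools.push(span);
        this.openTools.set(event.toolCallId, span);
        break;
      }
      case "tool_result": {
        const span = this.openTools.get(event.toolCallId);
        if (!span) break; // unmatched id: the span stays open
        span.output = event.output.slice(0, 500); // truncate to 500 chars
        span.closed = true;
        this.openTools.delete(event.toolCallId);
        break;
      }
      case "message_end":
        if (!this.current) break;
        this.current.generations += 1; // one llm.call per assistant message
        if (event.text.trim() !== "") this.current.closed = true; // text output ends the turn
        break;
    }
  }
}
```

In this model a tool-use assistant message records a generation but leaves the turn open; only an assistant message with text output closes it, matching the "Ended on" column.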

Channel header stripping

User messages from Telegram arrive with a ---\nChannel:...\n--- header. The extension:

Strips the header from trace input (clean user text only)
Parses known keys (channel, date, platform_capabilities) into sourceChannel metadata
Skips multi-line values (e.g. formatting_guide)
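
The three steps above could look roughly like the following. This is a minimal sketch that assumes the header is a block of `Key: value` lines between two `---` lines and that keys are matched case-insensitively; `stripChannelHeader` is a hypothetical name, not the extension's actual function.

```typescript
const KNOWN_KEYS = new Set(["channel", "date", "platform_capabilities"]);

interface StrippedMessage {
  text: string;                          // clean user text for trace input
  sourceChannel: Record<string, string>; // parsed header metadata
}

function stripChannelHeader(raw: string): StrippedMessage {
  // Assumed header shape: "---\n<key: value lines>\n---" at the very start.
  const match = /^---\n([\s\S]*?)\n---\n?/.exec(raw);
  if (!match) return { text: raw, sourceChannel: {} };

  const [whole = "", body = ""] = match;
  const sourceChannel: Record<string, string> = {};

  for (const line of body.split("\n")) {
    // Only single-line "key: value" pairs qualify. A key with an empty value
    // (start of a multi-line block) and indented continuation lines are skipped.
    const kv = /^([A-Za-z_]+):[ \t]*(\S.*)$/.exec(line);
    if (!kv) continue;
    const [, rawKey = "", value = ""] = kv;
    const key = rawKey.toLowerCase();
    if (KNOWN_KEYS.has(key)) sourceChannel[key] = value.trim();
  }

  return { text: raw.slice(whole.length).trim(), sourceChannel };
}
```

Messages without a header pass through unchanged, so the function is safe to call on every user message regardless of channel.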

Credentials

Langfuse creds in agent-secrets:

langfuse_public_key — pk-lf-cb8b...
langfuse_secret_key — sk-lf-c86f...
langfuse_base_url — https://us.cloud.langfuse.com

Gateway gets them via gateway-start.sh env exports. System-bus resolves via env → secrets lease fallback.
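
The env-first, lease-second resolution order might be sketched like this. The `lease` callback stands in for whatever wraps the secrets lease call, and `LANGFUSE_BASE_URL` is an assumed variable name; only `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, and the three secret names appear in this document.

```typescript
interface LangfuseCredentials {
  publicKey: string;
  secretKey: string;
  baseUrl: string;
}

type Env = Record<string, string | undefined>;
// Hypothetical wrapper around the secrets lease; returns undefined when unavailable.
type LeaseFn = (secretName: string) => string | undefined;

function resolveLangfuseCredentials(env: Env, lease: LeaseFn): LangfuseCredentials | null {
  // Env wins; the lease is only consulted when the env var is unset or empty.
  const pick = (envName: string, secretName: string): string | undefined =>
    env[envName] || lease(secretName);

  const publicKey = pick("LANGFUSE_PUBLIC_KEY", "langfuse_public_key");
  const secretKey = pick("LANGFUSE_SECRET_KEY", "langfuse_secret_key");
  const baseUrl = pick("LANGFUSE_BASE_URL", "langfuse_base_url"); // env name is an assumption

  // Incomplete credentials disable tracing rather than throwing.
  if (!publicKey || !secretKey || !baseUrl) return null;
  return { publicKey, secretKey, baseUrl };
}
```

Returning `null` instead of throwing keeps the caller on the same fail-open path as a missing langfuse module.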

Trace Conventions

Naming

Pi-session: joelclaw.session (trace) → session → turn-N → tool:name and llm.call (siblings under the turn)
System-bus: joelclaw.inference (trace) → generation children

Required Attributes

Every trace MUST have:

userId: "joel"
sessionId — pi session ID for grouping
tags — minimum: ["joelclaw", "pi-session"]
Dynamic tags: provider:anthropic, model:anthropic/claude-opus-4-6, channel:central, session:central
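
A tag list that satisfies these requirements could be assembled as follows. `buildTraceTags` and its input shape are hypothetical; the tag prefixes and the two minimum tags are taken from the list above.

```typescript
interface TagInput {
  provider?: string;
  model?: string;
  channel?: string;
  sessionType?: string;
}

// Minimum tags every pi-session trace carries.
const BASE_TAGS = ["joelclaw", "pi-session"] as const;

function buildTraceTags(input: TagInput): string[] {
  const dynamic: Array<[string, string | undefined]> = [
    ["provider", input.provider],
    ["model", input.model],
    ["channel", input.channel],
    ["session", input.sessionType],
  ];
  const tags: string[] = [...BASE_TAGS];
  for (const [prefix, value] of dynamic) {
    if (value) tags.push(`${prefix}:${value}`); // skip unknown values instead of emitting "model:undefined"
  }
  return tags;
}
```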

Metadata Shape (flat, filterable)

{
  channel: "central",           // GATEWAY_ROLE env
  sessionType: "central",       // "gateway" | "interactive" | "codex" | "central"
  component: "pi-session",
  model: "anthropic/claude-opus-4-6",
  provider: "anthropic",
  stopReason: "toolUse",        // or "endTurn"
  turnCount: 5,                 // Updated on each turn
  sourceChannel: {              // Only on first user message per turn
    channel: "telegram",
    date: "...",
    platform_capabilities: "..."
  },
  tools: ["bash", "read"],      // Tool names used this turn
}

Generation usageDetails

{
  input: 1,                      // Non-cached input tokens
  output: 97,                    // Output tokens
  total: 68195,                  // Total tokens
  cache_read_input_tokens: 67877, // 90% discount
  cache_write_input_tokens: 220,  // 25% premium (NOT priced by Langfuse — known gap)
}
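
In the sample above, `total` equals the sum of the four counters (1 + 97 + 67877 + 220 = 68195). The sketch below builds that shape and estimates cost using the multipliers from the comments: cache reads at 10% of the input rate and cache writes at 125%. The function names and the per-million-token rate parameters are assumptions for illustration; real rates depend on the model.

```typescript
interface UsageDetails {
  input: number;
  output: number;
  total: number;
  cache_read_input_tokens: number;
  cache_write_input_tokens: number;
}

// Hypothetical rate shape: USD per million tokens.
interface TokenRates {
  inputPerMTok: number;
  outputPerMTok: number;
}

function toUsageDetails(
  input: number,
  output: number,
  cacheRead: number,
  cacheWrite: number,
): UsageDetails {
  return {
    input,
    output,
    total: input + output + cacheRead + cacheWrite,
    cache_read_input_tokens: cacheRead,
    cache_write_input_tokens: cacheWrite,
  };
}

function estimateCostUsd(usage: UsageDetails, rates: TokenRates): number {
  // Express cached tokens as input-token equivalents:
  // reads get a 90% discount (x0.1), writes a 25% premium (x1.25).
  const inputEquivalent =
    usage.input +
    usage.cache_read_input_tokens * 0.1 +
    usage.cache_write_input_tokens * 1.25;
  return (inputEquivalent * rates.inputPerMTok + usage.output * rates.outputPerMTok) / 1e6;
}
```

Because Langfuse does not price `cache_write_input_tokens`, a local estimate like this is one way to see how much the platform-reported cost undercounts.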

Pi session guardrails (alert-only)

Long-running pi sessions can dominate Langfuse spend. The extension tracks per-session totals and emits a warning only on the first breach of each guardrail:

JOELCLAW_LANGFUSE_ALERT_MAX_LLM_CALLS (default: 120)
JOELCLAW_LANGFUSE_ALERT_MAX_TOTAL_TOKENS (default: 1200000)
JOELCLAW_LANGFUSE_ALERT_MAX_COST_USD (default: 20)

Behavior:

no automatic model switch
no forced compaction
no stop/interruption
emits console.warn(...) with session ID + current counters
records breach flags and first breach turn index in trace metadata (guardrails)
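
The alert-only behavior can be sketched as a small monitor that warns once per guardrail and records the turn of the first breach. The env variable names and defaults come from the list above; the helper names, the strict "greater than" comparison, and the shape of the recorded state are assumptions.

```typescript
interface GuardrailLimits { maxLlmCalls: number; maxTotalTokens: number; maxCostUsd: number }
interface SessionTotals { llmCalls: number; totalTokens: number; costUsd: number }
type GuardrailName = "llmCalls" | "totalTokens" | "costUsd";

function limitsFromEnv(env: Record<string, string | undefined>): GuardrailLimits {
  // Unset, empty, or non-numeric values fall back to the documented defaults.
  const num = (name: string, fallback: number): number => {
    const parsed = Number(env[name]);
    return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
  };
  return {
    maxLlmCalls: num("JOELCLAW_LANGFUSE_ALERT_MAX_LLM_CALLS", 120),
    maxTotalTokens: num("JOELCLAW_LANGFUSE_ALERT_MAX_TOTAL_TOKENS", 1200000),
    maxCostUsd: num("JOELCLAW_LANGFUSE_ALERT_MAX_COST_USD", 20),
  };
}

function createGuardrailMonitor(limits: GuardrailLimits, sessionId: string) {
  // Turn index of the first breach per guardrail; suitable for trace metadata.
  const firstBreachTurn: Partial<Record<GuardrailName, number>> = {};

  /** Returns the guardrails breached for the first time on this check. */
  function check(totals: SessionTotals, turnIndex: number): GuardrailName[] {
    const status: Array<[GuardrailName, boolean]> = [
      ["llmCalls", totals.llmCalls > limits.maxLlmCalls],
      ["totalTokens", totals.totalTokens > limits.maxTotalTokens],
      ["costUsd", totals.costUsd > limits.maxCostUsd],
    ];
    const fresh: GuardrailName[] = [];
    for (const [name, over] of status) {
      if (!over || firstBreachTurn[name] !== undefined) continue; // warn once only
      firstBreachTurn[name] = turnIndex;
      fresh.push(name);
      console.warn(`[langfuse-cost] guardrail ${name} breached`, { sessionId, ...totals });
    }
    return fresh; // alert-only: the caller never stops or alters the session
  }

  return { check, firstBreachTurn };
}
```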

Model/provider normalization

Both the pi-session extension and system-bus Langfuse bridge normalize provider/model before writing tags, trace metadata, and generation model fields. This keeps provider:* + model:* tags aligned with metadata after model switches and for provider-prefixed IDs such as:

anthropic/claude-opus-4-6
openai-codex/gpt-5.4

Normalization is fail-open: tracing continues even if normalization cannot resolve a value.
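
One plausible shape for this normalization is shown below. It is a sketch under assumptions: the rule that a provider prefix in the model ID overrides a stale provider value, and the `"unknown"` placeholder, are illustrative choices rather than confirmed behavior of either integration.

```typescript
interface ModelRef {
  provider: string;
  model: string; // provider-prefixed ID, e.g. "anthropic/claude-opus-4-6"
}

function normalizeModelRef(
  rawProvider: string | undefined,
  rawModel: string | undefined,
): ModelRef {
  const provider = (rawProvider ?? "").trim().toLowerCase();
  const model = (rawModel ?? "").trim();

  const slash = model.indexOf("/");
  if (slash > 0) {
    // Provider-prefixed ID: derive the provider from the prefix so tags stay
    // aligned even if the separately tracked provider is stale after a switch.
    return { provider: model.slice(0, slash).toLowerCase(), model };
  }
  if (provider && model) {
    return { provider, model: `${provider}/${model}` };
  }
  // Fail-open: never throw; emit placeholders so tracing continues.
  return { provider: provider || "unknown", model: model || "unknown" };
}
```

Running the same function before writing tags, metadata, and the generation model field is what keeps the three in agreement.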

Output-contract + usage-coverage signals (2026-03-02)

System-bus inference now emits explicit coverage/output-contract metadata so low-yield calls are queryable:

usageCoverage: "present"|"missing"
usageCaptured: boolean
jsonRequested, jsonParsed, outputChars
warning OTEL event: model_router.usage_missing

For strict machine-readable paths, callers can require output contracts:

requireJson: true — parse failure becomes inference failure
requireTextOutput: true — empty text becomes inference failure
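
The contract checks could be expressed along these lines. `enforceOutputContract` and the `jsonRequested` option on the contract are hypothetical; the result field names mirror the metadata keys listed above, and a thrown error stands in for "inference failure".

```typescript
interface OutputContract {
  jsonRequested?: boolean;     // caller asked for JSON but tolerates parse failure (assumed option)
  requireJson?: boolean;       // parse failure becomes inference failure
  requireTextOutput?: boolean; // empty text becomes inference failure
}

interface ContractResult {
  jsonRequested: boolean;
  jsonParsed: boolean;
  outputChars: number;
  parsed?: unknown;
}

function enforceOutputContract(text: string, contract: OutputContract): ContractResult {
  if (contract.requireTextOutput && text.trim() === "") {
    throw new Error("inference failed: empty text output");
  }

  const jsonRequested = contract.jsonRequested === true || contract.requireJson === true;
  let parsed: unknown;
  let jsonParsed = false;

  if (jsonRequested) {
    try {
      parsed = JSON.parse(text);
      jsonParsed = true;
    } catch {
      // Only the strict contract escalates a parse failure.
      if (contract.requireJson) throw new Error("inference failed: output is not valid JSON");
    }
  }

  return { jsonRequested, jsonParsed, outputChars: text.length, parsed };
}
```

Without `requireJson`, a failed parse still surfaces as `jsonRequested: true, jsonParsed: false`, which is the combination that makes low-yield calls queryable.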

Recall rewrite traces now include rewriteReason in addition to strategy (disabled|skipped|haiku|openai|fallback) to separate deliberate skips from failure fallbacks.

Known Gaps

Issue	Severity	Notes
`cache_write_input_tokens` not priced	Medium	Langfuse platform limitation — no cache write rate in their pricing table
No `completionStartTime` on first turn	Low	`lastAssistantStartTime` not set before first `message_start[assistant]`
`tool_result` matching	Low	Relies on `toolCallId` — if pi changes the field name, spans won't close

Debugging

Check recent traces

LF_PK=$(secrets lease langfuse_public_key --ttl 5m)
LF_SK=$(secrets lease langfuse_secret_key --ttl 5m)
curl -s -u "$LF_PK:$LF_SK" "https://us.cloud.langfuse.com/api/public/traces?limit=5" \
  | jq '[.data[] | {name, ts: .timestamp[:19], obs: (.observations | length), output: (.output // "" | tostring | .[0:60])}]'

Check nested observations on a trace

# Reuses LF_PK and LF_SK leased in the previous snippet
TRACE_ID="<id>"
curl -s -u "$LF_PK:$LF_SK" "https://us.cloud.langfuse.com/api/public/observations?traceId=$TRACE_ID" \
  | jq '[.data[] | {name, type, model, startTime: .startTime[:19], endTime: .endTime[:19]}]'

Common Issues

Symptom	Cause	Fix
Double traces	Extension loaded twice via symlink/realpath split	globalThis dedup guard (already fixed)
`[toolUse]` output instead of tool names	`tool_call` events not firing	Check pi version, verify `toolName` field on event
No traces at all	Langfuse creds missing	Check `LANGFUSE_PUBLIC_KEY`/`LANGFUSE_SECRET_KEY` env
`channel:interactive` on gateway	`GATEWAY_ROLE` not set	Must be in `gateway-start.sh`
Stale extension code	Gateway/interactive session not reloaded after change	Restart gateway and start a fresh interactive session
OTEL emit errors in gateway	system-bus-worker port-forward down	`kubectl port-forward -n joelclaw svc/system-bus-worker 3111:3111`

Key Files

Pi extension: pi/extensions/langfuse-cost/index.ts
Pi extension tests: pi/extensions/langfuse-cost/index.test.ts
System-bus bridge: packages/system-bus/src/lib/langfuse.ts
Gateway ops notes: docs/gateway.md

Deployment Workflow

After editing the pi extension:

Commit changes in this repo (source of truth).
Restart gateway so the updated extension is loaded.
Start a new interactive pi session (or reload) so per-session tracing uses the new code.

ADRs

ADR-0146: Inference Cost Monitoring and Control — shipped
ADR-0147: Named Agent Profiles (trace attribution by role)
