Skill: langfuse-trace-logger

Purpose: Log subagent task completions as Langfuse traces for replay, evaluation, and cost analysis.

Scope: Called by Loki at the end of every session wrap (Phase 4) for each significant subagent completion.

Script: /Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py

⚠️ CRITICAL: Python Version

Always use ~/.chatterbox-venv/bin/python3 (Python 3.11.15)

The langfuse SDK uses pydantic v1, which is incompatible with Python 3.14. Running the script with system Python (python3) or pyenv Python (3.14.x) fails silently: no import error, no exception, and the trace simply never appears in the Langfuse UI. Missing this can easily cost 30+ minutes of debugging.

# ✅ Correct
~/.chatterbox-venv/bin/python3 scripts/langfuse-trace-logger.py ...

# ❌ Wrong — silent failure on Python 3.14
python3 scripts/langfuse-trace-logger.py ...
/Users/loki/.pyenv/versions/3.14.3/bin/python3 scripts/langfuse-trace-logger.py ...
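Because the failure is silent, a guard at the top of any wrapper script can turn it into a loud one. The following is a minimal sketch, not part of the logger itself: the function names are hypothetical, and treating every version below 3.14 as safe is an assumption drawn only from the warning above.

```python
import sys

# Assumption: 3.14 is the first minor version that breaks the SDK,
# as stated in the warning above. Earlier versions are treated as safe.
FIRST_BROKEN = (3, 14)


def interpreter_is_safe(version_info=None):
    """Return True when the interpreter predates the known-bad version."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) < FIRST_BROKEN


def require_safe_interpreter():
    """Exit with a clear message instead of letting the trace vanish."""
    if not interpreter_is_safe():
        sys.exit(
            f"Python {sys.version_info.major}.{sys.version_info.minor} detected; "
            "run with ~/.chatterbox-venv/bin/python3 instead."
        )
```

Calling require_safe_interpreter() before importing langfuse would stop the run early on the wrong interpreter.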

Basic Invocation

~/.chatterbox-venv/bin/python3 /Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py \
  --session-id "$SESSION_ID" \
  --parent-id "agent:main" \
  --agent "kit" \
  --task "task-label-kebab-case" \
  --model "anthropic/claude-sonnet-4-6" \
  --status "completed" \
  --input "full task prompt given to agent (first 4000 chars)..." \
  --output "what the agent returned or accomplished..." \
  --duration 278 \
  --tokens 16900 \
  --project "reddi-agent-protocol" \
  --skills "product-tour-capture"

Trace Schema

Field	Type	Purpose	Notes
`--session-id`	string	Subagent session key	Use actual subagent session key — enables lineage tracing
`--parent-id`	string	Parent session reference	Always `"agent:main"` unless nested subagent
`--agent`	string	Agent name	Lowercase: kit, archie, sara, finn, quill, etc.
`--task`	string	Task label (kebab-case)	Used for replay grouping: `replay-judge.py --tag "task:kit-setup-rebuild"`
`--model`	string	Model used	e.g. `anthropic/claude-sonnet-4-6`, `anthropic/claude-haiku-4-5`
`--status`	string	Outcome	`completed` / `partial` / `failed`
`--input`	string	Full task prompt	First 4000 chars — this is what gets replayed against other models in judge runs
`--output`	string	Result summary	Agent's output/result — this is what the judge scores
`--duration`	int	Time in seconds	Used for efficiency analysis and agent routing decisions
`--tokens`	int	Total tokens used	Used for cost analysis and budget governance
`--project`	string	Project slug	Must match `projects/<slug>/STATUS.md` — enables project-level filtering
`--skills`	string	Comma-separated skills	e.g. `"product-tour-capture,ffmpeg-studio"` — enables skill effectiveness filtering
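When the invocation is assembled from code rather than typed by hand, the schema above maps naturally onto a dict. This is a rough sketch under stated assumptions: build_logger_command is a hypothetical helper, and truncating --input on the caller's side is an assumption (the logger may already enforce the 4000-character limit itself).

```python
import os

PYTHON = os.path.expanduser("~/.chatterbox-venv/bin/python3")
SCRIPT = "/Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py"
VALID_STATUSES = {"completed", "partial", "failed"}
MAX_INPUT_CHARS = 4000  # from the schema: first 4000 chars of the prompt


def build_logger_command(fields):
    """Turn a dict of trace fields into an argv list for the logger script.

    Keys use underscores (session_id) and become flags (--session-id).
    """
    if fields["status"] not in VALID_STATUSES:
        raise ValueError(f"unknown status: {fields['status']!r}")
    cmd = [PYTHON, SCRIPT]
    for name, value in fields.items():
        if name == "input":
            value = value[:MAX_INPUT_CHARS]  # assumption: truncate client-side
        if name == "skills" and not isinstance(value, str):
            value = ",".join(value)  # --skills is comma-separated
        cmd += ["--" + name.replace("_", "-"), str(value)]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting problems with long prompts.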

Tag Taxonomy

The logger automatically generates these tags from the fields above:

agent:kit — from --agent
model_family:claude-sonnet — derived from --model
project:reddi-agent-protocol — from --project
skill:product-tour-capture — one tag per skill in --skills
task:kit-setup-rebuild — from --task
status:completed — from --status

These tags power the replay-judge filter syntax.
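To make the mapping concrete, here is one way the tag derivation could look. It is a sketch only: the logger's real implementation is not shown in this document, and the model_family rule (drop the provider prefix, then drop everything from the first numeric segment onward) is an assumption chosen to reproduce the claude-sonnet example above.

```python
def model_family(model):
    """Reduce 'anthropic/claude-sonnet-4-6' to 'claude-sonnet'.

    Assumed rule: strip the provider prefix, then keep name segments
    up to (not including) the first purely numeric one.
    """
    name = model.split("/")[-1]
    parts = []
    for part in name.split("-"):
        if part.isdigit():
            break
        parts.append(part)
    return "-".join(parts)


def derive_tags(agent, model, project, skills, task, status):
    """Build the tag list in the order shown in the taxonomy above."""
    tags = [
        f"agent:{agent}",
        f"model_family:{model_family(model)}",
        f"project:{project}",
    ]
    # One tag per skill in the comma-separated --skills value
    tags += [f"skill:{s.strip()}" for s in skills.split(",") if s.strip()]
    tags += [f"task:{task}", f"status:{status}"]
    return tags
```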

Backfill Pattern

Use backfill for retroactive logging when a session wrap was skipped or traces are missing.

Idempotent: Uses deterministic trace IDs based on date+agent+task hash. Safe to re-run — won't create duplicates.

# Preview first (dry run)
~/.chatterbox-venv/bin/python3 scripts/langfuse-backfill-historical.py \
  --from-date 2026-03-24 \
  --to-date 2026-03-24 \
  --dry-run

# Then run for real
~/.chatterbox-venv/bin/python3 scripts/langfuse-backfill-historical.py \
  --from-date 2026-03-24 \
  --to-date 2026-03-24

Data source: Backfill parses memory/YYYY-MM-DD.md files and extracts structured task outcome blocks. This is why the task outcome block format in memory files must be consistent — inconsistent format breaks parsing silently.

Backfill ID format: backfill-YYYY-MM-DD-<agent>-<task-slug> — deterministic, no duplicate risk.
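The idempotency follows directly from the ID being a pure function of its inputs. The sketch below follows the literal format string above and does not attempt to reproduce any hashing the backfill script may do internally. The helper names are hypothetical, and how the script splits agent from task slug is an assumption.

```python
import re
from datetime import date


def slugify(text):
    """Lowercase and collapse anything non-alphanumeric into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def backfill_trace_id(day, agent, task_slug):
    """Compose backfill-YYYY-MM-DD-<agent>-<task-slug> from its parts.

    The same inputs always yield the same ID, so a re-run targets the
    same trace instead of creating a new one.
    """
    return f"backfill-{day.isoformat()}-{slugify(agent)}-{slugify(task_slug)}"
```

For example, backfill_trace_id(date(2026, 3, 24), "kit", "setup-rebuild") yields the same string on every call, which is what makes a second backfill pass over the same date range harmless.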

Replay and Judge

# Report on all Kit traces (past 30 days)
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "agent:kit" --report

# Compare all Kit traces against Haiku (cost reduction analysis)
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "agent:kit" --models "claude-haiku-4-5" --judge "claude-haiku-4-5" --report

# Judge a specific trace
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --trace-id "backfill-2026-03-24-kit-setup-rebuild" \
  --models "claude-haiku-4-5" --judge "claude-haiku-4-5"

# Filter by project
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "project:reddi-agent-protocol" --report

# Filter by skill
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "skill:product-tour-capture" --report

Verify Traces Appeared

After logging, verify in Langfuse UI: http://localhost:3100

Or check programmatically:

~/.chatterbox-venv/bin/python3 -c "
import subprocess
from langfuse import Langfuse

# Read the secret key from 1Password; check=True raises if the op CLI call fails
sk = subprocess.run(
    ['op', 'read', 'op://OpenClaw/Langfuse (Local)/credential'],
    capture_output=True, text=True, check=True
).stdout.strip()

lf = Langfuse(public_key='pk-lf-openclaw-local', secret_key=sk, host='http://localhost:3100')
# Print the name and truncated ID of the five most recent traces
for t in lf.client.trace.list(limit=5).data:
    print(t.name, t.id[:12])
"

Expected output: the names and truncated IDs of the last 5 traces. If the output is blank, suspect the Python version issue (see the warning above).

Langfuse Connection Details

Setting	Value
UI	http://localhost:3100
Public key	`pk-lf-openclaw-local`
Secret key	`op://OpenClaw/Langfuse (Local)/credential` (1Password)
Also in 1Password	`op://OpenClaw/Langfuse (Local)/Secret Key`
Docker	Always running (daemon service)

When to Call This Skill

This skill is called during Phase 4 (Traces) of the session-wrap playbook (playbooks/session-wrap/PLAYBOOK.md).

Call once per significant subagent completion. Use data from the task outcome blocks written to the memory file in Phase 1. Don't reconstruct the values from recollection; read back the blocks you just wrote.

Minimum threshold for logging: Any subagent run that produced a deliverable (file written, API called, analysis produced). Skip: simple lookups, 1-line tool calls, failed attempts with no output.
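The threshold can be expressed as a small predicate. This is purely illustrative: SubagentRun and its fields are hypothetical stand-ins for whatever the task outcome block records, and classifying a run as a "simple lookup" is reduced here to "produced no deliverable".

```python
from dataclasses import dataclass


@dataclass
class SubagentRun:
    """Hypothetical summary of one subagent run (not a real schema)."""
    status: str                      # completed / partial / failed
    files_written: int = 0
    api_calls: int = 0
    produced_analysis: bool = False
    output: str = ""


def should_log(run):
    """Apply the minimum threshold: log only runs with a deliverable."""
    # Failed attempts with no output are skipped outright.
    if run.status == "failed" and not run.output.strip():
        return False
    # A deliverable is a file written, an API called, or an analysis produced.
    return run.files_written > 0 or run.api_calls > 0 or run.produced_analysis
```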

Troubleshooting

Symptom	Cause	Fix
Trace doesn't appear in UI	Wrong Python version	Use `~/.chatterbox-venv/bin/python3`
No output, no error	Same — Python 3.14 pydantic v1 incompatibility	Same fix
`ImportError: langfuse not found`	Wrong venv	Same fix
Duplicate traces on backfill	Shouldn't happen — backfill is idempotent	Check if running logger + backfill both for same trace
`op: command not found`	1Password CLI not in PATH	Run from shell with OP_SERVICE_ACCOUNT_TOKEN set, or source `~/.zshrc` first
Langfuse UI empty after logging	Docker daemon down	`docker ps` — restart Langfuse container if needed
