langfuse-trace-logger

Log subagent task completions as Langfuse traces for replay, evaluation, and cost analysis. Called during session-wrap Phase 4. Supports backfill, tag-based filtering, and replay-judge integration. Requires Python 3.11 via chatterbox-venv due to pydantic v1 compatibility.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "langfuse-trace-logger" with this command: npx skills add nissan/langfuse-trace-logger

Skill: langfuse-trace-logger

Purpose: Log subagent task completions as Langfuse traces for replay, evaluation, and cost analysis.
Scope: Called by Loki at the end of every session wrap (Phase 4) for each significant subagent completion.
Script: /Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py


⚠️ CRITICAL: Python Version

Always use ~/.chatterbox-venv/bin/python3 (Python 3.11.15)

The langfuse SDK uses pydantic v1, which is incompatible with Python 3.14. Running the script with the system Python (python3) or the pyenv Python (3.14.x) fails silently: no import error, no exception, and the trace simply never appears in the Langfuse UI. Without knowing the cause, this can easily cost 30+ minutes of debugging.

# ✅ Correct
~/.chatterbox-venv/bin/python3 scripts/langfuse-trace-logger.py ...

# ❌ Wrong — silent failure on Python 3.14
python3 scripts/langfuse-trace-logger.py ...
/Users/loki/.pyenv/versions/3.14.3/bin/python3 scripts/langfuse-trace-logger.py ...
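
Because the failure is silent, a version guard at the top of a script can turn it into a loud one. The following is a minimal sketch, not part of the shipped logger: the helper names are hypothetical, and the strict match on 3.11 is an assumption based on the only interpreter this skill documents as working.

```python
import sys

# Assumption: 3.11 is the only interpreter line documented as working for this skill.
SUPPORTED = (3, 11)


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True when the interpreter's major.minor equals SUPPORTED."""
    return tuple(version_info[:2]) == SUPPORTED


def require_supported_python(version_info=sys.version_info) -> None:
    """Abort with a clear message instead of letting the trace vanish silently."""
    if not is_supported_python(version_info):
        found = ".".join(str(part) for part in version_info[:3])
        raise SystemExit(
            f"langfuse-trace-logger needs Python 3.11, found {found}; "
            "run it with ~/.chatterbox-venv/bin/python3"
        )
```

Calling require_supported_python() before importing langfuse would stop a run under Python 3.14 immediately. Loosen the check if other versions are later confirmed to work.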

Basic Invocation

~/.chatterbox-venv/bin/python3 /Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py \
  --session-id "$SESSION_ID" \
  --parent-id "agent:main" \
  --agent "kit" \
  --task "task-label-kebab-case" \
  --model "anthropic/claude-sonnet-4-6" \
  --status "completed" \
  --input "full task prompt given to agent (first 4000 chars)..." \
  --output "what the agent returned or accomplished..." \
  --duration 278 \
  --tokens 16900 \
  --project "reddi-agent-protocol" \
  --skills "product-tour-capture"

Trace Schema

| Field | Type | Purpose | Notes |
|---|---|---|---|
| --session-id | string | Subagent session key | Use the actual subagent session key; enables lineage tracing |
| --parent-id | string | Parent session reference | Always "agent:main" unless nested subagent |
| --agent | string | Agent name | Lowercase: kit, archie, sara, finn, quill, etc. |
| --task | string | Task label (kebab-case) | Used for replay grouping: replay-judge.py --tag "task:kit-setup-rebuild" |
| --model | string | Model used | e.g. anthropic/claude-sonnet-4-6, anthropic/claude-haiku-4-5 |
| --status | string | Outcome | completed / partial / failed |
| --input | string | Full task prompt | First 4000 chars; this is what gets replayed against other models in judge runs |
| --output | string | Result summary | Agent's output/result; this is what the judge scores |
| --duration | int | Time in seconds | Used for efficiency analysis and agent routing decisions |
| --tokens | int | Total tokens used | Used for cost analysis and budget governance |
| --project | string | Project slug | Must match projects/<slug>/STATUS.md; enables project-level filtering |
| --skills | string | Comma-separated skills | e.g. "product-tour-capture,ffmpeg-studio"; enables skill effectiveness filtering |
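
When the logger is driven from Python rather than a shell, the schema above maps naturally onto a small record type. This is a sketch under stated assumptions: TraceRecord and build_logger_argv are hypothetical names, and truncating the prompt to 4000 characters on the caller's side is an assumption (the script may already do this itself).

```python
from dataclasses import dataclass
from pathlib import Path

PYTHON = Path.home() / ".chatterbox-venv/bin/python3"
SCRIPT = "/Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py"
MAX_INPUT_CHARS = 4000  # the schema notes that the first 4000 chars of the prompt are used
VALID_STATUSES = {"completed", "partial", "failed"}


@dataclass
class TraceRecord:  # hypothetical container mirroring the CLI flags
    session_id: str
    agent: str
    task: str
    model: str
    status: str
    input_text: str
    output_text: str
    duration: int
    tokens: int
    project: str
    skills: list[str]
    parent_id: str = "agent:main"


def build_logger_argv(rec: TraceRecord) -> list[str]:
    """Translate a record into the argument list for the logger script."""
    if rec.status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")
    return [
        str(PYTHON), SCRIPT,
        "--session-id", rec.session_id,
        "--parent-id", rec.parent_id,
        "--agent", rec.agent.lower(),
        "--task", rec.task,
        "--model", rec.model,
        "--status", rec.status,
        "--input", rec.input_text[:MAX_INPUT_CHARS],
        "--output", rec.output_text,
        "--duration", str(rec.duration),
        "--tokens", str(rec.tokens),
        "--project", rec.project,
        "--skills", ",".join(rec.skills),
    ]
```

The resulting list can be passed to subprocess.run(argv, check=True). Passing a list rather than a shell string avoids quoting problems with long prompts.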

Tag Taxonomy

The logger automatically generates these tags from the fields above:

  • agent:kit — from --agent
  • model_family:claude-sonnet — derived from --model
  • project:reddi-agent-protocol — from --project
  • skill:product-tour-capture — one tag per skill in --skills
  • task:kit-setup-rebuild — from --task
  • status:completed — from --status

These tags power the replay-judge filter syntax.
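
The tag derivation can be sketched in a few lines. The rule for model_family is not spelled out in this document, so the version below is an assumption: drop the provider prefix, then drop trailing numeric version segments. The function names are hypothetical, and the real logger may differ.

```python
def model_family(model: str) -> str:
    """Assumed rule: strip the provider prefix and trailing numeric segments."""
    name = model.split("/")[-1]          # "anthropic/claude-sonnet-4-6" -> "claude-sonnet-4-6"
    parts = name.split("-")
    while parts and parts[-1].isdigit():  # drop "6", then "4"
        parts.pop()
    return "-".join(parts)


def derive_tags(agent: str, model: str, project: str,
                skills: str, task: str, status: str) -> list[str]:
    """Build the tag list in the same order as the taxonomy above."""
    tags = [
        f"agent:{agent}",
        f"model_family:{model_family(model)}",
        f"project:{project}",
    ]
    # One tag per entry in the comma-separated --skills value
    tags += [f"skill:{s.strip()}" for s in skills.split(",") if s.strip()]
    tags += [f"task:{task}", f"status:{status}"]
    return tags
```

With the values from the basic invocation, derive_tags yields agent:kit, model_family:claude-sonnet, project:reddi-agent-protocol, skill:product-tour-capture, and the task and status tags.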


Backfill Pattern

For retroactive logging when a session wrap was skipped or traces are missing.

Idempotent: uses deterministic trace IDs derived from the date, agent, and task. Safe to re-run; it won't create duplicates.

# Preview first (dry run)
~/.chatterbox-venv/bin/python3 scripts/langfuse-backfill-historical.py \
  --from-date 2026-03-24 \
  --to-date 2026-03-24 \
  --dry-run

# Then run for real
~/.chatterbox-venv/bin/python3 scripts/langfuse-backfill-historical.py \
  --from-date 2026-03-24 \
  --to-date 2026-03-24

Data source: Backfill parses memory/YYYY-MM-DD.md files and extracts structured task outcome blocks. This is why the task outcome block format in memory files must be consistent — inconsistent format breaks parsing silently.

Backfill ID format: backfill-YYYY-MM-DD-<agent>-<task-slug> — deterministic, no duplicate risk.
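
To illustrate why a deterministic ID makes re-runs safe, here is a minimal sketch of building an ID in the format above. The helper names and the slug normalization are assumptions, not the backfill script's actual code, and the task value "setup-rebuild" is only an illustrative input.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase and collapse anything that is not a-z or 0-9 into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def backfill_trace_id(day: date, agent: str, task: str) -> str:
    """Same inputs always give the same ID, so a re-run targets the same trace."""
    return f"backfill-{day.isoformat()}-{slugify(agent)}-{slugify(task)}"


# Simulate two backfill runs over the same day: keyed by ID, the second run
# overwrites the first entry instead of adding a duplicate.
store: dict[str, str] = {}
for _ in range(2):
    trace_id = backfill_trace_id(date(2026, 3, 24), "kit", "setup-rebuild")
    store[trace_id] = "task outcome block"
```

After both passes the store holds a single entry, which is the property the backfill relies on.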


Replay and Judge

# Report on all Kit traces (past 30 days)
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "agent:kit" --report

# Compare all Kit traces against Haiku (cost reduction analysis)
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "agent:kit" --models "claude-haiku-4-5" --judge "claude-haiku-4-5" --report

# Judge a specific trace
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --trace-id "backfill-2026-03-24-kit-setup-rebuild" \
  --models "claude-haiku-4-5" --judge "claude-haiku-4-5"

# Filter by project
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "project:reddi-agent-protocol" --report

# Filter by skill
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "skill:product-tour-capture" --report

Verify Traces Appeared

After logging, verify in Langfuse UI: http://localhost:3100

Or check programmatically:

~/.chatterbox-venv/bin/python3 -c "
import subprocess
from langfuse import Langfuse

# Read the secret key from 1Password; check=True surfaces a failing op call
sk = subprocess.run(
    ['op', 'read', 'op://OpenClaw/Langfuse (Local)/credential'],
    capture_output=True, text=True, check=True,
).stdout.strip()

lf = Langfuse(public_key='pk-lf-openclaw-local', secret_key=sk, host='http://localhost:3100')
# Print the name and truncated ID of the last 5 traces
for t in lf.client.trace.list(limit=5).data:
    print(t.name, t.id[:12])
"

Expected output: the names and truncated IDs of the last 5 traces. If the output is blank, suspect the Python version issue (see the warning above).


Langfuse Connection Details

| Setting | Value |
|---|---|
| UI | http://localhost:3100 |
| Public key | pk-lf-openclaw-local |
| Secret key | op://OpenClaw/Langfuse (Local)/credential (1Password) |
| Also in 1Password | op://OpenClaw/Langfuse (Local)/Secret Key |
| Docker | Always running (daemon service) |

When to Call This Skill

This skill is called during Phase 4 (Traces) of the session-wrap playbook (playbooks/session-wrap/PLAYBOOK.md).

Call once per significant subagent completion. Use data from the task outcome blocks written in Phase 1 (memory file). Don't reconstruct from memory — read what you just wrote.

Minimum threshold for logging: Any subagent run that produced a deliverable (file written, API called, analysis produced). Skip: simple lookups, 1-line tool calls, failed attempts with no output.
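
The threshold above can be expressed as a small predicate. This is a sketch only: SubagentRun and its fields are hypothetical, since the document does not define how a run's deliverables are recorded.

```python
from dataclasses import dataclass


@dataclass
class SubagentRun:  # hypothetical summary of one subagent run
    files_written: int = 0
    api_calls: int = 0
    produced_analysis: bool = False
    status: str = "completed"
    output: str = ""


def should_log(run: SubagentRun) -> bool:
    """Log only runs that produced a deliverable; skip failures with no output."""
    if run.status == "failed" and not run.output.strip():
        return False
    return run.files_written > 0 or run.api_calls > 0 or run.produced_analysis
```

A simple lookup that wrote nothing and called no API returns False, while a run that wrote even one file returns True.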


Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| Trace doesn't appear in UI | Wrong Python version | Use ~/.chatterbox-venv/bin/python3 |
| No output, no error | Same: Python 3.14 pydantic v1 incompatibility | Same fix |
| ImportError: langfuse not found | Wrong venv | Same fix |
| Duplicate traces on backfill | Shouldn't happen; backfill is idempotent | Check whether the logger and the backfill were both run for the same trace |
| op: command not found | 1Password CLI not in PATH | Run from a shell with OP_SERVICE_ACCOUNT_TOKEN set, or source ~/.zshrc first |
| Langfuse UI empty after logging | Docker daemon down | Run docker ps; restart the Langfuse container if needed |
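
Several of the rows above can be checked before logging anything. The following preflight is a hypothetical helper, not part of the skill; it only verifies that the interpreter file exists and that the op and docker executables are on PATH. It does not confirm that the Docker daemon or the Langfuse container is actually running.

```python
import shutil
from pathlib import Path


def preflight(python_path: str = "~/.chatterbox-venv/bin/python3",
              which=shutil.which) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not Path(python_path).expanduser().is_file():
        problems.append(f"interpreter not found: {python_path}")
    hints = {
        "op": "source ~/.zshrc or set OP_SERVICE_ACCOUNT_TOKEN in this shell",
        "docker": "needed for docker ps to check the Langfuse container",
    }
    for tool, hint in hints.items():
        if which(tool) is None:  # executable not found on PATH
            problems.append(f"{tool} not in PATH: {hint}")
    return problems
```

The which parameter is injectable so the check can be exercised without touching the real PATH.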

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
