joelclaw — CLI & Event Bus
The joelclaw CLI is the primary operator interface to the entire joelclaw system: event bus (Inngest), gateway, observability (OTEL), agent loops, subscriptions, and more. Built with @effect/cli, returns HATEOAS JSON envelopes.
If the CLI crashes, that's the highest priority fix.
Binary: ~/.bun/bin/joelclaw
Source: ~/Code/joelhooks/joelclaw/packages/cli/
Build: bun build packages/cli/src/cli.ts --compile --outfile ~/.bun/bin/joelclaw
Architecture
┌─ Colima VM (VZ framework, aarch64) ────────────────────────────────┐
│ Talos v1.12.4 → k8s v1.35.0 (single node, namespace: joelclaw) │
│ │
│ inngest-0 StatefulSet ports 8288 (API), 8289 (dash) │
│ redis-0 StatefulSet port 6379 │
│ typesense-0 StatefulSet port 8108 │
│ system-bus-worker Deployment port 3111 (110+ functions) │
│ docs-api Deployment port 3838 │
│ livekit-server Deployment ports 7880, 7881 │
│ bluesky-pds Deployment port 3000 │
│ │
│ ⚠️ Inngest service named inngest-svc (not inngest) │
│ k8s auto-injects INNGEST_PORT env collision otherwise │
└─────────────────────────────────────────────────────────────────────┘
↕ NodePort on localhost
Gateway daemon (always-on pi session, Redis event bridge)
NAS "three-body" (ASUSTOR, 10GbE NFS, 64TB RAID5 + NVMe cache)
Vault ~/Vault (Obsidian, PARA method — ADRs, system log, contacts)
Inngest event key: 37aa349b89692d657d276a40e0e47a15
k8s manifests: ~/Code/joelhooks/joelclaw/k8s/
CLI Command Reference
Health & Status
joelclaw status # Health: server + worker + k8s pods
joelclaw jobs status # Unified runtime workload snapshot (queue + Restate + Dkron + Inngest)
joelclaw inngest status # Inngest server details
joelclaw functions # List all 110+ registered functions
joelclaw refresh # Force re-register with Inngest server
Send Events
joelclaw send "event/name" --data '{"key":"value"}'
joelclaw send "pipeline/video.download" --data '{"url":"https://youtube.com/watch?v=XXX"}'
joelclaw send "agent/story.start" --data '{"prdPath":"/abs/path/prd.json","storyId":"S-1"}'
View Runs
joelclaw runs # Recent 10
joelclaw runs --count 20 --hours 24 # More runs, wider window
joelclaw runs --status FAILED # Just failures
joelclaw run <RUN_ID> # Step trace + errors for one run
View Events
joelclaw events # Last 4 hours
joelclaw events --prefix memory/ --hours 24
joelclaw events --prefix agent/ --hours 24
joelclaw events --count 50 --hours 48
Logs
joelclaw logs # Worker stdout (default 30 lines)
joelclaw logs errors # Worker stderr (stack traces)
joelclaw logs server # Inngest k8s pod logs
joelclaw logs server -n 50 --grep error # Filtered server errors
joelclaw logs worker --grep "observe" # Grep worker logs
Structured Log Writes
joelclaw log write --action configure --tool cli --detail "updated capability adapter config" --reason "ADR-0169 phase 1"
log writes canonical structured entries (slog backend). logs remains runtime log read/analyze.
Secrets
joelclaw secrets status
joelclaw secrets lease <name> --ttl 15m
joelclaw secrets revoke <lease-id>
joelclaw secrets revoke --all
joelclaw secrets audit --tail 50
joelclaw secrets env --dry-run
Notify
joelclaw notify send "Worker restarted and healthy" --priority normal
joelclaw notify send "Immediate action required" --priority urgent --telegram-only
Capability Adapter Paths (ADR-0169 phase 4)
mail, otel, recall, and subscribe now run through the CLI capability registry/adapter runtime while preserving their existing command UX and JSON envelopes.
Deploy
joelclaw deploy worker # dry-run deploy plan
joelclaw deploy worker --restart --execute # execute worker sync deployment
joelclaw deploy worker --restart --execute --force # force with active runs (disruptive)
Heal
joelclaw heal list
joelclaw heal run RUN_FAILED --phase fix --context '{"run-id":"01ABC"}' # dry-run
joelclaw heal run RUN_FAILED --phase fix --context '{"run-id":"01ABC"}' --execute # execute
Gateway
joelclaw gateway status # Gateway health + session info
joelclaw gateway events # Recent gateway events
joelclaw gateway test # Send test message through gateway
joelclaw gateway restart # Restart gateway daemon
joelclaw gateway stream # Live stream gateway events
Async runtime monitoring in pi
The loaded pi extension at packages/pi-extensions/inngest-monitor/index.ts now exposes runtime_jobs_monitor alongside inngest_send / inngest_runs.
Use it when you want a pi session to keep watch on the ADR-0217 runtime substrate while you do other work:
runtime_jobs_monitor {"action":"start","interval":5,"report":true}— start background polling ofjoelclaw jobs statusruntime_jobs_monitor {"action":"status"}— inspect the latest runtime snapshotruntime_jobs_monitor {"action":"stop"}— stop the poller and send a final follow-up summary
The widget shows runtime health first (queue / Restate / Dkron / Inngest), then any followed Inngest runs underneath. Severity changes and meaningful workload-state changes emit OTEL and hidden follow-up messages for asynchronous report-back, so a healthy-but-now-held backlog does not stay silent.
Observability (OTEL)
joelclaw otel list --hours 1 # Recent telemetry events
joelclaw otel search "error" --hours 24 # Search OTEL events
joelclaw otel stats --hours 24 # Aggregate stats
joelclaw otel emit "action.name" --source codex --component agent-loop --success # Emit event
System Knowledge (ADR-0199)
joelclaw knowledge search "query" # Search system knowledge
joelclaw knowledge search "query" --type adr # Filter by type (adr|skill|lesson|pattern|retro|failed_target)
joelclaw knowledge sync # Re-index ADRs + skills from filesystem
joelclaw knowledge clear-failed <target> # Clear resolved failed targets
Brain/codebase patterns (browsable by agents):
ls ~/Vault/system/brain/codebase/ # List established patterns
cat ~/Vault/system/brain/codebase/<name>.md # Read a specific pattern
Subscriptions (ADR-0127)
joelclaw subscribe list # All feed subscriptions
joelclaw subscribe add <url> [--name N] # Add a feed
joelclaw subscribe remove <url> # Remove a feed
joelclaw subscribe check [--url URL] # Check feeds for new items
joelclaw subscribe summary # Summary of recent items
Agent Runtime Validation (ADR-0180)
Use this exact smoke test when validating roster dispatch end-to-end:
joelclaw agent list
joelclaw agent run coder "reply with OK" --timeout 20
joelclaw event <event-id>
Expected signal:
agent listincludes builtincoder,designer,opseventshows oneAgent Task Runwithstatus: COMPLETED- run output contains
{"status":"completed", ...}
Failure handling:
Unknown agent roster entry: codermeans runtime drift, not prompt failure.- Deploy latest
system-bus-worker - Restart host worker process
- Re-run the same 3-step smoke
- Deploy latest
- If Inngest API is unreachable (
localhost:8288), recover local control-plane first (Colima/Talos), then retry validation.
Semantic Recall
joelclaw recall "query about past context" # Search semantic memory
Discovery
joelclaw discover "https://example.com" --context "why this is interesting"
Agent Loops
# Start a loop
joelclaw loop start --project ~/Code/joelhooks/joelclaw \
--goal "Implement feature X" \
--context ~/Vault/docs/decisions/0XXX.md \
--max-retries 2
# Start with existing PRD
joelclaw loop start --project PATH --prd prd.json --max-retries 2
# Monitor
joelclaw loop status <LOOP_ID>
joelclaw loop status <LOOP_ID> -c # Compact: one line per story
joelclaw loop status <LOOP_ID> -v # Verbose: criteria, output paths
joelclaw watch <LOOP_ID> # Live: polls 15s, exits on completion
joelclaw watch # Auto-detects active loop
# Management
joelclaw loop list # All loops in Redis
joelclaw loop cancel <LOOP_ID> # Stop + cleanup
joelclaw loop nuke dead # Remove completed loops from Redis
Other Commands
joelclaw sleep [on|off|status] # Sleep mode for gateway
joelclaw note <text> # Quick note to Vault
joelclaw vault read <ref> # Resolve/read ADR/project/path refs
joelclaw vault search <query> # Search vault markdown
joelclaw vault adr list # ADR inventory (optionally by status)
joelclaw vault adr collisions # ADR number collision report
joelclaw vault adr audit # ADR health + collision + index checks
joelclaw vault adr rank # ADR NRC+novelty ranking
joelclaw skills audit [--deep] # On-demand skill garden report
joelclaw search <query> # Full-text search
joelclaw email [scan|triage] # Email operations
joelclaw x [post|mentions] # X/Twitter operations
joelclaw nas [status|health] # NAS operations
joelclaw diagnose <topic> # System diagnosis
joelclaw langfuse [traces|costs] # Langfuse analytics
joelclaw deploy worker [--restart] [--execute]
joelclaw heal [list|run]
joelclaw inngest sync-worker [--restart] # Worker lifecycle
For vault-heavy or ADR-gardening tasks, use the dedicated vault skill.
Output Modes
Most commands support --compact/-c for plain text. Use compact for monitoring.
JSON (default) returns HATEOAS envelopes with next_actions.
Story Pipeline (ADR-0155)
3-stage pipeline: implement → prove → judge. Each story runs through the stages with Inngest durability.
# Fire a single story
joelclaw send agent/story.start -d '{
"prdPath": "/Users/joel/Code/joelhooks/joelclaw/prd.json",
"storyId": "CFP-2"
}'
# ⚠️ ALWAYS use absolute path for prdPath — worker CWD is packages/system-bus/
PRD format (Zod-validated):
{
"name": "Project Name",
"context": {},
"stories": [
{
"id": "STORY-1",
"title": "What to build",
"description": "Details",
"priority": 1,
"acceptance": ["criterion 1", "criterion 2"],
"files": ["path/to/relevant/file.ts"]
}
]
}
Critical:
contextmust be{}or object — NEVER null or string- Every story needs
priority(number) - NEVER set
retries: 0on Inngest functions — breaks restart safety (ADR-0156)
Event Types
Pipelines
| Event | Chain |
|---|---|
pipeline/video.download | → video-download → transcript-process → content-summarize |
pipeline/transcript.process | → transcript-process → content-summarize |
content/summarize | → content-summarize |
content/updated | → content-sync (git commit vault changes) |
docs/ingest | → docs-ingest (PDF/markdown → vector store) |
Memory
| Event | Chain |
|---|---|
memory/session.compaction.pending | → observe-session |
memory/session.ended | → observe-session |
memory/observations.accumulated | → reflect |
memory/observations.reflected | → promote (if proposals pending) |
Agent Loops
| Event | Flow |
|---|---|
agent/story.start | → story-pipeline (implement → prove → judge) |
agent/loop.started | → plan → story pipeline → complete |
agent/loop.story.passed | → plan (next story) |
agent/loop.story.failed | → plan (retry or next) |
agent/loop.completed | → complete (merge-back + cleanup) |
Gateway & Channels
| Event | Purpose |
|---|---|
gateway/message.received | Incoming message from any channel |
gateway/heartbeat | Gateway health ping |
channel/telegram.callback | Telegram callback queries |
Subscriptions & Discovery
| Event | Purpose |
|---|---|
discovery/noted | URL/idea captured → enrichment pipeline |
subscriptions/check | Poll feeds for new items |
System & Scheduled
| Event | Purpose |
|---|---|
system/log | System log entry |
system/health.check | Scheduled health monitoring |
cron/daily-digest | Morning digest generation |
cron/check-email | Periodic email scan |
cron/check-calendar | Calendar check |
cron/nightly-maintenance | Typesense + system maintenance |
Notifications
| Event | Source |
|---|---|
webhook/github | GitHub webhook events |
webhook/vercel | Vercel deploy events |
webhook/todoist | Todoist webhook events |
webhook/front | Front webhook events |
Debugging Failed Runs
joelclaw runs --status FAILED # 1. Find the failure
joelclaw run <RUN_ID> # 2. Step trace + inline errors
joelclaw logs errors # 3. Worker stderr
joelclaw logs server --grep error # 4. Inngest server errors
joelclaw otel search "error" --hours 1 # 5. OTEL telemetry
Common Failure Patterns
| Symptom | Cause | Fix |
|---|---|---|
| Events accepted but functions never run | Inngest can't reach worker | joelclaw refresh, check worker pod |
| "Unable to reach SDK URL" | Worker unreachable from cluster | Restart worker, joelclaw refresh |
| Loop story SKIPPED | Tests/typecheck failed in worktree | Check attempt output |
| Run stuck in RUNNING | Worker crashed mid-step | joelclaw logs errors, restart worker |
INNGEST_PORT env collision | k8s service named inngest | Service is inngest-svc — keep this |
| Implement step killed on deploy | Worker restart killed in-flight step | ADR-0156: retries: 2 survives this |
Stale RUNNING forensics (SDK unreachable ghosts)
When joelclaw runs --status RUNNING shows old health jobs that never clear:
- Use the operator command first
- Preview:
joelclaw inngest sweep-stale-runs - Apply (backup + transaction):
joelclaw inngest sweep-stale-runs --apply
- Preview:
- Validate the symptom class
joelclaw run <run-id>- Look for trace/finalization errors containing
Unable to reach SDK URLorEOF writing request to SDK.
- Treat list vs detail disagreements as a known mask issue
runslist can show stale metadata.rundetail + trace/history is the source of truth.
- Raw runtime DB edits are last resort only
- Inngest state is in k8s StatefulSet PVC:
inngest-0:/data/main.db. - Backup first:
kubectl -n joelclaw exec inngest-0 -- sqlite3 /data/main.db '.backup /data/main.db.pre-sweep-<ts>.sqlite'.
- Inngest state is in k8s StatefulSet PVC:
- Terminalize stale runs with full contract, not partial edits
- Insert missing
history.type='FunctionCancelled'for stale runs. - Ensure
function_finishesrow exists. - Then set
trace_runs.status=500(cancelled) for stale candidates.
- Insert missing
- Verify after mutation
joelclaw run <run-id>should resolve terminal state.joelclaw runs --status RUNNINGshould only show genuinely active runs.
Never mutate main.db without a point-in-time backup.
Deploying Worker Changes
Use the publish script — it handles build, push, k8s apply, and rollout:
~/Code/joelhooks/joelclaw/k8s/publish-system-bus-worker.sh
See the sync-system-bus skill for the full deploy workflow.
Key Paths
| What | Path |
|---|---|
| CLI source | packages/cli/src/ |
| CLI commands | packages/cli/src/commands/ |
| CLI binary | ~/.bun/bin/joelclaw |
| Worker source | packages/system-bus/ |
| Inngest functions | packages/system-bus/src/inngest/functions/ |
| Function index (host) | packages/system-bus/src/inngest/functions/index.host.ts |
| Function index (cluster) | packages/system-bus/src/inngest/functions/index.cluster.ts |
| Inference utility | packages/system-bus/src/lib/inference.ts |
| Gateway source | packages/gateway/ |
| k8s manifests | k8s/ |
| Deploy script | k8s/publish-system-bus-worker.sh |
| ADRs | ~/Vault/docs/decisions/ |
| System log | ~/Vault/system/system-log.jsonl |
| Loop attempt output | /tmp/agent-loop/{loopId}/{storyId}-{attempt}.out |
Building the CLI
cd ~/Code/joelhooks/joelclaw
bun build packages/cli/src/cli.ts --compile --outfile ~/.bun/bin/joelclaw
Test after every change:
joelclaw status
joelclaw send --help
joelclaw runs --count 1
CLI commands are in packages/cli/src/commands/, one file per command. Follow the cli-design skill. Heavy deps must be lazy-loaded — top-level import crashes are unacceptable.