joelclaw System Health Check

Run scripts/health.sh for a full system health report with a 1-10 score.

~/Code/joelhooks/joelclaw/skills/joelclaw-system-check/scripts/health.sh
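Most checks come down to running a command and reading its exit code. Tests and tsc have no yellow band, so pass/fail is all they need. Here is a minimal sketch of such a wrapper. It assumes nothing about how health.sh actually structures its checks, and `run_check` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one check command and report green or red
# from its exit code. The command's own output is discarded.
run_check() {
  local name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf '%s\tgreen\n' "$name"
  else
    printf '%s\tred\n' "$name"
  fi
}

# Example (run from packages/system-bus):
#   run_check tests bun test
#   run_check tsc tsc --noEmit
```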

What It Checks (16 components)

Check	What	Green (10)	Yellow (5-7)	Red (1-3)
k8s cluster	pods in `joelclaw` namespace	4/4 Running, 0 restarts	partial pods	no pods
pds	AT Proto PDS on :2583	version + collections	pod running, port-forward down	pod not running
worker	system-bus on :3111	16+ functions	responding, low count	down
inngest server	:8288 reachable	responding	—	down
redis/gateway	Redis + gateway session queues	connected, low pending queue	connected, backlog rising	unavailable
typesense/otel	Typesense health + OTEL query path	healthy + queryable	healthy, query degraded	unavailable
tests	`bun test` in system-bus	0 fail	—	failures
tsc	`tsc --noEmit`	clean	—	type errors
repo sync	monorepo HEAD vs `origin/main`	in sync	ahead/behind	repo unavailable
memory pipeline	`joelclaw inngest memory-health`	healthy checks	degraded checks	failing checks
pi-tools	extension deps installed	all 3 deps	—	missing
git config	user.name + email set	set	—	missing
active loops	`joelclaw loop list`	queryable	query degraded	unavailable
gogcli	Google Workspace auth	account authed, token valid	token stored, no password	not configured
disk	free space + loop tmp	<80% used	—	>80%
stale tests	`__tests__/` + acceptance tests	clean	—	present
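The table gives per-check bands but does not say how they combine into the single 1-10 score. One plausible aggregation is to score each check at the midpoint of its band (10, 6, 2) and take the rounded average. Treat this as an assumption, not a description of what health.sh does. `status_points` and `overall_score` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Assumption: each band is scored at its midpoint; health.sh may weight checks differently.
status_points() {
  case $1 in
    green)  echo 10 ;;
    yellow) echo 6 ;;
    red)    echo 2 ;;
    *)      echo 1 ;;  # unknown status: treat as the worst score
  esac
}

# overall_score STATUS... -> average of the points, rounded to the nearest integer
overall_score() {
  local total=0 count=0 s
  for s in "$@"; do
    total=$(( total + $(status_points "$s") ))
    count=$(( count + 1 ))
  done
  if [ "$count" -eq 0 ]; then
    echo "no checks given" >&2
    return 1
  fi
  echo $(( (total + count / 2) / count ))
}

# Example: overall_score green green yellow red
```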

When to Run

Session start — orient on system state before doing work
After loops complete — verify nothing broke
After infra changes — k8s, worker, Redis config
When something feels off — quick triage
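After infra changes, a k8s-only check can be quicker than the full report. The sketch below reads `kubectl get pods -n joelclaw --no-headers` on stdin and applies the thresholds from the k8s row. It assumes kubectl's default column order (NAME READY STATUS RESTARTS AGE) and four expected pods. `classify_pods` is a hypothetical name. It only looks at STATUS, so a Running pod whose containers are not ready still counts as running.

```shell
#!/usr/bin/env bash
# Reads `kubectl get pods --no-headers` output on stdin and prints green/yellow/red.
# Assumes default columns: NAME READY STATUS RESTARTS AGE.
# EXPECTED (first argument) defaults to 4, matching "4/4 Running" in the table.
classify_pods() {
  awk -v expected="${1:-4}" '
    NF == 0 { next }                          # skip blank lines
    $3 == "Running" { running++ }
    $4 + 0 > 0 { restarts += $4 }             # "2 (3h ago)" still yields 2 in $4
    END {
      if (running == 0) print "red"
      else if (running >= expected && restarts == 0) print "green"
      else print "yellow"
    }'
}

# Usage: kubectl get pods -n joelclaw --no-headers | classify_pods 4
```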

Fixing Common Issues

Repo drift: cd ~/Code/joelhooks/joelclaw && git fetch origin && git status -sb
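To see how far HEAD has drifted, `git rev-list --left-right --count HEAD...origin/main` prints the ahead and behind counts separated by a tab. The counts are computed against the local origin/main ref, so fetch first. Below is a small wrapper that maps the counts onto the repo sync row. The function name and default path are assumptions.

```shell
#!/usr/bin/env bash
# Prints "in sync", "ahead N, behind M", or "repo unavailable".
# Compares against the local origin/main ref, so run `git fetch origin` first.
repo_sync() {
  local repo=${1:-$HOME/Code/joelhooks/joelclaw}
  local counts ahead behind
  if ! counts=$(git -C "$repo" rev-list --left-right --count HEAD...origin/main 2>/dev/null); then
    echo "repo unavailable"
    return 1
  fi
  ahead=$(printf '%s\n' "$counts" | cut -f1)
  behind=$(printf '%s\n' "$counts" | cut -f2)
  if [ "$ahead" -eq 0 ] && [ "$behind" -eq 0 ]; then
    echo "in sync"
  else
    echo "ahead $ahead, behind $behind"
  fi
}
```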

pi-tools broken: cd ~/.pi/agent/git/github.com/joelhooks/pi-tools && bun add @sinclair/typebox @mariozechner/pi-coding-agent @mariozechner/pi-tui @mariozechner/pi-ai
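Before reinstalling, you can list which of the packages named in the `bun add` command are actually missing. This assumes bun places each package under `node_modules/<name>` in the pi-tools directory. `missing_deps` is a hypothetical helper. It prints the missing names and returns non-zero if any are absent.

```shell
#!/usr/bin/env bash
# List pi-tools dependencies that are not present under node_modules.
# Assumes bun installs each package into <dir>/node_modules/<package-name>.
missing_deps() {
  local dir=${1:-$HOME/.pi/agent/git/github.com/joelhooks/pi-tools}
  local pkg missing=0
  for pkg in @sinclair/typebox @mariozechner/pi-coding-agent @mariozechner/pi-tui @mariozechner/pi-ai; do
    if [ ! -d "$dir/node_modules/$pkg" ]; then
      echo "$pkg"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: missing_deps || echo "run the bun add fix"
```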

PDS unreachable: kubectl port-forward -n joelclaw svc/bluesky-pds 2583:3000 & (if the pod itself is down: kubectl rollout restart deployment/bluesky-pds -n joelclaw)
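`kubectl port-forward` takes a moment to bind, so a check run immediately after backgrounding it can fail spuriously. The polling sketch below assumes the PDS answers on the stock Bluesky health route `/xrpc/_health`. `wait_for_url` and its retry defaults are hypothetical.

```shell
#!/usr/bin/env bash
# Poll URL until `curl -f` succeeds (no connection error, no HTTP error status),
# trying up to TRIES times, DELAY seconds apart.
wait_for_url() {
  local url=$1 tries=${2:-10} delay=${3:-1} i=1
  while [ "$i" -le "$tries" ]; do
    if curl -fsS --max-time 2 "$url"; then
      echo                        # end the line after the response body
      return 0
    fi
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
    fi
    i=$(( i + 1 ))
  done
  echo "no healthy response from $url after $tries attempts" >&2
  return 1
}

# Usage:
#   kubectl port-forward -n joelclaw svc/bluesky-pds 2583:3000 &
#   wait_for_url http://localhost:2583/xrpc/_health 10 1
```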

Worker down: joelclaw inngest restart-worker --register

Stale tests: rm -rf ~/Code/joelhooks/joelclaw/packages/system-bus/__tests__/ && find ~/Code/joelhooks/joelclaw/packages/system-bus/src -name "*.acceptance.test.ts" -delete
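This fix deletes files, so a dry run that lists the same targets first is cheap insurance. The sketch mirrors the paths in the command above. `list_stale_tests` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Dry run: print what the stale-test cleanup would delete, without deleting it.
list_stale_tests() {
  local pkg=${1:-$HOME/Code/joelhooks/joelclaw/packages/system-bus}
  if [ -d "$pkg/__tests__" ]; then
    echo "$pkg/__tests__/"
  fi
  if [ -d "$pkg/src" ]; then
    find "$pkg/src" -name '*.acceptance.test.ts' | sort
  fi
}

# Usage: list_stale_tests
```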

Loop tmp bloat: rm -rf /tmp/agent-loop/loop-*/ (only when no loops are running)
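The disk row treats under 80% used as green. The two hypothetical helpers below report the used percentage of a filesystem and the size of each loop temp directory, so you can see whether cleanup is worth it. They do not detect running loops, so confirm with `joelclaw loop list` before deleting anything.

```shell
#!/usr/bin/env bash
# Used-space percentage (integer) of the filesystem holding PATH.
# `df -P` keeps POSIX output: one line per filesystem, capacity in column 5.
disk_used_pct() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Size in KiB of each loop temp dir under ROOT, largest first.
loop_tmp_usage() {
  local root=${1:-/tmp/agent-loop} d
  for d in "$root"/loop-*/; do
    if [ -d "$d" ]; then
      du -sk "$d"
    fi
  done | sort -rn
}

# Usage:
#   [ "$(disk_used_pct /tmp)" -lt 80 ] && echo "disk green"
#   loop_tmp_usage
```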

Inngest Hung-Run Quick Triage

When a run appears stuck after its first step:

joelclaw run <run-id>

If the trace shows a Finalization failure with "Unable to reach SDK URL":

Verify registration/health: joelclaw inngest status
Verify function is present where expected: joelclaw functions | rg -i "manifest-archive|<function-name>"
Check for stale app registrations in the Inngest UI/API and remove any stale SDK URLs.
Don't assume the cause is only network: the handler itself may be blocking. Review recent step code for filesystem, Redis, or subprocess calls that block before the step returns its response.
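The triage steps above can be chained into one command. This is a hypothetical sketch. It assumes `joelclaw run` prints the trace as plain text, and it uses `grep -Ei` in place of `rg` for portability. The stale-registration cleanup and the code review are left as manual follow-ups.

```shell
#!/usr/bin/env bash
# Hypothetical triage wrapper. Assumes `joelclaw run` prints the trace as text.
triage_run() {
  local run_id=$1 fn=${2:-manifest-archive}
  if ! joelclaw run "$run_id" | grep -q "Unable to reach SDK URL"; then
    echo "no SDK URL finalization failure in trace for $run_id"
    return 0
  fi
  echo "== registration/health"
  joelclaw inngest status
  echo "== is $fn registered?"
  joelclaw functions | grep -Ei "$fn" || echo "$fn not found in function list"
  echo "next: remove stale SDK URLs in the Inngest UI/API, then review recent step code for blocking calls"
}

# Usage: triage_run <run-id> <function-name>
```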
