joelclaw-system-check

Run a comprehensive health check of the joelclaw system — k8s cluster, worker, Inngest, Redis, Typesense/OTEL, tests, TypeScript, repo sync, memory pipeline, pi-tools, git config, active loops, disk, stale tests. Outputs a 1-10 score with per-component breakdown. Use when: 'system health', 'health check', 'is everything working', 'system status', 'how's the system', 'check everything', or at session start to orient.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "joelclaw-system-check" with this command: npx skills add joelhooks/joelclaw/joelhooks-joelclaw-joelclaw-system-check

joelclaw System Health Check

Run scripts/health.sh for a full system health report with a 1-10 score:

~/Code/joelhooks/joelclaw/skills/joelclaw-system-check/scripts/health.sh

What It Checks (16 components)

| Check | What | Green (10) | Yellow (5-7) | Red (1-3) |
| --- | --- | --- | --- | --- |
| k8s cluster | pods in joelclaw namespace | 4/4 Running, 0 restarts | partial pods | no pods |
| pds | AT Proto PDS on :2583 | version + collections | pod running, port-forward down | pod not running |
| worker | system-bus on :3111 | 16+ functions | responding, low count | down |
| inngest server | :8288 reachable | responding | | down |
| redis/gateway | Redis + gateway session queues | connected, low pending queue | connected, backlog rising | unavailable |
| typesense/otel | Typesense health + OTEL query path | healthy + queryable | healthy, query degraded | unavailable |
| tests | bun test in system-bus | 0 fail | | failures |
| tsc | tsc --noEmit | clean | | type errors |
| repo sync | monorepo HEAD vs origin/main | in sync | ahead/behind | repo unavailable |
| memory pipeline | joelclaw inngest memory-health | healthy checks | degraded checks | failing checks |
| pi-tools | extension deps installed | all 3 deps | | missing |
| git config | user.name + email set | set | | missing |
| active loops | joelclaw loop list | queryable | query degraded | unavailable |
| gogcli | Google Workspace auth | account authed, token valid | token stored, no password | not configured |
| disk | free space + loop tmp | <80% used | | >80% |
| stale tests | __tests__/ + acceptance tests | clean | | present |

Rows with only two states leave the Yellow column empty here; the source does not show which non-green column those states belong to.
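The disk row is the one check in the table with a concrete numeric threshold, so it is easy to illustrate. The sketch below is not taken from health.sh: the function names `classify_disk_pct` and `disk_pct_for` and the labels `ok`/`over` are hypothetical, and only the 80% cut-off comes from the table. The table does not say how exactly 80% is treated, so this sketch assumes it counts as over.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a disk-usage percentage to a status label.
# Only the 80% threshold comes from the table; names and labels are illustrative.
classify_disk_pct() {
  local pct="${1%\%}"          # accept "75" or "75%"
  case "$pct" in
    ''|*[!0-9]*) echo "unknown"; return 1 ;;   # reject empty or non-numeric input
  esac
  if [ "$pct" -lt 80 ]; then
    echo "ok"
  else
    echo "over"                # assumption: exactly 80 counts as over
  fi
}

# Hypothetical helper: usage percentage of the filesystem holding a path.
# Relies on POSIX `df -P`, where column 5 of the second line is the capacity.
disk_pct_for() {
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}
```

A possible invocation would be `classify_disk_pct "$(disk_pct_for /tmp)"`, which checks the filesystem that holds the loop temp directories.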

When to Run

  • Session start — orient on system state before doing work
  • After loops complete — verify nothing broke
  • After infra changes — k8s, worker, Redis config
  • When something feels off — quick triage

Fixing Common Issues

Repo drift: cd ~/Code/joelhooks/joelclaw && git fetch origin && git status -sb
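To put numbers on the drift after fetching, the ahead/behind counts can be read with `git rev-list --left-right --count`. This is a minimal sketch, not the logic health.sh uses: the function names `sync_state` and `repo_sync_state` are hypothetical, and the output wording only mirrors the table's "in sync", "ahead/behind", and "repo unavailable" states.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "<ahead> <behind>" commit counts into a short label.
sync_state() {
  local ahead="$1" behind="$2"
  if [ "$ahead" -eq 0 ] && [ "$behind" -eq 0 ]; then
    echo "in sync"
  elif [ "$behind" -eq 0 ]; then
    echo "ahead by $ahead"
  elif [ "$ahead" -eq 0 ]; then
    echo "behind by $behind"
  else
    echo "diverged: $ahead ahead, $behind behind"
  fi
}

# Hypothetical helper: compare HEAD with origin/main in the given repository.
# With --left-right --count, git prints the left (HEAD-only) count first,
# then the right (origin/main-only) count, separated by a tab.
repo_sync_state() {
  local repo="$1" counts
  counts=$(git -C "$repo" rev-list --left-right --count HEAD...origin/main) || {
    echo "repo unavailable"
    return 1
  }
  # Intentionally unquoted so the two counts split into separate arguments.
  sync_state $counts
}
```

For example, `repo_sync_state ~/Code/joelhooks/joelclaw` run after `git fetch origin` would report the local state against the freshly fetched origin/main.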

pi-tools broken: cd ~/.pi/agent/git/github.com/joelhooks/pi-tools && bun add @sinclair/typebox @mariozechner/pi-coding-agent @mariozechner/pi-tui @mariozechner/pi-ai

PDS unreachable: kubectl port-forward -n joelclaw svc/bluesky-pds 2583:3000 & (or, if the pod is down: kubectl rollout restart deployment/bluesky-pds -n joelclaw)

Worker down: joelclaw inngest restart-worker --register

Stale tests: rm -rf ~/Code/joelhooks/joelclaw/packages/system-bus/__tests__/ && find ~/Code/joelhooks/joelclaw/packages/system-bus/src -name "*.acceptance.test.ts" -delete

Loop tmp bloat: rm -rf /tmp/agent-loop/loop-*/ (only when no loops are running)
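Because the cleanup above is destructive, a dry run that lists what would be deleted can be a useful first step. The following is a sketch under stated assumptions: the function `clean_loop_tmp` and the `LOOP_TMP_ROOT` override are hypothetical and not part of joelclaw. It does not check whether loops are running, so that precondition still has to be verified separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list loop temp directories; delete them only with --force.
# LOOP_TMP_ROOT is an illustrative override, defaulting to the path named above.
clean_loop_tmp() {
  local root="${LOOP_TMP_ROOT:-/tmp/agent-loop}" force="${1:-}"
  local dir found=0
  for dir in "$root"/loop-*/; do
    [ -d "$dir" ] || continue      # the glob matched nothing
    found=1
    if [ "$force" = "--force" ]; then
      rm -rf -- "$dir"
      echo "removed $dir"
    else
      echo "would remove $dir"     # dry run: report only
    fi
  done
  [ "$found" -eq 1 ] || echo "nothing to clean under $root"
}
```

Calling `clean_loop_tmp` with no arguments only prints the candidates; `clean_loop_tmp --force` removes them.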

Inngest Hung-Run Quick Triage

When a run appears stuck after its first step, inspect its trace:

joelclaw run <run-id>

If the trace shows a Finalization failure with "Unable to reach SDK URL":

  1. Verify registration/health: joelclaw inngest status

  2. Verify function is present where expected: joelclaw functions | rg -i "manifest-archive|<function-name>"

  3. Check for stale app registrations in Inngest UI/API and remove stale SDK URLs.

  4. Treat handler blocking as a possible cause, not just the network: review recent step code for filesystem, Redis, or subprocess calls that block before the step responds.
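As a quick way to separate a network problem from a blocked handler, the worker's address can be probed directly with a short timeout. This is a minimal sketch: the function `probe_url` is hypothetical, and the exact SDK URL path is not given in this document, so the example below only uses the worker port from the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether an HTTP endpoint answers within a timeout.
# curl without --fail exits 0 for any HTTP response, including 4xx and 5xx,
# which is what a plain reachability check needs.
probe_url() {
  local url="$1" timeout="${2:-3}"   # timeout in seconds, default 3
  if curl --silent --output /dev/null --max-time "$timeout" "$url"; then
    echo "reachable"
  else
    echo "unreachable"
    return 1
  fi
}
```

For instance, `probe_url http://localhost:3111/` checks the worker port. A "reachable" result while runs still hang points toward step 4, since a handler can block even when the endpoint itself answers.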
