# Pueue Job Orchestration

Universal CLI telemetry and job-management layer: every command routed through Pueue gets precise timing, exit-code capture, full stdout/stderr logs, environment snapshots, and a callback on completion.
## Overview

Pueue is a Rust CLI tool for managing shell command queues. It provides:

- **Daemon persistence** - survives SSH disconnects, crashes, and reboots
- **Disk-backed queue** - auto-resumes after any failure
- **Group-based parallelism** - control concurrent jobs per group
- **Easy failure recovery** - restart failed jobs with one command
- **Full telemetry** - timing, exit codes, stdout/stderr logs, and env snapshots per task
## When to Route Through Pueue

| Operation | Route through Pueue? | Why |
| --- | --- | --- |
| Any command >30 seconds | Always | Telemetry, persistence, log capture |
| Batch operations (>3 items) | Always | Parallelism control, failure isolation |
| Build/test pipelines | Recommended | `--after` DAGs, group monitoring |
| Data processing | Always | Checkpoint resume, state management |
| Quick one-off commands (<5s) | Optional | Overhead is ~100 ms, but you get logs |
| Interactive commands (editors, REPLs) | Never | Pueue cannot handle stdin interaction |
## When to Use This Skill

Use this skill when the user mentions:

| Trigger | Example |
| --- | --- |
| Running tasks on BigBlack/LittleBlack | "Run this on bigblack" |
| Long-running data processing | "Populate the cache for all symbols" |
| Batch/parallel operations | "Process these 70 jobs" |
| SSH remote execution | "Execute this overnight on the GPU server" |
| Cache population | "Fill the ClickHouse cache" |
| Pueue features | "Set up a callback", "delay this job" |
## Quick Reference

### Check Status

```bash
# Local
pueue status

# Remote (BigBlack)
ssh bigblack "~/.local/bin/pueue status"
```

### Queue a Job

```bash
# Local (with working directory)
pueue add -w ~/project -- python long_running_script.py

# Local (simple)
pueue add -- python long_running_script.py

# Remote (BigBlack)
ssh bigblack "~/.local/bin/pueue add -w ~/project -- uv run python script.py"

# With group (for parallelism control)
pueue add --group p1 --label "BTCUSDT@1000" -w ~/project -- python populate.py --symbol BTCUSDT
```
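The group/label pattern above extends naturally to batch submission. A minimal sketch (the symbol list, threshold, and `populate.py` flags are illustrative, not part of any real script; `PUEUE` can be pointed at `echo` to dry-run without a daemon):

```shell
# Queue one pueue task per symbol in group p1, each with a descriptive label.
# Override PUEUE (e.g. PUEUE=echo) to dry-run the generated commands.
submit_batch() {
  local pueue="${PUEUE:-pueue}"
  local threshold=1000
  local sym
  for sym in "$@"; do
    "$pueue" add --group p1 --label "${sym}@${threshold}" -w ~/project \
      -- python populate.py --symbol "$sym" --threshold "$threshold"
  done
}

# Example: submit_batch BTCUSDT ETHUSDT SOLUSDT
```

Keeping the label as `SYMBOL@THRESHOLD` makes `pueue status` output grep-able per symbol later.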
### Monitor Jobs

```bash
pueue follow <id>       # Watch job output in real time
pueue log <id>          # View completed job output
pueue log <id> --full   # Full output (not truncated)
```
### Manage Jobs

```bash
pueue restart <id>            # ⚠ Creates a NEW task (see warning below)
pueue restart --in-place <id> # Restarts the task in place (no new ID)
pueue restart --all-failed    # ⚠ Restarts ALL failed tasks across ALL groups
pueue kill <id>               # Kill a running job
pueue clean                   # Remove completed jobs from the list
pueue reset                   # Clear all jobs (use with caution)
```
**CRITICAL WARNING — `pueue restart` semantics:**

`pueue restart <id>` does NOT restart the task in place. It creates a brand-new task with a new ID, copying the command from the original; the original stays Done/Failed. When used in loops or by autonomous agents, this causes exponential task growth: in a 2026-03-04 incident, agents calling `pueue restart` on failed tasks grew 60 jobs to ~12,800.

- **Use `--in-place`** if you truly need to restart: `pueue restart --in-place <id>`
- **Verify before restarting:** read `pueue log <id>` to check whether the failure is persistent (missing data, bad args) -- retrying those will never help
- **Never use `--all-failed` without a `--group` filter** -- it restarts every failed task across ALL groups
## Host Configuration

| Host | Location | Parallelism Groups |
| --- | --- | --- |
| BigBlack | `~/.local/bin/pueue` | `p1` (16), `p2` (2), `p3` (3), `p4` (1) |
| LittleBlack | `~/.local/bin/pueue` | `default` (2) |
| Local (macOS) | `/opt/homebrew/bin/pueue` | `default` |
## Core Workflows

### 1. Queue Single Remote Job

```bash
# Step 1: Verify the daemon is running
ssh bigblack "~/.local/bin/pueue status"

# Step 2: Queue the job (use -w for the working directory; a bare
# `-- cd ~/project && ...` would let the && escape pueue and run the
# second command directly on the remote shell)
ssh bigblack "~/.local/bin/pueue add --label 'my-job' -w ~/project -- uv run python script.py"

# Step 3: Monitor progress
ssh bigblack "~/.local/bin/pueue follow <id>"
```
### 2. Batch Job Submission (Multiple Symbols)

For rangebar cache population or similar batch operations, use the `pueue-populate.sh` script:

```bash
ssh bigblack "cd ~/rangebar-py && ./scripts/pueue-populate.sh setup"   # One-time
ssh bigblack "cd ~/rangebar-py && ./scripts/pueue-populate.sh phase1"  # Queue Phase 1
ssh bigblack "cd ~/rangebar-py && ./scripts/pueue-populate.sh status"  # Check progress
```
### 3. Configure Parallelism Groups

```bash
# Create groups with different parallelism limits
pueue group add fast          # Create 'fast' group
pueue parallel 4 --group fast # Allow 4 parallel jobs

pueue group add slow
pueue parallel 1 --group slow # Sequential execution

# Queue jobs to specific groups
pueue add --group fast -- echo "fast job"
pueue add --group slow -- echo "slow job"
```
### 4. Handle Failed Jobs

```bash
# Check what failed
pueue status | grep Failed

# View error output FIRST (distinguish transient vs persistent failures)
pueue log <id>

# Restart a specific job IN PLACE (no new task created)
pueue restart --in-place <id>

# ⚠ NEVER blindly restart all failed jobs -- classify failures first
pueue restart --all-failed   # DANGEROUS: no group filter, creates duplicate tasks
```

**Failure classification before restart:** exit code 1 (application error) is often persistent (missing data, bad args: it will never succeed on retry). Exit code 137 (OOM kill) or 143 (SIGTERM) may be transient. Always read `pueue log <id>` before restarting.
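That triage can be scripted against `pueue status --json`. A minimal sketch, assuming the pueue v3 JSON shape (tasks keyed by ID string, failed tasks reporting their exit code under `status.Done.result.Failed`) -- verify the field names against your installed version:

```python
import json

# Exit codes treated as transient (worth an in-place retry); anything else
# is assumed persistent and needs a human to read `pueue log <id>` first.
TRANSIENT_EXIT_CODES = {137, 143}  # 137 = OOM kill, 143 = SIGTERM

def classify_failed(status_json: str) -> dict[str, list[str]]:
    """Split failed task IDs into 'transient' and 'persistent' buckets."""
    tasks = json.loads(status_json)["tasks"]
    buckets: dict[str, list[str]] = {"transient": [], "persistent": []}
    for task_id, task in tasks.items():
        done = task.get("status", {}).get("Done")
        if not isinstance(done, dict):
            continue  # queued/running tasks are not candidates
        result = done.get("result")
        # Successful tasks report result == "Success" (a string), so only
        # dict-shaped results with a "Failed" key are failures.
        if isinstance(result, dict) and "Failed" in result:
            kind = "transient" if result["Failed"] in TRANSIENT_EXIT_CODES else "persistent"
            buckets[kind].append(task_id)
    return buckets

# Illustrative sample of the assumed JSON shape:
sample = json.dumps({"tasks": {
    "1": {"status": {"Done": {"result": "Success"}}},
    "2": {"status": {"Done": {"result": {"Failed": 1}}}},
    "3": {"status": {"Done": {"result": {"Failed": 137}}}},
}})

print(classify_failed(sample))  # -> {'transient': ['3'], 'persistent': ['2']}
```

Only IDs in the transient bucket are candidates for `pueue restart --in-place`; persistent ones need their logs read first.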
## Troubleshooting

| Issue | Cause | Solution |
| --- | --- | --- |
| `pueue: command not found` | Not in `PATH` | Use the full path: `~/.local/bin/pueue` |
| Connection refused | Daemon not running | Start it with `pueued -d` |
| Jobs stuck in `Queued` | Group paused or at its parallelism limit | Check `pueue status`, then `pueue start` |
| SSH disconnect kills jobs | Not using Pueue | Queue via Pueue instead of running over direct SSH |
| Job fails immediately | Wrong working directory | Use `pueue add -w /path` or `cd /path && pueue add` |
## Priority Scheduling (`--priority`)

A higher priority number runs first when a queue slot opens:

```bash
# Urgent validation (runs before queued lower-priority jobs)
pueue add --priority 10 -- python validate_critical.py

# Normal compute (default priority is 0)
pueue add -- python train_model.py

# Low-priority background task
pueue add --priority -5 -- python cleanup_logs.py
```

Priority only affects queued jobs waiting for an open slot; running jobs are never preempted.
## Per-Task Environment Override (`pueue env`)

Inject or override environment variables on stashed or queued tasks:

```bash
# Create a stashed job
JOB_ID=$(pueue add --stashed --print-task-id -- python train.py)

# Set environment variables (NOTE: key and value are separate args, NOT KEY=VALUE)
pueue env set "$JOB_ID" BATCH_SIZE 64
pueue env set "$JOB_ID" LEARNING_RATE 0.001

# Enqueue when ready
pueue enqueue "$JOB_ID"
```

**Syntax:** `pueue env set <id> KEY VALUE` -- the key and value are separate positional arguments.

**Constraint:** only works on stashed or queued tasks. The environment of a running task cannot be modified.

**Relationship to `mise.toml [env]`:** mise `[env]` remains the SSoT for the default environment. Use `pueue env set` only for one-off overrides (e.g. hyperparameter sweeps) without modifying config files.
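The stash-then-override pattern above is what makes small sweeps cheap. A sketch (the `BATCH_SIZE` grid and `train.py` are illustrative; `PUEUE` can be pointed at `echo` to dry-run without a daemon):

```shell
# Queue one stashed train.py task per batch size, override its env, enqueue.
# Override PUEUE (e.g. PUEUE=echo) to dry-run the generated commands.
sweep_batch_sizes() {
  local pueue="${PUEUE:-pueue}"
  local bs job_id
  for bs in "$@"; do
    job_id=$("$pueue" add --stashed --print-task-id -- python train.py)
    "$pueue" env set "$job_id" BATCH_SIZE "$bs"
    "$pueue" enqueue "$job_id"
  done
}

# Example: sweep_batch_sizes 32 64 128
```

Stashing first matters: the env override must land before the task starts, and `pueue env set` only works on stashed/queued tasks.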
## Blocking Wait (`pueue wait`)

Block until tasks complete -- simpler than a polling loop in scripts:

```bash
# Wait for a specific task
pueue wait 42

# Wait for all tasks in a group
pueue wait --group mygroup

# Wait for ALL tasks across all groups
pueue wait --all

# Wait quietly (no progress output)
pueue wait 42 --quiet

# Wait for tasks to reach a specific status
pueue wait --status queued
```
### Script Integration Pattern

Queue -> wait -> process results:

```bash
TASK_ID=$(pueue add --print-task-id -- python etl_pipeline.py)
pueue wait "$TASK_ID" --quiet

# Pass the ID via --arg: pueue's task map is keyed by ID *strings*, so a
# bare .tasks[42] would fail in jq ("Cannot index object with number").
RESULT=$(pueue status --json | jq -r --arg id "$TASK_ID" '.tasks[$id].status.Done.result' 2>/dev/null)
if [ "$RESULT" = "Success" ]; then
  echo "Pipeline succeeded"
  pueue log "$TASK_ID" --full
else
  echo "Pipeline failed"
  pueue log "$TASK_ID" --full >&2
fi
```
## Companion CLI Tools (bigblack)

Four lightweight CLI tools complement Pueue for monitoring and notifications. All are installed on bigblack, at `/usr/local/bin/` (noti, mprocs) or via apt (ntfy, task-spooler).
### noti — Wrap Any Command with Telegram Notification

Best for one-off "notify me when this finishes" workflows. Native Telegram support.

```bash
# Wrap any command — sends a Telegram message when it finishes
noti -g mise run kintsugi:catchup

# With execution time in the message
noti -g -e python long_script.py

# Custom title/message
noti -g -t "Deploy done" -m "bigblack updated" mise run deploy:bigblack
```

**Config:** `~/.config/noti/noti.yaml`, plus env vars `NOTI_TELEGRAM_TOKEN`, `NOTI_TELEGRAM_CHAT_ID`, and `NOTI_DEFAULT=telegram` in `~/.bashrc`.
### ntfy — Push Notifications from Scripts/Callbacks

Best for programmatic notifications from scripts, pueue callbacks, or curl one-liners.

```bash
# One-liner from any script
ntfy pub --title "Backfill done" odb-alerts "BTCUSDT@500 complete"

# With priority and tags
ntfy pub --priority high --tags "warning" odb-alerts "Job failed"

# curl variant (works anywhere, no binary needed)
curl -d "Backup finished" ntfy.sh/my-topic

# Subscribe to a topic (poll mode)
ntfy sub --poll odb-alerts
```

**Pueue callback integration** — add to `~/.config/pueue/pueue.yml`:

```yaml
callback: 'ntfy pub --title "pueue: {{label}}" --tags "{{status}}" odb-alerts "Task #{{id}} {{status}} ({{command}})"'
```
### mprocs — Multi-Process TUI

Best for interactive SSH sessions watching multiple processes side by side.

```bash
# Watch multiple pueue jobs in split panes
mprocs "pueue follow 3" "pueue follow 7" "pueue follow 8"

# Monitor services
mprocs "journalctl -fu opendeviationbar-sidecar" "journalctl -fu opendeviationbar-kintsugi"
```

Requires an interactive TTY (SSH in directly; don't invoke it from scripts).
### task-spooler (tsp) — Simplest Job Queue

Best for quick ad-hoc sequential/parallel tasks without pueue group setup.

```bash
# Queue tasks (default: 1 parallel slot)
tsp python train.py
tsp python validate.py

# Set parallel slots
tsp -S 3

# List the queue
tsp -l

# View job output
tsp -c 0

# Combine with noti
tsp bash -c 'python train.py && noti -g -m "training done"'
```
### Tool Selection Guide

```text
Need to...
├── Get notified when a command finishes?        → noti -g <command>
├── Send a notification from a script/callback?  → ntfy pub topic "msg"
├── Watch multiple logs/processes live?          → mprocs "cmd1" "cmd2"
├── Quick sequential queue (no groups needed)?   → tsp <command>
└── Persistent queue with groups and priorities? → pueue add <command>
```
## Deep-Dive References

| Topic | Reference |
| --- | --- |
| Installation (macOS, Linux, systemd, launchd) | Installation Guide |
| Production lessons: `--after` chaining, forensic audit, per-year parallelization, pipeline monitoring | Production Lessons |
| State bloat prevention, bulk `xargs -P` submission, two-tier architecture (300K+ jobs), crash recovery | State Management & Bulk Submission |
| ClickHouse thread tuning, parallelism sizing formula, live tuning | ClickHouse Tuning |
| Callback hooks, template variables, delayed scheduling (`--delay`) | Callbacks & Scheduling |
| python-dotenv secrets pattern, rangebar-py integration | Environment & Secrets |
| Claude Code integration, synchronous wrapper, telemetry queries | Claude Code Integration |
| All pueue.yml settings (shared, client, daemon, profiles) | Pueue Config Reference |
## Pueue vs Temporal: When to Use Which

For structured, repeatable workflows needing durability and dedup, consider Temporal (`pip install temporalio`, `brew install temporal`):

| Dimension | Pueue | Temporal |
| --- | --- | --- |
| Best for | Ad-hoc shell commands, SSH remote jobs | Structured, repeatable workflows |
| Setup | Single binary, zero infra | Server + database (dev: `temporal server start-dev`) |
| Dedup | None — restart creates new tasks | Workflow ID uniqueness (built in) |
| Retry policy | None — manual or external agent | Per-activity: max attempts, backoff, non-retryable errors |
| Parallelism | `pueue parallel N --group G` | `max_concurrent_activities=N` on the worker |
| Visibility | `pueue status` (text, scales poorly) | Web UI with pagination, search, event history |

**Recommendation:** keep Pueue for ad-hoc work. Migrate structured pipelines to Temporal when dedup and retry policies matter. See `devops-tools:distributed-job-safety` for invariants that apply to both.
## Related

- **Hook:** `itp-hooks/posttooluse-reminder.ts` - reminds to use Pueue for detected long-running commands
- **Reference:** Pueue GitHub
- **Companion tools:** noti (command wrapper → Telegram), ntfy (pub-sub notifications), mprocs (multi-process TUI), task-spooler (minimal job queue)
- **SOTA alternative:** Temporal - durable workflow orchestration with built-in dedup, retry, and visibility
- **Issue:** rangebar-py#77 - original implementation
- **Issue:** rangebar-py#88 - production deployment lessons