gpu-keepalive-with-keepgpu

Install and operate KeepGPU for GPU keep-alive with both blocking CLI and non-blocking service workflows. Use when users ask about keep-gpu command construction, start/status/stop session control, dashboard usage, tuning (--vram, --interval, --busy-threshold), installation from this repository, or troubleshooting keep-alive sessions. Do not use for repository development, code refactoring, or unrelated Python tooling.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

To have an AI assistant install this skill, send it the following command:

npx skills add wangmerlyn/gpu-keepalive-with-keepgpu

KeepGPU CLI Operator

Use this workflow to run keep-gpu safely and effectively.

Prerequisites

  • Confirm at least one GPU is visible (python -c "import torch; print(torch.cuda.device_count())").
  • Run commands in a shell where CUDA/ROCm drivers are already available.
  • Use Ctrl+C to stop KeepGPU and release memory cleanly.
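The checks above can be wrapped into a small preflight script run before starting a keep session. A minimal sketch in Python; the probe names are illustrative and not part of KeepGPU itself:

```python
import shutil
import importlib.util

def preflight():
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # Utilization telemetry is typically read through the vendor SMI tools.
    if shutil.which("nvidia-smi") is None and shutil.which("rocm-smi") is None:
        problems.append("no nvidia-smi or rocm-smi found on PATH")
    # torch is required to allocate the held VRAM.
    if importlib.util.find_spec("torch") is None:
        problems.append("torch is not importable in this environment")
    return problems

if __name__ == "__main__":
    for p in preflight():
        print("WARNING:", p)
```

If the script prints warnings, resolve them before running keep-gpu.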

Install KeepGPU

Install PyTorch first for your platform, then install KeepGPU.

Option A: Install from package index

# CUDA example (change cu121 to your CUDA version)
pip install --index-url https://download.pytorch.org/whl/cu121 torch
pip install keep-gpu
# ROCm example (change rocm6.1 to your ROCm version)
pip install --index-url https://download.pytorch.org/whl/rocm6.1 torch
pip install "keep-gpu[rocm]"

Option B: Install directly from Git URL (no local clone)

Prefer this option when users only need the CLI and will not edit the source locally; it avoids the overhead of creating and later cleaning up a checkout directory.

pip install "git+https://github.com/Wangmerlyn/KeepGPU.git"

If SSH access is configured:

pip install "git+ssh://git@github.com/Wangmerlyn/KeepGPU.git"

ROCm variant from Git URL:

pip install "keep_gpu[rocm] @ git+https://github.com/Wangmerlyn/KeepGPU.git"

Option C: Install from a local source checkout (explicit path)

Use this option only when users already have a local checkout or plan to edit source.

git clone https://github.com/Wangmerlyn/KeepGPU.git
cd KeepGPU
pip install -e .

If the checkout already exists somewhere else, install by absolute path:

pip install -e /absolute/path/to/KeepGPU

For ROCm users from local checkout:

pip install -e ".[rocm]"

Verify installation:

keep-gpu --help
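If the keep-gpu entry point is not found, a quick importability check narrows down whether the package itself installed. A sketch; the module name keep_gpu is an assumption derived from the distribution name keep-gpu:

```python
import importlib.util

def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Example with stdlib modules, which are always importable:
print(missing_modules(["json", "os"]))  # → []
```

An empty result for `missing_modules(["torch", "keep_gpu"])` means the install succeeded and the problem is likely a PATH issue with the console script.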

Command model

KeepGPU supports two execution modes.

Blocking mode (compatibility)

keep-gpu --gpu-ids 0 --vram 1GiB --interval 60 --busy-threshold 25

Use when users intentionally want one foreground process and manual Ctrl+C stop.

Non-blocking mode (recommended for agents)

keep-gpu start --gpu-ids 0 --vram 1GiB --interval 60 --busy-threshold 25
keep-gpu status
keep-gpu stop --all
keep-gpu service-stop

keep-gpu start automatically launches the local background service if it is not already running.

Ctrl+C stops only foreground blocking runs. For service mode sessions started by keep-gpu start, use keep-gpu status, keep-gpu stop, and keep-gpu service-stop.

CLI options to tune:

  • --gpu-ids: comma-separated IDs (e.g. 0 or 0,1). If omitted, KeepGPU uses all visible GPUs.
  • --vram: VRAM to hold per GPU (512MB, 1GiB, or raw bytes).
  • --interval: seconds between keep-alive cycles.
  • --busy-threshold (--util-threshold alias): if utilization is above this percent, KeepGPU backs off.
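The --vram flag accepts human-readable sizes. For intuition, here is a minimal sketch of such parsing (this is not KeepGPU's actual parser; it illustrates the difference between decimal MB/GB units and binary MiB/GiB units, plus raw bytes):

```python
import re

_UNITS = {
    "": 1, "b": 1,
    "kb": 10**3, "mb": 10**6, "gb": 10**9,
    "kib": 2**10, "mib": 2**20, "gib": 2**30,
}

def parse_size(text):
    """Parse '512MB', '1GiB', or raw bytes like '1073741824' into bytes."""
    m = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]*)\s*", text)
    if not m:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = float(m.group(1)), m.group(2).lower()
    if unit not in _UNITS:
        raise ValueError(f"unknown unit in: {text!r}")
    return int(value * _UNITS[unit])

print(parse_size("1GiB"))   # → 1073741824
print(parse_size("512MB"))  # → 512000000
```

Note that 1GiB (2^30 bytes) is about 7% larger than 1GB (10^9 bytes); pick the unit deliberately on tightly packed GPUs.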

Legacy compatibility:

  • --threshold is deprecated but still accepted.
  • Numeric --threshold maps to busy threshold.
  • String --threshold maps to VRAM.
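The legacy mapping can be pictured as a small dispatch on the argument's form: a bare number stands for the busy threshold, a string with a unit stands for a VRAM size. A hypothetical sketch, not KeepGPU's actual code:

```python
def map_legacy_threshold(value):
    """Map a deprecated --threshold value onto the modern flag it stands for."""
    try:
        float(value)
        return ("--busy-threshold", value)  # numeric → utilization percent
    except ValueError:
        return ("--vram", value)            # string with a unit → VRAM size

print(map_legacy_threshold("25"))    # → ('--busy-threshold', '25')
print(map_legacy_threshold("1GiB"))  # → ('--vram', '1GiB')
```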

Agent workflow

  1. Collect workload intent: target GPUs, hold duration, and whether node is shared.
  2. Choose mode:
    • blocking mode for manual shell sessions,
    • non-blocking mode for agent pipelines (default recommendation).
  3. Choose safe defaults when unspecified: --vram 1GiB, --interval 60-120, --busy-threshold 25.
  4. Provide command sequence with verification and stop command.
  5. For non-blocking mode, include status, stop, and daemon shutdown (service-stop).
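Steps 3 and 4 above can be sketched as a command builder that fills in the safe defaults whenever the user left an option unspecified (illustrative only; the flag names match the CLI described above):

```python
def build_keep_gpu_cmd(gpu_ids=None, vram="1GiB", interval=60,
                       busy_threshold=25, blocking=False):
    """Assemble a keep-gpu argv list, applying the safe defaults."""
    cmd = ["keep-gpu"]
    if not blocking:
        cmd.append("start")  # non-blocking service mode (default recommendation)
    if gpu_ids is not None:
        cmd += ["--gpu-ids", ",".join(str(g) for g in gpu_ids)]
    cmd += ["--vram", vram,
            "--interval", str(interval),
            "--busy-threshold", str(busy_threshold)]
    return cmd

print(" ".join(build_keep_gpu_cmd(gpu_ids=[0])))
# → keep-gpu start --gpu-ids 0 --vram 1GiB --interval 60 --busy-threshold 25
```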

Command templates

Single GPU while preprocessing (blocking):

keep-gpu --gpu-ids 0 --vram 1GiB --interval 60 --busy-threshold 25

All visible GPUs with lighter load (blocking):

keep-gpu --vram 512MB --interval 180

Agent-friendly non-blocking sequence:

keep-gpu start --gpu-ids 0 --vram 1GiB --interval 60 --busy-threshold 25
keep-gpu status
keep-gpu stop --job-id <job_id>
keep-gpu service-stop

Open dashboard:

http://127.0.0.1:8765/
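To confirm the service is actually serving the dashboard, a quick TCP probe is enough. The port 8765 default is taken from the URL above; this sketch only checks that something is listening there, not that it is KeepGPU:

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if port_open("127.0.0.1", 8765):
    print("dashboard port is listening")
else:
    print("dashboard port is closed; is the service running?")
```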

Remote sessions (preferred: tmux for visibility and control):

tmux new -s keepgpu
keep-gpu --gpu-ids 0 --vram 1GiB --interval 300
# Detach with Ctrl+b then d; reattach with: tmux attach -t keepgpu

Fallback when tmux is unavailable:

nohup keep-gpu --gpu-ids 0 --vram 1GiB --interval 300 > keepgpu.log 2>&1 &
echo $! > keepgpu.pid
# Monitor: tail -f keepgpu.log
# Stop: kill "$(cat keepgpu.pid)"
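With the nohup fallback, the PID file makes it easy to check later whether the background run is still alive. A small sketch (POSIX-only, since it relies on sending signal 0):

```python
import os

def pid_alive(pid_file):
    """Return True if the PID recorded in pid_file is a live process."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check, delivers nothing
        return True
    except ProcessLookupError:
        return False
    except PermissionError:
        return True       # process exists but belongs to another user

print(pid_alive("keepgpu.pid"))
```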

Troubleshooting

  • Invalid --gpu-ids: ensure comma-separated integers only.
  • Allocation failure / OOM: reduce --vram or free memory first.
  • No utilization telemetry: ensure nvidia-ml-py works and nvidia-smi is available.
  • No GPUs detected: verify drivers, CUDA/ROCm runtime, and torch.cuda.device_count().

Example

User request: "Install KeepGPU from GitHub and keep GPU 0 alive while I preprocess."

Suggested response shape:

  1. Install: pip install "git+https://github.com/Wangmerlyn/KeepGPU.git"
  2. Run: keep-gpu start --gpu-ids 0 --vram 1GiB --interval 60 --busy-threshold 25
  3. Verify: keep-gpu status or dashboard http://127.0.0.1:8765/; stop session with keep-gpu stop --job-id <job_id> and daemon with keep-gpu service-stop.

Limitations

  • KeepGPU is not a scheduler; it only keeps GPUs that are already accessible active.
  • Appropriate settings depend on cluster policy; some schedulers may require holding more VRAM or using shorter intervals.
