run-experiment

Deploy and run ML experiments on local or remote GPU servers. Use when the user says "run experiment", "deploy to server", "跑实验" (Chinese for "run experiment"), or needs to launch training jobs.

Run Experiment

Deploy and run ML experiment: $ARGUMENTS

Workflow

Step 1: Detect Environment

Read the project's CLAUDE.md to determine the experiment environment:

  • Local GPU: Look for local CUDA/MPS setup info
  • Remote server: Look for SSH alias, conda env, code directory

If no server info is found in CLAUDE.md, ask the user.
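As one sketch of this lookup, the SSH alias can be pulled straight out of a CLAUDE.md that follows the example format at the bottom of this document (the `- SSH:` line shape is an assumption about that format, not a requirement):

```shell
# ssh_alias_from <path-to-CLAUDE.md>: print the alias from a line like
#   - SSH: `ssh my-gpu-server`
# Prints nothing (and stays quiet) when no such line exists.
ssh_alias_from() {
  grep -m1 '^- SSH:' "$1" 2>/dev/null | sed -n 's/.*`ssh \([^`]*\)`.*/\1/p'
}
```

If the function prints nothing, fall back to asking the user, as above.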

Step 2: Pre-flight Check

Check GPU availability on the target machine:

Remote:

ssh <server> nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader

Local:

nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader
# or for Mac MPS:
python -c "import torch; print('MPS available:', torch.backends.mps.is_available())"

Free GPU = memory.used < 500 MiB.
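The free-GPU rule above can be sketched as a small filter over the csv,noheader output (the sample values below are illustrative, not real telemetry):

```shell
# pick_free_gpus: read "index, memory.used, memory.total" CSV lines on
# stdin (units included, as csv,noheader emits them) and print the index
# of every GPU using less than 500 MiB.
pick_free_gpus() {
  awk -F', ' '{ gsub(/ MiB/, "", $2); if ($2 + 0 < 500) print $1 }'
}

# Real use: ssh <server> nvidia-smi --query-gpu=index,memory.used,memory.total \
#             --format=csv,noheader | pick_free_gpus
printf '0, 312 MiB, 81920 MiB\n1, 40250 MiB, 81920 MiB\n' | pick_free_gpus
# prints: 0
```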

Step 3: Sync Code (Remote Only)

Check the project's CLAUDE.md for a code_sync setting. If not specified, default to rsync.

Option A: rsync (default)

Only sync necessary files — NOT data, checkpoints, or large files:

# --include='*/' lets rsync descend into subdirectories (otherwise the
# trailing --exclude='*' blocks them); --prune-empty-dirs skips dirs with no matches
rsync -avz --include='*/' --include='*.py' --exclude='*' --prune-empty-dirs <local_src>/ <server>:<remote_dst>/

Option B: git (when code_sync: git is set in CLAUDE.md)

Push local changes to remote repo, then pull on the server:

# 1. Push from local
git add -A && git commit -m "sync: experiment deployment" && git push

# 2. Pull on server
ssh <server> "cd <remote_dst> && git pull"

Benefits: version-tracked, multi-server sync with one push, no rsync include/exclude rules needed.
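One caveat with the chained form above: `git commit` exits non-zero on a clean tree, which short-circuits the `&&` chain and skips the push. A guarded sketch (the helper name is my own):

```shell
# commit_if_dirty: stage and commit everything, but only when the working
# tree actually has changes, so a no-op run does not abort an `&&` chain.
commit_if_dirty() {
  if [ -n "$(git status --porcelain)" ]; then
    git add -A && git commit -m "sync: experiment deployment"
  fi
}

# Usage: commit_if_dirty && git push && ssh <server> "cd <remote_dst> && git pull"
```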

Step 4: Deploy

Remote (via SSH + screen)

For each experiment, create a dedicated screen session with GPU binding:

ssh <server> "screen -dmS <exp_name> bash -c '\
  eval \"\$(<conda_path>/conda shell.bash hook)\" && \
  conda activate <env> && \
  CUDA_VISIBLE_DEVICES=<gpu_id> python <script> <args> 2>&1 | tee <log_file>'"

Local

# Linux with CUDA
CUDA_VISIBLE_DEVICES=<gpu_id> python <script> <args> 2>&1 | tee <log_file>

# Mac with MPS (PyTorch uses MPS automatically)
python <script> <args> 2>&1 | tee <log_file>

For local long-running jobs, use run_in_background: true to keep the conversation responsive.
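A minimal sketch of that background pattern with plain shell tools (the helper name is mine; Claude Code's run_in_background achieves the same effect without it):

```shell
# launch_bg <logfile> <command...>: start a job detached from the terminal,
# capture stdout+stderr in the log, and record the PID next to it so the
# run can be checked on or killed later.
launch_bg() {
  log="$1"; shift
  nohup "$@" > "$log" 2>&1 &
  echo $! > "${log%.log}.pid"
}

# Usage: CUDA_VISIBLE_DEVICES=0 launch_bg exp1.log python train.py
```

Follow progress with tail -f exp1.log; stop the run with kill "$(cat exp1.pid)".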

Step 5: Verify Launch

Remote:

ssh <server> "screen -ls"

Local: Check process is running and GPU is allocated.
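A local verification sketch (the pattern string is a placeholder for your actual launch command):

```shell
# check_running <pattern>: succeed and print matching PIDs when a process
# whose full command line matches <pattern> is alive; pgrep -f matches the
# whole command line, so "python train.py"-style patterns work.
check_running() {
  pgrep -f "$1"
}

# Usage:
#   check_running "python train.py" >/dev/null || echo "launch failed"
# and, to confirm the job actually holds the GPU:
#   nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader
```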

Step 6: Feishu Notification (if configured)

After deployment is verified, check ~/.claude/feishu.json:

  • Send experiment_done notification: which experiments launched, which GPUs, estimated time
  • If config absent or mode "off": skip entirely (no-op)
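A sketch of the send step, assuming feishu.json carries `webhook_url` and `mode` fields (those names are assumptions about that file, the message text is illustrative, and jq is required). Feishu custom bots accept a plain-text payload on their webhook URL:

```shell
# notify_feishu [config-path]: silent no-op unless the config exists and
# mode is not "off"; otherwise POST a text message to the webhook.
notify_feishu() {
  cfg="${1:-$HOME/.claude/feishu.json}"
  [ -f "$cfg" ] || return 0
  [ "$(jq -r '.mode // "off"' "$cfg")" = "off" ] && return 0
  curl -s -X POST "$(jq -r '.webhook_url' "$cfg")" \
    -H 'Content-Type: application/json' \
    -d '{"msg_type":"text","content":{"text":"experiment_done: exp1 on GPU 0, ~4h"}}'
}
```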

Key Rules

  • ALWAYS check GPU availability first — never blindly assign GPUs
  • Each experiment gets its own screen session + GPU (remote) or background process (local)
  • Use tee to save logs for later inspection
  • Run deployment commands with run_in_background: true to keep conversation responsive
  • Report back: which GPU, which screen/process, what command, estimated time
  • If multiple experiments, launch them in parallel on different GPUs
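The last two rules combine into a small dry-run planner: pair free GPU ids with queued scripts and print one launch line per experiment (names are placeholders; swap `echo` for the real local or ssh/screen launch):

```shell
# plan_launches "<free gpu ids>" <script...>: one launch line per script,
# each bound to its own GPU; scripts beyond the free GPUs are reported.
plan_launches() {
  gpus="$1"; shift
  for gpu in $gpus; do
    [ $# -gt 0 ] || return 0
    echo "CUDA_VISIBLE_DEVICES=$gpu python $1"
    shift
  done
  [ $# -eq 0 ] || echo "queued (no free GPU): $*"
}

plan_launches "0 2" exp_a.py exp_b.py exp_c.py
# prints:
#   CUDA_VISIBLE_DEVICES=0 python exp_a.py
#   CUDA_VISIBLE_DEVICES=2 python exp_b.py
#   queued (no free GPU): exp_c.py
```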

CLAUDE.md Example

Users should add their server info to their project's CLAUDE.md:

## Remote Server
- SSH: `ssh my-gpu-server`
- GPU: 4x A100 (80GB each)
- Conda: `eval "$(/opt/conda/bin/conda shell.bash hook)" && conda activate research`
- Code dir: `/home/user/experiments/`
- code_sync: rsync          # default. Or set to "git" for git push/pull workflow

## Local Environment
- Mac MPS / Linux CUDA
- Conda env: `ml` (Python 3.10 + PyTorch)

Related Skills

Related by shared tags or category (Coding):

  • idea-creator
  • analyze-results
  • auto-review-loop
  • research-lit