Remote Execution Skill
This skill handles running code on remote GPU or TPU clusters via SkyPilot.
- Determine Target Device
Identify the target device from the user's request:
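| Target | Cluster name file | Launch script | UV extra | Env prefix |
|--------|-------------------|---------------|----------|------------|
| GPU | .cluster_name_gpu | launch_gpu.sh | gpu | export CUDA_VISIBLE_DEVICES=0; |
| TPU | .cluster_name_tpu | launch_tpu.sh | tpu | (none) |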
Execution Instructions: Before running the launch script, you must find its absolute path. It is located in the scripts/ directory alongside this skill definition. Use your file search tools (e.g., glob or find) to locate launch_gpu.sh or launch_tpu.sh before executing it.
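As a sketch of the find route, the shell function below searches a root directory for the named script inside any scripts/ directory and prints its absolute path. find_launch_script is a hypothetical name, not part of this skill; the sketch assumes the skill's scripts/ directory lies somewhere under the search root and that the first match is the right one.

```bash
# Hypothetical helper (not part of this skill): print the absolute path of
# a launch script found inside any scripts/ directory under a search root.
find_launch_script() {
  local name="$1"        # launch_gpu.sh or launch_tpu.sh
  local root="${2:-.}"   # where to start searching; defaults to the cwd
  local match
  # -path '*/scripts/*' keeps only files that live under a scripts/ directory;
  # head -n 1 takes the first match if there are several.
  match=$(find "$root" -type f -name "$name" -path '*/scripts/*' 2>/dev/null | head -n 1)
  if [ -z "$match" ]; then
    echo "error: $name not found under $root" >&2
    return 1
  fi
  # Resolve to an absolute, symlink-free path in a subshell (no realpath needed).
  (cd "$(dirname "$match")" && printf '%s/%s\n' "$(pwd -P)" "$(basename "$match")")
}
```

For example, from the project root: bash "$(find_launch_script launch_gpu.sh)" H100:1 <experiment_name>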
If the user does not specify a device, ask them which one to use.
- Prerequisites
- The cluster must already be provisioned. Check that the corresponding cluster name file (.cluster_name_gpu or .cluster_name_tpu) exists in the project root and is non-empty.
- If the file does not exist or is empty, ask the user to provision a cluster first using the appropriate launch script.
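The two checks above can be sketched as a small shell function. read_cluster_name is a hypothetical name, not part of this skill; it assumes the file holds the cluster name on its first line, which is how $(cat .cluster_name_gpu) is used in the commands that follow.

```bash
# Hypothetical helper (not part of this skill): print the cluster name stored
# in .cluster_name_<device> under a project root, or fail if it is unusable.
read_cluster_name() {
  local device="$1"        # "gpu" or "tpu"
  local root="${2:-.}"     # project root; defaults to the cwd
  local file="$root/.cluster_name_$device"
  local name
  case "$device" in
    gpu|tpu) ;;
    *) echo "error: device must be gpu or tpu, got '$device'" >&2; return 2 ;;
  esac
  # [ -s FILE ] is true only if FILE exists and is larger than zero bytes.
  if [ ! -s "$file" ]; then
    echo "error: $file is missing or empty; provision a $device cluster first" >&2
    return 1
  fi
  # Take the first line and drop whitespace, so a file holding only a
  # newline is also treated as empty.
  name=$(head -n 1 "$file" | tr -d '[:space:]')
  if [ -z "$name" ]; then
    echo "error: $file contains no cluster name" >&2
    return 1
  fi
  printf '%s\n' "$name"
}
```

A non-zero exit status is the cue to stop and ask the user to provision a cluster.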
- Cluster Management
Provisioning
Note: First locate the scripts as instructed above, then run them.
GPU (common accelerator types: H100:1, A100:1, L4:1):
bash <absolute_path_to_launch_gpu.sh> <accelerator_type> <experiment_name>
TPU (common accelerator types: tpu-v4-8, tpu-v4-16, tpu-v6e-1, tpu-v6e-4):
bash <absolute_path_to_launch_tpu.sh> <accelerator_type> <experiment_name>
The launch script automatically updates the corresponding .cluster_name_* file.
Teardown
GPU
sky down $(cat .cluster_name_gpu) -y
TPU
sky down $(cat .cluster_name_tpu) -y
- Execution Command
GPU
sky exec $(cat .cluster_name_gpu) --workdir . "export CUDA_VISIBLE_DEVICES=0; uv run --extra gpu python <PATH_TO_SCRIPT> [ARGS]"
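- export CUDA_VISIBLE_DEVICES=0; restricts the process to the first GPU (device 0), so the job uses a single GPU even on a multi-GPU node. For multi-GPU jobs, list more devices (e.g. CUDA_VISIBLE_DEVICES=0,1) or drop the prefix.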
- --extra gpu activates GPU optional dependencies (e.g. jax[cuda]).
TPU
sky exec $(cat .cluster_name_tpu) --workdir . "uv run --extra tpu python <PATH_TO_SCRIPT> [ARGS]"
- --extra tpu activates TPU optional dependencies (e.g. jax[tpu]).
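The per-device differences from the target table (env prefix and uv extra) can be kept in one place. The sketch below is hypothetical: remote_cmd and run_remote are illustrative names, not part of this skill, the code assumes bash, and run_remote assumes it is called from the project root where the .cluster_name_* files live.

```bash
# Hypothetical helpers (not part of this skill). remote_cmd prints the quoted
# command string for sky exec; run_remote submits it to the cluster named in
# .cluster_name_<device>. Requires bash (uses printf %q and +=).
remote_cmd() {
  local device="$1"; shift
  local prefix extra args="" a
  case "$device" in
    gpu) prefix="export CUDA_VISIBLE_DEVICES=0; "; extra="gpu" ;;
    tpu) prefix=""; extra="tpu" ;;
    *) echo "error: unknown device '$device'" >&2; return 2 ;;
  esac
  # Shell-quote each argument so values containing spaces survive the
  # remote shell that sky exec hands the command string to.
  for a in "$@"; do args+=" $(printf '%q' "$a")"; done
  printf '%suv run --extra %s python%s\n' "$prefix" "$extra" "$args"
}

run_remote() {
  local device="$1"; shift
  local cluster cmd
  cluster=$(cat ".cluster_name_$device" 2>/dev/null)
  if [ -z "$cluster" ]; then
    echo "error: .cluster_name_$device is missing or empty" >&2
    return 1
  fi
  cmd=$(remote_cmd "$device" "$@") || return 1
  # --workdir . syncs the local project before the command runs remotely.
  sky exec "$cluster" --workdir . "$cmd"
}
```

For instance, run_remote gpu src/lynx/perf/benchmark_train.py should submit the same command as the GPU benchmark example under Usage Examples, and run_remote tpu -m pytest src/lynx/test/ the TPU test example. Quoting each argument with printf %q is a design choice so that arguments containing spaces reach the remote Python process intact.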
Common flags
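- --workdir . syncs the current local directory to the remote instance before running, so relative paths such as <PATH_TO_SCRIPT> resolve against the synced copy.
- For pytest, use python -m pytest <test_path> instead of calling pytest directly. Invoking pytest through python also adds the current directory to sys.path.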
- Usage Examples
Run a benchmark on GPU:
sky exec $(cat .cluster_name_gpu) --workdir . "export CUDA_VISIBLE_DEVICES=0; uv run --extra gpu python src/lynx/perf/benchmark_train.py"
Run tests on TPU:
sky exec $(cat .cluster_name_tpu) --workdir . "uv run --extra tpu python -m pytest src/lynx/test/"
- Operational Notes
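- Logs: SkyPilot streams the job's stdout and stderr to the local terminal.
- Interruption: Ctrl+C stops the log stream but may not kill the remote process. List the cluster's jobs with sky queue $(cat .cluster_name_gpu) and stop a job that is still running with sky cancel $(cat .cluster_name_gpu) <job_id> (use .cluster_name_tpu for TPU).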