Vast.ai Workflow

Overview

Use this skill for provider-level Vast.ai automation only. Keep it project-agnostic and parameter-driven.

Scope Boundaries

Handle only Vast API workflow logic: offers, create, poll, SSH attach, lifecycle, billing.
Do not embed project-specific repo paths, branch names, labels, or training commands.
If the user asks for workload-specific execution (for example Reparo training), switch to the corresponding workload skill after infrastructure provisioning.

Dialog-First Required Fields

Before any create/order call, confirm required runtime fields in dialog. If the user does not provide them, propose defaults and ask for confirmation.

Required fields and default suggestions:

api_key_source: default VAST_API_KEY environment variable.
- If missing, suggest loading from a local file path the user confirms (for example keys/.vast_env).
instance_type (offer filter): default gpu_name="RTX 4090".
- Ask whether to lock by exact GPU, VRAM minimum, region, max price, and reliability.
image: default pytorch/pytorch:latest.
disk_gb: default 64.
count: default 1.
label: do not ask by default.
- Derive from Vast account identity when possible (nickname or email local-part).
- Fallback label: vast-user.

Workflow

1) Preflight

Check active instances first.
If any instances are unintentionally active, ask whether to stop/destroy before creating a new one.
Ensure API key is loaded and never echo it.

2) Find offers

Query offers (/bundles is commonly reliable in practice).
Apply user filters plus mandatory rentable=true and rented=false.
Sort by user objective (price, performance, reliability).
Use user policy for selection:
- Default: choose cheapest valid offer.
- Optional conservative mode: choose second cheapest.

3) Create instance

Create from a single selected offer (PUT /asks/{id}).
Use confirmed image, resolved label, and disk_gb.
Parse and store the returned instance ID.
If create fails (no_such_ask or already taken), re-query and retry once with next candidate.

Label resolution order:

Load profile/account endpoint data and use nickname when present.
If nickname missing, use the email local-part when available.
If neither field is available, use vast-user.

4) Readiness and access

Poll instance status until running.
Add SSH key to account and attach to instance.
Retry SSH readiness for up to 2 minutes before declaring failure.

5) Lifecycle and billing

Support start/stop/restart when available.
Use destroy promptly when requested to prevent costs.
Verify final state and check usage/invoices when needed.

6) Runtime preflight (optional, when user plans to run workloads)

Validate runtime inside the instance instead of trusting image names/tags.
Confirm GPU visibility and basic CUDA readiness before starting long jobs.
Verify required Python package versions explicitly and report them in logs.
If workload depends on optional CUDA extensions (for example flash_attn or xformers), check availability first and fail fast on mismatch.
Prefer explicit environment checks over silent fallbacks to slower backends.

Request Templates

Use explicit, reproducible requests and validate JSON before chaining calls.

curl -sS -L -G "https://console.vast.ai/api/v0/<endpoint>/" \
  --data-urlencode "api_key=$VAST_API_KEY" \
  --data-urlencode "<param>=<value>"

curl -sS -L -X PUT "https://console.vast.ai/api/v0/asks/$OFFER_ID/?api_key=$VAST_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"image\":\"$IMAGE\",\"disk\":$DISK_GB,\"label\":\"$LABEL\"}"

Error Handling

401/403: API key missing, invalid, or not authorized for requested org/action.
429: rate limit; retry with backoff.
4xx: invalid endpoint or params; re-check request shape and required fields.
5xx: provider-side issue; retry with backoff and re-validate state.
Empty/parse failures: retry once, then save response to a temp file and parse from file.

Resources

references/api.md: concise endpoint map and safe calling checklist.
Treat references/api.md as the working source for endpoint details and refresh it against official docs regularly.

vastai-workflow

Safety Notice

Copy this and send it to your AI assistant to learn