# RunPod Cloud GPU
Run open-source AI models on cloud GPUs via RunPod serverless. Pay-per-second, no minimums.
## Setup

1. Create an account at https://runpod.io

2. Add your API key to `.env`:

   ```bash
   echo "RUNPOD_API_KEY=your_key_here" >> .env
   ```

3. Deploy any tool with `--setup`:

   ```bash
   python tools/image_edit.py --setup
   python tools/upscale.py --setup
   python tools/dewatermark.py --setup
   python tools/sadtalker.py --setup
   python tools/qwen3_tts.py --setup
   ```
Each `--setup` command:

- Creates a RunPod template from the Docker image
- Creates a serverless endpoint with an appropriate GPU
- Saves the endpoint ID to `.env` (e.g. `RUNPOD_QWEN_EDIT_ENDPOINT_ID`; example below)
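After a couple of deployments, `.env` ends up looking something like this (the endpoint ID values here are placeholders; yours will differ):

```bash
# .env after deploying two tools
RUNPOD_API_KEY=your_key_here
RUNPOD_QWEN_EDIT_ENDPOINT_ID=abc123xyz   # placeholder, written by image_edit.py --setup
RUNPOD_UPSCALE_ENDPOINT_ID=def456uvw     # placeholder, written by upscale.py --setup
```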
## Available Images

All images are public on GHCR, so no authentication is needed to pull them.

| Tool | Docker Image | GPU | VRAM | Typical Cost |
|------|--------------|-----|------|--------------|
| image_edit | `ghcr.io/conalmullan/video-toolkit-qwen-edit:latest` | A6000/L40S | 48GB+ | ~$0.05-0.15/job |
| upscale | `ghcr.io/conalmullan/video-toolkit-realesrgan:latest` | RTX 3090/4090 | 24GB | ~$0.01-0.05/job |
| dewatermark | `ghcr.io/conalmullan/video-toolkit-propainter:latest` | RTX 3090/4090 | 24GB | ~$0.05-0.30/job |
| sadtalker | `ghcr.io/conalmullan/video-toolkit-sadtalker:latest` | RTX 4090 | 24GB | ~$0.05-0.15/job |
| qwen3_tts | `ghcr.io/conalmullan/video-toolkit-qwen3-tts:latest` | ADA 24GB | 24GB | ~$0.01-0.05/job |
**Total monthly cost:** rarely exceeds $10, even with heavy use.
## How It Works

All tools follow the same pattern:

Local CLI → upload input to cloud storage → RunPod API → poll for result → download output

- **File transfer:** tools use Cloudflare R2 when configured (`R2_ACCOUNT_ID`, `R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY`, `R2_BUCKET_NAME`), falling back to free upload services
- **RunPod API:** tools call the `/run` endpoint, then poll `/status/{job_id}` until the job completes (sketched below)
- **Cold vs. warm start:** the first request after idle spins up a worker (~30-90s); subsequent requests are fast (~5-15s)
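A minimal shell sketch of the submit-and-poll half of that pattern, assuming the input is already uploaded and reachable at `$INPUT_URL` (the `input` payload shape is illustrative; each tool defines its own schema):

```bash
#!/usr/bin/env bash
# Sketch only: the payload fields are illustrative, each tool defines its own schema.
source .env
ENDPOINT_ID="$RUNPOD_UPSCALE_ENDPOINT_ID"

# Submit the job; the async /run endpoint returns a job ID immediately
JOB_ID=$(curl -s "https://api.runpod.ai/v2/${ENDPOINT_ID}/run" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"input\": {\"video_url\": \"$INPUT_URL\"}}" | jq -r '.id')

# Poll /status until the job reaches a terminal state
while true; do
  STATUS=$(curl -s "https://api.runpod.ai/v2/${ENDPOINT_ID}/status/${JOB_ID}" \
    -H "Authorization: Bearer $RUNPOD_API_KEY" | jq -r '.status')
  case "$STATUS" in
    COMPLETED|FAILED|CANCELLED) break ;;
  esac
  sleep 5
done
echo "Job $JOB_ID ended with status: $STATUS"
```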
## Endpoint Management

### Workers

- `workersMin: 0`: scale to zero when idle (no cost)
- `workersMax: 1`: max concurrent jobs (increase for throughput)
- `idleTimeout: 5`: seconds before a worker scales down
Across all endpoints, you share a total worker pool determined by your RunPod plan. If you hit the limit, reduce `workersMax` on endpoints you're not actively using.
### Checking Endpoint Status

Each tool stores its endpoint ID in `.env`:

| Tool | Env Var |
|------|---------|
| image_edit | `RUNPOD_QWEN_EDIT_ENDPOINT_ID` |
| upscale | `RUNPOD_UPSCALE_ENDPOINT_ID` |
| dewatermark | `RUNPOD_DEWATERMARK_ENDPOINT_ID` |
| sadtalker | `RUNPOD_SADTALKER_ENDPOINT_ID` |
| qwen3_tts | `RUNPOD_QWEN3_TTS_ENDPOINT_ID` |
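With those IDs loaded, checking an endpoint's health is a one-liner against the `/health` route described in the REST reference below (upscale used as the example):

```bash
source .env
curl -s "https://api.runpod.ai/v2/${RUNPOD_UPSCALE_ENDPOINT_ID}/health" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" | jq .
```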
### Disabling an Endpoint

To free worker slots without deleting an endpoint, set `workersMax=0` via the RunPod dashboard or the GraphQL API.
## RunPod API Reference

Use these to query and manage endpoints programmatically. RunPod disables GraphQL introspection, so the field names below were verified by hand and must be used exactly as written.

### Authentication

All API calls require an `Authorization: Bearer $RUNPOD_API_KEY` header.

- GraphQL: `POST https://api.runpod.io/graphql`
- REST (serverless): `https://api.runpod.ai/v2/{endpoint_id}/...`
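A GraphQL call from the shell looks like this (the query is a trimmed version of the endpoint listing in the next section):

```bash
curl -s https://api.runpod.io/graphql \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "query { myself { endpoints { id name workersMax } } }"}' | jq .
```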
### GraphQL Queries

List all endpoints:

```graphql
query {
  myself {
    endpoints { id name gpuIds templateId workersMax workersMin }
  }
}
```

Current spend rate:

```graphql
query {
  myself {
    currentSpendPerHr
    spendDetails { localStoragePerHour networkStoragePerHour gpuComputePerHour }
  }
}
```

List pods:

```graphql
query {
  myself {
    pods {
      id
      name
      runtime { uptimeInSeconds }
      machine { gpuDisplayName }
      desiredStatus
    }
  }
}
```
**Common mistakes:** field names are camelCase with full words (`localStoragePerHour`, not `localStoragePerHr`). Endpoints live under `endpoints`, not `serverlessWorkers`. There is no `spending` field; use `currentSpendPerHr` and `spendDetails`.
### GraphQL Mutations

Update an endpoint's GPU or config:

```graphql
mutation {
  saveEndpoint(input: {
    id: "endpoint_id",
    name: "endpoint-name",
    templateId: "template_id",
    gpuIds: "AMPERE_24",
    workersMin: 0,
    workersMax: 1
  }) {
    id
    gpuIds
  }
}
```

`saveEndpoint` requires `name` and `templateId` even for updates, so query first to get the current values.
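As a sketch, disabling an endpoint from the shell therefore takes two calls: read the current values, then write them back with `workersMax: 0` (the IDs below are placeholders):

```bash
# Step 1: read each endpoint's current name, templateId, and gpuIds
curl -s https://api.runpod.io/graphql \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "query { myself { endpoints { id name templateId gpuIds } } }"}' | jq .

# Step 2: write the same values back with workersMax: 0 ("abc123" etc. are placeholders)
curl -s https://api.runpod.io/graphql \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "mutation { saveEndpoint(input: { id: \"abc123\", name: \"my-endpoint\", templateId: \"tmpl456\", gpuIds: \"AMPERE_24\", workersMin: 0, workersMax: 0 }) { id workersMax } }"}'
```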
REST API (Serverless)
Action Method URL
Submit job POST /v2/{id}/run
Check status GET /v2/{id}/status/{job_id}
Cancel job POST /v2/{id}/cancel/{job_id}
List pending GET /v2/{id}/requests
Health/stats GET /v2/{id}/health
Health response includes job counts and worker state:
{ "jobs": { "completed": 16, "failed": 1, "inProgress": 0, "inQueue": 2, "retried": 0 }, "workers": { "idle": 0, "initializing": 1, "ready": 0, "running": 0, "throttled": 0 } }
**Note:** `/requests` only returns pending and queued jobs. Completed job history is not available via the API; check the RunPod web console for logs.
### GPU Type IDs

| ID | GPU | VRAM | Typical Cost |
|----|-----|------|--------------|
| `AMPERE_24` | RTX 3090 | 24GB | ~$0.34/hr |
| `ADA_24` | RTX 4090 | 24GB | ~$0.69/hr |
| `AMPERE_48` | A6000 | 48GB | ~$0.76/hr |
| `AMPERE_80` | A100 | 80GB | ~$1.99/hr |
**Availability note:** `ADA_24` (RTX 4090) is frequently throttled or unavailable on RunPod. Always configure endpoints with multiple fallback GPU types (comma-separated) to avoid jobs getting stuck in the queue indefinitely:

```
gpuIds: "AMPERE_24,ADA_24"  # Try the 3090 first, fall back to the 4090
```
All toolkit tools also enforce a 5-minute queue timeout: if no GPU is available within 300 seconds, the job is cancelled automatically to prevent runaway billing from failed initialization cycles.
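A sketch of that guard in shell form, reusing `ENDPOINT_ID` and `JOB_ID` from a prior `/run` call:

```bash
# Cancel the job if it is still IN_QUEUE after 300 seconds (no GPU picked it up)
DEADLINE=$(( $(date +%s) + 300 ))
while [ "$(date +%s)" -lt "$DEADLINE" ]; do
  STATUS=$(curl -s "https://api.runpod.ai/v2/${ENDPOINT_ID}/status/${JOB_ID}" \
    -H "Authorization: Bearer $RUNPOD_API_KEY" | jq -r '.status')
  if [ "$STATUS" != "IN_QUEUE" ]; then
    exit 0  # a worker picked it up (or it finished); stop guarding
  fi
  sleep 10
done
# Still queued after the deadline: cancel to stop the billing clock
curl -s -X POST "https://api.runpod.ai/v2/${ENDPOINT_ID}/cancel/${JOB_ID}" \
  -H "Authorization: Bearer $RUNPOD_API_KEY"
```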
## Cloudflare R2 via AWS CLI

R2 uses the S3-compatible API but requires `--region auto`:

```bash
export AWS_ACCESS_KEY_ID="$R2_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="$R2_SECRET_ACCESS_KEY"

aws s3api list-objects-v2 \
  --bucket "$R2_BUCKET_NAME" \
  --endpoint-url "https://${R2_ACCOUNT_ID}.r2.cloudflarestorage.com" \
  --region auto
```

**Common mistake:** omitting `--region auto` causes an `InvalidRegionName` error. Valid R2 regions are `wnam`, `enam`, `weur`, `eeur`, `apac`, `oc`, and `auto`.
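The same flags apply when generating presigned URLs, which is how a worker can download an input without holding R2 credentials (the object key here is a placeholder):

```bash
# inputs/video.mp4 is a placeholder key; the URL expires after one hour
aws s3 presign "s3://${R2_BUCKET_NAME}/inputs/video.mp4" \
  --expires-in 3600 \
  --endpoint-url "https://${R2_ACCOUNT_ID}.r2.cloudflarestorage.com" \
  --region auto
```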
## Troubleshooting

### Force Image Pull

When you push a new Docker image version, RunPod may keep using the cached old one. To force a pull:

1. Update the template's `imageName` to use `@sha256:DIGEST` notation (digest lookup shown below)
2. Wait for the worker to restart
3. Revert to the `:latest` tag after confirming the new image works
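One way to find the digest for step 1, assuming the image has already been pushed (realesrgan used as the example):

```bash
# Default output includes Name, MediaType, and the sha256 digest of :latest
docker buildx imagetools inspect ghcr.io/conalmullan/video-toolkit-realesrgan:latest
```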
### Cold Start Too Slow

- qwen3-tts: ~70s cold start, ~7s warm
- sadtalker: ~60s cold start, ~10s warm
- image_edit: ~90s cold start, ~15s warm

If cold starts are a problem, set `workersMin: 1` (this costs money while idle).
### Job Fails with OOM

The model needs more VRAM than the GPU provides. Options:

- Use a larger GPU tier
- For dewatermark: reduce `--resize-ratio` (default 0.5 for safety; example below)
- For image_edit: reduce `--steps`
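For example, a hypothetical invocation (the positional argument is illustrative; only the `--resize-ratio` flag is documented above):

```bash
# input.mp4 is a placeholder; a lower --resize-ratio means less VRAM used
python tools/dewatermark.py input.mp4 --resize-ratio 0.4
```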
"No workers available"
You've hit your plan's concurrent worker limit. Either:
-
Wait for a running job to finish
-
Set workersMax=0 on endpoints you're not using
-
Upgrade your RunPod plan
## Docker Images

All Dockerfiles live in `docker/runpod-*/`. Images use `runpod/pytorch` as the base to share layers across tools.

Building for RunPod (from an Apple Silicon Mac):

```bash
docker buildx build --platform linux/amd64 \
  -t ghcr.io/conalmullan/video-toolkit-<name>:latest \
  docker/runpod-<name>/
docker push ghcr.io/conalmullan/video-toolkit-<name>:latest
```

GHCR packages default to private, and you must make them public manually before RunPod can pull them: GitHub > Packages > Package Settings > Change Visibility.
## Cost Optimization

- Keep `workersMin: 0` on all endpoints (scale to zero)
- Only deploy endpoints you actively need
- Use `workersMax=0` to disable idle endpoints without deleting them
- Qwen3-TTS is significantly cheaper than ElevenLabs for voiceovers
- Check the RunPod dashboard for usage and billing