RunPod Media Skill

Generate AI images and videos using RunPod public endpoints. All output is saved to ~/.openclaw/workspace/runpod-media/.

API Keys

One key required — add to ~/.openclaw/secrets.json:

Key path	Purpose	Get it from
`/runpod/apiKey`	Call RunPod endpoints	runpod.io/console/user/settings

Local images are uploaded to Cloudflare R2 and passed to RunPod endpoints as presigned URLs (1 min expiry). R2 credentials are read from /cloudflare/r2 in secrets.json (already configured ✅).

imgbb is no longer used. R2 presigned URLs replace it for all local file uploads.

R2 cleanup: Objects in uploads/ are auto-deleted after 1 day via a lifecycle rule on the openclaw bucket. Presigned URLs expire after 1 min, after which the object can no longer be accessed; the objects themselves are cleaned up within 24h.

Keys are resolved in this order:

1. OpenClaw secrets.json: ~/.openclaw/secrets.json ✅ (already configured)
2. Env vars: RUNPOD_API_KEY
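
The lookup order above can be sketched roughly as follows. This is an illustration only, not the skill's actual code: the function name resolve_runpod_key is hypothetical, and it assumes the key path /runpod/apiKey corresponds to the nested JSON keys "runpod" and "apiKey".

```python
import json
import os
from pathlib import Path


def resolve_runpod_key(secrets_path=None, env=None):
    """Return the RunPod API key, or None if neither source has one.

    Hypothetical helper: checks secrets.json first, then RUNPOD_API_KEY.
    """
    default_path = Path.home() / ".openclaw" / "secrets.json"
    path = Path(secrets_path) if secrets_path else default_path
    env = os.environ if env is None else env

    # 1. secrets.json, ASSUMING /runpod/apiKey means {"runpod": {"apiKey": ...}}
    try:
        secrets = json.loads(path.read_text())
        key = secrets.get("runpod", {}).get("apiKey")
        if key:
            return key
    except (OSError, ValueError, AttributeError):
        pass  # missing or malformed file: fall through to the env var

    # 2. Environment variable
    return env.get("RUNPOD_API_KEY") or None
```

Passing secrets_path and env explicitly keeps the helper easy to exercise without touching the real secrets file.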

How Users Ask (Natural Language Examples)

The user will never type CLI commands — translate their natural requests into the right script call.

Generate an image:

"Generate an image of a samurai cat in neon Tokyo" → generate_image --prompt "..."
"Make me a 16:9 image of a stormy ocean at sunset" → generate_image --prompt "..." --aspect-ratio 16:9
"Create an image using Nano Banana — a futuristic city" → call_endpoint --endpoint google-nano-banana-2-edit --prompt "..."

Edit an image:

"Edit this image — add snow falling" → edit_image --images <file> --prompt "add snow falling"
"Use Qwen to edit this photo, make it look like a painting" → call_endpoint --endpoint qwen-image-edit --image <file> --prompt "make it look like a painting"

Animate to video:

"Animate this image — slow camera pan" → image_to_video --image <file> --prompt "slow camera pan"
"Make a video from this with Kling" → image_to_video --image <file> --model kling --prompt "..."
"Turn this into a 10 second clip with Sora 2" → call_endpoint --endpoint sora-2-pro-i2v --image <file> --prompt "..." --duration 10

Text to video:

"Generate a video of a wolf howling at the moon" → text_to_video --prompt "..."

List available models:

"What image/video models do you have?"
"List the available endpoints"
"Show me what RunPod models are available" → Run list_endpoints and summarize the output in plain language for the user

Add a new endpoint:

"Add this RunPod endpoint: https://console.runpod.io/hub/playground/voice/kokoro-tts"
"Probe and add these endpoints: kokoro-tts, flux-kontext-pro" → Run discover_endpoints add --candidates "<url-or-id>"

Capabilities & Cost

Task	Command	Cost	Time
Text → Image	`generate_image`	~$0.005/image	3–8s
Edit image(s)	`edit_image`	~$0.005/image	5–15s
Image → Video	`image_to_video`	$0.03–$0.90/clip	30–120s
Text → Video	`text_to_video`	$0.04–$1.22/clip	30–120s
Any endpoint	`call_endpoint`	varies	varies

The built-in commands use default endpoints. For more models (Nano Banana Pro, FLUX, Sora 2, Kling, TTS, etc.) use call_endpoint with any RunPod public endpoint ID.

Endpoint Registry

All known public endpoints are in scripts/endpoints.json. List them:

$SKILL_DIR/run.sh list_endpoints

Call Any Endpoint

$SKILL_DIR/run.sh call_endpoint \
  --endpoint <ENDPOINT_ID> \
  [--prompt "TEXT"] \
  [--image PATH_OR_URL] \
  [--audio PATH_OR_URL] \
  [--duration 5] \
  [--aspect-ratio 16:9] \
  [--input '{"key": "value"}']   # full JSON override

Examples:

# Nano Banana Pro image generation
$SKILL_DIR/run.sh call_endpoint --endpoint nano-banana-pro --prompt "a golden retriever in space"

# Nano Banana Pro image editing
$SKILL_DIR/run.sh call_endpoint --endpoint nano-banana-pro --prompt "make it nighttime" --image photo.jpg

# Sora 2 Pro video from image
$SKILL_DIR/run.sh call_endpoint --endpoint sora-2-pro-i2v --image photo.jpg --prompt "camera slowly pulls back" --duration 5

# Kokoro TTS
$SKILL_DIR/run.sh call_endpoint --endpoint kokoro-tts --text "Hello world"

# FLUX Schnell
$SKILL_DIR/run.sh call_endpoint --endpoint flux-schnell --prompt "cyberpunk city" --input '{"width":1024,"height":1024}'

Adding New Endpoints

When the user asks to use an endpoint not in the registry, or the runpod skill reveals a new one:

Call it directly with --endpoint <id> — no registry entry needed
Optionally add it to scripts/endpoints.json for future sessions

With runpod skill: Use the runpod skill to browse/discover endpoint IDs on the RunPod hub, then pass that ID to call_endpoint here.

Generate Image

$SKILL_DIR/run.sh generate_image \
  --prompt "PROMPT" \
  [--aspect-ratio 1:1|16:9|9:16|4:3|3:4] \
  [--seed 42]

Edit Image

$SKILL_DIR/run.sh edit_image \
  --images PATH_OR_URL [PATH_OR_URL ...] \
  --prompt "EDIT INSTRUCTION" \
  [--aspect-ratio 1:1] \
  [--seed 42]

Accepts 1–5 images (local paths or URLs)
Local files are auto-uploaded to Cloudflare R2 and passed as presigned URLs (uses the /cloudflare/r2 credentials in secrets.json)

Animate Image → Video

$SKILL_DIR/run.sh image_to_video \
  --image PATH_OR_URL \
  --prompt "MOTION DESCRIPTION" \
  [--model wan25|kling|seedance] \
  [--duration 5|10] \
  [--negative-prompt "TEXT"]

Models:

wan25 (default) — WAN 2.5, ~$0.026/5s
kling — Kling v2.1 Pro, $0.45/5s (highest quality)
seedance — Seedance 1.0 Pro, ~$0.12/5s

Generate Video from Text

$SKILL_DIR/run.sh text_to_video \
  --prompt "VIDEO DESCRIPTION" \
  [--model wan26|seedance] \
  [--duration 5|10|15] \
  [--size 1920x1080] \
  [--negative-prompt "TEXT"]

Models:

wan26 (default) — WAN 2.6, ~$0.04/5s
seedance — Seedance 1.0 Pro, ~$0.12/5s
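
As a rough worked example of the per-5-second prices listed for image_to_video and text_to_video, the sketch below estimates the cost of a clip. It assumes price scales linearly with duration, which the source does not state; other factors such as resolution are not modeled, and estimate_cost is a hypothetical name.

```python
# Approximate USD price per 5 seconds, copied from the model lists above.
PRICE_PER_5S = {
    ("image_to_video", "wan25"): 0.026,
    ("image_to_video", "kling"): 0.45,
    ("image_to_video", "seedance"): 0.12,
    ("text_to_video", "wan26"): 0.04,
    ("text_to_video", "seedance"): 0.12,
}


def estimate_cost(command, model, duration_s):
    """Rough cost estimate, ASSUMING price scales linearly with duration."""
    return round(PRICE_PER_5S[(command, model)] * duration_s / 5, 3)


print(estimate_cost("image_to_video", "kling", 10))  # 0.9
print(estimate_cost("text_to_video", "wan26", 15))   # 0.12
```

Under this assumption a 10-second Kling clip comes to about $0.90, which matches the upper end of the Image → Video range in the cost table.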

Defaults

Delete after send — always delete the local file after successful delivery. Only keep if the user explicitly asks ("keep it", "save it", "--keep").
Captions — keep them short and natural. Do NOT include render time or cost unless the user asks. Example: 🦊 Fox under the aurora not 🦊 Fox — 105s render (~$0.026).

Delivering Media to the User

After generating an image or video, always deliver it to the user via their active channel.

The Problem

The message tool with a local media path may fail in sandboxed agent modes because SecretRef resolution is not available for media sends. This is a known OpenClaw limitation.

The Solution: Use curl + Telegram Bot API directly

Read the bot token from secrets and send via curl — this always works regardless of sandbox mode:

# Read the Telegram bot token from secrets.json (empty string if missing)
TOKEN=$(python3 -c "import json,os; d=json.load(open(os.path.expanduser('~/.openclaw/secrets.json'))); print(d.get('telegram',{}).get('botToken',''))")

# Send photo
curl -s \
  -F "chat_id=CHAT_ID" \
  -F "photo=@$HOME/.openclaw/workspace/runpod-media/OUTPUT_FILE.jpg" \
  -F "caption=YOUR CAPTION" \
  "https://api.telegram.org/bot${TOKEN}/sendPhoto"

# Send video (.mp4)
curl -s \
  -F "chat_id=CHAT_ID" \
  -F "video=@$HOME/.openclaw/workspace/runpod-media/OUTPUT_FILE.mp4" \
  -F "caption=YOUR CAPTION" \
  "https://api.telegram.org/bot${TOKEN}/sendVideo"

Where to find CHAT_ID: Use the chat_id from the inbound message metadata (e.g. telegram:1231438498 → use 1231438498).
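
Stripping the channel prefix can be sketched like this. The function name is hypothetical, and the sketch assumes the metadata value is a string of the form "channel:id" as in the example above.

```python
def chat_id_from_metadata(sender):
    """Strip a channel prefix such as 'telegram:' and return the bare chat id."""
    _prefix, sep, rest = sender.partition(":")
    # If there is no colon, assume the value is already a bare chat id.
    return rest if sep else sender


print(chat_id_from_metadata("telegram:1231438498"))  # 1231438498
```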

Delivery Checklist

✅ Generate the file (note the saved path from script output)
✅ Send via the message tool with a short, natural caption (no cost/time unless asked); if it returns a SecretRef error, fall back to the curl method above
✅ Delete the local file after successful send: rm <path>
⏭️ Skip deletion only if user said "keep it" / "save it" / passed --keep

Why not `message` tool?

Try it first — if it works, great. If it returns a SecretRef error, fall back to the curl method above.
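
The overall flow (try the message tool, fall back to curl, delete the file only after a successful send) could be sketched as below. Both sender arguments are hypothetical callables standing in for the real message tool and the curl call; this simplified version falls back on any failure, not only on a SecretRef error.

```python
import os


def deliver(path, send_with_message_tool, send_with_curl, keep=False):
    """Try the message tool first, fall back to curl, then clean up.

    Both senders are hypothetical callables that take a file path and
    return True on success.
    """
    sent = send_with_message_tool(path) or send_with_curl(path)
    if sent and not keep:
        os.remove(path)  # delete only after a successful send
    return sent
```

Keeping the file when both senders fail means the send can be retried without regenerating the media.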

Notes

Video generation takes 30–120 seconds, so tell the user to expect a wait
Output files saved to ~/.openclaw/workspace/runpod-media/ — accessible in both sandboxed and elevated agent modes
Shared utilities in scripts/_utils.py — do not call directly
