runpod-media

Generate images from text, edit images with text instructions, animate images to video, and generate video from text — all via RunPod public AI endpoints. Use when the user asks to generate/create/make an image or video, edit a photo, animate an image, or produce AI media. Requires a RunPod API key.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "runpod-media" with this command: npx skills add ItamarCoh3n/runpod-media

RunPod Media Skill

Generate AI images and videos using RunPod public endpoints. All output is saved to ~/runpod-media/.

API Keys

One key required — add to ~/.openclaw/secrets.json:

Key pathPurposeGet it from
/runpod/apiKeyCall RunPod endpointsrunpod.io/console/user/settings

Local images are uploaded to Cloudflare R2 as presigned URLs (1 min expiry) before being sent to RunPod endpoints. R2 credentials are read from /cloudflare/r2 in secrets.json — already configured ✅

imgbb is no longer used. R2 presigned URLs replace it for all local file uploads.

R2 cleanup: Objects in uploads/ are auto-deleted after 1 day via a lifecycle rule on the openclaw bucket. Presigned URLs expire after 1 min (no access), objects are cleaned up within 24h.

Keys are resolved in this order:

  1. OpenClaw secrets.json~/.openclaw/secrets.json ✅ (already configured)
  2. Env varsRUNPOD_API_KEY

How Users Ask (Natural Language Examples)

The user will never type CLI commands — translate their natural requests into the right script call.

Generate an image:

  • "Generate an image of a samurai cat in neon Tokyo" → generate_image --prompt "..."
  • "Make me a 16:9 image of a stormy ocean at sunset" → generate_image --prompt "..." --aspect-ratio 16:9
  • "Create an image using Nano Banana — a futuristic city" → call_endpoint --endpoint google-nano-banana-2-edit --prompt "..."

Edit an image:

  • "Edit this image — add snow falling" → edit_image --images <file> --prompt "add snow falling"
  • "Use Qwen to edit this photo, make it look like a painting" → call_endpoint --endpoint qwen-image-edit --image <file> --prompt "make it look like a painting"

Animate to video:

  • "Animate this image — slow camera pan" → image_to_video --image <file> --prompt "slow camera pan"
  • "Make a video from this with Kling" → image_to_video --image <file> --model kling --prompt "..."
  • "Turn this into a 10 second clip with Sora 2" → call_endpoint --endpoint sora-2-pro-i2v --image <file> --prompt "..." --duration 10

Text to video:

  • "Generate a video of a wolf howling at the moon" → text_to_video --prompt "..."

List available models:

  • "What image/video models do you have?"
  • "List the available endpoints"
  • "Show me what RunPod models are available" → Run list_endpoints and summarize the output in plain language for the user

Add a new endpoint:


Capabilities & Cost

TaskCommandCostTime
Text → Imagegenerate_image~$0.005/image3–8s
Edit image(s)edit_image~$0.005/image5–15s
Image → Videoimage_to_video$0.03–$0.90/clip30–120s
Text → Videotext_to_video$0.04–$1.22/clip30–120s
Any endpointcall_endpointvariesvaries

The built-in commands use default endpoints. For more models (Nano Banana Pro, FLUX, Sora 2, Kling, TTS, etc.) use call_endpoint with any RunPod public endpoint ID.

Endpoint Registry

All known public endpoints are in scripts/endpoints.json. List them:

$SKILL_DIR/run.sh list_endpoints

Call Any Endpoint

$SKILL_DIR/run.sh call_endpoint \
  --endpoint <ENDPOINT_ID> \
  [--prompt "TEXT"] \
  [--image PATH_OR_URL] \
  [--audio PATH_OR_URL] \
  [--duration 5] \
  [--aspect-ratio 16:9] \
  [--input '{"key": "value"}']   # full JSON override

Examples:

# Nano Banana Pro image generation
$SKILL_DIR/run.sh call_endpoint --endpoint nano-banana-pro --prompt "a golden retriever in space"

# Nano Banana Pro image editing
$SKILL_DIR/run.sh call_endpoint --endpoint nano-banana-pro --prompt "make it nighttime" --image photo.jpg

# Sora 2 Pro video from image
$SKILL_DIR/run.sh call_endpoint --endpoint sora-2-pro-i2v --image photo.jpg --prompt "camera slowly pulls back" --duration 5

# Kokoro TTS
$SKILL_DIR/run.sh call_endpoint --endpoint kokoro-tts --text "Hello world"

# FLUX Schnell
$SKILL_DIR/run.sh call_endpoint --endpoint flux-schnell --prompt "cyberpunk city" --input '{"width":1024,"height":1024}'

Adding New Endpoints

When the user asks to use an endpoint not in the registry, or the runpod skill reveals a new one:

  1. Call it directly with --endpoint <id> — no registry entry needed
  2. Optionally add it to scripts/endpoints.json for future sessions

With runpod skill: Use the runpod skill to browse/discover endpoint IDs on the RunPod hub, then pass that ID to call_endpoint here.

Generate Image

$SKILL_DIR/run.sh generate_image \
  --prompt "PROMPT" \
  [--aspect-ratio 1:1|16:9|9:16|4:3|3:4] \
  [--seed 42]

Edit Image

$SKILL_DIR/run.sh edit_image \
  --images PATH_OR_URL [PATH_OR_URL ...] \
  --prompt "EDIT INSTRUCTION" \
  [--aspect-ratio 1:1] \
  [--seed 42]
  • Accepts 1–5 images (local paths or URLs)
  • Local files are auto-uploaded via imgbb (requires /imgbb/apiKey in secrets.json)

Animate Image → Video

$SKILL_DIR/run.sh image_to_video \
  --image PATH_OR_URL \
  --prompt "MOTION DESCRIPTION" \
  [--model wan25|kling|seedance] \
  [--duration 5|10] \
  [--negative-prompt "TEXT"]

Models:

  • wan25 (default) — WAN 2.5, ~$0.026/5s
  • kling — Kling v2.1 Pro, $0.45/5s (highest quality)
  • seedance — Seedance 1.0 Pro, ~$0.12/5s

Generate Video from Text

$SKILL_DIR/run.sh text_to_video \
  --prompt "VIDEO DESCRIPTION" \
  [--model wan26|seedance] \
  [--duration 5|10|15] \
  [--size 1920x1080] \
  [--negative-prompt "TEXT"]

Models:

  • wan26 (default) — WAN 2.6, ~$0.04/5s
  • seedance — Seedance 1.0 Pro, ~$0.12/5s

Defaults

  • Delete after send — always delete the local file after successful delivery. Only keep if the user explicitly asks ("keep it", "save it", "--keep").
  • Captions — keep them short and natural. Do NOT include render time or cost unless the user asks. Example: 🦊 Fox under the aurora not 🦊 Fox — 105s render (~$0.026).

Delivering Media to the User

After generating an image or video, always deliver it to the user via their active channel.

The Problem

The message tool with a local media path may fail in sandboxed agent modes due to SecretRef resolution not being available for media sends. This is a known OpenClaw limitation.

The Solution: Use curl + Telegram Bot API directly

Read the bot token from secrets and send via curl — this always works regardless of sandbox mode:

TOKEN=$(cat ~/.openclaw/secrets.json | python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('telegram',{}).get('botToken',''))")

# Send photo
curl -s \
  -F "chat_id=CHAT_ID" \
  -F "photo=@$HOME/.openclaw/workspace/runpod-media/OUTPUT_FILE.jpg" \
  -F "caption=YOUR CAPTION" \
  "https://api.telegram.org/bot${TOKEN}/sendPhoto"

# Send video (.mp4)
curl -s \
  -F "chat_id=CHAT_ID" \
  -F "video=@$HOME/.openclaw/workspace/runpod-media/OUTPUT_FILE.mp4" \
  -F "caption=YOUR CAPTION" \
  "https://api.telegram.org/bot${TOKEN}/sendVideo"

Where to find CHAT_ID: Use the chat_id from the inbound message metadata (e.g. telegram:1231438498 → use 1231438498).

Delivery Checklist

  1. ✅ Generate the file (note the saved path from script output)
  2. ✅ Send via message tool with a short, natural caption (no cost/time unless asked)
  3. Delete the local file after successful send: rm <path>
  4. ⏭️ Skip deletion only if user said "keep it" / "save it" / passed --keep

Why not message tool?

Try it first — if it works, great. If it returns a SecretRef error, fall back to the curl method above.

Notes

  • Video generation takes 30–120 seconds — inform the user to wait
  • Output files saved to ~/.openclaw/workspace/runpod-media/ — accessible in both sandboxed and elevated agent modes
  • Shared utilities in scripts/_utils.py — do not call directly

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

SERP Outline Extractor

Turn a target keyword or query into a search-informed content outline with likely subtopics, questions, and comparison angles. Useful for SEO briefs, blog pl...

Registry SourceRecently Updated
General

Multi-Model Response Comparator

Compare responses from multiple AI models for the same task and summarize differences in quality, style, speed, and likely cost. Best for model selection, ev...

Registry SourceRecently Updated
General

API Pricing Comparator

Compare AI API or model pricing across providers and produce a structured summary for product pages, blog posts, or buyer guides. Works with OpenAI-compatibl...

Registry SourceRecently Updated