Vidu Video and Image Generation Skill
Generate AI videos and images with Vidu via vidu-cli — text-to-image, text-to-video, image-to-video, start-end frame, reference-based generation, and material elements, up to 1080p/2K/4K.
Execution model: use vidu CLI
All execution is done via the vidu-cli CLI tool. Parameters are CLI flags (not raw JSON bodies).
Environment variables
- VIDU_TOKEN (required) — Vidu API token
- VIDU_BASE_URL (optional) — Default https://service.vidu.cn (mainland China); use https://service.vidu.com for overseas
- VIDU_DEBUG (optional) — Set to 1 to print full response body to stderr for debugging
Stdout contract
- Every command prints one line of JSON to stdout.
- Success: {"ok": true, "trace_id": "...", ...} — exit code 0
- Failure: {"ok": false, "error": {"type": "...", "http_status": ..., "code": "...", "message": "..."}} — exit code 1
- trace_id appears on API-backed responses for support/debugging.
- CRITICAL: Never guess why an error happened. Copy fields from error exactly. Full shapes and edge cases: references/parameters.md.
Error type values
- http_error — API 4xx/5xx (http_status, code, message)
- network_error — Connection failure or timeout
- parse_error — Response is not valid JSON
- client_error — Local issues (missing token, bad path, validation)
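The contract above can be consumed with very little code. The following is a minimal sketch, assuming the caller already holds the single stdout line as a string; the helper names `parse_cli_stdout` and `describe_error` are hypothetical and not part of vidu-cli.

```python
import json
from typing import Any, Dict, Tuple


def parse_cli_stdout(line: str) -> Tuple[bool, Dict[str, Any]]:
    """Parse the single JSON line a vidu-cli command prints to stdout.

    Returns (True, payload) on success and (False, error) on failure, where
    error is the CLI's own "error" object, passed through unchanged.
    """
    payload = json.loads(line)
    if payload.get("ok") is True:
        return True, payload
    # Failure: hand back the error object verbatim; never reinterpret it.
    return False, payload.get("error", {})


def describe_error(error: Dict[str, Any]) -> str:
    """Render the documented error fields exactly as the CLI reported them."""
    fields = ("type", "http_status", "code", "message")
    return ", ".join(f"{name}={error[name]!r}" for name in fields if name in error)
```

Fields that are absent (for example http_status on a network_error) are simply skipped rather than filled in, which keeps the report free of guessed values.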
Main commands
| Command | Purpose |
|---|---|
| vidu-cli upload <image_path> | Upload image → upload_id, ssupload_uri |
| vidu-cli task submit --type ... --prompt ... [options] | Submit task → task_id. --image: local path, URL, or ssupload:?id=... (auto-upload). |
| vidu-cli task get <task_id> [--output/-o <dir>] | Query task → state, type, model; use --output to download media on success |
| vidu-cli task compose --timeline <json> [--width N --height N] | Compose video from timeline → task_id. Query with task get. MUST read references/compose.md before building the timeline JSON — do not guess the schema. |
| vidu-cli task lip-sync --video <path> --text <text> [options] | Lip-sync with text-to-speech → task_id. Supports --schedule-mode (auto-detected if omitted). |
| vidu-cli task lip-sync --video <path> --audio <path> | Lip-sync with audio file → task_id. Supports --schedule-mode (auto-detected if omitted). |
| vidu-cli task lip-sync-voices | List available lip-sync voices (~86, Chinese/English/Cantonese/Cartoon etc.) |
| vidu-cli task tts --prompt ... --voice-id ... | Text-to-speech → task_id. Supports --schedule-mode (auto-detected if omitted). |
| vidu-cli task tts-voices | List available TTS voices (300+, 20+ languages) |
| vidu-cli task cost --type ... --model-version ... --duration ... | Query task credit cost (estimate before submitting) |
| vidu-cli quota pass | Query claw-pass daily quota status |
| vidu-cli quota credit | Query user credit balance |
| vidu-cli element create --name ... --image ... [--description ...] [--style ...] | Create reference element (check → preprocess → create). Returns id, version. |
| vidu-cli element check --name ... | Check name availability |
| vidu-cli element list [--keyword kw] | List personal elements |
| vidu-cli element search --keyword kw | Search community elements |
Smart image handling (task submit --image, element create --image)
- Local path → auto-upload (auto-compress when file is larger than 10MB)
- http(s): URL → download then upload
- ssupload:?id=... → use as-is
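To predict which path an --image value will take, the three rules above can be mirrored locally. This is an illustrative sketch only, not the CLI's actual implementation: `classify_image_source` and `needs_compression` are hypothetical helpers, and treating 10MB as 10 × 1024 × 1024 bytes is an assumption.

```python
# Assumption: "10MB" is read here as 10 MiB; the CLI's exact threshold may differ.
TEN_MB = 10 * 1024 * 1024


def classify_image_source(value: str) -> str:
    """Return 'ssupload', 'url', or 'local' for an --image argument."""
    lowered = value.lower()
    if lowered.startswith("ssupload:"):
        return "ssupload"  # already uploaded, used as-is
    if lowered.startswith(("http://", "https://")):
        return "url"       # downloaded, then uploaded
    return "local"         # uploaded from disk


def needs_compression(size_bytes: int) -> bool:
    """True when a local file is larger than the assumed 10MB threshold."""
    return size_bytes > TEN_MB
```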
Key Capabilities
- text-to-image — Text-only image generation
- text-to-video — Text-only video generation
- image-to-video — One image + text → video
- head-tail-image-to-video — Start + end frames + text
- reference-to-image — Images + materials: 1–7 total; text prompt required; can be images-only, materials-only, or mixed; images-only needs no element create
- reference-to-video — Same rule: 1–7 total; text prompt required
- lip-sync — Drive video mouth movement with text-to-speech or audio file
- text-to-speech — Convert text to speech audio via task tts
- video-compose — Compose multi-track timeline (video/audio/subtitle/effect) into a single exported video via task compose
- create-references — element create (single command)
- search-community-references — element search
- query-task — task get [--output <dir>]
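The reference-to-image and reference-to-video rule (1–7 inputs in total, prompt required) is easy to check before submitting. A minimal sketch under that reading of the rule; `validate_reference_inputs` is a hypothetical helper, and the authoritative validation remains the one described in references/parameters.md.

```python
from typing import List, Sequence


def validate_reference_inputs(
    prompt: str, images: Sequence[str], materials: Sequence[str]
) -> List[str]:
    """Return a list of problems; an empty list means the inputs look valid."""
    problems: List[str] = []
    if not prompt.strip():
        problems.append("text prompt is required")
    total = len(images) + len(materials)
    # Images-only, materials-only, and mixed are all fine; only the sum matters.
    if not 1 <= total <= 7:
        problems.append(f"images + materials must total 1-7, got {total}")
    return problems
```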
Setup
- npm install -g vidu-cli@latest (requires Node.js >=14; postinstall auto-downloads the platform binary)
- Obtain VIDU_TOKEN (e.g. Vidu console).
- Set VIDU_TOKEN environment variable (required); set VIDU_BASE_URL if not using default region.
- Verify: vidu-cli task submit --help
Data usage and privacy (summary)
Content you send (prompts, images, task settings) goes to Vidu’s API. Confirm this meets your privacy and IP needs. Prefer least-privilege tokens for testing. Terms: https://www.vidu.com/terms (overseas), https://www.vidu.cn/terms (mainland China).
Async workflow (short)
- Vidu generation is asynchronous: task submit → task_id → poll task get <task_id> until terminal state.
- Model nicknames: Q1 → 3.0, Q2 → 3.1, Q3 → 3.2. Additional variants exist: 3.1_pro, 3.2_fast_m, 3.2_pro_m — see references/parameters.md for the complete per-task model version list.
- Task-type summaries, task support matrix, copy-paste CLI examples, prompt tips, and element create/list/search details are in references/parameters.md.
- Task lifecycle, retries, and polling guidance: references/errors_and_retry.md.
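When a user names a model by nickname, it has to be translated before it reaches --model-version. The sketch below applies the Q1/Q2/Q3 mapping above and assembles the argument list for the task cost command from the main commands table. `resolve_model_version` and `build_cost_argv` are hypothetical helpers, and the --type and --duration values in any call are placeholders to be taken from references/parameters.md.

```python
from typing import List

# Nickname -> model version, as listed in the async workflow notes.
MODEL_NICKNAMES = {"Q1": "3.0", "Q2": "3.1", "Q3": "3.2"}


def resolve_model_version(name: str) -> str:
    """Map a nickname to its version; pass explicit versions through unchanged."""
    cleaned = name.strip()
    return MODEL_NICKNAMES.get(cleaned.upper(), cleaned)


def build_cost_argv(task_type: str, model: str, duration: int) -> List[str]:
    """Build argv for `vidu-cli task cost` (suitable for subprocess.run)."""
    return [
        "vidu-cli", "task", "cost",
        "--type", task_type,
        "--model-version", resolve_model_version(model),
        "--duration", str(duration),
    ]
```

Passing the list to `subprocess.run` instead of joining it into a shell string avoids quoting problems with user-supplied values.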
Implementation guide
For task submit (generation tasks)
- Pick capability → map to --type and options using references/parameters.md (matrix + validation).
- Prepare inputs: for reference2image / character2video, --image and/or --material so combined count is 1–7; optional [@name] in prompt per references/parameters.md.
- vidu-cli task submit ... → store task_id and trace_id.
  - schedule-mode auto-detection: if --schedule-mode is omitted, CLI queries claw-pass status and uses claw_pass when user has an active pass, otherwise normal. If submit fails with ClawPassExplicitModeRequired, tell the user their daily claw-pass quota is exhausted. Do not retry automatically.
- vidu-cli task get <task_id> until success or failed; use --output <dir> to download media on success.
- On success return downloaded_files (if --output used) or prompt user to re-run with --output; on task failure return err_code / err_msg; on CLI ok: false return error fields verbatim.
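The polling step can be sketched as a small loop. This is a minimal illustration, not the recommended schedule: the 5-second interval and 120-attempt cap are placeholder assumptions (see references/errors_and_retry.md for actual guidance), `poll_until_terminal` is a hypothetical helper, and it assumes `state` sits at the top level of the task get JSON. The `fetch` callable stands in for whatever runs vidu-cli task get <task_id> and parses its stdout line.

```python
import time
from typing import Any, Callable, Dict

# Terminal states named in this guide.
TERMINAL_STATES = {"success", "failed"}


def poll_until_terminal(
    fetch: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval_s: float = 5.0,    # assumed interval, not taken from the docs
    max_attempts: int = 120,    # assumed cap, not taken from the docs
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch(task_id) until the task reaches a terminal state."""
    for attempt in range(max_attempts):
        result = fetch(task_id)
        if result.get("ok") is not True:
            # CLI-level failure: stop and let the caller report `error` verbatim.
            return result
        if result.get("state") in TERMINAL_STATES:
            return result
        if attempt + 1 < max_attempts:
            sleep(interval_s)
    raise TimeoutError(f"task {task_id} not terminal after {max_attempts} polls")
```

Injecting `fetch` and `sleep` keeps the loop testable without a network call or a real delay.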
For task compose (video composition)
CRITICAL: Before constructing the --timeline JSON, you MUST read references/compose.md first. The timeline has a specific JSON schema with exact field names, nesting structure, and media_url rules. Do NOT guess the structure — always refer to compose.md for the complete schema, supported fields, and examples.
- Read references/compose.md to understand the timeline JSON schema, media_url rules, and limits.
- Build the timeline JSON following the exact structure: video_tracks[].video_track_clips[], audio_tracks[].audio_track_clips[], subtitle_tracks[].subtitle_track_clips[], effect_tracks[].effect_track_items[].
- For media_url: use ssupload:?id=xxx, http URL, or local file path (auto-uploaded by CLI).
- For file_url (subtitles): use ssupload:?id=xxx, http URL, or local .srt file path.
- vidu-cli task compose --timeline <file_or_json> [--width N --height N] → returns task_id.
- vidu-cli task get <task_id> to poll status, same as other tasks.
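A cheap pre-flight check can catch a misnamed nesting key before the timeline is submitted. The sketch below verifies only the four track/item key pairs named above; it deliberately says nothing about clip fields, which must come from references/compose.md. `check_timeline_nesting` is a hypothetical helper, and treating every track type as optional is an assumption.

```python
from typing import Any, Dict, List

# Track array -> per-track item array, as named in this guide.
TRACK_LAYOUT = {
    "video_tracks": "video_track_clips",
    "audio_tracks": "audio_track_clips",
    "subtitle_tracks": "subtitle_track_clips",
    "effect_tracks": "effect_track_items",
}


def check_timeline_nesting(timeline: Dict[str, Any]) -> List[str]:
    """Return nesting problems; clip-level fields are NOT validated here."""
    problems: List[str] = []
    for tracks_key, items_key in TRACK_LAYOUT.items():
        tracks = timeline.get(tracks_key, [])  # assumption: absent means unused
        if not isinstance(tracks, list):
            problems.append(f"{tracks_key} must be a list")
            continue
        for index, track in enumerate(tracks):
            if not isinstance(track, dict) or not isinstance(track.get(items_key), list):
                problems.append(f"{tracks_key}[{index}].{items_key} must be a list")
    return problems
```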
Output to the user
- After submit: return task_id and trace_id; state that processing is in progress.
- After query: if state is success, return downloaded_files (if --output was used) or the task_id with a note to re-run with --output <dir> to download; if failed, return err_code and err_msg exactly (note: response may still have ok: true while state is failed).
- On CLI failure (ok: false): report error.type, http_status, code, message exactly — do not infer causes.
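These reporting rules can be sketched as a single function over a parsed task get result. It is an illustration under assumptions: `summarize_task_result` is a hypothetical helper, the fields `state`, `err_code`, `err_msg`, `task_id`, and `downloaded_files` are assumed to sit at the top level of the JSON, and `downloaded_files` is assumed to be a list of paths. Note that `ok` is checked before `state`, since a failed task can still arrive with ok: true.

```python
import json
from typing import Any, Dict


def summarize_task_result(result: Dict[str, Any]) -> str:
    """Turn a parsed `task get` response into a message for the user."""
    if result.get("ok") is not True:
        # CLI failure: echo the error object as-is, with no interpretation.
        return "CLI failure: " + json.dumps(result.get("error", {}), ensure_ascii=False)
    state = result.get("state")
    if state == "failed":
        # ok may be true here; the task itself failed.
        return f"Task failed: err_code={result.get('err_code')!r}, err_msg={result.get('err_msg')!r}"
    if state == "success":
        files = result.get("downloaded_files")
        if files:
            return "Downloaded: " + ", ".join(files)
        return f"Task {result.get('task_id')} succeeded; re-run task get with --output <dir> to download."
    return f"Task {result.get('task_id')} is still in progress (state={state!r})."
```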
References (bundled)
| File | Contents |
|---|---|
| references/parameters.md | Task matrix, CLI flags, examples, prompt tips, validation |
| references/errors_and_retry.md | States, retries, polling |
| references/compose.md | Timeline schema, media_url rules, clip compose examples |
Fallback (no Node.js / npm)
If node, npm, or vidu-cli cannot be installed, this skill cannot run. It requires the latest vidu-cli (npm install -g vidu-cli@latest, Node.js >=14); point users to references/parameters.md for parameter details.