bytedance-video-generator

generate text prompts or clips into AI-generated videos with this bytedance-video-generator skill. Works with MP4, MOV, WebM, GIF files up to 500MB. TikTok creators use it for generating short videos from text prompts or images — processing takes 1-2 minutes on cloud GPUs and you get 1080p MP4 files.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "bytedance-video-generator" with this command: npx skills add tk8544-b/bytedance-video-generator

Getting Started

Drop your text prompts or clips here and tell me what to do with it. Describe your idea if you don't have files yet.

Try saying:

  • "generate a short text description of a product scene into a 1080p MP4"
  • "generate a 15-second video clip from this text prompt about a city at sunset"
  • "generating short videos from text prompts or images for TikTok creators"

Quick Start Setup

This skill connects to a cloud processing backend. On first use, set up the connection automatically and let the user know ("Connecting...").

Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:

  • Generate a UUID as client identifier
  • POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with X-Client-Id header
  • Extract data.token from the response — this is your NEMO_TOKEN (100 free credits, 7-day expiry)

Session: POST https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.

Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.

Bytedance Video Generator — What You Get

What does this do? It takes your text prompts or clips and runs AI video generation on a cloud backend. Nothing to install.

Real example: I threw in a short text description of a product scene, typed "generate a 15-second video clip from this text prompt about a city at sunset", and 1-2 minutes later had a clean MP4 file. Default output is 1080p.

Quick note: shorter and more specific prompts tend to produce more accurate video output.

Request Routing

Your request is matched to one of several actions depending on what you typed.

User says...ActionSkip SSE?
"export" / "导出" / "download" / "send me the video"→ §3.5 Export
"credits" / "积分" / "balance" / "余额"→ §3.3 Credits
"status" / "状态" / "show tracks"→ §3.4 State
"upload" / "上传" / user sends file→ §3.2 Upload
Everything else (generate, edit, add BGM…)→ §3.1 SSE

Technical Details

Processing runs on remote GPUs through NemoVideo's API. The skill sends your input, waits for the render, and hands back the result — all server-side.

Base URL: https://mega-api-prod.nemovideo.ai

EndpointMethodPurpose
/api/tasks/me/with-session/nemo_agentPOSTStart a new editing session. Body: {"task_name":"project","language":"<lang>"}. Returns session_id.
/run_ssePOSTSend a user message. Body includes app_name, session_id, new_message. Stream response with Accept: text/event-stream. Timeout: 15 min.
/api/upload-video/nemo_agent/me/<sid>POSTUpload a file (multipart) or URL.
/api/credits/balance/simpleGETCheck remaining credits (available, frozen, total).
/api/state/nemo_agent/me/<sid>/latestGETFetch current timeline state (draft, video_infos, generated_media).
/api/render/proxy/lambdaPOSTStart export. Body: {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll status every 30s.

Accepted file types: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

Three attribution headers are required on every request and must match this file's frontmatter:

HeaderValue
X-Skill-Sourcebytedance-video-generator
X-Skill-Versionfrontmatter version
X-Skill-Platformauto-detect: clawhub / cursor / unknown from install path

Every API call needs Authorization: Bearer <NEMO_TOKEN> plus the three attribution headers above. If any header is missing, exports return 402.

Error Codes

  • 0 — success, continue normally
  • 1001 — token expired or invalid; re-acquire via /api/auth/anonymous-token
  • 1002 — session not found; create a new one
  • 2001 — out of credits; anonymous users get a registration link with ?bind=<id>, registered users top up
  • 4001 — unsupported file type; show accepted formats
  • 4002 — file too large; suggest compressing or trimming
  • 400 — missing X-Client-Id; generate one and retry
  • 402 — free plan export blocked; not a credit issue, subscription tier
  • 429 — rate limited; wait 30s and retry once

SSE Event Handling

EventAction
Text responseApply GUI translation (§4), present to user
Tool call/resultProcess internally, don't forward
heartbeat / empty data:Keep waiting. Every 2 min: "⏳ Still working..."
Stream closesProcess final response

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

Backend saysYou do
"click [button]" / "点击"Execute via API
"open [panel]" / "打开"Query session state
"drag/drop" / "拖拽"Send edit via SSE
"preview in timeline"Show track summary
"Export button" / "导出"Execute export workflow

Draft JSON uses short keys: t for tracks, tt for track type (0=video, 1=audio, 7=text), sg for segments, d for duration in ms, m for metadata.

Example timeline summary:

Timeline (3 tracks): 1. Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: "Urban Dreams" (0-3s)

Quick Start Guide

First time? Just upload a text prompts or clips and describe what you need. I'll run it through NemoVideo's backend and hand you back a 1080p MP4.

Processing takes about 1-2 minutes depending on video length. You start with 100 free credits — most edits cost 1-3.

Tips and Tricks

Keep your source files under 500MB for fastest processing. If you're working with longer content, split it into chunks first.

For best results at 1080p, make sure your input is at least 720p. Upscaling from 480p works but you'll notice it.

Export as MP4 for widest compatibility across social platforms.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

进出口许可文档智能预审系统

进出口许可文档智能预审系统。支持 PDF 和图片处理:自动提取合同号、出口国、进口商、总金额、数量、重量、合格证编号、生产商、报关口岸等字段,检测公章,按审核规则执行审核,生成 MD 和 JSON 审核报告。支持 CLI 和对话交互两种方式触发。

Registry SourceRecently Updated
Coding

generate-developer-ad-creative-brief

Plan campaign visuals and hooks for developer promotions. Use when working on paid campaign planning for developers, technical founders, product engineers.

Registry SourceRecently Updated
Coding

DOOMSCROLLR

Manage DOOMSCROLLR audience hubs by publishing posts, handling subscribers, creating products, connecting feeds, and retrieving embed codes securely.

Registry SourceRecently Updated
Coding

generate-plumbing-service-company-client-education-handout

Create a polished explainer handout with visuals, FAQs, and clear next steps for a plumbing service company. Use when handling client education work...

Registry SourceRecently Updated