video-generator

Automated text-to-video pipeline with multi-provider TTS/ASR support (OpenAI, Azure, Aliyun, Tencent)

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn this skill:

Install skill "video-generator" with this command: npx skills add zhenstaff/video-generator

🎬 Video Generator Skill

Automated text-to-video generation system that transforms text scripts into professional short videos with AI-powered voiceover, precise timing, and cyber-wireframe visuals.

Cost: ~$0.003 per 15-second video | License: MIT | Package: openclaw-video-generator


📦 Package Information

Property       Value
npm Package    openclaw-video-generator
Version        1.6.2
Repository     github.com/ZhenRobotics/openclaw-video-generator
Commit Hash    6279034
License        MIT

Verification:

npm info openclaw-video-generator version repository.url
# Expected: 1.6.2 and https://github.com/ZhenRobotics/openclaw-video-generator

🔐 Provider Setup (Choose ONE)

This tool supports four interchangeable TTS/ASR providers. You only need ONE configured:

Option 1: OpenAI (Recommended)

export OPENAI_API_KEY="sk-..."
  • Pros: Best quality, simple setup
  • Cost: ~$0.003 per 15s video

Option 2: Azure

export AZURE_SPEECH_KEY="..."
export AZURE_SPEECH_REGION="eastasia"
  • Pros: Enterprise reliability
  • Cost: Similar to OpenAI

Option 3: Aliyun (阿里云)

export ALIYUN_ACCESS_KEY_ID="..."
export ALIYUN_ACCESS_KEY_SECRET="..."
export ALIYUN_APP_KEY="..."
  • Pros: China connectivity, Chinese voices
  • Cost: ~¥0.02 per 15s video

Option 4: Tencent (腾讯云)

export TENCENT_SECRET_ID="..."
export TENCENT_SECRET_KEY="..."
export TENCENT_APP_ID="..."
  • Pros: China connectivity
  • Cost: ~¥0.02 per 15s video

Why multiple providers? Fallback support for network restrictions, regional preferences, and cost optimization.
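As a quick sanity check before running anything, the provider choice above can be sketched as a small shell function that reports the first provider whose credentials are present in the environment. This is an illustrative sketch based on the variable names listed in the setup section; the package's own detection logic may differ.

```shell
# Report the first TTS/ASR provider with credentials in the environment.
detect_provider() {
  if [ -n "$OPENAI_API_KEY" ]; then
    echo openai
  elif [ -n "$AZURE_SPEECH_KEY" ] && [ -n "$AZURE_SPEECH_REGION" ]; then
    echo azure
  elif [ -n "$ALIYUN_ACCESS_KEY_ID" ] && [ -n "$ALIYUN_ACCESS_KEY_SECRET" ] \
       && [ -n "$ALIYUN_APP_KEY" ]; then
    echo aliyun
  elif [ -n "$TENCENT_SECRET_ID" ] && [ -n "$TENCENT_SECRET_KEY" ] \
       && [ -n "$TENCENT_APP_ID" ]; then
    echo tencent
  else
    echo none
  fi
}

detect_provider
```

If the output is `none`, revisit the export commands for your chosen provider before continuing.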


🚀 Quick Start

Prerequisites

node --version  # Need >= 18
npm --version
ffmpeg -version
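The checks above can be made strict so a missing prerequisite fails fast. A minimal sketch; the `check_node_major` helper is hypothetical, not part of the package:

```shell
# Hypothetical helper: succeed only when a `node --version` string is >= v18.
check_node_major() {
  major=$(printf '%s' "$1" | sed 's/^v//' | cut -d. -f1)
  [ -n "$major" ] && [ "$major" -ge 18 ] 2>/dev/null
}

if check_node_major "$(node --version 2>/dev/null)" \
   && command -v ffmpeg >/dev/null 2>&1; then
  echo "prerequisites OK"
else
  echo "need Node.js >= 18 and ffmpeg on PATH" >&2
fi
```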

Installation

Option 1: npm Global Install

npm install -g openclaw-video-generator@1.6.2
export OPENAI_API_KEY="sk-..."  # Or add to ~/.bashrc
openclaw-video-generator --version

Option 2: From Source

git clone https://github.com/ZhenRobotics/openclaw-video-generator.git
cd openclaw-video-generator
npm install

# Configure provider
cp .env.example .env
nano .env  # Add your API key
chmod 600 .env

First Video

cd ~/openclaw-video-generator
cat > test.txt << 'EOF'
AI makes development easier
Saving time and boosting efficiency
EOF

./scripts/script-to-video.sh test.txt --voice nova --speed 1.15
# Output: out/test.mp4

💻 Agent Usage

When to Use

Auto-trigger when the user mentions: "video", "generate video", "create video", or "生成视频" (Chinese for "generate video")

Standard Command

cd ~/openclaw-video-generator && \
./scripts/script-to-video.sh <script-file> \
  --voice nova \
  --speed 1.15

With Background Video

cd ~/openclaw-video-generator && \
./scripts/script-to-video.sh <script-file> \
  --voice nova \
  --bg-video "backgrounds/tech.mp4" \
  --bg-opacity 0.6

Example Flow

User: "Generate video: AI makes development easier"

Agent:

# 1. Check project
ls ~/openclaw-video-generator || echo "Not installed"

# 2. Create script
cat > ~/openclaw-video-generator/scripts/user-script.txt << 'EOF'
AI makes development easier
EOF

# 3. Generate
cd ~/openclaw-video-generator && \
./scripts/script-to-video.sh scripts/user-script.txt

# 4. Show result
echo "Video: ~/openclaw-video-generator/out/user-script.mp4"

Guidelines

Do:

  • Verify project exists before running
  • Check .env configuration
  • Show output file location

Don't:

  • Clone without user confirmation
  • Hardcode API keys in commands
  • Create new Remotion projects

🎯 Core Features

  • Multi-Provider TTS: OpenAI, Azure, Aliyun, Tencent with auto-fallback
  • Timestamp Extraction: Precise speech-to-text segmentation
  • Scene Detection: 6 intelligent scene types with auto-styling
  • Video Rendering: Remotion with cyber-wireframe aesthetics
  • Background Videos: Custom backgrounds with opacity control
  • Local Processing: Video rendering happens on your machine

⚙️ Configuration

TTS Voices

OpenAI:

  • nova (recommended), alloy, echo, shimmer

Azure:

  • zh-CN-XiaoxiaoNeural, zh-CN-YunxiNeural

Speech Speed

Range: 0.25 - 4.0 | Recommended: 1.15
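Values outside that range can be clamped before invoking the script. A sketch using awk; the `clamp_speed` helper is illustrative, not something the tool ships:

```shell
# Clamp a requested speed into the supported 0.25-4.0 range.
clamp_speed() {
  awk -v s="$1" 'BEGIN {
    if (s < 0.25) s = 0.25
    if (s > 4.0)  s = 4.0
    print s
  }'
}

clamp_speed 1.15   # prints 1.15
```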

Background Video

  • --bg-video <path> - Video file
  • --bg-opacity <0-1> - Transparency
  • --bg-overlay <rgba> - Text overlay

Recommended:

Use Case         Opacity    Overlay
Text-focused     0.3-0.4    rgba(10,10,15,0.6)
Balanced         0.5-0.6    rgba(10,10,15,0.4)
Visual-focused   0.7-1.0    rgba(10,10,15,0.25)

📊 Video Specs

  • Resolution: 1080 x 1920 (vertical)
  • Frame Rate: 30 fps
  • Format: MP4 (H.264 + AAC)
  • Style: Cyber-wireframe with neon colors
  • Duration: Auto-calculated

🎨 Scene Types

Type       Effect            Trigger
title      Glitch + scale    First segment
emphasis   Pop-up zoom       Numbers/percentages
pain       Shake + warning   Problems mentioned
content    Fade-in           Regular text
circle     Rotating ring     Listed points
end        Slide-up          Last segment

💰 Cost

Per 15-second video: ~$0.003 (< 1 cent)

  • TTS: ~$0.001
  • Whisper: ~$0.0015
  • Rendering: Free (local)
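The per-video figure is consistent with typical OpenAI pricing. The rates below are assumptions, not taken from this project: roughly $15 per 1M TTS characters and $0.006 per Whisper audio minute, with a 15-second script estimated at ~70 characters.

```shell
awk 'BEGIN {
  tts     = 70 / 1000000 * 15    # ~70 chars at ~$15 per 1M characters
  whisper = 0.25 * 0.006         # 15 s = 0.25 min at $0.006 per minute
  printf "estimated total: $%.4f per 15s video\n", tts + whisper
}'
```

That lands around a quarter of a cent, matching the ~$0.003 figure above.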

🔧 Troubleshooting

Project Not Found

ls ~/openclaw-video-generator || \
git clone https://github.com/ZhenRobotics/openclaw-video-generator.git ~/openclaw-video-generator && \
cd ~/openclaw-video-generator && npm install

API Key Error

# Verify .env
cat ~/openclaw-video-generator/.env

# Create if missing
cd ~/openclaw-video-generator
echo 'OPENAI_API_KEY="sk-..."' > .env
chmod 600 .env

Provider Test

cd ~/openclaw-video-generator && ./scripts/test-providers.sh

🔒 Privacy

Local Processing:

  • Video rendering
  • Scene orchestration
  • File management

Cloud Processing (via configured provider):

  • Text-to-Speech (text sent to API)
  • Speech recognition (audio sent to API)

API keys are stored in .env file (600 permissions, never committed to git).
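The .env hygiene described above can be sketched as follows. Run inside the project directory; the key name depends on your chosen provider, and the value is a placeholder to edit by hand:

```shell
# Create .env with restrictive permissions and keep it out of git.
umask 077                                   # new files default to owner-only
printf 'OPENAI_API_KEY="sk-..."\n' > .env   # placeholder key, edit manually
chmod 600 .env
grep -qxF '.env' .gitignore 2>/dev/null || echo '.env' >> .gitignore
```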




📊 Tech Stack

Remotion · OpenAI · Azure · Aliyun · Tencent · TypeScript · Node.js · FFmpeg


🆕 Version History

v1.6.2 (2026-03-25) - Current

  • Chinese TTS integration (Aliyun)
  • Dual subtitle styles
  • Medical content examples

v1.6.0 (2026-03-18)

  • Premium styles system
  • Poster generator
  • Design tokens

v1.2.0 (2026-03-07)

  • Background video support
  • Multi-provider architecture
  • Auto-fallback

v1.0.0 (2026-03-03)

  • Initial release

License: MIT | Author: @ZhenStaff | Support: GitHub Issues

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

Seedance + Waoo Short-Video Pipeline

Automated short-video workflow (story-to-video pipeline): from script/storyboard to generation, subtitle ASR, TTS, and merged delivery, with multi-provider routing across Seedance / Vidu / MiniMax.

Automation

Create Video

Create videos from a text prompt using HeyGen's Video Agent (POST /v3/video-agents). The default for most video requests — AI handles script, avatar, visuals...

Automation

Video Agent (Deprecated)

[DEPRECATED — uses outdated v1/v2 endpoints] Use `create-video` for prompt-based video generation (v3 Video Agent) or `avatar-video` for precise avatar/scene...

General

Omnicast

A local multi-modal podcast pipeline. Ingests media, drafts scripts, synthesizes audio, renders cover art, and uploads to YouTube.
