video-subtitle-generator

Generate and translate video subtitles using WhisperX and LLM translation. Use when processing video files to create .srt subtitle files. Supports multilingual transcription (auto-detect source language), translation to any target language, and bilingual subtitle generation.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy the command below and send it to your AI assistant to install this skill:

Install skill "video-subtitle-generator" with this command: npx skills add Jianhua-Cui/video-subtitle-generator

Video Subtitle Generator

Multilingual video subtitle generation and translation toolkit built on WhisperX.

Features

  • Speech transcription: Extract audio from video and transcribe it into subtitles with automatic source language detection
  • Multilingual translation: Translate subtitles from any source language into a configurable target language
  • Bilingual subtitles: Generate source + target bilingual subtitles

Prerequisites

  • Python 3.9+
  • ffmpeg (required by WhisperX for audio extraction)
# macOS
brew install ffmpeg

# Ubuntu / Debian
sudo apt install ffmpeg

# Windows (Chocolatey)
choco install ffmpeg

# Windows (Scoop)
scoop install ffmpeg
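After installing, it can be worth verifying that ffmpeg is actually visible on PATH before running the transcription scripts. A minimal check (the helpers below are illustrative, not part of the skill's scripts):

```python
import shutil
import subprocess
from typing import Optional

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg binary can be found on PATH."""
    return shutil.which("ffmpeg") is not None

def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if unavailable."""
    if not ffmpeg_available():
        return None
    out = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
    return out.stdout.splitlines()[0] if out.stdout else None
```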

Resource requirements

Before running, confirm the user is aware of the following costs:

  • Disk: ffmpeg ~80 MB; Python packages (torch, whisperx, etc.) 2–5 GB; Whisper model weights 39 MB – 1.5 GB depending on model size
  • CPU / GPU: WhisperX runs model inference locally. A CUDA GPU is strongly recommended for the medium and large models; CPU and Apple MPS also work but are significantly slower
  • Network / API: the translation step calls a remote LLM API and incurs token-based charges. No network is needed for the transcription step once the model is downloaded

Always confirm with the user before installing packages or downloading models, as these operations consume storage and bandwidth.

Translation requires an LLM API and will incur costs. Before executing the translation step:

  1. Ask the user for the API provider, key, and base URL — or present any auto-discovered configuration for review
  2. Inform the user that translation calls a remote LLM and will consume tokens (i.e. real money)
  3. Do NOT proceed with translation until the user explicitly confirms the provider and acknowledges the cost
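As a rough sanity check before confirming, you can estimate how many tokens a subtitle file will send to the LLM. The ~4 characters per token ratio is a common heuristic for English-like text, not an exact figure, and `estimate_tokens` is a hypothetical helper, not part of the skill:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English-like text.
    # Real tokenizers (and non-English scripts) can differ substantially.
    return max(1, len(text) // 4)
```

A 10,000-character .srt file is then on the order of ~2,500 input tokens, with translation output adding roughly the same again.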

Usage

1. Environment setup

# Install dependencies (requires ~2–5 GB disk space for PyTorch and WhisperX)
pip install -r requirements.txt

# Set the API key (used for translation)
# macOS / Linux
export OPENAI_API_KEY="your-api-key"
export OPENAI_BASE_URL="https://openrouter.ai/api/v1"  # Optional, defaults to OpenRouter

# Windows (PowerShell)
$env:OPENAI_API_KEY="your-api-key"
$env:OPENAI_BASE_URL="https://openrouter.ai/api/v1"

On Windows, use python instead of python3 in all commands below.
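Before the translation step, it can help to fail fast when the key is missing. A minimal sketch, assuming the environment variable names above:

```python
import os

def translation_config():
    """Read the translation API settings from the environment."""
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; the translation step will fail.")
    # Falls back to OpenRouter when OPENAI_BASE_URL is unset, as described above.
    base_url = os.environ.get("OPENAI_BASE_URL", "https://openrouter.ai/api/v1")
    return api_key, base_url
```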

2. Transcribe video (auto-detect language)

python3 scripts/transcribe.py "/path/to/video.mp4" -o ./output -m small

Output: video.{detected_lang}.srt (e.g. video.en.srt, video.ja.srt)

Arguments:

  • -o: Output directory
  • -m: Model size (tiny, base, small, medium, large)
  • -d: Device (cuda, cpu, mps), auto-detected by default
  • -l: Force source language code (e.g. en, ja, zh). Auto-detect if omitted
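The output naming scheme above can be sketched as follows (a hypothetical helper, not the script's actual code):

```python
from pathlib import Path

def srt_name(video_path: str, lang: str) -> str:
    """Build the output subtitle name, e.g. video.mp4 + 'en' -> video.en.srt."""
    return f"{Path(video_path).stem}.{lang}.srt"
```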

3. Batch-process a directory

python3 scripts/transcribe.py "/path/to/video/folder" -o ./output -m small
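Directory mode presumably walks the folder for video files; a sketch of that discovery step (the extension list here is an assumption, so check transcribe.py for what it actually accepts):

```python
from pathlib import Path

VIDEO_EXTS = {".mp4", ".mkv", ".mov", ".avi", ".webm"}

def find_videos(folder: str):
    """Recursively collect video files under folder, sorted for stable order."""
    return sorted(p for p in Path(folder).rglob("*") if p.suffix.lower() in VIDEO_EXTS)
```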

4. Translate subtitles

Cost warning: This step calls a remote LLM API. Ensure the user has confirmed the API provider, key, and billing awareness before running.

# Translate to Chinese (default)
python3 scripts/translate.py ./output -o ./translated

# Translate to Japanese
python3 scripts/translate.py ./output -o ./translated -t ja

# Only generate bilingual subtitles
python3 scripts/translate.py ./output -o ./translated --bilingual

Arguments:

  • -t, --target-lang: Target language code (default: zh)
  • --bilingual: Generate bilingual (source + target) subtitles
  • --target-only: Generate target-language-only subtitles
  • --model: Translation model (default: google/gemini-3-flash-preview)
  • --batch-size: Batch size (default: 10)

When neither --bilingual nor --target-only is specified, both are generated.
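The --batch-size option suggests cues are translated in groups rather than one at a time; that chunking can be sketched as (illustrative, not the script's actual implementation):

```python
def chunked(items, size=10):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]
```

Grouping cues this way trades a little prompt length for far fewer API round trips.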

5. Run the full pipeline

python3 scripts/run.py

# Customize via environment variables
VIDEO_DIR="/path/to/videos" TARGET_LANG=en python3 scripts/run.py

Environment variables for run.py:

  • VIDEO_DIR: Video source directory (default: ./videos)
  • OUTPUT_DIR: Transcription output directory (default: ./output)
  • TRANSLATED_DIR: Translation output directory (default: ./translated)
  • TARGET_LANG: Target language code (default: zh)
  • WHISPER_MODEL: Whisper model size (default: medium)
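The variables above follow the usual read-with-default pattern; a sketch of how run.py might resolve them (assumed, check the script for specifics):

```python
import os

def pipeline_settings():
    """Resolve pipeline paths and options from the environment, with defaults."""
    return {
        "video_dir": os.environ.get("VIDEO_DIR", "./videos"),
        "output_dir": os.environ.get("OUTPUT_DIR", "./output"),
        "translated_dir": os.environ.get("TRANSLATED_DIR", "./translated"),
        "target_lang": os.environ.get("TARGET_LANG", "zh"),
        "whisper_model": os.environ.get("WHISPER_MODEL", "medium"),
    }
```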

Model selection

Model    Size     Speed    Accuracy   Best for
tiny     39 MB    Fastest  Fair       Quick tests
base     74 MB    Fast     Good       Real-time usage
small    244 MB   Medium   Good       Recommended
medium   769 MB   Slower   Very good  Higher quality
large    1550 MB  Slow     Best       Professional use

Output files

For each video, the tool generates:

  • *.{lang}.srt - Source-language subtitles (language auto-detected, e.g. video.en.srt)
  • *.json - Full transcription data with timestamps
  • *.bilingual.srt - Bilingual subtitles (source + target) after translation
  • *.{target}.srt - Target-language-only subtitles after translation (e.g. video.zh.srt)

Script overview

scripts/transcribe.py

Uses WhisperX for transcription and supports:

  • Automatic source language detection (or manual override via -l)
  • Timestamp alignment
  • Batch processing with model reuse across files

scripts/translate.py

Uses an LLM API to translate subtitles and supports:

  • Configurable target language (-t)
  • Batch translation for better efficiency
  • Bilingual or target-language-only output
  • Custom models and API endpoints
  • Automatic retry with exponential backoff on API failures
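Exponential backoff of the kind described above can be sketched as a generic pattern (translate.py's actual retry parameters may differ):

```python
import time

def with_retry(fn, attempts=3, base_delay=1.0):
    """Call fn(), retrying on exception with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt))  # waits 1s, 2s, 4s, ...
```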

scripts/run.py

Cross-platform one-command runner that executes the transcription and translation pipeline automatically. Paths, target language, and model size are configurable via environment variables.

