# Video Clip Extractor Skill
Run the video orchestrator to process videos and extract engaging highlights.
## When Triggered

1. **Get the source** — if the user didn't provide a video URL or file path, ask for it.
2. **Clarify intent (optional)** — if the user wants clips focused on a specific topic, capture it for `--user-intent`. If unclear, ask: "Any specific topic or moments to focus on? (e.g. 'funny moments', 'key arguments')"
3. **Check environment** — if `video_orchestrator.py` exists in the current directory, run directly. Otherwise use the global install at `~/.local/share/openclip`.
4. **Verify prerequisites** — check that ffmpeg is installed and at least one API key is set. Warn about anything missing before running.
5. **Run the command** and stream output to the user.
6. **Report results** — after completion, list the generated clips with timestamps and titles.
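The "check environment" step above can be sketched as a small shell helper. This is illustrative only — `pick_mode` is a hypothetical name, not part of openclip; only the two paths it probes come from this skill:

```sh
#!/usr/bin/env bash
# Sketch of the execution-context check: prefer a repo checkout in the
# given directory, fall back to the global install, otherwise signal
# that first-use setup is needed.
pick_mode() {
  local dir="$1"
  if [ -f "$dir/video_orchestrator.py" ]; then
    echo "repo"          # run directly from the current directory
  elif [ -d "$HOME/.local/share/openclip" ]; then
    echo "global"        # run from the global install
  else
    echo "setup-needed"  # clone and `uv sync` first (see Setup below)
  fi
}

pick_mode "$(pwd)"
```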
## Setup (first use only)

Before running, determine the execution context:

- **Inside openclip repo** — if `video_orchestrator.py` exists in the current directory, skip setup and run directly.
- **Global install** — if `~/.local/share/openclip` does not exist, run these steps.

  Prerequisites: `git` and `uv` must be installed. Install uv if missing — macOS: `brew install uv`; Linux/Windows: `pip install uv`.

  ```sh
  git clone https://github.com/linzzzzzz/openclip.git ~/.local/share/openclip
  cd ~/.local/share/openclip && uv sync
  ```

To update openclip later:

```sh
git -C ~/.local/share/openclip pull && cd ~/.local/share/openclip && uv sync
```
## Execution

If inside the openclip repo (current directory contains `video_orchestrator.py`):

```sh
uv run python video_orchestrator.py [options] <source>
```

If running globally (from any other directory):

```sh
cd ~/.local/share/openclip && uv run python video_orchestrator.py -o "$OLDPWD/processed_videos" [options] <source>
```

`$OLDPWD` captures the user's original directory so clips are saved there, not inside the openclip install.

Here `<source>` is a video URL (Bilibili/YouTube) or a local file path (MP4, WebM, AVI, MOV, MKV).

For local files with existing subtitles, place the `.srt` file in the same directory with the same base filename (e.g. `video.mp4` → `video.srt`).
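The `$OLDPWD` trick works because the shell updates `OLDPWD` on every `cd`. A minimal demonstration, with `/tmp` standing in for the user's directory and `/` standing in for the install directory:

```sh
cd /tmp    # pretend this is the user's working directory
cd /       # entering the install directory sets $OLDPWD to /tmp
echo "$OLDPWD/processed_videos"   # prints /tmp/processed_videos
```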
## Preflight Checklist

- Inside the openclip repo: run from the repo root so relative paths (e.g. `references/`, `prompts/`) resolve correctly.
- ffmpeg must be installed (required for all clip generation):
  - macOS: `brew install ffmpeg`
  - Ubuntu: `sudo apt install ffmpeg`
  - Windows: download from ffmpeg.org
  - If using `--burn-subtitles`: ffmpeg must be built with libass (see README for details)
- Set one API key:
  - `QWEN_API_KEY` (default provider: qwen), or
  - `OPENROUTER_API_KEY` (if `--llm-provider openrouter`)
- If using `--speaker-references`: run `uv sync --extra speakers` and set `HUGGINGFACE_TOKEN`.
## CLI Reference

### Required

| Argument | Description |
|---|---|
| `source` | Video URL or local file path |

### Optional

| Flag | Default | Description |
|---|---|---|
| `-o`, `--output <dir>` | `processed_videos` | Output directory |
| `--max-clips <n>` | `5` | Maximum number of highlight clips |
| `--browser <browser>` | `firefox` | Browser for cookies: `chrome`, `firefox`, `edge`, `safari` |
| `--title-style <style>` | `fire_flame` | Title style: `gradient_3d`, `neon_glow`, `metallic_gold`, `rainbow_3d`, `crystal_ice`, `fire_flame`, `metallic_silver`, `glowing_plasma`, `stone_carved`, `glass_transparent` |
| `--title-font-size <size>` | `medium` | Font size preset for artistic titles: `small` (30px), `medium` (40px), `large` (50px), `xlarge` (60px) |
| `--cover-text-location <loc>` | `center` | Cover text position: `top`, `upper_middle`, `bottom`, `center` |
| `--cover-fill-color <color>` | `yellow` | Cover text fill color: `yellow`, `red`, `white`, `cyan`, `green`, `orange`, `pink`, `purple`, `gold`, `silver` |
| `--cover-outline-color <color>` | `black` | Cover text outline color: `yellow`, `red`, `white`, `cyan`, `green`, `orange`, `pink`, `purple`, `gold`, `silver`, `black` |
| `--language <lang>` | `zh` | Output language: `zh` (Chinese), `en` (English) |
| `--llm-provider <provider>` | `qwen` | LLM provider: `qwen`, `openrouter` |
| `--user-intent <text>` | — | Free-text focus description (e.g. "moments about AI risks"); steers LLM clip selection toward this topic |
| `--subtitle-translation <lang>` | — | Translate subtitles to this language before burning (e.g. "Simplified Chinese"). Requires `--burn-subtitles` and `QWEN_API_KEY` |
| `--speaker-references <dir>` | — | Directory of reference WAV files (one per speaker, filename = speaker name) for speaker diarization. Requires `uv sync --extra speakers` and `HUGGINGFACE_TOKEN` |
| `-f`, `--filename <template>` | — | yt-dlp filename template: `%(title)s`, `%(uploader)s`, `%(id)s`, etc. |
### Flags

| Flag | Description |
|---|---|
| `--force-whisper` | Ignore platform subtitles; use Whisper |
| `--skip-download` | Use an existing downloaded video |
| `--skip-transcript` | Skip transcript generation; use an existing transcript file |
| `--skip-analysis` | Skip analysis; use an existing analysis file for clip generation |
| `--use-background` | Include background info (streamer names/nicknames) in analysis prompts |
| `--skip-clips` | Skip clip generation |
| `--add-titles` | Add artistic titles to clips (disabled by default) |
| `--skip-cover` | Skip cover image generation |
| `--burn-subtitles` | Burn SRT subtitles into the video. Output goes to `clips_post_processed/`. Requires ffmpeg with libass |
| `-v`, `--verbose` | Enable verbose logging |
| `--debug` | Export the full prompts sent to the LLM (saved to `debug_prompts/`) |
### Custom Filename Template (`-f`)

Uses yt-dlp template syntax. Common variables: `%(title)s`, `%(uploader)s`, `%(upload_date)s`, `%(id)s`, `%(ext)s`, `%(duration)s`.

Example: `-f "%(upload_date)s_%(title)s.%(ext)s"`
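For illustration, with hypothetical metadata values (the real substitution is performed by yt-dlp, not the shell), the example template expands like this:

```sh
# Hypothetical metadata for one video (placeholders, not real yt-dlp output)
upload_date="20240115"
title="Stream Highlights"
ext="mp4"

# -f "%(upload_date)s_%(title)s.%(ext)s" would then produce:
printf '%s_%s.%s\n' "$upload_date" "$title" "$ext"
# prints: 20240115_Stream Highlights.mp4
```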
## Environment Variables

Set the API key matching the chosen `--llm-provider`:

- `QWEN_API_KEY` — for `--llm-provider qwen`
- `OPENROUTER_API_KEY` — for `--llm-provider openrouter`
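For example, in the shell that will run the orchestrator (the values below are placeholders, not real keys):

```sh
# Set the key for the provider you use; only one is required.
export QWEN_API_KEY="your-qwen-key"                # default provider (qwen)
# export OPENROUTER_API_KEY="your-openrouter-key"  # if using --llm-provider openrouter
```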
## Workflow

The orchestrator runs this pipeline automatically:

1. **Download** — fetch the video + platform subtitles (Bilibili/YouTube), or accept a local file
2. **Split** — divide videos longer than the built-in threshold into segments for parallel analysis
3. **Transcribe** — use platform subtitles or Whisper AI; `--force-whisper` overrides
4. **Analyze** — the LLM scores transcript segments for engagement; `--user-intent` steers selection
5. **Generate clips** — ffmpeg cuts the video at the identified timestamps
6. **Add titles (opt-in)** — render an artistic text overlay using `--title-style`
7. **Generate covers** — create a thumbnail image for each clip

Use `--skip-clips` and `--skip-cover` to skip specific steps, `--add-titles` to enable artistic titles, and `--skip-download` and `--skip-analysis` to resume from intermediate results.
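For example, a plausible resume flow after a failure late in the pipeline (flags are from the CLI reference; `<source>` stands for the actual URL or path):

```sh
# First run downloads, transcribes, and analyzes, but clip generation fails.
uv run python video_orchestrator.py <source>

# Re-run reusing the downloaded video and existing analysis file.
uv run python video_orchestrator.py --skip-download --skip-analysis <source>
```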
## Output Example

After a successful run, report results like this:

```
✅ Processing complete — 5 clips generated
📁 processed_videos/video_name/clips/

clip_01.mp4  [00:12:34 – 00:15:20]  "Title of the moment"
clip_02.mp4  [00:28:45 – 00:31:10]  "Another highlight"
clip_03.mp4  [00:45:00 – 00:47:30]  "Key discussion point"
...

Cover images: clips/*.jpg
```

## Output Structure

```
processed_videos/{video_name}/
├── downloads/              # Original video, subtitles, and metadata (URL sources)
├── local_videos/           # Copied video and subtitles (local file sources)
├── splits/                 # Split parts and AI analysis results
├── clips/                  # Generated highlight clips + cover images
└── clips_post_processed/   # Post-processed clips when using --add-titles and/or --burn-subtitles
```
## Option Selection Guide

**Whisper model** — the default `base` works for clear audio. Use `small` for background noise, multiple speakers, or accents. Use `turbo` for speed plus accuracy. Use `large`/`medium` only when transcript quality is critical.

**`--force-whisper`** — use when platform subtitles are auto-generated (often inaccurate), when "no engaging moments found" occurs (better transcripts improve analysis), or for non-native-language content where platform captions are unreliable.

**`--use-background`** — use for content featuring recurring personalities (streamers, hosts) where nicknames and community references matter. Reads from `prompts/background/background.md`.

**Multi-part analysis** — videos that get split are analyzed per segment, then aggregated to the top 5 engaging moments across all segments.

**`--user-intent`** — steers LLM clip selection at both the per-segment and cross-segment aggregation stages. Useful when you want clips about a specific topic (e.g. "AI safety predictions", "funny moments").

**`--burn-subtitles`** — hardcodes the SRT subtitles into the video frame. Use when subtitles should always be visible (e.g. for social media). Combine with `--subtitle-translation` to add a translated subtitle track below the original.

**`--speaker-references`** — enables speaker diarization for interviews/podcasts. Provide a directory of 10–30 second clean WAV clips (one per speaker), named after the speaker (e.g. `references/Host.wav`).
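For example, a hypothetical reference directory for a two-person podcast (filenames become the speaker labels):

```
references/
├── Host.wav     # 10–30 s clean speech sample of the host
└── Guest.wav    # 10–30 s clean speech sample of the guest
```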
## Troubleshooting

| Error | Fix |
|---|---|
| "ffmpeg not found" / clip generation fails silently | Install ffmpeg: `brew install ffmpeg` (macOS) or `sudo apt install ffmpeg` (Ubuntu) |
| "No API key provided" | Set the `QWEN_API_KEY` or `OPENROUTER_API_KEY` env var |
| "Video download failed" | Check the network/URL; try a different `--browser`; or use a local file |
| "Transcript generation failed" | Try `--force-whisper` or check audio quality |
| "No engaging moments found" | Try `--force-whisper` for better transcript accuracy |
| "Clip generation failed" | Ensure analysis completed; check for an existing analysis file |