Video Understanding

Multi-provider video understanding with automatic fallback and model selection.

Quick Start

Check available providers

python3 scripts/check_providers.py

Process a video (auto-selects best provider)

python3 scripts/process_video.py "https://youtube.com/watch?v=..." python3 scripts/process_video.py /path/to/video.mp4

Custom prompt

python3 scripts/process_video.py video.mp4 -p "List all products shown with timestamps"

Use specific provider/model

python3 scripts/process_video.py video.mp4 --provider openrouter -m google/gemini-3-pro-preview

List available models

python3 scripts/process_video.py --list-models

Provider Hierarchy

Automatically selects the best available provider:

Priority Provider Capability Env Var Default Model

1 Gemini Full video GEMINI_API_KEY

gemini-3-flash-preview

2 Vertex AI Full video GOOGLE_APPLICATION_CREDENTIALS

gemini-3-flash-preview

3 OpenRouter Full video OPENROUTER_API_KEY

google/gemini-3-flash-preview

4 FFMPEG Frames + ASR None (requires ffmpeg + whisper) scene

5 OpenAI ASR only OPENAI_API_KEY

whisper-1

6 AssemblyAI ASR + analysis ASSEMBLYAI_API_KEY

best

7 Deepgram ASR DEEPGRAM_API_KEY

nova-2

8 Groq ASR (fast) GROQ_API_KEY

whisper-large-v3-turbo

9 Local Whisper ASR (offline) None base

Full video = visual + audio analysis. Frames + ASR = extracted screenshots + audio transcription (free, offline). ASR = audio transcription only.

CLI Options

python3 scripts/process_video.py [OPTIONS] SOURCE

Arguments: SOURCE YouTube URL, video URL, or local file path

Options: -p, --prompt TEXT Custom prompt for video understanding --provider NAME Force specific provider -m, --model NAME Force specific model --asr-only Force ASR-only mode (skip visual analysis) -o, --output FILE Write JSON to file instead of stdout -q, --quiet Suppress progress messages --list-models Show available models per provider --list-providers Show available providers as JSON

Model Selection

Each provider supports multiple models. Use --list-models to see options:

python3 scripts/process_video.py --list-models

OpenRouter models:

google/gemini-3-flash-preview (default) - Fast, free tier
google/gemini-3-pro-preview
Higher quality

Gemini models:

gemini-3-flash-preview (default) - Latest, fast
gemini-3-pro-preview
Highest quality
gemini-2.5-flash
Stable production fallback

Local Whisper models:

tiny , base (default), small , medium , large , large-v3

FFMPEG modes (frame extraction strategy):

scene (default) - Extract frames when scene changes (smart, efficient)
keyframe
Extract I-frames only (fastest)
interval
Extract frames at regular intervals (predictable)

Quick Reference

Task Reference

Setup & API keys setup-guide.md

Use Gemini for video gemini.md

Use OpenRouter openrouter.md

FFMPEG frames (free) ffmpeg-frames.md

ASR providers asr-providers.md

Output JSON schema output-format.md

Video sources & downloading video-sources.md

Verify Setup

python3 scripts/setup.py # Check dependencies and API keys

Output Format

All providers return consistent JSON:

{ "source": { "type": "youtube|url|local", "path": "...", "duration_seconds": 120.5, "size_mb": 15.2 }, "provider": "openrouter", "model": "google/gemini-3-flash-preview", "capability": "full_video", "response": "...", "transcript": [{"start": 0.0, "end": 2.5, "text": "..."}], "text": "Full transcript..." }

Features

Automatic provider selection based on available API keys
Model selection per provider with sensible defaults
Robust path handling for macOS special characters and unicode
Progress output (use -q for quiet mode)
File size warnings for API limits
Sane frame defaults for offline mode (downscaled/compressed images instead of huge 4k frames)
Auto-conversion of video formats when needed
YouTube URL support (direct or via download)

Requirements

For full video understanding:

pip install google-generativeai # Gemini pip install openai # OpenRouter

For ASR fallback:

brew install yt-dlp ffmpeg # Video tools pip install openai # OpenAI Whisper pip install groq # Groq Whisper pip install assemblyai # AssemblyAI pip install deepgram-sdk # Deepgram pip install openai-whisper # Local Whisper

video-understand

Safety Notice

Copy this and send it to your AI assistant to learn

Check available providers

Process a video (auto-selects best provider)

Custom prompt

Use specific provider/model

List available models

Source Transparency

Related Skills

pinescript

session-export

find-skills