video-understand

Multi-provider video understanding with automatic fallback and model selection.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "video-understand" with this command: npx skills add ajoslin/dot/ajoslin-dot-video-understand

Video Understanding


Quick Start

Check available providers

python3 scripts/check_providers.py

Process a video (auto-selects best provider)

python3 scripts/process_video.py "https://youtube.com/watch?v=..."
python3 scripts/process_video.py /path/to/video.mp4

Custom prompt

python3 scripts/process_video.py video.mp4 -p "List all products shown with timestamps"

Use specific provider/model

python3 scripts/process_video.py video.mp4 --provider openrouter -m google/gemini-3-pro-preview

List available models

python3 scripts/process_video.py --list-models

Provider Hierarchy

Automatically selects the best available provider:

Priority | Provider      | Capability     | Env Var                                | Default Model
1        | Gemini        | Full video     | GEMINI_API_KEY                         | gemini-3-flash-preview
2        | Vertex AI     | Full video     | GOOGLE_APPLICATION_CREDENTIALS         | gemini-3-flash-preview
3        | OpenRouter    | Full video     | OPENROUTER_API_KEY                     | google/gemini-3-flash-preview
4        | FFMPEG        | Frames + ASR   | None (requires ffmpeg + whisper)       | scene
5        | OpenAI        | ASR only       | OPENAI_API_KEY                         | whisper-1
6        | AssemblyAI    | ASR + analysis | ASSEMBLYAI_API_KEY                     | best
7        | Deepgram      | ASR            | DEEPGRAM_API_KEY                       | nova-2
8        | Groq          | ASR (fast)     | GROQ_API_KEY                           | whisper-large-v3-turbo
9        | Local Whisper | ASR (offline)  | None                                   | base

Full video = visual + audio analysis. Frames + ASR = extracted screenshots + audio transcription (free, offline). ASR = audio transcription only.
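The priority order above can be sketched as a first-match scan over the table. This is a minimal illustration, not the skill's actual implementation: the provider identifiers (other than `openrouter`), the `pick_provider` function, and the idea of detecting the offline providers by looking for `ffmpeg` and `whisper` executables on the PATH are all assumptions.

```python
import os
import shutil
from typing import Callable, Mapping, Optional

# Priority order taken from the table; identifiers are hypothetical
# except "openrouter", which appears in the CLI examples.
PROVIDERS = [
    ("gemini", "GEMINI_API_KEY"),
    ("vertex", "GOOGLE_APPLICATION_CREDENTIALS"),
    ("openrouter", "OPENROUTER_API_KEY"),
    ("ffmpeg", None),
    ("openai", "OPENAI_API_KEY"),
    ("assemblyai", "ASSEMBLYAI_API_KEY"),
    ("deepgram", "DEEPGRAM_API_KEY"),
    ("groq", "GROQ_API_KEY"),
    ("local-whisper", None),
]

# Assumption: providers without an env var are usable when these
# executables are found on the PATH.
LOCAL_TOOLS = {
    "ffmpeg": ("ffmpeg", "whisper"),
    "local-whisper": ("whisper",),
}


def _on_path(tool: str) -> bool:
    return shutil.which(tool) is not None


def pick_provider(
    env: Mapping[str, str] = os.environ,
    has_tool: Callable[[str], bool] = _on_path,
) -> Optional[str]:
    """Return the highest-priority usable provider, or None if none is."""
    for name, env_var in PROVIDERS:
        if env_var is not None:
            if env.get(env_var):  # a non-empty key counts as available
                return name
        elif all(has_tool(tool) for tool in LOCAL_TOOLS[name]):
            return name
    return None


if __name__ == "__main__":
    print(pick_provider())
```

Passing `env` and `has_tool` as parameters keeps the scan easy to exercise without touching the real environment.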

CLI Options

python3 scripts/process_video.py [OPTIONS] SOURCE

Arguments:
  SOURCE              YouTube URL, video URL, or local file path

Options:
  -p, --prompt TEXT   Custom prompt for video understanding
  --provider NAME     Force specific provider
  -m, --model NAME    Force specific model
  --asr-only          Force ASR-only mode (skip visual analysis)
  -o, --output FILE   Write JSON to file instead of stdout
  -q, --quiet         Suppress progress messages
  --list-models       Show available models per provider
  --list-providers    Show available providers as JSON
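When calling the script from other Python code, the options above can be assembled into an argument list. The sketch below is illustrative only: `build_command` and `run` are hypothetical helpers, and it assumes that `-q` leaves nothing but the JSON result on stdout.

```python
import json
import subprocess
from typing import List, Optional


def build_command(
    source: str,
    prompt: Optional[str] = None,
    provider: Optional[str] = None,
    model: Optional[str] = None,
    asr_only: bool = False,
    output: Optional[str] = None,
    quiet: bool = False,
) -> List[str]:
    """Translate keyword options into the documented CLI flags."""
    cmd = ["python3", "scripts/process_video.py"]
    if prompt:
        cmd += ["-p", prompt]
    if provider:
        cmd += ["--provider", provider]
    if model:
        cmd += ["-m", model]
    if asr_only:
        cmd.append("--asr-only")
    if output:
        cmd += ["-o", output]
    if quiet:
        cmd.append("-q")
    cmd.append(source)  # SOURCE is the positional argument and goes last
    return cmd


def run(source: str, **options) -> dict:
    """Run the script and parse its stdout as JSON (assumes -q keeps stdout clean)."""
    options.setdefault("quiet", True)
    result = subprocess.run(
        build_command(source, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    print(build_command("video.mp4", provider="openrouter",
                        model="google/gemini-3-pro-preview"))
```

Building the list explicitly, rather than formatting a shell string, avoids quoting problems with prompts and paths that contain spaces.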

Model Selection

Each provider supports multiple models. Use --list-models to see options:

python3 scripts/process_video.py --list-models

OpenRouter models:

  • google/gemini-3-flash-preview (default) - Fast, free tier

  • google/gemini-3-pro-preview - Higher quality

Gemini models:

  • gemini-3-flash-preview (default) - Latest, fast

  • gemini-3-pro-preview - Highest quality

  • gemini-2.5-flash - Stable production fallback

Local Whisper models:

  • tiny, base (default), small, medium, large, large-v3

FFMPEG modes (frame extraction strategy):

  • scene (default) - Extract frames when the scene changes (smart, efficient)

  • keyframe - Extract I-frames only (fastest)

  • interval - Extract frames at regular intervals (predictable)
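To make the three extraction strategies concrete, the sketch below maps each mode to a plausible ffmpeg filter. It is not taken from the skill's scripts: the function name, the 0.3 scene threshold, the 10-second interval, the 1280-pixel width cap, and the JPEG quality value are all assumed for illustration. The filters themselves (`select` with the `scene` score or `pict_type`, `fps`, `scale`) are standard ffmpeg filters.

```python
from typing import List


def frame_extraction_args(
    video: str,
    out_pattern: str,
    mode: str = "scene",
    scene_threshold: float = 0.3,   # assumed value, tune per video
    interval_seconds: int = 10,     # assumed value
    max_width: int = 1280,          # assumed downscale cap
) -> List[str]:
    """Build an ffmpeg argument list for one of the three frame modes."""
    if mode == "scene":
        # Keep frames whose scene-change score exceeds the threshold.
        pick = f"select='gt(scene,{scene_threshold})'"
    elif mode == "keyframe":
        # Keep only intra-coded frames (I-frames).
        pick = "select='eq(pict_type,I)'"
    elif mode == "interval":
        # Emit one frame every interval_seconds.
        pick = f"fps=1/{interval_seconds}"
    else:
        raise ValueError(f"unknown mode: {mode}")

    # Downscale wide frames; -2 keeps the aspect ratio with an even height.
    scale = f"scale='min({max_width},iw)':-2"

    return [
        "ffmpeg", "-i", video,
        "-vf", f"{pick},{scale}",
        # Variable frame rate so dropped frames are not duplicated.
        # Newer ffmpeg releases prefer "-fps_mode vfr" over "-vsync vfr".
        "-vsync", "vfr",
        "-q:v", "5",
        out_pattern,
    ]


if __name__ == "__main__":
    print(" ".join(frame_extraction_args("video.mp4", "frames/%04d.jpg")))
```

The list can be passed directly to `subprocess.run`; because no shell is involved, the single quotes inside the filter string are interpreted by ffmpeg's own filter parser.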

Quick Reference

Task                         | Reference
Setup & API keys             | setup-guide.md
Use Gemini for video         | gemini.md
Use OpenRouter               | openrouter.md
FFMPEG frames (free)         | ffmpeg-frames.md
ASR providers                | asr-providers.md
Output JSON schema           | output-format.md
Video sources & downloading  | video-sources.md

Verify Setup

python3 scripts/setup.py # Check dependencies and API keys

Output Format

All providers return consistent JSON:

{
  "source": {
    "type": "youtube|url|local",
    "path": "...",
    "duration_seconds": 120.5,
    "size_mb": 15.2
  },
  "provider": "openrouter",
  "model": "google/gemini-3-flash-preview",
  "capability": "full_video",
  "response": "...",
  "transcript": [{"start": 0.0, "end": 2.5, "text": "..."}],
  "text": "Full transcript..."
}
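A short sketch of consuming this output from Python follows. The field names come from the JSON shown above; the helper names are hypothetical, and the fallback from `response` to `text` assumes that ASR-only providers may leave `response` empty, which the source does not state.

```python
import json
from typing import List


def format_transcript(result: dict) -> List[str]:
    """Render each transcript segment as '[start-end] text'."""
    lines = []
    for seg in result.get("transcript") or []:
        lines.append(f"[{seg['start']:.1f}s-{seg['end']:.1f}s] {seg['text']}")
    return lines


def main_text(result: dict) -> str:
    """Prefer the model's answer; fall back to the plain transcript (assumption)."""
    return result.get("response") or result.get("text") or ""


if __name__ == "__main__":
    # Sample data shaped like the schema above; values are made up.
    sample = json.loads("""
    {
      "source": {"type": "local", "path": "video.mp4",
                 "duration_seconds": 120.5, "size_mb": 15.2},
      "provider": "openrouter",
      "model": "google/gemini-3-flash-preview",
      "capability": "full_video",
      "response": "A product demo.",
      "transcript": [{"start": 0.0, "end": 2.5, "text": "Hello"}],
      "text": "Hello"
    }
    """)
    print(main_text(sample))
    print("\n".join(format_transcript(sample)))
```

For a result written with `-o result.json`, replace the sample with `json.load(open("result.json"))`.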

Features

  • Automatic provider selection based on available API keys

  • Model selection per provider with sensible defaults

  • Robust path handling for macOS special characters and Unicode

  • Progress output (use -q for quiet mode)

  • File size warnings for API limits

  • Sane frame defaults for offline mode (downscaled, compressed images instead of huge 4K frames)

  • Auto-conversion of video formats when needed

  • YouTube URL support (direct or via download)

Requirements

For full video understanding:

pip install google-generativeai  # Gemini
pip install openai               # OpenRouter

For ASR fallback:

brew install yt-dlp ffmpeg   # Video tools
pip install openai           # OpenAI Whisper
pip install groq             # Groq Whisper
pip install assemblyai       # AssemblyAI
pip install deepgram-sdk     # Deepgram
pip install openai-whisper   # Local Whisper

