video-understanding

Analyze videos with Google Gemini multimodal AI. Download from any URL (Loom, YouTube, TikTok, Vimeo, Twitter/X, Instagram, 1000+ sites) and get transcripts, descriptions, and answers to questions. Use when asked to watch, analyze, summarize, or transcribe a video, or answer questions about video content. Triggers on video URLs or requests involving video understanding.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "video-understanding" with this command: npx skills add bill492/video-understanding

Video Understanding (Gemini)

Analyze videos using Google Gemini's multimodal video understanding. Supports 1000+ video sources via yt-dlp.

Requirements

  • yt-dlpbrew install yt-dlp / pip install yt-dlp
  • ffmpegbrew install ffmpeg (for merging video+audio streams)
  • GEMINI_API_KEY environment variable

Default Output

Returns structured JSON:

  • transcript — Verbatim transcript with [MM:SS] timestamps
  • description — Visual description (people, setting, UI, text on screen, flow)
  • summary — 2-3 sentence summary
  • duration_seconds — Estimated duration
  • speakers — Identified speakers

Usage

Analyze a video (structured JSON output)

uv run {baseDir}/scripts/analyze_video.py "<video-url>"

Ask a question (adds "answer" field)

uv run {baseDir}/scripts/analyze_video.py "<video-url>" -q "What product is shown?"

Override prompt entirely

uv run {baseDir}/scripts/analyze_video.py "<video-url>" -p "Custom prompt" --raw

Download only (no analysis)

uv run {baseDir}/scripts/analyze_video.py "<video-url>" --download-only -o video.mp4

Options

FlagDescriptionDefault
-q / --questionQuestion to answer (added to default fields)none
-p / --promptOverride entire prompt (ignores -q)structured JSON
-m / --modelGemini modelgemini-2.5-flash
-o / --outputSave output to filestdout
--keepKeep downloaded video filefalse
--download-onlyDownload only, skip analysisfalse
--max-sizeMax file size in MB500
--rawRaw text output instead of JSONfalse

How It Works

  1. YouTube URLs → Passed directly to Gemini (no download needed)
  2. All other URLs → Downloaded via yt-dlp → uploaded to Gemini File API → poll until processed
  3. Gemini analyzes video with structured prompt → returns JSON
  4. Temp files and Gemini uploads cleaned up automatically

Supported Sources

Any URL supported by yt-dlp: Loom, YouTube, TikTok, Vimeo, Twitter/X, Instagram, Dailymotion, Twitch, and 1000+ more.

Tips

  • Use -q for targeted questions on top of the full analysis
  • YouTube is fastest (no download step)
  • Large videos (10min+) work fine — Gemini File API supports up to 2GB (free) / 20GB (paid)
  • The script auto-installs Python dependencies via uv

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

Accelo

Accelo integration. Manage Organizations, Leads, Pipelines, Users, Goals, Filters. Use when the user wants to interact with Accelo data.

Registry SourceRecently Updated
General

8X8

8x8 integration. Manage Persons, Organizations, Deals, Leads, Activities, Notes and more. Use when the user wants to interact with 8x8 data.

Registry SourceRecently Updated
General

7Shifts

7shifts integration. Manage Companies. Use when the user wants to interact with 7shifts data.

Registry SourceRecently Updated
General

46Elks

46elks integration. Manage Organizations. Use when the user wants to interact with 46elks data.

Registry SourceRecently Updated