VideoDB Skill
Perception + memory + actions for video, live streams, and desktop sessions.
Use this skill when you need to:
- Desktop Perception
  - Start/stop a desktop session capturing screen, mic, and system audio
  - Stream live context and store episodic session memory
  - Run real-time alerts/triggers on what's spoken and what's happening on screen
  - Produce session summaries, a searchable timeline, and playable evidence links
- Video ingest + stream
  - Ingest a file or URL and return a playable web stream link
  - Transcode/normalize: codec, bitrate, fps, resolution, aspect ratio
- Index + search (timestamps + evidence)
  - Build visual, spoken, and keyword indexes
  - Search and return exact moments with timestamps and playable evidence
  - Auto-create clips from search results
- Timeline editing + generation
  - Subtitles: generate, translate, burn-in
  - Overlays: text/image/branding, motion captions
  - Audio: background music, voiceover, dubbing
  - Programmatic composition and exports via timeline operations
- Live streams (RTSP) + monitoring
  - Connect RTSP/live feeds
  - Run real-time visual and spoken understanding and emit events/alerts for monitoring workflows
Common inputs
- Local file path, public URL, or RTSP URL
- Desktop capture request: start / stop / summarize session
- Desired operations: get context for understanding, transcode spec, index spec, search query, clip ranges, timeline edits, alert rules
Common outputs
- Stream URL, playable at: https://console.videodb.io/player?url={STREAM_URL}
- Search results with timestamps and evidence links
- Generated assets: subtitles, audio, images, clips
- Event/alert payloads for live streams
- Desktop session summaries and memory entries
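As a minimal sketch of how the player link in the list above can be built, the helper below fills the `{STREAM_URL}` placeholder with a stream URL. The function name `player_link` is hypothetical, and the stream URL is inserted verbatim: whether the player expects it to be percent-encoded is not stated here, so treat that as an open assumption.

```python
# Template taken from the "Common outputs" list; only the helper name is invented.
PLAYER_URL_TEMPLATE = "https://console.videodb.io/player?url={STREAM_URL}"


def player_link(stream_url: str) -> str:
    """Return a console player link for the given stream URL.

    The stream URL is substituted as-is (no percent-encoding), which is an
    assumption about what the player accepts.
    """
    if not stream_url:
        raise ValueError("stream_url must be a non-empty string")
    return PLAYER_URL_TEMPLATE.format(STREAM_URL=stream_url)


# Example with a placeholder stream URL.
print(player_link("https://example.com/stream.m3u8"))
```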
Canonical prompts (examples)
- “Start desktop capture and alert when a password field appears.”
- “Record my session and produce an actionable summary when it ends.”
- “Ingest this file and return a playable stream link.”
- “Index this folder and find every scene with people, return timestamps.”
- “Generate subtitles, burn them in, and add light background music.”
- “Connect this RTSP URL and alert when a person enters the zone.”
Running Python code
CRITICAL: Always cd to the user's project directory before running Python code. This ensures load_dotenv(".env") finds the correct .env file.
    from dotenv import load_dotenv
    load_dotenv(".env")

    import videodb
    conn = videodb.connect()
This reads VIDEO_DB_API_KEY from:
- Environment (if already exported)
- Project's .env file in current directory
If the key is missing, videodb.connect() raises AuthenticationError automatically.
Do NOT write a script file when a short inline command works.
When writing inline Python (python -c "..."), always use properly formatted code: use semicolons to separate statements and keep it readable. For anything longer than ~3 statements, use a heredoc instead:

    python << 'EOF'
    from dotenv import load_dotenv
    load_dotenv(".env")

    import videodb
    conn = videodb.connect()
    coll = conn.get_collection()
    print(f"Videos: {len(coll.get_videos())}")
    EOF
Setup
When the user asks to "setup videodb" or similar:
- Install SDK
pip install "videodb[capture]" python-dotenv
If videodb[capture] fails on Linux, install without the capture extra:
pip install videodb python-dotenv
- Configure API key
The user must set VIDEO_DB_API_KEY using either method:
  - Export in terminal (recommended): export VIDEO_DB_API_KEY=your-key
  - Project .env file: Save VIDEO_DB_API_KEY=your-key in the project's .env file
Get a free API key at https://console.videodb.io (50 free uploads, no credit card).
Do NOT read, write, or handle the API key yourself. Always let the user set it.
Quick Reference
Upload media
    # URL
    video = coll.upload(url="https://example.com/video.mp4")

    # YouTube
    video = coll.upload(url="https://www.youtube.com/watch?v=VIDEO_ID")

    # Local file
    video = coll.upload(file_path="/path/to/video.mp4")
Transcript + subtitle
    # force=True skips the error if the video is already indexed
    video.index_spoken_words(force=True)
    text = video.get_transcript_text()
    stream_url = video.add_subtitle()
Search inside videos
    from videodb.exceptions import InvalidRequestError

    video.index_spoken_words(force=True)

    # search() raises InvalidRequestError when no results are found.
    # Always wrap in try/except and treat "No results found" as empty.
    try:
        results = video.search("product demo")
        shots = results.get_shots()
        stream_url = results.compile()
    except InvalidRequestError as e:
        if "No results found" in str(e):
            shots = []
        else:
            raise
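When several searches run in a row, the try/except pattern above can be factored into a small helper. This is only a sketch: `shots_or_empty` is a hypothetical name, and the exception class is passed in as an argument so the helper itself does not depend on the SDK being importable.

```python
from typing import Any, Callable, List, Type


def shots_or_empty(
    run_search: Callable[[], Any],
    error_type: Type[BaseException],
    marker: str = "No results found",
) -> List[Any]:
    """Run a search callable and return its shots, or [] when nothing matched.

    Errors of `error_type` whose message contains `marker` are treated as an
    empty result; any other error is re-raised unchanged.
    """
    try:
        results = run_search()
    except error_type as exc:
        if marker in str(exc):
            return []
        raise
    return results.get_shots()
```

With the SDK installed, a call might look like `shots_or_empty(lambda: video.search("product demo"), InvalidRequestError)`, where `video` and `InvalidRequestError` come from the snippet above.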
Scene search
    import re
    from videodb import SearchType, IndexType, SceneExtractionType
    from videodb.exceptions import InvalidRequestError

    # index_scenes() has no force parameter; it raises an error if a scene
    # index already exists. Extract the existing index ID from the error.
    try:
        scene_index_id = video.index_scenes(
            extraction_type=SceneExtractionType.shot_based,
            prompt="Describe the visual content in this scene.",
        )
    except Exception as e:
        match = re.search(r"id\s+([a-f0-9]+)", str(e))
        if match:
            scene_index_id = match.group(1)
        else:
            raise

    # Use score_threshold to filter low-relevance noise (recommended: 0.3+)
    try:
        results = video.search(
            query="person writing on a whiteboard",
            search_type=SearchType.semantic,
            index_type=IndexType.scene,
            scene_index_id=scene_index_id,
            score_threshold=0.3,
        )
        shots = results.get_shots()
        stream_url = results.compile()
    except InvalidRequestError as e:
        if "No results found" in str(e):
            shots = []
        else:
            raise
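The ID-extraction step can be checked in isolation, without calling the API. The sketch below wraps the same regular expression in a hypothetical helper, `extract_scene_index_id`; the sample message follows the "Scene index with id XXXX already exists" shape quoted in this document, with a made-up hexadecimal ID.

```python
import re
from typing import Optional

# Same pattern as in the scene-search snippet: "id", whitespace, then a lowercase hex run.
_SCENE_INDEX_ID = re.compile(r"id\s+([a-f0-9]+)")


def extract_scene_index_id(message: str) -> Optional[str]:
    """Return the scene index ID embedded in an error message, or None if absent."""
    match = _SCENE_INDEX_ID.search(message)
    return match.group(1) if match else None


# The ID below is invented for illustration only.
print(extract_scene_index_id("Scene index with id 1a2b3c already exists"))
```

Returning `None` rather than raising lets the caller decide whether to re-raise the original exception, as the snippet above does.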
Timeline editing
Important: Always validate timestamps before building a timeline:
- start must be >= 0 (negative values are silently accepted but produce broken output)
- start must be < end
- end must be <= video.length
    from videodb.timeline import Timeline
    from videodb.asset import VideoAsset, TextAsset, TextStyle

    timeline = Timeline(conn)
    timeline.add_inline(VideoAsset(asset_id=video.id, start=10, end=30))
    timeline.add_overlay(0, TextAsset(text="The End", duration=3, style=TextStyle(fontsize=36)))
    stream_url = timeline.generate_stream()
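The three timestamp rules listed before the timeline snippet can be enforced with a small guard before any `VideoAsset` is created. This is a sketch only: `validate_clip_range` is a hypothetical helper, not part of the SDK, and `video_length` stands for the value of `video.length`.

```python
def validate_clip_range(start: float, end: float, video_length: float) -> None:
    """Raise ValueError unless 0 <= start < end <= video_length."""
    if start < 0:
        raise ValueError(f"start must be >= 0, got {start}")
    if start >= end:
        raise ValueError(f"start ({start}) must be < end ({end})")
    if end > video_length:
        raise ValueError(f"end ({end}) must be <= video length ({video_length})")


# Passes silently for a valid range inside a 120-second video.
validate_clip_range(10, 30, 120)
```

Calling it as `validate_clip_range(10, 30, video.length)` just before `VideoAsset(asset_id=video.id, start=10, end=30)` turns the silent negative-timestamp failure into an immediate error.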
Transcode video (resolution / quality change)
    from videodb import TranscodeMode, VideoConfig, AudioConfig

    # Change resolution, quality, or aspect ratio server-side
    job_id = conn.transcode(
        source="https://example.com/video.mp4",
        callback_url="https://example.com/webhook",
        mode=TranscodeMode.economy,
        video_config=VideoConfig(resolution=720, quality=23, aspect_ratio="16:9"),
        audio_config=AudioConfig(mute=False),
    )
Reframe aspect ratio (for social platforms)
Warning: reframe() is a slow server-side operation. For long videos it can take several minutes and may time out. Best practices:
- Always limit to a short segment using start/end when possible
- For full-length videos, use callback_url for async processing
- Trim the video on a Timeline first, then reframe the shorter result
    from videodb import ReframeMode

    # Always prefer reframing a short segment:
    reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart)

    # Async reframe for full-length videos (returns None, result via webhook):
    video.reframe(target="vertical", callback_url="https://example.com/webhook")

    # Presets: "vertical" (9:16), "square" (1:1), "landscape" (16:9)
    reframed = video.reframe(start=0, end=60, target="square")

    # Custom dimensions
    reframed = video.reframe(start=0, end=60, target={"width": 1280, "height": 720})
Generative media
    image = coll.generate_image(
        prompt="a sunset over mountains",
        aspect_ratio="16:9",
    )
Error handling
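    import videodb
    from videodb.exceptions import AuthenticationError, InvalidRequestError

    # A missing or invalid key surfaces as AuthenticationError on connect.
    try:
        conn = videodb.connect()
    except AuthenticationError:
        print("Check your VIDEO_DB_API_KEY")

    # coll is the collection obtained earlier via conn.get_collection().
    try:
        video = coll.upload(url="https://example.com/video.mp4")
    except InvalidRequestError as e:
        print(f"Upload failed: {e}")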
Common pitfalls
| Scenario | Error message | Solution |
|---|---|---|
| Indexing an already-indexed video | Spoken word index for video already exists | Use video.index_spoken_words(force=True) to skip if already indexed |
| Scene index already exists | Scene index with id XXXX already exists | Extract the existing scene_index_id from the error with re.search(r"id\s+([a-f0-9]+)", str(e)) |
| Search finds no matches | InvalidRequestError: No results found | Catch the exception and treat as empty results (shots = []) |
| Reframe times out | Blocks indefinitely on long videos | Use start/end to limit the segment, or pass callback_url for async |
| Negative timestamps on Timeline | Silently produces broken stream | Always validate start >= 0 before creating VideoAsset |
| generate_video() / create_collection() fails | Operation not allowed or maximum limit | Plan-gated features; inform the user about plan limits |
Additional docs
Reference documentation is in the reference/ directory adjacent to this SKILL.md file. Use the Glob tool to locate it if needed.
- reference/api-reference.md - Complete VideoDB Python SDK API reference
- reference/search.md - In-depth guide to video search (spoken word and scene-based)
- reference/editor.md - Timeline editing, assets, and composition
- reference/streaming.md - HLS streaming and instant playback
- reference/generative.md - AI-powered media generation (images, video, audio)
- reference/rtstream.md - Live stream ingestion workflow (RTSP/RTMP)
- reference/rtstream-reference.md - RTStream SDK methods and AI pipelines
- reference/capture.md - Desktop capture workflow
- reference/capture-reference.md - Capture SDK and WebSocket events
- reference/use-cases.md - Common video processing patterns and examples
Screen Recording (Desktop Capture)
Use ws_listener.py to capture WebSocket events during recording sessions. Desktop capture supports macOS only.
Quick Start
- Start listener: python scripts/ws_listener.py &
- Get WebSocket ID: cat /tmp/videodb_ws_id
- Run capture code (see reference/capture.md for full workflow)
- Events written to: /tmp/videodb_events.jsonl
Query Events
    import json
    import time

    # Load every event the listener has written so far (one JSON object per line).
    with open("/tmp/videodb_events.jsonl") as f:
        events = [json.loads(line) for line in f]

    # Get all transcripts
    transcripts = [e["data"]["text"] for e in events if e.get("channel") == "transcript"]

    # Get visual descriptions from last 5 minutes
    cutoff = time.time() - 300
    recent_visual = [e for e in events if e.get("channel") == "visual_index" and e["unix_ts"] > cutoff]
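Because the listener may still be appending to the file while it is read, a slightly more defensive loader can help. The sketch below assumes only the fields used above (`channel`, `unix_ts`, `data`); the helper names `load_events` and `by_channel` are hypothetical, and skipping unparsable lines is a design choice made here, not documented listener behaviour.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional

Event = Dict[str, Any]


def load_events(path: str) -> List[Event]:
    """Read a JSONL event file, skipping blank or partially written lines."""
    events: List[Event] = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            # Most likely a line that is still being written; ignore it.
            continue
    return events


def by_channel(events: List[Event], channel: str, since: Optional[float] = None) -> List[Event]:
    """Return events on `channel`, optionally only those with unix_ts newer than `since`."""
    return [
        e
        for e in events
        if e.get("channel") == channel
        and (since is None or e.get("unix_ts", 0) > since)
    ]
```

For example, `by_channel(load_events("/tmp/videodb_events.jsonl"), "visual_index", since=time.time() - 300)` reproduces the five-minute filter above once `time` is imported.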
Utility Scripts
- scripts/ws_listener.py - WebSocket event listener (dumps to JSONL)
For complete capture workflow, see reference/capture.md.
Do not use ffmpeg, moviepy, or local encoding tools when VideoDB supports the operation. The following are all handled server-side by VideoDB — trimming, combining clips, overlaying audio or music, adding subtitles, text/image overlays, transcoding, resolution changes, aspect-ratio conversion, resizing for platform requirements, transcription, and media generation. Only fall back to local tools for operations listed under Limitations in reference/editor.md (transitions, speed changes, crop/zoom, colour grading, volume mixing).
When to use what
| Problem | VideoDB solution |
|---|---|
| Platform rejects video aspect ratio or resolution | video.reframe() or conn.transcode() with VideoConfig |
| Need to resize video for Twitter/Instagram/TikTok | video.reframe(target="vertical") or target="square" |
| Need to change resolution (e.g. 1080p → 720p) | conn.transcode() with VideoConfig(resolution=720) |
| Need to overlay audio/music on video | AudioAsset on a Timeline |
| Need to add subtitles | video.add_subtitle() or CaptionAsset |
| Need to combine/trim clips | VideoAsset on a Timeline |
| Need to generate voiceover, music, or SFX | coll.generate_voice(), generate_music(), generate_sound_effect() |