
Video Processing

Overview

This skill provides guidance for video processing tasks involving frame-level analysis, event detection, and motion tracking using computer vision libraries like OpenCV. It emphasizes verification-first approaches and guards against common pitfalls in video analysis workflows.

Core Approach: Verify Before Implementing

Before writing detection algorithms, establish ground truth understanding of the video content:

  • Extract and inspect sample frames - Save key frames as images to visually verify what is happening at specific frame numbers

  • Understand video metadata - Frame count, FPS, duration, resolution

  • Map expected events to frame ranges - If test data exists, understand what frames correspond to which events

  • Build diagnostic tools first - Frame extraction and visualization utilities provide critical insight

Workflow for Event Detection Tasks

Phase 1: Video Exploration

Essential first steps for any video analysis task:

```python
import cv2

cap = cv2.VideoCapture(video_path)
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
fps = cap.get(cv2.CAP_PROP_FPS)
duration = frame_count / fps
cap.release()

print(f"Frames: {frame_count}, FPS: {fps}, Duration: {duration:.2f}s")
```

Critical: Extract frames at expected event locations to verify understanding:

```python
def save_frame(video_path, frame_num, output_path):
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_num)
    ret, frame = cap.read()
    if ret:
        cv2.imwrite(output_path, frame)
    cap.release()

# Save frames at expected event times for visual inspection
save_frame("video.mp4", 50, "frame_050.png")
save_frame("video.mp4", 60, "frame_060.png")
```

Phase 2: Algorithm Development

When developing detection algorithms:

  • Start simple - Basic frame differencing or thresholding before complex approaches

  • Use configurable thresholds - Avoid hardcoded magic numbers; derive from data

  • Test on known frames first - Verify algorithm produces expected results on frames with known ground truth

  • Log intermediate values - Track metrics at each frame to understand algorithm behavior

Phase 3: Validation

Before finalizing:

  • Sanity check outputs - Do detected events occur in reasonable order and timing?

  • Test on multiple videos - Verify generalization across different inputs

  • Compare against expected ranges - If ground truth exists, verify detection accuracy
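One way to sketch the ground-truth comparison, assuming detections and expected frame ranges are kept in plain dicts (`validate_detections` is a hypothetical helper, not an API defined by this skill):

```python
def validate_detections(detections, ground_truth, tolerance=2):
    """Compare detected event frames against known-good frame ranges.

    detections:   {event_name: detected_frame}
    ground_truth: {event_name: (first_ok_frame, last_ok_frame)}
    Returns a list of failure messages; an empty list means all events pass.
    """
    failures = []
    for event, (lo, hi) in ground_truth.items():
        frame = detections.get(event)
        if frame is None:
            failures.append(f"{event}: not detected")
        elif not (lo - tolerance <= frame <= hi + tolerance):
            failures.append(
                f"{event}: frame {frame} outside [{lo}, {hi}] (+/- {tolerance})"
            )
    return failures
```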

Common Detection Approaches

Frame Differencing

Compares frames against a reference (first frame or previous frame) to detect motion:

```python
# Background subtraction approach: prepare the reference frame
first_frame = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)
first_frame = cv2.GaussianBlur(first_frame, (21, 21), 0)

# For each subsequent frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (21, 21), 0)
diff = cv2.absdiff(first_frame, gray)
```

Pitfall: First frame may not be a suitable reference if scene changes or camera moves.

Contour-Based Detection

Identifies objects by finding contours in thresholded images:

```python
_, thresh = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
```

Pitfall: Threshold values (e.g., 25) and minimum contour areas are arbitrary without calibration.

Tracking Position Over Time

For detecting events like jumps or gestures, track object position across frames:

```python
positions = []  # (frame_num, x, y, area) tuples
for frame_num in range(frame_count):
    # ... detection code ...
    if detected:
        positions.append((frame_num, cx, cy, area))
```

Pitfall: Coordinate systems matter. In image coordinates, Y increases downward, so "higher in frame" means smaller Y values.
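Because y grows downward, the apex of a jump is the sample with the smallest y in the track. A sketch over the `positions` list built above (`find_apex` is a hypothetical helper):

```python
def find_apex(positions):
    """Return (frame_num, y) for the highest point of the tracked object.

    In image coordinates the y axis points down, so "highest in frame"
    means the SMALLEST y value in the track, not the largest.
    """
    frame_num, x, y, area = min(positions, key=lambda p: p[2])
    return frame_num, y
```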

Verification Strategies

  1. Visual Inspection

Save frames at detected event times to verify correctness:

```python
# After detecting takeoff at frame N
save_frame(video_path, detected_takeoff, "detected_takeoff.png")
save_frame(video_path, detected_takeoff - 5, "before_takeoff.png")
save_frame(video_path, detected_takeoff + 5, "after_takeoff.png")
```

  2. Timing Reasonableness

Check if detected events make temporal sense:

```python
duration_seconds = frame_count / fps
event_time = detected_frame / fps

# Example: a jump in a 4-second video shouldn't be detected in the last 0.5 seconds
if event_time > duration_seconds - 0.5:
    print("WARNING: Event detected very late in video - verify correctness")
```

  3. Sequence Validation

Ensure events occur in logical order:

```python
if detected_landing <= detected_takeoff:
    print("ERROR: Landing cannot occur before or at takeoff")
```

  4. Multi-Video Testing

Test on multiple inputs early to catch overfitting to single video characteristics.

Common Pitfalls

  1. No Ground Truth Verification

Problem: Relying entirely on computed metrics without visual confirmation.

Solution: Always save and inspect frames at detected event locations.

  2. Confirmation Bias in Data Interpretation

Problem: When data shows unexpected patterns, inventing explanations that fit preconceptions rather than questioning assumptions.

Solution: When detection results seem wrong, investigate root causes rather than rationalizing unexpected behavior.

  3. Magic Number Thresholds

Problem: Using arbitrary thresholds (500 for contour area, 25 for binary threshold) without empirical basis.

Solution: Derive thresholds from actual video data or make them configurable with sensible defaults.

  4. Ignoring Detection Gaps

Problem: When detection fails for a range of frames, assuming this is expected behavior without investigation.

Solution: Investigate why detection fails - it may indicate algorithm flaws rather than expected behavior.

  5. Coordinate System Confusion

Problem: Misinterpreting Y coordinates (smaller Y = higher in frame in image coordinates).

Solution: Explicitly document coordinate system assumptions and verify with visual inspection.

  6. Ignoring Timing Reasonableness

Problem: Accepting detections that don't make temporal sense (e.g., event detected in last 0.8 seconds of a 4-second video).

Solution: Implement sanity checks on output timing.

  7. Single Video Overfitting

Problem: Algorithm works on one video but fails on others.

Solution: Test on multiple videos early in development.

Output Format Considerations

When outputting results (e.g., to TOML, JSON):

```python
import numpy as np

# Convert numpy types to Python native types for serialization
result = {
    "takeoff_frame": int(takeoff_frame),  # not np.int64
    "landing_frame": int(landing_frame),
}
```
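For nested results, converting each field by hand gets tedious; `json.dump` accepts a `default=` callable that is invoked for any object it cannot serialize. A sketch (the `to_native` helper is illustrative):

```python
import json
import numpy as np

def to_native(obj):
    """json.dump raises TypeError on numpy scalars and arrays; convert them."""
    if isinstance(obj, np.integer):
        return int(obj)
    if isinstance(obj, np.floating):
        return float(obj)
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Not JSON serializable: {type(obj)}")

result = {"takeoff_frame": np.int64(50), "landing_frame": np.int64(72)}
text = json.dumps(result, default=to_native)
```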

Debugging Checklist

When detection results are incorrect:

  • Have I visually inspected frames at the expected event times?

  • Have I visually inspected frames at my detected event times?

  • Do my detected times make temporal sense given video duration?

  • Have I verified my algorithm on frames with known ground truth?

  • Am I correctly interpreting the coordinate system?

  • Have I tested on multiple videos?

  • Are my thresholds derived from data or arbitrary?

  • When detection fails on some frames, do I understand why?
