video-subtitle-cutter

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "video-subtitle-cutter" with this command: npx skills add different-ai/agent-bank/different-ai-agent-bank-video-subtitle-cutter

What I Do

Automate video editing by:

  • Transcribing video to timestamped subtitles (Whisper)

  • Analyzing transcript with AI to identify cuts (filler words, pauses, mistakes)

  • Generating FFmpeg commands to cut and concatenate clean segments

  • Generating subtitles (SRT) for the final video

CRITICAL: Always Re-encode (Never Use -c copy)

The #1 mistake is using -c copy for cutting. This causes:

  • Frozen frames at cut points (1-8 seconds of freeze)

  • Audio/video sync issues

  • Glitchy playback

Why? H.264 video uses keyframes (I-frames) every 2-10 seconds. -c copy can only cut at keyframes, so FFmpeg includes extra frames that display as frozen.
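To make the keyframe constraint concrete, here is a minimal sketch. It assumes you already have a sorted list of keyframe timestamps; the values below are invented for illustration, and `copy_cut_start` is a hypothetical helper, not part of FFmpeg. It estimates how far a stream-copy cut can land from the requested time.

```python
from bisect import bisect_right


def copy_cut_start(keyframes, requested):
    """Return the last keyframe at or before `requested` (hypothetical helper).

    A stream copy cannot begin in the middle of a group of pictures, so the
    closest usable start point is the preceding keyframe.
    """
    idx = bisect_right(keyframes, requested) - 1
    return keyframes[max(idx, 0)]


# Illustrative keyframe times in seconds (assumed, not measured from a file)
keyframes = [0.0, 4.0, 8.0, 12.0, 16.0]
requested = 10.0
actual = copy_cut_start(keyframes, requested)
print(f"requested {requested}s, usable keyframe {actual}s, "
      f"drift {requested - actual:.1f}s")
```

The drift is the span that can show up as a freeze or sync offset when cutting with -c copy; re-encoding removes it because new keyframes are created at the cut.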

Solution: Always re-encode segments with quality settings:

# WRONG - causes freeze frames

ffmpeg -ss 10 -i video.mp4 -t 5 -c copy segment.mp4

# CORRECT - smooth cuts at any timestamp

ffmpeg -ss 10 -i video.mp4 -t 5 \
  -c:v libx264 -preset fast -crf 18 \
  -c:a aac -b:a 192k \
  -avoid_negative_ts make_zero \
  segment.mp4

Quality presets (CRF = Constant Rate Factor):

  • crf 15-17 = Near lossless (large files)

  • crf 18-20 = High quality (recommended)

  • crf 21-23 = Good quality (smaller files)

  • crf 24-28 = Medium quality (much smaller)

Prerequisites

# Install Whisper (choose one)

pip install openai-whisper   # Local (requires Python 3.9+)

# OR use the OpenAI API (no local install needed)

# Install FFmpeg

brew install ffmpeg       # macOS
sudo apt install ffmpeg   # Linux

Quick Start

Step 1: Transcribe Video

Option A: Local Whisper (free, slower)

whisper video.mp4 --model medium --output_format json --output_dir ./

Option B: OpenAI Whisper API (fast, paid)

curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file="@video.mp4" \
  -F model="whisper-1" \
  -F response_format="verbose_json" \
  -F "timestamp_granularities[]=segment" \
  > transcript.json

Option C: Use ffmpeg to extract audio first (for large files)

Extract audio (much smaller file to upload)

ffmpeg -i video.mp4 -vn -acodec libmp3lame -q:a 2 audio.mp3

Then transcribe the audio

whisper audio.mp3 --model medium --output_format json

Step 2: Analyze Transcript for Cuts

Feed the transcript to the AI with this prompt:

Analyze this video transcript and identify segments to CUT (remove).

TRANSCRIPT: {paste transcript.json segments here}

Identify these issues:

  1. FILLER WORDS: "um", "uh", "like", "you know", "basically", "actually", "so", "right"
  2. FALSE STARTS: Incomplete sentences that restart ("I think— actually, let me...")
  3. LONG PAUSES: Gaps > 1.5 seconds between segments
  4. REPETITIONS: Same word/phrase repeated ("really really really")
  5. CORRECTIONS: "Wait, I meant...", "Sorry, let me rephrase..."
  6. TANGENTS: Off-topic rambling (use judgment)

Return a JSON array of segments to KEEP (not cut):

[
  {"start": 0.0, "end": 2.5, "text": "Welcome to this video"},
  {"start": 3.1, "end": 8.4, "text": "Today we're going to cover..."},
  ...
]

Rules:

  • Merge adjacent keep segments if gap < 0.3s
  • Ensure cuts don't happen mid-word (check word boundaries)
  • Preserve natural speech rhythm (don't over-cut)
  • When in doubt, keep the segment
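The merge rule from the prompt can also be enforced in code once the model has replied, rather than trusting the reply as-is. The sketch below is one possible post-processing step; `merge_keep_segments` is a hypothetical helper name and the 0.3 s default simply mirrors the rule above.

```python
def merge_keep_segments(segments, max_gap=0.3):
    """Merge consecutive keep segments whose gap is below `max_gap` seconds."""
    merged = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        if merged and seg["start"] - merged[-1]["end"] < max_gap:
            prev = merged[-1]
            prev["end"] = max(prev["end"], seg["end"])
            prev["text"] = f'{prev.get("text", "")} {seg.get("text", "")}'.strip()
        else:
            merged.append(dict(seg))  # copy so the caller's list is not mutated
    return merged


keep = [
    {"start": 0.0, "end": 2.5, "text": "Welcome to this video"},
    {"start": 2.6, "end": 5.0, "text": "Today we're going to cover cuts"},
    {"start": 6.0, "end": 7.0, "text": "Let's start"},
]
for seg in merge_keep_segments(keep):
    print(seg)
```

Fewer, longer segments also mean fewer encoder invocations and fewer audible joins in the final video.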

Step 3: Generate FFmpeg Commands (High Quality)

Once you have the keep segments, use this Python script for smooth cuts:

import json
import os
import subprocess

VIDEO_INPUT = "video.mp4"
VIDEO_OUTPUT = "video_clean.mp4"
SEGMENTS_FILE = "keep_segments.json"

with open(SEGMENTS_FILE) as f:
    segments = json.load(f)

segment_files = []
for i, seg in enumerate(segments):
    outfile = f"temp_seg_{i:04d}.mp4"
    segment_files.append(outfile)

    # MUST re-encode for smooth cuts (no -c copy!)
    cmd = [
        'ffmpeg', '-y',
        '-ss', str(seg['start']),      # Seek BEFORE input (fast)
        '-i', VIDEO_INPUT,
        '-t', str(seg['end'] - seg['start']),  # Duration
        '-c:v', 'libx264',
        '-preset', 'fast',              # fast/medium/slow
        '-crf', '18',                   # Quality (lower = better, 15-23 recommended)
        '-c:a', 'aac',
        '-b:a', '192k',
        '-avoid_negative_ts', 'make_zero',  # Fix timestamp issues
        '-async', '1',                  # Sync audio
        outfile
    ]
    # check=True stops the script if FFmpeg fails instead of continuing silently
    subprocess.run(cmd, capture_output=True, check=True)
    print(f"✓ Segment {i+1}/{len(segments)}")

# Create concat file
with open('temp_concat.txt', 'w') as f:
    for sf in segment_files:
        f.write(f"file '{sf}'\n")

# Concatenate (can use -c copy here since all segments match)
subprocess.run([
    'ffmpeg', '-y', '-f', 'concat', '-safe', '0',
    '-i', 'temp_concat.txt',
    '-c', 'copy',
    VIDEO_OUTPUT
], check=True)

# Cleanup
for sf in segment_files:
    os.remove(sf)
os.remove('temp_concat.txt')
print(f"✓ Created: {VIDEO_OUTPUT}")

Key flags explained:

  • -ss before -i : Fast seek (doesn't decode entire video)

  • -t : Duration of segment (not end time)

  • -crf 18 : High quality encoding

  • -avoid_negative_ts make_zero : Fixes concat timestamp issues

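  • -async 1 : Keeps audio in sync (FFmpeg's documentation marks this option as deprecated in favor of the aresample filter, e.g. -af aresample=async=1)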

Step 4: Generate Subtitles

After creating the final video, generate fresh subtitles with Whisper:

Generate SRT subtitles for the cleaned video

whisper video_clean.mp4 --model medium --output_format srt --output_dir ./

For higher accuracy (slower):

whisper video_clean.mp4 --model large --output_format srt --language en

Output: video_clean.srt
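For reference, the SRT layout is simple enough to write by hand, which can help when you want to inspect or adjust the file Whisper produced. The sketch below assumes segments shaped like the keep-segments JSON (start, end, text); `srt_timestamp` and `to_srt` are hypothetical helper names, not Whisper functions.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments as SRT cues: index, time range, text, blank line."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


print(to_srt([
    {"start": 0.0, "end": 2.5, "text": " Welcome to this video."},
    {"start": 2.5, "end": 5.8, "text": " Today we're going to..."},
]))
```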

Burn subtitles into video (optional):

Embed subtitles permanently

ffmpeg -i video_clean.mp4 -vf "subtitles=video_clean.srt:force_style='FontSize=24,FontName=Arial,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2'" -c:a copy video_with_subs.mp4

Subtitle styling options:

  • FontSize=24 : Text size

  • FontName=Arial : Font face

  • PrimaryColour=&HFFFFFF : White text (BGR format)

  • OutlineColour=&H000000 : Black outline

  • Outline=2 : Outline thickness

  • MarginV=50 : Distance from bottom
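If you adjust these options often, assembling the filter string from a dictionary avoids quoting mistakes. This is a minimal sketch: `subtitles_filter` is a hypothetical helper, and it assumes a simple subtitle path with no colons, quotes, or backslashes, which would need escaping for FFmpeg's filter syntax.

```python
def subtitles_filter(srt_path, style):
    """Build the -vf value for burning in subtitles (hypothetical helper)."""
    force_style = ",".join(f"{key}={value}" for key, value in style.items())
    return f"subtitles={srt_path}:force_style='{force_style}'"


style = {
    "FontSize": 24,
    "FontName": "Arial",
    "PrimaryColour": "&HFFFFFF",
    "OutlineColour": "&H000000",
    "Outline": 2,
    "MarginV": 50,
}

# Passing the command as a list avoids shell quoting issues entirely
cmd = [
    "ffmpeg", "-i", "video_clean.mp4",
    "-vf", subtitles_filter("video_clean.srt", style),
    "-c:a", "copy",
    "video_with_subs.mp4",
]
print(cmd)
```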

Complete Workflow Script (High Quality)

#!/usr/bin/env python3
"""
video_clean.py - Clean up video by removing filler words/pauses
Uses re-encoding for smooth cuts (no freeze frames)
"""

import json
import os
import subprocess


def get_duration(filepath):
    """Get video duration in seconds"""
    result = subprocess.run([
        'ffprobe', '-v', 'quiet', '-print_format', 'json',
        '-show_format', filepath
    ], capture_output=True, text=True)
    return float(json.loads(result.stdout)['format']['duration'])


def extract_segment(input_file, start, end, output_file, crf=18, preset='fast'):
    """Extract a segment with re-encoding for smooth cuts"""
    cmd = [
        'ffmpeg', '-y',
        '-ss', str(start),
        '-i', input_file,
        '-t', str(end - start),
        '-c:v', 'libx264', '-preset', preset, '-crf', str(crf),
        '-c:a', 'aac', '-b:a', '192k',
        '-avoid_negative_ts', 'make_zero',
        '-async', '1',
        output_file
    ]
    return subprocess.run(cmd, capture_output=True, text=True)


def concatenate_segments(segment_files, output_file):
    """Concatenate segments into final video"""
    with open('temp_concat.txt', 'w') as f:
        for sf in segment_files:
            f.write(f"file '{sf}'\n")

    subprocess.run([
        'ffmpeg', '-y', '-f', 'concat', '-safe', '0',
        '-i', 'temp_concat.txt',
        '-c', 'copy',
        output_file
    ], capture_output=True)

    os.remove('temp_concat.txt')


def generate_subtitles(video_file, model='medium'):
    """Generate SRT subtitles using Whisper"""
    subprocess.run([
        'whisper', video_file,
        '--model', model,
        '--output_format', 'srt',
        '--output_dir', './'
    ])


def main(video_input, segments, output_name, crf=18):
    """Main workflow"""
    segment_files = []

    print(f"\n{'='*50}")
    print(f"Processing: {video_input}")
    print(f"Quality: CRF {crf} (lower=better, 15-23 recommended)")
    print(f"{'='*50}\n")

    # Extract segments with re-encoding
    for i, seg in enumerate(segments):
        outfile = f"temp_seg_{i:04d}.mp4"

        result = extract_segment(video_input, seg['start'], seg['end'], outfile, crf)
        if result.returncode == 0:
            # Only successful segments go into the concat list
            segment_files.append(outfile)
            duration = seg['end'] - seg['start']
            print(f"✓ Segment {i+1}/{len(segments)}: {duration:.1f}s")
        else:
            print(f"✗ Error on segment {i+1}")
            print(result.stderr[-500:])

    # Concatenate
    print("\nConcatenating segments...")
    concatenate_segments(segment_files, output_name)

    # Cleanup temp segments
    for sf in segment_files:
        os.remove(sf)

    # Generate subtitles
    print("\nGenerating subtitles...")
    generate_subtitles(output_name)

    # Stats
    orig_duration = get_duration(video_input)
    new_duration = get_duration(output_name)
    orig_size = os.path.getsize(video_input) / (1024*1024)
    new_size = os.path.getsize(output_name) / (1024*1024)

    print(f"\n{'='*50}")
    print("COMPLETE")
    print(f"{'='*50}")
    print(f"Original:  {orig_duration:.0f}s | {orig_size:.1f} MB")
    print(f"Output:    {new_duration:.0f}s | {new_size:.1f} MB")
    print(f"Removed:   {orig_duration - new_duration:.0f}s ({((orig_duration - new_duration)/orig_duration)*100:.0f}%)")
    print(f"Video:     {output_name}")
    print(f"Subtitles: {output_name.replace('.mp4', '.srt')}")


if __name__ == '__main__':
    # Example usage
    VIDEO = "input.mp4"
    SEGMENTS = [
        {"start": 0.0, "end": 10.5},
        {"start": 12.3, "end": 25.0},
        # ... add your segments
    ]
    main(VIDEO, SEGMENTS, "output_clean.mp4", crf=18)

AI Analysis Prompt Templates

Basic Cleanup (Filler Words Only)

Remove filler words from this transcript. Return segments to KEEP.

Filler words to remove: um, uh, like, you know, basically, actually, so, right, I mean

TRANSCRIPT SEGMENTS: {segments}

Return JSON: [{"start": float, "end": float, "text": "cleaned text"}, ...]

Aggressive Cleanup (Podcast/Interview)

Clean this podcast transcript for a tight, professional edit.

REMOVE:

  • All filler words (um, uh, like, you know, basically, so, right)
  • False starts and restarts
  • Pauses longer than 1 second
  • Repetitions
  • Off-topic tangents
  • "That's a great question" type filler responses
  • Excessive laughter/reactions (keep some for naturalness)

KEEP:

  • Core content and insights
  • Natural transitions
  • Important reactions that add context

TRANSCRIPT: {segments}

Return JSON array of segments to KEEP with cleaned text.

Light Cleanup (Preserve Natural Feel)

Lightly clean this transcript while preserving natural speech patterns.

ONLY REMOVE:

  • "Um" and "uh" when standalone (not part of thinking pause)
  • Obvious mistakes followed by corrections
  • Technical issues (coughs, phone rings, etc.)

PRESERVE:

  • Natural "like" and "you know" that add personality
  • Thinking pauses that feel authentic
  • Personality quirks

TRANSCRIPT: {segments}

Return JSON array of segments to KEEP.

Transcript Format Reference

Whisper JSON Output

{
  "text": "Full transcript text...",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 2.5,
      "text": " Welcome to this video.",
      "tokens": [50364, 5765, ...],
      "temperature": 0.0,
      "avg_logprob": -0.25,
      "compression_ratio": 1.2,
      "no_speech_prob": 0.01
    },
    {
      "id": 1,
      "start": 2.5,
      "end": 5.8,
      "text": " Um, so today we're going to...",
      ...
    }
  ],
  "language": "en"
}
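Long pauses can be found directly from the segment timestamps in this structure, without involving the AI. A minimal sketch, assuming segments with `start` and `end` fields as above; `find_long_pauses` is a hypothetical helper and the sample values are illustrative.

```python
import json


def find_long_pauses(segments, min_gap=1.5):
    """Return (gap_start, gap_end) pairs for gaps longer than `min_gap` seconds."""
    pauses = []
    for prev, cur in zip(segments, segments[1:]):
        if cur["start"] - prev["end"] > min_gap:
            pauses.append((prev["end"], cur["start"]))
    return pauses


# Inline sample shaped like Whisper's JSON output (values are made up)
sample = json.loads("""
{
  "segments": [
    {"id": 0, "start": 0.0, "end": 2.5, "text": " Welcome to this video."},
    {"id": 1, "start": 2.5, "end": 5.8, "text": " Um, so today we're going to..."},
    {"id": 2, "start": 8.0, "end": 10.0, "text": " Let's get started."}
  ]
}
""")
print(find_long_pauses(sample["segments"]))
```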

Keep Segments Format (for FFmpeg)

[
  { "start": 0.0, "end": 2.5, "text": "Welcome to this video." },
  { "start": 3.2, "end": 5.8, "text": "Today we're going to..." }
]
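Because this list comes from a model reply, it may be worth checking before any encoding starts. The sketch below is one possible sanity check under the format shown above; `validate_keep_segments` is a hypothetical helper, not part of the skill.

```python
def validate_keep_segments(segments):
    """Return a list of problems; an empty list means the segments look usable."""
    problems = []
    prev_end = 0.0
    for i, seg in enumerate(segments):
        start, end = seg["start"], seg["end"]
        if start < 0 or end <= start:
            problems.append(f"segment {i}: invalid range {start}-{end}")
        if start < prev_end:
            problems.append(f"segment {i}: overlaps the previous segment or is out of order")
        prev_end = max(prev_end, end)
    return problems


keep = [
    {"start": 0.0, "end": 2.5, "text": "Welcome to this video."},
    {"start": 3.2, "end": 5.8, "text": "Today we're going to..."},
]
print(validate_keep_segments(keep) or "segments look fine")
```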

Advanced: Word-Level Timestamps

For precise filler word removal, use word-level timestamps:

Whisper with word timestamps

whisper video.mp4 --model medium --word_timestamps True --output_format json

This gives you:

{
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Um welcome to this video",
      "words": [
        { "word": "Um", "start": 0.0, "end": 0.3 },
        { "word": "welcome", "start": 0.5, "end": 0.9 },
        { "word": "to", "start": 0.9, "end": 1.0 },
        { "word": "this", "start": 1.0, "end": 1.2 },
        { "word": "video", "start": 1.2, "end": 1.6 }
      ]
    }
  ]
}

Now you can cut precisely around "Um" (0.0-0.3) and keep "welcome to this video" (0.5-1.6).
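That step can be sketched in code. The example below assumes word entries shaped like the JSON above and a deliberately tiny filler list; `keep_ranges_from_words` and `FILLERS` are hypothetical names, and the 0.3 s join threshold is an assumption borrowed from the merge rule in Step 2.

```python
FILLERS = {"um", "uh"}  # assumed list; extend to match your own prompt


def keep_ranges_from_words(words, fillers=FILLERS, max_gap=0.3):
    """Group non-filler words into keep ranges, cutting at every filler."""
    ranges = []
    contiguous = False  # True while the previous word was kept
    for w in words:
        token = w["word"].strip().lower().strip(".,!?")
        if token in fillers:
            contiguous = False  # force a cut so the filler is never bridged
            continue
        if contiguous and w["start"] - ranges[-1]["end"] <= max_gap:
            ranges[-1]["end"] = w["end"]
        else:
            ranges.append({"start": w["start"], "end": w["end"]})
        contiguous = True
    return ranges


words = [
    {"word": "Um", "start": 0.0, "end": 0.3},
    {"word": "welcome", "start": 0.5, "end": 0.9},
    {"word": "to", "start": 0.9, "end": 1.0},
    {"word": "this", "start": 1.0, "end": 1.2},
    {"word": "video", "start": 1.2, "end": 1.6},
]
print(keep_ranges_from_words(words))
```

Words like "so" or "like" are often meaningful, so a plain word list is only safe for unambiguous fillers; the AI analysis step remains the better judge for the rest.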

Troubleshooting

Frozen Frames at Cut Points (MOST COMMON)

Cause: Using -c copy which can only cut at keyframes.

Solution: Always re-encode with -c:v libx264 -crf 18 (see examples above).

Audio/Video Sync Issues

Add these flags when extracting segments:

# -avoid_negative_ts make_zero fixes negative timestamps
# -async 1 syncs audio to video

ffmpeg -ss 10 -i video.mp4 -t 5 \
  -c:v libx264 -crf 18 \
  -c:a aac -b:a 192k \
  -avoid_negative_ts make_zero \
  -async 1 \
  segment.mp4

Cuts Sound Abrupt

Add audio fade in/out to each segment:

ffmpeg -ss 10 -i video.mp4 -t 5 \
  -c:v libx264 -crf 18 \
  -af "afade=t=in:st=0:d=0.05,afade=t=out:st=4.95:d=0.05" \
  -c:a aac segment.mp4
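The fade-out start time depends on each segment's length (4.95 above is 5 seconds minus the 0.05 second fade), so it has to be computed per segment. A minimal sketch, with `fade_filter` as a hypothetical helper name:

```python
def fade_filter(duration, fade=0.05):
    """Build an -af value with a fade-in at 0 and a fade-out ending at `duration`."""
    fade = min(fade, duration / 2)  # keep the two fades from overlapping on short clips
    out_start = max(duration - fade, 0.0)
    return (
        f"afade=t=in:st=0:d={fade:.3f},"
        f"afade=t=out:st={out_start:.3f}:d={fade:.3f}"
    )


seg = {"start": 10.0, "end": 15.0}
print(fade_filter(seg["end"] - seg["start"]))
```

The returned string can be passed after '-af' in the segment extraction command.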

Large Files Take Forever

  • Use -preset fast or -preset veryfast (faster encoding at the cost of compression efficiency, so files are larger or quality slightly lower at the same CRF)

  • Extract audio first for transcription (much smaller)

  • Use Whisper API instead of local model

  • Process in parallel (multiple segments at once)
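Parallel processing could look like the sketch below. It assumes a worker function that takes a segment index and a segment and returns the output filename; `process_in_parallel` and `fake_extract` are hypothetical, and the worker here is a stand-in so the example runs without FFmpeg. Threads are enough because the real work would happen in FFmpeg subprocesses.

```python
from concurrent.futures import ThreadPoolExecutor


def process_in_parallel(segments, worker, max_workers=4):
    """Run worker(index, segment) for every segment; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, range(len(segments)), segments))


def fake_extract(index, seg):
    # Stand-in for a function that would call ffmpeg via subprocess.run
    return f"temp_seg_{index:04d}.mp4"


segments = [
    {"start": 0.0, "end": 10.5},
    {"start": 12.3, "end": 25.0},
]
print(process_in_parallel(segments, fake_extract))
```

libx264 already uses multiple threads per encode, so the gain from running several encodes at once varies by machine; a small max_workers value is a reasonable starting point.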

Faster encoding (slightly lower quality)

ffmpeg ... -preset veryfast -crf 20 ...

Even faster for previews

ffmpeg ... -preset ultrafast -crf 23 ...

Whisper Misses Words

  • Use --model large for better accuracy

  • Use --language en to force English

  • Normalize audio first:

ffmpeg -i video.mp4 -af "loudnorm=I=-16:TP=-1.5:LRA=11" -c:v copy normalized.mp4

File Size Too Large After Re-encoding

Increase CRF value (higher = smaller file, lower quality):

# Original quality (large)
-crf 18

# Good quality (medium)
-crf 22

# Acceptable quality (small)
-crf 26

Integration with OpenCode

When using this skill in OpenCode:

  1. Extract audio (faster transcription):

     ffmpeg -i video.mp4 -vn -acodec libmp3lame -q:a 2 temp_audio.mp3 -y

  2. Transcribe with Whisper:

     whisper temp_audio.mp3 --model medium --output_format json --output_dir ./

  3. Read the transcript JSON (the local Whisper CLI names it after the input file, so temp_audio.json here) and analyze segments

  4. Identify segments to KEEP based on:

     • Removing filler words (um, uh, like, you know)

     • Removing long pauses (>1.5s gaps)

     • Removing false starts and repetitions

     • For "shorts style": Keep only hook + key points + CTA

  5. Re-encode and concatenate (MUST re-encode, never -c copy): use the Python script above with crf=18 for quality

  6. Generate subtitles for final video:

     whisper output.mp4 --model medium --output_format srt

  7. Report results with before/after stats

Quality Settings Reference

| Use Case | CRF | Preset | Notes |
|---|---|---|---|
| Archive/Master | 15-17 | slow | Near lossless, large files |
| YouTube/Vimeo | 18-20 | medium | High quality, recommended |
| Social Media | 21-23 | fast | Good quality, smaller |
| Preview/Draft | 24-28 | veryfast | Quick renders |
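These settings can be kept in one lookup so scripts pick them by name. In this sketch the profile keys and the single CRF value chosen from each range are assumptions made for illustration; `encoder_args` is a hypothetical helper.

```python
# One CRF picked from each range in the reference above (assumed values)
QUALITY_PROFILES = {
    "archive": {"crf": 16, "preset": "slow"},
    "youtube": {"crf": 18, "preset": "medium"},
    "social": {"crf": 22, "preset": "fast"},
    "preview": {"crf": 26, "preset": "veryfast"},
}


def encoder_args(use_case):
    """Return the libx264 arguments for a named use case."""
    profile = QUALITY_PROFILES[use_case]
    return [
        "-c:v", "libx264",
        "-preset", profile["preset"],
        "-crf", str(profile["crf"]),
    ]


print(encoder_args("social"))
```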

Anti-Patterns (DO NOT DO)

# WRONG: -c copy causes freeze frames

ffmpeg -ss 10 -i video.mp4 -t 5 -c copy segment.mp4

# WRONG: -to instead of -t with -ss before -i
# With -ss before -i, output timestamps restart at 0, so -to 15 keeps 15 seconds
# (source 10s to 25s) instead of stopping at source time 15s

ffmpeg -ss 10 -i video.mp4 -to 15 ...

# WRONG: Missing timestamp fix flags (no -avoid_negative_ts)

ffmpeg ... -c:v libx264 ...

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
