video-narration

Generate narration for silent screen-recording videos. Extracts key frames, analyzes on-screen content, writes a presentation-style voiceover script, synthesizes natural-sounding speech with Microsoft Edge neural TTS, and merges the audio onto the original video. Outputs a narrated video and a companion voiceover script.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "video-narration" with this command: npx skills add ryanzhang-oss/video-auto-narration

Video Narration Skill

You generate professional voiceover narration for silent screen-recording demo videos.

Workflow

1. Analyze the Video

Extract key frames and understand what is happening on screen:

# Extract one frame every 5 seconds
./scripts/extract-frames.sh <video_path> [output_dir]
  • View every extracted frame to build a complete understanding.
  • Identify the narrative arc: what is the setup, the action, the key moments, and the result.
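
The contents of extract-frames.sh are not shown in this listing, so the following is only a sketch of how a one-frame-every-5-seconds extraction could be done with ffmpeg's fps filter. The function names, the default output directory, and the frame filename pattern are assumptions made for illustration.

```python
from pathlib import Path
import subprocess


def build_extract_command(video_path, output_dir="frames", interval_s=5):
    """Build an ffmpeg command that saves one frame every `interval_s` seconds.

    Hypothetical helper: the real extract-frames.sh may use different options.
    """
    pattern = str(Path(output_dir) / "frame_%04d.png")  # assumed naming scheme
    return [
        "ffmpeg",
        "-i", str(video_path),
        "-vf", f"fps=1/{interval_s}",  # one output frame per interval
        pattern,
    ]


def extract_frames(video_path, output_dir="frames", interval_s=5):
    """Create the output directory and run ffmpeg (requires ffmpeg on PATH)."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    command = build_extract_command(video_path, output_dir, interval_s)
    subprocess.run(command, check=True)
```

Keeping the command construction separate from the subprocess call makes it easy to print and inspect the exact ffmpeg invocation before running it.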

2. Write the Voiceover Script

Write a presentation-style narration — not a dry description. Follow this structure:

Section          | Purpose
-----------------|------------------------------------------------------------------
Context          | Tell the audience what they're about to see and why it matters
Background       | Explain the setup / scenario
Prompt / Action  | Show what the user actually did (keep it minimal)
Walkthrough      | Narrate each major step, highlighting insights and turning points
Result           | Land the payoff: what was found, what was fixed, why it's impressive

Guidelines:

  • Use conversational, confident language — like presenting to peers.
  • Use short punchy sentences for emphasis. Vary sentence length for rhythm.
  • Highlight "aha moments" — the points where something surprising or clever happens.
  • Name specific tools, commands, and values when they matter to the story.
  • End with a memorable one-liner that captures the value.
  • Total word count must fit the video duration (~2.5 words/second at normal pace).

Save the script as <video_name>_voiceover.md alongside the video.
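
The word-count guideline can be checked before any audio is generated. This is a minimal sketch, assuming the ~2.5 words/second figure above and a plain whitespace split for counting; the function names are hypothetical, and headings or timestamps in the script file would be counted as words unless stripped first.

```python
def word_budget(video_seconds, words_per_second=2.5):
    """Return the approximate number of words that fit in the video."""
    return int(video_seconds * words_per_second)


def count_words(script_text):
    """Count whitespace-separated words in the narration text."""
    return len(script_text.split())


def fits_budget(script_text, video_seconds, words_per_second=2.5):
    """True if the narration is no longer than the word budget."""
    return count_words(script_text) <= word_budget(video_seconds, words_per_second)
```

For example, a 90-second video gives a budget of about 225 words at this pace.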

3. Generate TTS Audio

Use the generate script to synthesize each narration segment:

./scripts/generate-tts.sh <script_sections_file> <output_dir> [voice] [rate]

Or generate directly via Python with edge-tts:

  • Voice: en-US-GuyNeural (natural male) or en-US-AvaNeural (natural female)
  • Rate: Adjust between +0% and +10% to fit the video duration
  • Generate each section as a separate audio segment for precise timing control
  • Concatenate segments with ~0.4s silence gaps between them
  • Verify total duration matches the video (±2 seconds)
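
A minimal sketch of the direct Python route, assuming the edge-tts package is installed and that each narration section is available as a plain string. The function names and the segment filename pattern are hypothetical, and this is not necessarily how generate-tts.sh works internally.

```python
import asyncio
from pathlib import Path


def format_rate(percent):
    """Format an integer percentage as a signed rate string such as '+5%'."""
    return f"{percent:+d}%"


async def synthesize_segments(segments, output_dir,
                              voice="en-US-GuyNeural", rate_percent=0):
    """Write one MP3 per narration segment and return the file paths."""
    import edge_tts  # imported here so format_rate works without the package

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, text in enumerate(segments, start=1):
        path = out / f"segment_{index:02d}.mp3"  # assumed naming scheme
        communicate = edge_tts.Communicate(
            text, voice, rate=format_rate(rate_percent)
        )
        await communicate.save(str(path))  # needs network access
        paths.append(path)
    return paths
```

It could be invoked with something like asyncio.run(synthesize_segments(["First section.", "Second section."], "tts_out", rate_percent=5)). Writing one file per section keeps each duration measurable on its own, which the timing checks rely on.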

4. Merge Audio onto Video

./scripts/merge-audio.sh <video_path> <narration_audio> [output_path]
  • Preserves the original video codec (copy, no re-encode)
  • Encodes audio as AAC at 192kbps
  • Uses -shortest so the output ends when the shorter of the video and audio streams ends
  • Verify the output has both video and audio streams
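
The behavior listed above maps onto standard ffmpeg options. The sketch below builds an equivalent command and a stream check with ffprobe; it is an illustration under the assumption that merge-audio.sh does something similar, and the helper names are hypothetical.

```python
def build_merge_command(video_path, audio_path, output_path):
    """Build an ffmpeg command that muxes narration onto the video."""
    return [
        "ffmpeg",
        "-i", str(video_path),
        "-i", str(audio_path),
        "-map", "0:v:0",   # video from the first input
        "-map", "1:a:0",   # audio from the second input
        "-c:v", "copy",    # keep the original video codec, no re-encode
        "-c:a", "aac",
        "-b:a", "192k",
        "-shortest",       # stop at the end of the shorter stream
        str(output_path),
    ]


def build_probe_command(path):
    """Build an ffprobe command that prints one codec type per stream."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "stream=codec_type",
        "-of", "csv=p=0",
        str(path),
    ]


def has_video_and_audio(probe_output):
    """True if the ffprobe output lists both a video and an audio stream."""
    kinds = {line.strip().rstrip(",") for line in probe_output.splitlines()}
    return {"video", "audio"} <= kinds
```

Both commands can be run with subprocess.run(..., check=True, capture_output=True, text=True), passing the ffprobe stdout to has_video_and_audio.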

Timing Strategy

  1. Generate all segments, measure each duration
  2. Sum total speech time vs video duration
  3. If speech is >105% of video: tighten the script or increase rate
  4. If speech is <85% of video: the narration will leave long silences, so add detail or slow the rate
  5. Concatenate with 0.3–0.5s gaps, target total within ±2s of video
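
The thresholds in this strategy can be expressed as a small check. This is a sketch only: the function names and the returned labels are hypothetical, and the measured segment durations are assumed to be supplied in seconds.

```python
def total_with_gaps(segment_durations, gap_s=0.4):
    """Total narration length once silence gaps are placed between segments."""
    if not segment_durations:
        return 0.0
    return sum(segment_durations) + gap_s * (len(segment_durations) - 1)


def timing_advice(segment_durations, video_s, gap_s=0.4, tolerance_s=2.0):
    """Classify the narration timing against the video duration."""
    speech = sum(segment_durations)
    ratio = speech / video_s
    if ratio > 1.05:
        return "tighten"      # shorten the script or increase the rate
    if ratio < 0.85:
        return "expand"       # add detail or slow the rate
    total = total_with_gaps(segment_durations, gap_s)
    if abs(total - video_s) <= tolerance_s:
        return "ok"
    return "adjust-gaps"      # speech fits, but gaps push the total off target
```

For instance, three segments of 29, 30, and 30 seconds against a 90-second video total 89.8 seconds with 0.4-second gaps, which falls within the ±2 second target.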

Prerequisites

Install dependencies if not present:

pip3 install edge-tts   # Microsoft neural TTS (free, no API key)
brew install ffmpeg      # or apt-get install ffmpeg
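
Whether the dependencies are already present can be checked before installing. A minimal sketch using only the Python standard library; the function name is hypothetical.

```python
import importlib.util
import shutil


def missing_dependencies():
    """Return the names of prerequisites that are not currently available."""
    missing = []
    if importlib.util.find_spec("edge_tts") is None:  # Python package
        missing.append("edge-tts")
    if shutil.which("ffmpeg") is None:                # command-line tool
        missing.append("ffmpeg")
    return missing
```

An empty list means both prerequisites were found.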

Output

The skill produces:

  1. <name> (with narration).mov — the video with baked-in voiceover
  2. <name>_voiceover.md — the timestamped script for reference or re-recording

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
