douyin-transcribe

Douyin video transcription suite. Extract audio from Douyin/TikTok China videos, transcribe with Whisper, and analyze content. Supports video links, local files, and image notes. Trigger when user sends a Douyin link and asks for transcription, summary, or analysis.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "douyin-transcribe" with this command: npx skills add don068589/douyin-video-transcribe

Douyin Transcribe - Video Transcription Suite

A complete solution for transcribing Douyin (抖音/TikTok China) videos. Extracts audio, transcribes speech to text, and generates structured summaries.

Version History

| Version | Changes |
|---|---|
| 2.0.0 | Modular architecture, improved workflow, browser DOM extraction |
| 1.0.0 | Initial release, basic transcription |

Architecture

```
User Input (Douyin Link/File)
        │
        ▼
┌─────────────────────────────────────────┐
│          Workflow Orchestrator          │
├─────────────────────────────────────────┤
│ Step 1: Fetcher     → Get video file    │
│ Step 2: Transcriber → Extract & convert │
│ Step 3: Analyzer    → Structure output  │
│ Step 4: Output      → Save results      │
└─────────────────────────────────────────┘
```

Core Features

  • Video Fetching: Browser-based DOM extraction for CDN URLs
  • Audio Extraction: ffmpeg-powered audio conversion
  • Speech-to-Text: Whisper ASR with multiple model options
  • Content Analysis: Auto-structured transcripts with key points
  • Multi-format Support: Video links, local files, image notes

Prerequisites

| Tool | Purpose | Install |
|---|---|---|
| curl | Download files | Built-in (Windows: `curl.exe`) |
| ffmpeg | Audio extraction/merge | `winget install Gyan.FFmpeg` |
| Whisper | Transcription | `pip install openai-whisper` or Docker |
| Browser | Video extraction | OpenClaw profile required |

Docker Whisper (Recommended):

```bash
docker run -d -p 9000:9000 --name whisper-asr onerahmet/openai-whisper-asr-webservice:latest
```

Workflow

Step 0: Input Classification

| Input Type | Detection | Action |
|---|---|---|
| Video link (/video/) | URL pattern | Full workflow |
| Image note (/note/) | URL pattern | Snapshot only |
| Local video file | File path | Start from Step 2 |
| Text input | Plain text | Start from Step 3 |
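
The classification table can be sketched as a small routing function. This is only an illustration: the function name, the return labels, and the list of video file extensions are assumptions, not part of the skill.

```python
import os
from urllib.parse import urlparse

# Hypothetical extension list; the skill does not say which formats it accepts.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}


def classify_input(text: str) -> str:
    """Return a label for the workflow entry point of a piece of user input."""
    text = text.strip()
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https"):
        if "/video/" in parsed.path:
            return "full_workflow"
        if "/note/" in parsed.path:
            return "snapshot_only"
        # Short links carry neither marker; resolve them (Step 1.1) and classify again.
        return "unclassified_link"
    if os.path.splitext(text)[1].lower() in VIDEO_EXTENSIONS:
        return "start_at_step_2"
    return "start_at_step_3"
```

Short share links fall into `unclassified_link` on purpose, because the `/video/` or `/note/` marker only appears after the redirect has been followed.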

Step 1: Fetch Video

1.1 Resolve Short URL

```bash
# Windows PowerShell
curl.exe -sL -o NUL -w "%{url_effective}" "https://v.douyin.com/xxx/"

# macOS/Linux
curl -sL -o /dev/null -w '%{url_effective}' "https://v.douyin.com/xxx/"
```

Output: `https://www.douyin.com/video/7616020798351871284`
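
Share messages often wrap the short link in extra text, and Step 1.2 needs the video ID from the resolved URL. A minimal sketch of both extractions follows; the helper names are hypothetical, and the patterns assume that the link is delimited by whitespace and that the ID is purely numeric, as in the sample output above.

```python
import re
from typing import Optional

URL_PATTERN = re.compile(r"https?://\S+")
# Assumption: video IDs are digit-only, as in the sample URL.
VIDEO_ID_PATTERN = re.compile(r"/video/(\d+)")


def extract_link(share_text: str) -> Optional[str]:
    """Pull the first URL out of a share message, dropping the surrounding text."""
    match = URL_PATTERN.search(share_text)
    return match.group(0) if match else None


def extract_video_id(resolved_url: str) -> Optional[str]:
    """Return the numeric ID from a resolved /video/ URL, or None if absent."""
    match = VIDEO_ID_PATTERN.search(resolved_url)
    return match.group(1) if match else None
```

For the sample output above, `extract_video_id` returns `"7616020798351871284"`, which fills the `{VIDEO_ID}` placeholder in the next step.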

1.2 Open Video Page

```
browser(action='open', profile='openclaw', url='https://www.douyin.com/video/{VIDEO_ID}')
```

Wait 10-15 seconds for the page to load completely.

1.3 Extract Video URL (Browser DOM Method)

```javascript
browser(action='act', targetId='PAGE_ID', request={
  "kind": "evaluate",
  "fn": "(() => { const entries = performance.getEntriesByType('resource'); const videoEntries = entries.filter(e => { const name = e.name.toLowerCase(); return name.includes('douyinvod') && (name.includes('.mp4') || name.includes('video')); }); if (videoEntries.length > 0) { const video = videoEntries[videoEntries.length - 1]; return { url: video.name, type: video.name.includes('.mp4') ? 'mp4' : 'dash' }; } return null; })()"
})
```

Important Notes:

  • `act` action requires a nested `request` object with `kind` and `fn`
  • Wrong: `browser(action='act', fn='...')`
  • Correct: `browser(action='act', request={"kind": "evaluate", "fn": "..."})`
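
The selection logic inside the evaluated JavaScript, and the nested `request` shape, can be checked offline with a small Python port. This is a sketch for reasoning about the filter, not part of the skill: both function names are hypothetical, and the port only mirrors what the snippet above does (keep `douyinvod` resources that mention `.mp4` or `video`, take the last one).

```python
from typing import List, Optional


def build_evaluate_request(fn_source: str) -> dict:
    """Wrap a JavaScript expression in the nested shape the `act` action expects."""
    return {"kind": "evaluate", "fn": fn_source}


def pick_video_entry(resource_names: List[str]) -> Optional[dict]:
    """Mirror the JS filter: last douyinvod resource that looks like a video."""
    matches = [
        name
        for name in resource_names
        if "douyinvod" in name.lower()
        and (".mp4" in name.lower() or "video" in name.lower())
    ]
    if not matches:
        return None  # same as the JS snippet returning null
    url = matches[-1]
    # Like the JS, the type check is case-sensitive on the original name.
    return {"url": url, "type": "mp4" if ".mp4" in url else "dash"}
```

Taking the last match means the most recently requested stream wins, which is why waiting for the page to finish loading matters before evaluating.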

1.4 Download Video

```bash
curl.exe -L -H "Referer: https://www.douyin.com/" -o video.mp4 "<CDN_URL>"
```

The Referer header is required; without it the download fails with 403.
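
Where curl is not available, the same download can be sketched with Python's standard library. The function names and the 60-second timeout are assumptions; the Referer value is the one from the curl command.

```python
import shutil
import urllib.request

REFERER = "https://www.douyin.com/"


def build_download_request(cdn_url: str) -> urllib.request.Request:
    """Attach the Referer header that the download requires."""
    return urllib.request.Request(cdn_url, headers={"Referer": REFERER})


def download_video(cdn_url: str, dest_path: str, timeout: float = 60.0) -> None:
    """Stream the video to disk; raises urllib.error.HTTPError on 403."""
    request = build_download_request(cdn_url)
    # urlopen follows redirects by default, similar to curl -L.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        with open(dest_path, "wb") as out:
            shutil.copyfileobj(response, out)
```

A call such as `download_video(cdn_url, "video.mp4")` corresponds to the curl command above, with `cdn_url` being the value returned in Step 1.3.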

Step 2: Transcribe Audio

2.1 Extract Audio

```bash
# For MP4 videos
ffmpeg -i video.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav -y

# For DASH videos (need merge)
ffmpeg -i video.mp4 -i audio.mp4 -c copy merged.mp4 -y
ffmpeg -i merged.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav -y
```

Parameters:

  • -ar 16000: 16 kHz sample rate (the rate Whisper works at)
  • -ac 1: Mono channel
  • -c:a pcm_s16le: 16-bit PCM
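
A minimal sketch of driving the extraction from Python and then confirming that the result really is 16 kHz, mono, 16-bit PCM. The function names are hypothetical; the ffmpeg arguments are exactly those listed above, and the check uses only the standard `wave` module.

```python
import subprocess
import wave
from typing import List


def build_extract_command(video_path: str, wav_path: str) -> List[str]:
    """Assemble the ffmpeg call for 16 kHz mono 16-bit PCM output."""
    return [
        "ffmpeg", "-i", video_path,
        "-ar", "16000",      # sample rate
        "-ac", "1",          # mono
        "-c:a", "pcm_s16le",  # 16-bit PCM
        wav_path, "-y",
    ]


def extract_audio(video_path: str, wav_path: str) -> None:
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_extract_command(video_path, wav_path), check=True)


def matches_target_format(wav_path: str) -> bool:
    """Check the WAV header against the three parameters above."""
    with wave.open(wav_path, "rb") as wav:
        return (
            wav.getframerate() == 16000
            and wav.getnchannels() == 1
            and wav.getsampwidth() == 2  # 2 bytes = 16 bits
        )
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.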

2.2 Transcribe with Docker Whisper

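```bash
curl.exe -X POST "http://localhost:PORT/asr" -F "audio_file=@audio.wav"
```

Replace PORT with the host port the service is published on (9000 with the `docker run` command from Prerequisites).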
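
The same upload can be sketched with the Python standard library by building the multipart body by hand. The endpoint path and the `audio_file` field name come from the curl command; the function names, the default base URL (port 9000, matching the Docker command in Prerequisites), and the timeout are assumptions. The response body is returned as raw text because its format is not specified here.

```python
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field_name: str, filename: str, payload: bytes) -> Tuple[bytes, str]:
    """Return (body, content_type) for a single-file multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(wav_path: str, base_url: str = "http://localhost:9000") -> str:
    """POST the WAV file to the /asr endpoint and return the raw response text."""
    with open(wav_path, "rb") as audio:
        body, content_type = build_multipart("audio_file", "audio.wav", audio.read())
    request = urllib.request.Request(
        f"{base_url}/asr",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # Generous timeout (an assumption): CPU transcription can take minutes.
    with urllib.request.urlopen(request, timeout=600) as response:
        return response.read().decode("utf-8")
```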

2.3 Alternative: Local Whisper

```bash
python -m whisper audio.wav --model small --language zh
```

Model Selection:

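| Model | Size | 5-min Video (CPU) | Accuracy | Use Case |
|---|---|---|---|---|
| tiny | 75MB | ~30s | Fair | Quick preview |
| base | 142MB | ~1min | Good | Daily use |
| small | 466MB | ~3min | Better | Recommended |
| medium | 1.5GB | ~8min | Best | High accuracy |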
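
One way to use the table is to pick the most accurate model that fits a time budget. The sketch below takes the per-model timings from the table and assumes, purely for illustration, that CPU time scales linearly with video length; real timings depend on hardware and audio content. The function names are hypothetical.

```python
# CPU minutes needed for a 5-minute video, from the table above.
MINUTES_PER_5_MIN_VIDEO = {"tiny": 0.5, "base": 1.0, "small": 3.0, "medium": 8.0}
# Most accurate first, following the table's Accuracy column.
BY_ACCURACY = ["medium", "small", "base", "tiny"]


def estimate_minutes(model: str, video_minutes: float) -> float:
    """Rough estimate; linear scaling with duration is an assumption."""
    return MINUTES_PER_5_MIN_VIDEO[model] * video_minutes / 5.0


def pick_model(video_minutes: float, budget_minutes: float) -> str:
    """Return the most accurate model whose estimate fits the budget."""
    for model in BY_ACCURACY:
        if estimate_minutes(model, video_minutes) <= budget_minutes:
            return model
    return "tiny"  # nothing fits; fall back to the fastest model
```

For example, `pick_model(5, 3)` selects `small`, the recommended row of the table.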

Step 3: Analyze Content

The agent processes the transcript in four passes:

  1. Fix transcription errors
    • Correct homophones
    • Fix speaker names
    • Remove filler words
  2. Structure content
    • Add paragraph breaks
    • Create sections
  3. Extract key points
    • Main ideas
    • Important quotes
  4. Generate tags
    • 3-5 topic tags

Step 4: Save Output

Transcript Format

```markdown
# {Title}

作者: {Author}
来源: 抖音
日期: {Date}
转录时间: {Transcription Date}

---

## 摘要

{Summary}

---

## 正文

{Transcript content with paragraphs}

---

## 要点

- {Key point 1}
- {Key point 2}
- {Key point 3}

## 标签

#{tag1} #{tag2} #{tag3}
```

File Naming Convention

```
{VIDEO_ID}-抖音转录.md
```
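
Filling the template and applying the naming convention could look like the sketch below. The function names and parameters are hypothetical, and the Markdown markers (`#`, `##`, `---`, `-`) are an assumption about the original template, since the rendered listing does not show them. The Chinese labels are the template's literal field names: 作者 (author), 来源 (source), 日期 (date), 转录时间 (transcription time), 摘要 (summary), 正文 (body), 要点 (key points), 标签 (tags).

```python
from pathlib import Path
from typing import List


def output_filename(video_id: str) -> str:
    """Apply the {VIDEO_ID}-抖音转录.md naming convention."""
    return f"{video_id}-抖音转录.md"


def render_transcript(title: str, author: str, date: str, transcribed_on: str,
                      summary: str, body: str,
                      key_points: List[str], tags: List[str]) -> str:
    """Fill the transcript template; Markdown markers are assumed, see lead-in."""
    lines = [
        f"# {title}", "",
        f"作者: {author}", "来源: 抖音", f"日期: {date}", f"转录时间: {transcribed_on}", "",
        "---", "", "## 摘要", "", summary, "",
        "---", "", "## 正文", "", body, "",
        "---", "", "## 要点", "",
        *[f"- {point}" for point in key_points], "",
        "## 标签", "",
        " ".join(f"#{tag}" for tag in tags),
    ]
    return "\n".join(lines) + "\n"


def save_transcript(output_dir: str, video_id: str, text: str) -> Path:
    """Write the transcript as UTF-8 and return the path that was written."""
    path = Path(output_dir) / output_filename(video_id)
    path.write_text(text, encoding="utf-8")
    return path
```

Writing with an explicit `encoding="utf-8"` matters on Windows, where the default encoding may not handle the Chinese labels or file name content.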

Troubleshooting

| Stage | Issue | Solution |
|---|---|---|
| Step 1 | Short URL fails | Check link completeness, remove share text |
| Step 1 | JS returns null | Wait 15-20s and retry, increase timeout |
| Step 1 | Download 403 | URL expired, re-fetch from browser |
| Step 1 | DASH no audio | Merge with `ffmpeg -i video -i audio -c copy` |
| Step 2 | ffmpeg not installed | `winget install Gyan.FFmpeg` |
| Step 2 | Whisper service down | `docker start whisper-asr` |
| Step 2 | Transcription slow | 10-min video takes 15-20 min on CPU |
| Step 2 | Poor quality | Use larger model (medium) |
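
The "wait and retry" advice for a null JS result can be sketched as a small helper. The name, the default of three attempts, and the 15-second wait are assumptions chosen to match the table; `action` stands for whatever call performs the extraction and returns `None` on failure.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until_result(action: Callable[[], Optional[T]],
                       attempts: int = 3,
                       wait_seconds: float = 15.0,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Call `action` until it returns a non-None value or attempts run out."""
    for attempt in range(attempts):
        result = action()
        if result is not None:
            return result
        if attempt < attempts - 1:
            sleep(wait_seconds)  # give the page time to request the video
    return None
```

The `sleep` parameter exists so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.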

Image Note Handling

Image notes (/note/) don't need transcription:

```
1. browser(action='open', profile='openclaw', url='IMAGE_NOTE_URL')
2. browser(action='snapshot')
3. Extract content from snapshot
4. Save to output directory
```

Edge Cases

  • Article links (/article/): Use browser snapshot, no transcription
  • Douyin AI summary: Extract from page as supplement
  • Other platforms: Use yt-dlp for YouTube/Bilibili
  • Live streams: Not supported

Related Modules

This skill can be extended with standalone modules:

| Module | Purpose |
|---|---|
| douyin-fetcher | Video fetching only |
| douyin-transcriber | Audio transcription only |
| douyin-analyzer | Content analysis only |
| douyin-orchestrator | Workflow coordination |

License

MIT-0 License - Free to use, modify, and redistribute.
