viral-video-replicator

Reverse-engineer and replicate videos: analyze a reference video (FFmpeg frame extraction + Vision LLM), generate a replication Seedance 2.0 prompt, with 4 material-replacement modes. Supports single and batch. Use when: '复刻这个视频', '分析爆款视频', 'replicate this video', '视频逆向', '反编译视频', '批量分析视频'. Do NOT use for creating from scratch; use fashion-video-creator instead.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy the following and send it to your AI assistant to install this skill:

Install skill "viral-video-replicator" with this command: npx skills add dingtom336-gif/viral-video-replicator

Skill: viral-video-replicator

Overview

Reverse-engineer reference videos (e.g., competitor viral content) into replicable Seedance 2.0 prompts. The pipeline: FFmpeg frame extraction -> contact sheet grids -> audio extraction + ASR transcription -> Vision LLM structured analysis -> Seedance prompt assembly with optional material replacement (face/body/clothing). Supports single and batch modes.

When to Activate

User query contains any of:

  • "视频复刻" (replicate video), "视频逆向" (reverse-engineer video), "反编译视频" (decompile video), "复刻爆款" (replicate a viral hit)
  • "分析这个视频" (analyze this video), "replicate this video", "video analysis"
  • "批量分析" (batch analysis), "批量复刻" (batch replication)
  • "这个视频怎么拍的" (how was this video shot?), "帮我分析一下这个爆款" (help me analyze this viral video), "我想拍一个类似的视频" (I want to shoot a similar video)
  • "reverse engineer this video", "analyze this fashion video"

Do NOT activate for:

  • Creating fashion videos from scratch (no reference video) -> use fashion-video-creator
  • "帮我做个穿搭视频" (make me an outfit video), "生成模特图" (generate model images) -> use fashion-video-creator
  • Pure video editing / trimming -> not applicable
  • Non-fashion video analysis -> not applicable

Prerequisites

Local tools (REQUIRED):

# macOS
brew install ffmpeg

# Linux (Debian/Ubuntu)
sudo apt install ffmpeg

# Verify
ffmpeg -version && ffprobe -version

Cloud APIs (collected via clarification):

REQUIRED: ARK_API_KEY + ARK_VISION_MODEL (Vision LLM for frame analysis)
CONDITIONAL: ASR_ACCESS_TOKEN (if video has dialogue)
CONDITIONAL: TOS credentials (if ASR is needed — audio transfers through TOS)
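The REQUIRED/CONDITIONAL logic above can be sketched as a small helper. This is illustrative only; the `TOS_*` variable names and the `has_dialogue` flag are assumptions, not the skill's actual configuration keys:

```python
# Sketch: compute which credentials are still missing, mirroring the
# list above. TOS is only needed when ASR is needed, i.e. when the
# reference video contains dialogue.
TOS_KEYS = ["TOS_ACCESS_KEY", "TOS_SECRET_KEY", "TOS_BUCKET", "TOS_REGION"]

def missing_credentials(env: dict, has_dialogue: bool) -> list[str]:
    required = ["ARK_API_KEY", "ARK_VISION_MODEL"]
    if has_dialogue:
        # ASR needs both the token and TOS: audio transfers through TOS.
        required += ["ASR_ACCESS_TOKEN", *TOS_KEYS]
    return [k for k in required if not env.get(k)]
```

A call like `missing_credentials(os.environ, has_dialogue=True)` would then drive the clarification questions below.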

Clarification Flow

Phase 1: API Key Acquisition

Ask IN ORDER. Use plain language — explain WHY each service is needed.

Q1: Vision Analysis (REQUIRED)

"Analyzing the video requires an AI model that can 'understand images'. It looks at screenshots from the video and identifies the person's appearance, clothing details, scene layout, and action timeline. Do you have a Volcengine Ark (火山方舟) account and API Key? The vision model's ID is also needed."

If no API key -> STOP. Guide the user to Volcengine Ark (火山方舟). Do NOT proceed.

Q2: Speech Transcription (CONDITIONAL)

"Does anyone speak in the reference video? If there is dialogue, speech-to-text is needed to extract the lines, so the replicated video can carry the full spoken content. Purely visual videos with no dialogue can skip this step."

If yes -> ask for ASR_ACCESS_TOKEN.

Q3: Audio Storage (CONDITIONAL — only if Q2 = yes)

"Speech-to-text transfers the audio file through cloud storage. Four pieces of Volcengine Object Storage (TOS) information are needed: Access Key, Secret Key, Bucket, Region."

If the user has ASR but no TOS -> warn: "Without TOS, ASR cannot work, which is the same as having no speech transcription at all."

Phase 2: Mandatory Recommendations

MUST show. Each item has WHY explanation:

============================================================
API Configuration — Mandatory Recommendations
============================================================

[REQUIRED] Vision model: doubao-seed-1-6-vision-250815 or newer
  WHY: Older models cannot distinguish clothing fabric textures
  (acetate vs chiffon), stitching details (overlocked vs raw edge),
  or fit nuances (slim vs A-line). Analysis quality drops ~60%.

[REQUIRED] If video has dialogue: configure BOTH ASR + TOS
  WHY: Without ASR, all spoken content is lost. The generated prompt
  will only contain visual descriptions. Video fidelity drops from
  ~90% to ~50% because dialogue drives 40%+ of viewer engagement.
  TOS is the audio transfer pipeline — no TOS means no ASR.

[REQUIRED] Video resolution: 720p or higher
  WHY: Frames are extracted at 360x640 thumbnails. Source below 480p
  means thumbnails are upscaled garbage — clothing patterns and
  textures become unrecognizable blobs.

[RECOMMENDED] Exact mode for same-category replacement
  WHY: "exact" does nested structured analysis (10 fields with typed
  subobjects) — precision matters when replacing one dress with another.
  "rewrite" does flat analysis (10 string fields) — better for
  extracting viral logic across different product categories.
============================================================

Phase 3: Mode Selection

"How many videos do you want to analyze? A single one, or a batch?"

Q5: Replicate Mode (per video if batch)

"How do you want to replicate it?

  • Exact replication (精确复刻): analyze every detail frame by frame and reproduce it as close to 1:1 as possible
  • Extract and rewrite (提取改写): extract the viral pacing and logic, then re-stage it in a new way"

Q6: Material Replacement (per video if batch)

"Which elements of the video do you want to replace?

  • None (pure replication)
  • Face/body (upload a model reference image)
  • Clothing (upload a product image)
  • Both (upload a model image + a product image)"

Batch-Specific Recommendations

============================================================
Batch Mode — Additional Recommendations
============================================================

[REQUIRED] ALL videos should be 720p+
  WHY: One low-res video doesn't just fail for itself — it wastes
  API costs on a Vision LLM call that returns unusable analysis.

[RECOMMENDED] Pre-sort by replicate mode
  WHY: exact mode takes 2-3 min/video (nested analysis), rewrite
  takes 1-2 min/video (flat analysis). Grouping avoids context switches.

[WARNING] Each video runs the FULL pipeline independently.
  N videos = approximately N * 2-3 minutes. Plan accordingly.
============================================================

Four Replacement Modes

| Mode | What User Uploads | @image Tags in Prompt | What Gets Replaced |
|---|---|---|---|
| clone | Nothing | None (pure text) | Nothing — exact replication |
| face_swap | Face/body reference | @image1 = face ref | Person replaced, clothing preserved |
| outfit_swap | Garment product image | @image1 = garment | Clothing replaced, person preserved |
| full_swap | Garment + face reference | @image1 = garment, @image2 = face ref | Both replaced |

Mode auto-determination:

has_person_ref AND has_garment_ref -> full_swap
has_garment_ref only -> outfit_swap
has_person_ref only -> face_swap
neither -> clone
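The auto-determination table above is a pure function of which reference materials were uploaded; a minimal sketch:

```python
# Sketch of mode auto-determination: returns one of the four
# replacement modes given the uploaded reference materials.
def determine_mode(has_person_ref: bool, has_garment_ref: bool) -> str:
    if has_person_ref and has_garment_ref:
        return "full_swap"
    if has_garment_ref:
        return "outfit_swap"
    if has_person_ref:
        return "face_swap"
    return "clone"
```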

Core Workflow

Step 0: Environment Check (mandatory, never skip)

ffmpeg -version && ffprobe -version
  • Returns version -> proceed to Step 1
  • command not found -> guide install (brew/apt/choco). If it still fails after install -> soft fallback:
      - Ask the user: "FFmpeg is unavailable. Can you manually provide video screenshots and the audio file?"
      - If the user provides frames manually -> skip the FFmpeg steps and proceed from Step 4 (Vision analysis) with the user-provided images.
      - Quality warning (MUST show to user): "In manual-screenshot mode, analysis quality drops significantly: no precise timestamp annotations, no uniform 3fps sampling, and an insufficient frame count can make the action timeline inaccurate. Install FFmpeg for best results."
      - If the user cannot provide frames -> STOP. FFmpeg is required for automated extraction.
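When code execution is available, the Step 0 check can be done without shelling out, using the standard library (a sketch, not the skill's actual implementation):

```python
import shutil

# Sketch of Step 0 as code: verify FFmpeg and ffprobe are on PATH
# before starting the pipeline.
def tools_available(names=("ffmpeg", "ffprobe")) -> dict[str, bool]:
    """Map each required tool name to whether it is found on PATH."""
    return {name: shutil.which(name) is not None for name in names}

def environment_ok() -> bool:
    return all(tools_available().values())
```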

Step 0b: Verify API Key (before reaching Step 4)

Validate ARK_API_KEY early to avoid wasting FFmpeg processing time on an invalid key:

If bash/Python available:

import httpx  # assumes httpx is installed; ARK_API_BASE and ARK_API_KEY come from config

resp = httpx.get(f"{ARK_API_BASE}/api/v3/models",
                 headers={"Authorization": f"Bearer {ARK_API_KEY}"}, timeout=10)
  • 200 -> proceed
  • 401/403 -> STOP. Key invalid. Fix before continuing.

If no code execution: Trust user-provided key, validate on first Vision API call.
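A self-contained sketch of the key check using only the standard library (the `ARK_API_BASE` value is an assumed regional endpoint; adjust to your account's region):

```python
import urllib.request
import urllib.error

ARK_API_BASE = "https://ark.cn-beijing.volces.com"  # assumption: region-specific

def classify_key_check(status_code: int) -> str:
    """Map the models-endpoint status code to the action described above."""
    if status_code == 200:
        return "proceed"
    if status_code in (401, 403):
        return "stop_invalid_key"
    return "retry_or_report"  # transient or unexpected error

def verify_ark_key(api_key: str) -> str:
    req = urllib.request.Request(
        f"{ARK_API_BASE}/api/v3/models",
        headers={"Authorization": f"Bearer {api_key}"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return classify_key_check(resp.status)
    except urllib.error.HTTPError as e:
        return classify_key_check(e.code)
    except urllib.error.URLError:
        return "retry_or_report"  # network failure: do not hard-stop on this
```

Note that a network failure is deliberately not treated as an invalid key; only 401/403 should stop the pipeline.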

Single Mode

Step 1: Collect API keys + mode + replacement materials
Step 2: Extract frame grids (3fps) + extract audio — PARALLEL via asyncio.gather()
        (Both are FFmpeg subprocesses launched concurrently in Python, not LLM-level parallelism)
        Read references/frame-extraction.md for FFmpeg specs
Step 3: Upload audio to TOS -> ASR transcription
        Read references/asr-pipeline.md for protocol
Step 4: Vision LLM analysis (grids + transcript -> structured JSON)
        Read references/vision-analysis.md for exact vs rewrite schemas
Step 5: Determine replacement mode from uploaded materials
Step 6: Assemble Seedance 2.0 prompt
        Read references/reverse-prompt.md for 4-mode assembly
Step 7: Generate mode-specific SOP
Step 8: Validate output (see below)
Step 9: Return: prompt + analysis + transcript + SOP + replacement summary
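Step 2's concurrent FFmpeg launch can be sketched as follows. The 3fps sampling and 360x640 thumbnail size follow the text; the 3x3 tile layout and the audio encoding parameters are assumptions (references/frame-extraction.md holds the real specs):

```python
import asyncio

def frame_grid_cmd(video: str, out_pattern: str) -> list[str]:
    # 3 fps sampling, 360x640 thumbnails, tiled into contact sheets.
    # The 3x3 grid is illustrative; see references/frame-extraction.md.
    return ["ffmpeg", "-y", "-i", video,
            "-vf", "fps=3,scale=360:640,tile=3x3",
            "-vsync", "vfr", out_pattern]

def audio_cmd(video: str, out_audio: str) -> list[str]:
    # Audio track only (-vn); mono 16 kHz WAV is a common ASR input format.
    return ["ffmpeg", "-y", "-i", video, "-vn",
            "-ac", "1", "-ar", "16000", out_audio]

async def run(cmd: list[str]) -> int:
    proc = await asyncio.create_subprocess_exec(*cmd)
    return await proc.wait()

async def extract_all(video: str) -> list[int]:
    # Both FFmpeg subprocesses run concurrently, as described in Step 2.
    return await asyncio.gather(
        run(frame_grid_cmd(video, "grid_%03d.jpg")),
        run(audio_cmd(video, "audio.wav")))
```

Entry point would be `asyncio.run(extract_all("input.mp4"))`; the concurrency is at the subprocess level, not LLM-level, exactly as the step notes.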

Batch Mode

Step 0: Verify FFmpeg
Step 1: Collect API keys + video count + per-video configs
Step 2: For each video (sequential):
  a. Extract frame grids + audio (parallel)
  b. TOS upload -> ASR transcription
  c. Vision LLM analysis
  d. Determine replacement mode
  e. Assemble prompt
  f. Generate SOP
  g. Validate this video's output
  h. Mark: completed / failed
Step 3: Return all results with progress summary

Progress: queued -> processing -> completed/failed
Partial success: batch completes even if some videos fail.
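The partial-success contract above can be sketched as a loop that never lets one video abort the rest. `pipeline` stands in for Steps 2a-2g and is a placeholder, not a real function in the skill:

```python
# Sketch of the batch loop: each video runs the full pipeline
# independently; failures are recorded and the batch continues.
def run_batch(videos: list[str], pipeline) -> dict:
    results, status = {}, {}
    for video in videos:
        status[video] = "processing"
        try:
            results[video] = pipeline(video)
            status[video] = "completed"
        except Exception as exc:
            results[video] = {"error": str(exc)}
            status[video] = "failed"
    done = sum(1 for s in status.values() if s == "completed")
    return {"results": results, "status": status,
            "summary": f"{done}/{len(videos)} videos completed"}
```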

Output Validation (mandatory, never skip)

Before delivering results, verify ALL:

  • Analysis JSON is valid and contains all required fields?
  • Prompt correctly uses @image tags matching the replacement mode?
  • If clone mode: prompt has NO @image references (pure text)?
  • If outfit_swap/full_swap: prompt includes "Do not alter clothing pattern, color, texture or style"?
  • If has_speech: dialogue content is present in prompt (not empty)?
  • SOP upload instructions match the number of images for this mode?
  • Replacement summary correctly lists what was preserved vs replaced?

Any NO -> fix before delivering. Do NOT send unvalidated output.
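The @image-tag checks in the list above can be sketched mechanically. Tag names follow the Four Replacement Modes table; note the Chinese example prompt uses the @图片 form instead, so a real implementation would check whichever convention the prompt uses:

```python
# Sketch of the @image-tag and preservation-clause checks.
EXPECTED_TAGS = {
    "clone": set(),
    "face_swap": {"@image1"},
    "outfit_swap": {"@image1"},
    "full_swap": {"@image1", "@image2"},
}
PRESERVE_CLAUSE = "Do not alter clothing pattern, color, texture or style"

def validate_prompt(mode: str, prompt: str) -> list[str]:
    """Return a list of problems; empty list means the prompt passes."""
    problems = []
    for tag in sorted(EXPECTED_TAGS[mode]):
        if tag not in prompt:
            problems.append(f"missing {tag}")
    if mode == "clone" and "@image" in prompt:
        problems.append("clone prompt must be pure text (no @image refs)")
    if mode in ("outfit_swap", "full_swap") and PRESERVE_CLAUSE not in prompt:
        problems.append("missing clothing-preservation clause")
    return problems
```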

Error Handling

| Failure | Detection | Action |
|---|---|---|
| FFmpeg not installed | command not found | STOP. Provide install command. Do NOT proceed. |
| No API key | ARK_API_KEY empty | STOP. Guide user to Volcengine Ark (火山方舟). Do NOT proceed. |
| Vision model error | 4xx/5xx from API | Report error with model ID used. Suggest checking model availability. |
| Vision returns invalid JSON | JSON parse fails | Retry once with same grids. Still fails -> report raw response for debugging. |
| Frame extraction fails | FFmpeg non-zero exit | Check video format. Try re-encoding. Report if still fails. |
| No audio track | extract_audio returns None | Skip ASR. Proceed with visual-only analysis. Note in output: "No audio detected." |
| TOS upload fails | Upload exception after 2 retries | Skip ASR. Proceed visual-only. Warn: "Audio transcription unavailable — dialogue will be missing." |
| ASR timeout | No result after 120s | Skip transcript. Proceed visual-only. Warn: "Speech transcription timed out." |
| ASR silent audio | Status 20000003 | Normal — video has no speech. Proceed with visual-only. |
| Video too large | >200MB | Reject immediately. Ask user to compress or trim. |
| Batch video fails | Exception during pipeline | Mark failed with error. Continue remaining. Report partial results. |

Degraded Modes (graceful degradation chain)

| Failure Point | Degraded Mode | What User Still Gets | Quality Impact |
|---|---|---|---|
| ASR fails (TOS/timeout) | Visual-only analysis | Prompt with visual descriptions, no dialogue | ~50% fidelity — all spoken content lost |
| Vision exact mode fails | Auto-retry with rewrite mode | Flat analysis (less precise) | ~70% fidelity — loses nested structure (clothing/scene subfields) |
| Vision rewrite also fails | Return raw materials | Frame grids + transcript for manual analysis | ~20% — no automated analysis, user must write prompt manually |
| Seedance prompt assembly fails | Return analysis only | Analysis JSON + transcript | ~30% — user has data but no ready-to-use prompt |
| FFmpeg unavailable (user provides screenshots) | Manual frame mode | Analysis from user-provided images | ~40% — no timestamps, uneven sampling, incomplete frame coverage |

Always prefer delivering partial results over delivering nothing. Every degraded output MUST clearly state: (1) what is missing, (2) why, and (3) the estimated quality impact.
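The chain above is effectively a lookup table; a minimal sketch, with fidelity figures mirroring the table as rough estimates rather than measurements:

```python
# Sketch of the degradation chain as a lookup. Failure-point keys are
# illustrative names, not identifiers from the skill's actual code.
DEGRADED_MODES = {
    "asr_failed": ("visual_only", 0.50),
    "vision_exact_failed": ("retry_rewrite", 0.70),
    "vision_rewrite_failed": ("raw_materials", 0.20),
    "prompt_assembly_failed": ("analysis_only", 0.30),
    "ffmpeg_unavailable": ("manual_frames", 0.40),
}

def degrade(failure: str) -> dict:
    mode, fidelity = DEGRADED_MODES[failure]
    # The note covers requirements (1)-(3): what is missing is implied by
    # the mode, and the estimated quality impact is stated explicitly.
    return {"mode": mode, "estimated_fidelity": fidelity,
            "note": f"degraded to {mode}; expect ~{round(fidelity * 100)}% fidelity"}
```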

See references/fallbacks.md for detailed recovery procedures per failure case.

Usage Example

Input: "帮我复刻这个爆款视频,换成我的衣服" ("Replicate this viral video for me and swap in my clothes") + uploaded video (15s, 720p) + uploaded garment image

Resolved: mode=exact, replacement=outfit_swap (garment_ref provided, no face_ref)

Output 1 — Structured Analysis:

{
  "person": {
    "gender": "female", "age_range": "22-26",
    "face": "鹅蛋脸,大眼睛,双眼皮",
    "skin_tone": "白皙", "hair": "黑色长直发,中分,自然垂落",
    "build": "纤细高挑", "makeup": "淡妆,裸色唇彩"
  },
  "clothing": {
    "type": "V领碎花连衣裙", "color": "奶油白底+粉色碎花",
    "material_look": "轻薄飘逸雪纺", "neckline": "V领",
    "fit": "A字收腰", "length": "及膝",
    "details": "腰部抽绳系带,裙摆荷叶边"
  },
  "scene": {"location": "现代公寓客厅", "lighting_source": "右侧落地窗自然光"},
  "actions": "0-2s: 正面微笑打招呼;2-5s: 右手拉起裙摆展示面料;5-8s: 小幅转身展示裙摆飘动;8-12s: 右手翻开裙子内侧展示车线;12-15s: 右手捏腰部展示松紧",
  "dialogue": "姐妹们你们快看...(右手拉起裙摆)这个面料是醋酸缎面的...滑滑的凉凉的..."
}

Output 2 — Seedance Prompt (outfit_swap):

一位鹅蛋脸、白皙肤色、黑色长直发中分自然垂落、纤细高挑身材、淡妆的年轻女性,穿着@图片1中的服装。在现代公寓客厅中,右侧落地窗自然光。她的动作:0-2s: 正面微笑打招呼;2-5s: 右手拉起衣角展示面料...对着镜头说:「姐妹们你们快看...这个面料...滑滑的凉凉的...你们猜多少钱?不到两百!超显腿长,闭眼入。」语气自然亲切,像在跟闺蜜视频通话。Do not alter clothing pattern, color, texture or style. 手持vlog镜头感,竖屏9:16。

Output 3 — Transcript: "姐妹们你们快看...这个面料是醋酸缎面的..."

Output 4 — SOP: outfit_swap mode, 1 image upload (@图片1=garment)

Output 5 — Replacement Summary: garment_replaced=true, original_preserved=[face, body, scene, actions, dialogue, camera]

Domain Knowledge Role Declaration

The reference files contain FFmpeg specs, ASR protocols, Vision prompts, and prompt assembly templates. Their role is to assist pipeline execution — providing exact API formats, analysis schemas, and assembly rules. They do NOT replace the execution workflow. Never output reference content directly as the final answer. Always execute: extract frames -> transcribe -> analyze -> assemble -> validate -> deliver.

References

| File | Purpose | When to read |
|---|---|---|
| references/frame-extraction.md | FFmpeg filter chain, grid stitching, audio extraction specs | Step 2: extracting frames and audio |
| references/asr-pipeline.md | TOS upload protocol, Seed-ASR-2.0 submit/poll API | Step 3: transcribing audio |
| references/vision-analysis.md | Vision LLM prompts for exact and rewrite modes, output schemas | Step 4: analyzing video |
| references/reverse-prompt.md | 4-mode prompt assembly, clothing generalization map, SOP templates | Steps 6-7: building prompt and SOP |
| references/fallbacks.md | 8 failure cases with recovery procedures and degradation chain | On any error during Steps 2-8 |

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
