When to Use
- User wants to create an explainer or tutorial video
- User asks to "explain" something in video form
- User wants narrated content with AI-generated visuals
- User says "explainer video", "解说视频", "tutorial video"
When NOT to Use
- User wants audio-only content without visuals (use /speech or /podcast)
- User wants a podcast-style discussion (use /podcast)
- User wants to generate a standalone image (use /image-gen)
- User wants to read text aloud without video (use /speech)
Purpose
Generate explainer videos that combine a single narrator's voiceover with AI-generated visuals. Ideal for product introductions, concept explanations, and tutorials. Supports text-only script generation or full text + video output.
Hard Constraints
- Always read config following shared/config-pattern.md before any interaction
- Follow shared/cli-patterns.md for execution modes, error handling, and interaction patterns
- Always follow shared/cli-authentication.md for auth checks
- Never hardcode speaker IDs; always fetch from the speakers CLI when the user wants to change voice
- Never save files to ~/Downloads/ or .listenhub/; save artifacts to the current working directory with friendly topic-based names (see shared/config-pattern.md § Artifact Naming)
- Explainer uses exactly 1 speaker
- Mode must be info (for Info style) or story (for Story style), never slides (use the /slides skill instead)
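The mode constraint above can be enforced before any CLI call. The following is a minimal sketch, assuming the chosen style arrives as a plain string; the helper name style_to_mode is hypothetical and not part of the ListenHub CLI.

```shell
# Hypothetical helper: map a chosen style to the CLI mode, rejecting anything else.
style_to_mode() {
  local style
  # Lowercase the input so "Info" and "info" are treated the same.
  style=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$style" in
    info|story)
      printf '%s\n' "$style"
      ;;
    slides)
      echo "slides is not an explainer mode; use the /slides skill" >&2
      return 1
      ;;
    *)
      echo "unknown style: $1" >&2
      return 1
      ;;
  esac
}
```

For example, MODE=$(style_to_mode "Info") yields info, while passing slides returns a non-zero status so the caller can stop early.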
Step -1: CLI Auth Check
Follow shared/config-pattern.md § CLI Auth Check. If the CLI is not installed or the user is not logged in, auto-install and auto-login per shared/cli-authentication.md — never ask the user to run commands manually.
Step 0: Config Setup
Follow shared/config-pattern.md Step 0 (Zero-Question Boot).
If file doesn't exist — silently create with defaults and proceed:
mkdir -p ".listenhub/explainer"
echo '{"outputMode":"inline","language":null,"defaultStyle":null,"defaultSpeakers":{}}' > ".listenhub/explainer/config.json"
CONFIG_PATH=".listenhub/explainer/config.json"
CONFIG=$(cat "$CONFIG_PATH")
Do NOT ask any setup questions. Proceed directly to the Interaction Flow.
If file exists — read config silently and proceed:
CONFIG_PATH=".listenhub/explainer/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/explainer/config.json"
CONFIG=$(cat "$CONFIG_PATH")
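The two branches above can be folded into one lookup. This is a sketch only, assuming the same paths and default JSON shown above; the function name resolve_explainer_config is hypothetical.

```shell
# Hypothetical helper: print the path of the explainer config file.
# Prefers the project-local file, falls back to the one under $HOME,
# and creates the project-local file with defaults when neither exists.
resolve_explainer_config() {
  local local_path=".listenhub/explainer/config.json"
  local home_path="$HOME/.listenhub/explainer/config.json"
  if [ -f "$local_path" ]; then
    printf '%s\n' "$local_path"
  elif [ -f "$home_path" ]; then
    printf '%s\n' "$home_path"
  else
    mkdir -p "$(dirname "$local_path")"
    printf '%s\n' '{"outputMode":"inline","language":null,"defaultStyle":null,"defaultSpeakers":{}}' > "$local_path"
    printf '%s\n' "$local_path"
  fi
}
```

A caller would then run CONFIG_PATH=$(resolve_explainer_config) followed by CONFIG=$(cat "$CONFIG_PATH").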
Setup Flow (user-initiated reconfigure only)
Only run when the user explicitly asks to reconfigure. Display current settings:
Current configuration (explainer):
Output mode: {inline / download / both}
Language preference: {zh / en / not set}
Default style: {info / story / not set}
Default speaker: {speakerName / use built-in default}
Then ask:
- outputMode: Follow shared/output-mode.md § Setup Flow Question.
- Language (optional): "Default language?"
  - "Chinese (zh)"
  - "English (en)"
  - "Choose manually each time" → keep null
- Style (optional): "Default style?"
  - "Info: informational presentation"
  - "Story: narrative storytelling"
  - "Choose manually each time" → keep null
After collecting answers, save immediately:
NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
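The save command above stores only outputMode. If the language and style answers should be persisted as well, one possible shape is sketched below. It assumes jq is installed and that an empty string stands for "choose manually each time"; the function name merge_setup_answers is hypothetical.

```shell
# Hypothetical helper: merge Setup Flow answers into a config JSON string.
# Pass an empty string for language or style to store null.
merge_setup_answers() {
  local config="$1" output_mode="$2" language="$3" style="$4"
  printf '%s' "$config" | jq -c \
    --arg m "$output_mode" \
    --arg lang "$language" \
    --arg style "$style" \
    '. + {
       outputMode: $m,
       language: (if $lang == "" then null else $lang end),
       defaultStyle: (if $style == "" then null else $style end)
     }'
}
```

Usage would look like NEW_CONFIG=$(merge_setup_answers "$CONFIG" "$OUTPUT_MODE" "zh" "") before writing the result to "$CONFIG_PATH". Existing keys such as defaultSpeakers are left untouched.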
Interaction Flow
Step 1: Topic / Content
Free text input. Ask the user:
What would you like to explain or introduce?
Accept: topic description, text content, or concept to explain.
Step 2: Language
If config.language is set, pre-fill and show in summary — skip this question.
Otherwise ask:
Question: "What language?"
Options:
- "Chinese (zh)" — Content in Mandarin Chinese
- "English (en)" — Content in English
- "Japanese (ja)" — Content in Japanese
Step 3: Style
If config.defaultStyle is set, pre-fill and show in summary — skip this question.
Otherwise ask:
Question: "What style of explainer?"
Options:
- "Info" — Informational, factual presentation style
- "Story" — Narrative, storytelling approach
Step 4: Speaker Selection
Follow shared/speaker-selection.md:
- If config.defaultSpeakers.{language} is set → use saved speaker silently
- If not set → use built-in default from shared/speaker-selection.md for the language
- Show the speaker in the confirmation summary (Step 6); the user can change it from there if desired
- Only show the full speaker list if the user explicitly asks to change voice
Speaker query: see shared/cli-speakers.md for listing and filtering speakers.
Only 1 speaker is supported for explainer videos.
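The saved-or-default lookup could be sketched as follows. It assumes jq is installed and that defaultSpeakers maps a language code to an array holding one speaker ID, which is the shape written by the config update later in this document. The function name resolve_speaker_id is hypothetical, and the fallback ID must come from shared/speaker-selection.md rather than being hardcoded.

```shell
# Hypothetical helper: print the saved speaker ID for a language,
# or the supplied fallback when the config has none.
resolve_speaker_id() {
  local config="$1" lang="$2" fallback="$3"
  local saved
  # "// empty" produces no output when the key is missing or null.
  saved=$(printf '%s' "$config" | jq -r --arg lang "$lang" '.defaultSpeakers[$lang][0] // empty')
  printf '%s\n' "${saved:-$fallback}"
}
```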
Step 5: Output Type
Question: "What output do you want?"
Options:
- "Text script only" — Generate narration script, no video
- "Text + Video" — Generate full explainer video with AI visuals
Step 6: Confirm & Generate
Summarize all choices:
Ready to generate explainer:
Topic: {topic}
Language: {language}
Style: {info/story}
Speaker: {speaker name}
Output: {text only / text + video}
Proceed?
Wait for explicit confirmation before running any CLI command.
Workflow
Run the CLI command with run_in_background: true and timeout: 660000. The CLI blocks until generation completes and returns the final result as JSON:
listenhub explainer create \
--query "{topic}" \
--mode {info|story} \
--lang {en|zh|ja} \
--speaker "{name}" \
--speaker-id "{id}" \
--timeout 600 \
--json
If the command fails (non-zero exit), check stderr for error details. See shared/cli-patterns.md § Error Handling for exit codes and common errors.
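A failure check along these lines might look like the sketch below. The wrapper name run_capture and the use of a global RESULT variable are assumptions made for illustration; the actual exit codes are documented in shared/cli-patterns.md.

```shell
# Hypothetical wrapper: run a command, keep its stdout in RESULT,
# and print its stderr together with the exit code when it fails.
run_capture() {
  local err_file status
  err_file=$(mktemp)
  RESULT=$("$@" 2>"$err_file")
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed with exit code $status:" >&2
    cat "$err_file" >&2
  fi
  rm -f "$err_file"
  return "$status"
}
```

It would wrap the create call as run_capture listenhub explainer create ... --json, leaving the JSON in RESULT on success.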
Optional flags (add when applicable):
- --source-url "{url}": if the user provided a reference URL
- --skip-audio: if text-only output (no video)
- --image-size {2K|4K}: image resolution (default: 2K)
- --aspect-ratio {16:9|9:16|1:1}: video aspect ratio (default: 16:9)
- --style "{style}": visual style for AI-generated images
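Assembling the command from the confirmed choices could be sketched as below. Only flags listed in this document are used. The function name run_explainer_create and the output values text and video are hypothetical labels for the Step 5 choice.

```shell
# Hypothetical helper: build the argument list from the confirmed choices
# and invoke the CLI.
# Usage: run_explainer_create TOPIC MODE LANG SPEAKER SPEAKER_ID OUTPUT [SOURCE_URL]
run_explainer_create() {
  local topic="$1" mode="$2" lang="$3" speaker="$4" speaker_id="$5"
  local output="$6" source_url="${7:-}"
  # Reuse the positional parameters as the argument list for the CLI.
  set -- explainer create \
    --query "$topic" --mode "$mode" --lang "$lang" \
    --speaker "$speaker" --speaker-id "$speaker_id" \
    --timeout 600 --json
  # Text-only output: add --skip-audio, as the flag list specifies.
  if [ "$output" = "text" ]; then
    set -- "$@" --skip-audio
  fi
  # Add the reference URL only when the user supplied one.
  if [ -n "$source_url" ]; then
    set -- "$@" --source-url "$source_url"
  fi
  listenhub "$@"
}
```

Passing the values as separate quoted arguments keeps a topic containing spaces or quotes intact, which string concatenation would not guarantee.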
Tell the user the task has been submitted. When notified of completion, parse the CLI JSON output for the key fields:
EPISODE_ID=$(echo "$RESULT" | jq -r '.episodeId')
AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl // empty')
VIDEO_URL=$(echo "$RESULT" | jq -r '.videoUrl // empty')
CREDITS=$(echo "$RESULT" | jq -r '.credits // empty')
Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.
If text-only output:
inline or both: Present the script inline.
Present:
Explainer script generated!
"{title}"
View online: https://listenhub.ai/app/explainer/{episodeId}
download or both: Also save the script file. Generate a topic slug following shared/config-pattern.md § Artifact Naming.
- Save as {slug}-explainer.md in cwd (dedup if exists)
- Present the save path in addition to the above summary.
If text + video output:
inline or both: Display video URL and audio URL as clickable links.
Present:
Explainer video generated!
Video link: {videoUrl}
Audio link: {audioUrl}
Credits used: {credits}
download or both: Also save files. Generate a topic slug following shared/config-pattern.md § Artifact Naming.
- Create {slug}-explainer/ folder (dedup if exists)
- Write script.md inside
- Download audio: listenhub download "{audioUrl}" -o "{slug}-explainer/audio.mp3"
- Present:
Saved to the current directory:
{slug}-explainer/
  script.md
  audio.mp3
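The "dedup if exists" step for saved artifacts could be sketched as follows. The numeric suffix scheme (-2, -3, ...) and the function name dedup_path are assumptions; the authoritative naming rule is in shared/config-pattern.md § Artifact Naming.

```shell
# Hypothetical helper: print a path that does not exist yet,
# appending -2, -3, ... to the base name until one is free.
# Usage: dedup_path BASE [EXT]
dedup_path() {
  local base="$1" ext="${2:-}"
  local candidate="${base}${ext}"
  local n=2
  while [ -e "$candidate" ]; do
    candidate="${base}-${n}${ext}"
    n=$((n + 1))
  done
  printf '%s\n' "$candidate"
}
```

For a script file this would be called as dedup_path "{slug}-explainer" ".md", and for the video folder as dedup_path "{slug}-explainer" followed by mkdir -p on the result.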
After Successful Generation
Update config with the choices made this session:
NEW_CONFIG=$(echo "$CONFIG" | jq \
--arg lang "{language}" \
--arg style "{info/story}" \
--arg speakerId "{speakerId}" \
'. + {"language": $lang, "defaultStyle": $style, "defaultSpeakers": (.defaultSpeakers + {($lang): [$speakerId]})}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
Estimated times:
- Text script only: 2-3 minutes
- Text + Video: 5-10 minutes
Resources
- CLI authentication: shared/cli-authentication.md
- CLI patterns: shared/cli-patterns.md
- Speaker query: shared/cli-speakers.md
- Speaker selection guide: shared/speaker-selection.md
- Config pattern: shared/config-pattern.md
- Output mode: shared/output-mode.md
Composability
- Invokes: speakers CLI (for speaker selection); may invoke /speech for voiceover
- Invoked by: content-planner (Phase 3)
Example
User: "Create an explainer video introducing Claude Code"
Agent workflow:
- Topic: "Claude Code introduction"
- Ask language → "English"
- Ask style → "Info"
- Use default speaker "Mars" (cozy-man-english)
- Ask output → "Text + Video"
# Run with run_in_background: true, timeout: 660000
listenhub explainer create \
--query "Introduce Claude Code: what it is, key features, and how to get started" \
--mode info \
--lang en \
--speaker "Mars" \
--speaker-id "cozy-man-english" \
--timeout 600 \
--json
Parse result for episodeId, audioUrl, videoUrl, credits, and present to user.