youtube-voice-summarizer

Transform YouTube videos into podcast-style voice summaries using ElevenLabs TTS

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "youtube-voice-summarizer" with this command: npx skills add Franciscoandsam/youtube-voice-summarizer-elevenlabs

YouTube Voice Summarizer

Transform any YouTube video into a professional voice summary delivered in under 60 seconds.

What It Does

When a user sends a YouTube URL, this skill:

  1. Extracts the video transcript via Supadata
  2. Generates a concise AI summary via OpenRouter/Cerebras
  3. Converts the summary to natural speech via ElevenLabs
  4. Returns an audio file the user can listen to

Requirements

This skill requires a running backend server. Deploy the summarizer service:

git clone https://github.com/Franciscomoney/elevenlabs-moltbot.git
cd elevenlabs-moltbot
npm install
cp .env.example .env
# Add your API keys to .env
npm start

Required API Keys

ServicePurposeGet Key
ElevenLabsText-to-speechhttps://elevenlabs.io
SupadataYouTube transcriptshttps://supadata.ai
OpenRouterAI summarizationhttps://openrouter.ai

How to Use

When user sends a YouTube URL:

Step 1: Start the voice summary job

curl -s -X POST http://127.0.0.1:3050/api/summarize \
  -H "Content-Type: application/json" \
  -d '{"url":"YOUTUBE_URL","length":"short","voice":"podcast"}'

Returns: {"jobId": "job_xxx", "status": "processing"}

Step 2: Poll for completion (wait 3-5 seconds between checks)

curl -s http://127.0.0.1:3050/api/status/JOB_ID

Keep polling until status is "completed".

Step 3: Return the audio to user

When complete, the response includes:

  • result.audioUrl - The MP3 audio URL (send this to the user!)
  • result.teaser - Short hook text about the content
  • result.summary - Full text summary
  • result.keyPoints - Array of key takeaways

Send the user:

  1. The teaser text as a message
  2. The audio URL so they can listen

Voice Options

VoiceStyle
podcastDeep male narrator (default)
newsBritish authoritative
casualFriendly conversational
female_warmWarm female voice

Summary Lengths

LengthDurationBest For
short1-2 minQuick overview
medium3-5 minBalanced detail
detailed5-10 minComprehensive

Example Flow

User: "Summarize this: https://www.youtube.com/watch?v=dQw4w9WgXcQ"

  1. Start job:
curl -s -X POST http://127.0.0.1:3050/api/summarize \
  -H "Content-Type: application/json" \
  -d '{"url":"https://www.youtube.com/watch?v=dQw4w9WgXcQ","length":"short","voice":"podcast"}'
  1. Poll status with the returned jobId
  2. When complete, send the audioUrl to the user

Text-Only Summary (No Audio)

For faster, cheaper text-only summaries:

curl -s -X POST http://127.0.0.1:3050/api/quick-summary \
  -H "Content-Type: application/json" \
  -d '{"url":"YOUTUBE_URL","length":"short"}'

Troubleshooting

"Video may not have captions"

  • The video needs subtitles enabled on YouTube
  • Auto-generated captions may take time on new videos

Audio URL not working

  • Ensure BASE_URL in .env is publicly accessible
  • Check firewall allows traffic on port 3050

Cost Per Summary

ServiceCost
Supadata~$0.001
OpenRouter~$0.005-0.02
ElevenLabs~$0.05-0.15
Total~$0.06-0.17

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

Elevenlabs Tts

ElevenLabs TTS (Text-to-Speech) with emotional audio tags for expressive voice synthesis. WhatsApp-compatible voice messages with Opus conversion. Supports 7...

Registry SourceRecently Updated
5K6Profile unavailable
General

Autonoannounce

Build, operate, and troubleshoot Autonoannounce local speaker text-to-speech using the queued pipeline (enqueue to worker to ElevenLabs to playback backend)....

Registry SourceRecently Updated
660Profile unavailable
General

Audiomind

Turn any idea into a finished podcast in one command. AudioMind handles ElevenLabs voice narration (29+ voices), AI background music, and server-side audio m...

Registry SourceRecently Updated
4335Profile unavailable
General

text-to-speech

No summary provided by upstream source.

Repository SourceNeeds Review