YouTube Podcaster
This skill converts YouTube videos into multi-host AI podcasts. A local server handles transcription, script generation via Gemini, and audio synthesis via OpenAI.
Security Setup
For maximum security, the backend server binds strictly to 127.0.0.1. It is not accessible from your local network or the internet.
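As an illustration of loopback-only binding, here is a minimal sketch using Node's built-in http module. It assumes port 7860 (the port this skill's local API uses); the names LISTEN_OPTIONS, buildServer, and startServer are hypothetical, and the skill's actual index.js may be organized differently.

```javascript
const http = require("node:http");

// Binding to 127.0.0.1 (instead of 0.0.0.0) means only processes on this
// machine can connect. Port 7860 is the port the skill's local API uses.
const LISTEN_OPTIONS = { host: "127.0.0.1", port: 7860 };

// Hypothetical placeholder handler; the real server exposes its own routes.
function buildServer() {
  return http.createServer((req, res) => {
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ status: "ok" }));
  });
}

// Starts listening on the loopback interface only and returns the server.
function startServer(options = LISTEN_OPTIONS) {
  const server = buildServer();
  server.listen(options, () => {
    const { address, port } = server.address();
    console.log(`Listening on http://${address}:${port}`);
  });
  return server;
}
```

Passing the host explicitly matters here: when no host is given, Node listens on all interfaces, which would expose the server to the local network.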
- Install Dependencies: You must run the install command once before the first use. Say:
  Run the npm install command for the youtube-podcaster skill.
- Credentials: Place your Gemini API Key and OpenAI API Key in the .env file within the skill folder (skills/youtube-podcaster/.env) using the variable names GEMINI_API_KEY and OPENAI_API_KEY.
- Execution: Start the server with npm start or by instructing the agent:
  Start the local server for the youtube-podcaster skill.
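Before starting the server, it can help to confirm that both keys are present in the .env file. The following is a rough sketch of such a check. The helper names parseEnv and missingKeys are hypothetical, and the parser handles only simple KEY=value lines; the skill itself may load the file differently.

```javascript
// The two variable names the skill expects in skills/youtube-podcaster/.env.
const REQUIRED_KEYS = ["GEMINI_API_KEY", "OPENAI_API_KEY"];

// Parses simple KEY=value lines, skipping blank lines and # comments.
// Matching surrounding quotes around a value are stripped.
function parseEnv(text) {
  const env = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim().replace(/^(['"])(.*)\1$/, "$2");
    env[key] = value;
  }
  return env;
}

// Returns the required variable names that are absent or empty.
function missingKeys(env) {
  return REQUIRED_KEYS.filter((key) => !env[key]);
}

// Example usage (reads the file from the skill folder):
//   const fs = require("node:fs");
//   const env = parseEnv(fs.readFileSync("skills/youtube-podcaster/.env", "utf8"));
//   console.log(missingKeys(env));
```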
Usage
Once the server is running, say:
Create a podcast for the video https://www.youtube.com/watch?v=<video_id> using the youtube-podcaster skill
The skill orchestrates three local API calls to localhost:7860:
- Transcription: Extracts text via the YouTube transcript API.
- Drafting: Uses Gemini to create a natural dialogue script.
- Synthesis: Uses OpenAI TTS (tts-1) and FFmpeg to generate a gapless .m4a file.
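The order of the three calls can be sketched as a simple request plan. Note that the route names (/transcribe, /draft, /synthesize) and the request bodies below are hypothetical placeholders chosen for illustration; only the host, port, step order, and tts-1 model name come from the description above.

```javascript
const BASE_URL = "http://localhost:7860";

// Pulls the video id out of a https://www.youtube.com/watch?v=<video_id> URL.
function extractVideoId(videoUrl) {
  const id = new URL(videoUrl).searchParams.get("v");
  if (!id) throw new Error(`No video id found in ${videoUrl}`);
  return id;
}

// Builds the three requests in the order they would be issued.
// Route names and payload shapes are assumptions, not the server's real API.
function planPipeline(videoUrl) {
  const videoId = extractVideoId(videoUrl);
  return [
    { step: "transcription", url: `${BASE_URL}/transcribe`, body: { videoId } },
    { step: "drafting", url: `${BASE_URL}/draft`, body: { videoId } },
    { step: "synthesis", url: `${BASE_URL}/synthesize`, body: { videoId, model: "tts-1" } },
  ];
}
```

Each step depends on the output of the previous one (transcript, then script, then audio), so a client would issue these requests sequentially rather than in parallel.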
Safe Cleanup
When you are finished using the studio, shut down the background process to free up system resources. Do not use generic kill commands. Instead, instruct the agent to use the tracked process ID:
Stop the youtube-podcaster server process
(The agent will execute kill $(cat .podcaster.pid) or pkill -f "node index.js" to target the specific process safely.)
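The PID-file approach can also be expressed in Node itself. This is a minimal sketch, assuming the server writes its process ID to .podcaster.pid as plain text; the helper names parsePid and stopServer are hypothetical.

```javascript
const fs = require("node:fs");

// Validates the PID file contents so a corrupted file can never
// turn into a signal sent to the wrong target.
function parsePid(text) {
  const trimmed = text.trim();
  if (!/^[1-9]\d*$/.test(trimmed)) {
    throw new Error(`Invalid PID file contents: ${JSON.stringify(text)}`);
  }
  return Number(trimmed);
}

// Sends SIGTERM to the tracked process. Returns true if the signal was
// sent, false if the process had already exited.
function stopServer(pidFile = ".podcaster.pid") {
  const pid = parsePid(fs.readFileSync(pidFile, "utf8"));
  try {
    process.kill(pid, "SIGTERM");
    return true;
  } catch (err) {
    if (err.code === "ESRCH") return false; // no such process
    throw err;
  }
}
```

Validating the file before signalling is the point of the design: only the one process recorded by the server is ever targeted.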
Storage & File Outputs
Files are saved to downloads/<session_id>/ inside the skill directory. The server includes an hourly garbage collector that automatically deletes inactive sessions.
- Audio: podcast.m4a
- Captions: podcast.vtt
- Scripts: script.txt and original.txt
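An hourly sweep over the session directories could look roughly like the sketch below. The one-hour inactivity threshold and the use of each session directory's modification time as the activity signal are assumptions, as are the function names; the server's actual collector may track activity differently.

```javascript
const fs = require("node:fs");
const path = require("node:path");

const SWEEP_INTERVAL_MS = 60 * 60 * 1000; // hourly sweep
const MAX_IDLE_MS = 60 * 60 * 1000; // assumption: idle threshold is not documented

// A session is stale when it has been idle longer than the threshold.
function isStale(lastModifiedMs, nowMs, maxIdleMs = MAX_IDLE_MS) {
  return nowMs - lastModifiedMs > maxIdleMs;
}

// Deletes stale session directories under downloadsDir and returns their names.
// Directory mtime is used as a proxy for activity (an assumption).
function sweepSessions(downloadsDir, nowMs = Date.now()) {
  const removed = [];
  if (!fs.existsSync(downloadsDir)) return removed;
  for (const entry of fs.readdirSync(downloadsDir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;
    const sessionDir = path.join(downloadsDir, entry.name);
    const { mtimeMs } = fs.statSync(sessionDir);
    if (isStale(mtimeMs, nowMs)) {
      fs.rmSync(sessionDir, { recursive: true, force: true });
      removed.push(entry.name);
    }
  }
  return removed;
}

// Schedules the sweep; unref() keeps the timer from holding the process open.
function startGarbageCollector(downloadsDir) {
  const timer = setInterval(() => sweepSessions(downloadsDir), SWEEP_INTERVAL_MS);
  timer.unref();
  return timer;
}
```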
Source Code
The source code is available at: https://github.com/kaudata/youtube-podcaster