YouTube Transcript (Captions-Only)
This skill extracts transcripts from existing YouTube captions.
Primary behavior
- Prefer manual subtitles when available.
- Fall back to auto-generated captions.
- Output either:
  - JSON segments (default), or
  - plain text (--text)
- Cache results locally in SQLite for speed.
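The manual-first, auto-second preference can be sketched as follows. This is a minimal illustration, not the script's actual code: it assumes metadata shaped like a yt-dlp info dict, where `subtitles` and `automatic_captions` map language codes to lists of track formats. The function name `pick_caption_track` and the sample data are hypothetical.

```python
from typing import Optional, Tuple


def pick_caption_track(info: dict, lang: str) -> Optional[Tuple[str, list]]:
    """Return (source, formats), preferring manual subtitles over auto captions."""
    manual = info.get("subtitles") or {}
    auto = info.get("automatic_captions") or {}
    if lang in manual:
        return "manual", manual[lang]
    if lang in auto:
        return "auto", auto[lang]
    # No captions in this language; a caller could try another fallback here.
    return None


# Hypothetical metadata for illustration only.
info = {
    "subtitles": {
        "en": [{"ext": "vtt", "url": "https://example.invalid/en.vtt"}],
    },
    "automatic_captions": {
        "en": [{"ext": "vtt", "url": "https://example.invalid/en-auto.vtt"}],
        "de": [{"ext": "vtt", "url": "https://example.invalid/de-auto.vtt"}],
    },
}

source, formats = pick_caption_track(info, "en")
print(source)  # manual
```

Returning the source label alongside the track lets the caller report where the transcript came from and keep manual and auto results apart.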
Reliability behavior
- If YouTube blocks anonymous access (bot-check), provide cookies.txt.
- If yt-dlp reports no captions for a video, the script tries a fallback:
  - YouTube’s transcript panel (youtubei get_transcript) when accessible
This published version intentionally does not call third-party transcript providers.
Privacy note: This published version only contacts YouTube directly (via yt-dlp and the transcript panel fallback). It does not send video IDs/URLs to third-party transcript providers.
Cookies: Cookies are treated as secrets.
- The script supports --cookies / YT_TRANSCRIPT_COOKIES, but does not auto-load cookies from inside the skill directory.
- Store cookies under ~/.config/yt-transcript/.
Path safety: This skill restricts --cookies and --cache paths to approved directories.
- cookies allowed under: ~/.config/yt-transcript/
- cache allowed under: {baseDir}/cache/ and ~/.config/yt-transcript/
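One way such a directory restriction might be enforced is to resolve the candidate path and check that it sits inside an approved root. The sketch below is an assumption about the approach, not the script's implementation; `is_under` and `COOKIE_DIRS` are hypothetical names.

```python
from pathlib import Path
from typing import Iterable


def is_under(path: str, allowed_dirs: Iterable[str]) -> bool:
    """Return True if `path` resolves to a location inside one of `allowed_dirs`."""
    # resolve() collapses ".." segments and follows symlinks, so a path such as
    # "allowed/../elsewhere" is judged by where it really points.
    resolved = Path(path).expanduser().resolve()
    for directory in allowed_dirs:
        root = Path(directory).expanduser().resolve()
        if resolved == root or root in resolved.parents:
            return True
    return False


# Hypothetical allow-list mirroring the cookie rule above.
COOKIE_DIRS = ["~/.config/yt-transcript"]

print(is_under("~/.config/yt-transcript/youtube-cookies.txt", COOKIE_DIRS))  # True
print(is_under("~/.config/yt-transcript/../other/cookies.txt", COOKIE_DIRS))  # False
```

Comparing resolved paths rather than raw strings avoids accepting a sibling directory that merely shares a prefix with an approved one.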
How to run
Script path:
{baseDir}/scripts/yt_transcript.py
Typical usage:
python3 {baseDir}/scripts/yt_transcript.py <youtube_url_or_id>
python3 {baseDir}/scripts/yt_transcript.py <url> --lang en
python3 {baseDir}/scripts/yt_transcript.py <url> --text
python3 {baseDir}/scripts/yt_transcript.py <url> --no-ts
Cookies (optional, but often required on VPS IPs):
- python3 {baseDir}/scripts/yt_transcript.py <url> --cookies /path/to/youtube-cookies.txt
- or set env var: YT_TRANSCRIPT_COOKIES=/path/to/youtube-cookies.txt
Publishing safety note: Cookies are optional, so YT_TRANSCRIPT_COOKIES is intentionally not required by skill metadata. Only set it if you need authenticated access.
Best practice: store cookies outside the skill folder (so you never accidentally publish them), e.g. ~/.config/yt-transcript/youtube-cookies.txt, and point to it via --cookies or YT_TRANSCRIPT_COOKIES.
What the script returns
JSON mode (default)
A JSON object:
- video_id: 11-char id
- lang: chosen language
- source: manual | auto | panel
- segments: list of { start, duration, text } (or text-only when --no-ts)
Text mode (--text)
A newline-separated transcript.
- By default timestamps are included as [12.34s].
- Use --no-ts to output only the text lines.
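To illustrate how JSON segments relate to the text output, here is a small sketch that renders a segment list as lines. It assumes each segment is a dict with `start` and `text` keys and that the timestamp is separated from the text by a single space; the exact spacing the script uses is not documented here, and `render_text` is a hypothetical helper.

```python
def render_text(segments, include_ts=True):
    """Render segments as newline-separated lines, optionally with [12.34s] stamps."""
    lines = []
    for seg in segments:
        text = seg["text"].strip()
        if include_ts:
            lines.append(f"[{seg['start']:.2f}s] {text}")
        else:
            lines.append(text)
    return "\n".join(lines)


# Hypothetical segments for illustration only.
segments = [
    {"start": 0.0, "duration": 2.5, "text": "Hello and welcome."},
    {"start": 12.34, "duration": 3.1, "text": "Let's get started."},
]

print(render_text(segments))
# [0.00s] Hello and welcome.
# [12.34s] Let's get started.
print(render_text(segments, include_ts=False))
# Hello and welcome.
# Let's get started.
```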
Caching
Default cache DB:
{baseDir}/cache/transcripts.sqlite
Cache key includes:
video_id, lang, source, include_timestamp, format
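A composite cache key like this maps naturally onto a multi-column primary key in SQLite. The sketch below is an assumed layout, not the script's real schema: the table name `transcripts`, the `payload` column, and the helper functions are all hypothetical.

```python
import json
import sqlite3

# Assumed schema: the five key fields form the primary key, and the
# transcript itself is stored as a JSON string.
SCHEMA = """
CREATE TABLE IF NOT EXISTS transcripts (
    video_id          TEXT    NOT NULL,
    lang              TEXT    NOT NULL,
    source            TEXT    NOT NULL,
    include_timestamp INTEGER NOT NULL,
    format            TEXT    NOT NULL,
    payload           TEXT    NOT NULL,
    PRIMARY KEY (video_id, lang, source, include_timestamp, format)
)
"""


def open_cache(path=":memory:"):
    """Open (or create) the cache database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def cache_put(conn, key, payload):
    """Store a payload under a 5-tuple key, replacing any existing entry."""
    conn.execute(
        "INSERT OR REPLACE INTO transcripts VALUES (?, ?, ?, ?, ?, ?)",
        (*key, json.dumps(payload)),
    )
    conn.commit()


def cache_get(conn, key):
    """Return the cached payload for a 5-tuple key, or None on a miss."""
    row = conn.execute(
        "SELECT payload FROM transcripts WHERE video_id = ? AND lang = ? "
        "AND source = ? AND include_timestamp = ? AND format = ?",
        key,
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = open_cache()
key = ("abcdefghijk", "en", "manual", 1, "json")  # placeholder video ID
cache_put(conn, key, {"segments": [{"start": 0.0, "duration": 1.0, "text": "hi"}]})
print(cache_get(conn, key) is not None)  # True
```

Because the output format and timestamp setting are part of the key, a `--text --no-ts` request never returns a cached JSON result for the same video.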
Cookie handling (important)
- Cookies must be in Netscape cookies.txt format.
- Treat cookies as secrets.
- Never commit / publish cookies to ClawHub.
Recommended local path (ignored by git/publish): {baseDir}/cache/youtube-cookies.txt (chmod 600)
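Before pointing the script at a cookie file, a quick sanity check can catch the two most common problems: permissions that are too open and content that is not in Netscape format. The sketch below relies on the general Netscape cookies.txt layout (seven tab-separated fields per cookie line, with an optional `#HttpOnly_` prefix). It is not part of the skill, and `cookie_file_problems` is a hypothetical helper. It reports line numbers only, so cookie values never end up in logs.

```python
import os
import stat


def cookie_file_problems(path):
    """Return a list of human-readable problems found in a cookies.txt file."""
    problems = []

    # chmod 600 means no permission bits for group or others.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        problems.append(f"permissions {mode:o} are too open; expected 600")

    with open(path, encoding="utf-8") as fh:
        for number, line in enumerate(fh, start=1):
            line = line.rstrip("\n")
            if line.startswith("#HttpOnly_"):
                # HttpOnly cookies are real cookie lines with a marker prefix.
                line = line[len("#HttpOnly_"):]
            elif not line.strip() or line.startswith("#"):
                continue  # blank line or comment
            if len(line.split("\t")) != 7:
                problems.append(f"line {number}: expected 7 tab-separated fields")
    return problems
```

An empty list means the file passed both checks. It does not prove the cookies are still valid for YouTube.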
Notes (safety + reliability)
- Only accept a YouTube URL or an 11-character video ID.
- Do not forward arbitrary user-provided flags into the command.
- If yt-dlp is missing, instruct the user to install it (recommended):
  - install pipx
  - pipx install yt-dlp
  - ensure yt-dlp is on PATH
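The first two notes can be combined into one guard: validate the input, then build the command from an allow-list of known options instead of passing user text through. This is a rough sketch under stated assumptions. It only recognizes `watch?v=` and `youtu.be` URL shapes, the language-code pattern is a guess, and `extract_video_id` and `build_argv` are hypothetical helpers rather than part of the script.

```python
import re
from urllib.parse import parse_qs, urlparse

VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def extract_video_id(value):
    """Return the 11-character video ID from a bare ID or a YouTube URL, else None."""
    value = value.strip()
    if VIDEO_ID_RE.fullmatch(value):
        return value
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or parsed.hostname not in YOUTUBE_HOSTS:
        return None
    if parsed.hostname == "youtu.be":
        candidate = parsed.path.lstrip("/")
    else:
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    return candidate if VIDEO_ID_RE.fullmatch(candidate) else None


def build_argv(script_path, user_input, lang=None, text=False, no_ts=False):
    """Build an argument list from validated input and allow-listed options only."""
    video_id = extract_video_id(user_input)
    if video_id is None:
        raise ValueError("expected a YouTube URL or an 11-character video ID")
    # Pass a canonical URL so an ID that starts with "-" cannot be read as a flag.
    argv = ["python3", script_path, f"https://www.youtube.com/watch?v={video_id}"]
    if lang is not None:
        if not re.fullmatch(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*", lang):
            raise ValueError("unexpected language code")
        argv += ["--lang", lang]
    if text:
        argv.append("--text")
    if no_ts:
        argv.append("--no-ts")
    return argv


print(build_argv("scripts/yt_transcript.py", "https://youtu.be/abcdefghijk", lang="en", text=True))
```

Passing the resulting list to `subprocess.run` without `shell=True` keeps user input out of any shell parsing.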