yt_transcript

Extract YouTube video transcripts from existing captions (manual or auto-generated) using yt-dlp, with optional timestamps and local SQLite caching. Use when the user asks for a YouTube transcript, captions, subtitles, or wants to turn a YouTube link into text for summarization/search.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "yt_transcript" with this command: npx skills add ItzSubhadip/youtube-transcript-yt-dlp

YouTube Transcript (Captions-Only)

This skill extracts transcripts from existing YouTube captions.

Primary behavior

  • Prefer manual subtitles when available.
  • Fall back to auto-generated captions.
  • Output either:
    • JSON segments (default) or
    • plain text (--text)
  • Cache results locally in SQLite for speed.
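The manual-then-auto preference above can be sketched as a small selection function. This is only an illustration, not the script's actual code: it assumes an info dictionary shaped like the one yt-dlp's extract_info returns (with "subtitles" and "automatic_captions" keys mapping language codes to track lists), and the name pick_caption_track is hypothetical.

```python
def pick_caption_track(info: dict, lang: str = "en"):
    """Return (source, tracks), preferring manual subtitles over auto captions.

    `info` is assumed to look like a yt-dlp info dict; `source` mirrors the
    "manual" / "auto" labels used in the JSON output.
    """
    for source, key in (("manual", "subtitles"), ("auto", "automatic_captions")):
        tracks = (info.get(key) or {}).get(lang)
        if tracks:
            return source, tracks
    # Neither kind of caption exists for this language.
    return None, []
```

A caller would treat a (None, []) result as the signal to try the transcript panel fallback.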

Reliability behavior

  • If YouTube blocks anonymous access (bot-check), provide cookies.txt.
  • If yt-dlp reports no captions for a video, the script falls back to YouTube’s transcript panel (youtubei get_transcript) when it is accessible.

Privacy note: This published version contacts only YouTube directly (via yt-dlp and the transcript panel fallback). It intentionally does not call third-party transcript providers and does not send video IDs/URLs to them.

Cookies: Cookies are treated as secrets.

  • The script supports --cookies / YT_TRANSCRIPT_COOKIES, but does not auto-load cookies from inside the skill directory.
  • Store cookies under ~/.config/yt-transcript/.

Path safety: This skill restricts --cookies and --cache paths to approved directories.

  • cookies allowed under: ~/.config/yt-transcript/
  • cache allowed under: {baseDir}/cache/ and ~/.config/yt-transcript/
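One way such a path restriction could work is to resolve the candidate path (following symlinks and ".." segments) and accept it only if it sits inside an approved directory. The sketch below is an assumption about the approach, not the skill's real implementation; ensure_under is a hypothetical helper, and the caller would pass the directories listed above.

```python
from pathlib import Path


def ensure_under(path: str, allowed_dirs) -> Path:
    """Resolve `path` and return it only if it lies inside an allowed directory."""
    resolved = Path(path).expanduser().resolve()
    for base in allowed_dirs:
        base_resolved = Path(base).expanduser().resolve()
        # Accept the directory itself or anything nested below it.
        if resolved == base_resolved or base_resolved in resolved.parents:
            return resolved
    raise ValueError(f"path not allowed: {resolved}")
```

For example, ensure_under(cookies_arg, ["~/.config/yt-transcript"]) would reject a cookies path that escapes the directory through "..", because the check runs on the resolved path.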

How to run

Script path:

  • {baseDir}/scripts/yt_transcript.py

Typical usage:

  • python3 {baseDir}/scripts/yt_transcript.py <youtube_url_or_id>
  • python3 {baseDir}/scripts/yt_transcript.py <url> --lang en
  • python3 {baseDir}/scripts/yt_transcript.py <url> --text
  • python3 {baseDir}/scripts/yt_transcript.py <url> --no-ts

Cookies (optional, but often required on VPS IPs):

  • python3 {baseDir}/scripts/yt_transcript.py <url> --cookies ~/.config/yt-transcript/youtube-cookies.txt
  • or set env var: YT_TRANSCRIPT_COOKIES=~/.config/yt-transcript/youtube-cookies.txt

Publishing safety note: Cookies are optional, so YT_TRANSCRIPT_COOKIES is intentionally not required by skill metadata. Only set it if you need authenticated access.

Best practice: store cookies outside the skill folder (so you never accidentally publish them), e.g. ~/.config/yt-transcript/youtube-cookies.txt, and point to it via --cookies or YT_TRANSCRIPT_COOKIES.

What the script returns

JSON mode (default)

A JSON object:

  • video_id: 11-char id
  • lang: chosen language
  • source: manual | auto | panel
  • segments: list of { start, duration, text } (or text-only when --no-ts)

Text mode (--text)

A newline-separated transcript.

  • By default timestamps are included as [12.34s].
  • Use --no-ts to output only the text lines.
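To illustrate how the JSON segments relate to text mode, here is a minimal sketch that renders a list of { start, duration, text } segments as lines. The function name render_text is hypothetical, and the two-decimal format with a single space after the bracket is an assumption based on the [12.34s] example; the script's exact spacing may differ.

```python
def render_text(segments, include_timestamps=True):
    """Join segments into a newline-separated transcript."""
    lines = []
    for seg in segments:
        text = seg["text"].strip()
        if include_timestamps:
            # Timestamp prefix modelled on the "[12.34s]" form shown above.
            lines.append(f"[{seg['start']:.2f}s] {text}")
        else:
            lines.append(text)
    return "\n".join(lines)
```

Calling it with include_timestamps=False corresponds to the --no-ts behavior.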

Caching

Default cache DB:

  • {baseDir}/cache/transcripts.sqlite

Cache key includes:

  • video_id, lang, source, include_timestamp, format
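A composite cache key like this maps naturally onto a SQLite primary key. The following sketch shows one possible layout; the table name, column types, and helper functions are assumptions made for illustration, not the schema the script actually uses.

```python
import json
import sqlite3


def open_cache(db_path=":memory:"):
    """Open (or create) a cache keyed on the five fields listed above."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS transcripts (
               video_id TEXT, lang TEXT, source TEXT,
               include_timestamp INTEGER, format TEXT,
               payload TEXT NOT NULL,
               PRIMARY KEY (video_id, lang, source, include_timestamp, format))"""
    )
    return conn


def cache_put(conn, key, payload):
    """Store a JSON-serializable payload under a 5-tuple key."""
    conn.execute(
        "INSERT OR REPLACE INTO transcripts VALUES (?, ?, ?, ?, ?, ?)",
        (*key, json.dumps(payload)),
    )
    conn.commit()


def cache_get(conn, key):
    """Return the cached payload for the key, or None on a miss."""
    row = conn.execute(
        "SELECT payload FROM transcripts WHERE video_id=? AND lang=? "
        "AND source=? AND include_timestamp=? AND format=?",
        key,
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Because every field is part of the key, a request for the same video in a different language, source, or output format is a separate cache entry rather than a stale hit.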

Cookie handling (important)

  • Cookies must be in Netscape cookies.txt format.
  • Treat cookies as secrets.
  • Never commit / publish cookies to ClawHub.

Recommended local path (outside the skill folder, so it is never published):

  • ~/.config/yt-transcript/youtube-cookies.txt (chmod 600)
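A quick pre-flight check on a cookies file could look like the sketch below. It is a hypothetical helper (cookies_file_problems is not part of the skill), it assumes a POSIX system where permission bits are meaningful, and it only inspects the header line that Netscape-format cookies.txt files conventionally start with.

```python
import os
import stat


def cookies_file_problems(path):
    """Return a list of human-readable problems; empty means the file looks fine."""
    problems = []
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        # Any group/other permission bit means the file is not chmod 600-style private.
        problems.append(f"permissions {oct(mode)} are too open; use chmod 600")
    with open(path, encoding="utf-8", errors="replace") as fh:
        first_line = fh.readline().strip()
    if not first_line.startswith(("# Netscape HTTP Cookie File", "# HTTP Cookie File")):
        problems.append("missing Netscape cookies.txt header line")
    return problems
```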

Notes (safety + reliability)

  • Only accept a YouTube URL or an 11-character video ID.
  • Do not forward arbitrary user-provided flags into the command.
  • If yt-dlp is missing, instruct the user to install it (recommended):
    • install pipx
    • pipx install yt-dlp
    • ensure yt-dlp is on PATH
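The first two notes can be combined into a small wrapper that validates the input and builds the command from a fixed set of options. This is a sketch under stated assumptions: the helper names, the accepted host list, the ID character set (letters, digits, "_" and "-"), and the language-code pattern are all illustrative choices, and URL forms such as /shorts/ or /embed/ are not handled. Only the --lang, --text, and --no-ts flags documented above are emitted.

```python
import re
from urllib.parse import parse_qs, urlparse

VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}
LANG_RE = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]+)*")


def extract_video_id(value: str) -> str:
    """Accept an 11-character ID or a YouTube watch/short-link URL; reject the rest."""
    value = value.strip()
    if VIDEO_ID_RE.fullmatch(value):
        return value
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if parsed.scheme not in ("http", "https") or host not in YOUTUBE_HOSTS:
        raise ValueError("not a YouTube URL or 11-character video ID")
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    else:
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    if not VIDEO_ID_RE.fullmatch(candidate):
        raise ValueError("no valid video ID found in URL")
    return candidate


def build_argv(script_path, user_input, lang=None, text=False, no_ts=False):
    """Build the argument list from validated values only; never pass raw user flags."""
    video_id = extract_video_id(user_input)
    # Pass a canonical URL so an ID that starts with "-" cannot be read as a flag.
    argv = ["python3", script_path, f"https://www.youtube.com/watch?v={video_id}"]
    if lang is not None:
        if not LANG_RE.fullmatch(lang):
            raise ValueError("invalid language code")
        argv += ["--lang", lang]
    if text:
        argv.append("--text")
    if no_ts:
        argv.append("--no-ts")
    return argv
```

The resulting list is meant to be handed to subprocess.run without shell=True, so nothing the user typed is interpreted by a shell.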
