HN Podcast Transcribe
Automated pipeline: RSS feed → download audio → Whisper transcription → markdown archive.
Quick Start
List episodes from a feed
python3 scripts/hn_podcast_feed.py --feed roundup --limit 10
python3 scripts/hn_podcast_feed.py --json --limit 50
Feeds: roundup (Hacker News Roundup), recap (Hacker News Recap), highlights (HN Highlights), or pass a custom URL with --url <RSS_URL>.
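To illustrate what listing episodes from a feed involves, here is a minimal sketch that pulls the title, publication date, and audio URL from RSS `<item>` elements. It is not the code in `scripts/hn_podcast_feed.py`; the function name `list_episodes` and the sample feed are hypothetical, and it assumes each episode carries its audio in a standard `<enclosure>` tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed for illustration; real feeds carry many more fields.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>Episode 2</title>
      <pubDate>Tue, 02 Jan 2024 08:00:00 +0000</pubDate>
      <enclosure url="https://example.com/ep2.mp3" type="audio/mpeg" length="0"/>
    </item>
    <item>
      <title>Episode 1</title>
      <pubDate>Mon, 01 Jan 2024 08:00:00 +0000</pubDate>
      <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="0"/>
    </item>
  </channel>
</rss>"""


def list_episodes(rss_text, limit=10):
    """Return up to `limit` episodes, in feed order, as plain dicts."""
    root = ET.fromstring(rss_text)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            # Items without an audio enclosure cannot be transcribed; skip them.
            continue
        episodes.append({
            "title": (item.findtext("title") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
            "audio_url": enclosure.get("url"),
        })
        if len(episodes) >= limit:
            break
    return episodes


if __name__ == "__main__":
    for episode in list_episodes(SAMPLE_RSS, limit=10):
        print(episode["published"], episode["title"], episode["audio_url"])
```

The `audio_url` value is what a script such as `archive_episode.sh` would take as its first argument.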
Download and transcribe a single episode
bash scripts/fetch_and_transcribe.sh <audio_url> ./output
Full archive pipeline (download → transcribe → markdown)
By direct audio URL:
bash scripts/archive_episode.sh "https://media.transistor.fm/ac6c95a2/8dc4e7fe.mp3" ./archive
Batch archive recent episodes (with dedup)
# Archive latest 5 from a feed, skip already-processed
bash scripts/batch_archive.sh roundup 5 ./archive
bash scripts/batch_archive.sh highlights 10 ./archive
Processed episodes are tracked in ./archive/.processed_episodes.json to avoid duplicate work.
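The dedup step can be pictured with the sketch below. The actual layout of `.processed_episodes.json` is not documented here, so this assumes the simplest possible format: a JSON array of audio URLs. The helper names (`load_processed`, `mark_processed`, `filter_new`) are hypothetical.

```python
import json
from pathlib import Path

TRACKING_FILE = ".processed_episodes.json"


def load_processed(archive_dir):
    """Read the set of already-processed audio URLs (empty if no file yet)."""
    path = Path(archive_dir) / TRACKING_FILE
    if not path.exists():
        return set()
    # Assumption: the file holds a flat JSON array of URL strings.
    return set(json.loads(path.read_text(encoding="utf-8")))


def mark_processed(archive_dir, audio_url):
    """Record one URL as processed, creating the archive directory if needed."""
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    processed = load_processed(archive)
    processed.add(audio_url)
    (archive / TRACKING_FILE).write_text(
        json.dumps(sorted(processed), indent=2), encoding="utf-8"
    )


def filter_new(archive_dir, audio_urls):
    """Keep only URLs that have not been processed, preserving input order."""
    processed = load_processed(archive_dir)
    return [url for url in audio_urls if url not in processed]
```

Marking an episode only after its transcript has been written means an interrupted run is retried on the next batch instead of being silently skipped.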
Configuration
| Variable | Default | Description |
|---|---|---|
| WHISPER_MODEL | medium | Whisper model: tiny, base, small, medium, large, turbo |
| WHISPER_LANG | en | Language code for transcription |
Use turbo for speed, large for best accuracy, tiny for quick testing.
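As a rough illustration of how the two variables could feed into a transcription call, the sketch below reads them with the defaults from the table and builds an argument list for the `whisper` command-line tool. How the bundled scripts actually invoke Whisper is not shown in this README, so treat the function `whisper_command` and the chosen output format as assumptions; the `--model`, `--language`, `--output_dir`, and `--output_format` options are standard `openai-whisper` CLI flags.

```python
import os


def whisper_command(audio_path, output_dir, env=None):
    """Build a whisper CLI argument list from WHISPER_MODEL / WHISPER_LANG."""
    env = os.environ if env is None else env
    model = env.get("WHISPER_MODEL", "medium")  # default from the table above
    lang = env.get("WHISPER_LANG", "en")        # default from the table above
    return [
        "whisper", audio_path,
        "--model", model,
        "--language", lang,
        "--output_dir", output_dir,
        "--output_format", "txt",  # assumption: plain-text transcript
    ]


if __name__ == "__main__":
    # Prints the command only; pass the list to subprocess.run to execute it.
    print(" ".join(whisper_command("episode.mp3", "./output")))
```

Setting `WHISPER_MODEL=tiny` in the environment before running would swap the model without touching the code.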
Output
Each archived episode produces:
- <stem>.md: Markdown with metadata header + full transcript
- <stem>.mp3: Original audio file
Batch runs maintain .processed_episodes.json for dedup tracking.
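For a sense of what the per-episode Markdown file might contain, here is a minimal sketch that writes a metadata header followed by the transcript. The exact header fields produced by the archive scripts are not specified in this README, so the layout below (title, source URL, audio file name) and the function `write_episode_markdown` are purely illustrative.

```python
from pathlib import Path


def write_episode_markdown(archive_dir, stem, title, audio_url, transcript):
    """Write <stem>.md with a small metadata header and the full transcript."""
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    lines = [
        f"# {title}",
        "",
        # Hypothetical metadata fields; the real header may differ.
        f"- Source: {audio_url}",
        f"- Audio file: {stem}.mp3",
        "",
        "## Transcript",
        "",
        transcript.strip(),
        "",
    ]
    path = archive / f"{stem}.md"
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```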
Requirements
- whisper (pip install openai-whisper)
- yt-dlp (optional, for non-direct URLs)
- ffmpeg
- Python 3.8+
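A quick preflight check along these lines can confirm the requirements before a long batch run. This is a hypothetical helper, not part of the shipped scripts; it only verifies that the executables are on `PATH` and that the interpreter is recent enough.

```python
import shutil
import sys

REQUIRED_TOOLS = ("whisper", "ffmpeg")
OPTIONAL_TOOLS = ("yt-dlp",)


def check_requirements(which=shutil.which, version=sys.version_info):
    """Return (problems, warnings) describing missing requirements."""
    problems = []
    if tuple(version[:2]) < (3, 8):
        problems.append("Python 3.8+ is required")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"missing required tool: {tool}")
    # yt-dlp is only needed for non-direct URLs, so its absence is a warning.
    warnings = [
        f"optional tool not found: {tool}"
        for tool in OPTIONAL_TOOLS
        if which(tool) is None
    ]
    return problems, warnings


if __name__ == "__main__":
    problems, warnings = check_requirements()
    for message in problems + warnings:
        print(message)
    sys.exit(1 if problems else 0)
```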