🎙️ Faster Whisper GPU
High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration.
✨ Features
- 🚀 GPU Accelerated: Uses NVIDIA CUDA for blazing-fast transcription
- 🔒 100% Local: No data leaves your machine. Complete privacy.
- 💰 Free Forever: No API costs. Run unlimited transcriptions.
- 🌍 Multilingual: Supports 99 languages with automatic detection
- 📁 Multiple Formats: Input: MP3, WAV, FLAC, OGG, M4A. Output: TXT, SRT, VTT, JSON
- 🎯 Multiple Models: From tiny (fast) to large-v3 (most accurate)
- 🎬 Subtitle Generation: Create SRT files with word-level timestamps
📋 Requirements
Hardware
- NVIDIA GPU with CUDA support (recommended: 4GB+ VRAM)
- Or CPU-only mode (slower but works on any machine)
Software
- Python 3.8+
- NVIDIA drivers (for GPU support)
- CUDA Toolkit 11.8+ or 12.x
🚀 Quick Start
Installation
# Install dependencies
pip install faster-whisper torch
# Verify GPU is available
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
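
The same check can drive the choice of device and precision from Python. The sketch below is illustrative only: `pick_device` and `detect_cuda` are hypothetical helper names, and the pairing of `cuda` with `float16` and `cpu` with `int8` simply mirrors the defaults and the CPU-only example in this README.

```python
from typing import Tuple


def pick_device(cuda_available: bool) -> Tuple[str, str]:
    """Return (device, compute_type) for the given CUDA availability.

    Hypothetical helper: the pairs mirror this README's GPU default
    (float16) and its CPU-only example (int8).
    """
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "int8"


def detect_cuda() -> bool:
    """Report CUDA availability, treating a missing torch install as 'no GPU'."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    device, compute_type = pick_device(detect_cuda())
    print(f"device={device} compute_type={compute_type}")
```

The returned pair can be passed straight to `WhisperModel(..., device=device, compute_type=compute_type)`.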
Basic Usage
# Transcribe an audio file (auto-detects GPU)
python transcribe.py audio.mp3
# Specify language explicitly
python transcribe.py audio.mp3 --language pt
# Output as SRT subtitles
python transcribe.py audio.mp3 --format srt --output subtitles.srt
# Use larger model for better accuracy
python transcribe.py audio.mp3 --model large-v3
🔧 Advanced Usage
Command Line Options
python transcribe.py <audio_file> [options]
Options:
--model {tiny,base,small,medium,large-v1,large-v2,large-v3}
Model size to use (default: base)
--language LANG Language code (e.g., 'pt', 'en', 'es'). Auto-detect if not specified.
--format {txt,srt,json,vtt}
Output format (default: txt)
--output FILE Output file path (default: stdout)
--device {cuda,cpu} Device to use (default: cuda if available)
--compute_type {int8,int8_float16,int16,float16,float32}
Computation precision (default: float16)
--task {transcribe,translate}
Task: transcribe or translate to English (default: transcribe)
--vad_filter Enable voice activity detection filter
--vad_parameters MIN_DURATION_ON,MIN_DURATION_OFF
VAD parameters as comma-separated values
--condition_on_previous_text
Condition on previous text (default: True)
--initial_prompt PROMPT
Initial prompt to guide transcription
--word_timestamps Include word-level timestamps (for SRT/JSON)
--hotwords WORDS Comma-separated hotwords to boost recognition
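
For readers who want to script against these options, here is a minimal `argparse` sketch covering a subset of them. It is an assumption about how such a parser could be written, not the actual source of transcribe.py; `build_parser` is a hypothetical name and the remaining flags are omitted for brevity.

```python
import argparse

MODELS = ["tiny", "base", "small", "medium", "large-v1", "large-v2", "large-v3"]
COMPUTE_TYPES = ["int8", "int8_float16", "int16", "float16", "float32"]


def split_hotwords(value: str) -> list:
    """Turn 'a, b,c' into ['a', 'b', 'c'], dropping empty entries."""
    return [word.strip() for word in value.split(",") if word.strip()]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a subset of the options listed above (sketch only)."""
    parser = argparse.ArgumentParser(prog="transcribe.py")
    parser.add_argument("audio_file")
    parser.add_argument("--model", choices=MODELS, default="base")
    parser.add_argument("--language", default=None)  # None means auto-detect
    parser.add_argument("--format", choices=["txt", "srt", "json", "vtt"], default="txt")
    parser.add_argument("--output", default=None)  # None means stdout
    parser.add_argument("--device", choices=["cuda", "cpu"], default=None)
    parser.add_argument("--compute_type", choices=COMPUTE_TYPES, default="float16")
    parser.add_argument("--task", choices=["transcribe", "translate"], default="transcribe")
    parser.add_argument("--vad_filter", action="store_true")
    parser.add_argument("--word_timestamps", action="store_true")
    parser.add_argument("--hotwords", type=split_hotwords, default=[])
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```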
Examples
Portuguese Transcription with SRT Output
python transcribe.py meeting.mp3 --language pt --format srt --output meeting.srt
English Translation from Any Language
python transcribe.py japanese_audio.mp3 --task translate --format txt
High-Accuracy Mode with Large Model
python transcribe.py podcast.mp3 --model large-v3 --vad_filter --word_timestamps
CPU-Only Mode (no GPU)
python transcribe.py audio.mp3 --device cpu --compute_type int8
🐍 Python API
from faster_whisper import WhisperModel
# Load model
model = WhisperModel("base", device="cuda", compute_type="float16")
# Transcribe
segments, info = model.transcribe("audio.mp3", language="pt")
print(f"Detected language: {info.language} (probability: {info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
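
The segments returned by the API can be turned into subtitles by hand. The sketch below shows one way to format SRT blocks from `(start, end, text)` tuples; `srt_timestamp` and `to_srt` are hypothetical helper names and not part of faster-whisper.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Blocks are separated by a blank line, as the SRT format expects.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.0, 2.5, " Hello"), (2.5, 5.0, " world")]))
```

With the transcription loop above, the input would be `to_srt((s.start, s.end, s.text) for s in segments)`.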
📊 Model Sizes & VRAM Requirements
| Model | Parameters | VRAM Required | Relative Speed | Accuracy |
|---|---|---|---|---|
| tiny | 39 M | ~1 GB | ~32x | Basic |
| base | 74 M | ~1 GB | ~16x | Good |
| small | 244 M | ~2 GB | ~6x | Better |
| medium | 769 M | ~5 GB | ~2x | Great |
| large-v3 | 1550 M | ~10 GB | 1x | Best |
Relative speed is expressed against large-v3 (1x). Benchmarks measured on NVIDIA RTX 4090.
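
The table can be used to pick a model for a given GPU. The sketch below selects the largest model whose approximate VRAM figure fits a budget; `largest_model_for` and the `headroom_gb` parameter are hypothetical, and the numbers are copied from the table rather than measured.

```python
from typing import Optional

# Approximate VRAM requirements in GB, copied from the table above,
# listed from smallest to largest model.
VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large-v3": 10}


def largest_model_for(vram_gb: float, headroom_gb: float = 0.0) -> Optional[str]:
    """Return the largest model that fits in vram_gb, or None if none fits.

    headroom_gb reserves memory for other processes (an illustrative knob).
    """
    budget = vram_gb - headroom_gb
    best = None
    for name, required in VRAM_GB.items():
        if required <= budget:
            best = name  # later entries are larger, so keep overwriting
    return best


if __name__ == "__main__":
    for vram in (1, 4, 8, 24):
        print(f"{vram} GB -> {largest_model_for(vram)}")
```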
🔍 Supported Languages
Faster Whisper supports 99 languages including:
- Portuguese (pt)
- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Italian (it)
- Japanese (ja)
- Chinese (zh)
- Russian (ru)
- And 90+ more...
🛠️ Troubleshooting
CUDA Out of Memory
# Use smaller model
python transcribe.py audio.mp3 --model tiny
# Or use CPU
python transcribe.py audio.mp3 --device cpu --compute_type int8
# Or reduce precision
python transcribe.py audio.mp3 --compute_type int8
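
When using the Python API, the same fallbacks can be tried in order. The sketch below is a generic retry helper, not part of faster-whisper: `load_with_fallback` and `FALLBACKS` are hypothetical names, and it assumes that a failed load surfaces as `RuntimeError` or `ValueError`. Out-of-memory errors can also appear later, during transcription, which this sketch does not cover.

```python
from typing import Any, Callable, Sequence, Tuple

# Configurations to try, from fastest to most conservative.
FALLBACKS = (("cuda", "float16"), ("cuda", "int8"), ("cpu", "int8"))


def load_with_fallback(
    loader: Callable[[str, str], Any],
    attempts: Sequence[Tuple[str, str]] = FALLBACKS,
) -> Tuple[Any, str, str]:
    """Call loader(device, compute_type) until one configuration succeeds.

    Returns (model, device, compute_type) for the first success.
    Assumption: load failures raise RuntimeError or ValueError.
    """
    last_error = None
    for device, compute_type in attempts:
        try:
            return loader(device, compute_type), device, compute_type
        except (RuntimeError, ValueError) as exc:
            last_error = exc  # remember the failure and try the next option
    raise RuntimeError("no configuration could be loaded") from last_error


# Intended use with faster-whisper (not executed here):
#   from faster_whisper import WhisperModel
#   model, device, compute_type = load_with_fallback(
#       lambda d, c: WhisperModel("base", device=d, compute_type=c)
#   )
```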
Model Download Issues
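Models are downloaded automatically on first use to ~/.cache/huggingface/hub/.
To store them in a different cache directory, set HF_HOME. If you are behind a proxy, also set the standard HTTPS_PROXY variable (the address below is a placeholder):
export HF_HOME=/path/to/custom/cache
export HTTPS_PROXY=http://proxy.example.com:8080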
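
To see where models will land after setting `HF_HOME`, the lookup can be approximated in a few lines. This is a simplified sketch of the Hugging Face cache resolution, assuming `HF_HUB_CACHE` takes precedence and that the hub cache otherwise lives in a `hub` folder under `HF_HOME`; `hub_cache_dir` is a hypothetical helper, and the real library handles more cases.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def hub_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Approximate the directory where downloaded models are cached.

    Simplified assumption: HF_HUB_CACHE wins if set, otherwise the cache
    is '<HF_HOME>/hub', with HF_HOME defaulting to ~/.cache/huggingface.
    """
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    home = env.get("HF_HOME") or "~/.cache/huggingface"
    return Path(home).expanduser() / "hub"


if __name__ == "__main__":
    print(hub_cache_dir())
```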
Slow Transcription
- Ensure the GPU is being used: check nvidia-smi during transcription
- Use a smaller model for faster results
- Enable VAD filter to skip silent parts
🤝 Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Submit a pull request
📜 License
MIT License - See LICENSE for details.
Faster Whisper is developed by SYSTRAN and based on OpenAI's Whisper.
🙏 Acknowledgments
- OpenAI Whisper - Original model
- Faster Whisper - Optimized implementation
- CTranslate2 - Fast inference engine
Made with ❤️ for the OpenClaw community