local-tts

Local text-to-speech using MLX and the Kokoro model

Install skill "local-tts" with this command: npx skills add krishagel/geoffrey/krishagel-geoffrey-local-tts

Local TTS Skill

Generate high-quality speech audio locally using Apple Silicon MLX acceleration and the Kokoro-82M model. No API keys or recurring costs.

Quick Start

# Generate MP3 from text
uv run --with mlx-audio --with pydub skills/local-tts/scripts/generate_audio.py \
    --text "Hello, this is a test." \
    --output ~/Desktop/test.mp3

# Generate from file
uv run --with mlx-audio --with pydub skills/local-tts/scripts/generate_audio.py \
    --file /tmp/script.txt \
    --voice af_heart \
    --output ~/Desktop/podcast.mp3

# List available voices
uv run --with mlx-audio skills/local-tts/scripts/list_voices.py

Parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| --text | One of text/file | - | Text to convert |
| --file | One of text/file | - | Path to text file |
| --voice | No | af_heart | Voice preset |
| --output | Yes | - | Output file path (.mp3, .wav) |
| --model | No | Kokoro-82M-bf16 | Model to use |
| --list-voices | No | - | Show available voices |
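
The flag table can be mirrored with a small argparse sketch. This is a hypothetical parser written only from the table above, not the code in generate_audio.py; in particular, `build_parser` is an illustrative name, and `--list-voices` is left out because the table does not say how it interacts with the required `--output` flag.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the real script)."""
    parser = argparse.ArgumentParser(prog="generate_audio.py")
    # --text and --file are alternatives: exactly one of them must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--text", help="Text to convert")
    source.add_argument("--file", help="Path to text file")
    parser.add_argument("--voice", default="af_heart", help="Voice preset")
    parser.add_argument("--output", required=True, help="Output file path")
    parser.add_argument("--model", default="Kokoro-82M-bf16", help="Model to use")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(
    ["--text", "Hello, this is a test.", "--output", "test.mp3"]
)
print(args.voice, args.model)
```

Omitting both `--text` and `--file`, or passing both, makes argparse exit with a usage error, which matches the "One of text/file" requirement.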

Voice Presets

American English Female (prefix: af_)

  • af_heart - Warm, friendly (default)
  • af_bella - Soft, calm
  • af_nova - Clear, professional
  • af_river - Clear, confident
  • af_sarah - Soft, expressive

American English Male (prefix: am_)

  • am_adam - Clear, professional
  • am_echo - Deep, smooth
  • am_liam - Articulate, conversational
  • am_michael - Soft, measured

British English (prefix: bf_, bm_)

  • bf_emma - Clear, refined female
  • bm_daniel - Clear, professional male
  • bm_george - Distinguished male

See references/voices.md for full list.
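
The prefix convention above can be decoded with a short helper. This is a minimal sketch covering only the four prefixes listed here; `describe_voice` is a hypothetical name, and prefixes that appear only in references/voices.md are reported as unknown rather than guessed.

```python
# Prefix meanings taken from the preset headings in this section.
PREFIXES = {
    "af": ("American English", "female"),
    "am": ("American English", "male"),
    "bf": ("British English", "female"),
    "bm": ("British English", "male"),
}


def describe_voice(voice: str) -> tuple[str, str, str]:
    """Split a preset such as 'af_heart' into (accent, gender, name)."""
    prefix, sep, name = voice.partition("_")
    if not sep or not name or prefix not in PREFIXES:
        raise ValueError(f"unknown voice preset: {voice!r}")
    accent, gender = PREFIXES[prefix]
    return accent, gender, name


print(describe_voice("bm_george"))
```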

Output Format

{
  "success": true,
  "file": "/Users/hagelk/Desktop/podcast.mp3",
  "voice": "af_heart",
  "model": "Kokoro-82M-bf16",
  "characters": 9824,
  "chunks": 20,
  "duration_seconds": 612.5,
  "generation_time": 45.2
}
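
A caller can read this result to derive a realtime factor and an average chunk size. The sketch below embeds the sample above as a string; how the script actually emits the JSON (for example on stdout) is an assumption to verify against the script itself.

```python
import json

# Sample result copied from the Output Format section.
result_json = """
{
  "success": true,
  "file": "/Users/hagelk/Desktop/podcast.mp3",
  "voice": "af_heart",
  "model": "Kokoro-82M-bf16",
  "characters": 9824,
  "chunks": 20,
  "duration_seconds": 612.5,
  "generation_time": 45.2
}
"""

result = json.loads(result_json)

# Seconds of audio produced per second of generation time.
realtime_factor = result["duration_seconds"] / result["generation_time"]
# Average number of input characters handled per chunk.
avg_chunk_chars = result["characters"] / result["chunks"]

print(f"{realtime_factor:.1f}x realtime, {avg_chunk_chars:.0f} chars per chunk")
```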

Performance

| Hardware | Speed | Notes |
|---|---|---|
| M3 Pro 36GB | ~3-4x realtime | First run slower (model loading) |
| M1/M2 Mac Mini 8GB | ~1.5x realtime | Works well for briefings |
| M1/M2 Mac Mini 16GB | ~2x realtime | Comfortable headroom |

Technical Details

  • Model: Kokoro-82M-bf16 (~200MB download on first run)
  • Sample rate: 24kHz mono
  • Chunking: Text split at ~400 chars per chunk for quality
  • Concatenation: Chunks joined seamlessly via pydub
  • Formats: MP3, WAV, M4A, OGG
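
The chunking step might look roughly like the following. This is an illustrative sketch, not the script's actual splitter: it assumes chunks break at sentence boundaries and treats 400 characters as a hard limit, whereas the source only says "~400 chars". `chunk_text` is a hypothetical name.

```python
import re


def chunk_text(text: str, max_chars: int = 400) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


sample = "This is one sentence. " * 60
pieces = chunk_text(sample)
print(len(pieces), max(len(p) for p in pieces))
```

Splitting on sentence boundaries keeps each chunk a complete utterance, which is one plausible reason chunking helps quality; the real script may use a different rule.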

Important Notes

  1. MUST use --with flags - Do not declare dependencies inline with PEP 723 script metadata. mlx-audio requires uv's cached environment.

  2. First run is slower - The model (~200MB) is downloaded and the espeak dependencies are initialized.

  3. Model cached at: ~/.cache/huggingface/hub/models--mlx-community--Kokoro-82M-bf16/

Integration with Morning Briefing

The morning-briefing skill uses this for podcast generation:

uv run --with mlx-audio --with pydub skills/local-tts/scripts/generate_audio.py \
    --file /tmp/morning_briefing_podcast.txt \
    --voice af_heart \
    --output ~/Desktop/morning_briefing.mp3
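
From Python, the same invocation could be assembled as an argument list. This is a sketch under assumptions: `build_tts_command` is a hypothetical helper, and how the morning-briefing skill actually launches the script is not shown in this document.

```python
import os
import shlex

SCRIPT = "skills/local-tts/scripts/generate_audio.py"


def build_tts_command(script_file: str, output: str, voice: str = "af_heart") -> list[str]:
    """Build the uv command shown above as an argv list."""
    return [
        "uv", "run", "--with", "mlx-audio", "--with", "pydub", SCRIPT,
        "--file", script_file,
        "--voice", voice,
        # No shell is involved when running an argv list, so expand ~ here.
        "--output", os.path.expanduser(output),
    ]


command = build_tts_command(
    "/tmp/morning_briefing_podcast.txt", "~/Desktop/morning_briefing.mp3"
)
print(shlex.join(command))
# To run it: import subprocess; subprocess.run(command, check=True)
```

Keeping the dependencies as `--with` arguments follows the first item under Important Notes.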
