Text-to-Speech
Convert text to natural speech via inference.sh CLI.
Quick Start
Requires inference.sh CLI (infsh ). Get installation instructions: npx skills add inference-sh/skills@agent-tools
infsh login
Generate speech
infsh app run infsh/kokoro-tts --input '{"text": "Hello, welcome to our product demo."}'
Available Models
Model App ID Best For
DIA TTS infsh/dia-tts
Conversational, expressive
Kokoro TTS infsh/kokoro-tts
Fast, natural
Chatterbox infsh/chatterbox
General purpose
Higgs Audio infsh/higgs-audio
Emotional control
VibeVoice infsh/vibevoice
Podcasts, long-form
Browse All Audio Apps
infsh app list --category audio
Examples
Basic Text-to-Speech
infsh app run infsh/kokoro-tts --input '{"text": "Welcome to our tutorial."}'
Conversational TTS with DIA
infsh app sample infsh/dia-tts --save input.json
Edit input.json:
{
"text": "Hey! How are you doing today? I'm really excited to share this with you.",
"voice": "conversational"
}
infsh app run infsh/dia-tts --input input.json
Long-form Audio (Podcasts)
infsh app sample infsh/vibevoice --save input.json
Edit input.json with your podcast script
infsh app run infsh/vibevoice --input input.json
Expressive Speech with Higgs
infsh app sample infsh/higgs-audio --save input.json
{
"text": "This is absolutely incredible!",
"emotion": "excited"
}
infsh app run infsh/higgs-audio --input input.json
Use Cases
-
Voiceovers: Product demos, explainer videos
-
Audiobooks: Convert text to spoken word
-
Podcasts: Generate podcast episodes
-
Accessibility: Make content accessible
-
IVR: Phone system voice prompts
-
Video Narration: Add narration to videos
Combine with Video
Generate speech, then create a talking head video:
1. Generate speech
infsh app run infsh/kokoro-tts --input '{"text": "Your script here"}' > speech.json
2. Use the audio URL with OmniHuman for avatar video
infsh app run bytedance/omnihuman-1-5 --input '{ "image_url": "https://portrait.jpg", "audio_url": "<audio-url-from-step-1>" }'
Related Skills
Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@agent-tools
AI avatars (combine TTS with talking heads)
npx skills add inference-sh/skills@ai-avatar-video
AI music generation
npx skills add inference-sh/skills@ai-music-generation
Speech-to-text (transcription)
npx skills add inference-sh/skills@speech-to-text
Video generation
npx skills add inference-sh/skills@ai-video-generation
Browse all apps: infsh app list
Documentation
-
Running Apps - How to run apps via CLI
-
Audio Transcription Example - Audio processing workflows
-
Apps Overview - Understanding the app ecosystem