gemini-tts

Generate speech from text using Google Gemini TTS models via scripts/. Use for text-to-speech, audio generation, voice synthesis, multi-speaker conversations, and creating audio content. Supports multiple voices and streaming. Triggers on "text to speech", "TTS", "generate audio", "voice synthesis", "speak this text".

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "gemini-tts" with this command: npx skills add akrindev/google-studio-skills/akrindev-google-studio-skills-gemini-tts

Gemini Text-to-Speech

Generate natural-sounding speech from text using Gemini's TTS models through executable scripts with support for multiple voices and multi-speaker conversations.

When to Use This Skill

Use this skill when you need to:

  • Convert text to natural speech
  • Create audio for podcasts, audiobooks, or videos
  • Generate multi-speaker conversations
  • Stream audio for long content
  • Choose from multiple voice options
  • Create accessible audio content
  • Generate voiceovers for presentations
  • Batch convert text to audio files

Available Scripts

scripts/tts.js

Purpose: Convert text to speech using Gemini TTS models

When to use:

  • Any text-to-speech conversion
  • Multi-speaker conversation generation
  • Streaming audio for long texts
  • Voiceovers for content creation
  • Accessible audio generation

Key parameters:

| Parameter | Description | Example |
|---|---|---|
| text | Text to convert (required) | "Hello, world!" |
| --voice, -v | Voice name | Kore |
| --output, -o | Base name for output file | welcome |
| --output-dir | Output directory for audio | audio/ |
| --no-timestamp | Disable auto timestamp | (flag) |
| --model, -m | TTS model | gemini-2.5-flash-preview-tts |
| --stream, -s | Enable streaming | (flag) |
| --speakers | Multi-speaker mapping | "Joe:Kore,Jane:Puck" |

Output: WAV audio file path
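A minimal sketch of how flags like these could be parsed with Node's built-in util.parseArgs (Node 18.3 or later). The helper name parseTtsArgs is hypothetical, and the defaults (Kore, tts_output, audio) are taken from the workflows below; the default model is a guess, since the script's actual default is not documented here.

```javascript
const { parseArgs } = require("node:util");

// Hypothetical parser mirroring the flag table above; not the code in scripts/tts.js.
function parseTtsArgs(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true, // the text to speak is a positional argument
    options: {
      voice: { type: "string", short: "v" },
      output: { type: "string", short: "o" },
      "output-dir": { type: "string" },
      "no-timestamp": { type: "boolean" },
      model: { type: "string", short: "m" },
      stream: { type: "boolean", short: "s" },
      speakers: { type: "string" },
    },
  });
  if (positionals.length === 0) {
    throw new Error("text is required");
  }
  // Defaults are assumptions drawn from the documented workflows.
  return {
    text: positionals.join(" "),
    voice: values.voice ?? "Kore",
    output: values.output ?? "tts_output",
    outputDir: values["output-dir"] ?? "audio",
    timestamp: !values["no-timestamp"],
    model: values.model ?? "gemini-2.5-flash-preview-tts", // assumed default
    stream: Boolean(values.stream),
    speakers: values.speakers ?? null,
  };
}
```

Calling parseTtsArgs(process.argv.slice(2)) would return a plain options object. Because parseArgs runs in strict mode by default, a misspelled flag throws instead of being silently ignored.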

Workflows

Workflow 1: Basic Text-to-Speech

node scripts/tts.js "Hello, world! Have a wonderful day."
  • Best for: Quick audio generation, simple messages
  • Voice: Kore (default, clear and professional)
  • Output: audio/tts_output_YYYYMMDD_HHMMSS.wav (auto timestamp)
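The timestamped name above can be reproduced with a small path builder. This is a hypothetical helper (outputPath is not part of the script) that assumes local time and the audio/ default directory; it also covers the --output-dir and --no-timestamp cases. Stripping a trailing .wav from the base name is a choice made in this sketch, not documented script behavior.

```javascript
const path = require("node:path");

// Hypothetical: build "<dir>/<base>_YYYYMMDD_HHMMSS.wav", or "<dir>/<base>.wav" without a timestamp.
function outputPath({ base = "tts_output", dir = "audio", timestamp = true, now = new Date() } = {}) {
  const pad = (n) => String(n).padStart(2, "0");
  const stem = base.replace(/\.wav$/i, ""); // avoid "name.wav_2025....wav"
  // Local-time stamp; whether the real script uses local time or UTC is an assumption.
  const stamp =
    `${now.getFullYear()}${pad(now.getMonth() + 1)}${pad(now.getDate())}_` +
    `${pad(now.getHours())}${pad(now.getMinutes())}${pad(now.getSeconds())}`;
  const name = timestamp ? `${stem}_${stamp}.wav` : `${stem}.wav`;
  return path.join(dir, name);
}
```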

Workflow 2: Choose Different Voice

node scripts/tts.js "Welcome to our podcast about technology trends" --voice Puck --output welcome
  • Best for: Friendly, conversational content
  • Voice options: Kore, Puck, Charon, Fenrir, Aoede, Zephyr, Sulafat
  • Output: audio/welcome_YYYYMMDD_HHMMSS.wav

Workflow 3: Multi-Speaker Conversation

node scripts/tts.js "TTS the following conversation:
Joe: How's it going today?
Jane: Not too bad, how about you?
Joe: I'm working on a new project.
Jane: Sounds exciting, tell me more!" --speakers "Joe:Kore,Jane:Puck" --output conversation
  • Best for: Dialogues, interviews, role-playing content
  • Format: Marked conversation with speaker names
  • Script automatically routes text to appropriate voices
  • Output: audio/conversation_YYYYMMDD_HHMMSS.wav
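For reference, a sketch of the request config a multi-speaker call might use. The field names (responseModalities, speechConfig, voiceConfig, multiSpeakerVoiceConfig, speakerVoiceConfigs, prebuiltVoiceConfig) follow the JavaScript examples in Google's Gemini API speech-generation documentation, where the object is passed as config to ai.models.generateContent from @google/genai. The helper buildSpeechConfig is hypothetical. Speaker names must match the labels used in the text ("Joe:", "Jane:"), and the two-speaker cap reflects the documented limit of multi-speaker mode.

```javascript
// Hypothetical helper: single-voice or multi-speaker config for a Gemini TTS request.
function buildSpeechConfig({ voice = "Kore", speakers = [] } = {}) {
  const prebuilt = (voiceName) => ({ prebuiltVoiceConfig: { voiceName } });
  if (speakers.length === 0) {
    // One voice for the whole text.
    return { responseModalities: ["AUDIO"], speechConfig: { voiceConfig: prebuilt(voice) } };
  }
  if (speakers.length > 2) {
    throw new Error("multi-speaker mode supports at most two speakers");
  }
  // One entry per speaker label that appears in the prompt text.
  return {
    responseModalities: ["AUDIO"],
    speechConfig: {
      multiSpeakerVoiceConfig: {
        speakerVoiceConfigs: speakers.map(({ speaker, voice: v }) => ({
          speaker,
          voiceConfig: prebuilt(v),
        })),
      },
    },
  };
}
```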

Workflow 4: Long Content with Streaming

node scripts/tts.js "This is a very long text that would benefit from streaming..." --stream --output long-form
  • Best for: Podcasts, audiobooks, long articles
  • Streaming: Processes audio in chunks for long texts
  • Output: audio/long-form_YYYYMMDD_HHMMSS.wav
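A sketch of how streamed chunks could be gathered into one PCM buffer before writing the WAV file. It assumes the stream is an async iterable of responses (as returned by ai.models.generateContentStream in @google/genai) and that each response carries base64 audio at candidates[0].content.parts[].inlineData.data, the same place non-streaming responses put it. collectPcm is a hypothetical name.

```javascript
// Hypothetical: decode and concatenate base64 PCM from each streamed response, in order.
async function collectPcm(stream) {
  const chunks = [];
  for await (const response of stream) {
    const parts = response?.candidates?.[0]?.content?.parts ?? [];
    for (const part of parts) {
      if (part.inlineData?.data) {
        chunks.push(Buffer.from(part.inlineData.data, "base64"));
      }
    }
  }
  // Byte order is preserved, so samples split across chunk boundaries reassemble correctly.
  return Buffer.concat(chunks);
}
```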

Workflow 5: Professional Voiceover

node scripts/tts.js "Welcome to our quarterly earnings presentation. Today we'll discuss our growth metrics and future plans." --voice Charon --output voiceover
  • Best for: Corporate content, presentations, formal announcements
  • Voice: Charon (deep, authoritative)
  • Use when: Professional, serious tone required

Workflow 6: Custom Output Directory

node scripts/tts.js "Save to specific folder." --output-dir ./my-projects/podcasts/ --output episode1
  • Best for: Organized project structures
  • Directory created automatically if it doesn't exist
  • Output: ./my-projects/podcasts/episode1_YYYYMMDD_HHMMSS.wav
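Creating the directory on demand takes one call in Node: fs.mkdirSync with { recursive: true } creates every missing parent and does nothing if the directory already exists. The saveAudio helper below is hypothetical and only illustrates the idea.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical: write audio bytes into dir (creating it if needed) and return the file path.
function saveAudio(dir, fileName, bytes) {
  fs.mkdirSync(dir, { recursive: true }); // no-op when dir already exists
  const filePath = path.join(dir, fileName);
  fs.writeFileSync(filePath, bytes); // overwrites an existing file of the same name
  return filePath;
}
```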

Workflow 7: Content Creation Pipeline (Text → Audio)

# 1. Generate script (gemini-text skill)
node skills/gemini-text/scripts/generate.js "Write a 2-minute podcast intro about sustainable energy"

# 2. Generate audio (this skill)
node scripts/tts.js "[Paste generated script]" --voice Fenrir --output podcast-intro

# 3. Use in video or podcast
  • Best for: Podcasts, audiobooks, video narration
  • Combines with: gemini-text for script generation
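Pasting generated text into a shell command breaks as soon as the text contains quotes or newlines. A sketch that avoids the shell by passing the text as a single argv entry with execFileSync. It assumes generate.js prints only the generated script to stdout; textToAudio and its injectable run parameter are hypothetical names, with run added so the flow can be exercised without calling the API.

```javascript
const { execFileSync } = require("node:child_process");

// Hypothetical pipeline: gemini-text output goes straight into tts.js as one argument.
function textToAudio(prompt, { voice = "Fenrir", output = "podcast-intro", run } = {}) {
  // execFileSync does not spawn a shell, so quotes and newlines need no escaping.
  const exec = run ?? ((cmd, args) => execFileSync(cmd, args, { encoding: "utf8" }));
  const script = exec("node", ["skills/gemini-text/scripts/generate.js", prompt]).trim();
  if (!script) throw new Error("text generation returned nothing");
  return exec("node", ["scripts/tts.js", script, "--voice", voice, "--output", output]);
}
```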

Workflow 8: Accessible Content

node scripts/tts.js "Welcome to our accessible website. This audio describes our main navigation options." --voice Aoede --output accessibility
  • Best for: Web accessibility, screen reader alternatives
  • Voice: Aoede (melodic, pleasant)
  • Use when: Making content accessible to visually impaired users

Workflow 9: Educational Content

node scripts/tts.js "Chapter 1: Introduction to Quantum Computing. Let's explore the fundamental principles..." --voice Zephyr --output chapter1
  • Best for: Educational materials, tutorials, e-learning
  • Voice: Zephyr (light, airy)
  • Combines well with: gemini-text for content generation

Workflow 10: Disable Timestamp

node scripts/tts.js "Fixed filename." --output my-audio --no-timestamp
  • Best for: When you want complete control over filename
  • Output: audio/my-audio.wav (no timestamp)
  • Use when: Generating files for specific naming schemes

Parameters Reference

Model Selection

| Model | Quality | Speed | Best For |
|---|---|---|---|
| gemini-2.5-flash-preview-tts | Good | Fast | General use, high volume |
| gemini-2.5-pro-preview-tts | Higher | Slower | Premium content, voiceovers |

Voice Selection

| Voice | Characteristics | Best For |
|---|---|---|
| Kore | Clear, professional | Announcements, general purpose (default) |
| Puck | Friendly, conversational | Casual content, interviews |
| Charon | Deep, authoritative | Corporate, serious content |
| Fenrir | Warm, expressive | Storytelling, narratives |
| Aoede | Melodic, pleasant | Educational, accessibility |
| Zephyr | Light, airy | Gentle content, tutorials |
| Sulafat | Neutral, balanced | Documentaries, factual content |

Audio Format

| Specification | Value |
|---|---|
| Format | WAV (PCM) |
| Sample rate | 24000 Hz |
| Channels | 1 (mono) |
| Bit depth | 16-bit |
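The Gemini API documentation describes TTS output as raw 16-bit PCM at 24 kHz, so a WAV file needs a 44-byte RIFF header in front of the samples. A sketch of that wrapping with the table's values as defaults; pcmToWav is a hypothetical name, and scripts/tts.js may well use a library for this instead.

```javascript
// Hypothetical: prepend a canonical 44-byte WAV header to raw little-endian PCM.
function pcmToWav(pcm, { sampleRate = 24000, channels = 1, bitDepth = 16 } = {}) {
  const blockAlign = channels * (bitDepth / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;     // bytes per second
  const header = Buffer.alloc(44);
  header.write("RIFF", 0);
  header.writeUInt32LE(36 + pcm.length, 4);     // total file size minus 8 bytes
  header.write("WAVE", 8);
  header.write("fmt ", 12);
  header.writeUInt32LE(16, 16);                 // fmt chunk size for plain PCM
  header.writeUInt16LE(1, 20);                  // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitDepth, 34);
  header.write("data", 36);
  header.writeUInt32LE(pcm.length, 40);         // size of the sample data
  return Buffer.concat([header, pcm]);
}
```

With the defaults, one second of audio is 24000 * 1 * 2 = 48,000 bytes of PCM, so a one-second clip becomes a 48,044-byte file.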

Token Limits

| Limit | Type | Description |
|---|---|---|
| 8,192 | Input | Maximum input text tokens |
| 16,384 | Output | Maximum output audio tokens |

Output Interpretation

Audio File

  • Format: WAV (compatible with most players)
  • Mono channel (single audio track)
  • Sample rate: 24000 Hz (well suited to speech, though below the 44.1/48 kHz common for music and broadcast audio)
  • Can be converted to MP3/AAC if needed
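Because the format is fixed, playback length follows directly from the data size. A sketch that reads the standard header fields, assuming a canonical 44-byte header with the data chunk at offset 36; files carrying extra chunks (such as LIST) would need a proper chunk walker. wavDurationSeconds is a hypothetical name.

```javascript
// Hypothetical: duration in seconds = data bytes / (sample rate * channels * bytes per sample).
function wavDurationSeconds(wav) {
  const channels = wav.readUInt16LE(22);
  const sampleRate = wav.readUInt32LE(24);
  const bitDepth = wav.readUInt16LE(34);
  const dataBytes = wav.readUInt32LE(40); // assumes the data chunk header sits at offset 36
  return dataBytes / (sampleRate * channels * (bitDepth / 8));
}
```

For example, wavDurationSeconds(fs.readFileSync("audio/welcome.wav")) on a 24000 Hz mono 16-bit file divides the data size by 48,000.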

Multi-Speaker Files

  • Single WAV file with multiple voices
  • Speakers take turns within the single mono track rather than occupying separate channels
  • Use --speakers parameter to map speakers to voices

Streaming Output

  • Audio processed in chunks during generation
  • Script shows "Streaming audio..." message
  • Useful for very long texts or real-time applications

Common Issues

"google-genai not installed"

npm install @google/genai@latest dotenv@latest

"Voice name not found"

  • Check voice name spelling
  • Use available voices: Kore, Puck, Charon, Fenrir, Aoede, Zephyr, Sulafat
  • Voice names are case-sensitive
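A sketch of a check that turns a wrong-case voice name into a helpful hint. The list holds only the seven voices this skill documents; the Gemini API documentation lists additional prebuilt voices, so the list would need widening if the script accepts them. checkVoice is a hypothetical name.

```javascript
// Voices documented in this skill; an assumption about what the script accepts.
const VOICES = ["Kore", "Puck", "Charon", "Fenrir", "Aoede", "Zephyr", "Sulafat"];

// Hypothetical: return the name if valid, otherwise throw with a case-insensitive suggestion.
function checkVoice(name) {
  if (VOICES.includes(name)) return name;
  const match = VOICES.find((v) => v.toLowerCase() === String(name).toLowerCase());
  const hint = match ? ` Did you mean "${match}"? Voice names are case-sensitive.` : "";
  throw new Error(`Voice name not found: "${name}".${hint} Available: ${VOICES.join(", ")}`);
}
```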

"No audio generated"

  • Check text is not empty
  • Verify text doesn't exceed token limit (8,192)
  • Try shorter text segments
  • Check API quota limits

"Multi-speaker format error"

  • Format: SpeakerName:VoiceName,Speaker2:Voice2
  • Separate speakers with commas
  • Use colon between speaker and voice
  • Example: "Joe:Kore,Jane:Puck" (Gemini's multi-speaker mode is documented as supporting up to two speakers)
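A sketch of a parser for the SpeakerName:VoiceName format, reporting the offending entry instead of failing silently. parseSpeakers is a hypothetical name; splitting on the first colon and trimming spaces around names are choices made here, not documented script behavior.

```javascript
// Hypothetical: "Joe:Kore,Jane:Puck" -> [{ speaker: "Joe", voice: "Kore" }, ...]
function parseSpeakers(spec) {
  const entries = spec.split(",").map((entry) => entry.trim()).filter(Boolean);
  return entries.map((entry) => {
    const idx = entry.indexOf(":");
    // Reject a missing colon, an empty speaker name, or an empty voice name.
    if (idx <= 0 || idx === entry.length - 1) {
      throw new Error(`Expected SpeakerName:VoiceName, got "${entry}"`);
    }
    return { speaker: entry.slice(0, idx).trim(), voice: entry.slice(idx + 1).trim() };
  });
}
```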

"Output file already exists"

  • Script will overwrite existing files
  • Change --output filename to avoid conflicts
  • Use unique names for batch generation

Audio quality issues

  • Check input text for unusual characters
  • Try different voice for better pronunciation
  • Consider splitting long text into smaller segments
  • Verify audio playback software compatibility

Best Practices

Voice Selection

  • Kore: General purpose, clear articulation
  • Puck: Conversational, engaging tone
  • Charon: Professional, authoritative
  • Fenrir: Emotional, storytelling
  • Aoede: Soft, gentle for accessibility
  • Zephyr: Educational, clear explanations
  • Sulafat: Neutral, factual narration

Text Preparation

  • Use natural language and punctuation
  • Include pauses with commas and periods
  • Spell out difficult words if needed
  • Break very long text into logical segments
  • Add speaker labels for multi-speaker content
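A sketch of splitting long text at sentence boundaries so each segment stays under a rough token budget. It uses Google's rule of thumb of about 4 characters per token, which is only an estimate; exact counts need the API's token counting, and a smaller budget than the 8,192-token input limit leaves headroom. A single sentence longer than the budget still comes out as one oversized segment. splitForTts is a hypothetical name.

```javascript
// Hypothetical: greedy packing of whole sentences into segments of at most ~maxTokens.
function splitForTts(text, maxTokens = 8192, charsPerToken = 4) {
  const maxChars = maxTokens * charsPerToken; // character budget from the token estimate
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [];
  const segments = [];
  let current = "";
  for (const sentence of sentences) {
    // Start a new segment when adding this sentence would exceed the budget.
    if (current && current.length + sentence.length > maxChars) {
      segments.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim()) segments.push(current.trim());
  return segments;
}
```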

Performance Optimization

  • Use streaming for very long texts
  • Generate shorter segments for better control
  • Use flash model for faster generation
  • Batch process multiple files for efficiency
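A sketch of batch processing with a small concurrency cap, so a batch makes progress without flooding the API's rate limits. synthesize is a placeholder for whatever produces one file (for example, spawning scripts/tts.js) and must return a Promise; runBatch and the default limit of 2 are assumptions, not part of the skill.

```javascript
// Hypothetical: run jobs with at most `limit` in flight; results keep the input order.
async function runBatch(jobs, synthesize, limit = 2) {
  const results = new Array(jobs.length);
  let next = 0;
  async function worker() {
    // Each worker claims the next unclaimed index; safe because JS runs one callback at a time.
    while (next < jobs.length) {
      const i = next++;
      results[i] = await synthesize(jobs[i]);
    }
  }
  const workers = Array.from({ length: Math.min(limit, jobs.length) }, worker);
  await Promise.all(workers);
  return results;
}
```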

Quality Tips

  • Test different voices for your content type
  • Use appropriate pacing with punctuation
  • Consider context when selecting voice
  • Listen to output before final use
  • Multi-speaker requires clear speaker labeling

Use Cases by Voice

| Voice | Ideal Use Cases |
|---|---|
| Kore | Announcements, navigation, general info |
| Puck | Podcasts, interviews, casual content |
| Charon | Corporate, news, formal presentations |
| Fenrir | Audiobooks, stories, emotional content |
| Aoede | Accessibility, educational, gentle content |
| Zephyr | Tutorials, explanations, guides |
| Sulafat | Documentaries, factual presentations |

Related Skills

  • gemini-text: Generate scripts and text for TTS
  • gemini-image: Create visuals to accompany audio
  • gemini-batch: Process multiple TTS requests efficiently
  • gemini-files: Upload audio files for processing

Quick Reference

# Basic
node scripts/tts.js "Your text here"

# Custom voice
node scripts/tts.js "Your text" --voice Puck --output greeting

# Multi-speaker
node scripts/tts.js "Joe: Hi. Jane: Hello!" --speakers "Joe:Kore,Jane:Puck"

# Streaming
node scripts/tts.js "Long text..." --stream --output long-form

# Professional
node scripts/tts.js "Corporate announcement" --voice Charon

Reference

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
