text-to-speech

Text-to-Speech: Generate audio from text with Gemini

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "text-to-speech" with this command: npx skills add superconductor/superconductor-plugin-marketplace/superconductor-superconductor-plugin-marketplace-text-to-speech

Overview

Convert text to natural-sounding speech using Google Gemini's TTS models. Supports:

  • 30 prebuilt voices with distinct characteristics

  • 24 languages with automatic detection

  • Single-speaker and multi-speaker audio

  • Natural intonation and expression

Reference: https://ai.google.dev/gemini-api/docs/speech-generation

How to use

bash ${CLAUDE_PLUGIN_ROOT}/scripts/gemini.sh --model=gemini-2.5-flash-preview-tts "TEXT TO SPEAK"

Arguments:

  • --model (required): TTS model to use (see Models below)

  • --voice (optional): Voice name (default: Kore)

Examples:

Generate speech with default voice

npx -y superconductor-gemini-skills --model=gemini-2.5-flash-preview-tts "Hello, welcome to our application."

Use a specific voice

npx -y superconductor-gemini-skills --model=gemini-2.5-flash-preview-tts --voice=Puck "The quick brown fox jumps over the lazy dog."

Generate longer narration

npx -y superconductor-gemini-skills --model=gemini-2.5-flash-preview-tts --voice=Charon "In today's tutorial, we'll explore the fundamentals of machine learning."

Use the higher-quality model for professional content

npx -y superconductor-gemini-skills --model=gemini-2.5-pro-preview-tts --voice=Kore "This is a premium quality voice synthesis."

Available voices

Voice Name    Description
Kore          Default voice, clear and professional
Puck          Friendly and warm
Charon        Deep and authoritative
Fenrir        Energetic and dynamic
Leda          Soft and gentle
Orus          Neutral and balanced
Zephyr        Light and airy
Aoede         Melodic and expressive

Additional voices: Callirrhoe, Autonoe, Enceladus, Iapetus, Umbriel, Algieba, Despina, Erinome, Algenib, Rasalgethi, Laomedeia, Achernar, Alnilam, Schedar, Gacrux, Pulcherrima, Achird, Zubenelgenubi, Vindemiatrix, Sadachbia, Sadaltager, and Sulafat.
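To compare voices, it can help to render the same sentence once per voice. The following is a minimal sketch, assuming the npx invocation shown in the examples above; the function name sample_voices and the TTS_CMD override are hypothetical and exist only so the loop can be dry-run without calling the API.

```shell
# Render one sentence with several voices, one CLI call per voice.
# TTS_CMD is a hypothetical override (not part of the skill); when unset,
# the documented npx command is used.
sample_voices() {
  text=$1
  shift
  for voice in "$@"; do
    # Left unquoted on purpose so the default expands to separate words.
    ${TTS_CMD:-npx -y superconductor-gemini-skills} \
      --model=gemini-2.5-flash-preview-tts --voice="$voice" "$text" || return 1
  done
}
```

Calling sample_voices "Hello, welcome to our application." Kore Puck Charon would issue three requests; setting TTS_CMD=echo first prints the arguments instead of running them.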

Supported languages

English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Japanese, Korean, Chinese (Simplified/Traditional), Arabic, Hindi, Turkish, Polish, Vietnamese, Thai, Indonesian, and more.

Languages are automatically detected from the input text.

Output

Generated audio is saved to the current directory as gemini-speech-{timestamp}.wav.

  • Format: WAV (PCM)

  • Sample rate: 24000 Hz

  • Channels: Mono

  • Bit depth: 16-bit
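Given the format above, one second of audio occupies 24000 samples × 2 bytes × 1 channel = 48000 bytes, so a clip's length can be estimated from its file size. This is a rough sketch that assumes a canonical 44-byte WAV header (an assumption; the actual header size is not stated here), and wav_duration_seconds is a hypothetical helper name.

```shell
# Estimate the duration of a 24 kHz, 16-bit, mono PCM WAV file from its size.
# Assumes a 44-byte header, which is typical for plain PCM WAV files but is
# not guaranteed for every file.
wav_duration_seconds() {
  bytes=$(wc -c < "$1" | tr -d ' ')
  awk -v b="$bytes" 'BEGIN { printf "%.2f\n", (b - 44) / (24000 * 2 * 1) }'
}
```

For example, wav_duration_seconds gemini-speech-{timestamp}.wav, with the placeholder replaced by a real file name, prints the approximate length in seconds.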

API Key

The GEMINI_API_KEY environment variable must be set. Get your key at: https://ai.google.dev/gemini-api/docs/api-key
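A wrapper script may want to fail early when the key is missing rather than wait for the request to be rejected. A minimal sketch follows; require_gemini_key is a hypothetical helper name, not something the skill provides.

```shell
# Return non-zero with a hint on stderr if GEMINI_API_KEY is unset or empty.
require_gemini_key() {
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is not set; see https://ai.google.dev/gemini-api/docs/api-key" >&2
    return 1
  fi
}
```

A typical use would be require_gemini_key || exit 1 at the top of a script, before any speech is generated.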

Models

Model ID                        Context Window (Input / Output)    Pricing (Input / Output)
gemini-2.5-flash-preview-tts    8k / 16k                           $0.50 / $10 per 1M tokens
gemini-2.5-pro-preview-tts      8k / 16k                           $1.00 / $20 per 1M tokens
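The per-token prices above translate into a simple estimate: cost = (input tokens × input rate + output tokens × output rate) / 1,000,000. The sketch below hard-codes the rates listed in this section, which may change upstream; estimate_tts_cost is a hypothetical helper, and the token counts are whatever the caller supplies.

```shell
# Estimate request cost in USD from token counts, using the rates listed above.
# Usage: estimate_tts_cost MODEL INPUT_TOKENS OUTPUT_TOKENS
estimate_tts_cost() {
  model=$1
  in_tokens=$2
  out_tokens=$3
  case $model in
    gemini-2.5-flash-preview-tts) in_rate=0.50; out_rate=10 ;;
    gemini-2.5-pro-preview-tts)   in_rate=1.00; out_rate=20 ;;
    *) echo "unknown model: $model" >&2; return 1 ;;
  esac
  awk -v i="$in_tokens" -v o="$out_tokens" -v irate="$in_rate" -v orate="$out_rate" \
    'BEGIN { printf "%.4f\n", (i * irate + o * orate) / 1000000 }'
}
```

With 1,000 input tokens and 8,000 output tokens, the flash model works out to (500 + 80,000) / 1,000,000 = $0.0805 and the pro model to $0.1610.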
