elevenlabs-speech

Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "elevenlabs-speech" with this command: npx skills add jeffpignataro/miranda-elevenlabs-speech

ElevenLabs Speech

Complete voice solution — both TTS and STT using one API:

  • TTS: Text-to-Speech (high-quality voices)
  • STT: Speech-to-Text via Scribe (accurate transcription)

Quick Start

Environment Setup

Set your API key:

export ELEVENLABS_API_KEY="sk_..."

Or create a .env file in the workspace root that defines ELEVENLABS_API_KEY.
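As a rough sketch of how the key could be resolved in Python: the snippet below checks the environment first and falls back to a .env file. The helpers `parse_env_text` and `load_api_key` are hypothetical and are not part of the skill's scripts; how the bundled scripts actually read the key is not shown here.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Tolerate shell-style lines such as: export NAME="value"
        if key.startswith("export "):
            key = key[len("export "):].strip()
        values[key] = value.strip().strip("'\"")
    return values


def load_api_key(env_file: str = ".env") -> str:
    """Return the key from the environment, else from the .env file."""
    key = os.environ.get("ELEVENLABS_API_KEY")
    if key:
        return key
    path = Path(env_file)
    if path.is_file():
        parsed = parse_env_text(path.read_text(encoding="utf-8"))
        key = parsed.get("ELEVENLABS_API_KEY")
    if not key:
        raise RuntimeError("ELEVENLABS_API_KEY is not set")
    return key
```

Failing early with a clear error tends to be easier to debug than an authentication failure from the API later on.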

Text-to-Speech (TTS)

Convert text to natural-sounding speech:

python scripts/elevenlabs_speech.py tts -t "Hello world" -o greeting.mp3

With custom voice:

python scripts/elevenlabs_speech.py tts -t "Hello" -v "voice_id_here" -o output.mp3

List Available Voices

python scripts/elevenlabs_speech.py voices

Using in Code

from scripts.elevenlabs_speech import ElevenLabsClient

client = ElevenLabsClient(api_key="sk_...")

# Basic TTS
result = client.text_to_speech(
    text="Hello from zerox",
    output_path="greeting.mp3"
)

# With custom settings
result = client.text_to_speech(
    text="Your text here",
    voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel
    stability=0.5,
    similarity_boost=0.75,
    output_path="output.mp3"
)

# Get available voices
voices = client.get_voices()
for voice in voices['voices']:
    print(f"{voice['name']}: {voice['voice_id']}")

Popular Voices

Voice ID              Name    Description
21m00Tcm4TlvDq8ikWAM  Rachel  Natural, versatile (default)
AZnzlk1XvdvUeBnXmlld  Domi    Strong, energetic
EXAVITQu4vr4xnSDxMaL  Bella   Soft, soothing
ErXwobaYiN019PkySvjV  Antoni  Well-rounded
MF3mGyEYCl7XYWbV9V6O  Elli    Warm, friendly
TxGEqnHWrfWFTfGW9XjX  Josh    Deep, calm
VR6AewLTigWG4xSOukaG  Arnold  Authoritative
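If you would rather refer to these voices by name in your own code, a small lookup can translate names to IDs. This is only a sketch: `POPULAR_VOICES` and `resolve_voice` are hypothetical names, and the IDs are copied from the table above.

```python
# Voice names mapped to the IDs listed in the table above.
POPULAR_VOICES = {
    "rachel": "21m00Tcm4TlvDq8ikWAM",
    "domi": "AZnzlk1XvdvUeBnXmlld",
    "bella": "EXAVITQu4vr4xnSDxMaL",
    "antoni": "ErXwobaYiN019PkySvjV",
    "elli": "MF3mGyEYCl7XYWbV9V6O",
    "josh": "TxGEqnHWrfWFTfGW9XjX",
    "arnold": "VR6AewLTigWG4xSOukaG",
}


def resolve_voice(name_or_id: str) -> str:
    """Map a known voice name to its ID; treat anything else as an ID."""
    return POPULAR_VOICES.get(name_or_id.strip().lower(), name_or_id)
```

The result can then be passed as `voice_id`, for example `client.text_to_speech(text="Hello", voice_id=resolve_voice("Josh"), output_path="output.mp3")`.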

Voice Settings

  • stability (0-1): lower values give more expressive, variable delivery; higher values give more consistent delivery
  • similarity_boost (0-1): higher values keep the output closer to the original voice

Default: stability=0.5, similarity_boost=0.75
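To illustrate where these settings end up, here is a sketch of the request that a wrapper like `text_to_speech` would presumably send to the ElevenLabs text-to-speech REST endpoint. `build_tts_request` is a hypothetical helper, and the assumption that the skill's client builds its request this way is not confirmed by the source. The function only assembles the request and does not send it.

```python
def build_tts_request(text, api_key,
                      voice_id="21m00Tcm4TlvDq8ikWAM",
                      model_id="eleven_turbo_v2_5",
                      stability=0.5, similarity_boost=0.75):
    """Assemble URL, headers and JSON body for a text-to-speech call."""
    # Both settings are documented above as values between 0 and 1.
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return {
        "url": f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        "headers": {"xi-api-key": api_key,
                    "Content-Type": "application/json"},
        "json": {
            "text": text,
            "model_id": model_id,
            "voice_settings": {"stability": stability,
                               "similarity_boost": similarity_boost},
        },
    }
```

Validating the range before the call turns an out-of-range value into a local error rather than a rejected API request.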

Models

  • eleven_turbo_v2_5 - Fast, high quality (default)
  • eleven_multilingual_v2 - Best for non-English
  • eleven_monolingual_v1 - English only
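One possible way to apply the list above is to pick a model from the language of the text. The sketch below is only an illustration of that rule; `choose_model` is a hypothetical helper, and how a model ID is passed to `text_to_speech` is not shown in the source.

```python
def choose_model(language_code: str = "en") -> str:
    """Pick a model ID from the list above based on a language code."""
    # Accept tags such as "en", "eng" or "en-US".
    base = language_code.lower().split("-")[0]
    if base in ("en", "eng"):
        return "eleven_turbo_v2_5"  # fast default
    return "eleven_multilingual_v2"  # listed as best for non-English
```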

Integration with Telegram

When the user sends text and wants a voice reply:

# Generate speech
result = client.text_to_speech(text=user_text, output_path="reply.mp3")

# Send via Telegram message tool with media path
message(action="send", media="reply.mp3", as_voice=True)

Pricing

Check https://elevenlabs.io/pricing for current rates. Free tier available!

Speech-to-Text (STT) with ElevenLabs Scribe

Transcribe voice messages using ElevenLabs Scribe:

Transcribe Audio

python scripts/elevenlabs_scribe.py voice_message.ogg

With specific language:

python scripts/elevenlabs_scribe.py voice_message.ogg --language ara

With speaker diarization (multiple speakers):

python scripts/elevenlabs_scribe.py voice_message.ogg --speakers 2

Using in Code

from scripts.elevenlabs_scribe import ElevenLabsScribe

client = ElevenLabsScribe(api_key="sk_...")

# Basic transcription
result = client.transcribe("voice_message.ogg")
print(result['text'])

# With language hint (improves accuracy)
result = client.transcribe("voice_message.ogg", language_code="ara")

# With speaker detection
result = client.transcribe("voice_message.ogg", num_speakers=2)
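With speaker detection enabled, the transcript can be split into per-speaker turns. The sketch below assumes that `result` carries the raw Scribe response, with a `words` list whose entries have `text` and `speaker_id` fields; whether the skill's wrapper returns it in this shape is an assumption. `group_by_speaker` is a hypothetical helper.

```python
def group_by_speaker(words):
    """Merge consecutive word entries from the same speaker into turns."""
    turns = []
    for word in words:
        speaker = word.get("speaker_id")
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(word["text"])
        else:
            turns.append([speaker, [word["text"]]])
    # Join the pieces as-is: spacing is assumed to arrive as its own entries.
    return [(speaker, "".join(parts).strip()) for speaker, parts in turns]


# Hand-written sample in the assumed shape (not real API output).
sample_words = [
    {"text": "Hello", "speaker_id": "speaker_0"},
    {"text": " ", "speaker_id": "speaker_0"},
    {"text": "there", "speaker_id": "speaker_0"},
    {"text": "Hi", "speaker_id": "speaker_1"},
]
for speaker, text in group_by_speaker(sample_words):
    print(f"{speaker}: {text}")
```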

Supported Formats

  • mp3, mp4, mpeg, mpga, m4a, wav, webm
  • Max file size: 100 MB
  • Telegram voice messages (.ogg) are also accepted
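A pre-flight check can catch unsupported or oversized files before anything is uploaded. This is a minimal sketch based only on the limits listed above; `check_audio_file` is a hypothetical helper, and `.ogg` is included because Telegram voice messages are said to work.

```python
from pathlib import Path

# Extensions from the list above, plus .ogg for Telegram voice messages.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga",
                        ".m4a", ".wav", ".webm", ".ogg"}
MAX_BYTES = 100 * 1024 * 1024  # 100 MB, as stated above


def check_audio_file(path: str) -> None:
    """Raise if the file is missing, has an unlisted extension, or is too large."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(path)
    if file.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {file.suffix or '(none)'}")
    if file.stat().st_size > MAX_BYTES:
        raise ValueError("file is larger than 100 MB")
```

Calling `check_audio_file("voice_message.ogg")` before `client.transcribe(...)` keeps these failures local.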

Language Support

Scribe supports 99 languages including:

  • Arabic (ara)
  • English (eng)
  • Spanish (spa)
  • French (fra)
  • And many more...

Without a language hint, Scribe auto-detects the language.
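The examples in this section use three-letter language codes. If your application stores two-letter codes, a small mapping can convert them before calling `transcribe`. The sketch below covers only the four languages listed above; `to_scribe_language` is a hypothetical helper that returns `None` for anything it does not know, so that auto-detection can take over.

```python
# Two-letter to three-letter codes for the languages listed above.
ISO_639_1_TO_3 = {"ar": "ara", "en": "eng", "es": "spa", "fr": "fra"}


def to_scribe_language(code):
    """Return a three-letter code, or None to fall back to auto-detection."""
    if not code:
        return None
    base = code.lower().split("-")[0]  # "es-ES" -> "es"
    if len(base) == 3:
        return base  # already a three-letter code
    return ISO_639_1_TO_3.get(base)
```

When the helper returns `None`, call `client.transcribe(path)` without `language_code`.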

Complete Workflow Example

User sends voice message → You reply with voice:

from scripts.elevenlabs_scribe import ElevenLabsScribe
from scripts.elevenlabs_speech import ElevenLabsClient

# 1. Transcribe user's voice message
# No key is passed here; this assumes ELEVENLABS_API_KEY is set (see Environment Setup)
stt = ElevenLabsScribe()
transcription = stt.transcribe("user_voice.ogg")
user_text = transcription['text']

# 2. Process/understand the text
# ... your logic here ...

# 3. Generate response text
response_text = "Your response here"

# 4. Convert to speech
tts = ElevenLabsClient()
tts.text_to_speech(response_text, output_path="reply.mp3")

# 5. Send voice reply
message(action="send", media="reply.mp3", as_voice=True)
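The same steps can be wrapped in one function that receives its dependencies as arguments, which makes the flow testable without network access. This is a sketch under stated assumptions: `voice_reply` and `respond` are hypothetical names, `transcribe` is expected to return a dict with a `text` key, and `synthesize` is expected to accept `output_path`, as in the examples above.

```python
def voice_reply(audio_path, transcribe, respond, synthesize,
                output_path="reply.mp3"):
    """Transcribe audio, build a reply, synthesize it, and return the path."""
    user_text = transcribe(audio_path)["text"].strip()
    if not user_text:
        return None  # nothing intelligible was said; skip the reply
    synthesize(respond(user_text), output_path=output_path)
    return output_path


# Stand-ins for the real clients, for illustration only.
def fake_transcribe(path):
    return {"text": " What time is it? "}


def fake_respond(text):
    return f"You asked: {text}"


spoken = []


def fake_synthesize(text, output_path):
    spoken.append((text, output_path))


reply_path = voice_reply("user_voice.ogg", fake_transcribe,
                         fake_respond, fake_synthesize)
```

With the real clients this would be called as `voice_reply("user_voice.ogg", stt.transcribe, your_logic, tts.text_to_speech)`, followed by the `message(...)` call to send the file.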

Pricing

Check https://elevenlabs.io/pricing for current rates:

TTS (Text-to-Speech):

  • Free tier: 10,000 characters/month
  • Paid plans available

STT (Speech-to-Text) - Scribe:

  • Free tier available
  • Check website for current pricing
