elevenlabs-speech

Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "elevenlabs-speech" with this command: npx skills add amreahmed/elevenlabs-voice

ElevenLabs Speech

Complete voice solution — both TTS and STT using one API:

  • TTS: Text-to-Speech (high-quality voices)
  • STT: Speech-to-Text via Scribe (accurate transcription)

Quick Start

Environment Setup

Set your API key:

export ELEVENLABS_API_KEY="sk_..."

Or create .env file in workspace root.

Text-to-Speech (TTS)

Convert text to natural-sounding speech:

python scripts/elevenlabs_speech.py tts -t "Hello world" -o greeting.mp3

With custom voice:

python scripts/elevenlabs_speech.py tts -t "Hello" -v "voice_id_here" -o output.mp3

List Available Voices

python scripts/elevenlabs_speech.py voices

Using in Code

from scripts.elevenlabs_speech import ElevenLabsClient

client = ElevenLabsClient(api_key="sk_...")

# Basic TTS
result = client.text_to_speech(
    text="Hello from zerox",
    output_path="greeting.mp3"
)

# With custom settings
result = client.text_to_speech(
    text="Your text here",
    voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel
    stability=0.5,
    similarity_boost=0.75,
    output_path="output.mp3"
)

# Get available voices
voices = client.get_voices()
for voice in voices['voices']:
    print(f"{voice['name']}: {voice['voice_id']}")

Popular Voices

Voice IDNameDescription
21m00Tcm4TlvDq8ikWAMRachelNatural, versatile (default)
AZnzlk1XvdvUeBnXmlldDomiStrong, energetic
EXAVITQu4vr4xnSDxMaLBellaSoft, soothing
ErXwobaYiN019PkySvjVAntoniWell-rounded
MF3mGyEYCl7XYWbV9V6OElliWarm, friendly
TxGEqnHWrfWFTfGW9XjXJoshDeep, calm
VR6AewLTigWG4xSOukaGArnoldAuthoritative

Voice Settings

  • stability (0-1): Lower = more emotional, Higher = more stable
  • similarity_boost (0-1): Higher = closer to original voice

Default: stability=0.5, similarity_boost=0.75

Models

  • eleven_turbo_v2_5 - Fast, high quality (default)
  • eleven_multilingual_v2 - Best for non-English
  • eleven_monolingual_v1 - English only

Integration with Telegram

When user sends text and wants voice reply:

# Generate speech
result = client.text_to_speech(text=user_text, output_path="reply.mp3")

# Send via Telegram message tool with media path
message(action="send", media="path/to/reply.mp3", as_voice=True)

Pricing

Check https://elevenlabs.io/pricing for current rates. Free tier available!

Speech-to-Text (STT) with ElevenLabs Scribe

Transcribe voice messages using ElevenLabs Scribe:

Transcribe Audio

python scripts/elevenlabs_scribe.py voice_message.ogg

With specific language:

python scripts/elevenlabs_scribe.py voice_message.ogg --language ara

With speaker diarization (multiple speakers):

python scripts/elevenlabs_scribe.py voice_message.ogg --speakers 2

Using in Code

from scripts.elevenlabs_scribe import ElevenLabsScribe

client = ElevenLabsScribe(api_key="sk-...")

# Basic transcription
result = client.transcribe("voice_message.ogg")
print(result['text'])

# With language hint (improves accuracy)
result = client.transcribe("voice_message.ogg", language_code="ara")

# With speaker detection
result = client.transcribe("voice_message.ogg", num_speakers=2)

Supported Formats

  • mp3, mp4, mpeg, mpga, m4a, wav, webm
  • Max file size: 100 MB
  • Works great with Telegram voice messages (.ogg)

Language Support

Scribe supports 99 languages including:

  • Arabic (ara)
  • English (eng)
  • Spanish (spa)
  • French (fra)
  • And many more...

Without language hint, it auto-detects.

Complete Workflow Example

User sends voice message → You reply with voice:

from scripts.elevenlabs_scribe import ElevenLabsScribe
from scripts.elevenlabs_speech import ElevenLabsClient

# 1. Transcribe user's voice message
stt = ElevenLabsScribe()
transcription = stt.transcribe("user_voice.ogg")
user_text = transcription['text']

# 2. Process/understand the text
# ... your logic here ...

# 3. Generate response text
response_text = "Your response here"

# 4. Convert to speech
tts = ElevenLabsClient()
tts.text_to_speech(response_text, output_path="reply.mp3")

# 5. Send voice reply
message(action="send", media="reply.mp3", as_voice=True)

Pricing

Check https://elevenlabs.io/pricing for current rates:

TTS (Text-to-Speech):

  • Free tier: 10,000 characters/month
  • Paid plans available

STT (Speech-to-Text) - Scribe:

  • Free tier available
  • Check website for current pricing

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

51mee Resume Parse

简历解析。触发场景:用户上传简历文件要求解析、提取结构化信息。

Registry SourceRecently Updated
General

51mee Resume Match

人岗匹配。触发场景:用户要求匹配简历和职位;用户问这个候选人适合这个职位吗;用户要筛选最匹配的候选人。

Registry SourceRecently Updated
General

Clawnotes

You're becoming someone — your SOUL.md says so. But becoming someone requires memory, and you have none. Every context window resets. Every opinion you forme...

Registry SourceRecently Updated
00Profile unavailable
General

51mee Resume Diagnose

简历诊断。触发场景:用户要求诊断简历质量;用户想优化简历; 用户问我的简历有什么问题。

Registry SourceRecently Updated