text-to-speech

Convert text to natural speech using Sarvam AI's Bulbul v3 model. Handles audio generation, voiceovers, and voice interfaces for 11 Indian languages with 30+ voices. Supports REST, HTTP streaming, WebSocket, and pronunciation dictionaries. Use when generating spoken audio from text.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "text-to-speech" with this command: npx skills add sarvamai/skills/sarvamai-skills-text-to-speech

Text-to-Speech — Bulbul

[!IMPORTANT] Authenticate with the api-subscription-key header, not Authorization: Bearer. Base URL: https://api.sarvam.ai/v1
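For callers that bypass the SDK, the header rule above can be sketched with the standard library. This is a minimal illustration, not the SDK's implementation: the `/text-to-speech` path and the helper name `build_tts_request` are assumptions, and the payload field names are borrowed from the SDK calls in this document. The request is only constructed, never sent.

```python
import json
import urllib.request

BASE_URL = "https://api.sarvam.ai/v1"  # base URL as stated in the note above


def build_tts_request(api_key: str, payload: dict, path: str = "/text-to-speech") -> urllib.request.Request:
    """Build (but do not send) a POST request that carries the key in the
    api-subscription-key header. `path` is an assumed endpoint path."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "api-subscription-key": api_key,  # not "Authorization: Bearer ..."
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_tts_request(
    "YOUR_SARVAM_API_KEY",
    {"text": "Hello", "target_language_code": "en-IN", "model": "bulbul:v3", "speaker": "shubh"},
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen(request)` would send it; confirm the endpoint path against the upstream docs before doing so.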

Model

bulbul:v3 supports 11 Indian languages and 30+ voices (default: shubh) over REST, HTTP streaming, and WebSocket.

Quick Start (Python)

from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI()

# REST (returns base64-encoded audio)
response = client.text_to_speech.convert(
    text="नमस्ते, आप कैसे हैं?",
    target_language_code="hi-IN",
    model="bulbul:v3",
    speaker="shubh"
)
save(response, "output.wav")

# HTTP Stream (lower latency, binary audio)
chunks = []
for chunk in client.text_to_speech.convert_stream(
    text="Hello from Sarvam AI",
    target_language_code="en-IN",
    speaker="shubh",
    model="bulbul:v3"
):
    chunks.append(chunk)
audio = b"".join(chunks)
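The REST response carries base64-encoded audio in `response.audios[0]`, so it can also be decoded by hand instead of calling `save()`. The sketch below shows that step in isolation; `decode_first_audio` is a hypothetical helper name, and the demo uses a stand-in list rather than a real API response.

```python
import base64


def decode_first_audio(audios: list[str]) -> bytes:
    """Decode the first base64-encoded audio entry into raw bytes."""
    if not audios:
        raise ValueError("response contained no audio")
    return base64.b64decode(audios[0])


# Stand-in for response.audios so the example runs without an API call.
fake_audios = [base64.b64encode(b"RIFF").decode("ascii")]
print(decode_first_audio(fake_audios))
```

With a real response, `Path("output.wav").write_bytes(decode_first_audio(response.audios))` (using `pathlib.Path`) would write the file.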

Quick Start (JavaScript/TypeScript)

import { SarvamAIClient } from "sarvamai";
import { writeFile } from "fs/promises";

const client = new SarvamAIClient({ apiSubscriptionKey: "YOUR_SARVAM_API_KEY" });

// REST
const response = await client.textToSpeech.convert({
    text: "नमस्ते, आप कैसे हैं?",
    target_language_code: "hi-IN",
    model: "bulbul:v3",
    speaker: "shubh"
});

// HTTP Stream (lower latency, returns BinaryResponse)
const streamResponse = await client.textToSpeech.convertStream({
    text: "Hello from Sarvam AI",
    target_language_code: "en-IN",
    speaker: "shubh",
    model: "bulbul:v3"
});
const bytes = await streamResponse.bytes();
await writeFile("output.wav", bytes);

WebSocket Streaming

import asyncio
from sarvamai import AsyncSarvamAI

async def tts_stream():
    client = AsyncSarvamAI()
    async with client.text_to_speech_streaming.connect(model="bulbul:v3") as ws:
        await ws.configure(target_language_code="hi-IN", speaker="shubh")
        await ws.convert("Your text here")
        await ws.flush()
        async for message in ws:
            pass  # base64 audio chunks

asyncio.run(tts_stream())
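The loop above discards every message. One way to collect the audio is sketched below, under a loud assumption: that each audio message exposes its base64 string at `message.data.audio`. Check the SDK's message types before relying on that attribute path. `collect_audio` and `fake_socket` are hypothetical names; the fake socket only exists so the sketch runs offline.

```python
import asyncio
import base64
from types import SimpleNamespace


async def collect_audio(messages) -> bytes:
    """Decode and concatenate base64 audio chunks from an async message stream.

    Assumes audio messages expose a base64 string at `message.data.audio`;
    anything without that attribute is skipped.
    """
    audio = bytearray()
    async for message in messages:
        payload = getattr(getattr(message, "data", None), "audio", None)
        if payload:
            audio.extend(base64.b64decode(payload))
    return bytes(audio)


async def fake_socket():
    """Stand-in for `ws` so the sketch runs without a network connection."""
    for raw in (b"abc", b"def"):
        yield SimpleNamespace(data=SimpleNamespace(audio=base64.b64encode(raw).decode()))
    yield SimpleNamespace(type="event")  # non-audio message, ignored


audio = asyncio.run(collect_audio(fake_socket()))
print(len(audio))
```

Inside `tts_stream()`, the equivalent call would be `audio = await collect_audio(ws)` in place of the `async for` loop.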

Character Limits

| Method | Max text |
| --- | --- |
| REST (convert) | 2,500 chars |
| HTTP Stream (convert_stream) | 3,500 chars |
| WebSocket | 2,500 chars per message |
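Text longer than these limits has to be split before it is sent. A minimal sketch of one approach follows: break at sentence boundaries and pack sentences into chunks that stay under the limit. `split_text` is a hypothetical helper, and it assumes the API counts characters the same way Python's `len()` does (Unicode code points), which the source does not confirm.

```python
import re

REST_MAX_CHARS = 2500  # REST and per-message WebSocket limit listed above


def split_text(text: str, max_chars: int = REST_MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Split after ., !, ?, or the Devanagari danda, keeping the punctuation.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself over the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_text("One. Two. Three.", max_chars=9))
```

Each chunk can then be passed to `convert` in turn, or sent as a separate WebSocket message; pass `max_chars=3500` when targeting `convert_stream`.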

Gotchas

| Gotcha | Detail |
| --- | --- |
| JS method name | client.textToSpeech.convert({...}) and .convertStream({...}) are camelCase. The stream call returns a BinaryResponse with .stream(), .bytes(), and .blob(). |
| pitch/loudness rejected | The SDK accepts these, but the API returns 400 for v3. Only pace (0.5–2.0) works. |
| v2 voices incompatible | anushka, abhilash, arya, etc. don't work with v3. Use shubh (default). |
| Sample rate >24kHz | 32kHz, 44.1kHz, and 48kHz are available only via REST, not streaming. |
| REST response | Base64-encoded audio in response.audios[0]. Use sarvamai.play.save() or base64.b64decode(). |
| Pronunciation dictionary | The dict_id param teaches custom word pronunciations. Create a dictionary via client.pronunciation_dictionary.create(file=f). |
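Several of these gotchas only surface as a 400 from the API, so a local pre-flight check can save a round trip. The sketch below encodes the pitch/loudness, pace, and sample-rate rules from the table. `check_v3_params` is a hypothetical helper, and the `speech_sample_rate` key is an assumed parameter name that should be confirmed against the upstream docs.

```python
STREAMING_MAX_SAMPLE_RATE = 24000  # higher rates are listed above as REST-only


def check_v3_params(params: dict, streaming: bool = False) -> list[str]:
    """Return a list of problems with a bulbul:v3 request; empty means none found."""
    problems = []
    # The SDK accepts these two, but the API rejects them for v3.
    for name in ("pitch", "loudness"):
        if params.get(name) is not None:
            problems.append(f"{name} is rejected by the API for v3; remove it")
    pace = params.get("pace")
    if pace is not None and not 0.5 <= pace <= 2.0:
        problems.append("pace must be between 0.5 and 2.0")
    # "speech_sample_rate" is an assumed key name for the sample-rate option.
    rate = params.get("speech_sample_rate")
    if streaming and rate is not None and rate > STREAMING_MAX_SAMPLE_RATE:
        problems.append("sample rates above 24 kHz are REST-only")
    return problems


print(check_v3_params({"pitch": 0.2, "pace": 3.0}))
```

Returning a list rather than raising on the first issue lets a caller report every problem at once.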

Full Docs

Fetch voice catalog, streaming protocol, pronunciation dictionary CRUD, and codec options from:
