Text-to-Speech — Bulbul

[!IMPORTANT] Auth: api-subscription-key header — NOT Authorization: Bearer. Base URL: https://api.sarvam.ai/v1
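
For callers who bypass the SDK, the header rule above can be sketched as a small helper. This is a minimal illustration, not part of the Sarvam SDK: `build_auth_headers` is a hypothetical name, and reading the key from a `SARVAM_API_KEY` environment variable is an assumption.

```python
import os


def build_auth_headers(api_key=None):
    """Return HTTP headers that carry the key in `api-subscription-key`."""
    # Assumption: the key lives in SARVAM_API_KEY when not passed explicitly.
    key = api_key or os.environ.get("SARVAM_API_KEY")
    if not key:
        raise ValueError("no API key provided")
    # Deliberately no "Authorization: Bearer ..." header.
    return {
        "api-subscription-key": key,
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the `headers` argument of whichever HTTP client you use.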

Model

bulbul:v3 — 11 languages, 30+ voices (default: shubh), REST/HTTP stream/WebSocket.

Quick Start (Python)

from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI()

# REST (returns base64-encoded audio)
response = client.text_to_speech.convert(
    text="नमस्ते, आप कैसे हैं?",
    target_language_code="hi-IN",
    model="bulbul:v3",
    speaker="shubh"
)
save(response, "output.wav")

# HTTP Stream (lower latency, binary audio)
chunks = []
for chunk in client.text_to_speech.convert_stream(
    text="Hello from Sarvam AI",
    target_language_code="en-IN",
    speaker="shubh",
    model="bulbul:v3"
):
    chunks.append(chunk)
audio = b"".join(chunks)
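
If you would rather not depend on `sarvamai.play.save`, the REST response can be decoded by hand, since (per the Gotchas section) `response.audios[0]` holds base64-encoded audio. The sketch below uses only the standard library; `decode_audio` is a hypothetical helper name.

```python
import base64


def decode_audio(audio_b64: str) -> bytes:
    """Decode one base64-encoded audio string into raw bytes."""
    return base64.b64decode(audio_b64)


# Possible usage with the REST response from the quick start:
#   with open("output.wav", "wb") as f:
#       f.write(decode_audio(response.audios[0]))
```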

Quick Start (JavaScript/TypeScript)

import { SarvamAIClient } from "sarvamai";
import { writeFile } from "fs/promises";

const client = new SarvamAIClient({ apiSubscriptionKey: "YOUR_SARVAM_API_KEY" });

// REST
const response = await client.textToSpeech.convert({
    text: "नमस्ते, आप कैसे हैं?",
    target_language_code: "hi-IN",
    model: "bulbul:v3",
    speaker: "shubh"
});

// HTTP Stream (lower latency, returns BinaryResponse)
const streamResponse = await client.textToSpeech.convertStream({
    text: "Hello from Sarvam AI",
    target_language_code: "en-IN",
    speaker: "shubh",
    model: "bulbul:v3"
});
const bytes = await streamResponse.bytes();
await writeFile("output.wav", bytes);

WebSocket Streaming

import asyncio
from sarvamai import AsyncSarvamAI

async def tts_stream():
    client = AsyncSarvamAI()
    async with client.text_to_speech_streaming.connect(model="bulbul:v3") as ws:
        await ws.configure(target_language_code="hi-IN", speaker="shubh")
        await ws.convert("Your text here")
        await ws.flush()
        async for message in ws:
            pass  # base64 audio chunks

asyncio.run(tts_stream())
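
The loop above receives base64 audio chunks. One way to assemble them into a single byte string is sketched below. It assumes you have already pulled the base64 string out of each WebSocket message (the message field layout is not shown here, so that step is left to the caller). `collect_audio` is a hypothetical helper.

```python
import asyncio
import base64
from typing import AsyncIterable


async def collect_audio(b64_chunks: AsyncIterable[str]) -> bytes:
    """Decode each base64 chunk as it arrives and concatenate the raw bytes."""
    buffer = bytearray()
    async for chunk in b64_chunks:
        # Each chunk is decoded on its own, then appended in arrival order.
        buffer.extend(base64.b64decode(chunk))
    return bytes(buffer)
```

Decoding chunk by chunk keeps memory use proportional to the audio size rather than to the larger base64 text.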

Character Limits

| Method | Max Text |
| --- | --- |
| REST (`convert`) | 2,500 chars |
| HTTP Stream (`convert_stream`) | 3,500 chars |
| WebSocket | 2,500 chars/msg |
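
Text longer than these limits has to be split before it is sent. A minimal sketch of one approach follows: break on sentence boundaries and pack sentences into chunks that fit the limit. `split_text` and `REST_LIMIT` are hypothetical names, and the splitting rule (whitespace after `.`, `!`, `?`, or the Devanagari danda) is an assumption, not something the API prescribes.

```python
import re

REST_LIMIT = 2500  # per-request limit for `convert`, from the limits above


def split_text(text: str, limit: int = REST_LIMIT) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-wrapped.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request or WebSocket message. Pass `limit=3500` when targeting `convert_stream`.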

Gotchas

| Gotcha | Detail |
| --- | --- |
| JS method name | `client.textToSpeech.convert({...})` and `.convertStream({...})` are camelCase. The stream call returns a `BinaryResponse` with `.stream()`, `.bytes()`, and `.blob()`. |
| `pitch`/`loudness` rejected | The SDK accepts these parameters, but the API returns HTTP 400 for v3. Only `pace` (0.5–2.0) works. |
| v2 voices incompatible | `anushka`, `abhilash`, `arya`, etc. don't work with v3. Use `shubh` (default). |
| Sample rate >24kHz | 32kHz, 44.1kHz, and 48kHz are available only via REST, not streaming. |
| REST response | Base64-encoded audio in `response.audios[0]`. Use `sarvamai.play.save()` or `base64.b64decode()`. |
| Pronunciation dictionary | The `dict_id` param teaches custom word pronunciations. Create a dictionary via `client.pronunciation_dictionary.create(file=f)`. |
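
Several of these gotchas can be caught client-side before a request is sent. The following is a rough sketch under stated assumptions: `check_v3_request` and its argument names are hypothetical (they are not SDK parameters), and the v2 voice set contains only the three names listed above, although more exist.

```python
V2_ONLY_VOICES = {"anushka", "abhilash", "arya"}  # incomplete: only the names listed above
STREAMING_MAX_SAMPLE_RATE_HZ = 24000


def check_v3_request(speaker="shubh", pace=None, pitch=None, loudness=None,
                     sample_rate_hz=None, streaming=False):
    """Return a list of human-readable problems; an empty list means no known issue."""
    problems = []
    if pitch is not None:
        problems.append("pitch is rejected by bulbul:v3 (HTTP 400)")
    if loudness is not None:
        problems.append("loudness is rejected by bulbul:v3 (HTTP 400)")
    if pace is not None and not 0.5 <= pace <= 2.0:
        problems.append("pace must be between 0.5 and 2.0")
    if speaker in V2_ONLY_VOICES:
        problems.append(f"voice '{speaker}' is a v2 voice and does not work with v3")
    if (streaming and sample_rate_hz is not None
            and sample_rate_hz > STREAMING_MAX_SAMPLE_RATE_HZ):
        problems.append("sample rates above 24kHz are available only via REST")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue at once.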

Full Docs

Fetch voice catalog, streaming protocol, pronunciation dictionary CRUD, and codec options from:

https://docs.sarvam.ai/llms.txt — comprehensive docs index
TTS Overview
Voice Catalog
HTTP Stream
Pronunciation Dictionary
Rate Limits
