voice-agents

Build conversational voice agents using Sarvam AI with LiveKit or Pipecat. Handles voice assistants, phone bots, IVR, and real-time conversational AI for Indian languages. Integrates Sarvam STT (Saaras v3), TTS (Bulbul v3), and LLM (Sarvam-30B) with low-latency streaming. Use when creating voice-enabled applications or real-time speech pipelines.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "voice-agents" with this command: npx skills add sarvamai/skills/sarvamai-skills-voice-agents

Voice Agents — Sarvam AI

[!IMPORTANT] Auth: api-subscription-key header — NOT Authorization: Bearer. Env var: SARVAM_API_KEY

LiveKit Quick Start

pip install livekit-agents livekit-plugins-sarvam livekit-plugins-silero
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import sarvam, silero

class VoiceAssistant(Agent):
    def __init__(self):
        super().__init__(
            vad=silero.VAD.load(),
            stt=sarvam.STT(model="saaras:v3"),
            llm=sarvam.LLM(model="sarvam-30b"),
            tts=sarvam.TTS(model="bulbul:v3", voice="shubh")
        )

    async def on_enter(self, session: AgentSession):
        await session.say("नमस्ते! मैं आपकी कैसे मदद कर सकती हूं?")

async def entrypoint(ctx: JobContext):
    agent = VoiceAssistant()
    await agent.start(ctx)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Pipecat Quick Start

pip install pipecat-ai "pipecat-ai[sarvam,silero,daily]"
from pipecat.pipeline import Pipeline
from pipecat.services.sarvam import SarvamSTT, SarvamTTS, SarvamLLM
from pipecat.vad.silero import SileroVAD
from pipecat.transports.local import LocalAudioTransport

transport = LocalAudioTransport()
pipeline = Pipeline([
    transport.input(), SileroVAD(),
    SarvamSTT(model="saaras:v3"),
    SarvamLLM(model="sarvam-30b", system_prompt="You are a helpful voice assistant."),
    SarvamTTS(model="bulbul:v3", voice="shubh"),
    transport.output()
])

JavaScript/TypeScript Note

LiveKit and Pipecat agents are Python-only. For JS/TS voice pipelines, use the individual SDK methods directly:

import { SarvamAIClient } from "sarvamai";
const client = new SarvamAIClient({ apiSubscriptionKey: "YOUR_SARVAM_API_KEY" });

// STT: client.speechToText.transcribe({...})
// TTS: client.textToSpeech.convertStream({...})  // returns BinaryResponse
// LLM: client.chat.completions({...})

Gotchas

GotchaDetail
Use sarvam-30bBest latency for voice. Only use sarvam-105b when reasoning quality matters more than speed.
max_tokens budgetSarvam models reason internally. Don't set low max_tokens or content will be None. Omit or set 500+.
TTS pitch/loudnessNOT supported on Bulbul v3 — API returns 400. Only pace works.
STT WebSocket codecsOnly wav/pcm — no MP3/AAC/OGG for streaming.
HTTP Stream for TTSconvert_stream returns binary audio directly (no base64), better for pipelines.

Full Docs

Fetch framework integration guides, environment setup, and advanced patterns from:

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

voice-agents

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

voice-agents

No summary provided by upstream source.

Repository SourceNeeds Review
General

speech-to-text

No summary provided by upstream source.

Repository SourceNeeds Review
General

translate

No summary provided by upstream source.

Repository SourceNeeds Review