deepgram-sdk-patterns

Deepgram SDK Patterns

Overview

Production patterns for the Deepgram speech-to-text SDK (deepgram-sdk). Covers pre-recorded transcription, live streaming, speaker diarization, and multi-language support with proper error handling.

Prerequisites

pip install deepgram-sdk or npm install @deepgram/sdk
DEEPGRAM_API_KEY environment variable
Audio files or microphone access

Instructions

Step 1: Client Initialization

Python:

from deepgram import DeepgramClient, PrerecordedOptions, LiveOptions
import os

def get_deepgram_client() -> DeepgramClient:
    return DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

TypeScript:

import { createClient, DeepgramClient } from '@deepgram/sdk';
const deepgram = createClient(process.env.DEEPGRAM_API_KEY!);
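Reading `os.environ["DEEPGRAM_API_KEY"]` directly raises a bare `KeyError` when the variable is unset. The following is a minimal sketch of a friendlier lookup; `load_api_key` is a hypothetical helper name used for illustration and is not part of the SDK.

```python
import os


def load_api_key(env_var: str = "DEEPGRAM_API_KEY") -> str:
    """Return the API key from the environment or fail with a clear message.

    Hypothetical helper for illustration; not part of deepgram-sdk.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before creating the client")
    return key
```

Its return value could be passed to `DeepgramClient(...)` in place of the direct `os.environ` lookup.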

Step 2: Pre-Recorded Transcription

def transcribe_file(file_path: str, language: str = "en") -> dict:
    client = get_deepgram_client()
    with open(file_path, "rb") as audio:
        # get_mimetype is a helper this document does not define; it is
        # expected to return the MIME type for the file, e.g. "audio/wav".
        response = client.listen.rest.v("1").transcribe_file(
            {"buffer": audio.read(), "mimetype": get_mimetype(file_path)},
            PrerecordedOptions(
                model="nova-2",
                language=language,
                smart_format=True,
                punctuate=True,
                diarize=True,      # adds a speaker index to each word
                utterances=True,
                paragraphs=True,
            ),
        )
    # Use the top alternative of the first channel.
    transcript = response.results.channels[0].alternatives[0]
    return {
        "text": transcript.transcript,
        "confidence": transcript.confidence,
        "words": [
            {"word": w.word, "start": w.start, "end": w.end,
             "speaker": getattr(w, "speaker", None)}
            for w in (transcript.words or [])
        ],
    }
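`transcribe_file` calls `get_mimetype`, which this document never defines. One possible implementation, sketched with the standard-library `mimetypes` module; the extension table and the `application/octet-stream` fallback are assumptions, not something the source specifies.

```python
import mimetypes
from pathlib import Path

# Explicit table for common audio extensions, checked first so the result
# does not depend on the platform's MIME database (assumed mapping).
_AUDIO_TYPES = {
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".flac": "audio/flac",
    ".ogg": "audio/ogg",
    ".m4a": "audio/mp4",
}


def get_mimetype(file_path: str) -> str:
    """Guess a MIME type from the file extension (case-insensitive)."""
    suffix = Path(file_path).suffix.lower()
    if suffix in _AUDIO_TYPES:
        return _AUDIO_TYPES[suffix]
    guessed, _encoding = mimetypes.guess_type(file_path)
    return guessed or "application/octet-stream"
```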

Step 3: Live Streaming Transcription

import asyncio

async def stream_microphone():
    client = get_deepgram_client()
    connection = client.listen.asyncwebsocket.v("1")

    async def on_message(self, result, **kwargs):
        transcript = result.channel.alternatives[0].transcript
        if transcript:
            print(f"[{result.type}] {transcript}")

    connection.on("Results", on_message)

    options = LiveOptions(
        model="nova-2",
        language="en",
        smart_format=True,
        interim_results=True,
        endpointing=300,  # milliseconds of silence before speech is finalized
    )

    await connection.start(options)
    # Send audio chunks from microphone...
    # await connection.send(audio_bytes)
    await connection.finish()
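The `connection.send(audio_bytes)` placeholder above implies feeding audio in small pieces. As a rough sketch, a buffer can be sliced into fixed-size chunks like this; `iter_chunks` and the 4096-byte default are assumptions chosen for illustration, not SDK requirements.

```python
from typing import Iterator


def iter_chunks(audio: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield consecutive slices of `audio`, each at most `chunk_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]
```

Inside `stream_microphone`, each chunk would then be passed to `await connection.send(chunk)`.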

Step 4: Batch Processing with Concurrency Control

import asyncio

async def batch_transcribe(files: list[str], max_concurrent: int = 5) -> list:
    semaphore = asyncio.Semaphore(max_concurrent)

    async def process_one(path):
        async with semaphore:
            # transcribe_file blocks, so run it in the loop's default thread pool.
            loop = asyncio.get_running_loop()
            result = await loop.run_in_executor(None, transcribe_file, path)
            return {"file": path, **result}

    tasks = [process_one(f) for f in files]
    # return_exceptions=True returns each exception in place instead of raising the first one.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return [r if not isinstance(r, Exception) else {"error": str(r)} for r in results]
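Failed entries returned by `batch_transcribe` carry only an `"error"` string, without the file name. Because `asyncio.gather` returns results in input order, they can be matched back to the input list by position. A small sketch, where `label_results` is a hypothetical helper name:

```python
def label_results(files: list[str], results: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split batch output into successes and failures.

    Failures are tagged with the file they belong to, relying on the
    results being in the same order as `files`.
    """
    succeeded, failed = [], []
    for path, result in zip(files, results):
        if "error" in result:
            failed.append({"file": path, "error": result["error"]})
        else:
            succeeded.append(result)
    return succeeded, failed
```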

Error Handling

Error                    Cause                 Solution
401 Unauthorized         Invalid API key       Check DEEPGRAM_API_KEY
400 Unsupported format   Bad audio codec       Convert to WAV/MP3/FLAC
Empty transcript         No speech in audio    Check audio quality and volume
WebSocket disconnect     Network instability   Implement reconnection logic
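For the WebSocket disconnect case, reconnection logic is often built as retry with exponential backoff. The sketch below is one way to do that under stated assumptions: `connect_with_backoff` is a hypothetical helper, `connect` stands for any caller-supplied coroutine function that opens and starts the connection, and the exception types and delay values are illustrative rather than taken from the SDK.

```python
import asyncio
import random


async def connect_with_backoff(connect, max_attempts=5, base_delay=0.5,
                               max_delay=8.0, sleep=asyncio.sleep):
    """Await `connect()` until it succeeds or `max_attempts` is reached.

    The wait doubles after each failure (capped at `max_delay`) with up to
    10% random jitter added. The last failure is re-raised to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await connect()
        except (OSError, asyncio.TimeoutError):
            # ConnectionError is a subclass of OSError, so it is covered here.
            if attempt == max_attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await sleep(delay + random.uniform(0, delay / 10))
```

The `sleep` parameter exists so tests can substitute a fake and avoid real waiting.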

Examples

Speaker-Labeled Transcript

result = transcribe_file("meeting.wav")
current_speaker = None
for word in result["words"]:
    # Start a new labeled line whenever the speaker changes.
    if word["speaker"] != current_speaker:
        current_speaker = word["speaker"]
        print(f"\nSpeaker {current_speaker}:", end=" ")
    print(word["word"], end=" ")
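When the labeled transcript is needed as data rather than printed output, the same speaker-change logic can collect turns into a list. A minimal sketch, assuming the `words` structure returned by `transcribe_file`; `group_by_speaker` is a hypothetical helper name.

```python
def group_by_speaker(words: list[dict]) -> list[tuple]:
    """Collapse consecutive words from the same speaker into (speaker, text) turns."""
    turns = []
    for word in words:
        speaker = word.get("speaker")
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous word: extend the current turn.
            turns[-1][1].append(word["word"])
        else:
            turns.append((speaker, [word["word"]]))
    return [(speaker, " ".join(tokens)) for speaker, tokens in turns]
```

For example, `group_by_speaker(result["words"])` would yield one tuple per uninterrupted stretch of speech.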

Resources

Deepgram SDK Python
Deepgram API Docs

Output

Configuration files or code changes applied to the project
Validation report confirming correct implementation
Summary of changes made and their rationale
