
Podcast Generation with GPT Realtime Mini


Install the skill with: npx skills add claudedjale/skillset/claudedjale-skillset-podcast-generation


Generate narrated audio from text content using Azure OpenAI's Realtime API.

Quick Start

  • Configure environment variables for Realtime API

  • Connect via WebSocket to Azure OpenAI Realtime endpoint

  • Send text prompt, collect PCM audio chunks + transcript

  • Convert PCM to WAV format

  • Return base64-encoded audio to frontend for playback

Environment Configuration

AZURE_OPENAI_AUDIO_API_KEY=your_realtime_api_key
AZURE_OPENAI_AUDIO_ENDPOINT=https://your-resource.cognitiveservices.azure.com
AZURE_OPENAI_AUDIO_DEPLOYMENT=gpt-realtime-mini

Note: the endpoint should NOT include /openai/v1/, just the base URL.
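
Because an endpoint copied from the Azure portal often carries the /openai/v1/ path, a small guard can strip it before the WebSocket URL is built. The helper below is an illustrative sketch, not part of the skill's scripts:

import os

def realtime_base_endpoint() -> str:
    """Return the endpoint base URL, stripping a trailing /openai/v1 if present."""
    endpoint = os.environ["AZURE_OPENAI_AUDIO_ENDPOINT"].rstrip("/")
    for suffix in ("/openai/v1", "/openai"):
        if endpoint.endswith(suffix):
            endpoint = endpoint[: -len(suffix)]
    return endpoint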

Core Workflow

Backend Audio Generation

import base64
import os

from openai import AsyncOpenAI

api_key = os.environ["AZURE_OPENAI_AUDIO_API_KEY"]
endpoint = os.environ["AZURE_OPENAI_AUDIO_ENDPOINT"]

# Convert HTTPS endpoint to WebSocket URL
ws_url = endpoint.replace("https://", "wss://") + "/openai/v1"

client = AsyncOpenAI(websocket_base_url=ws_url, api_key=api_key)

audio_chunks = []
transcript_parts = []

async with client.realtime.connect(model="gpt-realtime-mini") as conn:
    # Configure for audio-only output
    await conn.session.update(session={
        "output_modalities": ["audio"],
        "instructions": "You are a narrator. Speak naturally."
    })

    # Send text to narrate
    await conn.conversation.item.create(item={
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": prompt}]
    })

    await conn.response.create()

    # Collect streaming events
    async for event in conn:
        if event.type == "response.output_audio.delta":
            audio_chunks.append(base64.b64decode(event.delta))
        elif event.type == "response.output_audio_transcript.delta":
            transcript_parts.append(event.delta)
        elif event.type == "response.done":
            break

# Convert PCM to WAV (see scripts/pcm_to_wav.py)
pcm_audio = b"".join(audio_chunks)
wav_audio = pcm_to_wav(pcm_audio, sample_rate=24000)
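
The skill's scripts/pcm_to_wav.py handles the container conversion. As a rough sketch of what such a helper does for the 24 kHz, 16-bit, mono stream described under Audio Format, Python's standard wave module suffices (this body is an assumption, not the repository script):

import io
import wave

def pcm_to_wav(pcm_data: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw 16-bit mono PCM samples in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 16-bit = 2 bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_data)
    return buffer.getvalue()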

Frontend Audio Playback

// Convert base64 WAV to playable blob
const base64ToBlob = (base64, mimeType) => {
  const bytes = atob(base64);
  const arr = new Uint8Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) arr[i] = bytes.charCodeAt(i);
  return new Blob([arr], { type: mimeType });
};

const audioBlob = base64ToBlob(response.audio_data, 'audio/wav');
const audioUrl = URL.createObjectURL(audioBlob);
new Audio(audioUrl).play();

Voice Options

Voice     Character
alloy     Neutral
echo      Warm
fable     Expressive
onyx      Deep
nova      Friendly
shimmer   Clear
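
To select one of these voices, include it when configuring the session. The top-level voice field below follows the common shape of Realtime API session options, but the exact field name for this deployment is an assumption; check the upstream SKILL.md:

await conn.session.update(session={
    "output_modalities": ["audio"],
    "voice": "nova",  # assumed field name; any voice from the table above
    "instructions": "You are a narrator. Speak naturally."
})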

Realtime API Events

Event                                    Description
response.output_audio.delta              Base64 audio chunk
response.output_audio_transcript.delta   Transcript text
response.done                            Generation complete
error                                    Handle with event.error.message
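
The collection loop in the backend example ignores the error event; extending it to fail fast instead of waiting on a stream that will never complete could look like this (a sketch using the event fields listed above):

async for event in conn:
    if event.type == "response.output_audio.delta":
        audio_chunks.append(base64.b64decode(event.delta))
    elif event.type == "response.output_audio_transcript.delta":
        transcript_parts.append(event.delta)
    elif event.type == "error":
        # Surface the failure rather than hang on an incomplete stream
        raise RuntimeError(f"Realtime API error: {event.error.message}")
    elif event.type == "response.done":
        break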

Audio Format

  • Input: Text prompt

  • Output: PCM audio (24kHz, 16-bit, mono)

  • Storage: Base64-encoded WAV
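
Since storage is base64-encoded WAV, the backend can return a JSON payload like the sketch below; audio_data matches the field the frontend snippet reads, while the transcript key is an assumption:

import base64

response_payload = {
    "audio_data": base64.b64encode(wav_audio).decode("ascii"),
    "transcript": "".join(transcript_parts),
}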

References

  • Full architecture: See references/architecture.md for complete stack design

  • Code examples: See references/code-examples.md for production patterns

  • PCM conversion: Use scripts/pcm_to_wav.py for audio format conversion

