# Azure AI VoiceLive SDK for Java

Real-time, bidirectional voice conversations with AI assistants over WebSocket.
## Installation

```xml
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-voicelive</artifactId>
    <version>1.0.0-beta.2</version>
</dependency>
```
## Environment Variables

```bash
AZURE_VOICELIVE_ENDPOINT=https://<resource>.openai.azure.com/
AZURE_VOICELIVE_API_KEY=<your-api-key>
```
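Before building a client, you can fail fast when either variable is missing rather than getting an opaque connection error later. A minimal helper sketch (the `EnvCheck` name is illustrative, not part of the SDK):

```java
public class EnvCheck {
    // Returns the value of a required environment variable,
    // or throws with a clear message if it is missing or blank.
    public static String requireEnv(String name) {
        String value = System.getenv(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value;
    }
}
```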
## Authentication

### API Key

```java
import com.azure.ai.voicelive.VoiceLiveAsyncClient;
import com.azure.ai.voicelive.VoiceLiveClientBuilder;
import com.azure.core.credential.AzureKeyCredential;

VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
        .endpoint(System.getenv("AZURE_VOICELIVE_ENDPOINT"))
        .credential(new AzureKeyCredential(System.getenv("AZURE_VOICELIVE_API_KEY")))
        .buildAsyncClient();
```
### DefaultAzureCredential (Recommended)

```java
import com.azure.identity.DefaultAzureCredentialBuilder;

VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
        .endpoint(System.getenv("AZURE_VOICELIVE_ENDPOINT"))
        .credential(new DefaultAzureCredentialBuilder().build())
        .buildAsyncClient();
```
## Key Concepts

| Concept | Description |
|---------|-------------|
| `VoiceLiveAsyncClient` | Main entry point for voice sessions |
| `VoiceLiveSessionAsyncClient` | Active WebSocket connection for streaming |
| `VoiceLiveSessionOptions` | Configuration for session behavior |
## Audio Requirements

- Sample Rate: 24 kHz (24000 Hz)
- Bit Depth: 16-bit PCM
- Channels: Mono (1 channel)
- Format: Signed PCM, little-endian
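If you capture or play audio with `javax.sound.sampled`, the `AudioFormat` matching these requirements can be built as below. This is a sketch; the class name is illustrative and not part of the SDK:

```java
import javax.sound.sampled.AudioFormat;

public class VoiceLiveAudio {
    // AudioFormat matching the VoiceLive requirements:
    // 24 kHz, 16-bit signed PCM, mono, little-endian.
    public static AudioFormat requiredFormat() {
        return new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                24_000f, // sample rate (Hz)
                16,      // bits per sample
                1,       // channels (mono)
                2,       // frame size in bytes: 16 bits x 1 channel
                24_000f, // frame rate equals sample rate for PCM
                false);  // little-endian
    }
}
```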
## Core Workflow

### 1. Start a Session

```java
import reactor.core.publisher.Mono;

client.startSession("gpt-4o-realtime-preview")
        .flatMap(session -> {
            System.out.println("Session started");

            // Subscribe to events
            session.receiveEvents()
                    .subscribe(
                            event -> System.out.println("Event: " + event.getType()),
                            error -> System.err.println("Error: " + error.getMessage()));

            return Mono.just(session);
        })
        .block();
```
### 2. Configure Session Options

```java
import com.azure.ai.voicelive.models.*;
import com.azure.core.util.BinaryData;
import java.util.Arrays;

ServerVadTurnDetection turnDetection = new ServerVadTurnDetection()
        .setThreshold(0.5)           // Sensitivity (0.0-1.0)
        .setPrefixPaddingMs(300)     // Audio kept before speech starts
        .setSilenceDurationMs(500)   // Silence that ends a turn
        .setInterruptResponse(true)  // Allow the user to interrupt responses
        .setAutoTruncate(true)
        .setCreateResponse(true);

AudioInputTranscriptionOptions transcription = new AudioInputTranscriptionOptions(
        AudioInputTranscriptionOptionsModel.WHISPER_1);

VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
        .setInstructions("You are a helpful AI voice assistant.")
        .setVoice(BinaryData.fromObject(new OpenAIVoice(OpenAIVoiceName.ALLOY)))
        .setModalities(Arrays.asList(InteractionModality.TEXT, InteractionModality.AUDIO))
        .setInputAudioFormat(InputAudioFormat.PCM16)
        .setOutputAudioFormat(OutputAudioFormat.PCM16)
        .setInputAudioSamplingRate(24000)
        .setInputAudioNoiseReduction(new AudioNoiseReduction(AudioNoiseReductionType.NEAR_FIELD))
        .setInputAudioEchoCancellation(new AudioEchoCancellation())
        .setInputAudioTranscription(transcription)
        .setTurnDetection(turnDetection);

// Send the configuration to the service
ClientEventSessionUpdate updateEvent = new ClientEventSessionUpdate(options);
session.sendEvent(updateEvent).subscribe();
```
### 3. Send Audio Input

```java
byte[] audioData = readAudioChunk(); // Your PCM16 audio data
session.sendInputAudio(BinaryData.fromBytes(audioData)).subscribe();
```
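`readAudioChunk()` above is application-specific. One common approach is to slice a captured PCM buffer into fixed-size chunks, keeping each chunk aligned to the 2-byte frame size so no 16-bit sample is split across sends. A minimal sketch (the `AudioChunker` name is illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class AudioChunker {
    // Splits a PCM16 buffer into chunks of at most chunkBytes,
    // rounded down to the 2-byte frame size so samples stay whole.
    public static List<byte[]> chunk(byte[] pcm, int chunkBytes) {
        int aligned = chunkBytes - (chunkBytes % 2);
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < pcm.length; off += aligned) {
            int len = Math.min(aligned, pcm.length - off);
            byte[] c = new byte[len];
            System.arraycopy(pcm, off, c, 0, len);
            chunks.add(c);
        }
        return chunks;
    }
}
```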
### 4. Handle Events

```java
session.receiveEvents().subscribe(event -> {
    ServerEventType eventType = event.getType();

    if (ServerEventType.SESSION_CREATED.equals(eventType)) {
        System.out.println("Session created");
    } else if (ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED.equals(eventType)) {
        System.out.println("User started speaking");
    } else if (ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STOPPED.equals(eventType)) {
        System.out.println("User stopped speaking");
    } else if (ServerEventType.RESPONSE_AUDIO_DELTA.equals(eventType)) {
        if (event instanceof SessionUpdateResponseAudioDelta) {
            SessionUpdateResponseAudioDelta audioEvent = (SessionUpdateResponseAudioDelta) event;
            playAudioChunk(audioEvent.getDelta());
        }
    } else if (ServerEventType.RESPONSE_DONE.equals(eventType)) {
        System.out.println("Response complete");
    } else if (ServerEventType.ERROR.equals(eventType)) {
        if (event instanceof SessionUpdateError) {
            SessionUpdateError errorEvent = (SessionUpdateError) event;
            System.err.println("Error: " + errorEvent.getError().getMessage());
        }
    }
});
```
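`playAudioChunk` above is left to the application. If you need to inspect a `RESPONSE_AUDIO_DELTA` payload directly, remember the bytes are little-endian signed 16-bit samples; a minimal decoder sketch (the `Pcm16` name is illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Pcm16 {
    // Decodes little-endian signed 16-bit PCM bytes into samples,
    // e.g. for metering or resampling before playback.
    public static short[] decode(byte[] pcm) {
        ByteBuffer buf = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        short[] samples = new short[pcm.length / 2];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = buf.getShort();
        }
        return samples;
    }
}
```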
## Voice Configuration

### OpenAI Voices

```java
// Available: ALLOY, ASH, BALLAD, CORAL, ECHO, SAGE, SHIMMER, VERSE
VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
        .setVoice(BinaryData.fromObject(new OpenAIVoice(OpenAIVoiceName.ALLOY)));
```
### Azure Voices

```java
// Azure Standard Voice
options.setVoice(BinaryData.fromObject(new AzureStandardVoice("en-US-JennyNeural")));

// Azure Custom Voice
options.setVoice(BinaryData.fromObject(new AzureCustomVoice("myVoice", "endpointId")));

// Azure Personal Voice
options.setVoice(BinaryData.fromObject(
        new AzurePersonalVoice("speakerProfileId", PersonalVoiceModels.PHOENIX_LATEST_NEURAL)));
```
## Function Calling

```java
VoiceLiveFunctionDefinition weatherFunction = new VoiceLiveFunctionDefinition("get_weather")
        .setDescription("Get current weather for a location")
        .setParameters(BinaryData.fromObject(parametersSchema));

VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
        .setTools(Arrays.asList(weatherFunction))
        .setInstructions("You have access to weather information.");
```
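`parametersSchema` above is a JSON Schema object describing the function's arguments. One way to build it, assumed here for illustration, is a plain `Map` that `BinaryData.fromObject` can serialize (the `WeatherSchema` name and property choices are examples, not SDK types):

```java
import java.util.List;
import java.util.Map;

public class WeatherSchema {
    // JSON Schema for the get_weather function's parameters.
    public static Map<String, Object> parametersSchema() {
        return Map.of(
                "type", "object",
                "properties", Map.of(
                        "location", Map.of(
                                "type", "string",
                                "description", "City and state, e.g. Seattle, WA"),
                        "unit", Map.of(
                                "type", "string",
                                "enum", List.of("celsius", "fahrenheit"))),
                "required", List.of("location"));
    }
}
```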
## Best Practices

- Use the async client: VoiceLive requires reactive patterns
- Configure turn detection for natural conversation flow
- Enable noise reduction for better speech recognition
- Handle interruptions gracefully with `setInterruptResponse(true)`
- Use Whisper (`WHISPER_1`) for input audio transcription
- Close sessions properly when the conversation ends
## Error Handling

```java
import reactor.core.publisher.Flux;

session.receiveEvents()
        .doOnError(error -> System.err.println("Connection error: " + error.getMessage()))
        .onErrorResume(error -> {
            // Attempt reconnection or clean up resources
            return Flux.empty();
        })
        .subscribe();
```
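The SDK does not prescribe a reconnection strategy; capped exponential backoff is a common choice before reopening a session after a dropped connection. A minimal delay calculator sketch (the `Backoff` name is illustrative):

```java
public class Backoff {
    // Capped exponential backoff: delay = min(base * 2^attempt, max).
    public static long delayMs(int attempt, long baseMs, long maxMs) {
        long delay = baseMs << Math.min(attempt, 20); // cap the shift to avoid overflow
        return Math.min(delay, maxMs);
    }
}
```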
## Reference Links

| Resource | URL |
|----------|-----|
| GitHub Source | https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/ai/azure-ai-voicelive |
| Samples | https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/ai/azure-ai-voicelive/src/samples |