azure-speech

Expert knowledge for Azure AI Speech development including troubleshooting, best practices, decision making, limits & quotas, security, configuration, integrations & coding patterns, and deployment. Use when building, debugging, or optimizing Azure AI Speech applications. Not for Azure Communication Services (use azure-communication-services), Azure AI Bot Service (use azure-bot-service), Azure AI Immersive Reader (use azure-immersive-reader), Azure Translator (use azure-translator).

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "azure-speech" with this command: npx skills add microsoftdocs/agent-skills/microsoftdocs-agent-skills-azure-speech

Azure AI Speech Skill

This skill provides expert guidance for Azure AI Speech, covering troubleshooting, best practices, decision making, limits & quotas, security, configuration, integrations & coding patterns, and deployment. It combines local quick-reference content with remote documentation fetching capabilities.

How to Use This Skill

IMPORTANT for Agent: This file may be large. Use the Category Index below to locate the relevant sections, then use read_file with specific line ranges (e.g., L136-L144) to read only the sections needed for the user's question.

IMPORTANT for Agent: If metadata.generated_at is more than 3 months old, suggest that the user pull the latest version from the repository. If the mcp_microsoftdocs tools are not available, suggest that the user install them (see the Installation Guide).

This skill requires network access to fetch documentation content:

  • Preferred: Use mcp_microsoftdocs:microsoft_docs_fetch with query string from=learn-agent-skill. Returns Markdown.
  • Fallback: Use fetch_webpage with query string from=learn-agent-skill&accept=text/markdown. Returns Markdown.
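
Both fetch paths above append tracking parameters to the Learn URL. A minimal sketch of that URL construction using only the standard library (the helper name is illustrative, not part of the skill):

```python
from urllib.parse import urlencode, urlparse, urlunparse, parse_qsl

def with_skill_params(url: str, extra: dict) -> str:
    """Append query parameters (e.g. from=learn-agent-skill) to a docs URL,
    preserving any query string already present on it."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query.update(extra)
    return urlunparse(parts._replace(query=urlencode(query)))

# Fallback fetch: also request Markdown instead of HTML.
url = with_skill_params(
    "https://learn.microsoft.com/en-us/azure/ai-services/speech-service/overview",
    {"from": "learn-agent-skill", "accept": "text/markdown"},
)
```

The resulting URL can then be passed to whichever fetch tool is available; urlencode percent-encodes values, so `text/markdown` appears as `text%2Fmarkdown` in the query string.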

Category Index

| Category | Lines | Description |
| --- | --- | --- |
| Troubleshooting | L36-L45 | Diagnosing and fixing common Azure Speech issues (SDK, text-to-speech, containers, Foundry), including CRL/compatibility problems and how to collect session/transcription IDs for support. |
| Best Practices | L46-L62 | Best practices for speech recognition, custom voice data and recording, latency and memory tuning, Voice Live interactions, keyword/language detection, and microphone array design. |
| Decision Making | L63-L82 | Guidance on choosing speech features, evaluating models and devices, planning large-scale/batch use, and migrating between Speech/Voice API versions and related services. |
| Limits & Quotas | L83-L91 | Quotas, limits, and usage patterns for Azure Speech: batch TTS, custom/pro voice training & deployment, and short audio STT, plus throttling and capacity planning guidance. |
| Security | L92-L103 | Securing Azure AI Speech: auth with Entra ID, RBAC, network isolation (VNet, Private Link, sovereign clouds), BYOS storage, encryption/keys, and voice talent consent management. |
| Configuration | L104-L140 | Configuring Azure AI Speech behavior: audio inputs/outputs, batch jobs, storage and logging, SSML/phonemes, custom/fine-tuned voices, Voice Live settings, and regional/data residency options. |
| Integrations & Coding Patterns | L141-L162 | Patterns and APIs for integrating Azure Speech (STT, TTS, avatars) with apps, telephony, Voice Live, OpenAI, function calling, batch flows, and custom/personal voice models. |
| Deployment | L163-L174 | Deploying and scaling Azure AI Speech: Docker/Kubernetes containers, on-prem STT/TTS, custom speech models/endpoints, language ID, and batch/long-form synthesis workflows. |

Troubleshooting

Best Practices

| Topic | URL |
| --- | --- |
| Create high-quality human-labeled speech transcriptions | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-human-labeled-transcriptions |
| Prepare training data for professional custom voice | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-voice-training-data |
| Apply best practices to reduce Speech synthesis latency | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-lower-speech-synthesis-latency |
| Track and manage Azure Speech SDK memory usage | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-track-speech-sdk-memory-usage |
| Handle user interruptions and chat truncation in Voice Live | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-voice-live-auto-truncation |
| Use interim responses to reduce Voice Live latency | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-voice-live-interim-response |
| Configure proactive greetings for Voice Live agents | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-voice-live-proactive-messages |
| Improve speech recognition with phrase lists | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/improve-accuracy-phrase-list |
| Apply keyword recognition design and accuracy guidelines | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/keyword-recognition-guidelines |
| Use language identification efficiently in speech apps | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-identification |
| Record high-quality samples for custom voice training | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/record-custom-voice-samples |
| Back up and recover custom Speech and Voice resources | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/resiliency-and-recovery-plan |
| Design microphone arrays optimized for Speech SDK | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-sdk-microphone |

Decision Making

| Topic | URL |
| --- | --- |
| Plan large-scale transcription with batch processing | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/batch-transcription |
| Evaluate custom voice lite before professional voice | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/custom-neural-voice-lite |
| Choose Embedded Speech for offline and hybrid scenarios | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/embedded-speech |
| Evaluate device suitability for embedded speech models | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/embedded-speech-performance-evaluations |
| Decide when and how to use fast transcription API | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/fast-transcription-create |
| Evaluate and compare custom speech model accuracy | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-inspect-data |
| Train custom speech models and understand cost behavior | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-train-model |
| Migrate Speech to text REST API from v3.2 to 2024-11-15 | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/migrate-2024-11-15 |
| Migrate Speech-to-text REST from 2024-11-15 to 2025-10-15 | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/migrate-2025-10-15 |
| Migrate from retired Speech intent recognition to Language or OpenAI | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/migrate-intent-recognition |
| Migrate from Long Audio API to Batch synthesis | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/migrate-to-batch-synthesis |
| Migrate from v3 text-to-speech to custom voice REST API | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/migrate-to-custom-voice-api |
| Migrate Speech-to-text REST from v3.0 to v3.1 | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/migrate-v3-0-to-v3-1 |
| Migrate Speech-to-text REST from v3.1 to v3.2 | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/migrate-v3-1-to-v3-2 |
| Assess capabilities and regions for personal voice | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/personal-voice-overview |
| Decide when to use Whisper for speech tasks | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/whisper-overview |

Limits & Quotas

Security

Configuration

| Topic | URL |
| --- | --- |
| Configure Microsoft Audio Stack in Speech SDK | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/audio-processing-speech-sdk |
| Configure Batch synthesis properties for text-to-speech | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/batch-synthesis-properties |
| Configure audio data locations for batch transcription | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/batch-transcription-audio-data |
| Create and submit Azure Speech batch transcription jobs | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/batch-transcription-create |
| Check status and retrieve batch transcription results | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/batch-transcription-get |
| Configure BYOS storage for Speech to text | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/bring-your-own-storage-speech-resource-speech-to-text |
| Define UPS phonetic pronunciations for Speech to text | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/customize-pronunciation |
| Configure OpenSSL on Linux for Azure Speech SDK | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-configure-openssl-linux |
| Control and monitor Speech SDK service connections | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-control-connections |
| Create and manage custom speech fine-tuning projects | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-create-project |
| Prepare and upload datasets for custom speech training | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-upload-data |
| Configure real-time speech recognition inputs and options | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-recognize-speech |
| Select and configure audio input devices in Speech SDK | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-select-audio-input-devices |
| Use visemes for facial animation with Speech service | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-speech-synthesis-viseme |
| Configure Speech SDK audio input streams | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-use-audio-input-streams |
| Configure compressed audio input for Speech SDK and CLI | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-use-codec-compressed-audio-input-streams |
| Enable and configure Speech SDK diagnostic logging | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-use-logging |
| Check Azure Speech language and voice availability | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support |
| Configure audio and transcription logging for Speech recognition | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/logging-audio-transcription |
| Upload and validate training datasets for professional voice | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/professional-voice-create-training-set |
| Use Azure Speech regional endpoints and data residency | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/regions |
| Configure Speech containers storage, logging, and security | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-container-configuration |
| Use Speech phonetic alphabets and IPA in SSML | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-ssml-phonetic-sets |
| Control speech output using SSML configuration | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup |
| Configure pronunciation with SSML phonemes and lexicons | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-pronunciation |
| Structure SSML documents and events for Speech | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-structure |
| Configure voice and sound using SSML in Speech | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-voice |
| Configure Speech CLI datastore search order and files | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/spx-data-store-configuration |
| Configure output destinations for Speech CLI results | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/spx-output-options |
| Configure batch synthesis properties for TTS avatars | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/batch-synthesis-avatar-properties |
| Reference Voice Live API events, models, and settings (2025-10-01) | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/voice-live-api-reference-2025-10-01 |
| Reference Voice Live API events and settings (2026-01-01-preview) | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/voice-live-api-reference-2026-01-01-preview |
| Customize Voice Live models and performance settings | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/voice-live-how-to-customize |
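
Several of the configuration links above concern SSML. As a minimal sketch of the document structure those pages describe (the voice name is a real Azure neural voice, but availability should be checked against the language-support page; the helper name is illustrative):

```python
from xml.sax.saxutils import escape

def build_ssml(text: str, voice: str = "en-US-JennyNeural",
               lang: str = "en-US") -> str:
    """Build a minimal SSML document for Azure text-to-speech:
    a <speak> root with the synthesis namespace wrapping one <voice>."""
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">'
        f'<voice name="{voice}">{escape(text)}</voice>'
        f"</speak>"
    )

ssml = build_ssml("Hello from Azure AI Speech & friends.")
```

Escaping the text keeps reserved XML characters (here the ampersand) from breaking the document; prosody, phoneme, and lexicon elements nest inside `<voice>` as described in the SSML pages above.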

Integrations & Coding Patterns

| Topic | URL |
| --- | --- |
| Integrate Speech service with call center telephony | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/call-center-telephony-integration |
| Use Speech SDK APIs to handle recognition results | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-speech-recognition-results |
| Integrate custom models with Voice Live BYOM | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-bring-your-own-model |
| Implement text-to-speech synthesis with Speech SDK | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-speech-synthesis |
| Implement speech translation with Azure Speech SDK | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-translate-speech |
| Build real-time voice agents with Voice Live and Foundry Agent Service | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-voice-agent-integration |
| Implement function calling with Voice Live API | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-voice-live-function-calling |
| Use LLM-powered speech API for transcription and translation | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/llm-speech |
| Integrate Azure Speech with Azure OpenAI chat | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/openai-speech |
| Add and manage user consent for personal voice | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/personal-voice-create-consent |
| Create personal voice projects via Custom Voice API | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/personal-voice-create-project |
| Integrate batch transcription with Power Automate and Logic Apps | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/power-automate-batch-transcription |
| Integrate with Speech-to-text REST API 2025-10-15 | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/rest-speech-to-text |
| Call Text-to-speech REST API for voice synthesis | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/rest-text-to-speech |
| Generate Speech service REST clients from Swagger | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/swagger-documentation |
| Control text to speech avatar gestures with SSML | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/avatar-gestures-with-ssml |
| Use Voice Live WebSocket events and properties | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/voice-live-how-to |
| Integrate Voice Live with telephony using Call Center Accelerator | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/voice-live-telephony |
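
The batch transcription links above describe creating jobs over REST. As an illustrative sketch of the request a client would POST (endpoint and field names follow the v3.2 Speech to text REST API shape; the display name and audio URL are placeholders, and the newer API versions linked above may differ, so verify against the batch-transcription-create doc):

```python
import json

def batch_transcription_request(region: str, audio_urls: list[str],
                                locale: str = "en-US") -> tuple[str, str]:
    """Build the endpoint URL and JSON body for creating a batch
    transcription job (v3.2 shape; check the linked docs before use)."""
    endpoint = (f"https://{region}.api.cognitive.microsoft.com"
                f"/speechtotext/v3.2/transcriptions")
    body = {
        "contentUrls": audio_urls,           # SAS URLs to the audio blobs
        "locale": locale,
        "displayName": "example-batch-job",  # placeholder name
        "properties": {"wordLevelTimestampsEnabled": True},
    }
    return endpoint, json.dumps(body)

endpoint, body = batch_transcription_request(
    "eastus", ["https://example.com/audio.wav"])
```

The returned body would be sent with an `Ocp-Apim-Subscription-Key` or Entra ID bearer header; job status is then polled via the batch-transcription-get flow linked above.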

Deployment

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Security

  • azure-security (no summary provided by upstream source)

Automation

  • azure-architecture (no summary provided by upstream source)
  • azure-logic-apps (no summary provided by upstream source)