SenseAudio ASR
Use this skill for all SenseAudio speech recognition tasks.
Credential source: read the API key from SENSEAUDIO_API_KEY and send it only in the Authorization: Bearer ... header.
Do not place API keys in query parameters, logs, transcripts, or saved examples.
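A minimal sketch of the credential rule above, assuming a plain Python client. The helper names build_auth_headers and redact are hypothetical and not part of any SenseAudio SDK; only the SENSEAUDIO_API_KEY variable and the Bearer header come from this skill.

```python
import os


def build_auth_headers(env=os.environ):
    """Return headers that carry the key as a Bearer token and nowhere else."""
    api_key = env.get("SENSEAUDIO_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("SENSEAUDIO_API_KEY is not set")
    return {"Authorization": f"Bearer {api_key}"}


def redact(text, secret):
    """Mask the secret before text is written to logs, transcripts, or examples."""
    return text.replace(secret, "[REDACTED]") if secret else text
```

Passing the environment as an argument keeps the helper testable without touching the real process environment.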
Read First
references/asr.md
Workflow
- Pick recognition mode:
- HTTP file transcription for offline audio.
- WebSocket for realtime streaming microphone/audio chunks.
- Audio analysis for noise and quality checks before recognition.
- Records query for recent recognition history lookup.
- Choose model by feature needs:
- Lite for low-cost basic transcription.
- ASR for streaming, translation, diarization, sentiment, and timestamps.
- Pro when diarization plus explicit max_speakers control is needed.
- DeepThink for streaming, translation, and intelligent editing; do not send language, diarization, sentiment, timestamps, ITN, or punctuation controls.
- Build minimal request:
- Required auth, file/audio format, model.
- Add optional controls only when needed.
- Keep uploaded files at or below 10MB; split longer audio before sending.
- Validate compatibility:
- Check model-parameter support before sending.
- Enforce WS pcm/16000Hz/mono requirements.
- For HTTP stream=true, expect SSE text deltas only, not structured verbose fields.
- Parse robustly:
- Handle JSON/text/verbose/SSE forms.
- Handle WS terminal events and failures.
- Treat returned audio URLs, api_key, session_id, and trace_id as sensitive operational data.
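The compatibility checks in the workflow can be sketched as a pre-flight validator. This is an illustration under stated assumptions, not the service's own logic: the model identifiers ("deepthink", "pro"), the parameter names in DEEPTHINK_FORBIDDEN, and the decimal reading of 10MB are all hypothetical; confirm the real names and limits in references/asr.md.

```python
# Assumption: "10MB" is read conservatively as decimal megabytes.
MAX_UPLOAD_BYTES = 10 * 1000 * 1000

# Hypothetical parameter names for the controls DeepThink must not receive.
DEEPTHINK_FORBIDDEN = {
    "language", "diarization", "sentiment", "timestamps", "itn", "punctuation",
}


def check_http_request(model, params, file_size):
    """Return a list of problems; an empty list means the request looks sendable."""
    problems = []
    if file_size > MAX_UPLOAD_BYTES:
        problems.append("file exceeds 10MB; split the audio first")
    if model == "deepthink":
        bad = sorted(DEEPTHINK_FORBIDDEN & set(params))
        if bad:
            problems.append("DeepThink does not accept: " + ", ".join(bad))
    # Assumption: max_speakers is only honored by the Pro model.
    if "max_speakers" in params and model != "pro":
        problems.append("max_speakers requires the Pro model")
    return problems


def check_ws_audio(encoding, sample_rate, channels):
    """True only for the pcm / 16000 Hz / mono combination required on WebSocket."""
    return (encoding, sample_rate, channels) == ("pcm", 16000, 1)
```

Returning a list of problems instead of raising on the first one lets a caller report every incompatibility in a single pass before anything is uploaded.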