SenseAudio Video Narrator
Create professional narration audio for videos with timing-aware segmentation, natural delivery, and editor-friendly exports.
What This Skill Does
- Generate narration audio synchronized to script timestamps
- Match narration style to video genre such as documentary or tutorial
- Control pacing with official TTS parameters and text break markers
- Create multiple narration takes with different voices or styles
- Export audio segments and merged narration tracks for editing workflows
Credential and Dependency Rules
- Read the API key from
SENSEAUDIO_API_KEY. - Send auth only as
Authorization: Bearer <API_KEY>. - Do not place API keys in query parameters, logs, or saved examples.
- If Python helpers are used, this skill expects
python3,requests, andpydub. pydubis used only for optional local audio assembly and mixing.
Official TTS Constraints
Use the official SenseAudio TTS rules summarized below:
- HTTP endpoint:
POST https://api.senseaudio.cn/v1/t2a_v2 - Model:
SenseAudio-TTS-1.0 - Max text length per request:
10000characters voice_setting.voice_idis requiredvoice_setting.speedrange:0.5-2.0voice_setting.pitchrange:-12to12- Optional audio formats:
mp3,wav,pcm,flac - Optional sample rates:
8000,16000,22050,24000,32000,44100 - Optional MP3 bitrates:
32000,64000,128000,256000 - Optional channels:
1or2 extra_info.audio_lengthreturns segment duration in milliseconds- Inline break markup such as
<break time=500>is supported in text
Recommended Workflow
- Prepare the script:
- Split narration into timestamped segments.
- Keep each segment comfortably below the
10000character limit.
- Choose a voice and pacing profile:
- Pick a
voice_idand tunespeed,pitch, and optionalvol. - Use shorter segments when timing precision matters.
- Generate audio segments:
- Call the TTS API for each segment.
- Decode
data.audiofrom hex before saving. - Capture
extra_info.audio_lengthfor timeline metadata.
- Assemble the narration track locally:
- Use
pydubto position clips on a silent master track. - Keep per-segment files for easier editor import and retiming.
- Validate timing against the video:
- Leave small gaps when natural pacing is needed.
- Adjust segment boundaries instead of overusing extreme speed values.
Minimal Timed Narration Helper
import binascii
import os
import re
import requests
API_KEY = os.environ["SENSEAUDIO_API_KEY"]
API_URL = "https://api.senseaudio.cn/v1/t2a_v2"
def parse_timed_script(script):
pattern = r"\[(\d{2}):(\d{2}):(\d{2})\]\s*(.+?)(?=\n\[|\Z)"
segments = []
for match in re.finditer(pattern, script, re.DOTALL):
hours, minutes, seconds, text = match.groups()
timestamp_ms = (int(hours) * 3600 + int(minutes) * 60 + int(seconds)) * 1000
segments.append({"timestamp": timestamp_ms, "text": text.strip()})
return segments
def synthesize_segment(text, voice_id, speed=1.0, pitch=0, vol=1.0):
response = requests.post(
API_URL,
headers={
"Authorization": f"Bearer {API_KEY}",
"Content-Type": "application/json",
},
json={
"model": "SenseAudio-TTS-1.0",
"text": text,
"stream": False,
"voice_setting": {
"voice_id": voice_id,
"speed": speed,
"pitch": pitch,
"vol": vol,
},
"audio_setting": {
"format": "mp3",
"sample_rate": 32000,
"bitrate": 128000,
"channel": 2,
},
},
timeout=60,
)
response.raise_for_status()
data = response.json()
return {
"audio_bytes": binascii.unhexlify(data["data"]["audio"]),
"duration_ms": data["extra_info"]["audio_length"],
"trace_id": data.get("trace_id"),
}
Local Assembly Pattern
from pydub import AudioSegment
def create_synced_narration(audio_segments, video_duration_ms):
narration_track = AudioSegment.silent(duration=video_duration_ms)
for segment in audio_segments:
clip = AudioSegment.from_file(segment["file"])
narration_track = narration_track.overlay(clip, position=segment["timestamp"])
return narration_track
Style Presets
- Documentary: slower
speedsuch as0.95, neutralpitch - Tutorial:
speednear1.0, slightly warmerpitch - Commercial: modestly faster
speed, slightly higherpitch
Prefer conservative tuning and script editing over extreme voice parameter changes.
Output Options
- Per-segment narration clips in
mp3orwav - Timing metadata in
json - Merged narration track for video editors
- Optional alternate takes with different styles
Safety Notes
- Do not hardcode credentials.
- Do not assume local media tooling exists beyond what is declared here.
- Treat returned
trace_idand generated narration assets as potentially sensitive production data.