# Voice Bridge Light


Install the skill from the ClawHub registry:

```bash
npx skills add fangbb-coder/voice-bridge-light
```

Lightweight offline voice-bridging service providing an OpenAI-compatible STT/TTS HTTP API.

## Features

- **TTS (text-to-speech)**: supports Edge TTS (online) and Piper (local)
- **STT (speech recognition)**: local recognition based on Whisper
- **OpenAI-compatible API**: compatible with the OpenAI Audio API
- **Lightweight deployment**: minimal dependencies, easy to install

## Usage

### Installation

```bash
pip install -r requirements.txt
```

### Start Service

Default (Edge TTS):

```bash
python api_server.py
```

Using Piper (a local model file is required):

```bash
TTS_ENGINE=piper PIPER_MODEL=models/piper/zh_CN-huayan-medium.onnx python api_server.py
```

## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/audio/speech` | POST | TTS speech synthesis |
| `/audio/transcriptions` | POST | STT speech recognition |
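The endpoints above can also be called from Python. A minimal standard-library sketch of a health-check helper (the helper names are illustrative, not part of the skill; it assumes the service is running on the default host/port and that `/health` returns a JSON body):

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:18790"  # default host/port of the service

def health_url(base=BASE_URL):
    """Build the URL for the GET /health endpoint."""
    return base.rstrip("/") + "/health"

def check_health(base=BASE_URL, timeout=5):
    """Call GET /health and return the parsed JSON response body."""
    # Assumption: the endpoint answers with a JSON document.
    with urllib.request.urlopen(health_url(base), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```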

## Configuration (Environment Variables)

| Variable | Default | Description |
|---|---|---|
| `VOICE_BRIDGE_HOST` | `0.0.0.0` | Listen address |
| `VOICE_BRIDGE_PORT` | `18790` | Listen port |
| `TTS_ENGINE` | `edge` | TTS engine: `edge` or `piper` |
| `EDGE_VOICE` | `zh-CN-XiaoxiaoNeural` | Edge TTS voice |
| `PIPER_MODEL` | `models/piper/zh_CN-huayan-medium.onnx` | Piper model path |
| `STT_MODEL` | `base` | Whisper model size |
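Read from Python, these variables and their documented defaults look like the following sketch (`load_config` is a hypothetical helper for illustration, not the server's actual code):

```python
import os

def load_config(env=None):
    """Collect the documented settings, falling back to their defaults."""
    env = os.environ if env is None else env
    return {
        "host": env.get("VOICE_BRIDGE_HOST", "0.0.0.0"),
        "port": int(env.get("VOICE_BRIDGE_PORT", "18790")),
        "tts_engine": env.get("TTS_ENGINE", "edge"),
        "edge_voice": env.get("EDGE_VOICE", "zh-CN-XiaoxiaoNeural"),
        "piper_model": env.get("PIPER_MODEL", "models/piper/zh_CN-huayan-medium.onnx"),
        "stt_model": env.get("STT_MODEL", "base"),
    }
```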

## TTS Request Example

```bash
curl -X POST http://localhost:18790/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, world!",
    "voice": "zh-CN-XiaoxiaoNeural",
    "response_format": "mp3"
  }' \
  --output speech.mp3
```
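The same request can be issued from Python with only the standard library; a sketch (helper names are illustrative, not part of the skill):

```python
import json
import urllib.request

def tts_request(text, voice="zh-CN-XiaoxiaoNeural", fmt="mp3",
                base="http://localhost:18790"):
    """Build the POST /audio/speech request carrying the JSON payload."""
    payload = {"input": text, "voice": voice, "response_format": fmt}
    return urllib.request.Request(
        base + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def synthesize(text, out_path="speech.mp3", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(tts_request(text, **kwargs)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
    return out_path
```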

## STT Request Example

```bash
curl -X POST http://localhost:18790/audio/transcriptions \
  -F "file=@speech.mp3"
```

(No explicit `Content-Type` header is needed: with `-F`, curl sets `multipart/form-data` with the correct boundary automatically, and overriding it without a boundary can break the upload.)
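Doing the same upload from Python without third-party libraries means encoding the multipart body by hand; a sketch (helper names are illustrative, not part of the skill):

```python
import uuid
import urllib.request

def multipart_body(filename, audio_bytes, field="file"):
    """Encode audio bytes as a multipart/form-data body plus its Content-Type."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: audio/mpeg\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + audio_bytes + tail, f"multipart/form-data; boundary={boundary}"

def transcribe(path, base="http://localhost:18790"):
    """POST an audio file to /audio/transcriptions and return the raw response."""
    with open(path, "rb") as f:
        body, content_type = multipart_body(path, f.read())
    req = urllib.request.Request(
        base + "/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```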

## OpenClaw Integration

Configure in `openclaw.json`:

```json
{
  "tts": {
    "enabled": true,
    "provider": "local-piper",
    "baseUrl": "http://127.0.0.1:18790",
    "apiKey": "local",
    "voice": "zh-CN-XiaoxiaoNeural"
  }
}
```

## Dependencies

- Python 3.8+
- `edge-tts` (Edge TTS)
- `faster-whisper` (Whisper STT)
- `soundfile` (audio processing)
- Flask + Flask-CORS (web service)

## Service Management

### systemd Service (Recommended)

```ini
[Unit]
Description=Voice Bridge Light - STT/TTS HTTP API
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/root/.openclaw/workspace/skills/voice-bridge-light
ExecStart=/usr/bin/python3 api_server.py
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Enable and start:

```bash
systemctl daemon-reload
systemctl enable voice-bridge-light.service
systemctl start voice-bridge-light.service
```
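Before routing traffic to a freshly started service, it can help to poll the health endpoint until it responds. A generic sketch with an injectable probe callable (`wait_until_ready` is a hypothetical helper, not shipped with the skill):

```python
import time

def wait_until_ready(probe, attempts=10, delay=1.0):
    """Call probe() until it returns truthy; retry on falsy results or OSError."""
    for _ in range(attempts):
        try:
            if probe():
                return True
        except OSError:
            pass  # e.g. connection refused while the service is still starting
        time.sleep(delay)
    return False
```

Here `probe` could be a small function that performs `GET /health` against the service and returns `True` on HTTP 200.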

## Performance

- TTS latency: < 1 s (Edge TTS requires network access)
- STT latency: depends on audio length; roughly real-time on CPU
- Memory usage: ~300-500 MB (mostly the Whisper model)

## Notes

- Edge TTS requires internet access to Microsoft's service
- Piper requires downloading a model file before first use
- The Whisper model loads slowly on first run; a warm-up request is recommended
- In production, running the service under systemd is recommended

## License

MIT
