WebChat Voice GUI
Voice input GUI for OpenClaw WebChat Control UI:
- Mic button with idle, recording, and processing states
- Real-time VU meter: the button's shadow and scale react to voice level
- Push-to-Talk: hold the mic button to record, release to send (default mode)
- Toggle mode: click to start, click to stop (double-click the mic button to switch modes)
- Keyboard shortcuts:
  - `Ctrl+Space`: Push-to-Talk
  - `Ctrl+Shift+M`: start/stop continuous recording
  - `Ctrl+Shift+B`: live transcription [beta]
- Localized UI: auto-detects the browser language (English, German, and Chinese built in); customizable
- Gateway startup hook re-injects the script after `openclaw update`
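The VU meter behavior can be approximated as follows. This is a minimal sketch, not the code in `voice-input.js`: it assumes the level is the RMS of one frame of time-domain samples (for example filled by a Web Audio `AnalyserNode` via `getFloatTimeDomainData`), and the names `rmsLevel` and `levelToButtonStyle`, the `FULL_SCALE_RMS` reference of 0.3, and the scale and shadow ranges are all hypothetical.

```javascript
// Hypothetical sketch: map microphone samples to a mic-button style.
// Samples are floats in [-1, 1], e.g. from AnalyserNode.getFloatTimeDomainData().

// Root-mean-square amplitude of one frame of samples.
function rmsLevel(samples) {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  return Math.sqrt(sumSquares / samples.length);
}

// Assumed "loud speech" reference level; not a documented constant.
const FULL_SCALE_RMS = 0.3;

// Normalize the RMS level to [0, 1] and derive CSS transform and shadow values.
function levelToButtonStyle(rms) {
  const level = Math.min(rms / FULL_SCALE_RMS, 1);
  const scale = 1 + 0.15 * level;          // grow by up to 15% at full level
  const shadowPx = Math.round(20 * level); // glow radius in pixels
  const alpha = (0.3 + 0.5 * level).toFixed(2);
  return {
    level,
    transform: `scale(${scale.toFixed(3)})`,
    boxShadow: `0 0 ${shadowPx}px rgba(255, 64, 64, ${alpha})`,
  };
}
```

In a browser you would call these once per animation frame while recording and assign `transform` and `boxShadow` to the button's `style`. Normalizing against a fixed reference keeps quiet speech visible without letting loud input grow the button indefinitely.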
Prerequisites
- `webchat-https-proxy`: HTTPS/WSS reverse proxy; must be deployed and running.
- `faster-whisper-local-service`: local STT backend on port 18790.
Verify (both commands should print `active`):
```bash
systemctl --user is-active openclaw-voice-https.service
systemctl --user is-active openclaw-transcribe.service
```
Deploy
```bash
bash scripts/deploy.sh
```
With a language override:
```bash
VOICE_LANG=de bash scripts/deploy.sh
```
When run interactively without VOICE_LANG, the script will ask you to choose a UI language.
This script is idempotent.
Quick verify
```bash
bash scripts/status.sh
```
Security Notes
Client-side JS (`voice-input.js`)
- No dynamic code execution: no `eval()`, `new Function()`, or `innerHTML` with user data.
- HTTPS-first: transcription requests use the same-origin `/transcribe` endpoint when the page is served over HTTPS. The script falls back to `http://127.0.0.1:18790` only in local development.
- No external servers: audio is never sent outside the local machine.
- Auth forwarding: the Bearer token from the Control UI is forwarded to the `/transcribe` proxy.
- Safe toasts: all toast messages are set via `textContent`, so there is no XSS vector.
- Bounded memory: continuous recording mode enforces a 120-chunk limit (~2 minutes), preventing unbounded memory growth.
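The endpoint choice, auth forwarding, and chunk limit above can be sketched as below. This is a hypothetical illustration rather than the actual `voice-input.js` code: the names `transcribeTarget` and `BoundedChunkBuffer`, the `/transcribe` path on the local fallback, the hostnames treated as "local dev", and the policy of rejecting chunks once the limit is reached (instead of, say, dropping the oldest) are all assumptions.

```javascript
// Hypothetical sketch of transcription endpoint selection and bounded buffering.

const LOCAL_FALLBACK = "http://127.0.0.1:18790"; // local STT backend (port from the docs)
const MAX_CHUNKS = 120;                          // documented limit (~2 minutes)

// Decide where to POST audio. `loc` mirrors window.location; `token` is the
// Control UI Bearer token (may be empty). Returns null when no safe target exists.
function transcribeTarget(loc, token) {
  if (loc.protocol === "https:") {
    // Same-origin request through the HTTPS proxy, forwarding the Bearer token.
    const headers = token ? { Authorization: `Bearer ${token}` } : {};
    return { url: `${loc.origin}/transcribe`, headers };
  }
  const isLocalDev = loc.hostname === "localhost" || loc.hostname === "127.0.0.1";
  if (isLocalDev) {
    // Assumed path on the backend; used only without HTTPS during local development.
    return { url: `${LOCAL_FALLBACK}/transcribe`, headers: {} };
  }
  return null; // plain HTTP on a non-local host: refuse to send audio
}

// Collects recorded chunks (e.g. Blobs from MediaRecorder "dataavailable" events)
// and refuses new ones once the limit is reached.
class BoundedChunkBuffer {
  constructor(limit = MAX_CHUNKS) {
    this.limit = limit;
    this.chunks = [];
  }
  // Returns false when full; the caller should then stop recording and transcribe.
  push(chunk) {
    if (this.chunks.length >= this.limit) return false;
    this.chunks.push(chunk);
    return true;
  }
  get full() {
    return this.chunks.length >= this.limit;
  }
  // Hand all chunks to the caller and reset the buffer.
  drain() {
    const out = this.chunks;
    this.chunks = [];
    return out;
  }
}
```

Returning `null` for plain HTTP on a remote host is a conservative choice in this sketch: it keeps audio and the token off unencrypted connections rather than guessing at another endpoint.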
Deployment scripts
- Language input validated: `VOICE_LANG` must match `^([a-zA-Z]{2,5}(-[a-zA-Z]{2,5})?|auto)$`, which prevents injection via `sed`.
- Robust path detection: all scripts verify that the Control UI directory exists before modifying files.
- Gateway hook: uses `execFileSync` with array args, so there is no shell interpolation. The script path is derived from `__dirname`, not from user input.
- Idempotent: all scripts are safe to run repeatedly.
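For illustration, here is the same allow-list check written in JavaScript (Node.js, which the gateway hook uses). The deploy scripts implement their own check; the function name `validateVoiceLang` is hypothetical.

```javascript
// The documented VOICE_LANG pattern: a 2-5 letter language code with an optional
// 2-5 letter region/script suffix, or the literal "auto".
const VOICE_LANG_RE = /^([a-zA-Z]{2,5}(-[a-zA-Z]{2,5})?|auto)$/;

// Returns the value unchanged if it matches, otherwise throws before
// anything is written or substituted.
function validateVoiceLang(value) {
  if (typeof value !== "string" || !VOICE_LANG_RE.test(value)) {
    throw new Error(`Invalid VOICE_LANG: ${JSON.stringify(value)}`);
  }
  return value;
}
```

Because the pattern admits only ASCII letters and at most one hyphen, values containing `sed` delimiters or replacement metacharacters (such as `de/g` or `en&`) are rejected before they can alter a substitution.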
No data exfiltration
- No outbound network calls from JS or scripts.
- No telemetry, analytics, or tracking.
What this skill modifies
| What | Path | Action |
|---|---|---|
| Control UI HTML | `<npm-global>/openclaw/dist/control-ui/index.html` | Adds a `<script>` tag for `voice-input.js` |
| Control UI asset | `<npm-global>/openclaw/dist/control-ui/assets/voice-input.js` | Copies the mic button JS |
| Gateway hook | `~/.openclaw/hooks/voice-input-inject/` | Installs a startup hook that re-injects the JS after updates |
| Workspace files | `~/.openclaw/workspace/voice-input/` | Copies `voice-input.js` and `i18n.json` |
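Both the deploy step and the startup hook must add the `<script>` tag without duplicating it across repeated runs or `openclaw update`, and uninstall must remove it cleanly. A minimal sketch of that idempotent edit follows; the tag's exact `src` value, inserting it before `</body>`, and the function names are assumptions, not taken from the actual scripts.

```javascript
// Hypothetical sketch: idempotently add or remove the voice-input <script> tag.
// The src path is an assumption about how index.html references the asset.
const TAG = '<script src="./assets/voice-input.js"></script>';

function injectScriptTag(html) {
  if (html.includes(TAG)) return html; // already injected: no-op
  const idx = html.lastIndexOf("</body>");
  if (idx === -1) throw new Error("index.html has no </body>; refusing to modify");
  return html.slice(0, idx) + TAG + "\n" + html.slice(idx);
}

function removeScriptTag(html) {
  // Remove every occurrence, including the newline added by injectScriptTag.
  return html.split(TAG + "\n").join("").split(TAG).join("");
}
```

Checking for the exact tag before inserting is what makes repeated runs safe, and refusing to edit a file without `</body>` mirrors the "validate before modifying" rule above.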
Mic Button Controls
| Action | Effect |
|---|---|
| Hold (PTT mode) | Record while held, transcribe on release |
| Click (Toggle mode) | Start recording / stop and transcribe |
| Double-click | Switch between PTT and Toggle mode |
| Right-click | Toggle beep sound on/off |
| Ctrl+Space (hold) | Push-to-Talk via keyboard |
| Ctrl+Shift+M | Start/stop recording |
| Ctrl+Shift+B | Start/stop live transcription [beta] |
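The keyboard rows of the table can be expressed as a small dispatcher. This is a hypothetical sketch (the names `shortcutAction` and `isPushToTalkRelease` and the action strings are invented for illustration); it matches on `KeyboardEvent.code`, the physical key, because holding Shift changes `event.key` to `"M"` but leaves `code` as `"KeyM"`.

```javascript
// Hypothetical sketch: map a keydown event to a mic action per the controls table.
function shortcutAction(e) {
  if (!e.ctrlKey || e.altKey || e.metaKey) return null;
  if (!e.shiftKey && e.code === "Space") return "push-to-talk";          // hold to record
  if (e.shiftKey && e.code === "KeyM") return "toggle-recording";        // start/stop
  if (e.shiftKey && e.code === "KeyB") return "toggle-live-transcription"; // [beta]
  return null;
}

// Push-to-Talk ends when either Space or Control is released.
function isPushToTalkRelease(e) {
  return e.type === "keyup" && (e.code === "Space" || e.key === "Control");
}
```

In the page, a `keydown` listener would call `preventDefault()` when `shortcutAction` returns non-null (so `Ctrl+Space` does not reach the input box), and a `keyup` listener would stop recording when `isPushToTalkRelease` is true.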
Language / i18n
Auto-detects browser language. Built-in: English (en), German (de), Chinese (zh).
Override in browser console:
```javascript
localStorage.setItem('oc-voice-lang', 'de'); // force German
localStorage.removeItem('oc-voice-lang');    // back to auto-detect
```
See `assets/i18n.json` for all translation keys.
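The lookup order implied above (explicit `oc-voice-lang` override, then the browser language, then a default) can be sketched as follows. This is a hypothetical illustration: the function names, the assumption that `i18n.json` maps language codes to flat `{ key: string }` tables, and English as the final fallback are not confirmed by this document.

```javascript
// Hypothetical sketch of UI-language resolution and string lookup.
const BUILT_IN = ["en", "de", "zh"];

function resolveLang(override, browserLangs, supported = BUILT_IN) {
  // 1. Explicit override, e.g. localStorage.getItem('oc-voice-lang').
  if (override && supported.includes(override)) return override;
  // 2. First browser language whose primary subtag is supported ("de-AT" -> "de").
  for (const tag of browserLangs || []) {
    const primary = String(tag).toLowerCase().split("-")[0];
    if (supported.includes(primary)) return primary;
  }
  // 3. Assumed default.
  return "en";
}

// Look up a key in the chosen language, falling back to English, then to the key itself.
function translate(table, lang, key) {
  return table?.[lang]?.[key] ?? table?.en?.[key] ?? key;
}
```

In the browser this would be called as `resolveLang(localStorage.getItem('oc-voice-lang'), navigator.languages)`. Matching on the primary subtag lets regional variants such as `de-AT` or `zh-TW` use the built-in `de` and `zh` strings.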
Uninstall
```bash
bash scripts/uninstall.sh
```
This removes the UI injection, the hook, and the workspace files. It does not touch the HTTPS proxy or the faster-whisper backend; uninstall those separately.