Discord Voice Memo Upgrades - Skill Documentation
Overview
This skill provides a core patch for Moltbot that fixes voice memo TTS auto-replies. The issue occurs when block streaming prevents the final payload from reaching the TTS synthesis pipeline.
Type
Core Patch / Documentation
This is not a traditional plugin that extends functionality - it's a documentation package with patch files for core Clawdbot modifications.
Use Case
Use this if you're experiencing:
- Voice memos not triggering TTS responses
- TTS working for text messages but not audio messages
- TTS auto mode = "inbound" not functioning
Installation Methods
Method 1: Manual Patch (Recommended for Development)
# 1. Locate your clawdbot installation (assumes a global npm install under <prefix>/lib/node_modules)
CLAWDBOT_PATH="$(command -v clawdbot)"
CLAWDBOT_DIR="$(dirname "$(dirname "$CLAWDBOT_PATH")")"
DIST_DIR="$CLAWDBOT_DIR/lib/node_modules/clawdbot/dist"
# 2. Back up the original files
cp "$DIST_DIR/auto-reply/reply/dispatch-from-config.js" \
   "$DIST_DIR/auto-reply/reply/dispatch-from-config.js.backup"
cp "$DIST_DIR/tts/tts.js" "$DIST_DIR/tts/tts.js.backup"
# 3. Apply the patch (run from the directory that contains patch/)
cp patch/dispatch-from-config.js "$DIST_DIR/auto-reply/reply/"
cp patch/tts.js "$DIST_DIR/tts/"
# 4. Restart clawdbot
clawdbot restart
Method 2: Wait for Upstream
If this patch gets accepted into core Clawdbot, you can simply update:
npm install -g clawdbot@latest
Configuration
No additional configuration needed beyond existing TTS settings. Ensure you have:
{
  "messages": {
    "tts": {
      "auto": "inbound", // or "always"
      "provider": "openai", // or "elevenlabs" or "edge"
      "elevenlabs": {
        "apiKey": "your-key-here"
      }
    }
  }
}
How to Test
- Configure TTS with auto: "inbound"
- Send a voice memo to your bot
- Check logs for debug output:
[TTS-DEBUG] inboundAudio=true ttsAutoResolved=inbound ttsWillFire=true
[TTS-APPLY] PASSED all checks, proceeding to textToSpeech
[TTS-SPEECH] ...
- Verify bot responds with audio
Debug Logging
The patch includes extensive debug logging. To view:
# Logs will show in your clawdbot console
clawdbot gateway start
Look for:
- [TTS-DEBUG] - Shows TTS detection logic
- [TTS-APPLY] - Shows TTS payload processing decisions
- [TTS-SPEECH] - Shows TTS synthesis attempt
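When the console output is long, a small script can report which of the three stages actually appeared. The following is a minimal sketch and not part of the patch: the helper name summarizeTtsLog is hypothetical, and it assumes only that the markers appear verbatim as listed above.

```javascript
// Hypothetical helper: reports which TTS debug stages appear in captured log text.
const TTS_MARKERS = ["[TTS-DEBUG]", "[TTS-APPLY]", "[TTS-SPEECH]"];

function summarizeTtsLog(logText) {
  const lines = logText.split(/\r?\n/);
  const summary = {};
  for (const marker of TTS_MARKERS) {
    // A stage counts as seen if any log line contains its marker.
    summary[marker] = lines.some((line) => line.includes(marker));
  }
  return summary;
}

// Example: detection ran, but no later stage was logged.
const sampleLog =
  "[TTS-DEBUG] inboundAudio=true ttsAutoResolved=inbound ttsWillFire=true\nreply sent";
console.log(summarizeTtsLog(sampleLog));
```

If [TTS-DEBUG] shows up without [TTS-APPLY], the payload probably never reached the TTS pipeline, which is the symptom this patch targets.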
Production Deployment
Important: Before deploying to production, consider:
- Remove debug logging - The console.log statements should be removed or made configurable
- Test thoroughly - Ensure voice memos work correctly
- Monitor performance - Disabling block streaming may impact streaming behavior
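One way to make the logging configurable instead of deleting it is to gate it behind an environment variable. This is a sketch under stated assumptions: CLAWDBOT_TTS_DEBUG is a hypothetical variable name that Clawdbot itself does not recognize, and createTtsDebugLogger is an illustrative helper you would have to wire into the patched files yourself.

```javascript
// Hypothetical gate: debug output is emitted only when CLAWDBOT_TTS_DEBUG=1.
// env and sink are injectable so the gate can be exercised without touching process state.
function createTtsDebugLogger(env = process.env, sink = console.log) {
  const enabled = env.CLAWDBOT_TTS_DEBUG === "1";
  return function logTtsDebug(tag, ...args) {
    if (!enabled) return false; // logging disabled: emit nothing
    sink(`[${tag}]`, ...args);
    return true;
  };
}

// Usage: replaces a direct console.log('[TTS-DEBUG]', ...) call.
const ttsDebugLog = createTtsDebugLogger();
ttsDebugLog("TTS-DEBUG", "inboundAudio=true");
```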
To remove debug logging, edit the patched files and remove lines containing:
console.log('[TTS-DEBUG]'
console.log('[TTS-APPLY]'
console.log('[TTS-SPEECH]'
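The removal can also be scripted. The sketch below filters out every line that contains one of those three call prefixes. stripTtsDebugLines is a hypothetical helper, and it assumes each debug statement sits on a single line and uses single quotes exactly as shown. A statement that spans several lines would be left half-removed, so review the result before overwriting a file.

```javascript
// Hypothetical helper: drops every line that contains one of the patch's debug log calls.
const DEBUG_CALL_PREFIXES = [
  "console.log('[TTS-DEBUG]'",
  "console.log('[TTS-APPLY]'",
  "console.log('[TTS-SPEECH]'",
];

function stripTtsDebugLines(source) {
  return source
    .split("\n")
    .filter((line) => !DEBUG_CALL_PREFIXES.some((prefix) => line.includes(prefix)))
    .join("\n");
}

// Example on an inline snippet rather than a real file.
const patched = "doWork();\n  console.log('[TTS-DEBUG]', state);\nfinish();";
console.log(stripTtsDebugLines(patched));
```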
Reverting
If you need to revert the patch:
# Restore backups (assumes the .backup files created during installation still exist)
CLAWDBOT_PATH="$(command -v clawdbot)"
CLAWDBOT_DIR="$(dirname "$(dirname "$CLAWDBOT_PATH")")"
DIST_DIR="$CLAWDBOT_DIR/lib/node_modules/clawdbot/dist"
cp "$DIST_DIR/auto-reply/reply/dispatch-from-config.js.backup" \
   "$DIST_DIR/auto-reply/reply/dispatch-from-config.js"
cp "$DIST_DIR/tts/tts.js.backup" "$DIST_DIR/tts/tts.js"
clawdbot restart
Technical Details
The Problem
Block streaming is used to send incremental text chunks to the user as they're generated. However, TTS synthesis hooks into the "final" payload type by default. When block streaming is enabled:
- Text chunks are sent as "block" payloads
- The final assembled text is sent as a "final" payload
- But block streaming optimization drops the final payload (text already sent)
- TTS never fires because it only processes "final" payloads
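The sequence above can be reproduced with a toy model. This is a sketch of the behaviour described here, not Clawdbot's actual code: buildPayloads, ttsInputs, the { kind, text } payload shape, and the blockStreaming option are all hypothetical.

```javascript
// Toy model of reply delivery with and without block streaming.
function buildPayloads(chunks, { blockStreaming }) {
  const finalPayload = { kind: "final", text: chunks.join("") };
  if (!blockStreaming) return [finalPayload];
  // The chunks go out as "block" payloads; the final payload is dropped
  // because its text has already been sent.
  return chunks.map((text) => ({ kind: "block", text }));
}

// TTS only processes "final" payloads.
function ttsInputs(payloads) {
  return payloads.filter((p) => p.kind === "final").map((p) => p.text);
}

const chunks = ["Hello, ", "world."];
console.log(ttsInputs(buildPayloads(chunks, { blockStreaming: true })));  // []
console.log(ttsInputs(buildPayloads(chunks, { blockStreaming: false }))); // [ 'Hello, world.' ]
```

With block streaming on, nothing reaches the TTS step even though the user received the full text.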
The Solution
The patch adds detection logic to identify when TTS should fire:
- Inbound message has audio attachment (isInboundAudioContext())
- TTS auto mode is "inbound" or "always"
- Valid TTS provider and API key configured
When these conditions are met, block streaming is temporarily disabled for that specific reply, ensuring the final payload reaches the TTS pipeline.
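The three conditions can be read as a single predicate whose result feeds the disableBlockStreaming option. In this sketch only the names isInboundAudioContext, ttsWillFire, and disableBlockStreaming come from the patch description. The argument shapes (ctx.attachments, contentType, tts.hasCredentials) and the helper computeTtsWillFire are assumptions for illustration, and the real isInboundAudioContext may detect audio differently.

```javascript
// Stand-in for the patch's isInboundAudioContext(); the ctx shape is assumed.
function isInboundAudioContext(ctx) {
  return (ctx.attachments ?? []).some((a) =>
    (a.contentType ?? "").startsWith("audio/")
  );
}

// Returns true only when all three listed conditions hold.
function computeTtsWillFire(ctx, tts) {
  const autoOk = tts.auto === "inbound" || tts.auto === "always";
  const providerOk = Boolean(tts.provider) && tts.hasCredentials === true;
  return isInboundAudioContext(ctx) && autoOk && providerOk;
}

const voiceMemoCtx = { attachments: [{ contentType: "audio/ogg" }] };
const ttsWillFire = computeTtsWillFire(voiceMemoCtx, {
  auto: "inbound",
  provider: "openai",
  hasCredentials: true,
});

// Block streaming is switched off for this reply only.
const replyOptions = { disableBlockStreaming: ttsWillFire };
console.log(replyOptions); // { disableBlockStreaming: true }
```

A plain text message, or a voice memo with auto set to "tagged", leaves ttsWillFire false, so block streaming stays on for those replies.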
Code Flow
dispatchReplyFromConfig()
├─ isInboundAudioContext(ctx) → detects audio
├─ resolveSessionTtsAuto(ctx, cfg) → gets TTS settings
├─ ttsWillFire = conditions met?
└─ getReplyFromConfig({ disableBlockStreaming: ttsWillFire })
   └─ maybeApplyTtsToPayload() receives final payload
      └─ textToSpeech() synthesizes audio
Compatibility
- Clawdbot: 1.0.0+
- Node.js: 18+
- Platforms: All platforms supported by Clawdbot
Known Issues
- Debug logging is verbose (should be removed for production)
- Modifies compiled dist files (not source)
- May need to reapply after clawdbot updates
Contributing
To improve this patch:
- Test with different TTS providers (OpenAI, ElevenLabs, Edge)
- Test with different auto modes ("always", "inbound", "tagged")
- Suggest optimizations to reduce debug logging overhead
- Propose integration into core Clawdbot source
Support
If you encounter issues:
- Check logs for [TTS-DEBUG] output
- Verify TTS configuration is correct
- Ensure API keys are valid
- Check that block streaming was actually disabled (disableBlockStreaming: true in logs)
License
Same as Moltbot.