Discord Voice Memo Upgrades - Skill Documentation

Overview

This skill provides a core patch for Moltbot (installed as the clawdbot package and CLI used throughout this document) that fixes voice memo TTS auto-replies. The issue occurs when block streaming prevents the final payload from reaching the TTS synthesis pipeline.

Type

Core Patch / Documentation

This is not a traditional plugin that extends functionality. It is a documentation package with patch files that modify core Clawdbot modules.

Use Case

Use this if you're experiencing:

Voice memos not triggering TTS responses
TTS working for text messages but not audio messages
TTS auto mode "inbound" having no effect

Installation Methods

Method 1: Manual Patch (Recommended for Development)

# 1. Locate your clawdbot installation (assumes a global npm install)
CLAWDBOT_PATH="$(command -v clawdbot)"
CLAWDBOT_DIR="$(dirname "$(dirname "$CLAWDBOT_PATH")")"
DIST_DIR="$CLAWDBOT_DIR/lib/node_modules/clawdbot/dist"

# 2. Back up the original files
cp "$DIST_DIR/auto-reply/reply/dispatch-from-config.js" \
   "$DIST_DIR/auto-reply/reply/dispatch-from-config.js.backup"

cp "$DIST_DIR/tts/tts.js" \
   "$DIST_DIR/tts/tts.js.backup"

# 3. Apply the patch (run from the directory that contains patch/)
cp patch/dispatch-from-config.js "$DIST_DIR/auto-reply/reply/"
cp patch/tts.js "$DIST_DIR/tts/"

# 4. Restart clawdbot
clawdbot restart

Method 2: Wait for Upstream

If this patch is accepted into core Clawdbot, updating to the latest release is enough:

npm install -g clawdbot@latest

Configuration

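No additional configuration is needed beyond your existing TTS settings. Ensure your config includes a block like the following, with the API key set for whichever provider you select: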

{
  "messages": {
    "tts": {
      "auto": "inbound",  // or "always"
      "provider": "openai",  // or "elevenlabs" or "edge"
      "elevenlabs": {
        "apiKey": "your-key-here"
      }
    }
  }
}

How to Test

Configure TTS with auto: "inbound"
Send a voice memo to your bot

Check logs for debug output:

[TTS-DEBUG] inboundAudio=true ttsAutoResolved=inbound ttsWillFire=true
[TTS-APPLY] PASSED all checks, proceeding to textToSpeech
[TTS-SPEECH] ...

Verify bot responds with audio

Debug Logging

The patch includes extensive debug logging. To view:

# Logs will show in your clawdbot console
clawdbot gateway start

Look for:

[TTS-DEBUG] - Shows TTS detection logic
[TTS-APPLY] - Shows TTS payload processing decisions
[TTS-SPEECH] - Shows TTS synthesis attempt
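
When scanning a long log by eye is impractical, the [TTS-DEBUG] line can be parsed mechanically. The sketch below is only an illustration: parseTtsDebugLine is a hypothetical helper, and it assumes every [TTS-DEBUG] line follows the space-separated key=value layout of the single example shown under How to Test.

```javascript
// Hypothetical helper, not part of the patch. Parses the key=value pairs
// from a "[TTS-DEBUG]" log line and returns null for any other line.
function parseTtsDebugLine(line) {
  const prefix = "[TTS-DEBUG] ";
  if (!line.startsWith(prefix)) return null;

  const fields = {};
  for (const pair of line.slice(prefix.length).trim().split(/\s+/)) {
    const eq = pair.indexOf("=");
    // Skip tokens that are not key=value pairs.
    if (eq > 0) fields[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return fields;
}

const parsed = parseTtsDebugLine(
  "[TTS-DEBUG] inboundAudio=true ttsAutoResolved=inbound ttsWillFire=true"
);
console.log(parsed.ttsWillFire); // prints: true
```

Values are kept as strings, so a check such as parsed.ttsWillFire === "true" compares against the literal text in the log.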

Production Deployment

Important: Before deploying to production, consider:

Remove debug logging - The console.log statements should be removed or made configurable
Test thoroughly - Ensure voice memos work correctly
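Monitor performance - With block streaming disabled for a reply, the text arrives only once generation completes rather than incrementally, so replies to voice memos may feel slower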

To remove debug logging, edit the patched files and remove lines containing:

console.log('[TTS-DEBUG]'
console.log('[TTS-APPLY]'
console.log('[TTS-SPEECH]'
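
If you would rather make the logging configurable than delete it, one option is to route the three tags through a small wrapper. This is a sketch under assumptions: createTtsLogger and the TTS_DEBUG environment variable are hypothetical names, and neither exists in the patch or in Clawdbot.

```javascript
// Hypothetical helper: nothing named createTtsLogger or TTS_DEBUG exists
// in the patch. It puts the three log tags behind a single on/off switch.
function createTtsLogger(enabled, sink = console.log) {
  return (tag, message) => {
    if (!enabled) return; // logging disabled: do nothing
    sink(`[${tag}] ${message}`);
  };
}

// Assumed opt-in: logging is active only when TTS_DEBUG=1 is set.
const ttsLog = createTtsLogger(process.env.TTS_DEBUG === "1");
ttsLog("TTS-DEBUG", "inboundAudio=true ttsAutoResolved=inbound ttsWillFire=true");
```

Each console.log('[TTS-...]' call in the patched files would then become a ttsLog call, and production deployments would simply leave the variable unset.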

Reverting

If you need to revert the patch:

# Restore backups
CLAWDBOT_PATH="$(command -v clawdbot)"
CLAWDBOT_DIR="$(dirname "$(dirname "$CLAWDBOT_PATH")")"
DIST_DIR="$CLAWDBOT_DIR/lib/node_modules/clawdbot/dist"

cp "$DIST_DIR/auto-reply/reply/dispatch-from-config.js.backup" \
   "$DIST_DIR/auto-reply/reply/dispatch-from-config.js"

cp "$DIST_DIR/tts/tts.js.backup" \
   "$DIST_DIR/tts/tts.js"

clawdbot restart

Technical Details

The Problem

Block streaming is used to send incremental text chunks to the user as they're generated. However, TTS synthesis hooks into the "final" payload type by default. When block streaming is enabled:

Text chunks are sent as "block" payloads
The final assembled text is sent as a "final" payload
The block streaming optimization then drops the final payload, because its text was already sent as blocks
TTS never fires because it only processes "final" payloads
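
The sequence above can be reproduced with a toy model. The sketch below is not Clawdbot's dispatcher: deliver, the kind and text payload fields, and the blockStreaming option are hypothetical names chosen for illustration. It only shows why a hook that listens for "final" payloads receives nothing once the final payload is dropped.

```javascript
// Toy model of reply delivery. All names here are illustrative assumptions,
// not identifiers from the Clawdbot source.
function deliver(payloads, { blockStreaming }) {
  const sentText = [];  // text the user sees
  const ttsInputs = []; // text handed to the TTS hook

  for (const payload of payloads) {
    if (payload.kind === "block") {
      // Incremental chunks are forwarded only when block streaming is on.
      if (blockStreaming) sentText.push(payload.text);
      continue;
    }
    if (payload.kind === "final") {
      // With block streaming on, the text was already sent chunk by chunk,
      // so the final payload is dropped and never reaches the TTS hook.
      if (blockStreaming) continue;
      sentText.push(payload.text);
      ttsInputs.push(payload.text);
    }
  }
  return { sentText, ttsInputs };
}

const reply = [
  { kind: "block", text: "Hello " },
  { kind: "block", text: "world" },
  { kind: "final", text: "Hello world" },
];

const streamed = deliver(reply, { blockStreaming: true });
const unstreamed = deliver(reply, { blockStreaming: false });
console.log(streamed.ttsInputs.length);   // 0: TTS never fires
console.log(unstreamed.ttsInputs.length); // 1: TTS receives the final text
```

In both runs the user sees the same text. Only the run without block streaming hands anything to the TTS hook.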

The Solution

The patch adds detection logic that checks three conditions to decide whether TTS should fire:

The inbound message has an audio attachment (isInboundAudioContext())
The TTS auto mode is "inbound" or "always"
A valid TTS provider and API key are configured

When these conditions are met, block streaming is temporarily disabled for that specific reply, ensuring the final payload reaches the TTS pipeline.
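
As a rough illustration of how the three conditions combine, the sketch below uses simplified stand-ins rather than the patch's own functions. The names hasInboundAudio and shouldDisableBlockStreaming, the ctx.attachments shape, and the assumption that the key lives at messages.tts.<provider>.apiKey are all hypothetical. The real patch may resolve attachments and credentials differently.

```javascript
// Simplified stand-ins for illustration. The context and config shapes
// below are assumptions, not the patch's actual data structures.
function hasInboundAudio(ctx) {
  const attachments = ctx.attachments ?? [];
  return attachments.some((a) =>
    String(a.contentType ?? "").startsWith("audio/")
  );
}

function shouldDisableBlockStreaming(ctx, cfg) {
  const tts = cfg?.messages?.tts ?? {};
  const modeAllowsTts = tts.auto === "inbound" || tts.auto === "always";
  // Assumption: the key is stored at messages.tts.<provider>.apiKey.
  const providerReady = Boolean(tts.provider && tts[tts.provider]?.apiKey);
  // All three conditions must hold before block streaming is turned off.
  return hasInboundAudio(ctx) && modeAllowsTts && providerReady;
}

const cfg = {
  messages: {
    tts: {
      auto: "inbound",
      provider: "elevenlabs",
      elevenlabs: { apiKey: "your-key-here" },
    },
  },
};
const voiceMemo = { attachments: [{ contentType: "audio/ogg" }] };
const textOnly = { attachments: [] };

console.log(shouldDisableBlockStreaming(voiceMemo, cfg)); // true
console.log(shouldDisableBlockStreaming(textOnly, cfg));  // false
```

The result corresponds to the value passed as disableBlockStreaming for that one reply. Text-only messages keep streaming as before.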

Code Flow

dispatchReplyFromConfig()
  ├─ isInboundAudioContext(ctx) → detects audio
  ├─ resolveSessionTtsAuto(ctx, cfg) → gets TTS settings
  ├─ ttsWillFire = conditions met?
  └─ getReplyFromConfig({ disableBlockStreaming: ttsWillFire })
       └─ maybeApplyTtsToPayload() receives final payload
            └─ textToSpeech() synthesizes audio

Compatibility

Clawdbot: 1.0.0+
Node.js: 18+
Platforms: All platforms supported by Clawdbot

Known Issues

Debug logging is verbose (should be removed for production)
Modifies compiled dist files (not source)
May need to be reapplied after clawdbot updates, because an update replaces the patched dist files

Contributing

To improve this patch:

Test with different TTS providers (OpenAI, ElevenLabs, Edge)
Test with different auto modes ("always", "inbound", "tagged")
Suggest optimizations to reduce debug logging overhead
Propose integration into core Clawdbot source

Support

If you encounter issues:

Check logs for [TTS-DEBUG] output
Verify TTS configuration is correct
Ensure API keys are valid
Check that block streaming was actually disabled (disableBlockStreaming: true in logs)

License

Same as Moltbot.