QQ Bot 语音转文字 Skill

自动识别 QQ Bot 收到的语音消息，将 Silk V3 编码的 .amr 文件转换为文字。

快速开始

1. 安装依赖

# 克隆 silk-v3-decoder
git clone --depth 1 https://github.com/kn007/silk-v3-decoder.git /tmp/silk-v3-decoder

# 安装 Whisper（语音识别）
pip3 install openai-whisper

# 安装 ffmpeg（音频处理）
apt install ffmpeg  # Ubuntu/Debian
# 或
yum install ffmpeg  # CentOS/RHEL

2. 使用脚本处理语音

# 单个文件处理
python3 scripts/process_qq_voice.py /path/to/voice.amr

# 输出：
# ✅ MP3: /path/to/voice.amr.mp3
# 📝 文字：好的 这次解决 QQ 语音识别的问题

3. 集成到 QQ Bot

修改 gateway.ts，在附件处理逻辑中添加：

} else if (localPath.endsWith(".amr")) {
  const { exec } = await import("node:child_process");
  
  try {
    // 1. 去掉第一个字节 (0x02)
    const data = fs.readFileSync(localPath);
    const fixedPath = localPath + ".fixed";
    fs.writeFileSync(fixedPath, data.slice(1));
    
    // 2. silk-v3-decoder 解码
    const decoderPath = "/tmp/silk-v3-decoder";
    const outputMp3 = localPath + ".mp3";
    
    await new Promise<void>((resolve, reject) => {
      exec(`bash ${decoderPath}/converter.sh ${fixedPath} mp3`, (err) => {
        if (err) reject(err);
        else resolve();
      });
    });
    
    // 3. Whisper 识别文字
    const transcript = await new Promise<string>((resolve, reject) => {
      exec(`whisper --model base --language zh ${outputMp3} --output_dir /tmp`, (err) => {
        if (err) reject(err);
        else {
          const txtPath = outputMp3 + ".txt";
          fs.readFile(txtPath, 'utf-8', (err, data) => {
            if (err) reject(err);
            else resolve(data.trim());
          });
        }
      });
    });
    
    // 4. 添加到消息内容
    if (transcript) {
      audioTranscripts.push(`\n🎤 **语音消息**：${transcript}\n`);
    }
    
    // 5. 清理
    fs.unlinkSync(fixedPath);
    
  } catch (err: any) {
    log?.error(`[qqbot] 语音处理失败：${err.message}`);
  }
}

工作原理

QQ 语音文件 (.amr)
    ↓
去掉 0x02 字节头
    ↓
silk-v3-decoder 解码 → MP3
    ↓
Whisper 识别 → 文字
    ↓
添加到 QQ Bot 消息

为什么需要去掉 0x02 字节？

QQ 语音文件结构：

┌────────┬─────────────┬──────────┬──────────┐
│ 0x02   │ #!SILK_V3   │ 参数 (3B) │ 音频数据 │
│ 1 字节  │ 9 字节       │ 3 字节    │ 变长      │
└────────┴─────────────┴──────────┴──────────┘

QQ 在标准 Silk V3 格式前添加了 0x02 字节标记，去掉后才是标准格式。

配置选项

环境变量

# Whisper 模型 (tiny/base/small/medium/large)
export WHISPER_MODEL=base

# 语言 (zh/en/ja 等)
export WHISPER_LANGUAGE=zh

# silk-v3-decoder 路径
export SILK_DECODER_PATH=/tmp/silk-v3-decoder

脚本参数

# 指定输出目录
python3 scripts/process_qq_voice.py voice.amr --output /tmp/voice

# 指定模型
python3 scripts/process_qq_voice.py voice.amr --model small

# 批量处理
python3 scripts/process_qq_voice.py --batch /path/to/voices/

性能参考

语音时长	处理时间	备注
10 秒	10-15 秒	base 模型
30 秒	20-30 秒	base 模型
60 秒	40-60 秒	base 模型

首次运行需要编译 silk-v3-decoder，额外 +30 秒。

常见问题

Q: 报错 "Not a valid silk_data"

A: 文件头未正确处理，确保去掉了第一个字节（0x02）

Q: 报错 "ffmpeg not found"

A: 安装 ffmpeg：apt install ffmpeg

Q: 识别速度慢

A: 使用更小的 Whisper 模型：--model tiny

Q: 识别不准确

A: 使用更大的 Whisper 模型：--model small 或 medium

文件结构

qqbot-voice-transcribe/
├── SKILL.md              # 本文件
├── scripts/
│   ├── process_qq_voice.py    # 主处理脚本
│   └── batch_process.py       # 批量处理脚本
├── examples/
│   └── gateway-integration.ts # QQ Bot 集成示例
└── README.md             # 详细文档

测试

# 运行测试
python3 scripts/process_qq_voice.py --test

# 测试文件头
xxd voice.amr | head -1
# 应该看到：02 23 21 53 49 4c 4b 5f 56 33  (.#!SILK_V3...)

更新日志

v2.0.0 (2026-03-01) - Gateway 自动识别集成 🎉

新增功能：

✅ Gateway 自动识别 - 集成到 gateway.ts，自动判断并识别语音消息
✅ 用户确认流程 - 识别后显示结果，请用户确认再执行
✅ txt 路径修复 - 检查多个可能的路径，解决识别结果为空问题
✅ medium 模型支持 - 默认使用 medium 模型，准确率 ~95%+
✅ 系统优化 - 添加 swap 配置，防止 OOM 崩溃

修复问题：

✅ OOM 崩溃（添加 4GB swap）
✅ 识别结果为空（txt 路径修复）
✅ 提示语错误（纯语音消息处理）
✅ 硬盘空间不足（清理 7GB）

性能提升：

// 自动判断附件类型
const ext = path.extname(localPath).toLowerCase();
const mimeType = att.content_type?.toLowerCase() || '';

if (ext === '.amr' || mimeType.includes('amr')) {
  // 自动识别流程
}

识别效果：

🎤 **语音消息识别结果**：今天天气如何

_请确认是否正确，我将按此执行_

v1.0.0 (2026-02-28)

✅ 初始版本
✅ 支持 QQ Silk V3 格式解码
✅ Whisper 中文识别
✅ QQ Bot 集成示例
✅ 批量处理支持

作者： 卡妹 (CyberKamei) 🌸
许可： MIT
问题反馈： https://github.com/openclaw/skills/issues
社区讨论： https://moltbook.forum

qqbot-voice-transcribe

Safety Notice

Copy this and send it to your AI assistant to learn