stt-simple

Local speech-to-text using OpenAI Whisper. Use when the user needs to: (1) transcribe audio files to text, (2) convert voice messages to written content, (3) process recordings in 99+ languages. Supports tiny/base/small/medium/large models. One-command installation with auto model download. Multi-Agent support with session isolation.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "stt-simple" with this command: npx skills add lkisme/sst-simple

STT Simple - Local Speech-to-Text

🎯 Trigger Scenarios

Use this skill when the user needs to:

  1. Transcribe audio files - convert .ogg, .wav, .mp3, and similar formats to text
  2. Process voice messages - turn WhatsApp/Telegram voice messages into text
  3. Recognize multiple languages - Chinese, English, Japanese, and others (99+ languages)
  4. Batch transcription - process multiple audio files
  5. Multi-agent collaboration - let several agents transcribe at the same time with isolated outputs

🚀 Execution Flow

1️⃣ Check Installation

/root/.openclaw/venv/stt-simple/bin/whisper --version

If Whisper is not installed, run the install script first:

/root/.openclaw/workspace/skills/stt-simple/scripts/install.sh
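An agent can run the same installation check from Python before deciding whether to call install.sh. This is a minimal sketch that assumes the venv path used above; find_whisper is a hypothetical helper, not part of the skill.

```python
import os
from pathlib import Path
from typing import Optional

# Assumed venv location, taken from the commands above.
DEFAULT_VENV = Path("/root/.openclaw/venv/stt-simple")


def find_whisper(venv_dir: Path = DEFAULT_VENV) -> Optional[Path]:
    """Return the venv's whisper executable if it exists and is executable, else None."""
    candidate = venv_dir / "bin" / "whisper"
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return candidate
    return None
```

If find_whisper() returns None, run scripts/install.sh and check again.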

2️⃣ Model Selection

Model   | Parameters | Speed (relative) | Accuracy (relative) | Recommended for
tiny    | 39M        | Fastest          | Lowest              | Quick testing
base    | 74M        | Fast             | Fair                | Daily use
small   | 244M       | Moderate         | Good                | Default
medium  | 769M       | Slow             | High                | High-accuracy needs
large   | 1550M      | Slowest          | Highest             | Best quality

3️⃣ Execute Transcription

Method A: Whisper CLI

/root/.openclaw/venv/stt-simple/bin/whisper <audio_file> --model small --language Chinese
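When the CLI is driven from code, building the argument list explicitly keeps the model and format validated. This sketch uses only real Whisper CLI options (--model, --language, --output_format, --output_dir); the helper name and the restriction to the five models in the table are assumptions.

```python
from typing import List, Optional

WHISPER_BIN = "/root/.openclaw/venv/stt-simple/bin/whisper"
MODELS = {"tiny", "base", "small", "medium", "large"}  # the models listed in the table
FORMATS = {"txt", "vtt", "srt", "tsv", "json", "all"}  # choices accepted by --output_format


def whisper_cli_args(audio: str, model: str = "small", language: Optional[str] = None,
                     output_format: str = "txt", output_dir: Optional[str] = None) -> List[str]:
    """Build argv for the Whisper CLI (hypothetical helper)."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if output_format not in FORMATS:
        raise ValueError(f"unknown output format: {output_format}")
    args = [WHISPER_BIN, audio, "--model", model, "--output_format", output_format]
    if language is not None:  # leave --language out to let Whisper auto-detect
        args += ["--language", language]
    if output_dir is not None:
        args += ["--output_dir", output_dir]
    return args
```

Usage: subprocess.run(whisper_cli_args("voice.ogg", language="Chinese"), check=True).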

Method B: Python script (recommended for multi-agent use)

# Without session isolation
/root/.openclaw/venv/stt-simple/bin/python \
  /root/.openclaw/workspace/skills/stt-simple/scripts/stt_simple.py \
  <audio_file> small zh

# With session isolation (multi-agent scenarios)
/root/.openclaw/venv/stt-simple/bin/python \
  /root/.openclaw/workspace/skills/stt-simple/scripts/stt_simple.py \
  <audio_file> small zh agent-main-whatsapp
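The script takes positional arguments in the order <audio_file> <model> <lang> [session_id]. A minimal sketch of assembling that argv in Python, assuming the paths above; stt_simple_args is a hypothetical helper.

```python
from typing import List, Optional

PYTHON = "/root/.openclaw/venv/stt-simple/bin/python"
SCRIPT = "/root/.openclaw/workspace/skills/stt-simple/scripts/stt_simple.py"


def stt_simple_args(audio: str, model: str = "small", lang: str = "zh",
                    session_id: Optional[str] = None) -> List[str]:
    """Build argv for stt_simple.py: <audio_file> <model> <lang> [session_id]."""
    args = [PYTHON, SCRIPT, audio, model, lang]
    if session_id:  # append only when output isolation is wanted
        args.append(session_id)
    return args
```

Pass the result to subprocess.run(..., capture_output=True, text=True, check=True) to run it and capture the script's output.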

4️⃣ Output Location

  • Text output: /root/.openclaw/workspace/stt_output/<session_id>/<filename>_<timestamp>.txt
  • JSON output: includes timestamps and confidence information (when needed)
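The text-output layout can be reproduced to predict where a transcript will land. This sketch assumes a %Y%m%d_%H%M%S timestamp format, which is illustrative only; the script's real suffix format is not documented here.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

OUTPUT_ROOT = Path("/root/.openclaw/workspace/stt_output")


def transcript_path(audio_file: str, session_id: str, timestamp: Optional[datetime] = None,
                    root: Path = OUTPUT_ROOT) -> Path:
    """<root>/<session_id>/<filename>_<timestamp>.txt (timestamp format is an assumption)."""
    ts = (timestamp or datetime.now()).strftime("%Y%m%d_%H%M%S")
    stem = Path(audio_file).stem  # "voice.ogg" -> "voice"
    return root / session_id / f"{stem}_{ts}.txt"
```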

🌍 Language Codes

Language | Code | Alias (name accepted by --language)
Chinese  | zh   | Chinese
English  | en   | English
Japanese | ja   | Japanese
Korean   | ko   | Korean
French   | fr   | French
German   | de   | German
Spanish  | es   | Spanish

Auto-detect: omit the --language option.
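Because both the code and the English name work, a small normalizer lets callers pass either. This is a sketch covering only the languages in the table; treating "auto" as auto-detect is an assumption of this helper, not a Whisper option.

```python
from typing import Optional

# Code -> English name, from the table above.
LANGUAGES = {"zh": "Chinese", "en": "English", "ja": "Japanese", "ko": "Korean",
             "fr": "French", "de": "German", "es": "Spanish"}
NAME_TO_CODE = {name.lower(): code for code, name in LANGUAGES.items()}


def normalize_language(value: Optional[str]) -> Optional[str]:
    """Map a code or English name to a code; None, "" or "auto" means auto-detect (None)."""
    if value is None or value.strip().lower() in ("", "auto"):
        return None
    v = value.strip().lower()
    if v in LANGUAGES:
        return v
    if v in NAME_TO_CODE:
        return NAME_TO_CODE[v]
    raise ValueError(f"unsupported language: {value!r}")
```

When the result is None, leave --language off the command line.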


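📁 Output Formats

  • .txt - plain text (default)
  • .json - full results, including timestamps and confidence information
  • .srt - SubRip subtitles (for video)
  • .vtt - WebVTT subtitles (for the web)

The Whisper CLI selects formats with --output_format (txt, vtt, srt, tsv, json, or all); its own default is all, which writes every format.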
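The CLI writes SRT directly with --output_format srt. To show what that format contains, here is a sketch that renders segments shaped like the "segments" list returned by whisper's transcribe() (dicts with "start", "end", "text" in seconds) into SRT; the function names are hypothetical.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Number each segment, add its time range, and separate cues with a blank line."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        blocks.append(f"{i}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

WebVTT differs mainly in its WEBVTT header and in using "." instead of "," before the milliseconds.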

🔧 Troubleshooting

Check Installation

/root/.openclaw/venv/stt-simple/bin/whisper --version

Reinstall

rm -rf /root/.openclaw/venv/stt-simple
/root/.openclaw/workspace/skills/stt-simple/scripts/install.sh

Manual Model Download

/root/.openclaw/venv/stt-simple/bin/python \
  -c "import whisper; whisper.load_model('small')"
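To check whether a model has already been downloaded, look in openai-whisper's default download root, $XDG_CACHE_HOME/whisper (falling back to ~/.cache/whisper). This is a heuristic sketch: it assumes the file is named <model>.pt, which does not hold for aliases such as "large" that are saved under a versioned filename.

```python
import os
from pathlib import Path


def whisper_cache_dir() -> Path:
    """openai-whisper's default download root: $XDG_CACHE_HOME/whisper or ~/.cache/whisper."""
    base = os.getenv("XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache"))
    return Path(base) / "whisper"


def model_is_cached(name: str) -> bool:
    """Heuristic: True if <cache>/<name>.pt exists (may report False for aliased models)."""
    return (whisper_cache_dir() / f"{name}.pt").is_file()
```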

📦 Resources

File           | Path                  | Purpose
Install script | scripts/install.sh    | One-step install of the virtual environment, dependencies, and models
Python script  | scripts/stt_simple.py | Simplified transcription API that returns JSON results

💡 Examples

Example 1: Transcribe a Chinese voice message

/root/.openclaw/venv/stt-simple/bin/whisper \
  /root/.openclaw/media/inbound/voice.ogg \
  --model small --language Chinese

Example 2: Transcribe an English meeting recording

/root/.openclaw/venv/stt-simple/bin/whisper \
  meeting.wav --model medium --language en

Example 3: Get JSON results from the Python script

/root/.openclaw/venv/stt-simple/bin/python \
  /root/.openclaw/workspace/skills/stt-simple/scripts/stt_simple.py \
  audio.ogg small zh
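The JSON schema printed by stt_simple.py is not documented here, so a caller should parse defensively. This sketch assumes the script prints one JSON object to stdout, either pretty-printed or on a single line after any log lines; the "text" key in the usage note is illustrative only.

```python
import json
from typing import Any, Dict


def parse_stt_stdout(stdout: str) -> Dict[str, Any]:
    """Parse the script's JSON result from stdout (output shape is an assumption)."""
    text = stdout.strip()
    try:  # case 1: stdout is exactly one JSON object
        obj = json.loads(text)
        if isinstance(obj, dict):
            return obj
    except json.JSONDecodeError:
        pass
    for line in reversed(text.splitlines()):  # case 2: JSON on the last matching line
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    raise ValueError("no JSON object found in output")
```

Usage, assuming a "text" key exists: parse_stt_stdout(result.stdout).get("text").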

Example 4: Multiple agents with session isolation

# Jari (WhatsApp) - outputs to /root/.openclaw/workspace/stt_output/agent-jari-whatsapp/
/root/.openclaw/venv/stt-simple/bin/python \
  /root/.openclaw/workspace/skills/stt-simple/scripts/stt_simple.py \
  voice_a.ogg small zh agent-jari-whatsapp

# Other Agent (Telegram) - outputs to /root/.openclaw/workspace/stt_output/agent-telegram/
/root/.openclaw/venv/stt-simple/bin/python \
  /root/.openclaw/workspace/skills/stt-simple/scripts/stt_simple.py \
  voice_b.ogg small zh agent-telegram
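The two commands above can also run concurrently from one process. A sketch using a thread pool that launches one script process per job, assuming the paths above; each process loads its own Whisper model, so running several at once multiplies memory use.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

PYTHON = "/root/.openclaw/venv/stt-simple/bin/python"
SCRIPT = "/root/.openclaw/workspace/skills/stt-simple/scripts/stt_simple.py"


def run_one(audio: str, session_id: str, model: str = "small",
            lang: str = "zh") -> subprocess.CompletedProcess:
    """Run stt_simple.py once; the session_id keeps this job's output in its own directory."""
    return subprocess.run([PYTHON, SCRIPT, audio, model, lang, session_id],
                          capture_output=True, text=True, check=True)


def run_parallel(jobs: Sequence[Tuple[str, str]], runner: Callable = run_one,
                 workers: int = 2) -> List:
    """Run (audio, session_id) jobs concurrently; results come back in job order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda job: runner(*job), jobs))
```

Usage: run_parallel([("voice_a.ogg", "agent-jari-whatsapp"), ("voice_b.ogg", "agent-telegram")]).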

🔐 Multi-Agent Support

Why Session Isolation?

When multiple agents use STT at the same time, session isolation:

  • Avoids output file conflicts
  • Lets each agent track its own transcriptions
  • Makes cleanup and management easier
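Tracking and cleanup both reduce to working within one session directory. A sketch assuming the stt_output/<session_id>/ layout described above; both helpers are hypothetical, and clear_session rejects IDs that could escape the output root.

```python
import shutil
from pathlib import Path
from typing import List

OUTPUT_ROOT = Path("/root/.openclaw/workspace/stt_output")


def session_transcripts(session_id: str, root: Path = OUTPUT_ROOT) -> List[Path]:
    """All .txt transcripts for one session, oldest first by modification time."""
    session_dir = root / session_id
    if not session_dir.is_dir():
        return []
    return sorted(session_dir.glob("*.txt"), key=lambda p: p.stat().st_mtime)


def clear_session(session_id: str, root: Path = OUTPUT_ROOT) -> None:
    """Delete one session's output directory without touching other sessions."""
    if not session_id or "/" in session_id or session_id in (".", ".."):
        raise ValueError(f"unsafe session_id: {session_id!r}")
    shutil.rmtree(root / session_id, ignore_errors=True)
```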

Session ID Naming Suggestions

Agent / Scenario  | Recommended session_id | Output directory
Jari (WhatsApp)   | agent-jari-whatsapp    | stt_output/agent-jari-whatsapp/
Eric (WhatsApp)   | agent-eric-whatsapp    | stt_output/agent-eric-whatsapp/
Telegram agent    | agent-telegram         | stt_output/agent-telegram/
Temporary session | session-<uuid>         | stt_output/session-<uuid>/
Per-user          | user-<user_id>         | stt_output/user-<user_id>/
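Since the session_id becomes a directory name, restricting it to a safe character set avoids path traversal. A sketch of generating the session-<uuid> and user-<user_id> forms from the table; the allowed character set and the 64-character limit are assumptions of this helper, not rules enforced by the skill.

```python
import re
import uuid

# Letters, digits, '-' and '_' only, at most 64 characters: a single safe directory name.
SESSION_ID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,63}")


def is_valid_session_id(session_id: str) -> bool:
    return SESSION_ID_RE.fullmatch(session_id) is not None


def temp_session_id() -> str:
    """Temporary session: session-<uuid>."""
    return f"session-{uuid.uuid4()}"


def user_session_id(user_id: str) -> str:
    """Per-user session: user-<user_id>, with unsafe characters replaced by '_'."""
    return "user-" + re.sub(r"[^A-Za-z0-9_-]", "_", user_id)
```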

输出文件命名规则 / Output File Naming

<audio_filename>_<unique_timestamp>.txt

For example:

  • voice_a_3f8b2c1d.txt
  • meeting_9a4e7f2b.txt

Each filename ends with a unique suffix, so repeated transcriptions of the same audio file never overwrite each other.
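One way to guarantee no overwrite, sketched here as an assumption rather than the script's actual method: generate an 8-hex-character suffix (matching the shape of the example names) and create the file with mode "x", which fails if the file already exists, so a rare collision retries instead of clobbering another transcript.

```python
import uuid
from pathlib import Path


def create_unique_transcript(session_dir: Path, audio_file: str, text: str) -> Path:
    """Write text to <session_dir>/<stem>_<8 hex chars>.txt without overwriting anything."""
    session_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(audio_file).stem
    while True:
        path = session_dir / f"{stem}_{uuid.uuid4().hex[:8]}.txt"
        try:
            with open(path, "x", encoding="utf-8") as f:  # exclusive create
                f.write(text)
            return path
        except FileExistsError:
            continue  # suffix collision: pick a new one
```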


⚠️ Notes

  1. CPU vs GPU: runs on CPU by default; because FP16 is not supported on CPU, Whisper automatically falls back to FP32.
  2. First run: the model is downloaded on first use (small has 244M parameters; its download is roughly 460 MB).
  3. Output directory: results are saved under /root/.openclaw/workspace/stt_output/.
  4. Privacy: all processing happens locally; audio files are never uploaded.
  5. Multi-agent: pass the session_id parameter to keep each agent's output isolated.

🔑 Default Configuration

Current Session ID:

agent-jari-whatsapp

Output Directory:

/root/.openclaw/workspace/stt_output/agent-jari-whatsapp/

Quick Start:

# Transcribe the current WhatsApp voice message
/root/.openclaw/venv/stt-simple/bin/python \
  /root/.openclaw/workspace/skills/stt-simple/scripts/stt_simple.py \
  <audio_file> small zh agent-jari-whatsapp

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
