deepgram-voice-workflow

End-to-end voice workflow with Deepgram STT and TTS. Use when transcribing voice messages, generating spoken replies, or building a shell-based audio pipeline that turns input audio into text and optionally returns an MP3 reply. Especially useful for Telegram/QQ/OneBot voice-message automation and Chinese speech workflows.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

To have an AI assistant install this skill, copy the command below and send it.

Install skill "deepgram-voice-workflow" with this command: npx skills add MengBad/deepgram-voice-workflow

Deepgram Voice Workflow

Overview

Use this skill for a complete speech workflow:

  1. transcribe audio to text with Deepgram STT
  2. optionally synthesize a spoken reply with Deepgram TTS
  3. return structured outputs that can feed chat or agent pipelines

This skill is the right choice when the task is broader than plain transcription and needs an input-audio to output-audio pipeline.

Quick Start

Transcribe only

{baseDir}/scripts/deepgram-transcribe.sh /path/to/audio.ogg

Generate speech from text

{baseDir}/scripts/deepgram-tts.sh "你好,我是 Neko。"

Run the full pipeline

{baseDir}/scripts/neko-voice-pipeline.sh /path/to/audio.ogg --reply "收到啦,这是语音回复测试。"

Environment

Set DEEPGRAM_API_KEY before use.

The bundled scripts also fall back to reading it from:

  • /root/.openclaw/.env
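The fallback order can be sketched as a small shell function. This is a hedged approximation of what the bundled scripts do, not their actual code: `resolve_deepgram_key` is a hypothetical helper name, and the exact parsing of the .env file may differ.

```shell
# Assumed key-resolution order: prefer DEEPGRAM_API_KEY from the
# environment, otherwise read it from an .env file (default path from
# this skill's docs). Illustrative sketch only.
resolve_deepgram_key() {
  local env_file="${1:-/root/.openclaw/.env}"
  if [ -n "${DEEPGRAM_API_KEY:-}" ]; then
    # already exported: use it as-is
    printf '%s\n' "$DEEPGRAM_API_KEY"
  elif [ -f "$env_file" ]; then
    # take the first DEEPGRAM_API_KEY=... line, drop prefix and quotes
    sed -n 's/^DEEPGRAM_API_KEY=//p' "$env_file" | head -n 1 | tr -d '"'
  fi
}
```

Exporting the variable in your shell profile avoids relying on the file fallback at all.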

Workflow Decision

Use deepgram-transcribe.sh when

  • only text transcription is needed
  • the downstream system will generate its own reply
  • the task is speech-to-text only

Use deepgram-tts.sh when

  • text already exists
  • only an MP3 spoken response is needed
  • the workflow is text-to-speech only

Use neko-voice-pipeline.sh when

  • the task begins with an audio file
  • a transcript is needed
  • an optional spoken reply should be generated in the same flow
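The decision rules above reduce to two questions: does the task start from audio, and is a spoken reply wanted? A hypothetical dispatcher (the function name and yes/no flags are illustrative; only the script names come from this skill) could look like:

```shell
# Map (have audio?, want spoken reply?) to the matching script name.
# Flags are the strings "yes" or "no"; this mirrors the decision list above.
pick_script() {
  local have_audio="$1" want_reply="$2"
  if [ "$have_audio" = "yes" ] && [ "$want_reply" = "yes" ]; then
    echo "neko-voice-pipeline.sh"   # audio in, transcript + MP3 reply out
  elif [ "$have_audio" = "yes" ]; then
    echo "deepgram-transcribe.sh"   # speech-to-text only
  else
    echo "deepgram-tts.sh"          # text-to-speech only
  fi
}
```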

Outputs

STT output

deepgram-transcribe.sh writes:

  • transcript text file
  • raw API JSON file next to it

TTS output

deepgram-tts.sh writes:

  • MP3 output file

Pipeline output

neko-voice-pipeline.sh prints JSON with:

  • out_dir
  • transcript_path
  • transcript
  • reply_audio_path

This makes it easy to wire into scripts or adapters.
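A typical way to consume that JSON from a shell adapter is with jq. The field names below match the documented pipeline output; the sample values are made up, and jq must be installed:

```shell
# Sample of the pipeline's documented JSON shape (values are illustrative).
result='{"out_dir":"/tmp/voice","transcript_path":"/tmp/voice/transcript.txt","transcript":"收到","reply_audio_path":"/tmp/voice/reply.mp3"}'

# Pull out the two paths a caller usually needs.
transcript_path=$(printf '%s' "$result" | jq -r '.transcript_path')
reply_audio=$(printf '%s' "$result" | jq -r '.reply_audio_path')

echo "transcript at: $transcript_path"
echo "reply audio at: $reply_audio"
```

In a real adapter, `result` would be captured from the script, e.g. `result=$({baseDir}/scripts/neko-voice-pipeline.sh input.ogg --reply "...")`.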

Typical Uses

Prefer this skill for:

  • transcribing Telegram/QQ/OneBot voice messages
  • generating MP3 replies to short voice prompts
  • building bot-side voice input/output automation
  • testing speech pipelines from shell without introducing a full SDK

Notes

  • Defaults are tuned for lightweight practical use, not maximal configurability.
  • deepgram-transcribe.sh defaults to model=nova-2 and language=zh.
  • deepgram-tts.sh defaults to model=aura-2-luna-en; override the model when a different voice is preferred.
  • Inspect the raw JSON transcript response when debugging recognition quality or API errors.

References

Read these files when needed:

  • references/stt-notes.md for transcription details
  • references/tts-notes.md for speech synthesis details
  • references/pipeline-notes.md for end-to-end pipeline behavior

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
