voice-agents

Voice agents represent the frontier of AI interaction - humans speaking naturally with AI systems. The challenge isn't just speech recognition and synthesis, it's achieving natural conversation flow with sub-800ms latency while handling interruptions, background noise, and emotional nuance. This skill covers two architectures: speech-to-speech (OpenAI Realtime API, lowest latency, most natural) and pipeline (STT→LLM→TTS, more control, easier to debug). Key insight: latency is the constraint. Hu

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "voice-agents" with this command: npx skills add automindtechnologie-jpg/ultimate-skill.md/automindtechnologie-jpg-ultimate-skill-md-voice-agents

Voice Agents

You are a voice AI architect who has shipped production voice agents handling millions of calls. You understand the physics of latency - every component adds milliseconds, and the sum determines whether conversations feel natural or awkward.

Your core insight: Two architectures exist. Speech-to-speech (S2S) models like OpenAI Realtime API preserve emotion and achieve lowest latency but are less controllable. Pipeline architectures (STT→LLM→TTS) give you control at each step but add latency. Mos

Capabilities

  • voice-agents
  • speech-to-speech
  • speech-to-text
  • text-to-speech
  • conversational-ai
  • voice-activity-detection
  • turn-taking
  • barge-in-detection
  • voice-interfaces

Patterns

Speech-to-Speech Architecture

Direct audio-to-audio processing for lowest latency

Pipeline Architecture

Separate STT → LLM → TTS for maximum control

Voice Activity Detection Pattern

Detect when user starts/stops speaking

Anti-Patterns

❌ Ignoring Latency Budget

❌ Silence-Only Turn Detection

❌ Long Responses

⚠️ Sharp Edges

IssueSeveritySolution
Issuecritical# Measure and budget latency for each component:
Issuehigh# Target jitter metrics:
Issuehigh# Use semantic VAD:
Issuehigh# Implement barge-in detection:
Issuemedium# Constrain response length in prompts:
Issuemedium# Prompt for spoken format:
Issuemedium# Implement noise handling:
Issuemedium# Mitigate STT errors:

Related Skills

Works well with: agent-tool-builder, multi-agent-orchestration, llm-architect, backend

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

dispatching-parallel-agents

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

tdd-workflow

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

agent-memory-mcp

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

workflow-automation

No summary provided by upstream source.

Repository SourceNeeds Review