Voice Calls (OpenAI Realtime)
Make real phone calls through your OpenClaw agent. Ask it to book a restaurant, check store hours, schedule an appointment — it dials the number, handles the conversation, and reports back with structured results.
Uses OpenAI's Realtime API for single-model speech-to-speech with ~200-300ms response latency. No separate STT or TTS — one model does it all.
Setup
This skill requires a Twilio account and an OpenAI API key with Realtime API access.
-
Set your credentials in the plugin config (via OpenClaw settings or
openclaw.json):twilio.accountSid— your Twilio Account SIDtwilio.authToken— your Twilio Auth TokenfromNumber— a Twilio voice-capable phone number (E.164 format, e.g.+17075551234)openai.apiKey— your OpenAI API keypublicUrl— a public HTTPS origin that routes to the plugin's server (port 3335 by default). Must not be localhost/private/internal.
-
Set up a tunnel (Cloudflare Tunnel, ngrok, Tailscale Funnel, etc.) so Twilio can reach the webhook server.
-
Verify setup:
openclaw voicecall-rt status
Usage
Just tell your agent what to call and why:
"Call Tony's Pizza at +14155551234 and reserve a table for 4 on Friday at 7pm"
"Call the barbershop at +14155559876 and book a haircut for Saturday morning"
"Call +14155550000 and ask if they have the iPhone 16 Pro in stock"
The agent writes a system prompt for the voice AI, dials the number, and the voice AI handles the conversation autonomously — including navigating phone menus (DTMF), detecting voicemail, and reporting the outcome. The plugin wraps prompts with safety guardrails and blocks deceptive identity behavior.
CLI
openclaw voicecall-rt call -n +14155551234 -t "Check store hours"
openclaw voicecall-rt status
openclaw voicecall-rt active
Inbound calls
Optionally receive calls by enabling inbound.enabled and setting a policy (open or allowlist). Disabled by default.
Cost
$0.31/min total ($0.06 OpenAI input + ~$0.24 OpenAI output + ~$0.014 Twilio). A typical 5-minute call costs ~$1.55.
Notes
- The voice AI waits for the callee to speak before talking ("listen first") — no awkward overlap on pickup.
- Server binds to
127.0.0.1by default. Only exposed via your tunnel. - Max 5 concurrent calls by default (configurable via
calls.maxConcurrent). - Debug mode (
debug: true) enables call recording, verbose logging, and latency metrics; recordings/transcripts may contain sensitive data.