# Cledon — Voice AI Agent Testing

Cledon tests voice AI agents by simulating callers that phone your agent, then evaluating the agent's responses against assertions.
## Domain Model

- **Agent** — the voice AI being tested (name, phone number, personality)
- **Folder** — groups related test cases
- **Test Case** — defines assertions + expected tool calls for one agent
- **Scenario** — a runnable test with caller instructions for one test case
- **Run** — execution of a scenario, producing a transcript + pass/fail results

Relationships: Agent → many Test Cases → many Scenarios. Each Scenario produces Runs.
## Available Tools (22)

### Analytics

| Tool | Purpose |
|---|---|
| `get-overall-stats` | Dashboard summary: total scenarios, runs, pass rate, avg duration |
| `get-run-history` | Recent runs with pass/fail counts (1–90 days lookback) |
| `get-failed-assertions` | Top 10 recurring failures with up to 3 example runs each |
### Agents

| Tool | Purpose |
|---|---|
| `list-agents` | List all voice agents |
| `get-agent` | Full agent details by ID |
| `create-agent` | Create agent in call mode (phone number) or LLM mode (ElevenLabs, Vapi, LiveKit, Famulor, Synthflow) |
| `update-agent` | Update agent properties |
| `delete-agent` | Delete agent and associated data |
### Test Cases & Scenarios

| Tool | Purpose |
|---|---|
| `list-testcases` | List test cases (optional `folderId` filter) |
| `get-testcase` | Full test case with assertions and expected tool calls |
| `create-testcase` | AI-generate test case from a transcript or system prompt; supports `includeScenarios` to auto-create scenarios |
| `update-testcase` | Update test case properties |
| `execute-testcase` | Run all scenarios for a test case |
| `list-scenarios` | List scenarios (optional `testCaseId` filter) |
| `get-scenario` | Full scenario with caller instructions |
### Execution

| Tool | Purpose |
|---|---|
| `run-scenario` | Trigger single test → returns `runId` |
| `run-multiple-scenarios` | Batch trigger → returns array of `runId`s |
| `get-run-status` | Full run details: transcript, assertions, tool call validation |
| `get-scenario-runs` | Run history for one scenario with pass/fail counts |
| `cancel-run` | Cancel a stuck run (only `status=running`) |
### Credentials

| Tool | Purpose |
|---|---|
| `list-credentials` | List all stored voice platform credentials (keys never exposed) |
| `create-credential` | Store a new platform API key (elevenlabs, vapi, livekit, famulor, synthflow) |
| `update-credential` | Update a credential's name or API key |
| `delete-credential` | Delete a stored credential |
## Workflows

### Get an overview of testing status

1. `get-overall-stats` → see pass rate, total runs, average duration
2. `get-run-history` with `days=7` → see recent individual results
3. `get-failed-assertions` → identify systemic issues
### Run a test and check results

1. `list-scenarios` → find the scenario ID
2. `run-scenario` with `scenarioId` → get back a `runId`
3. Wait a moment, then `get-run-status` with `runId` → see transcript + assertion results
4. If status is still `"running"`, wait and check again
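The trigger-then-poll steps above can be sketched as follows. This is a minimal illustration, not an official client: the `call_tool` helper is a hypothetical transport (stubbed here with canned responses), and the exact response field names are assumptions based on the tool descriptions.

```python
import time

# Hypothetical transport: a real integration would invoke the Cledon tool
# (e.g. over MCP) instead of returning canned responses.
_FAKE_RESPONSES = {
    "run-scenario": {"runId": "run_123"},
    "get-run-status": {"status": "completed", "outcome": "passed"},
}

def call_tool(name, args=None):
    """Stand-in for a real tool invocation."""
    return _FAKE_RESPONSES[name]

def run_and_wait(scenario_id, poll_seconds=2, max_polls=30):
    """Trigger a scenario, then poll get-run-status until it finishes."""
    run_id = call_tool("run-scenario", {"scenarioId": scenario_id})["runId"]
    for _ in range(max_polls):
        run = call_tool("get-run-status", {"runId": run_id})
        if run["status"] != "running":   # running → completed/failed
            return run
        time.sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} still running after {max_polls} polls")

result = run_and_wait("scn_1")
print(result["outcome"])  # → passed (with the canned responses above)
```

Because `run-scenario` returns immediately, the polling loop is what turns the async trigger into a synchronous "run and check" step.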
### Run all tests for a test case

1. `list-scenarios` with `testCaseId` filter → collect all scenario IDs
2. `run-multiple-scenarios` with the ID array
3. `get-run-history` with `days=1` → see batch results
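The batch workflow above reduces to "collect IDs, then trigger once". A sketch, again using a hypothetical stubbed `call_tool` helper (field names are assumptions based on the tool descriptions):

```python
# Canned responses standing in for real tool calls.
_FAKE_RESPONSES = {
    "list-scenarios": [{"id": "scn_1"}, {"id": "scn_2"}],
    "run-multiple-scenarios": {"runIds": ["run_1", "run_2"]},
}

def call_tool(name, args=None):
    """Stand-in for a real tool invocation."""
    return _FAKE_RESPONSES[name]

def run_all_for_testcase(testcase_id):
    """Collect every scenario for a test case and trigger them in one batch."""
    scenarios = call_tool("list-scenarios", {"testCaseId": testcase_id})
    scenario_ids = [s["id"] for s in scenarios]
    return call_tool("run-multiple-scenarios", {"scenarioIds": scenario_ids})["runIds"]

print(run_all_for_testcase("tc_1"))  # → ['run_1', 'run_2'] with the stub above
```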
### Investigate failures

1. `get-failed-assertions` → find the most common failures
2. Pick a failure and note the example `runId`s
3. `get-run-status` for each `runId` → read the transcript to understand what went wrong
4. `get-scenario-runs` for that `scenarioId` → check whether it's a regression or a consistent failure
### Drill into a specific test case

1. `get-testcase` with `id` → see assertions and expected tool calls
2. `list-scenarios` with `testCaseId` → see all persona combinations
3. `get-scenario` for each → see caller instructions
### Create a new test from scratch

1. `list-agents` → pick the agent to test (or `create-agent`)
2. `create-testcase` with agent ID and assertions; set `includeScenarios: true` to auto-generate scenarios
3. `execute-testcase` → run all scenarios, or `run-scenario` → run a single one
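A sketch of assembling the `create-testcase` arguments for step 2. The field names (`agentId`, `name`, `assertions`, `includeScenarios`) are assumptions inferred from the tool descriptions above, not a published schema:

```python
def build_testcase_args(agent_id, name, assertions, include_scenarios=True):
    """Assemble a create-testcase payload (field names are assumptions)."""
    return {
        "agentId": agent_id,
        "name": name,
        "assertions": list(assertions),
        "includeScenarios": include_scenarios,  # auto-generate caller scenarios
    }

args = build_testcase_args(
    "agt_1",
    "Booking flow happy path",
    ["Agent greets the caller by the business name",
     "Agent confirms the booking date back to the caller"],
)
```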
### Create tests from a transcript

1. `list-agents` → pick the agent (or `create-agent`)
2. `create-testcase` with `agentId` and transcript — AI analyzes the transcript and generates assertions, icons, and expected tool calls
3. Optionally set `includeScenarios: true` to also generate caller scenarios
4. `execute-testcase` → run all generated scenarios
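For the transcript path, the only difference is passing the raw call text instead of hand-written assertions. The argument names below are assumptions based on the tool description:

```python
# Example transcript of a real call; Cledon's AI derives assertions and
# expected tool calls from text like this.
transcript = (
    "Caller: Hi, I'd like to book a table for two tomorrow at 7pm.\n"
    "Agent: Sure! I have 7pm available. Can I get your name?\n"
)

create_args = {
    "agentId": "agt_1",            # assumed field name
    "transcript": transcript,      # assumed field name
    "includeScenarios": True,      # also generate caller scenarios
}
# call_tool("create-testcase", create_args)  # hypothetical invocation
```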
## Key Patterns

- List endpoints return compact data. Use the corresponding get-by-ID tool to see full details.
- `run-scenario` is async: it returns a `runId` immediately. Poll `get-run-status` to see results.
- All data is scoped to the authenticated user's organization. No cross-tenant access.
- Run outcome is either `"passed"` or `"failed"`. Run status progresses: `running` → `completed`/`failed`.