WaveSpeedAI InfiniteTalk
Generate talking head videos from a portrait image and audio using WaveSpeed AI's InfiniteTalk model. Produces lip-synced video up to 10 minutes long with natural facial animations.
Authentication
export WAVESPEED_API_KEY="your-api-key"
Get your API key at wavespeed.ai/accesskey.
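A fail-fast check can catch a missing key before any request is sent. The following is a minimal sketch, not part of the wavespeed SDK: `requireApiKey` is a hypothetical helper name, and it assumes the key lives in the `WAVESPEED_API_KEY` environment variable as shown above.

```javascript
// Hypothetical helper (not part of the wavespeed SDK): read the API key
// from an environment-like object and fail early if it is missing or blank.
function requireApiKey(env = process.env) {
  const key = (env.WAVESPEED_API_KEY ?? "").trim();
  if (key === "") {
    throw new Error("WAVESPEED_API_KEY is not set");
  }
  return key;
}
```

Calling `requireApiKey()` once at startup surfaces a configuration problem immediately rather than partway through an upload.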
Quick Start
import wavespeed from 'wavespeed';
// Upload local image and audio files
const imageUrl = await wavespeed.upload("/path/to/portrait.png");
const audioUrl = await wavespeed.upload("/path/to/speech.mp3");
const output_url = (await wavespeed.run(
"wavespeed-ai/infinitetalk",
{
image: imageUrl,
audio: audioUrl
}
))["outputs"][0];
You can also pass existing URLs directly:
const output_url = (await wavespeed.run(
"wavespeed-ai/infinitetalk",
{
image: "https://example.com/portrait.jpg",
audio: "https://example.com/speech.mp3"
}
))["outputs"][0];
API Endpoint
Model ID: wavespeed-ai/infinitetalk
Animate a portrait image with lip-synced audio to produce a talking head video.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `image` | string | Yes | -- | URL of the portrait image to animate |
| `audio` | string | Yes | -- | URL of the audio to drive the animation |
| `mask_image` | string | No | -- | URL of a mask image to specify which person to animate. Warning: the mask should cover only the regions to animate. Do not upload the full image as `mask_image`, or the result may render as fully black. |
| `prompt` | string | No | -- | Text prompt for additional guidance. Keep it short; English is recommended to avoid noisy results. |
| `resolution` | string | No | 480p | Output resolution. One of: 480p, 720p |
| `seed` | integer | No | -1 | Random seed (-1 for random). Range: -1 to 2147483647 |
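The constraints in the table can be checked on the client before a request is sent. The following is a sketch under stated assumptions: `validateInput` is a hypothetical helper, not part of the wavespeed SDK, and it encodes only the rules listed above (required fields, allowed resolutions, seed range). The server may apply further checks.

```javascript
// Hypothetical client-side check (not part of the wavespeed SDK) that mirrors
// the parameter table: known keys, required fields, resolutions, seed range.
const ALLOWED_KEYS = ["image", "audio", "mask_image", "prompt", "resolution", "seed"];
const RESOLUTIONS = ["480p", "720p"];
const MAX_SEED = 2147483647;

function validateInput(input) {
  const errors = [];
  // Reject parameters that are not documented in the table.
  for (const key of Object.keys(input)) {
    if (!ALLOWED_KEYS.includes(key)) errors.push(`unknown parameter: ${key}`);
  }
  // image and audio are the only required parameters.
  for (const key of ["image", "audio"]) {
    if (typeof input[key] !== "string" || input[key] === "") {
      errors.push(`${key} is required and must be a non-empty string`);
    }
  }
  // Optional string parameters.
  for (const key of ["mask_image", "prompt"]) {
    if (input[key] !== undefined && typeof input[key] !== "string") {
      errors.push(`${key} must be a string`);
    }
  }
  if (input.resolution !== undefined && !RESOLUTIONS.includes(input.resolution)) {
    errors.push("resolution must be one of: 480p, 720p");
  }
  if (input.seed !== undefined) {
    const ok = Number.isInteger(input.seed) && input.seed >= -1 && input.seed <= MAX_SEED;
    if (!ok) errors.push("seed must be an integer from -1 to 2147483647");
  }
  return errors; // an empty array means the input matches the table
}
```

An empty result means the object is safe to pass as the second argument to `run`; otherwise the array lists every violated rule.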
Example
import wavespeed from 'wavespeed';
const imageUrl = await wavespeed.upload("/path/to/portrait.png");
const audioUrl = await wavespeed.upload("/path/to/speech.mp3");
const output_url = (await wavespeed.run(
"wavespeed-ai/infinitetalk",
{
image: imageUrl,
audio: audioUrl,
resolution: "720p",
seed: 42
}
))["outputs"][0];
Using a Mask Image
When multiple people are in the image, use a mask to specify which face to animate:
const imageUrl = await wavespeed.upload("/path/to/group-photo.png");
const audioUrl = await wavespeed.upload("/path/to/speech.mp3");
const maskUrl = await wavespeed.upload("/path/to/mask.png");
const output_url = (await wavespeed.run(
"wavespeed-ai/infinitetalk",
{
image: imageUrl,
audio: audioUrl,
mask_image: maskUrl,
resolution: "720p"
}
))["outputs"][0];
Important: The mask should only highlight the face region to animate. Using the full image as a mask will produce a fully black output.
With Prompt Guidance
// Assumes imageUrl and audioUrl were obtained as in the Quick Start
const output_url = (await wavespeed.run(
"wavespeed-ai/infinitetalk",
{
image: imageUrl,
audio: audioUrl,
prompt: "natural head movements, subtle expressions"
}
))["outputs"][0];
Advanced Usage
Custom Client with Retry Configuration
import { Client } from 'wavespeed';
// Read the key from the environment rather than hardcoding it
const client = new Client(process.env.WAVESPEED_API_KEY, {
maxRetries: 2,
maxConnectionRetries: 5,
retryInterval: 1.0,
});
const imageUrl = await client.upload("/path/to/portrait.png");
const audioUrl = await client.upload("/path/to/speech.mp3");
const output_url = (await client.run(
"wavespeed-ai/infinitetalk",
{
image: imageUrl,
audio: audioUrl,
resolution: "720p"
}
))["outputs"][0];
Error Handling with runNoThrow
import { Client, WavespeedTimeoutException, WavespeedPredictionException } from 'wavespeed';
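const client = new Client();
// Upload the inputs first so imageUrl and audioUrl are defined
const imageUrl = await client.upload("/path/to/portrait.png");
const audioUrl = await client.upload("/path/to/speech.mp3");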
const result = await client.runNoThrow(
"wavespeed-ai/infinitetalk",
{
image: imageUrl,
audio: audioUrl
}
);
if (result.outputs) {
console.log("Video URL:", result.outputs[0]);
console.log("Task ID:", result.detail.taskId);
} else {
console.log("Failed:", result.detail.error.message);
if (result.detail.error instanceof WavespeedTimeoutException) {
console.log("Request timed out - try increasing timeout");
} else if (result.detail.error instanceof WavespeedPredictionException) {
console.log("Prediction failed");
}
}
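If the same success/failure branching is needed in several places, it can be folded into one small function. The following is a sketch, not an SDK feature: `describeResult` is a hypothetical helper, and it assumes only the result shape used in the example above (`outputs`, `detail.taskId`, `detail.error.message`).

```javascript
// Hypothetical helper (not part of the wavespeed SDK). It relies only on the
// result fields used in the example above and tolerates missing ones.
function describeResult(result) {
  if (Array.isArray(result.outputs) && result.outputs.length > 0) {
    return { ok: true, videoUrl: result.outputs[0], taskId: result.detail?.taskId };
  }
  // Fall back to a generic message if the error details are absent.
  const message = result.detail?.error?.message ?? "unknown error";
  return { ok: false, message };
}
```

The caller can then branch on `ok` without repeating the property checks.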
Resolution and Pricing
| Resolution | Cost per 5 seconds | Rate per second | Max length |
|---|---|---|---|
| 480p | $0.15 | $0.03/s | 10 minutes |
| 720p | $0.30 | $0.06/s | 10 minutes |
Minimum charge is 5 seconds. Video length is determined by the audio duration (up to 10 minutes).
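The pricing rules above can be turned into a quick estimate. The following is a rough sketch under stated assumptions: `estimateCostCents` is a hypothetical helper, it assumes billing is linear per second once the 5-second minimum is met, and it does not model how fractional seconds are rounded on the actual invoice.

```javascript
// Hypothetical estimator based on the pricing table above.
// Rates are kept in cents per second to avoid floating-point drift.
const RATE_CENTS_PER_SECOND = new Map([
  ["480p", 3], // $0.03/s
  ["720p", 6], // $0.06/s
]);
const MIN_BILLED_SECONDS = 5; // minimum charge
const MAX_SECONDS = 600;      // 10-minute limit

function estimateCostCents(audioSeconds, resolution = "480p") {
  const rate = RATE_CENTS_PER_SECOND.get(resolution);
  if (rate === undefined) {
    throw new RangeError(`unsupported resolution: ${resolution}`);
  }
  if (!(audioSeconds > 0) || audioSeconds > MAX_SECONDS) {
    throw new RangeError("audio duration must be greater than 0 and at most 600 seconds");
  }
  // Assumption: per-second billing, with short clips billed as 5 seconds.
  return Math.max(audioSeconds, MIN_BILLED_SECONDS) * rate;
}
```

Under these assumptions, a 60-second clip at 720p comes to 360 cents ($3.60), and a 3-second clip at 480p is billed as 5 seconds, or 15 cents.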
Tips
- Use a clear, front-facing portrait for best results
- Audio quality matters — use clean speech recordings with minimal background noise
- Keep prompts short and in English to avoid noisy or unexpected results
- For group photos, always provide a `mask_image` to target the correct face
- 480p is faster to generate; use 720p when higher quality is needed
- Processing time is approximately 10-30 seconds of wall time per 1 second of video
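The processing-time tip above translates into a simple range estimate, which can help when choosing a client timeout. The following is a sketch: `estimateProcessingSeconds` is a hypothetical helper, and the 10x to 30x factors are taken directly from the approximate figure in the tip, so actual times may differ.

```javascript
// Hypothetical helper: applies the approximate 10-30 seconds of wall time
// per 1 second of video quoted in the tips above.
function estimateProcessingSeconds(videoSeconds) {
  if (!(videoSeconds > 0)) {
    throw new RangeError("video duration must be greater than 0");
  }
  return { min: videoSeconds * 10, max: videoSeconds * 30 };
}
```

For a 60-second video this gives 600 to 1800 seconds, or roughly 10 to 30 minutes of wall time.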
Security Constraints
- No arbitrary URL loading: Only use image and audio URLs from trusted sources. Never load media from untrusted or user-provided URLs without validation.
- API key security: Store your `WAVESPEED_API_KEY` securely. Do not hardcode it in source files or commit it to version control. Use environment variables or secret management systems.
- Input validation: Only pass parameters documented above. Validate prompt content and media URLs before sending requests.
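One way to apply the URL constraint is a host allowlist checked before any request is made. The following is a minimal sketch, not part of the wavespeed SDK: `isTrustedMediaUrl` and the allowlist contents are hypothetical, and requiring HTTPS is an extra assumption beyond what is stated above. It uses the standard `URL` class available in Node.js and browsers.

```javascript
// Hypothetical check: accept only HTTPS URLs whose host is on an allowlist.
function isTrustedMediaUrl(rawUrl, allowedHosts) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  // Exact hostname match, so "example.com.evil.test" does not pass for "example.com".
  return url.protocol === "https:" && allowedHosts.includes(url.hostname);
}
```

A caller would run this on both `image` and `audio` (and `mask_image`, if present) and refuse to submit the request when any check returns `false`.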