AI Avatar & Talking Head Videos

Create AI avatars and talking-head videos with the inference.sh CLI.

Quick Start

curl -fsSL https://cli.inference.sh | sh && infsh login

Create avatar video from image + audio

infsh app run bytedance/omnihuman-1-5 --input '{ "image_url": "https://portrait.jpg", "audio_url": "https://speech.mp3" }'
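The --input JSON above is wrapped in single quotes, so shell variables inside it are not expanded. The following is a minimal POSIX sh sketch for building the payload from variables. json_escape and avatar_input are hypothetical helper names, not part of the infsh CLI. They escape only backslashes and double quotes, which assumes plain URLs rather than arbitrary text.

```shell
# Hypothetical helpers (not part of the infsh CLI): build the --input JSON
# from shell variables instead of hand-writing it inside single quotes.

# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Control characters such as newlines are NOT handled; intended for URLs only.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Print {"image_url": ..., "audio_url": ...} as used by the avatar apps.
avatar_input() {
  printf '{"image_url":"%s","audio_url":"%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}
```

For example, with IMAGE_URL and AUDIO_URL set in your shell:

infsh app run bytedance/omnihuman-1-5 --input "$(avatar_input "$IMAGE_URL" "$AUDIO_URL")"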

Available Models

Model              App ID                    Best For
OmniHuman 1.5      bytedance/omnihuman-1-5   Multi-character, best quality
OmniHuman 1.0      bytedance/omnihuman-1-0   Single character
Fabric 1.0         falai/fabric-1-0          Making a still image talk (lipsync)
PixVerse Lipsync   falai/pixverse-lipsync    Highly realistic lipsync
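If you script several of these apps, the table can be encoded as a small lookup. This is a sketch only: the function name avatar_app_for and the keywords multi, single, image, and realistic are made up for illustration. The app IDs come from the table.

```shell
# Hypothetical helper: map a need from the "Best For" column to an app ID.
avatar_app_for() {
  case "$1" in
    multi)     echo "bytedance/omnihuman-1-5" ;;  # multi-character, best quality
    single)    echo "bytedance/omnihuman-1-0" ;;  # single character
    image)     echo "falai/fabric-1-0" ;;         # make a still image talk
    realistic) echo "falai/pixverse-lipsync" ;;   # highly realistic lipsync
    *)         echo "unknown use case: $1" >&2; return 1 ;;
  esac
}
```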

Search Avatar Apps

infsh app list --search "omnihuman"
infsh app list --search "lipsync"
infsh app list --search "fabric"

Examples

OmniHuman 1.5 (Multi-Character)

infsh app run bytedance/omnihuman-1-5 --input '{ "image_url": "https://portrait.jpg", "audio_url": "https://speech.mp3" }'

Supports specifying which character to drive in multi-person images.

Fabric 1.0 (Image Talks)

infsh app run falai/fabric-1-0 --input '{ "image_url": "https://face.jpg", "audio_url": "https://audio.mp3" }'

PixVerse Lipsync

infsh app run falai/pixverse-lipsync --input '{ "image_url": "https://portrait.jpg", "audio_url": "https://speech.mp3" }'

Generates highly realistic lipsync from any audio.
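To choose between the image-plus-audio apps above, it can help to send the same inputs to each one and compare the results. The following sketch makes several assumptions. All three apps accept image_url and audio_url as in the examples above. Results are written to stdout, as the > speech.json redirect in the TTS workflow suggests. The per-app output file names are made up. DRY_RUN=1 is a convention of this sketch, not an infsh option. OmniHuman 1.0 is left out because its input format is not shown here.

```shell
# Sketch: send the same portrait and audio to each image+audio app above.
# Assumes all three accept image_url/audio_url as in the examples and print
# their result to stdout. URLs are inserted without JSON escaping.
# DRY_RUN=1 (a convention of this sketch, not an infsh option) prints each
# command instead of running it.
compare_avatar_apps() {
  payload=$(printf '{"image_url":"%s","audio_url":"%s"}' "$1" "$2")
  for app in bytedance/omnihuman-1-5 falai/fabric-1-0 falai/pixverse-lipsync; do
    # Hypothetical output name, e.g. falai/fabric-1-0 -> falai_fabric-1-0.json
    outfile="$(printf '%s' "$app" | tr '/' '_').json"
    if [ -n "${DRY_RUN:-}" ]; then
      printf "infsh app run %s --input '%s' > %s\n" "$app" "$payload" "$outfile"
    else
      infsh app run "$app" --input "$payload" > "$outfile"
    fi
  done
}
```

For example: compare_avatar_apps "https://portrait.jpg" "https://speech.mp3"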

Full Workflow: TTS + Avatar

1. Generate speech from text

infsh app run infsh/kokoro-tts --input '{ "text": "Welcome to our product demo. Today I will show you..." }' > speech.json

2. Create avatar video with the speech

infsh app run bytedance/omnihuman-1-5 --input '{ "image_url": "https://presenter-photo.jpg", "audio_url": "<audio-url-from-step-1>" }'
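Step 2 needs the audio URL produced in step 1. The following sketch pulls that URL out of speech.json automatically. It assumes the saved result contains the audio as an http(s) URL ending in a common audio extension. The exact output field is not documented here, so first_audio_url (a hypothetical helper) greps for the first such URL rather than reading a named key. If the JSON escapes slashes as \/ or the URL has no file extension, inspect speech.json and adjust the pattern.

```shell
# Hypothetical helper: print the first http(s) URL in a file that ends in a
# common audio extension, keeping an optional ?query suffix.
# ASSUMPTION: the TTS result stores the generated audio as such a URL;
# the exact JSON field name is not known, so no specific key is read.
first_audio_url() {
  grep -oE 'https?://[^"[:space:]]+\.(mp3|wav|m4a|ogg|flac)(\?[^"[:space:]]*)?' "$1" \
    | head -n 1
}
```

For example, after running step 1:

AUDIO_URL=$(first_audio_url speech.json)
infsh app run bytedance/omnihuman-1-5 --input "{\"image_url\": \"https://presenter-photo.jpg\", \"audio_url\": \"$AUDIO_URL\"}"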

Full Workflow: Dub Video in Another Language

1. Transcribe original video

infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://video.mp4"}' > transcript.json

2. Translate text (manually or with an LLM)

3. Generate speech in the new language

infsh app run infsh/kokoro-tts --input '{"text": "<translated-text>"}' > new_speech.json

4. Lipsync the original video with the new audio

infsh app run infsh/latentsync-1-6 --input '{ "video_url": "https://original-video.mp4", "audio_url": "<new-audio-url>" }'
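Translated text often contains apostrophes or double quotes (for example French l'équipe). An apostrophe ends the single-quoted --input string in step 3. The following minimal sketch builds the kokoro-tts payload from a text file instead. tts_input_from_file is a hypothetical helper. It escapes only backslashes, double quotes, and line breaks, so other control characters such as tabs are assumed to be absent.

```shell
# Hypothetical helper: read translated text from a file and print
# {"text": "..."} for infsh/kokoro-tts.
# Escapes backslashes and double quotes (sed), then joins lines with a
# literal \n sequence (awk). Tabs and other control characters are NOT escaped.
tts_input_from_file() {
  sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' "$1" |
    awk '
      { s = (NR == 1) ? $0 : (s "\\n" $0) }
      END { printf "{\"text\":\"%s\"}\n", s }
    '
}
```

For example, with the translation saved in a file named translated.txt:

infsh app run infsh/kokoro-tts --input "$(tts_input_from_file translated.txt)" > new_speech.json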

Use Cases

Marketing: Product demos with AI presenter
Education: Course videos, explainers
Localization: Dub content in multiple languages
Social Media: Consistent virtual influencer
Corporate: Training videos, announcements

Tips

Use high-quality portrait photos (front-facing, good lighting)
Audio should be clear with minimal background noise
OmniHuman 1.5 supports multiple people in one image
LatentSync (infsh/latentsync-1-6) is best for syncing existing videos to new audio

Related Skills

Full platform skill (all 150+ apps)

npx skills add inference-sh/skills@inference-sh

Text-to-speech (generate audio for avatars)

npx skills add inference-sh/skills@text-to-speech

Speech-to-text (transcribe for dubbing)

npx skills add inference-sh/skills@speech-to-text

Video generation

npx skills add inference-sh/skills@ai-video-generation

Image generation (create avatar images)

npx skills add inference-sh/skills@ai-image-generation
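To install all of the related skills at once, you can loop over the skill names above. install_avatar_skills and the DRY_RUN switch are illustrative only and are not part of the skills CLI. The loop stops at the first failed install.

```shell
# Hypothetical helper: install every related skill listed above, one per call.
# DRY_RUN=1 (a convention of this sketch) prints the commands instead.
install_avatar_skills() {
  for skill in inference-sh text-to-speech speech-to-text ai-video-generation ai-image-generation; do
    if [ -n "${DRY_RUN:-}" ]; then
      echo "npx skills add inference-sh/skills@$skill"
    else
      npx skills add "inference-sh/skills@$skill" || return 1
    fi
  done
}
```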

Browse all video apps: infsh app list --category video

Documentation

Running Apps - How to run apps via CLI
Content Pipeline Example - Building media workflows
Streaming Results - Real-time progress updates
