
# AI Avatar & Talking Head Videos

> **Safety Notice:** This listing is imported from the skills.sh public index metadata. Review the upstream SKILL.md and repository scripts before running anything.


Install the "ai-avatar-video" skill:

```shell
npx skills add inference-sh/skills@p-video-avatar
```


Create AI avatars and talking head videos via inference.sh CLI.

## Quick Start

Requires the inference.sh CLI (`belt`). Install it, then authenticate:

```shell
belt login
```

### Recommended: P-Video-Avatar (fastest, cheapest, built-in TTS)

```shell
belt app run pruna/p-video-avatar --input '{
  "image": "https://portrait.jpg",
  "voice_script": "Hello, welcome to our product demo!",
  "voice": "Zephyr (Female)"
}'
```

## Available Models

Start with P-Video-Avatar — it's 18x faster and 6x cheaper than alternatives, with built-in TTS, dynamic backgrounds, and 1080p support.

| Model | App ID | Best For | Built-in TTS |
|---|---|---|---|
| P-Video-Avatar | `pruna/p-video-avatar` | Best overall: speed, cost, quality, control | Yes (30 voices, 10 languages) |
| OmniHuman 1.5 | `bytedance/omnihuman-1-5` | Multi-character, audio-driven | No |
| Fabric 1.0 | `falai/fabric-1-0` | Image talks with lipsync | Yes |
| PixVerse Lipsync | `falai/pixverse-lipsync` | Highly realistic lipsync | No |

## Cost & Speed Comparison

| Model | Speed (per second of video) | Cost per second |
|---|---|---|
| P-Video-Avatar | ~1.83 s/s | $0.025 |
| OmniHuman 1.5 | ~28 s/s (15x slower) | $0.16 (6.4x more) |
| Fabric 1.0 | ~34 s/s (18x slower) | $0.14 (5.6x more) |
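To get a concrete sense of scale, the per-second figures above can be multiplied out for a clip of a given length. A quick shell sketch (numbers copied from the table):

```shell
# Estimate render time and cost for a 60-second clip from the
# per-second figures in the comparison table.
LEN=60
for row in "P-Video-Avatar 1.83 0.025" "OmniHuman-1.5 28 0.16" "Fabric-1.0 34 0.14"; do
  set -- $row
  awk -v model="$1" -v spd="$2" -v cost="$3" -v len="$LEN" \
    'BEGIN { printf "%-16s ~%.0f s render, $%.2f\n", model, spd * len, cost * len }'
done
```

For a 60-second clip that works out to roughly $1.50 on P-Video-Avatar versus $9.60 on OmniHuman 1.5 and $8.40 on Fabric 1.0.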

## Examples

### P-Video-Avatar (Recommended)

Generate avatar from portrait + text script with built-in TTS:

```shell
belt app run pruna/p-video-avatar --input '{
  "image": "https://portrait.jpg",
  "voice_script": "Welcome to our product walkthrough. Today I will show you three key features.",
  "voice": "Puck (Male)",
  "voice_language": "English (US)",
  "resolution": "720p"
}'
```

With custom style control:

```shell
belt app run pruna/p-video-avatar --input '{
  "image": "https://portrait.jpg",
  "voice_script": "This is exciting news!",
  "voice": "Aoede (Female)",
  "voice_prompt": "Enthusiastic and energetic tone",
  "video_prompt": "The person is presenting on stage with dramatic lighting",
  "resolution": "1080p"
}'
```

With audio file instead of TTS:

```shell
belt app run pruna/p-video-avatar --input '{
  "image": "https://portrait.jpg",
  "audio": "https://speech.mp3"
}'
```

### Full Workflow: Generate Portrait + Avatar

Use Pruna P-Image to generate the portrait, then create the avatar:

1. Generate a portrait image:

```shell
belt app run pruna/p-image --input '{
  "prompt": "professional headshot portrait of a young woman, neutral background, looking at camera, studio lighting, photorealistic",
  "aspect_ratio": "9:16"
}'
```

2. Create the avatar video with built-in TTS:

```shell
belt app run pruna/p-video-avatar --input '{
  "image": "<image-url-from-step-1>",
  "voice_script": "Hi there! Let me walk you through our latest features.",
  "voice": "Zephyr (Female)"
}'
```
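The two steps can be chained in a single script. Note that the JSON shape of `belt`'s output is not documented here, so the `.output.image` path below is an assumption; inspect the real output and adjust the `jq` filter accordingly.

```shell
# Chain portrait generation into avatar creation.
# ASSUMPTION: `belt app run` prints a JSON result with the image URL at
# .output.image -- check your CLI's actual output and adjust the jq path.
img=$(belt app run pruna/p-image --input '{
  "prompt": "professional headshot portrait, neutral background, studio lighting",
  "aspect_ratio": "9:16"
}' | jq -r '.output.image')

belt app run pruna/p-video-avatar --input "$(jq -n --arg image "$img" '{
  image: $image,
  voice_script: "Hi there! Let me walk you through our latest features.",
  voice: "Zephyr (Female)"
}')"
```

Building the second payload with `jq -n --arg` keeps the URL correctly quoted even if it contains characters that would break naive string interpolation.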

### OmniHuman 1.5 (Multi-Character)

```shell
belt app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```

Supports specifying which character to drive in multi-person images.

### Fabric 1.0 (Image Talks)

```shell
belt app run falai/fabric-1-0 --input '{
  "image_url": "https://face.jpg",
  "audio_url": "https://audio.mp3"
}'
```

### PixVerse Lipsync

```shell
belt app run falai/pixverse-lipsync --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```

### Full Workflow: TTS + Avatar (Non-TTS Models)

For models without built-in TTS, generate speech first:

1. Generate speech from text:

```shell
belt app run infsh/kokoro-tts --input '{
  "prompt": "Welcome to our product demo. Today I will show you..."
}' > speech.json
```

2. Create the avatar video with the speech:

```shell
belt app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://presenter-photo.jpg",
  "audio_url": "<audio-url-from-step-1>"
}'
```
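Since step 1 saves the result to `speech.json`, the audio URL can be pulled out of that file instead of copied by hand. The `.output.audio` path is an assumption about the file's layout; inspect `speech.json` and adjust if it differs.

```shell
# Extract the TTS audio URL from the saved result and feed it to the
# avatar model. ASSUMPTION: speech.json exposes the URL at .output.audio.
audio=$(jq -r '.output.audio' speech.json)

belt app run bytedance/omnihuman-1-5 --input "$(jq -n --arg a "$audio" '{
  image_url: "https://presenter-photo.jpg",
  audio_url: $a
}')"
```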

### Full Workflow: Dub Video in Another Language

1. Transcribe the original video:

```shell
belt app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://video.mp4"}' > transcript.json
```

2. Translate text (manually or with an LLM)

3. Generate speech in the new language:

```shell
belt app run infsh/kokoro-tts --input '{"text": "<translated-text>"}' > new_speech.json
```

4. Lipsync the original video with the new audio:

```shell
belt app run infsh/latentsync-1-6 --input '{
  "video_url": "https://original-video.mp4",
  "audio_url": "<new-audio-url>"
}'
```

## Use Cases

- **Marketing:** Product demos with an AI presenter
- **Education:** Course videos, explainers
- **Localization:** Dub content in multiple languages
- **Social Media:** A consistent virtual influencer
- **Corporate:** Training videos, announcements
- **Gaming:** Character avatars, NPC dialogue

## Tips

- Use high-quality portrait photos (front-facing, good lighting)
- Audio should be clear with minimal background noise
- P-Video-Avatar has built-in TTS, so no separate speech generation step is needed
- P-Video-Avatar output aspect ratio matches the input image
- Generate portraits with `pruna/p-image` at 9:16 aspect ratio for vertical videos
- OmniHuman 1.5 supports multiple people in one image
- LatentSync is best for syncing existing videos to new audio

## Related Skills

- Dedicated P-Video-Avatar skill: `npx skills add inference-sh/skills@p-video-avatar`
- Full platform skill (all 250+ apps): `npx skills add inference-sh/skills@infsh-cli`
- Text-to-speech (generate audio for non-TTS avatar models): `npx skills add inference-sh/skills@text-to-speech`
- Speech-to-text (transcribe for dubbing): `npx skills add inference-sh/skills@speech-to-text`
- Video generation: `npx skills add inference-sh/skills@ai-video-generation`
- Image generation (create avatar images): `npx skills add inference-sh/skills@ai-image-generation`

Browse all video apps: `belt app list --category video`

## Documentation

- Running Apps: how to run apps via the CLI
- Content Pipeline Example: building media workflows
- Streaming Results: real-time progress updates
