UNITH Digital Humans Skill

Create, configure, update, and deploy AI-powered Digital Human avatars using the UNITH API.

Quick Overview

UNITH digital humans are AI avatars that can speak, converse, and interact with users. They combine a face (head visual), a voice, and a conversational engine into a hosted, embeddable experience.

Base API URL: https://platform-api.unith.ai
Docs: https://docs.unith.ai

Prerequisites

The user must supply the following credentials (stored as environment variables):

Variable	Description	How to obtain
`UNITH_EMAIL`	Account email	Register at https://unith.ai
`UNITH_SECRET_KEY`	Non-expiring secret key	UNITH dashboard → Manage Account → "Secret Key" section → Generate

⚠️ The secret key is displayed only once. If lost, the user must delete and regenerate it.
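
Before any script runs, it can help to confirm that both variables are actually set. The following is a minimal sketch; `unith_check_env` is a hypothetical helper name and is not one of the skill's scripts.

```shell
# Sketch: fail early if either credential variable is missing or empty.
# unith_check_env is a hypothetical helper, not part of the skill's scripts.
unith_check_env() {
  _missing=0
  for _name in UNITH_EMAIL UNITH_SECRET_KEY; do
    # Indirect lookup of the variable whose name is stored in $_name.
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "Missing required environment variable: $_name" >&2
      _missing=1
    fi
  done
  return "$_missing"
}

# Usage:
#   unith_check_env && source scripts/auth.sh
```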

Authentication

All API calls require a Bearer token (valid 7 days). Use the auth script:

source scripts/auth.sh

This validates credentials, retries on network errors, and exports UNITH_TOKEN. On failure, it prints specific guidance (wrong key, expired token, etc.).
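
Because the token is valid for 7 days and the auth script caches it for 6, a caller can check whether a cached token is likely still usable before re-authenticating. The sketch below is an assumption about how such a check could look: it uses the cache file's modification time, and the real `auth.sh` may store or verify the token differently. `unith_cache_is_fresh` is a hypothetical name.

```shell
# Sketch: treat the cache as fresh if it exists, is non-empty, and was
# modified less than 6 days ago. The real auth.sh may use a different check.
unith_cache_is_fresh() {
  # An explicitly empty UNITH_TOKEN_CACHE disables caching, so keep it empty.
  _cache="${1:-${UNITH_TOKEN_CACHE-/tmp/.unith_token_cache}}"
  _max_age_min=$((6 * 24 * 60))
  [ -s "$_cache" ] || return 1
  # find prints the path only when the file's mtime falls inside the window.
  [ -n "$(find "$_cache" -mmin "-$_max_age_min" 2>/dev/null)" ]
}

# Usage:
#   unith_cache_is_fresh || source scripts/auth.sh
```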

Workflow: Creating a Digital Human

Step 1: Choose an Operating Mode

Ask the user what they want the digital human to do, then map their answer to one of the five modes below:

Mode	`operationMode` value	Use case	Output
Text-to-Video	`ttt`	Generate an MP4 video of the avatar speaking provided text	MP4 file
Open Dialogue	`oc`	Free-form conversational avatar guided by a system prompt	Hosted conversational URL
Document Q&A	`doc_qa`	Avatar answers questions from uploaded documents	Hosted conversational URL
Voiceflow	`voiceflow`	Guided conversation flow via Voiceflow	Hosted conversational URL
Plugin	`plugin`	Connect any external LLM or conversational engine via webhook	Hosted conversational URL

Complexity spectrum (simple → sophisticated):

Simplest: ttt — just text in, video out. No knowledge base needed.
Standard: oc — conversational with a system prompt. Good for general assistants.
Knowledge-grounded: doc_qa — upload documents, avatar answers from them. Best for support/FAQ.
Workflow-driven: voiceflow — structured conversation paths. Requires Voiceflow account.
Most flexible: plugin — BYO conversational engine. Maximum control.
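
The five `operationMode` values above can be checked before a payload is built. This is a small sketch with hypothetical helper names (`unith_is_valid_mode`, `unith_mode_output`); the output descriptions are taken from the mode table.

```shell
# Sketch: validate an operationMode value against the five documented modes.
unith_is_valid_mode() {
  case "${1:-}" in
    ttt|oc|doc_qa|voiceflow|plugin) return 0 ;;
    *) return 1 ;;
  esac
}

# Sketch: report what kind of output a mode produces, per the mode table.
unith_mode_output() {
  unith_is_valid_mode "${1:-}" || {
    echo "Unknown operationMode: ${1:-}" >&2
    return 1
  }
  if [ "$1" = "ttt" ]; then
    echo "MP4 file"
  else
    echo "Hosted conversational URL"
  fi
}
```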

Step 2: List Available Faces

bash scripts/list-resources.sh faces

Each face has an id (used as headVisualId in creation). Faces can be:

Public: Available to all organizations
Private: Available only to the user's organization
Custom (BYOF): User uploads a video of a real person (currently managed by UNITH)

Present the available faces to the user and let them choose.

Step 3: List Available Voices

bash scripts/list-resources.sh voices

Voices come from three providers: elevenlabs, azure, and audiostack. Present the options to the user. Each voice has a performance ranking; prefer faster voices for real-time conversation.

Step 4: Create the Digital Human

Build a JSON payload file (see references/api-payloads.md for the schema per mode), then:

bash scripts/create-head.sh payload.json --dry-run   # validate first
bash scripts/create-head.sh payload.json              # create

The script validates required fields, checks mode-specific requirements, retries on server errors, and prints the publicUrl on success.
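
As an illustration, a payload file could be written like this. It is a sketch only: it uses the four field names that appear in this document (`operationMode`, `headVisualId`, `ttsVoice`, `greetings`), and whether they are sufficient or correctly placed for a given mode is an assumption. The authoritative schema is in references/api-payloads.md. `unith_write_payload` is a hypothetical helper.

```shell
# Sketch: write a minimal payload file.
# Assumes the values contain no double quotes or backslashes (no JSON escaping).
unith_write_payload() {
  _out="$1"; _mode="$2"; _face="$3"; _voice="$4"; _greeting="$5"
  cat > "$_out" <<EOF
{
  "operationMode": "$_mode",
  "headVisualId": "$_face",
  "ttsVoice": "$_voice",
  "greetings": "$_greeting"
}
EOF
}

# Usage (the face id comes from Step 2, the voice from Step 3):
#   unith_write_payload payload.json oc "<faceId>" rachel "Hi!"
#   bash scripts/create-head.sh payload.json --dry-run
```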

Step 5 (doc_qa only): Upload Knowledge Document

For doc_qa mode, the digital human needs a knowledge document:

bash scripts/upload-document.sh <headId> /path/to/document.pdf

The script checks file existence/size, uses a longer timeout for uploads, and provides guidance on next steps.
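
The kind of existence and size check described above might look like the following sketch. The actual size limit is not stated in this document, so the limit is a caller-supplied, hypothetical parameter, and `unith_check_upload_file` is a hypothetical name.

```shell
# Sketch: pre-flight check for a document before uploading.
# $1 = file path, $2 = optional size limit in bytes (0 or omitted = no limit).
unith_check_upload_file() {
  _file="${1:-}"
  _max_bytes="${2:-0}"
  if [ ! -f "$_file" ]; then
    echo "File not found: $_file" >&2
    return 1
  fi
  # wc -c may pad its output with spaces on some systems; strip them.
  _size=$(wc -c < "$_file" | tr -d ' ')
  if [ "$_size" -eq 0 ]; then
    echo "File is empty: $_file" >&2
    return 1
  fi
  if [ "$_max_bytes" -gt 0 ] && [ "$_size" -gt "$_max_bytes" ]; then
    echo "File is $_size bytes, over the $_max_bytes byte limit" >&2
    return 1
  fi
  return 0
}

# Usage:
#   unith_check_upload_file /path/to/document.pdf && \
#     bash scripts/upload-document.sh "<headId>" /path/to/document.pdf
```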

Step 6: Test and Iterate

The digital human is live at the publicUrl from Step 4. The user should:

Visit the URL and test the conversation
Update configuration as needed (see below)

Updating a Digital Human

Use the update script to modify any parameter except the face (changing the face requires creating a new head):

bash scripts/update-head.sh <headId> updates.json                         # from a JSON file
bash scripts/update-head.sh <headId> --field ttsVoice=rachel              # single field
bash scripts/update-head.sh <headId> --field ttsVoice=rachel --field greetings="Hi!"  # multiple fields
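
Each `--field` argument is a `key=value` pair. How `update-head.sh` parses these internally is not shown here; the sketch below only illustrates one common way to split such a pair in POSIX shell, so that a value containing `=` stays intact. `unith_split_field` is a hypothetical name.

```shell
# Sketch: split "key=value" at the first "=" and print "key<TAB>value".
unith_split_field() {
  _pair="${1:-}"
  case "$_pair" in
    *=*) ;;
    *) echo "Expected key=value, got: $_pair" >&2; return 1 ;;
  esac
  _key="${_pair%%=*}"   # text before the first "="
  _val="${_pair#*=}"    # text after the first "=" (may contain more "=")
  if [ -z "$_key" ]; then
    echo "Empty field name in: $_pair" >&2
    return 1
  fi
  printf '%s\t%s\n' "$_key" "$_val"
}
```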

Listing Existing Digital Humans

bash scripts/list-resources.sh heads           # list all
bash scripts/list-resources.sh head <headId>   # get details for one

Deleting a Digital Human

bash scripts/delete-head.sh <headId> --confirm     # always use --confirm in automated/agent contexts

This permanently removes the digital human and cannot be undone.

Agent note: Always pass --confirm when calling this script. Without it, the script prompts for interactive input and will hang.
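
One way for an agent to avoid the hang is to call the script only through a wrapper that always appends `--confirm`. This is a sketch: `unith_delete_head` and the `UNITH_DELETE_CMD` override are hypothetical and exist only to make the wrapper easy to try without deleting anything.

```shell
# Sketch: non-interactive delete wrapper that always passes --confirm.
# UNITH_DELETE_CMD is a hypothetical override, e.g. set it to "echo" to preview.
: "${UNITH_DELETE_CMD:=bash scripts/delete-head.sh}"

unith_delete_head() {
  _head_id="${1:-}"
  if [ -z "$_head_id" ]; then
    echo "Usage: unith_delete_head <headId>" >&2
    return 2
  fi
  # Left unquoted on purpose so "bash scripts/delete-head.sh" splits into words.
  $UNITH_DELETE_CMD "$_head_id" --confirm
}
```

Because deletion cannot be undone, the wrapper refuses to run without an explicit head id.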

Embedding

Digital humans can be embedded in websites/apps. See references/embedding.md for code snippets and configuration options.

Scripts

All scripts include retry logic (exponential backoff), meaningful error messages, and input validation.

Script	Purpose
`scripts/_utils.sh`	Shared utilities: retry wrapper, colored logging, error parsing
`scripts/auth.sh`	Authenticate and export `UNITH_TOKEN` (with 6-day token caching)
`scripts/list-resources.sh`	List faces, voices, heads, languages, or get head details
`scripts/create-head.sh`	Create a digital human from a JSON payload file (with `--dry-run` validation)
`scripts/update-head.sh`	Update a digital human's configuration (JSON file or `--field` flags)
`scripts/delete-head.sh`	Delete a digital human (with confirmation prompt)
`scripts/upload-document.sh`	Upload knowledge document to a `doc_qa` head

Configuration via environment variables:

UNITH_MAX_RETRIES — max retry attempts (default: 3)
UNITH_RETRY_DELAY — initial delay between retries in seconds (default: 2, doubles each retry)
UNITH_CURL_TIMEOUT — curl timeout in seconds (default: 30, 120 for uploads)
UNITH_CONNECT_TIMEOUT — connection timeout in seconds (default: 10)
UNITH_TOKEN_CACHE — token cache file path (default: /tmp/.unith_token_cache, set empty to disable)

Detailed API Reference

For full payload schemas, configuration parameters, and mode-specific details:

Read references/api-payloads.md      # Full request/response schemas per mode
Read references/configuration.md     # All configurable parameters
Read references/embedding.md         # Embedding code and options

Common Patterns

"I want a quick video of someone saying X" → ttt mode, minimal config
"I want a customer support avatar" → doc_qa mode with knowledge docs
"I want an AI sales rep" → oc mode with a sales personality prompt
"I want to connect my own LLM" → plugin mode with webhook URL
"I want a guided onboarding flow" → voiceflow mode with Voiceflow API key

Information to Collect from the User

Before creating, ask for:

Purpose / use case → determines operating mode
Face preference → list available faces for selection
Voice preference → language, accent, gender, speed priority
Alias → display name for the digital human
Language → speech recognition and UI language (e.g., en-US, es-ES)
Greeting message → initial message the avatar says
System prompt (for oc/doc_qa) → personality and behavior instructions
Knowledge documents (for doc_qa) → files to upload
Voiceflow API key (for voiceflow) → from their Voiceflow account
Plugin URL (for plugin) → webhook endpoint for their custom engine
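
The checklist above can be expressed as a small lookup so that only the inputs relevant to the chosen mode are requested. This is a sketch based solely on the list in this section; `unith_inputs_for_mode` is a hypothetical helper, and the labels it prints are descriptions, not API field names.

```shell
# Sketch: print the inputs to collect for a given operationMode, one per line.
unith_inputs_for_mode() {
  case "${1:-}" in
    ttt|oc|doc_qa|voiceflow|plugin) ;;
    *) echo "Unknown operationMode: ${1:-}" >&2; return 1 ;;
  esac
  # Inputs listed for every mode.
  printf '%s\n' "purpose" "face" "voice" "alias" "language" "greeting"
  # Mode-specific inputs.
  case "$1" in
    oc)        printf '%s\n' "system prompt" ;;
    doc_qa)    printf '%s\n' "system prompt" "knowledge documents" ;;
    voiceflow) printf '%s\n' "Voiceflow API key" ;;
    plugin)    printf '%s\n' "plugin URL" ;;
  esac
  return 0
}
```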
