Multimodal Content Creator
A WhatsApp-powered content creation workflow that lets customers send text or voice messages and receive AI-generated images in return.
How It Works
- Receive a WhatsApp message (text or voice note)
- Transcribe voice notes using OpenAI Whisper
- Generate an image from the prompt using DALL-E 3
- Send the generated image back to the customer as a WhatsApp reply
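The four steps above can be sketched as a single dispatch function. This is a minimal sketch, not the code in `scripts/workflow.py`. The `IncomingMessage` type and the `transcribe`, `generate_image`, and `send_image` callables are hypothetical stand-ins for the Whisper, DALL-E 3, and WhatsApp calls. They are injected as parameters so the control flow can be read and tested without network access.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IncomingMessage:
    """Hypothetical message shape; the format used by wacli.py may differ."""
    sender: str
    text: Optional[str] = None        # set for text messages
    audio_path: Optional[str] = None  # set for voice notes (downloaded audio file)


def handle_message(
    msg: IncomingMessage,
    transcribe: Callable[[str], str],        # audio path -> transcript (e.g. Whisper)
    generate_image: Callable[[str], str],    # prompt -> image URL or path (e.g. DALL-E 3)
    send_image: Callable[[str, str], None],  # (recipient, image) -> None (WhatsApp reply)
) -> str:
    """Run one message through the pipeline and return the prompt that was used."""
    # Voice notes are transcribed first; text messages are used as the prompt directly.
    if msg.audio_path is not None:
        prompt = transcribe(msg.audio_path)
    elif msg.text:
        prompt = msg.text
    else:
        raise ValueError("message has neither text nor audio")

    prompt = prompt.strip()
    if not prompt:
        raise ValueError("empty prompt after transcription")

    # Generate an image from the prompt, then reply to the original sender.
    image = generate_image(prompt)
    send_image(msg.sender, image)
    return prompt
```

In the real workflow the three callables would wrap the OpenAI and WhatsApp clients. In tests they can be plain lambdas that record what would have been sent.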
Prerequisites
- OpenAI API key set as the `OPENAI_API_KEY` environment variable
- WhatsApp CLI authentication (`python wacli.py login <token>`)
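The API key prerequisite can be checked up front so the scripts fail fast with a clear message instead of failing partway through a batch. The sketch below is minimal and assumes the scripts read `OPENAI_API_KEY` from the environment. The official `openai` Python SDK also reads this variable by default when no key is passed explicitly. The helper name `require_openai_key` is hypothetical.

```python
import os
from typing import Mapping, Optional


def require_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI API key, or exit with a clear message (hypothetical helper)."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise SystemExit(
            "OPENAI_API_KEY is not set; export it before running the scripts."
        )
    return key
```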
Usage
# Process all unread WhatsApp messages
python scripts/workflow.py process-all
# Generate a single image
python scripts/generate_images.py "a cat riding a skateboard"
# Batch generate from prompts file
python scripts/generate_images.py prompts.txt
# Transcribe an audio file
python scripts/transcribe.py recording.mp3
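The `generate_images.py` examples above pass either a literal prompt or the name of a prompts file. The following sketch shows one way that dispatch could work. It assumes a prompts file holds one prompt per line, with blank lines and lines starting with `#` skipped. The file format and the function name `load_prompts` are assumptions, not taken from the script.

```python
import os
from typing import List


def load_prompts(arg: str) -> List[str]:
    """Treat `arg` as a prompts file if it exists, otherwise as a single prompt."""
    # os.path.isfile returns False (rather than raising) for over-long names,
    # so a long free-text prompt is safely treated as a prompt.
    if os.path.isfile(arg):
        prompts = []
        with open(arg, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                # Assumed format: one prompt per line; skip blanks and '#' comments.
                if line and not line.startswith("#"):
                    prompts.append(line)
        return prompts
    return [arg.strip()]
```

Checking for an existing file first means a prompt that happens to match a file name is read as a file. A script that needs to avoid that ambiguity could use an explicit flag instead.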
Files
- `scripts/workflow.py`: main orchestration script
- `scripts/generate_images.py`: DALL-E 3 image generation
- `scripts/transcribe.py`: Whisper audio transcription (with chunking for large files)
- `scripts/wacli.py`: WhatsApp CLI client
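`transcribe.py` is described as chunking large files. OpenAI's transcription endpoint limits uploads to 25 MB, so long recordings have to be split before they are sent. The sketch below only plans the split. It assumes a roughly constant bitrate, which is exact for CBR audio and approximate otherwise. It then computes equal time ranges whose estimated size stays below the limit with a safety margin. The function name, the 0.9 margin, and the time-based strategy are assumptions. The actual cutting would be done by a tool such as ffmpeg, and the script's own chunking may differ.

```python
import math
from typing import List, Tuple

# Documented as 25 MB; read conservatively here as 25,000,000 bytes.
MAX_UPLOAD_BYTES = 25_000_000


def plan_chunks(
    duration_s: float,
    file_size_bytes: int,
    max_bytes: int = MAX_UPLOAD_BYTES,
    margin: float = 0.9,
) -> List[Tuple[float, float]]:
    """Split [0, duration_s] into equal time ranges whose estimated size fits the limit.

    Assumes bytes are spread evenly over time (constant bitrate).
    """
    if duration_s <= 0 or file_size_bytes <= 0:
        raise ValueError("duration and size must be positive")
    if not 0 < margin <= 1:
        raise ValueError("margin must be in (0, 1]")
    budget = max_bytes * margin                          # usable bytes per chunk
    n = max(1, math.ceil(file_size_bytes / budget))      # number of chunks needed
    step = duration_s / n
    # Pin the final boundary to duration_s so float rounding cannot drop the tail.
    bounds = [i * step for i in range(n)] + [duration_s]
    return list(zip(bounds[:-1], bounds[1:]))
```

For a 600-second, 50,000,000-byte recording this plans three 200-second chunks of about 16.7 MB each. Each chunk would then be transcribed separately and the transcripts joined in order.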