multimodal-content-creator

Multimodal content creation workflow — receive WhatsApp messages (text or voice), transcribe audio via Whisper, generate images with DALL-E 3, and reply automatically.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "multimodal-content-creator" with this command: npx skills add terrycarter1985/multimodal-content-creator

Multimodal Content Creator

A WhatsApp-powered content creation workflow that lets customers send text or voice messages and receive AI-generated images in return.

How It Works

1. Receive a WhatsApp message (text or voice note).
2. If the message is a voice note, transcribe it with OpenAI Whisper; text messages are used as the prompt directly.
3. Generate an image from the prompt using DALL-E 3.
4. Send the generated image back to the customer.
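
The steps above can be sketched as a single function. This is a minimal illustration under stated assumptions, not the skill's actual code: the `Message` type and the `transcribe`, `generate_image`, and `send_image` callables are hypothetical stand-ins for whatever scripts/transcribe.py, scripts/generate_images.py, and scripts/wacli.py really expose.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Message:
    """Hypothetical shape of an incoming WhatsApp message."""
    sender: str
    text: Optional[str] = None        # set for text messages
    audio_path: Optional[str] = None  # set for voice notes


def handle_message(
    msg: Message,
    transcribe: Callable[[str], str],
    generate_image: Callable[[str], str],
    send_image: Callable[[str, str], None],
) -> str:
    """Run one message through the pipeline and return the prompt used."""
    # Step 2: voice notes are transcribed; text messages are used as-is.
    if msg.audio_path is not None:
        prompt = transcribe(msg.audio_path)
    else:
        prompt = msg.text or ""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("message contains no usable prompt")
    # Step 3: turn the prompt into an image reference (a path or URL, by assumption).
    image_ref = generate_image(prompt)
    # Step 4: send the image back to the original sender.
    send_image(msg.sender, image_ref)
    return prompt
```

Passing the three stages in as callables keeps the orchestration testable without network access; real implementations would call the OpenAI API and the WhatsApp client.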

Prerequisites

An OpenAI API key, set in the OPENAI_API_KEY environment variable
WhatsApp CLI authentication (python scripts/wacli.py login <token>)
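
A script can fail early with a clear message when the key is missing, rather than partway through a batch. The helper below is a hypothetical sketch; the listing does not say whether the skill's scripts perform a check like this.

```python
import os
from typing import Mapping


def require_openai_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key, or raise with a clear message if it is missing."""
    # Treat an empty or whitespace-only value the same as an unset variable.
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the scripts"
        )
    return key
```

Accepting the environment as a parameter makes the check easy to exercise with a plain dictionary.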

Usage

# Process all unread WhatsApp messages
python scripts/workflow.py process-all

# Generate a single image
python scripts/generate_images.py "a cat riding a skateboard"

# Batch generate from prompts file
python scripts/generate_images.py prompts.txt

# Transcribe an audio file
python scripts/transcribe.py recording.mp3
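
The image script is shown accepting either a literal prompt or a prompts file. One plausible way to tell the two apart is sketched below, assuming the file holds one prompt per line; `load_prompts` is a hypothetical name, and the real scripts/generate_images.py may decide differently.

```python
import os
from typing import List


def load_prompts(arg: str) -> List[str]:
    """Return prompts from a file (one per line), or the argument itself."""
    # os.path.isfile returns False instead of raising when the argument
    # is not a usable path, such as a long free-text prompt.
    if os.path.isfile(arg):
        with open(arg, encoding="utf-8") as handle:
            lines = handle.read().splitlines()
        # Skip blank lines so a trailing newline does not yield an empty prompt.
        return [line.strip() for line in lines if line.strip()]
    return [arg]
```

With this approach, a prompt that happens to match an existing file name would be read as a file, which is the main trade-off of inferring the mode from the argument.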

Files

scripts/workflow.py — Main orchestration script
scripts/generate_images.py — DALL-E 3 image generation
scripts/transcribe.py — Whisper audio transcription (with chunking for large files)
scripts/wacli.py — WhatsApp CLI client
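
The chunking mentioned for scripts/transcribe.py matters because OpenAI's transcription endpoint limits uploads to 25 MB at the time of writing, so long recordings have to be split first. The sketch below only computes fixed-length time windows; cutting the audio itself (for example with ffmpeg) is left out. The function name and the 10-minute default are assumptions, and the listing does not say how the script actually chunks.

```python
from typing import List, Tuple


def chunk_windows(
    duration_s: float, chunk_s: float = 600.0
) -> List[Tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) windows.

    Each window is at most chunk_s seconds long; the last may be shorter.
    """
    if duration_s < 0 or chunk_s <= 0:
        raise ValueError("duration_s must be >= 0 and chunk_s must be > 0")
    windows: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, float(duration_s))
        windows.append((start, end))
        start = end
    return windows
```

Each window would then be exported to its own file, transcribed separately, and the resulting texts joined in order.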

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

Bilibili Notion Pipeline Skill

Skill-first Bilibili to Notion pipeline. Download a Bilibili/b23 video, transcribe audio, upload the mp4, create or update a Notion transcript page, write tr...

Automation

Whisper Voice Transcription (whisper.cpp)

Build and use whisper.cpp for local speech-to-text workflows, with optional cloud fallback when local transcription is not practical.

Research

Tracked Video Analysis

Analyze local or linked video files and convert them into structured summaries of features, functions, workflows, or topics. Use when a user wants a walkthro...

General

Speech to Text

Transcribe or translate audio files to text using a public Hugging Face Whisper Space over Gradio. Use when the user sends voice notes, audio attachments, me...
