multimodal-content-creator

Multimodal content creation workflow: receive WhatsApp messages (text or voice), transcribe audio with Whisper, generate images with DALL-E 3, and reply automatically.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "multimodal-content-creator" with this command: npx skills add terrycarter1985/multimodal-content-creator

Multimodal Content Creator

A WhatsApp-powered content creation workflow that lets customers send text or voice messages and receive AI-generated images in return.

How It Works

  1. Receive a WhatsApp message (text or voice note)
  2. Transcribe voice notes using OpenAI Whisper
  3. Generate an image from the prompt using DALL-E 3
  4. Reply to the customer with the generated image
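
The four steps above can be sketched as a single handler. This is a minimal illustration, not the code in scripts/workflow.py: the IncomingMessage type and the handle_message function are hypothetical names, and the three callables stand in for whatever the real scripts use to transcribe audio, generate an image, and send a reply.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IncomingMessage:
    """Hypothetical shape of a WhatsApp message; not taken from wacli.py."""
    sender: str
    text: Optional[str] = None        # set for text messages
    audio_path: Optional[str] = None  # set for voice notes


def handle_message(
    msg: IncomingMessage,
    transcribe: Callable[[str], str],
    generate_image: Callable[[str], str],
    send_image: Callable[[str, str], None],
) -> str:
    """Run steps 2 to 4 for one message and return the prompt that was used."""
    # Step 2: voice notes are transcribed first; text messages are used as-is.
    if msg.audio_path is not None:
        prompt = transcribe(msg.audio_path)
    else:
        prompt = msg.text or ""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("message has no usable prompt")
    # Step 3: turn the prompt into an image reference (URL or local path).
    image_ref = generate_image(prompt)
    # Step 4: reply to the original sender with the image.
    send_image(msg.sender, image_ref)
    return prompt
```

Passing the three operations in as callables keeps the control flow testable without network access; in a real run they would wrap the Whisper, DALL-E 3, and WhatsApp calls.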

Prerequisites

  • An OpenAI API key set in the OPENAI_API_KEY environment variable
  • An authenticated WhatsApp CLI session (python scripts/wacli.py login <token>)
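
A quick way to confirm the first prerequisite before running anything is a small check like the one below. It is a sketch only: require_openai_key is a hypothetical helper, not a function from the skill's scripts, and the optional env argument exists purely so the check can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def require_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key from OPENAI_API_KEY, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the scripts"
        )
    return key
```

Failing early with an explicit message is usually easier to diagnose than an authentication error raised halfway through a batch.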

Usage

# Process all unread WhatsApp messages
python scripts/workflow.py process-all

# Generate a single image
python scripts/generate_images.py "a cat riding a skateboard"

# Batch-generate images from a prompts file
python scripts/generate_images.py prompts.txt

# Transcribe an audio file
python scripts/transcribe.py recording.mp3
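
The examples above pass generate_images.py either a literal prompt or a prompts file. One plausible way to tell the two apart is sketched below. It assumes the prompts file holds one prompt per line, which the listing does not state, and resolve_prompts is a hypothetical name rather than a function from the script.

```python
from pathlib import Path
from typing import List


def resolve_prompts(arg: str) -> List[str]:
    """Treat arg as a prompts file if it names an existing file, else as one prompt."""
    try:
        is_file = Path(arg).is_file()
    except (OSError, ValueError):
        # For example, a very long prompt that is not a valid file name.
        is_file = False
    if not is_file:
        return [arg]
    # Assumption: one prompt per line; blank lines are skipped.
    lines = Path(arg).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

With this approach, "a cat riding a skateboard" yields a single prompt, while prompts.txt yields one prompt for each non-blank line, provided the file exists.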

Files

  • scripts/workflow.py — Main orchestration script
  • scripts/generate_images.py — DALL-E 3 image generation
  • scripts/transcribe.py — Whisper audio transcription (with chunking for large files)
  • scripts/wacli.py — WhatsApp CLI client
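
The listing says transcribe.py chunks large files but does not describe how. The sketch below shows one way chunk boundaries could be planned. It assumes a per-request upload limit (25 MB is used here as an illustrative default) and a roughly constant bitrate, so that file size scales with duration; plan_chunks is a hypothetical helper, not the script's actual logic.

```python
import math
from typing import List, Tuple


def plan_chunks(
    duration_s: float,
    size_bytes: int,
    max_bytes: int = 25 * 1024 * 1024,  # assumed upload limit, not from the source
) -> List[Tuple[float, float]]:
    """Split [0, duration_s] into equal (start, end) windows in seconds.

    The number of windows is chosen so that, at constant bitrate, each
    window's audio is at most max_bytes.
    """
    if duration_s <= 0 or size_bytes <= 0:
        return []
    n_chunks = max(1, math.ceil(size_bytes / max_bytes))
    step = duration_s / n_chunks
    return [
        (i * step, min((i + 1) * step, duration_s))
        for i in range(n_chunks)
    ]
```

Each window would then be cut from the source audio, transcribed separately, and the resulting text joined in order. Equal-length windows can split a word at a boundary, so a real implementation might cut at silences or overlap the windows slightly.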

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
