ergon

AI media generation CLI tool using Google's Imagen 4, Veo 3.1, and Gemini TTS. Use when the user wants to (1) generate images from text prompts, (2) edit existing images with AI, (3) explain image contents, (4) generate videos from text or images, (5) create narration/voice audio with character settings. Triggers on requests like "generate an image of...", "create a video...", "make a voice that says...", "edit this image to...", "describe this image".

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "ergon" with this command: npx skills add hirokidaichi/ergon/hirokidaichi-ergon-ergon

ergon - AI Media Generation CLI

Note: Run with npx ergon if not installed globally.

Quick Reference

npx ergon image gen "<prompt>" -t <style> -a <ratio>   # Image generation
npx ergon image edit <file> "<instruction>"            # Image editing
npx ergon video gen "<prompt>" [-i <image>]            # Video with audio
npx ergon narration gen "<text>" -c "<character>"      # Voice generation

Image Generation

npx ergon image gen [options] <theme>

Style Selection Guide

| Use Case | Style (-t) | Aspect (-a) |
| --- | --- | --- |
| Product photo, landscape | realistic | 16:9, 4:3 |
| Character, mascot | anime, illustration | 1:1, 3:4 |
| Icon, logo | flat, minimal | 1:1 |
| Art, poster | watercolor, oil-painting, pop-art | varies |
| Game asset | pixel-art, 3d-render | 1:1 |
| Business, presentation | corporate | 16:9 |
| Concept sketch | sketch | varies |
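The guide above can be turned into a small lookup so scripts do not have to repeat the style and ratio choices. This is only a sketch: the function name `ergon_image_for` and the `ERGON` override variable are hypothetical, and for rows that list several styles or ratios it simply takes the first one. Rows marked "varies" are left out.

```shell
# ERGON is a hypothetical override; by default it expands to `npx ergon`.
: "${ERGON:=npx ergon}"

# ergon_image_for <use-case> <prompt>
# Picks the first style/ratio listed for the use case and runs image gen.
ergon_image_for() (
  use_case=$1
  prompt=$2
  case $use_case in
    product|landscape)     style=realistic; ratio=16:9 ;;
    character|mascot)      style=anime;     ratio=1:1 ;;
    icon|logo)             style=flat;      ratio=1:1 ;;
    game)                  style=pixel-art; ratio=1:1 ;;
    business|presentation) style=corporate; ratio=16:9 ;;
    *) echo "no fixed style for use case: $use_case" >&2; exit 1 ;;
  esac
  # $ERGON is intentionally unquoted so "npx ergon" splits into two words.
  $ERGON image gen "$prompt" -t "$style" -a "$ratio"
)
```

For example, `ergon_image_for logo "abstract geometric logo"` would run image gen with `-t flat -a 1:1`.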

Options

| Option | Values | Default |
| --- | --- | --- |
| -t, --type | realistic, illustration, flat, anime, watercolor, oil-painting, pixel-art, sketch, 3d-render, corporate, minimal, pop-art | flat |
| -a, --aspect-ratio | 16:9, 4:3, 1:1, 9:16, 3:4 | 16:9 |
| -s, --size | tiny, hd, fullhd, 2k, 4k | fullhd |
| -e, --engine | imagen4, imagen4-fast, imagen4-ultra | imagen4 |

Examples:

npx ergon image gen "cute cat mascot for tech startup" -t anime -a 1:1
npx ergon image gen "professional team meeting in modern office" -t corporate -a 16:9
npx ergon image gen "abstract geometric logo" -t minimal -a 1:1 -o logo.png
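When it is unclear which style suits a prompt, one option is to render the same prompt in several styles and compare the results. The sketch below assumes a hypothetical helper name (`ergon_compare_styles`) and a hypothetical `ERGON` override variable; the `.png` output names are also an assumption.

```shell
# ERGON is a hypothetical override; by default it expands to `npx ergon`.
: "${ERGON:=npx ergon}"

# ergon_compare_styles <prompt> <style>...
# Renders the prompt once per style, writing each result to <style>.png.
ergon_compare_styles() (
  prompt=$1
  shift
  for style in "$@"; do
    # Stop at the first failed generation.
    $ERGON image gen "$prompt" -t "$style" -a 1:1 -o "$style.png" || exit 1
  done
)
```

A call such as `ergon_compare_styles "cute cat mascot" anime illustration flat` would issue three image gen commands, one per style.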

Image Editing

npx ergon image edit [options] <file> <prompt>

Edit instructions in natural language:

  • Background change: "change background to sunset beach"
  • Style transfer: "make it look like watercolor painting"
  • Object removal: "remove the person on the left"
  • Color adjustment: "make colors more vibrant"

npx ergon image edit photo.jpg "change background to blue sky"
npx ergon image edit portrait.png "convert to anime style"
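Several edit instructions can be applied in sequence by feeding each result into the next edit. The following is a sketch under assumptions: the helper name `ergon_edit_chain`, the `ERGON` override variable, and the `stepN.png` output names are all hypothetical, and it relies on `-o` being accepted by image edit as described under Common Options.

```shell
# ERGON is a hypothetical override; by default it expands to `npx ergon`.
: "${ERGON:=npx ergon}"

# ergon_edit_chain <file> <instruction>...
# Applies each instruction to the output of the previous one and prints
# the path of the final image on the last line.
ergon_edit_chain() (
  src=$1
  shift
  n=0
  for instruction in "$@"; do
    n=$((n + 1))
    out="step$n.png"
    $ERGON image edit "$src" "$instruction" -o "$out" || exit 1
    src=$out
  done
  echo "$src"
)
```

For instance, `ergon_edit_chain photo.jpg "remove the person on the left" "make colors more vibrant"` would leave the final result in `step2.png`.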

Video Generation (with Audio)

Veo 3.1 generates videos with synchronized audio. Include audio/sound instructions directly in the prompt.

npx ergon video gen [options] <theme>

Prompt Structure for Audio-Video

Include sound descriptions in your prompt:

# Sound effects included
npx ergon video gen "cat meowing and playing with a ball, soft purring sounds"

# Music/ambient audio
npx ergon video gen "sunset timelapse over ocean, with calming wave sounds and soft piano music"

# Dialogue/voice
npx ergon video gen "person saying 'welcome to our channel' with friendly tone, waving at camera"

Image-to-Video

Animate a static image with motion and sound:

npx ergon video gen "character starts dancing to upbeat music" -i character.png
npx ergon video gen "logo reveals with whoosh sound effect" -i logo.png

Options

| Option | Values | Default |
| --- | --- | --- |
| -i, --input | image file | - |
| -d, --duration | 5-8 seconds | 8 |
| -a, --aspect-ratio | 16:9, 9:16 | 16:9 |
| --fast | use Veo 3.1 Fast | false |

For vertical video (TikTok/Reels), pass -a 9:16.
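Since video generation is the most expensive call, it may be worth checking the documented ranges locally before invoking the CLI. This sketch assumes whole-second durations and uses a hypothetical wrapper name (`ergon_video`) and a hypothetical `ERGON` override variable.

```shell
# ERGON is a hypothetical override; by default it expands to `npx ergon`.
: "${ERGON:=npx ergon}"

# ergon_video <prompt> [<duration>] [<aspect-ratio>]
# Rejects values outside the documented ranges before calling video gen.
ergon_video() (
  prompt=$1
  duration=${2:-8}
  ratio=${3:-16:9}
  case $duration in
    5|6|7|8) ;;
    *) echo "duration must be 5-8 seconds, got: $duration" >&2; exit 2 ;;
  esac
  case $ratio in
    16:9|9:16) ;;
    *) echo "aspect ratio must be 16:9 or 9:16, got: $ratio" >&2; exit 2 ;;
  esac
  $ERGON video gen "$prompt" -d "$duration" -a "$ratio"
)
```

For example, `ergon_video "logo reveals with whoosh sound effect" 6 9:16` would request a six-second vertical clip, while a duration of 12 would be rejected without touching the API.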

Narration Generation

For voice-only audio without video, use the narration command.

npx ergon narration gen [options] <text>

Character and Acting Direction

Use -c (character) and -d (direction) for an expressive voice. Note that -d means direction here, not duration as in video gen:

# Character defines WHO is speaking
npx ergon narration gen "Let's go on an adventure!" -c "energetic young girl"

# Direction defines HOW they speak
npx ergon narration gen "The results are in..." -c "news anchor" -d "serious, building suspense"

# Combined for full expression
npx ergon narration gen "Yay! We did it!" -c "excited child" -d "jumping with joy, high energy"
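In scripts, the character and direction are often optional, so the flags should only be passed when a value is present. The wrapper below is a sketch of that pattern; `ergon_narrate` and the `ERGON` override variable are hypothetical names.

```shell
# ERGON is a hypothetical override; by default it expands to `npx ergon`.
: "${ERGON:=npx ergon}"

# ergon_narrate <text> [<character>] [<direction>]
# Adds -c and -d only when the corresponding argument is non-empty.
ergon_narrate() (
  text=$1
  character=$2
  direction=$3
  # Build the argument list in the positional parameters.
  set -- narration gen "$text"
  if [ -n "$character" ]; then
    set -- "$@" -c "$character"
  fi
  if [ -n "$direction" ]; then
    set -- "$@" -d "$direction"
  fi
  $ERGON "$@"
)
```

For example, `ergon_narrate "The results are in..." "news anchor" "serious, building suspense"` passes both flags, while `ergon_narrate "Hello"` passes neither.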

Voice Selection

| Voice | Character |
| --- | --- |
| Kore | Female, versatile (default) |
| Aoede | Female, warm |
| Charon | Male, deep |
| Fenrir | Male, strong |
| Puck | Neutral, playful |

Options

| Option | Values | Default |
| --- | --- | --- |
| -v, --voice | Kore, Aoede, Charon, Fenrir, Puck | Kore |
| -c, --character | character description | - |
| -d, --direction | acting direction | - |
| --speed | 0.25-4.0 | 1.0 |
| -l, --lang | ja, en, zh, ko, etc. | ja |
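The --speed value is a decimal, which plain shell arithmetic cannot compare, so a range check needs a tool such as awk. The function below is a small sketch of such a check; `valid_speed` is a hypothetical name, and treating both bounds as inclusive is an assumption.

```shell
# valid_speed <value>
# Succeeds when the value is a plain decimal number within 0.25-4.0.
valid_speed() {
  awk -v s="$1" 'BEGIN {
    ok = (s ~ /^[0-9]+(\.[0-9]+)?$/ && s + 0 >= 0.25 && s + 0 <= 4.0)
    exit !ok
  }'
}
```

It could guard a call like `valid_speed "$speed" && npx ergon narration gen "$text" --speed "$speed"`.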

Workflow Patterns

Generate, then Edit

npx ergon image gen "product photo of headphones" -t realistic -o headphones.png
npx ergon image edit headphones.png "add soft shadow, white background"

Image to Animated Video

npx ergon image gen "mascot character standing" -t anime -a 1:1 -o mascot.png
npx ergon video gen "mascot waves and says hello cheerfully" -i mascot.png
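The two-step pattern above depends on the second command knowing the first command's output file, so naming the file explicitly with -o keeps the steps connected. The sketch below wraps that idea; the function name `ergon_mascot_clip`, the `ERGON` override variable, and the `.png`/`.mp4` output names are assumptions.

```shell
# ERGON is a hypothetical override; by default it expands to `npx ergon`.
: "${ERGON:=npx ergon}"

# ergon_mascot_clip <name> <look> <action>
# Generates <name>.png, then animates it into <name>.mp4.
ergon_mascot_clip() (
  name=$1
  look=$2
  action=$3
  # Skip the video step if image generation fails.
  $ERGON image gen "$look" -t anime -a 1:1 -o "$name.png" || exit 1
  $ERGON video gen "$action" -i "$name.png" -o "$name.mp4"
)
```

For example: `ergon_mascot_clip mascot "mascot character standing" "mascot waves and says hello cheerfully"`.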

Preview Before Generation

npx ergon image gen "complex scene" --dry-run  # Check settings
npx ergon video gen "expensive render" --dry-run  # Verify before API call

Common Options

All commands support:

  • --json - JSON output for scripting
  • --dry-run - Preview settings without API call
  • -o, --output <path> - Specify output path
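One way to combine these options is to run every command with --dry-run first and only make the real call if the preview succeeds. This sketch assumes that --dry-run exits with a non-zero status when the settings are invalid, which the text above does not confirm; `ergon_checked` and the `ERGON` override variable are hypothetical names.

```shell
# ERGON is a hypothetical override; by default it expands to `npx ergon`.
: "${ERGON:=npx ergon}"

# ergon_checked <subcommand and arguments>...
# Previews the command with --dry-run, then runs it for real.
ergon_checked() (
  # Assumption: a failed preview is reported through the exit status.
  $ERGON "$@" --dry-run || exit 1
  $ERGON "$@"
)
```

For example, `ergon_checked video gen "expensive render" -a 9:16` would preview the settings and then start the actual generation.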

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
