audio-understanding

Audio Understanding: Audio transcription and analysis with Gemini

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "audio-understanding" with this command: npx skills add superconductor/superconductor-plugin-marketplace/superconductor-superconductor-plugin-marketplace-audio-understanding

File support

This skill supports audio analysis using Google Gemini models. Supported formats:

Category    Extensions
Audio       .mp3, .wav, .m4a, .ogg, .flac

  • Local audio files up to 9.5 hours long

  • YouTube links (youtube.com/watch, youtu.be, youtube.com/embed)
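The input rules above can be checked before calling the skill. The following is a minimal sketch only: `classify_audio_input` is a hypothetical helper, not part of the skill. It assumes extensions are matched case-insensitively and that any URL other than the three listed YouTube forms is unsupported; neither point is confirmed by the upstream SKILL.md.

```shell
# Hypothetical helper (not part of the skill): classify a --file value.
# Prints "youtube", "local", or "unsupported".
classify_audio_input() {
  case "$1" in
    http://*|https://*)
      # Drop the scheme and an optional "www." prefix before matching.
      rest=${1#*://}
      rest=${rest#www.}
      case "$rest" in
        youtube.com/watch*|youtu.be/*|youtube.com/embed/*) echo youtube ;;
        *) echo unsupported ;;  # assumption: only YouTube URLs are accepted
      esac
      ;;
    *.*)
      # Lower-case the extension (assumption: matching is case-insensitive).
      ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
      case "$ext" in
        mp3|wav|m4a|ogg|flac) echo local ;;
        *) echo unsupported ;;
      esac
      ;;
    *) echo unsupported ;;  # no extension and not a URL
  esac
}
```

For example, `classify_audio_input lecture.ogg` prints `local`, while `classify_audio_input notes.txt` prints `unsupported`. The helper does not check the 9.5-hour duration limit, which would need an audio tool to measure.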

Reference: https://ai.google.dev/gemini-api/docs/audio?example=dialogue

How to use

bash ${CLAUDE_PLUGIN_ROOT}/scripts/gemini.sh --file=AUDIO_PATH "YOUR QUESTION ABOUT THE AUDIO"

Arguments:

  • --file (required): Local audio file path or YouTube URL

  • --model (optional): Model to use (defaults to gemini-3-flash-preview)
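None of the examples in this listing pass --model, so here is a hedged sketch of how the two arguments might be combined. `ask_audio` and the `DRY_RUN` variable are hypothetical names introduced for illustration. The `--model=NAME` spelling and its position before the question are assumed by analogy with `--file=AUDIO_PATH`; check the upstream script before relying on either.

```shell
# Hypothetical wrapper (not part of the skill).
# Usage: ask_audio FILE_OR_URL "QUESTION" [MODEL]
# With DRY_RUN=1 the command is printed, one argument per line, instead of run.
ask_audio() {
  file=$1
  question=$2
  model=${3:-}

  # Build the argument list in the positional parameters.
  set -- bash "${CLAUDE_PLUGIN_ROOT}/scripts/gemini.sh" "--file=$file"
  if [ -n "$model" ]; then
    # Assumption: --model takes the same --name=value form as --file.
    set -- "$@" "--model=$model"
  fi
  set -- "$@" "$question"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$@"
  else
    "$@"
  fi
}
```

Calling `ask_audio meeting.wav "Summarize the key points discussed"` omits --model and so falls back to the script's own default, while a third argument such as `gemini-2.5-flash` overrides it.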

Examples:

Transcribe a local audio file

npx -y superconductor-gemini-skills --file=recording.mp3 "Transcribe this audio"
npx -y superconductor-gemini-skills --file=meeting.wav "Summarize the key points discussed"

Analyze a podcast or YouTube audio

npx -y superconductor-gemini-skills --file="https://www.youtube.com/watch?v=dQw4w9WgXcQ" "Transcribe this audio"
npx -y superconductor-gemini-skills --file="https://youtu.be/dQw4w9WgXcQ" "What topics are discussed?"

Extract specific information

npx -y superconductor-gemini-skills --file=interview.m4a "List all the questions asked by the interviewer"
npx -y superconductor-gemini-skills --file=lecture.ogg "Create a bullet-point summary of the main concepts"

API Key

The GEMINI_API_KEY environment variable must be set. Get your key at: https://ai.google.dev/gemini-api/docs/api-key
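Since a missing key only surfaces once the script calls the API, a pre-flight check can fail earlier with a clearer message. This is a minimal sketch; `require_gemini_key` is a hypothetical helper and not something the skill provides.

```shell
# Hypothetical pre-flight check (not part of the skill).
# Returns 0 if GEMINI_API_KEY is set and non-empty, 1 otherwise.
require_gemini_key() {
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is not set; see https://ai.google.dev/gemini-api/docs/api-key" >&2
    return 1
  fi
}
```

It can be chained in front of any of the commands above, for example `require_gemini_key && npx -y superconductor-gemini-skills --file=recording.mp3 "Transcribe this audio"`. It only checks that the variable is non-empty, not that the key is valid.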

Models

Model ID                  Context Window (Input / Output)   Pricing per 1M tokens (Input / Output)
gemini-3-pro-preview      1M / 64k                          $2 / $12 (<200k), $4 / $18 (>200k)
gemini-3-flash-preview    1M / 64k                          $0.50 / $3
gemini-2.5-pro            1M / 65k                          $1.25 / $10 (<200k), $2.50 / $15 (>200k)
gemini-2.5-flash          1M / 65k                          $0.30 / $2.50
