modelslab-audio-generation

Generate speech, music, and sound effects using ModelsLab's v7 Voice API. Supports text-to-speech, speech-to-text, speech-to-speech, music generation, sound effects, dubbing, song extension, and song inpainting via ElevenLabs and Inworld models.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "modelslab-audio-generation" with this command: npx skills add modelslab/skills/modelslab-skills-modelslab-audio-generation

ModelsLab Audio Generation

Generate high-quality audio including speech, music, voice conversion, sound effects, and dubbing using AI.

When to Use This Skill

  • Convert text to natural-sounding speech (TTS)
  • Transcribe speech to text
  • Transform voice characteristics (speech-to-speech)
  • Generate music from text prompts
  • Create sound effects
  • Dub audio into different languages
  • Extend or inpaint songs
  • Build voice assistants or audiobooks

Available APIs (v7)

Voice Endpoints

  • Text to Speech: POST https://modelslab.com/api/v7/voice/text-to-speech
  • Speech to Text: POST https://modelslab.com/api/v7/voice/speech-to-text
  • Speech to Speech: POST https://modelslab.com/api/v7/voice/speech-to-speech
  • Music Generation: POST https://modelslab.com/api/v7/voice/music-gen
  • Sound Generation: POST https://modelslab.com/api/v7/voice/sound-generation
  • Create Dubbing: POST https://modelslab.com/api/v7/voice/create-dubbing
  • Song Extender: POST https://modelslab.com/api/v7/voice/song-extender
  • Song Inpaint: POST https://modelslab.com/api/v7/voice/song-inpaint
  • Fetch Result: POST https://modelslab.com/api/v7/voice/fetch/{id}
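Every endpoint in this list shares one base URL, so the full URLs can be built from the short endpoint name. A minimal sketch follows; `voice_endpoint` and `fetch_endpoint` are hypothetical helper names used for illustration, not part of any ModelsLab SDK.

```python
BASE_URL = "https://modelslab.com/api/v7/voice"

# Endpoint path segments copied from the list above.
ENDPOINTS = {
    "text-to-speech", "speech-to-text", "speech-to-speech", "music-gen",
    "sound-generation", "create-dubbing", "song-extender", "song-inpaint",
}

def voice_endpoint(name):
    """Return the full v7 URL for a known voice endpoint name (hypothetical helper)."""
    if name not in ENDPOINTS:
        raise ValueError(f"Unknown voice endpoint: {name!r}")
    return f"{BASE_URL}/{name}"

def fetch_endpoint(request_id):
    """Return the URL used to fetch the result of an async request."""
    return f"{BASE_URL}/fetch/{request_id}"
```

Rejecting unknown names up front turns a typo such as "text_to_speech" (the v6 spelling) into an immediate local error instead of a failed HTTP call.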

Note: v6 endpoints (/api/v6/voice/text_to_speech, etc.) still work, but v7 is the current version. v7 renames several parameters (e.g., text is now prompt and audio is now init_audio) and requires model_id on every endpoint.

Discovering Audio Models

# Search audio/voice models
modelslab models search --feature audio_gen

# Search by provider
modelslab models search --search "eleven"

# Get model details
modelslab models detail --id eleven_multilingual_v2

Audio Model IDs

| model_id | Name | Use With |
|---|---|---|
| eleven_multilingual_v2 | ElevenLabs Multilingual v2 | text-to-speech |
| eleven_english_sts_v2 | ElevenLabs Voice Changer | speech-to-speech |
| scribe_v1 | ElevenLabs Scribe | speech-to-text |
| eleven_sound_effect | ElevenLabs Sound Effects | sound-generation |
| music_v1 | ElevenLabs Music | music-gen |
| inworld-tts-1 | Inworld TTS | text-to-speech |
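Because each model ID is paired with one endpoint, a mismatch can be caught before a request is sent. The sketch below treats the table as exhaustive, which is an assumption: the API may accept other models or pairings. `check_model` and `models_for` are hypothetical names.

```python
# model_id -> endpoint, copied from the table above.
MODEL_ENDPOINTS = {
    "eleven_multilingual_v2": "text-to-speech",
    "eleven_english_sts_v2": "speech-to-speech",
    "scribe_v1": "speech-to-text",
    "eleven_sound_effect": "sound-generation",
    "music_v1": "music-gen",
    "inworld-tts-1": "text-to-speech",
}

def check_model(model_id, endpoint):
    """Raise ValueError if model_id is not listed for the given endpoint."""
    expected = MODEL_ENDPOINTS.get(model_id)
    if expected is None:
        raise ValueError(f"Model {model_id!r} is not in the table")
    if expected != endpoint:
        raise ValueError(f"Model {model_id!r} is listed for {expected}, not {endpoint}")
    return model_id

def models_for(endpoint):
    """Return the listed model IDs for an endpoint, sorted alphabetically."""
    return sorted(m for m, e in MODEL_ENDPOINTS.items() if e == endpoint)
```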

Text to Speech

import requests
import time

def text_to_speech(text, api_key, voice_id="21m00Tcm4TlvDq8ikWAM", model_id="eleven_multilingual_v2"):
    """Convert text to speech.

    Args:
        text: The text to convert to speech
        api_key: Your ModelsLab API key
        voice_id: ElevenLabs voice ID (see Available ElevenLabs Voice IDs below)
        model_id: TTS model to use
    """
    response = requests.post(
        "https://modelslab.com/api/v7/voice/text-to-speech",
        json={
            "key": api_key,
            "prompt": text,             # v7 uses "prompt" not "text"
            "voice_id": voice_id,
            "model_id": model_id
        }
    )

    data = response.json()

    if data["status"] == "success":
        return data["output"][0]
    elif data["status"] == "processing":
        return poll_audio_result(data["id"], api_key)
    else:
        raise Exception(f"Error: {data.get('message', 'Unknown error')}")

# Usage (define poll_audio_result from "Polling for Async Results" before running this)
audio_url = text_to_speech(
    "Hello! Welcome to ModelsLab. This is a test of our text-to-speech API.",
    "your_api_key"
)
print(f"Audio URL: {audio_url}")

Speech to Text (Transcription)

def speech_to_text(audio_url, api_key, model_id="scribe_v1"):
    """Transcribe speech from audio to text.

    Args:
        audio_url: URL of audio file (must be publicly accessible)
        api_key: Your ModelsLab API key
        model_id: STT model to use
    """
    response = requests.post(
        "https://modelslab.com/api/v7/voice/speech-to-text",
        json={
            "key": api_key,
            "init_audio": audio_url,    # v7 uses "init_audio" not "audio"
            "model_id": model_id
        }
    )

    data = response.json()

    if data["status"] == "success":
        return data["output"][0]
    elif data["status"] == "processing":
        return poll_audio_result(data["id"], api_key)
    else:
        raise Exception(data.get("message", "Unknown error"))

# Transcribe audio
result = speech_to_text(
    "https://example.com/speech.mp3",
    "your_api_key"
)
print(f"Transcription: {result}")

Speech to Speech (Voice Conversion)

def speech_to_speech(audio_url, voice_id, api_key, model_id="eleven_english_sts_v2"):
    """Convert voice characteristics in audio.

    Args:
        audio_url: URL of the source audio
        voice_id: Target ElevenLabs voice ID
        api_key: Your ModelsLab API key
        model_id: Voice conversion model
    """
    response = requests.post(
        "https://modelslab.com/api/v7/voice/speech-to-speech",
        json={
            "key": api_key,
            "init_audio": audio_url,
            "voice_id": voice_id,
            "model_id": model_id
        }
    )

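    data = response.json()
    if data["status"] == "success":
        return data["output"][0]
    elif data["status"] == "processing":
        return poll_audio_result(data["id"], api_key)
    else:
        # Surface API errors instead of silently returning None
        raise Exception(data.get("message", "Unknown error"))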

Sound Effects Generation

def generate_sound_effect(description, api_key, model_id="eleven_sound_effect"):
    """Generate a sound effect from a text description.

    Args:
        description: What sound to generate
        api_key: Your ModelsLab API key
        model_id: Sound effects model
    """
    response = requests.post(
        "https://modelslab.com/api/v7/voice/sound-generation",
        json={
            "key": api_key,
            "prompt": description,
            "model_id": model_id
        }
    )

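    data = response.json()
    if data["status"] == "success":
        return data["output"][0]
    elif data["status"] == "processing":
        return poll_audio_result(data["id"], api_key)
    else:
        # Surface API errors instead of silently returning None
        raise Exception(data.get("message", "Unknown error"))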

# Generate door slam sound
sfx_url = generate_sound_effect(
    "Heavy wooden door slamming shut",
    "your_api_key"
)

Music Generation

def generate_music(prompt, api_key, model_id="music_v1"):
    """Generate music from a text description.

    Args:
        prompt: Description of music style/mood
        api_key: Your ModelsLab API key
        model_id: Music generation model
    """
    response = requests.post(
        "https://modelslab.com/api/v7/voice/music-gen",
        json={
            "key": api_key,
            "prompt": prompt,
            "model_id": model_id
        }
    )

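    data = response.json()
    if data["status"] == "success":
        return data["output"][0]
    elif data["status"] == "processing":
        return poll_audio_result(data["id"], api_key)
    else:
        # Surface API errors instead of silently returning None
        raise Exception(data.get("message", "Unknown error"))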

# Generate background music
music_url = generate_music(
    "Upbeat electronic music with a driving beat, perfect for a tech startup video",
    "your_api_key"
)
print(f"Music: {music_url}")
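The functions above all interpret the response the same way: return the first output on success, poll when the job is still processing, and treat anything else as an error. That shared step could be factored out along these lines. This is a sketch; `resolve_response` is a hypothetical name, and the only status values it knows are the ones the examples in this document use.

```python
def resolve_response(data, poll):
    """Turn a decoded voice API response into its first output.

    data: the JSON response as a dict
    poll: callable taking a request ID, used when the job is still processing
    """
    status = data.get("status")
    if status == "success":
        return data["output"][0]
    if status == "processing":
        return poll(data["id"])
    # Any other status is treated as an error.
    raise RuntimeError(data.get("message", "Unknown error"))
```

Inside any of the request functions, the tail would then reduce to `return resolve_response(response.json(), lambda rid: poll_audio_result(rid, api_key))`.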

Polling for Async Results

def poll_audio_result(request_id, api_key, timeout=300):
    """Poll for async audio generation results."""
    start_time = time.time()

    while time.time() - start_time < timeout:
        fetch = requests.post(
            f"https://modelslab.com/api/v7/voice/fetch/{request_id}",
            json={"key": api_key}
        )
        result = fetch.json()

        if result["status"] == "success":
            return result["output"][0]
        elif result["status"] == "failed":
            raise Exception(result.get("message", "Generation failed"))

        time.sleep(5)

    raise Exception("Timeout waiting for audio generation")
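A fixed five-second sleep is simple, but short jobs wait longer than needed and long jobs generate many requests. One alternative is exponential backoff, sketched below. The delay values are illustrative assumptions, not ModelsLab recommendations, and `poll_with_backoff` is a hypothetical name. The fetch call is passed in as a function so the loop can be exercised without network access.

```python
import time

def poll_with_backoff(fetch, timeout=300, first_delay=2, max_delay=30,
                      sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it reports success or failure, doubling the wait each time.

    fetch: zero-argument callable returning the decoded JSON response (dict)
    """
    deadline = clock() + timeout
    delay = first_delay
    while True:
        result = fetch()
        status = result.get("status")
        if status == "success":
            return result["output"][0]
        if status == "failed":
            raise RuntimeError(result.get("message", "Generation failed"))
        # Give up if the next wait would run past the deadline.
        if clock() + delay > deadline:
            raise TimeoutError("Timeout waiting for audio generation")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

With the fetch endpoint used above, the call might look like `poll_with_backoff(lambda: requests.post(fetch_url, json={"key": api_key}).json())`, where `fetch_url` is the fetch URL for the request ID.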

Available ElevenLabs Voice IDs

| Voice ID | Name | Style |
|---|---|---|
| 21m00Tcm4TlvDq8ikWAM | Rachel | Neutral, calm |
| AZnzlk1XvdvUeBnXmlld | Domi | Confident |
| EXAVITQu4vr4xnSDxMaL | Bella | Soft, warm |
| ErXwobaYiN019PkySvjV | Antoni | Well-rounded |
| MF3mGyEYCl7XYWbV9V6O | Elli | Young, clear |
| TxGEqnHWrfWFTfGW9XjX | Josh | Deep, warm |
| VR6AewLTigWG4xSOukaG | Arnold | Strong |
| pNInz6obpgDQGcFmaJgB | Adam | Deep, narrative |
| yoZ06aMxZJJ28mfd3POQ | Sam | Dynamic |
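Voice IDs are opaque strings, so a name-to-ID lookup keeps calling code readable. A small sketch using only the voices listed in the table; `voice_id_for` is a hypothetical helper.

```python
# Voice name -> ElevenLabs voice ID, copied from the table above.
VOICES = {
    "rachel": "21m00Tcm4TlvDq8ikWAM",
    "domi": "AZnzlk1XvdvUeBnXmlld",
    "bella": "EXAVITQu4vr4xnSDxMaL",
    "antoni": "ErXwobaYiN019PkySvjV",
    "elli": "MF3mGyEYCl7XYWbV9V6O",
    "josh": "TxGEqnHWrfWFTfGW9XjX",
    "arnold": "VR6AewLTigWG4xSOukaG",
    "adam": "pNInz6obpgDQGcFmaJgB",
    "sam": "yoZ06aMxZJJ28mfd3POQ",
}

def voice_id_for(name):
    """Look up a voice ID by name, ignoring case and surrounding spaces."""
    try:
        return VOICES[name.strip().lower()]
    except KeyError:
        raise ValueError(f"Unknown voice name: {name!r}") from None
```

For example, `voice_id_for("Rachel")` returns the ID that the text-to-speech example uses as its default `voice_id`.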

Key Parameters

Text to Speech

| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | Yes | Text to convert to speech |
| voice_id | string | Yes | ElevenLabs voice identifier |
| model_id | string | Yes | TTS model (e.g., eleven_multilingual_v2) |
| temperature | float | No | Voice variation |
| webhook | string | No | Async notification URL |

Speech to Text

| Parameter | Type | Required | Description |
|---|---|---|---|
| init_audio | string | Yes | URL of audio to transcribe |
| model_id | string | Yes | STT model (e.g., scribe_v1) |

Sound Generation

| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | Yes | Sound effect description |
| model_id | string | Yes | SFX model (e.g., eleven_sound_effect) |
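The required parameters in these tables can be checked locally before a request is sent. The sketch below covers only the three endpoints documented in this section and leaves out the `key` field, which the request examples send but the tables do not list. `missing_params` is a hypothetical name.

```python
# Required parameters per endpoint, copied from the tables above.
REQUIRED = {
    "text-to-speech": ("prompt", "voice_id", "model_id"),
    "speech-to-text": ("init_audio", "model_id"),
    "sound-generation": ("prompt", "model_id"),
}

def missing_params(endpoint, payload):
    """Return the required parameter names that are absent or empty in payload."""
    return [name for name in REQUIRED[endpoint] if not payload.get(name)]
```

An empty list means the payload has every required field for that endpoint. It does not mean the values are valid.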

v6 to v7 Parameter Changes

| v6 Parameter | v7 Parameter | Notes |
|---|---|---|
| text | prompt | TTS text input |
| audio | init_audio | STT/STS audio input |
| target_audio | init_audio | Voice-to-voice source |
| (not required) | model_id | Now required on all endpoints |
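An existing v6 payload can be converted mechanically using these renames. The sketch below assumes that any key not listed in the table carries over unchanged, which this document does not confirm. `migrate_payload` is a hypothetical helper.

```python
# v6 -> v7 parameter renames, copied from the table above.
V6_TO_V7 = {"text": "prompt", "audio": "init_audio", "target_audio": "init_audio"}

def migrate_payload(v6_payload, model_id):
    """Return a v7-style copy of a v6 payload.

    model_id is added when the payload has none, since v7 requires it.
    """
    v7 = {}
    for key, value in v6_payload.items():
        new_key = V6_TO_V7.get(key, key)
        if new_key in v7:
            # e.g. both "audio" and "target_audio" were supplied
            raise ValueError(f"More than one input maps to {new_key!r}")
        v7[new_key] = value
    v7.setdefault("model_id", model_id)
    return v7
```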

Best Practices

1. Use Correct Voice IDs

TTS requires valid ElevenLabs voice IDs (not generic names like "alloy").

2. Ensure Audio Accessibility

Audio URLs for speech-to-text must be publicly accessible without redirects or authentication.
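A quick pre-flight check can catch URLs that redirect or require authentication before they reach the API. The sketch below sends a HEAD request with redirects disabled and accepts only a direct 200 response. It is an approximation: some servers reject HEAD requests even though a GET would succeed. Both function names are hypothetical.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse

class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow redirects."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code

def head_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, or None if the host is unreachable."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except urllib.error.URLError:
        return None

def is_directly_accessible(url, get_status=head_status):
    """True if url is http(s) and answers 200 without a redirect or auth challenge."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    return get_status(url) == 200
```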

3. Use Webhooks for Long Operations

payload = {
    "key": api_key,
    "prompt": "...",
    "voice_id": "21m00Tcm4TlvDq8ikWAM",   # required for text-to-speech
    "model_id": "eleven_multilingual_v2",
    "webhook": "https://yourserver.com/webhook/audio",
    "track_id": "audio_001"
}
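On the receiving side, the `track_id` lets a webhook handler match a notification to the request that triggered it. This document does not specify the webhook payload format, so the sketch below assumes the notification body is JSON that echoes `track_id` and carries the same `status` and `output` fields as the API responses. Verify this against the actual payload before relying on it. `handle_webhook` is a hypothetical name.

```python
import json

def handle_webhook(body, pending):
    """Match a webhook notification to a pending job by track_id.

    body: raw request body (bytes or str). ASSUMED to be JSON with
          "track_id", "status", and "output" fields.
    pending: dict mapping track_id -> callback that takes the output URL
    Returns True if a pending job was completed, False otherwise.
    """
    event = json.loads(body)
    track_id = event.get("track_id")
    if event.get("status") != "success" or track_id not in pending:
        return False
    callback = pending.pop(track_id)
    callback(event["output"][0])
    return True
```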

Error Handling

try:
    audio = text_to_speech(text, api_key)
    print(f"Audio generated: {audio}")
except Exception as e:
    print(f"Audio generation failed: {e}")
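For failures that may be transient, such as a network hiccup, the call can be wrapped in a small retry loop. This is a generic sketch, not ModelsLab-specific guidance. It retries on every exception, which wastes attempts on permanent errors such as an invalid API key, so real code should catch a narrower exception type. `with_retries` is a hypothetical name.

```python
import time

def with_retries(call, attempts=3, delay=2, sleep=time.sleep):
    """Run call() up to `attempts` times, waiting longer after each failure."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as err:  # narrow this in real code
            last_error = err
            if attempt < attempts:
                sleep(delay * attempt)  # waits 2s, then 4s, ... by default
    raise last_error
```

Usage would look like `audio = with_retries(lambda: text_to_speech(text, api_key))`.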

Resources

Related Skills

  • modelslab-model-discovery - Find and filter models
  • modelslab-video-generation - Add audio to videos
  • modelslab-chat-generation - Chat with LLM models
  • modelslab-webhooks - Handle async audio generation
