acestep-songwriting

ACE-Step Songwriting Guide

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "acestep-songwriting" with this command: npx skills add ace-step/ace-step-1.5/ace-step-ace-step-1-5-acestep-songwriting


Professional music-creation guidance for writing captions and lyrics and for choosing music parameters in ACE-Step.

Output Format

After using this guide, produce the following for the acestep skill:

  • Caption (-c): style, genre, instruments, and emotion description

  • Lyrics (-l): complete structured lyrics with tags

  • Parameters: --duration, --bpm, --key, --time-signature, --language
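As a minimal sketch, the three outputs can be packaged as an argument list using only the flags named above. The `build_args` helper is hypothetical, the executable name is deliberately left out, and any parameter left as `None` is omitted so it can be inferred automatically.

```python
from typing import Optional


def build_args(
    caption: str,
    lyrics: str,
    duration: Optional[float] = None,
    bpm: Optional[int] = None,
    key: Optional[str] = None,
    time_signature: Optional[str] = None,
    language: Optional[str] = None,
) -> list[str]:
    """Hypothetical helper: package caption, lyrics, and optional parameters
    as arguments using the flags named in this guide. Parameters left as
    None are omitted so they can be auto-inferred."""
    args = ["-c", caption, "-l", lyrics]
    optional = {
        "--duration": duration,
        "--bpm": bpm,
        "--key": key,
        "--time-signature": time_signature,
        "--language": language,
    }
    for flag, value in optional.items():
        if value is not None:
            args += [flag, str(value)]
    return args


# Only duration and bpm are pinned; key, time signature, and language
# are left out of the argument list.
print(build_args(
    "sad piano ballad with female breathy vocal",
    "[Verse]\nWalking through the streets tonight",
    duration=180,
    bpm=72,
))
```

Keeping the arguments as a list rather than one shell string lets you pass them to Python's `subprocess.run` without quoting multi-line lyrics; which executable to prepend depends on how the acestep skill is installed.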

Caption: The Most Important Input

Caption is the most important factor affecting the generated music.

It supports multiple formats: simple style words, comma-separated tags, or complex natural-language descriptions.

Common Dimensions

  • Style/Genre: pop, rock, jazz, electronic, hip-hop, R&B, folk, classical, lo-fi, synthwave

  • Emotion/Atmosphere: melancholic, uplifting, energetic, dreamy, dark, nostalgic, euphoric, intimate

  • Instruments: acoustic guitar, piano, synth pads, 808 drums, strings, brass, electric bass

  • Timbre/Texture: warm, bright, crisp, muddy, airy, punchy, lush, raw, polished

  • Era Reference: 80s synth-pop, 90s grunge, 2010s EDM, vintage soul, modern trap

  • Production Style: lo-fi, high-fidelity, live recording, studio-polished, bedroom pop

  • Vocal Characteristics: female vocal, male vocal, breathy, powerful, falsetto, raspy, choir

  • Speed/Rhythm: slow tempo, mid-tempo, fast-paced, groovy, driving, laid-back

  • Structure Hints: building intro, catchy chorus, dramatic bridge, fade-out ending
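The dimensions above can be combined mechanically into a comma-separated, tag-style caption. This is a sketch under that assumption; `build_caption` and the dimension labels used as dictionary keys are illustrative and not part of ACE-Step.

```python
def build_caption(dimensions: dict[str, list[str]]) -> str:
    """Join descriptors from several caption dimensions (style, emotion,
    instruments, ...) into one comma-separated caption string.

    The dimension names are only labels for the writer; the caption holds
    just the descriptors, in insertion order. Blank entries are skipped.
    Repeated descriptors are kept on purpose, since repetition can be used
    to reinforce an element.
    """
    terms = [
        term.strip()
        for descriptors in dimensions.values()
        for term in descriptors
        if term.strip()
    ]
    return ", ".join(terms)


caption = build_caption({
    "style": ["synthwave"],
    "emotion": ["nostalgic", "dreamy"],
    "instruments": ["synth pads", "808 drums"],
    "timbre": ["warm", "lush"],
    "vocals": ["female vocal", "breathy"],
})
print(caption)
```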

Caption Writing Principles

  • Specific beats vague: "sad piano ballad with female breathy vocal" > "a sad song"

  • Combine multiple dimensions: style + emotion + instruments + timbre anchors the direction precisely

  • Use references well: "in the style of 80s synthwave" conveys a complex aesthetic quickly

  • Texture words are useful: warm, crisp, airy, and punchy influence mixing and timbre

  • Don't pursue perfection: the Caption is a starting point; iterate based on results

  • Granularity determines freedom: less detail = more model creativity; more detail = more control

  • Avoid conflicting words: "classical strings" + "hardcore metal" degrades output

  • Fix for conflicts (reinforcement): repeat the elements you want more of

  • Fix for conflicts (evolution): turn the clash into a progression, e.g. "Start with soft strings, middle becomes metal rock, end turns to hip-hop"

  • Don't put BPM, key, or tempo values in the Caption: use the dedicated parameters instead
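A quick lint pass can catch two of the mistakes listed above: numeric BPM, key, or time-signature values in the caption, and known conflicting descriptors. This is a heuristic sketch; the regular expressions and the `CONFLICTING_PAIRS` list are assumptions (only the strings/metal pair comes from this guide), and descriptive words such as "slow tempo" are intentionally not flagged.

```python
import re

# Illustrative heuristics for values that belong in dedicated parameters.
PARAMETER_PATTERNS = {
    "bpm": re.compile(r"\b\d{2,3}\s*bpm\b", re.IGNORECASE),
    # Uppercase note letter, optional sharp/flat, then major/minor in any case.
    "key": re.compile(r"\b[A-G][#b]?\s*(?i:major|minor)\b"),
    "time signature": re.compile(r"\b\d{1,2}/\d{1,2}\b"),
}

# Hypothetical list of clashing descriptors; extend it for your own genres.
CONFLICTING_PAIRS = [("classical strings", "hardcore metal")]


def lint_caption(caption: str) -> list[str]:
    """Return warnings for caption content this guide advises against."""
    warnings = []
    for name, pattern in PARAMETER_PATTERNS.items():
        match = pattern.search(caption)
        if match:
            warnings.append(f"{name} '{match.group(0)}' should be a dedicated parameter")
    lowered = caption.lower()
    for first, second in CONFLICTING_PAIRS:
        if first in lowered and second in lowered:
            warnings.append(f"conflicting descriptors: '{first}' + '{second}'")
    return warnings


for warning in lint_caption("upbeat pop, 120 BPM, C Major, 4/4"):
    print(warning)
```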

Lyrics: The Temporal Script

The Lyrics input controls how the music unfolds over time. It carries:

  • Lyric text itself

  • Structure tags ([Verse], [Chorus], [Bridge]...)

  • Vocal style hints ([raspy vocal], [whispered]...)

  • Instrumental sections ([guitar solo], [drum break]...)

  • Energy changes ([building energy], [explosive drop]...)

Structure Tags

Basic structure:

  • [Intro]: Opening; establishes the atmosphere

  • [Verse] / [Verse 1]: Verse; advances the narrative

  • [Pre-Chorus]: Pre-chorus; builds energy

  • [Chorus]: Chorus; the emotional climax

  • [Bridge]: Bridge; transition or elevation

  • [Outro]: Ending; conclusion

Dynamic sections:

  • [Build]: Energy gradually rising

  • [Drop]: Energy release (electronic music)

  • [Breakdown]: Reduced instrumentation, more space

Instrumental:

  • [Instrumental]: Purely instrumental, no vocals

  • [Guitar Solo]: Guitar solo

  • [Piano Interlude]: Piano interlude

Special:

  • [Fade Out]: Fade-out ending

  • [Silence]: Silence
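A minimal sketch of assembling tagged lyrics from (tag, lines) pairs, with a blank line between sections as the lyric tips recommend. `format_lyrics` is a hypothetical helper; instrumental sections simply pass an empty line list.

```python
def format_lyrics(sections: list[tuple[str, list[str]]]) -> str:
    """Render (tag, lines) pairs as tagged lyrics: each section starts with
    its [Tag] line, and sections are separated by one blank line."""
    blocks = []
    for tag, lines in sections:
        blocks.append("\n".join([f"[{tag}]", *lines]))
    return "\n\n".join(blocks)


lyrics = format_lyrics([
    ("Intro", []),
    ("Verse 1", ["Walking through the streets tonight", "Counting every light I pass"]),
    ("Chorus", ["We rise together (together)"]),
    ("Guitar Solo", []),
    ("Outro", []),
])
print(lyrics)
```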

Combining Tags

Use a hyphen (-) inside a tag for finer control, but keep it concise:

✅ [Chorus - anthemic]

❌ [Chorus - anthemic - stacked harmonies - high energy - powerful - epic]

Put complex style descriptions in the Caption, not in tags.
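To catch overloaded tags like the ❌ example, a small check can count the " - " modifiers inside each bracketed tag. The default limit of one modifier is an assumption based on the ✅ example; splitting only on a spaced hyphen keeps names such as [Pre-Chorus] intact.

```python
import re

TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def overloaded_tags(lyrics: str, max_modifiers: int = 1) -> list[str]:
    """Return tags that stack more than `max_modifiers` modifiers after the
    section name. Only ' - ' (spaced hyphen) counts as a separator, so
    hyphenated names like [Pre-Chorus] are not split."""
    flagged = []
    for body in TAG_PATTERN.findall(lyrics):
        parts = [part.strip() for part in body.split(" - ")]
        if len(parts) - 1 > max_modifiers:
            flagged.append(f"[{body}]")
    return flagged


print(overloaded_tags("[Chorus - anthemic]\n[Chorus - anthemic - stacked harmonies - epic]"))
```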

Caption-Lyrics Consistency

The model is poor at resolving conflicts between Caption and Lyrics, so keep them aligned. Checklist:

  • Instruments in Caption ↔ Instrumental section tags in Lyrics

  • Emotion in Caption ↔ Energy tags in Lyrics

  • Vocal description in Caption ↔ Vocal control tags in Lyrics
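The first checklist item can be partly automated: find instruments named in lyric tags (such as [Guitar Solo]) that the caption never mentions. This sketch uses plain substring matching, and the `INSTRUMENT_KEYWORDS` list is a hypothetical starting point, not an ACE-Step vocabulary.

```python
import re

# Hypothetical keyword list; extend it for the instruments you actually use.
INSTRUMENT_KEYWORDS = ["guitar", "piano", "drum", "synth", "strings", "brass", "bass"]
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def missing_instruments(caption: str, lyrics: str) -> list[str]:
    """List instrument keywords that appear in lyric tags but not in the
    caption (case-insensitive substring match), without duplicates."""
    caption_lower = caption.lower()
    missing: list[str] = []
    for tag in TAG_PATTERN.findall(lyrics):
        tag_lower = tag.lower()
        for keyword in INSTRUMENT_KEYWORDS:
            if keyword in tag_lower and keyword not in caption_lower and keyword not in missing:
                missing.append(keyword)
    return missing


print(missing_instruments("lo-fi piano, warm", "[Verse]\n[Guitar Solo]\n[Piano Interlude]"))
```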

Vocal Control Tags

  • [raspy vocal]: Raspy, textured vocals

  • [whispered]: Whispered

  • [falsetto]: Falsetto

  • [powerful belting]: Powerful, high-pitched singing

  • [spoken word]: Rap/recitation

  • [harmonies]: Layered harmonies

  • [call and response]: Call and response

  • [ad-lib]: Improvised embellishments

Energy and Emotion Tags

  • [high energy]: High energy, passionate

  • [low energy]: Low energy, restrained

  • [building energy]: Increasing energy

  • [explosive]: Explosive energy

  • [melancholic]: Melancholic

  • [euphoric]: Euphoric

  • [dreamy]: Dreamy

  • [aggressive]: Aggressive

Lyric Writing Tips

  • 6–10 syllables per line: the model aligns syllables to beats, so keep lines in the same position across sections (for example, the first line of each verse) within ±1–2 syllables of each other

  • Uppercase = stronger intensity: "WE ARE THE CHAMPIONS!" (shouted) vs. "walking through the streets" (normal)

  • Parentheses = background vocals: "We rise together (together)"

  • Extended vowels: "Feeeling so aliiive" (use cautiously; the effect is unstable)

  • Clear section separation: leave a blank line between sections
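The syllable guidance can be checked roughly before generating. This sketch uses a vowel-group heuristic for English that is approximate (it miscounts words like "champions"), so treat its numbers as hints. Pass only lyric lines, not the [Tag] lines; all function names are hypothetical.

```python
import re

VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Rough English estimate: count vowel groups, then drop a silent
    final 'e' (keeping '-le' endings such as 'little')."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    count = len(VOWEL_GROUPS.findall(word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def line_syllables(line: str) -> int:
    """Sum the estimated syllables of every word in a lyric line."""
    return sum(count_syllables(word) for word in line.split())


def out_of_range(lines: list[str], low: int = 6, high: int = 10) -> list[tuple[str, int]]:
    """Return (line, syllables) for lines outside the suggested 6-10 range."""
    flagged = []
    for line in lines:
        n = line_syllables(line)
        if not low <= n <= high:
            flagged.append((line, n))
    return flagged


def mismatched_positions(first: list[str], second: list[str], tolerance: int = 2) -> list[int]:
    """Indexes where lines at the same position in two sections (e.g. verse 1
    and verse 2) differ by more than `tolerance` syllables."""
    return [
        i
        for i, (a, b) in enumerate(zip(first, second))
        if abs(line_syllables(a) - line_syllables(b)) > tolerance
    ]


print(out_of_range(["Walking through the streets tonight", "We rise together"]))
```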

Avoiding "AI-flavored" Lyrics

  • Adjective stacking: "neon skies, electric hearts, endless dreams"; vague imagery used as filler

  • Rhyme chaos: inconsistent rhyme patterns, or forced rhymes that break the meaning

  • Blurred boundaries: lyric content spills across structure tags

  • No breathing room: lines too long to sing in one breath

  • Mixed metaphors: water → fire → flying, so listeners can't anchor to any image

Metaphor discipline: use one core metaphor per song and explore its different aspects.

Music Metadata

Most of the time, let the LM infer these values automatically. Set them manually only when you have clear requirements.

  • bpm (range 30–300): slow 60–80, mid 90–120, fast 130–180

  • keyscale (key): e.g. "C Major", "Am"; common keys (C, G, D, Am, Em) are the most stable

  • timesignature (time signature): 4/4 (most common), 3/4 (waltz), 6/8 (lilting, swing-like feel)

  • vocal_language (language): usually auto-detected from the lyrics

  • duration (seconds): see Duration Calculation below
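When you do set values manually, a small validator can compare them with the table. This is a sketch: the bands and the "most stable" key list come from the table, while the key normalization ("A minor" → "Am") and the function names are assumptions. BPMs that fall between bands, such as 85, get no label, mirroring the gaps in the table.

```python
from typing import Optional

BPM_MIN, BPM_MAX = 30, 300
# Bands from the table; BPMs between or beyond them get no label.
TEMPO_BANDS = {"slow": (60, 80), "mid": (90, 120), "fast": (130, 180)}
# Keys the table calls most stable, in short form.
COMMON_KEYS = {"C", "G", "D", "Am", "Em"}


def tempo_band(bpm: int) -> Optional[str]:
    """Label a BPM as 'slow', 'mid', or 'fast', or None if it sits between
    or beyond the bands. Raises ValueError outside the 30-300 range."""
    if not BPM_MIN <= bpm <= BPM_MAX:
        raise ValueError(f"bpm {bpm} is outside {BPM_MIN}-{BPM_MAX}")
    for name, (low, high) in TEMPO_BANDS.items():
        if low <= bpm <= high:
            return name
    return None


def normalize_key(keyscale: str) -> str:
    """Map 'C Major' -> 'C' and 'A minor' -> 'Am'; other strings pass through."""
    parts = keyscale.split()
    if len(parts) == 2 and parts[1].lower() == "major":
        return parts[0]
    if len(parts) == 2 and parts[1].lower() == "minor":
        return parts[0] + "m"
    return keyscale.strip()


def is_common_key(keyscale: str) -> bool:
    """True if the key is one of the table's most stable keys."""
    return normalize_key(keyscale) in COMMON_KEYS


print(tempo_band(72), is_common_key("C Major"), is_common_key("F# minor"))
```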

When to Set Manually

  • Everyday generation: let the LM auto-infer

  • Clear tempo requirement: bpm

  • Specific style (e.g. waltz): timesignature=3/4

  • Matching other material: bpm + duration

  • Specific key color: keyscale

Duration Calculation

Estimation Method

  • Intro/Outro: 5–10 seconds each

  • Instrumental sections: 5–15 seconds each

  • Typical structures:

      • 2 verses + 2 choruses: 120–150 s minimum

      • 2 verses + 2 choruses + bridge: 180–240 s minimum

      • Full song with intro/outro: 210–270 s (3.5–4.5 min)

BPM and Duration Relationship

  • Slower BPM (60–80): needs more duration for the same lyrics

  • Medium BPM (100–130): standard duration

  • Faster BPM (150–180): can fit more lyrics, but still needs breathing room

Rule of thumb: when in doubt, estimate longer. A song that is too short feels rushed.
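The estimation method and the BPM relationship can be combined into one rough calculation. This sketch assumes about one beat per sung syllable plus a two-beat breath after each line; those constants are guesses to tune by ear, not documented ACE-Step behavior. Intro, outro, and instrumental sections use the upper end of the ranges above, and the total is rounded up, following the "estimate longer" rule.

```python
import math


def estimate_duration(
    line_syllables: list[int],
    bpm: float,
    instrumental_sections: int = 0,
    intro: bool = True,
    outro: bool = True,
    beats_per_syllable: float = 1.0,
    rest_beats_per_line: float = 2.0,
) -> int:
    """Rough duration estimate in seconds, rounded up to the next 10 s.

    Each sung syllable takes `beats_per_syllable` beats and each line is
    followed by `rest_beats_per_line` beats of breathing room (assumed
    values). Intro/outro count 10 s each and instrumental sections 15 s.
    """
    seconds_per_beat = 60.0 / bpm
    vocal = sum(
        (syllables * beats_per_syllable + rest_beats_per_line) * seconds_per_beat
        for syllables in line_syllables
    )
    extra = 10 * (int(intro) + int(outro)) + 15 * instrumental_sections
    return math.ceil((vocal + extra) / 10) * 10


# 32 lines of 8 syllables (e.g. 2 verses + 2 choruses of 8 lines each):
print(estimate_duration([8] * 32, bpm=120))  # 180
print(estimate_duration([8] * 32, bpm=60))   # 340: same lyrics, slower tempo
```

The slower tempo nearly doubles the estimate for identical lyrics, which is the "slower BPM needs more duration" effect described above.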

Note: keep Lyrics tags consistent with the Caption. For example, tags referencing piano, powerful vocals, and whispering fit a Caption such as "piano ballad, building to powerful chorus, intimate".

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
