ACE-Step Songwriting Guide
Professional music creation knowledge for writing captions, lyrics, and choosing music parameters for ACE-Step.
Output Format
After using this guide, produce two things for the acestep skill:
-
Caption (-c ): Style/genre/instruments/emotion description
-
Lyrics (-l ): Complete structured lyrics with tags
-
Parameters: --duration , --bpm , --key , --time-signature , --language
Caption: The Most Important Input
Caption is the most important factor affecting generated music.
Supports multiple formats: simple style words, comma-separated tags, complex natural language descriptions.
Common Dimensions
Dimension Examples
Style/Genre pop, rock, jazz, electronic, hip-hop, R&B, folk, classical, lo-fi, synthwave
Emotion/Atmosphere melancholic, uplifting, energetic, dreamy, dark, nostalgic, euphoric, intimate
Instruments acoustic guitar, piano, synth pads, 808 drums, strings, brass, electric bass
Timbre Texture warm, bright, crisp, muddy, airy, punchy, lush, raw, polished
Era Reference 80s synth-pop, 90s grunge, 2010s EDM, vintage soul, modern trap
Production Style lo-fi, high-fidelity, live recording, studio-polished, bedroom pop
Vocal Characteristics female vocal, male vocal, breathy, powerful, falsetto, raspy, choir
Speed/Rhythm slow tempo, mid-tempo, fast-paced, groovy, driving, laid-back
Structure Hints building intro, catchy chorus, dramatic bridge, fade-out ending
Caption Writing Principles
-
Specific beats vague — "sad piano ballad with female breathy vocal" > "a sad song"
-
Combine multiple dimensions — style+emotion+instruments+timbre anchors direction precisely
-
Use references well — "in the style of 80s synthwave" conveys complex aesthetic quickly
-
Texture words are useful — warm, crisp, airy, punchy influence mixing and timbre
-
Don't pursue perfection — Caption is a starting point, iterate based on results
-
Granularity determines freedom — Less detail = more model creativity; more detail = more control
-
Avoid conflicting words — "classical strings" + "hardcore metal" degrades output
-
Fix: Repetition reinforcement — Repeat the elements you want more
-
Fix: Conflict to evolution — "Start with soft strings, middle becomes metal rock, end turns to hip-hop"
-
Don't put BPM/key/tempo in Caption — Use dedicated parameters instead
Lyrics: The Temporal Script
Lyrics controls how music unfolds over time. It carries:
-
Lyric text itself
-
Structure tags ([Verse], [Chorus], [Bridge]...)
-
Vocal style hints ([raspy vocal], [whispered]...)
-
Instrumental sections ([guitar solo], [drum break]...)
-
Energy changes ([building energy], [explosive drop]...)
Structure Tags
Category Tag Description
Basic Structure [Intro]
Opening, establish atmosphere
[Verse] / [Verse 1]
Verse, narrative progression
[Pre-Chorus]
Pre-chorus, build energy
[Chorus]
Chorus, emotional climax
[Bridge]
Bridge, transition or elevation
[Outro]
Ending, conclusion
Dynamic Sections [Build]
Energy gradually rising
[Drop]
Electronic music energy release
[Breakdown]
Reduced instrumentation, space
Instrumental [Instrumental]
Pure instrumental, no vocals
[Guitar Solo]
Guitar solo
[Piano Interlude]
Piano interlude
Special [Fade Out]
Fade out ending
[Silence]
Silence
Combining Tags
Use - for finer control, but keep it concise:
✅ [Chorus - anthemic] ❌ [Chorus - anthemic - stacked harmonies - high energy - powerful - epic]
Put complex style descriptions in Caption, not in tags.
Caption-Lyrics Consistency
Models are not good at resolving conflicts. Checklist:
-
Instruments in Caption ↔ Instrumental section tags in Lyrics
-
Emotion in Caption ↔ Energy tags in Lyrics
-
Vocal description in Caption ↔ Vocal control tags in Lyrics
Vocal Control Tags
Tag Effect
[raspy vocal]
Raspy, textured vocals
[whispered]
Whispered
[falsetto]
Falsetto
[powerful belting]
Powerful, high-pitched singing
[spoken word]
Rap/recitation
[harmonies]
Layered harmonies
[call and response]
Call and response
[ad-lib]
Improvised embellishments
Energy and Emotion Tags
Tag Effect
[high energy]
High energy, passionate
[low energy]
Low energy, restrained
[building energy]
Increasing energy
[explosive]
Explosive energy
[melancholic]
Melancholic
[euphoric]
Euphoric
[dreamy]
Dreamy
[aggressive]
Aggressive
Lyric Writing Tips
-
6-10 syllables per line — Model aligns syllables to beats; keep similar counts for lines in same position (±1-2)
-
Uppercase = stronger intensity — WE ARE THE CHAMPIONS! (shouting) vs walking through the streets (normal)
-
Parentheses = background vocals — We rise together (together)
-
Extend vowels — Feeeling so aliiive (use cautiously, effects unstable)
-
Clear section separation — Blank lines between sections
Avoiding "AI-flavored" Lyrics
Red Flag Description
Adjective stacking "neon skies, electric hearts, endless dreams" — vague imagery filler
Rhyme chaos Inconsistent patterns or forced rhymes breaking meaning
Blurred boundaries Lyric content crosses structure tags
No breathing room Lines too long to sing in one breath
Mixed metaphors Water → fire → flying — listeners can't anchor
Metaphor discipline: One core metaphor per song, explore its multiple aspects.
Music Metadata
Most of the time, let LM auto-infer. Only set manually when you have clear requirements.
Parameter Range Description
bpm
30–300 Slow 60–80, mid 90–120, fast 130–180
keyscale
Key e.g. C Major , Am . Common keys (C, G, D, Am, Em) most stable
timesignature
Time sig 4/4 (most common), 3/4 (waltz), 6/8 (swing)
vocal_language
Language Usually auto-detected from lyrics
duration
Seconds See duration calculation below
When to Set Manually
Scenario Set
Daily generation Let LM auto-infer
Clear tempo requirement bpm
Specific style (waltz) timesignature=3/4
Match other material bpm
- duration
Specific key color keyscale
Duration Calculation
Estimation Method
-
Intro/Outro: 5-10 seconds each
-
Instrumental sections: 5-15 seconds each
-
Typical structures:
-
2 verses + 2 choruses: 120-150s minimum
-
2 verses + 2 choruses + bridge: 180-240s minimum
-
Full song with intro/outro: 210-270s (3.5-4.5 min)
BPM and Duration Relationship
-
Slower BPM (60-80): Need MORE duration for same lyrics
-
Medium BPM (100-130): Standard duration
-
Faster BPM (150-180): Can fit more lyrics, but still need breathing room
Rule of thumb: When in doubt, estimate longer. A song too short feels rushed.
Note: Lyrics tags (piano, powerful, whispered) are consistent with Caption (piano ballad, building to powerful chorus, intimate).