Nightingale Karaoke Skill

Skill by ara.so — Daily 2026 Skills collection.

Nightingale is a self-contained, ML-powered karaoke application written in Rust (Bevy engine). It scans a local music folder, separates vocals from instrumentals (UVR Karaoke model or Demucs), transcribes lyrics with word-level timestamps (WhisperX), and plays back with synchronized highlighting, real-time pitch scoring, player profiles, and GPU shader / video backgrounds. Everything — ffmpeg, Python, PyTorch, ML models — is bootstrapped automatically on first launch.

Installation

Pre-built Binary (Recommended)

Download the latest release from the Releases page for your platform and run it.

macOS only — remove quarantine after extracting:

xattr -cr Nightingale.app

Build from Source

Prerequisites:

Rust 1.85+ (edition 2024)
Linux additionally needs: libasound2-dev libudev-dev libwayland-dev libxkbcommon-dev

git clone https://github.com/rzru/nightingale
cd nightingale

# Optimized release build
cargo build --release

# Run directly
./target/release/nightingale

Release Packaging

# Linux / macOS
scripts/make-release.sh

# Windows (PowerShell)
powershell -ExecutionPolicy Bypass -File scripts/make-release.ps1

Outputs a .tar.gz (Linux/macOS) or .zip (Windows) ready for distribution.

First Launch / Bootstrap

On first run, Nightingale downloads and configures:

ffmpeg binary
uv (Python package manager)
Python 3.10 via uv
PyTorch + WhisperX + audio-separator in a virtual environment
UVR Karaoke ONNX model and WhisperX large-v3 model

This takes 2–10 minutes depending on network speed. A progress screen is shown in-app.

To force re-bootstrap at any time:

./nightingale --setup

Bootstrap completion is marked by ~/.nightingale/vendor/.ready.

CLI Flags

Flag	Description
`--setup`	Force re-run of the first-launch bootstrap (re-downloads vendor deps)
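
How the --setup flag and the .ready marker interact can be sketched as below. This is a minimal illustration using only the standard library, not Nightingale's actual startup code: the function names (setup_requested, is_bootstrapped, needs_bootstrap) are hypothetical, and the data directory is passed in explicitly rather than resolved from the home directory.

```rust
use std::path::Path;

/// Returns true if `--setup` appears among the CLI arguments.
fn setup_requested<I, S>(args: I) -> bool
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    args.into_iter().any(|a| a.as_ref() == "--setup")
}

/// Returns true if the bootstrap marker `vendor/.ready` exists under
/// the given data directory (normally `~/.nightingale`).
fn is_bootstrapped(data_dir: &Path) -> bool {
    data_dir.join("vendor").join(".ready").is_file()
}

/// Bootstrap is needed when forced via `--setup` or when the marker is missing.
/// (Hypothetical helper; the real decision logic may differ.)
fn needs_bootstrap<I, S>(args: I, data_dir: &Path) -> bool
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    setup_requested(args) || !is_bootstrapped(data_dir)
}
```

In a real binary the arguments would come from std::env::args() and the data directory from the user's home directory.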

Keyboard & Gamepad Controls

Navigation

Action	Keyboard	Gamepad
Move	Arrow keys	D-pad / Left stick
Confirm	Enter	A (South)
Back	Escape	B (East) / Start
Switch panel	Tab	—
Search	Type to filter	—

Playback

Action	Keyboard	Gamepad
Pause / Resume	Space	Start
Exit to menu	Escape	B (East)
Toggle guide vocals	G	—
Guide volume up/down	+ / -	—
Cycle background	T	—
Cycle video flavor	F	—
Toggle microphone	M	—
Next microphone	N	—
Toggle fullscreen	F11	—

Configuration

Main Config

Located at ~/.nightingale/config.json. Edit directly or via in-app settings.

{
  "music_folder": "/home/user/Music",
  "separator": "uvr",
  "guide_vocal_volume": 0.3,
  "background_theme": "plasma",
  "video_flavor": "nature",
  "default_profile": "Alice"
}

separator options: "uvr" (default, preserves backing vocals) | "demucs"

background_theme options: "plasma", "aurora", "waves", "nebula", "starfield", "video", "source_video"

video_flavor options: "nature", "underwater", "space", "city", "countryside"
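
When editing config.json by hand, it can help to check values against the option lists above before launching. The following is a small sketch of such a check; validate_option is a hypothetical helper, not part of Nightingale, and the allowed values are copied from the lists in this section.

```rust
/// Allowed values, copied from the option lists in this section.
const SEPARATORS: &[&str] = &["uvr", "demucs"];
const BACKGROUND_THEMES: &[&str] = &[
    "plasma", "aurora", "waves", "nebula", "starfield", "video", "source_video",
];
const VIDEO_FLAVORS: &[&str] = &["nature", "underwater", "space", "city", "countryside"];

/// Checks one config field against its allowed values.
/// Returns Err with a readable message for unknown fields or values.
fn validate_option(field: &str, value: &str) -> Result<(), String> {
    let allowed = match field {
        "separator" => SEPARATORS,
        "background_theme" => BACKGROUND_THEMES,
        "video_flavor" => VIDEO_FLAVORS,
        other => return Err(format!("unknown field: {other}")),
    };
    if allowed.contains(&value) {
        Ok(())
    } else {
        Err(format!("invalid {field}: {value:?} (expected one of {allowed:?})"))
    }
}
```

How the application itself reacts to an unknown value is not documented here, so treat this only as a pre-flight check.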

Profiles

Located at ~/.nightingale/profiles.json:

{
  "profiles": [
    {
      "name": "Alice",
      "scores": {
        "blake3_hash_of_song": {
          "stars": 4,
          "score": 87250,
          "played_at": "2026-03-18T21:00:00Z"
        }
      }
    }
  ]
}

Pixabay Video Backgrounds (Dev)

API key is embedded in release builds. For local development, create .env at project root:

# .env
PIXABAY_API_KEY=your_pixabay_api_key_here

The release script (make-release.sh) sources .env automatically.

Data Storage Layout

~/.nightingale/
├── cache/              # Per-song stems, transcripts, lyrics (keyed by blake3 hash)
├── config.json         # App settings
├── profiles.json       # Player profiles and per-song scores
├── videos/             # Pre-downloaded Pixabay video backgrounds
├── sounds/             # Sound effects
├── vendor/
│   ├── ffmpeg          # ffmpeg binary
│   ├── uv              # uv binary
│   ├── python/         # Python 3.10
│   ├── venv/           # ML virtualenv (WhisperX, Demucs, audio-separator)
│   ├── analyzer/       # Python analyzer scripts
│   └── .ready          # Bootstrap completion marker
└── models/
    ├── torch/          # Demucs model weights
    ├── huggingface/    # WhisperX large-v3 weights
    └── audio_separator/ # UVR Karaoke ONNX model

Cache keys are blake3 hashes of the source file — re-analysis only triggers if the file changes or is manually invalidated.
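
The keying scheme implies a simple lookup: given a song's hash, the cache location is a fixed path under the data directory. A minimal sketch, assuming each song gets its own directory named after the hash (consistent with the cache removal commands in Troubleshooting); song_cache_dir and is_cached are hypothetical names:

```rust
use std::path::{Path, PathBuf};

/// Builds the per-song cache directory: `<data_dir>/cache/<blake3-hex>`.
/// `data_dir` is normally `~/.nightingale`.
fn song_cache_dir(data_dir: &Path, song_hash: &str) -> PathBuf {
    data_dir.join("cache").join(song_hash)
}

/// True if a cache directory already exists for this hash, which under
/// the keying rule above would mean no re-analysis is needed.
/// (Assumption: existence alone is treated as "cached"; the real app may
/// also verify the directory's contents.)
fn is_cached(data_dir: &Path, song_hash: &str) -> bool {
    song_cache_dir(data_dir, song_hash).is_dir()
}
```

Because the hash is derived from the file contents, a renamed or moved file maps to the same cache entry, while an edited file maps to a new one.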

Supported File Formats

Audio: .mp3, .flac, .ogg, .wav, .m4a, .aac, .wma

Video: .mp4, .mkv, .avi, .webm, .mov, .m4v

For video files, the audio track is extracted and its vocals are separated; the original video then plays as the background automatically.
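
A folder scanner needs to decide, per file, whether it is audio, video, or unsupported. The sketch below classifies by extension using the lists above. MediaKind and media_kind are hypothetical names, and case-insensitive matching is an assumption of this example rather than documented behavior.

```rust
use std::path::Path;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum MediaKind {
    Audio,
    Video,
}

// Extension lists copied from the Supported File Formats section.
const AUDIO_EXTS: &[&str] = &["mp3", "flac", "ogg", "wav", "m4a", "aac", "wma"];
const VIDEO_EXTS: &[&str] = &["mp4", "mkv", "avi", "webm", "mov", "m4v"];

/// Classifies a file by extension; returns None for unsupported files
/// and for paths without a (UTF-8) extension.
fn media_kind(path: &Path) -> Option<MediaKind> {
    // Lowercase so that "SONG.MP3" matches (assumption of this sketch).
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    if AUDIO_EXTS.contains(&ext.as_str()) {
        Some(MediaKind::Audio)
    } else if VIDEO_EXTS.contains(&ext.as_str()) {
        Some(MediaKind::Video)
    } else {
        None
    }
}
```

A result of MediaKind::Video would correspond to the case where the original video is used as the background.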

Hardware Acceleration

PyTorch backend is auto-detected:

Backend	Device	Notes
CUDA	NVIDIA GPU	Fastest; ~2–5 min/song
MPS	Apple Silicon	macOS; WhisperX alignment falls back to CPU
CPU	Any	Always works; ~10–20 min/song

UVR Karaoke model uses ONNX Runtime with CUDA (NVIDIA) or CoreML (Apple Silicon) automatically.
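
The selection rule in the table can be illustrated with a short sketch. In Nightingale the detection happens inside the Python analyzer via PyTorch; the Rust code below only models the decision order. The CUDA-then-MPS-then-CPU priority and the names Backend, pick_backend, and alignment_backend are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Backend {
    Cuda,
    Mps,
    Cpu,
}

/// Picks a compute backend from availability flags.
/// The CUDA > MPS > CPU priority is an assumption of this sketch.
fn pick_backend(cuda_available: bool, mps_available: bool) -> Backend {
    if cuda_available {
        Backend::Cuda
    } else if mps_available {
        Backend::Mps
    } else {
        Backend::Cpu
    }
}

/// WhisperX alignment falls back to CPU when the primary backend is MPS,
/// as noted in the table above; other backends are used as-is.
fn alignment_backend(primary: Backend) -> Backend {
    match primary {
        Backend::Mps => Backend::Cpu,
        other => other,
    }
}
```

On Apple Silicon this yields MPS for separation and transcription but CPU for alignment, which matches the note in the table.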

Processing Pipeline

Audio/Video file
       │
       ▼
 UVR Karaoke (ONNX) or Demucs (PyTorch)
       │  vocals.ogg + instrumental.ogg
       ▼
 LRCLIB API  ──▶  Synced lyrics fetch (if available)
       │
       ▼
 WhisperX large-v3  ──▶  Transcription + word-level timestamps
       │
       ▼
 Bevy App (Rust)
   - Plays instrumental audio
   - Synchronized word highlighting
   - Real-time pitch detection & scoring
   - GPU shader / video backgrounds
   - Scoreboards per profile

Code Patterns

Adding a New Background Theme (Bevy System)

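// In your Bevy plugin, register a new background variant.
// `AppState` is assumed to be the application's state enum (with a `Playing`
// variant); import it from wherever the crate defines it.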
use bevy::prelude::*;

#[derive(Component)]
pub struct MyCustomBackground;

pub fn spawn_custom_background(mut commands: Commands) {
    commands.spawn((
        MyCustomBackground,
        // ... your background components
    ));
}

pub struct CustomBackgroundPlugin;

impl Plugin for CustomBackgroundPlugin {
    fn build(&self, app: &mut App) {
        app.add_systems(OnEnter(AppState::Playing), spawn_custom_background);
    }
}

Extending Config Deserialization

use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct NightingaleConfig {
    pub music_folder: String,
    #[serde(default = "default_separator")]
    pub separator: StemSeparator,
    #[serde(default = "default_guide_volume")]
    pub guide_vocal_volume: f32,
}

#[derive(Debug, Clone, Serialize, Deserialize, Default)]
#[serde(rename_all = "lowercase")]
pub enum StemSeparator {
    #[default]
    Uvr,
    Demucs,
}

fn default_guide_volume() -> f32 { 0.3 }
fn default_separator() -> StemSeparator { StemSeparator::Uvr }

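// `unwrap_or_default()` below requires `Default`; reuse the serde defaults
// so both paths agree.
impl Default for NightingaleConfig {
    fn default() -> Self {
        Self {
            music_folder: String::new(),
            separator: default_separator(),
            guide_vocal_volume: default_guide_volume(),
        }
    }
}

// Load config, falling back to defaults if the file is missing or invalid
fn load_config() -> NightingaleConfig {
    let path = dirs::home_dir()
        .expect("home directory not found")
        .join(".nightingale/config.json");
    let raw = std::fs::read_to_string(&path).unwrap_or_default();
    serde_json::from_str(&raw).unwrap_or_default()
}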

Triggering Re-analysis Programmatically

use std::fs;

/// Remove cached stems/transcript for a song to force re-analysis
fn invalidate_song_cache(song_hash: &str) {
    let cache_dir = dirs::home_dir()
        .unwrap()
        .join(".nightingale/cache")
        .join(song_hash);

    if cache_dir.exists() {
        fs::remove_dir_all(&cache_dir)
            .expect("Failed to remove cache directory");
        println!("Cache invalidated for {}", song_hash);
    }
}

Computing a Song's Blake3 Hash (for Cache Lookup)

use blake3::Hasher;
use std::fs::File;
use std::io::{BufReader, Read};

fn hash_file(path: &std::path::Path) -> String {
    let file = File::open(path).expect("Cannot open file");
    let mut reader = BufReader::new(file);
    let mut hasher = Hasher::new();
    let mut buf = [0u8; 65536];
    loop {
        let n = reader.read(&mut buf).unwrap();
        if n == 0 { break; }
        hasher.update(&buf[..n]);
    }
    hasher.finalize().to_hex().to_string()
}

Profile Score Update Pattern

use serde::{Deserialize, Serialize};
use std::collections::HashMap;

#[derive(Debug, Serialize, Deserialize)]
pub struct SongScore {
    pub stars: u8,
    pub score: u32,
    pub played_at: String,
}

#[derive(Debug, Serialize, Deserialize)]
pub struct Profile {
    pub name: String,
    pub scores: HashMap<String, SongScore>, // key = blake3 hash
}

fn update_score(profile: &mut Profile, song_hash: &str, stars: u8, score: u32) {
    profile.scores.insert(song_hash.to_string(), SongScore {
        stars,
        score,
        played_at: chrono::Utc::now().to_rfc3339(),
    });
}

Troubleshooting

Bootstrap Fails / Stuck on Setup Screen

# Force re-bootstrap
./nightingale --setup

# Or manually remove the vendor directory and restart
rm -rf ~/.nightingale/vendor
./nightingale

Song Analysis Hangs or Errors

# Check the analyzer venv is healthy
~/.nightingale/vendor/venv/bin/python -c "import whisperx; print('ok')"

# Re-bootstrap if broken
./nightingale --setup

macOS "App is damaged" Error

xattr -cr Nightingale.app

GPU Not Being Used

NVIDIA: Ensure CUDA drivers are installed and nvidia-smi shows your GPU.
Apple Silicon: MPS is used automatically on macOS with Apple Silicon; WhisperX alignment falls back to CPU (normal behavior).
Check the PyTorch build in ~/.nightingale/vendor/venv: if it is CPU-only (torch.cuda.is_available() returns False), install the CUDA drivers and then re-bootstrap with ./nightingale --setup.

Cache Corruption / Wrong Lyrics

# Find the blake3 hash of your file (build a small tool or use b3sum)
b3sum /path/to/song.mp3

# Remove that song's cache
rm -rf ~/.nightingale/cache/<hash>

Then re-open the song in Nightingale to re-analyze.

Audio Playback Issues (Linux)

Ensure ALSA, PulseAudio, or PipeWire is running. If you built from source, install any missing build dependencies:

sudo apt install libasound2-dev libudev-dev libwayland-dev libxkbcommon-dev

Video Backgrounds Not Loading

Video backgrounds are pre-downloaded during setup via the Pixabay API. For development builds, ensure .env contains a valid PIXABAY_API_KEY. If videos are missing in a release build, run ./nightingale --setup to re-trigger the download.

Platform Targets

Platform	Target Triple
Linux x86_64	`x86_64-unknown-linux-gnu`
Linux aarch64	`aarch64-unknown-linux-gnu`
macOS ARM	`aarch64-apple-darwin`
macOS Intel	`x86_64-apple-darwin`
Windows x86_64	`x86_64-pc-windows-msvc`

Cross-compile with the commands below (the host also needs a linker and system libraries for the target):

rustup target add aarch64-unknown-linux-gnu
cargo build --release --target aarch64-unknown-linux-gnu

License

GPL-3.0-or-later. See LICENSE.