# Rust Build Times

## Purpose

Guide agents through diagnosing and improving Rust compilation speed: cargo-timings for build profiling, sccache for caching, the Cranelift codegen backend for faster dev builds, workspace crate splitting, LTO configuration trade-offs, and fast linkers (mold/lld).
## Triggers

- "My Rust project takes too long to compile"
- "How do I profile which crates are slow to build?"
- "How do I set up sccache for Rust?"
- "What is the Cranelift backend and how does it help?"
- "Should I use thin LTO or fat LTO?"
- "How do I use the mold linker with Rust?"
## Workflow
### Diagnose with cargo-timings

```sh
# Build with a timing report
cargo build --timings

# For release builds
cargo build --release --timings
```

The report is written to `target/cargo-timings/cargo-timing.html` and shows the crate compilation timeline, parallelism, and bottlenecks.

Key things to look for in the timing report:

- Long sequential chains (no parallelism)
- Individual crates taking > 10s (candidates for optimization)
- Proc-macro crates blocking everything downstream

cargo-llvm-lines counts LLVM IR lines per function, exposing monomorphization (template explosion):

```sh
cargo install cargo-llvm-lines
cargo llvm-lines --release | head -20   # functions generating the most LLVM IR
```
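When cargo-llvm-lines shows one generic function dominating the IR count, a standard fix is to move the non-generic part of the body into an inner function so it is compiled once per crate rather than once per type. A minimal sketch (function names are illustrative):

```rust
use std::fmt::Display;

// Every concrete T stamps out a full copy of a generic function's body.
// Keeping the generic shell thin and delegating to a non-generic inner
// function means the shared body is compiled (and optimized) only once.
fn log_value<T: Display>(value: T) -> String {
    fn inner(rendered: String) -> String {
        // Shared, non-generic body: compiled once no matter how many
        // types instantiate `log_value`.
        format!("value = {rendered}")
    }
    inner(value.to_string())
}

fn main() {
    println!("{}", log_value(42));      // instantiates log_value::<i32>
    println!("{}", log_value("fast"));  // instantiates log_value::<&str>
}
```

Only the thin `to_string` shell is monomorphized per type; re-running cargo-llvm-lines after this refactor should show the per-instantiation line count collapse.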
### sccache — compilation caching for Rust

```sh
# Install
cargo install sccache   # or: brew install sccache

# Configure for Rust builds
export RUSTC_WRAPPER=sccache

# Check cache stats
sccache --show-stats
```

Or add it to `.cargo/config.toml` (project-local or global `~/.cargo/config.toml`):

```toml
[build]
rustc-wrapper = "sccache"
```

S3 backend for CI teams:

```sh
export SCCACHE_BUCKET=my-rust-cache
export SCCACHE_REGION=us-east-1
export AWS_ACCESS_KEY_ID=xxx
export AWS_SECRET_ACCESS_KEY=yyy
sccache --start-server
```

GitHub Actions:

```yaml
- uses: mozilla-actions/sccache-action@v0.0.4
```
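A fuller workflow job using the action might look like this (a sketch; the env variable names follow the sccache-action README, while the job layout and step order are illustrative):

```yaml
# Illustrative CI job; checkout/build steps are placeholders.
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      SCCACHE_GHA_ENABLED: "true"   # use the GitHub Actions cache backend
      RUSTC_WRAPPER: "sccache"      # route rustc invocations through sccache
    steps:
      - uses: actions/checkout@v4
      - uses: mozilla-actions/sccache-action@v0.0.4
      - run: cargo build --release
      - run: sccache --show-stats   # confirm cache hits at the end of the job
```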
### Cranelift codegen backend

Cranelift is a fast codegen backend (vs LLVM): it produces slower code but compiles much faster. Ideal for development builds.

```sh
# Install nightly (Cranelift currently requires nightly)
rustup toolchain install nightly
rustup component add rustc-codegen-cranelift-preview --toolchain nightly
```

Use Cranelift for dev builds only, via `.cargo/config.toml`:

```toml
[unstable]
codegen-backend = true

[profile.dev]
codegen-backend = "cranelift"
```

Or per-build, using Cargo's unstable `-Zcodegen-backend` flag:

```sh
CARGO_PROFILE_DEV_CODEGEN_BACKEND=cranelift \
  cargo +nightly build -Zcodegen-backend
```

Cranelift vs LLVM trade-off:

- Dev builds: 20–40% faster compilation with Cranelift
- Runtime performance: LLVM-compiled code is faster (Cranelift skips many optimizations)
- Release builds: always use LLVM
### Workspace splitting for parallelism

A single large crate compiles sequentially. Split it into smaller crates so Cargo can compile them in parallel.

Before — one giant crate:

```toml
[package]
name = "monolith"   # everything in one crate = sequential compile
```

After — a workspace with parallel crates:

```toml
[workspace]
members = [
    "core",         # compiled first; the rest build on it
    "networking",   # no deps on ui → parallel with ui
    "ui",           # no deps on networking → parallel
    "server",       # depends on core + networking
    "cli",          # depends on core + ui
]
```

Visualize the dependency graph:

```sh
cargo tree | head -30
cargo install cargo-depgraph            # `cargo tree` has no --graph flag
cargo depgraph | dot -Tsvg > deps.svg   # visual graph
```

Check how many crates compile in parallel:

```sh
cargo build -j$(nproc) --timings   # maximize parallelism
```

Rules for effective workspace splitting:

- Break circular dependencies first
- Separate proc-macros into their own crate (they block everything)
- Keep frequently-changed code isolated (less invalidation)
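Separating proc-macros into their own crate might look like this (crate name and dependencies are illustrative):

```toml
# my-app-macros/Cargo.toml — proc-macros live alone so that compiling
# them never blocks codegen of the ordinary crates behind them.
[package]
name = "my-app-macros"   # illustrative name
version = "0.1.0"
edition = "2021"

[lib]
proc-macro = true        # marks this crate as a proc-macro crate

[dependencies]
syn = "2"
quote = "1"
```

Other workspace members then depend on `my-app-macros` only where the macros are actually invoked, so most of the workspace can start compiling before the macro crate finishes.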
### LTO configuration

LTO improves runtime performance but increases link time:

```toml
# Cargo.toml profile configuration
[profile.release]
lto = "thin"         # thin LTO: good performance, much faster than "fat"
codegen-units = 1    # needed for best optimization (but disables parallel codegen)

[profile.release-fast]
inherits = "release"
lto = "fat"          # full LTO: maximum performance, very slow link

[profile.dev]
lto = "off"          # never use LTO in dev (compilation speed)
codegen-units = 16   # maximize parallel codegen in dev
```

LTO comparison:

| Setting | Link time | Runtime perf | Use when |
|---|---|---|---|
| `lto = false` | Fast | Baseline | Dev builds |
| `lto = "thin"` | Moderate | +5–15% | Most release builds |
| `lto = "fat"` | Slow | +15–30% | Maximum performance |
| `codegen-units = 1` | Slowest | Best | With LTO for release |
### Fast linkers

The linker is often the bottleneck for large Rust projects.

mold — fastest general-purpose linker (Linux):

```sh
sudo apt-get install mold
```

```toml
# .cargo/config.toml
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```

Or use cargo-zigbuild (uses `zig cc` as the linker driver):

```sh
cargo install cargo-zigbuild
cargo zigbuild --release
```

lld — LLVM's linker (faster than GNU ld, available everywhere):

```toml
# .cargo/config.toml
[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "link-arg=-fuse-ld=lld"]
```

On macOS: zld (note the project is now archived; recent Xcode linkers are already fast) or lld:

```toml
[target.x86_64-apple-darwin]
rustflags = ["-C", "link-arg=-fuse-ld=/usr/local/bin/zld"]
```

Linker speed comparison (large project, typical):

- GNU ld: baseline
- lld: ~2× faster
- mold: ~5–10× faster
- gold: ~1.5× faster
### Other quick wins

Reduce debug info level (faster builds, less debuggable):

```toml
# Cargo.toml
[profile.dev]
debug = 1   # 0 = off, 1 = line tables only, 2 = full (default)
```

`debug = 1` saves roughly 20–40% on debug build time.

Split debug info (reduces linker input):

```toml
[profile.dev]
split-debuginfo = "unpacked"   # leaves debug info in object files (like -gsplit-dwarf)
```

Disable incremental compilation (sometimes faster for full rebuilds, e.g. in CI):

```sh
CARGO_INCREMENTAL=0 cargo build
```

Reduce proc-macro compile time: heavy proc-macro dependencies (serde, tokio, axum) are expensive to rebuild, so keep their versions pinned and stable.
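Another way to trim proc-macro cost is to enable derive-style features only in the crates that actually use them, so the rest of the workspace never waits on the macro crate. A sketch using serde (the crate split is illustrative):

```toml
# In most workspace crates: depend on serde without the "derive"
# feature, so serde_derive (a proc-macro crate) is not in their
# dependency graph at all.
[dependencies]
serde = { version = "1", default-features = false, features = ["std"] }

# Only in the one crate that defines #[derive(Serialize)] types:
# serde = { version = "1", features = ["derive"] }
```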
## Related skills

- Use skills/rust/cargo-workflows for Cargo workspace and profile configuration
- Use skills/build-systems/build-acceleration for C/C++ equivalent build acceleration
- Use skills/debuggers/dwarf-debug-format for debug info size/split-dwarf trade-offs
- Use skills/binaries/linkers-lto for LTO internals