Rust Profiling

Purpose

Guide agents through Rust performance profiling: flamegraphs via cargo-flamegraph, binary size analysis, monomorphization bloat measurement, Criterion microbenchmarks, and interpreting profiling results with inlined Rust frames.

Triggers

"How do I generate a flamegraph for a Rust program?"
"My Rust binary is huge — how do I find what's causing it?"
"How do I write Criterion benchmarks?"
"How do I measure monomorphization bloat?"
"Rust performance is worse than expected — how do I profile it?"
"How do I use perf with Rust?"

Workflow

Build for profiling

Release with debug symbols (needed for readable profiles)

Cargo.toml:

[profile.release-with-debug]
inherits = "release"
debug = true

cargo build --profile release-with-debug

Or, for a one-off: turn on debug info for the regular release profile via an environment variable

CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release

Flamegraphs with cargo-flamegraph

Install

cargo install flamegraph

Linux: uses perf (requires perf_event_paranoid ≤ 1)

sudo sh -c 'echo 1 > /proc/sys/kernel/perf_event_paranoid'
cargo flamegraph --bin myapp -- arg1 arg2

macOS: uses DTrace (requires sudo)

sudo cargo flamegraph --bin myapp -- arg1 arg2

Profile tests

cargo flamegraph --test mytest -- test_filter

Profile benchmarks

cargo flamegraph --bench mybench -- --bench

Output

Generates flamegraph.svg in current directory

Open in browser: firefox flamegraph.svg
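Inlining can make a hot function vanish from the flamegraph. Its samples are then attributed to whichever caller it was inlined into, unless perf resolves inline frames from debug info. Below is a minimal sketch of a workload to practice on. The names parse_records, checksum, and run are hypothetical. The #[inline(never)] attribute is a temporary aid that changes codegen, so remove it once you have read the profile:

```rust
// Hypothetical workload for trying out `cargo flamegraph`.
// #[inline(never)] keeps each stage as its own stack frame so it shows
// up as a separate box instead of being folded into `run`.

#[inline(never)]
fn parse_records(input: &str) -> Vec<u64> {
    // Parse whitespace-separated integers, skipping invalid tokens.
    input
        .split_whitespace()
        .filter_map(|tok| tok.parse::<u64>().ok())
        .collect()
}

#[inline(never)]
fn checksum(values: &[u64]) -> u64 {
    // Simple rolling checksum; wrapping ops avoid overflow panics in debug builds.
    values
        .iter()
        .fold(0u64, |acc, &v| acc.wrapping_mul(31).wrapping_add(v))
}

fn run(input: &str, rounds: usize) -> u64 {
    let mut total = 0u64;
    for _ in 0..rounds {
        let values = parse_records(input);
        total = total.wrapping_add(checksum(&values));
    }
    total
}
```

Call run from main with a large input and many rounds so both stages collect enough samples to show up as distinct frames.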

Custom flamegraph options:

More samples

cargo flamegraph --freq 1000 --bin myapp

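Silence stderr

cargo flamegraph --bin myapp -- args 2>/dev/null

The redirect hides stderr from cargo, perf, and the program alike. It does not filter threads. To isolate one thread, grep the folded stacks for that thread's name between stackcollapse-perf.pl and flamegraph.pl.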
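Per-thread filtering only works if threads have distinguishable names. perf records each thread's OS-level name, and stackcollapse-perf.pl typically uses that name as the root frame of every folded stack. Giving workers distinct names therefore lets you grep the collapsed output per thread, for example with `grep '^worker-0;'`. Below is a minimal sketch using std::thread::Builder. The worker-N naming scheme is a hypothetical choice. On Linux, OS thread names are truncated to 15 bytes:

```rust
use std::thread;

// Spawn `n` workers with distinct names ("worker-0", "worker-1", ...).
// On Linux, std also sets the OS thread name, which perf records per thread.
fn spawn_named_workers(n: usize) -> Vec<String> {
    let handles: Vec<_> = (0..n)
        .map(|i| {
            thread::Builder::new()
                .name(format!("worker-{i}"))
                .spawn(|| {
                    // Report the name this thread sees for itself.
                    thread::current().name().unwrap_or("<unnamed>").to_string()
                })
                .expect("failed to spawn thread")
        })
        .collect();

    // Wait for every worker and collect the names they reported.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}
```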

Using perf directly for more control

perf record -g -F 999 ./target/release-with-debug/myapp args
perf script | stackcollapse-perf.pl | flamegraph.pl > out.svg

Binary size analysis with cargo-bloat

Install

cargo install cargo-bloat

Show top functions by size

cargo bloat --release -n 20

Show per-crate size breakdown

cargo bloat --release --crates

Show only functions from a specific crate

cargo bloat --release --filter myapp

Compare before/after a change

cargo bloat --release --crates > before.txt

Apply your changes, then:

cargo bloat --release --crates > after.txt
diff before.txt after.txt

Typical output:

 File  .text     Size Crate Name
 2.4%   3.0%  47.0KiB std   <std macros>
 1.8%   2.3%  35.5KiB myapp myapp::heavy_module::process
 1.2%   1.5%  23.1KiB serde serde::de::...
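Sometimes cargo bloat lists the same generic function many times, once per instantiation. One option is to trade static dispatch for a single copy that takes trait objects. Below is a minimal sketch with hypothetical function names. Dynamic dispatch adds an indirect call per item, so measure both size and speed before and after:

```rust
use std::fmt::{Display, Write};

// Generic version: the compiler emits one copy per concrete `T`
// (one for i32, one for &str, one for f64, ...).
fn join_generic<T: Display>(items: &[T], sep: &str) -> String {
    let mut out = String::new();
    for (i, item) in items.iter().enumerate() {
        if i > 0 {
            out.push_str(sep);
        }
        write!(out, "{item}").expect("writing to a String cannot fail");
    }
    out
}

// Trait-object version: a single copy in the binary, called through a vtable.
fn join_dyn(items: &[&dyn Display], sep: &str) -> String {
    let mut out = String::new();
    for (i, item) in items.iter().enumerate() {
        if i > 0 {
            out.push_str(sep);
        }
        write!(out, "{item}").expect("writing to a String cannot fail");
    }
    out
}
```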

Monomorphization bloat with cargo-llvm-lines

Install

cargo install cargo-llvm-lines

Show LLVM IR line counts (proxy for monomorphization)

cargo llvm-lines --release | head -40

Filter to your crate only (each output line starts with the counts, so match the path instead of anchoring at ^)

cargo llvm-lines --release | grep 'myapp::'

Typical output:

  Lines  Copies  Function name
  85330       1  [LLVM passes]
   7761      92  core::fmt::write
   4672      11  myapp::process::<impl MyTrait for T>
   3201      47  <alloc::vec::Vec<T> as core::ops::Drop>::drop

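A high Copies count means the function was instantiated (monomorphized) for many different type arguments. A common fix is to keep the generic part thin and move the body into a non-generic inner function: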

// Before: generic; the whole body (represented here by do_work) is duplicated for every T
fn process<T: AsRef<[u8]>>(data: T) -> usize {
    do_work(data.as_ref())
}

// After: thin generic wrapper + concrete inner function
fn process<T: AsRef<[u8]>>(data: T) -> usize {
    fn inner(data: &[u8]) -> usize {
        do_work(data)
    }
    inner(data.as_ref())
}
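The standard library uses the same shape in functions such as std::fs::read, which takes `P: AsRef<Path>` and forwards to a non-generic inner function. Below is a self-contained sketch of the pattern. The function extension_upper is hypothetical. Each new argument type (&str, String, PathBuf, ...) adds only a tiny conversion shim, while inner is compiled once:

```rust
use std::path::Path;

// Public generic API: accepts &str, String, PathBuf, &Path, ...
// Each instantiation is only a thin shim around `inner`.
pub fn extension_upper<P: AsRef<Path>>(path: P) -> Option<String> {
    // Non-generic body: compiled once, no matter how many `P` types call it.
    fn inner(path: &Path) -> Option<String> {
        path.extension()
            .and_then(|ext| ext.to_str())
            .map(|ext| ext.to_ascii_uppercase())
    }
    inner(path.as_ref())
}
```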

Criterion microbenchmarks

Cargo.toml

[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }

[[bench]]
name = "my_bench"
harness = false

// benches/my_bench.rs
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};
// The function under test must be a public item of your library crate
// (src/lib.rs); `myapp::process` is a placeholder for it.
use myapp::process;

fn bench_process(c: &mut Criterion) {
    // Simple benchmark
    c.bench_function("process 1000 items", |b| {
        let data: Vec<i32> = (0..1000).collect();
        // black_box hides the input from the optimizer so the call is not constant-folded away
        b.iter(|| process(black_box(&data)))
    });
}

fn bench_sizes(c: &mut Criterion) {
    let mut group = c.benchmark_group("process_sizes");

    // One benchmark per input size, reported together in the group
    for size in [100, 1000, 10000].iter() {
        let data: Vec<i32> = (0..*size).collect();
        group.bench_with_input(
            BenchmarkId::from_parameter(size),
            &data,
            |b, data| b.iter(|| process(black_box(data))),
        );
    }
    group.finish();
}

criterion_group!(benches, bench_process, bench_sizes);
criterion_main!(benches);
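Benchmarks under benches/ are built as separate crates that link against your library target. The process function they call therefore has to be a public item of src/lib.rs; items defined only in a binary's main.rs are not reachable. Below is a minimal sketch of such a function, with a hypothetical body. It also includes a crude std-only timing loop using std::hint::black_box (stable since Rust 1.66) for a quick sanity check without Criterion:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical function under test; in a real project this would live in
/// src/lib.rs and be imported by the bench as `use myapp::process;`.
pub fn process(data: &[i32]) -> i64 {
    // Sum of squares of the even values, widened to i64 to avoid overflow.
    data.iter()
        .filter(|&&x| x % 2 == 0)
        .map(|&x| i64::from(x) * i64::from(x))
        .sum()
}

/// Crude timing loop: black_box hides the input and result from the
/// optimizer, discouraging it from hoisting or deleting the call.
/// No warm-up or statistics; returns the mean time per iteration.
pub fn time_process(data: &[i32], iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(process(black_box(data)));
    }
    start.elapsed() / iters.max(1)
}
```

Criterion adds warm-up, outlier detection, and statistical comparison on top of this kind of loop, so prefer it for any number you intend to act on.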

Run all benchmarks

cargo bench

Run specific benchmark

cargo bench --bench my_bench

Run with filter

cargo bench -- process_sizes

Compare with baseline (save/load)

cargo bench -- --save-baseline before

Apply your changes, then:

cargo bench -- --baseline before

View HTML report

open target/criterion/report/index.html

(open is the macOS command; on Linux use xdg-open.)

perf with Rust (Linux)

Record

perf record -g ./target/release-with-debug/myapp args
perf record -g -F 999 ./target/release-with-debug/myapp args   # higher freq

Report

perf report                                        # interactive TUI
perf report --stdio --no-call-graph | head -40     # text

Annotate specific function

perf annotate myapp::hot_function

stat (quick counters)

perf stat ./target/release/myapp args

Rust-specific perf tips:

Build with debug = "line-tables-only" (or debug = 1, i.e. "limited") for faster builds that still give line-level attribution
Use RUSTFLAGS="-C force-frame-pointers=yes" for better call graphs without DWARF unwinding
Disable ASLR for reproducible addresses: setarch $(uname -m) -R ./myapp

heaptrack / DHAT for allocations

heaptrack (Linux)

heaptrack ./target/release/myapp args
heaptrack_print heaptrack.myapp.*.zst | head -50

DHAT via Valgrind

valgrind --tool=dhat ./target/debug/myapp args

Open dhat-out.* with dh_view.html
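heaptrack and DHAT attribute every allocation to a call stack. For a quick in-process check of whether a change reduced the allocation count, a counting global allocator is a common std-only trick. Below is a minimal sketch. The names CountingAlloc, allocations, fill_growing, and fill_preallocated are hypothetical. It counts allocation calls only, not bytes or call sites:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

// Wraps the system allocator and counts allocation calls.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    // `realloc` is not overridden, so the trait's default implementation
    // runs: it calls `alloc` (counted), copies the data, then `dealloc`.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Number of allocation calls made while running `f`.
fn allocations<F: FnOnce()>(f: F) -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    f();
    ALLOCATIONS.load(Ordering::Relaxed) - before
}

fn fill_growing(n: usize) {
    let mut v = Vec::new(); // starts empty; reallocates as it grows
    for i in 0..n {
        v.push(i);
    }
    black_box(&v); // keep the optimizer from eliding the allocations
}

fn fill_preallocated(n: usize) {
    let mut v = Vec::with_capacity(n); // one up-front allocation
    for i in 0..n {
        v.push(i);
    }
    black_box(&v);
}
```

Wrap allocations(|| ...) around the code path heaptrack flagged to confirm a fix. For per-call-site attribution, heaptrack or DHAT remain the right tools.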

For flamegraph setup and Criterion configuration, see references/cargo-flamegraph-setup.md.

Related skills

Use skills/rust/rustc-basics for build configuration (debug symbols, profiles)
Use skills/profilers/linux-perf for perf fundamentals
Use skills/profilers/flamegraphs for reading and interpreting flamegraph SVGs
Use skills/profilers/valgrind for allocation profiling with massif/DHAT
