Rust Profiling
Purpose
Guide agents through Rust performance profiling: flamegraphs via cargo-flamegraph, binary size analysis, monomorphization bloat measurement, Criterion microbenchmarks, and interpreting profiling results with inlined Rust frames.
Triggers
-
"How do I generate a flamegraph for a Rust program?"
-
"My Rust binary is huge — how do I find what's causing it?"
-
"How do I write Criterion benchmarks?"
-
"How do I measure monomorphization bloat?"
-
"Rust performance is worse than expected — how do I profile it?"
-
"How do I use perf with Rust?"
Workflow
- Build for profiling
Release with debug symbols (needed for readable profiles)
Cargo.toml:
[profile.release-with-debug] inherits = "release" debug = true
cargo build --profile release-with-debug
Or quick: release + debug info inline
CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release
- Flamegraphs with cargo-flamegraph
Install
cargo install flamegraph
Linux: uses perf (requires perf_event_paranoid ≤ 1)
sudo sh -c 'echo 1 > /proc/sys/kernel/perf_event_paranoid' cargo flamegraph --bin myapp -- arg1 arg2
macOS: uses DTrace (requires sudo)
sudo cargo flamegraph --bin myapp -- arg1 arg2
Profile tests
cargo flamegraph --test mytest -- test_filter
Profile benchmarks
cargo flamegraph --bench mybench -- --bench
Output
Generates flamegraph.svg in current directory
Open in browser: firefox flamegraph.svg
Custom flamegraph options:
More samples
cargo flamegraph --freq 1000 --bin myapp
Filter to specific threads
cargo flamegraph --bin myapp -- args 2>/dev/null
Using perf directly for more control
perf record -g -F 999 ./target/release-with-debug/myapp args perf script | stackcollapse-perf.pl | flamegraph.pl > out.svg
- Binary size analysis with cargo-bloat
Install
cargo install cargo-bloat
Show top functions by size
cargo bloat --release -n 20
Show per-crate size breakdown
cargo bloat --release --crates
Include only specific crate
cargo bloat --release --filter myapp
Compare before/after a change
cargo bloat --release --crates > before.txt
make changes
cargo bloat --release --crates > after.txt diff before.txt after.txt
Typical output:
File .text Size Crate Name 2.4% 3.0% 47.0KiB std <std macros> 1.8% 2.3% 35.5KiB myapp myapp::heavy_module::process 1.2% 1.5% 23.1KiB serde serde::de::...
- Monomorphization bloat with cargo-llvm-lines
Install
cargo install cargo-llvm-lines
Show LLVM IR line counts (proxy for monomorphization)
cargo llvm-lines --release | head -40
Filter to your crate only
cargo llvm-lines --release | grep '^myapp'
Typical output:
Lines Copies Function name 85330 1 [LLVM passes] 7761 92 core::fmt::write 4672 11 myapp::process::<impl MyTrait for T> 3201 47 <alloc::vec::Vec<T> as core::ops::Drop>::drop
High Copies count = monomorphization expansion. Fix:
// Before: generic, gets monomorphized for every T fn process<T: AsRef<[u8]>>(data: T) -> usize { do_work(data.as_ref()) }
// After: thin generic wrapper + concrete inner fn process<T: AsRef<[u8]>>(data: T) -> usize { fn inner(data: &[u8]) -> usize { do_work(data) } inner(data.as_ref()) }
- Criterion microbenchmarks
Cargo.toml
[dev-dependencies] criterion = { version = "0.5", features = ["html_reports"] }
[[bench]] name = "my_bench" harness = false
// benches/my_bench.rs use criterion::{black_box, criterion_group, criterion_main, Criterion, BenchmarkId};
fn bench_process(c: &mut Criterion) { // Simple benchmark c.bench_function("process 1000 items", |b| { let data: Vec<i32> = (0..1000).collect(); b.iter(|| process(black_box(&data))) // black_box prevents optimization }); }
fn bench_sizes(c: &mut Criterion) { let mut group = c.benchmark_group("process_sizes");
for size in [100, 1000, 10000].iter() {
let data: Vec<i32> = (0..*size).collect();
group.bench_with_input(
BenchmarkId::from_parameter(size),
&data,
|b, data| b.iter(|| process(black_box(data))),
);
}
group.finish();
}
criterion_group!(benches, bench_process, bench_sizes); criterion_main!(benches);
Run all benchmarks
cargo bench
Run specific benchmark
cargo bench --bench my_bench
Run with filter
cargo bench -- process_sizes
Compare with baseline (save/load)
cargo bench -- --save-baseline before
make changes
cargo bench -- --baseline before
View HTML report
open target/criterion/report/index.html
- perf with Rust (Linux)
Record
perf record -g ./target/release-with-debug/myapp args perf record -g -F 999 ./target/release-with-debug/myapp args # higher freq
Report
perf report # interactive TUI perf report --stdio --no-call-graph | head -40 # text
Annotate specific function
perf annotate myapp::hot_function
stat (quick counters)
perf stat ./target/release/myapp args
Rust-specific perf tips:
-
Build with debug = 1 (line tables only) for faster builds with line-level attribution
-
Use RUSTFLAGS="-C force-frame-pointers=yes" for better call graphs without DWARF unwinding
-
Disable ASLR for reproducible addresses: setarch $(uname -m) -R ./myapp
- heaptrack / DHAT for allocations
heaptrack (Linux)
heaptrack ./target/release/myapp args heaptrack_print heaptrack.myapp.*.zst | head -50
DHAT via Valgrind
valgrind --tool=dhat ./target/debug/myapp args
Open dhat-out.* with dh_view.html
For flamegraph setup and Criterion configuration, see references/cargo-flamegraph-setup.md.
Related skills
-
Use skills/rust/rustc-basics for build configuration (debug symbols, profiles)
-
Use skills/profilers/linux-perf for perf fundamentals
-
Use skills/profilers/flamegraphs for reading and interpreting flamegraph SVGs
-
Use skills/profilers/valgrind for allocation profiling with massif/DHAT