Linux perf
Purpose
Guide agents through perf for CPU profiling: sampling, hardware counter measurement, hotspot identification, and integration with flamegraph generation.
Triggers
- "Which function is consuming the most CPU?"
- "How do I measure cache misses / IPC?"
- "How do I use perf to find hotspots?"
- "How do I generate a flamegraph from perf data?"
- "perf shows [unknown] or [kernel] frames"
Workflow
- Prerequisites
Install
sudo apt install linux-perf # Debian/Ubuntu (version-matched)
sudo dnf install perf # Fedora/RHEL
Check permissions
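At the kernel default (2), unprivileged users can profile only their own user-space code; kernel profiling needs root or a paranoid level ≤ 1
cat /proc/sys/kernel/perf_event_paranoid
2 = user-space only (no kernel profiling), 1 = user + kernel, 0 = also CPU-wide events, -1 = no restrictions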
Temporarily lower (session only)
sudo sysctl -w kernel.perf_event_paranoid=1
Persistent
echo 'kernel.perf_event_paranoid=1' | sudo tee /etc/sysctl.d/99-perf.conf
sudo sysctl -p /etc/sysctl.d/99-perf.conf
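As a rough aid for the permission check above, the sketch below maps a perf_event_paranoid value to what an unprivileged user can do. The function name describe_paranoid is hypothetical, and the descriptions paraphrase the kernel sysctl documentation; some distributions add levels above 2 that restrict perf further, which the sketch only labels generically.

```shell
#!/bin/sh
# describe_paranoid LEVEL: print what unprivileged users may do at LEVEL.
# (Hypothetical helper; wording paraphrases the kernel sysctl docs.)
describe_paranoid() {
    level=$1
    if [ "$level" -le -1 ]; then
        echo "no restrictions"
    elif [ "$level" -eq 0 ]; then
        echo "user + kernel profiling and CPU-wide events"
    elif [ "$level" -eq 1 ]; then
        echo "user + kernel profiling of own processes"
    elif [ "$level" -eq 2 ]; then
        echo "user-space profiling of own processes only"
    else
        echo "further restricted (distribution-specific level)"
    fi
}

# Report the current setting when the sysctl file is readable.
if [ -r /proc/sys/kernel/perf_event_paranoid ]; then
    current=$(cat /proc/sys/kernel/perf_event_paranoid)
    echo "perf_event_paranoid=$current: $(describe_paranoid "$current")"
fi
```

If the reported level is too strict for the profile you need, lower it with the sysctl commands above or run perf with sudo.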
Compile the target with debug symbols for useful frame data:
gcc -g -O2 -fno-omit-frame-pointer -o prog main.c
-fno-omit-frame-pointer: essential for frame-pointer-based unwinding
Alternative: compile with DWARF CFI and use --call-graph=dwarf
- perf stat — quick counters
Basic hardware counters
perf stat ./prog
With specific events
perf stat -e cache-misses,cache-references,instructions,cycles,branch-misses ./prog
Repeat N runs and report the mean and standard deviation of each counter
perf stat -r 5 ./prog
Attach to existing process
perf stat -p 12345 sleep 10
Interpret perf stat output:
- IPC (instructions per cycle) < 1.0: often memory-bound or a stalled pipeline
- cache-miss rate (cache-misses / cache-references) > 5%: significant cache pressure
- branch-miss rate (branch-misses / branch instructions) > 5%: branch predictor struggling
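The ratios behind these thresholds can be computed from perf stat's CSV output. The sketch below is a minimal example, assuming the `-x,` layout where field 1 is the count and field 3 the event name; the function name perf_ratios is hypothetical and the sample numbers are invented for illustration.

```shell
#!/bin/sh
# perf_ratios: read `perf stat -x,` CSV on stdin and print derived ratios.
# Assumes field 1 is the count and field 3 is the event name.
perf_ratios() {
    awk -F, '
        # Strip modifiers such as ":u" so "cycles:u" is stored as "cycles".
        { name = $3; sub(/:[a-zA-Z]+$/, "", name); count[name] = $1 }
        END {
            # "+ 0" forces numeric comparison (e.g. for "<not supported>").
            if (count["cycles"] + 0 > 0)
                printf "IPC %.2f\n", count["instructions"] / count["cycles"]
            if (count["cache-references"] + 0 > 0)
                printf "cache-miss-rate %.1f%%\n", 100 * count["cache-misses"] / count["cache-references"]
            if (count["branch-instructions"] + 0 > 0)
                printf "branch-miss-rate %.1f%%\n", 100 * count["branch-misses"] / count["branch-instructions"]
        }'
}

# Demo with invented counts (not measured data).
perf_ratios <<'EOF'
2000000,,cycles,500000000,100.00,,
1500000,,instructions,500000000,100.00,,
100000,,cache-references,500000000,100.00,,
8000,,cache-misses,500000000,100.00,,
400000,,branch-instructions,500000000,100.00,,
4000,,branch-misses,500000000,100.00,,
EOF
```

For a real run, something like `perf stat -x, -o stat.csv -e cycles,instructions,cache-references,cache-misses,branch-instructions,branch-misses ./prog` followed by `perf_ratios < stat.csv` should work; event names can differ on some CPUs, so check them against `perf list`.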
- perf record — sampling
Default: sample the cycles event at perf's default rate (4000 Hz in current releases)
perf record -g ./prog
Specify frequency
perf record -F 999 -g ./prog
Specific event
perf record -e cache-misses -g ./prog
Attach to running process
perf record -F 999 -g -p 12345 sleep 30
Off-CPU profiling (time spent waiting)
perf record -e sched:sched_switch -ag sleep 10
DWARF call graphs (better for binaries without frame pointers)
perf record -F 999 --call-graph=dwarf ./prog
Save to named file
perf record -o myapp.perf.data -g ./prog
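The variants above differ only in the unwinding mode, the sampling frequency, and whether perf launches the program or attaches to a PID. The dry-run sketch below assembles the command line from those three choices and prints it without running perf; record_cmd and its `pid:N` target syntax are hypothetical conventions for this example.

```shell
#!/bin/sh
# record_cmd MODE FREQ TARGET [SECONDS]
#   MODE:   fp (frame pointers) or dwarf
#   TARGET: a command to launch, or "pid:N" to attach to process N
# Prints the perf record command instead of running it (dry run).
record_cmd() {
    mode=$1 freq=$2 target=$3 secs=${4:-30}
    case $mode in
        fp)    graph="--call-graph=fp" ;;
        dwarf) graph="--call-graph=dwarf" ;;
        *)     echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    case $target in
        # Attaching: "sleep N" bounds how long perf keeps sampling.
        pid:*) echo "perf record -F $freq $graph -p ${target#pid:} sleep $secs" ;;
        *)     echo "perf record -F $freq $graph $target" ;;
    esac
}

record_cmd fp 999 ./prog
record_cmd dwarf 999 pid:12345 10
```

Use fp when the target was built with -fno-omit-frame-pointer and dwarf otherwise; dwarf produces larger perf.data files because it copies part of the stack with each sample.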
- perf report — interactive analysis
perf report # reads perf.data
perf report -i myapp.perf.data
perf report --no-children # self time only (not cumulative)
perf report --sort comm,dso,sym # sort by fields
perf report --stdio # non-interactive text output
Navigation in TUI:
- Enter: expand a symbol
- a: annotate (show assembly with hit counts)
- s: in the annotate view, toggle source lines (needs debug info)
- d: filter by DSO (library)
- t: filter by thread
- ?: help
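For scripted hotspot identification, the text output of perf report can be filtered instead of browsed. The sketch below is a minimal example, assuming the default column layout of `perf report --stdio --no-children` (Overhead, Command, Shared Object, Symbol); the function name top_symbols and the sample report are invented for illustration.

```shell
#!/bin/sh
# top_symbols MIN_PERCENT: read `perf report --stdio --no-children` text on
# stdin and print "percent symbol" for entries at or above MIN_PERCENT.
top_symbols() {
    awk -v min="$1" '
        /^#/ { next }                      # skip header comments
        /^ *[0-9.]+%/ {                    # entry lines start with a percentage
            pct = $1; sub(/%/, "", pct)
            sym = $0; sub(/^.*\[.\] /, "", sym)   # keep text after "[.] " or "[k] "
            if (pct + 0 >= min + 0) print pct, sym
        }'
}

# Demo with an invented report (not measured data).
top_symbols 5 <<'EOF'
# Overhead  Command  Shared Object      Symbol
# ........  .......  .................  ..............
#
    62.50%  prog     prog               [.] compute
    25.00%  prog     libc.so.6          [.] memcpy
     1.20%  prog     [kernel.kallsyms]  [k] page_fault
EOF
```

On real data, `perf report --stdio --no-children -g none | top_symbols 5` should list every symbol with at least 5% self time; `-g none` suppresses call-chain lines so only entry lines reach the filter.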
- perf annotate — hot instructions
Show assembly with hit percentages
perf annotate sym_name
From report: press 'a' on a symbol
Or directly:
perf annotate -i perf.data --symbol=hot_function --stdio
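A high hit count on a load such as mov or vmovdqa often points to a cache miss at that load. Because samples can skid to a later instruction, also check the instruction just before the hot one.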
- perf top — live profiling
Live top, like 'top' but for functions
sudo perf top -g
Filter by process
sudo perf top -p 12345
- Feed into flamegraphs
Generate perf script output
perf script > out.perf
Use Brendan Gregg's FlameGraph tools
git clone https://github.com/brendangregg/FlameGraph
./FlameGraph/stackcollapse-perf.pl out.perf > out.folded
./FlameGraph/flamegraph.pl out.folded > flamegraph.svg
Open flamegraph.svg in browser
See skills/profilers/flamegraphs for reading flamegraphs and interpreting results.
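The intermediate out.folded file is also useful on its own: each line is a semicolon-separated stack followed by a sample count. The sketch below sums the counts per leaf frame to give a quick self-time ranking without opening the SVG. The function name leaf_totals and the sample stacks are invented for illustration.

```shell
#!/bin/sh
# leaf_totals: read folded stacks ("frame1;frame2;...;leaf COUNT") on stdin
# and print the total sample count per leaf frame, highest first.
leaf_totals() {
    awk '
        {
            count = $NF                    # last field is the sample count
            stack = $0
            sub(/ [0-9]+$/, "", stack)     # drop the trailing count
            n = split(stack, frames, ";")
            total[frames[n]] += count      # attribute samples to the leaf frame
        }
        END { for (f in total) print total[f], f }
    ' | sort -rn
}

# Demo with invented stacks (not measured data).
leaf_totals <<'EOF'
prog;main;compute;inner 120
prog;main;compute 30
prog;main;io_wait 50
prog;main;parse;inner 40
EOF
```

Run it on real data with `leaf_totals < out.folded | head`; the leaf totals correspond to the widths of the topmost boxes in the flamegraph.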
- Common issues
| Problem | Cause | Fix |
|---|---|---|
| Permission denied | perf_event_paranoid too high | Lower paranoid level or run with sudo |
| [unknown] frames | Missing frame pointers or debug info | Recompile with -fno-omit-frame-pointer or use --call-graph=dwarf |
| [kernel] everywhere | Kernel symbols not visible | Use sudo perf record; install linux-image-$(uname -r)-dbgsym |
| No kallsyms | Kernel symbols unavailable | `echo 0 |
| Empty report for short program | Program exits too fast | Use -F 9999 or profile a longer workload |
| DWARF unwinding slow | Large DWARF stack | Limit with --call-graph dwarf,512 |
- Useful events
List all available events
perf list
Common hardware events
cycles
instructions
cache-references
cache-misses
branch-instructions
branch-misses
stalled-cycles-frontend
stalled-cycles-backend
Software events
context-switches
cpu-migrations
page-faults
Tracepoints (requires root)
sched:sched_switch
syscalls:sys_enter_read
For a counter reference and interpretation guide, see references/events.md.
Related skills
- Use skills/profilers/flamegraphs for SVG flamegraph generation and reading
- Use skills/profilers/valgrind for cache simulation and memory profiling
- Use skills/compilers/gcc or skills/compilers/clang for PGO from perf data (AutoFDO)