linux-perf

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "linux-perf" with this command: npx skills add mohitmishra786/low-level-dev-skills/mohitmishra786-low-level-dev-skills-linux-perf

Linux perf

Purpose

Guide agents through perf for CPU profiling: sampling, hardware counter measurement, hotspot identification, and integration with flamegraph generation.

Triggers

  • "Which function is consuming the most CPU?"

  • "How do I measure cache misses / IPC?"

  • "How do I use perf to find hotspots?"

  • "How do I generate a flamegraph from perf data?"

  • "perf shows [unknown] or [kernel] frames"

Workflow

  1. Prerequisites

Install

sudo apt install linux-perf # Debian (version-matched)
sudo apt install linux-tools-common linux-tools-$(uname -r) # Ubuntu
sudo dnf install perf # Fedora/RHEL

Check permissions

Without root, what perf can measure depends on the paranoid level; kernel profiling needs level ≤ 1

cat /proc/sys/kernel/perf_event_paranoid

2 = user-space measurements only (no kernel profiling), 1 = user + kernel measurements, 0 = also per-CPU (system-wide) events but not raw tracepoints, -1 = no restrictions

Temporarily lower (session only)

sudo sysctl -w kernel.perf_event_paranoid=1

Persistent

echo 'kernel.perf_event_paranoid=1' | sudo tee /etc/sysctl.d/99-perf.conf
sudo sysctl -p /etc/sysctl.d/99-perf.conf
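The paranoid levels listed above can be turned into a quick check. The following is a minimal sketch; the function name `paranoid_meaning` is hypothetical, and it assumes the upstream kernel meanings of levels -1 to 2 (some distributions add stricter levels of their own, which this sketch only flags).

```shell
# Hypothetical helper: describe a perf_event_paranoid level in words.
# Usage: paranoid_meaning [LEVEL]; with no argument it reads the live value.
paranoid_meaning() {
    level="${1:-$(cat /proc/sys/kernel/perf_event_paranoid)}"
    case "$level" in
        -1) echo "no restrictions" ;;
        0)  echo "user + kernel + per-CPU events, no raw tracepoints" ;;
        1)  echo "user + kernel measurements" ;;
        2)  echo "user-space measurements only" ;;
        *)  echo "unrecognized or distribution-specific level: $level" ;;
    esac
}
```

Calling `paranoid_meaning` with no argument before a profiling session shows whether kernel frames can be expected in the results.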

Compile the target with debug symbols for useful frame data:

gcc -g -O2 -fno-omit-frame-pointer -o prog main.c

-fno-omit-frame-pointer: essential for frame-pointer-based unwinding

Alternative: compile with DWARF CFI and use --call-graph=dwarf

  2. perf stat: quick counters

Basic hardware counters

perf stat ./prog

With specific events

perf stat -e cache-misses,cache-references,instructions,cycles,branch-misses ./prog

Repeat N runs and report the mean with run-to-run variation

perf stat -r 5 ./prog

Attach to existing process

perf stat -p 12345 sleep 10

Interpret perf stat output:

  • IPC (instructions per cycle) < 1.0: memory-bound or stalled pipeline

  • cache-miss rate > 5%: significant cache pressure

  • branch-miss rate > 5%: branch predictor struggling
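The rules of thumb above can be computed directly from counter values. This sketch assumes the CSV layout that `perf stat -x,` produces, with the counter value in field 1 and the event name in field 3; the function name `stat_ratios` is hypothetical, and event names on hybrid CPUs (for example `cpu_core/cycles/`) are not handled.

```shell
# Hypothetical helper: derive IPC and miss rates from `perf stat -x,` CSV on stdin.
# Assumes field 1 is the counter value and field 3 is the event name.
stat_ratios() {
    awk -F, '
        /^#/ || NF < 3 { next }            # skip comment and blank lines
        {
            name = $3
            sub(/:.*/, "", name)           # drop modifiers such as ":u"
            count[name] = $1 + 0           # force numeric; "<not counted>" becomes 0
        }
        END {
            if (count["cycles"] > 0)
                printf "IPC %.2f\n", count["instructions"] / count["cycles"]
            if (count["cache-references"] > 0)
                printf "cache-miss %.1f%%\n", 100 * count["cache-misses"] / count["cache-references"]
            if (count["branch-instructions"] > 0)
                printf "branch-miss %.1f%%\n", 100 * count["branch-misses"] / count["branch-instructions"]
        }'
}
```

One way to feed it: `perf stat -x, -o stat.csv -e cycles,instructions,cache-references,cache-misses,branch-instructions,branch-misses ./prog`, then `stat_ratios < stat.csv`.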

  3. perf record: sampling

Default: sample the cycles event (recent perf versions default to 4000 Hz, subject to the kernel.perf_event_max_sample_rate limit)

perf record -g ./prog

Specify frequency

perf record -F 999 -g ./prog

Specific event

perf record -e cache-misses -g ./prog

Attach to running process

perf record -F 999 -g -p 12345 sleep 30

Off-CPU profiling (time spent waiting)

perf record -e sched:sched_switch -ag sleep 10

DWARF call graphs (better for binaries without frame pointers)

perf record -F 999 --call-graph=dwarf ./prog

Save to named file

perf record -o myapp.perf.data -g ./prog
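The options above (-F, -g, -p, -o) are often combined when attaching to a live process. As a small sketch, the hypothetical helper `record_cmd` below only prints the resulting command so it can be reviewed before it is run, for instance under sudo; the output file naming scheme `perf-PID.data` is an assumption made for illustration.

```shell
# Hypothetical helper: print (not run) a perf record command for a live PID.
# Arguments: PID, duration in seconds, optional sampling frequency (default 999).
record_cmd() {
    pid="$1"
    secs="$2"
    freq="${3:-999}"
    echo "perf record -F $freq -g -p $pid -o perf-$pid.data -- sleep $secs"
}
```

For example, `record_cmd 12345 30` prints a 30-second, 999 Hz recording command for PID 12345.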

  4. perf report: interactive analysis

perf report # reads perf.data
perf report -i myapp.perf.data
perf report --no-children # self time only (not cumulative)
perf report --sort comm,dso,sym # sort by fields
perf report --stdio # non-interactive text output
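The --stdio output is convenient for scripting. The sketch below pulls the first few entries out of that text; it assumes the usual layout in which each entry line starts with an overhead percentage and ends with a marker such as `[.]` or `[k]` followed by the symbol name. The function name `top_symbols` is hypothetical, and the exact columns can differ with other --sort settings.

```shell
# Hypothetical helper: print the first N "overhead symbol" pairs from
# `perf report --stdio --no-children` text on stdin (default N = 5).
top_symbols() {
    awk -v n="${1:-5}" '
        /^ *[0-9.]+%/ {                       # entry lines start with a percentage
            pct = $1
            sym = $0
            sub(/^.*\[[.kgHu]\] /, "", sym)   # keep only the text after the type marker
            print pct, sym
            if (++seen >= n) exit
        }'
}
```

A possible invocation is `perf report --stdio --no-children | top_symbols 10`; call-graph lines are skipped because they do not begin with a percentage.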

Navigation in TUI:

  • Enter: expand a symbol

  • a: annotate (show assembly with hit counts)

  • s: inside the annotate view, toggle source lines (needs debug info)

  • d: filter by DSO (library)

  • t: filter by thread

  • ?: help

  5. perf annotate: hot instructions

Show assembly with hit percentages

perf annotate sym_name

From report: press 'a' on a symbol

Or directly:

perf annotate -i perf.data --symbol=hot_function --stdio

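A high hit count on a load such as mov or vmovdqa often points to a cache miss at that load. Samples can skid, so the stalled instruction may be the one just before the line that shows the hits.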

  6. perf top: live profiling

Live top, like 'top' but for functions

sudo perf top -g

Filter by process

sudo perf top -p 12345

  7. Feed into flamegraphs

Generate perf script output

perf script > out.perf

Use Brendan Gregg's FlameGraph tools

git clone https://github.com/brendangregg/FlameGraph
./FlameGraph/stackcollapse-perf.pl out.perf > out.folded
./FlameGraph/flamegraph.pl out.folded > flamegraph.svg

Open flamegraph.svg in browser

See skills/profilers/flamegraphs for reading flamegraphs and interpreting results.
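The folded file produced by stackcollapse-perf.pl is plain text: each line holds semicolon-separated frames followed by a space and a sample count. A short sketch (with the hypothetical name `self_samples`) can rank leaf functions by self samples without rendering the SVG at all.

```shell
# Hypothetical helper: sum sample counts per leaf frame from folded stacks on stdin,
# printing "count function" sorted by count, highest first.
self_samples() {
    awk '
        NF == 0 { next }                      # ignore blank lines
        {
            count = $NF                       # last field is the sample count
            stack = $0
            sub(/ [0-9]+$/, "", stack)        # strip the trailing count
            n = split(stack, frames, ";")
            self[frames[n]] += count          # credit the leaf (innermost) frame
        }
        END { for (f in self) print self[f], f }' | sort -k1,1nr -k2
}
```

Running `self_samples < out.folded | head` gives a text-only view of the widest plateaus in the flamegraph.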

  8. Common issues

| Problem | Cause | Fix |
|---|---|---|
| Permission denied | perf_event_paranoid too high | Lower paranoid level or run with sudo |
| [unknown] frames | Missing frame pointers or debug info | Recompile with -fno-omit-frame-pointer or use --call-graph=dwarf |
| [kernel] everywhere | Kernel symbols not visible | Use sudo perf record; install linux-image-$(uname -r)-dbgsym |
| No kallsyms | Kernel symbols unavailable | `echo 0 |
| Empty report for short program | Program exits too fast | Use -F 9999 or instrument longer workload |
| DWARF unwinding slow | Large DWARF stack | Limit with --call-graph dwarf,512 |
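To judge how much the [unknown] frames issue affects a profile, the share of samples whose stack contains such a frame can be measured. This sketch works on folded stacks (frames separated by semicolons, then a space and a count) and assumes unresolved frames appear literally as `[unknown]`; the function name `unknown_share` is hypothetical.

```shell
# Hypothetical helper: percentage of samples whose stack contains an [unknown] frame.
# Reads folded stacks ("frame;frame;frame count") on stdin.
unknown_share() {
    awk '
        NF == 0 { next }                          # ignore blank lines
        { total += $NF }                          # last field is the sample count
        index($0, "[unknown]") { unknown += $NF }
        END { if (total > 0) printf "%.1f%%\n", 100 * unknown / total }'
}
```

A large share suggests recompiling with -fno-omit-frame-pointer or recording with --call-graph=dwarf before trusting the call graphs.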

  9. Useful events

List all available events

perf list

Common hardware events

cycles
instructions
cache-references
cache-misses
branch-instructions
branch-misses
stalled-cycles-frontend
stalled-cycles-backend

Software events

context-switches
cpu-migrations
page-faults

Tracepoints (requires root)

sched:sched_switch
syscalls:sys_enter_read

For a counter reference and interpretation guide, see references/events.md.

Related skills

  • Use skills/profilers/flamegraphs for SVG flamegraph generation and reading

  • Use skills/profilers/valgrind for cache simulation and memory profiling

  • Use skills/compilers/gcc or skills/compilers/clang for PGO from perf data (AutoFDO)

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
