LLVM IR and Tooling

Purpose

Guide agents through the LLVM IR pipeline: generating IR, running optimisation passes with opt, lowering to assembly with llc, and inspecting IR for debugging or performance work.

Triggers

"Show me the LLVM IR for this function"
"How do I run an LLVM optimisation pass?"
"What does this LLVM IR instruction mean?"
"How do I write a custom LLVM pass?"
"Why isn't auto-vectorisation happening in LLVM?"

Workflow

Generate LLVM IR

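Emit textual IR (.ll)

clang -O0 -emit-llvm -S src.c -o src.ll

Emit unoptimised IR that opt can still transform (plain -O0 marks every function optnone, so later passes skip it)

clang -O0 -Xclang -disable-O0-optnone -emit-llvm -S src.c -o src.ll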

Emit bitcode (.bc)

clang -O2 -emit-llvm -c src.c -o src.bc

Disassemble bitcode to text

llvm-dis src.bc -o src.ll
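
The three commands above can be chained into a small round trip. What follows is a minimal sketch, assuming clang and llvm-dis are on PATH; the scratch directory, the file square.c and the function square are hypothetical names chosen for illustration.

```shell
# Hypothetical scratch directory and source file, for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/square.c" <<'EOF'
int square(int x) { return x * x; }
EOF

if command -v clang >/dev/null 2>&1 && command -v llvm-dis >/dev/null 2>&1; then
  # Textual IR straight from the front end, unoptimised.
  clang -O0 -emit-llvm -S "$workdir/square.c" -o "$workdir/square.ll"
  # Bitcode at -O2, then disassembled back to text so it can be read.
  clang -O2 -emit-llvm -c "$workdir/square.c" -o "$workdir/square.bc"
  llvm-dis "$workdir/square.bc" -o "$workdir/square.O2.ll"
  # Show the function definition in both files for comparison.
  grep -n '^define' "$workdir/square.ll" "$workdir/square.O2.ll"
else
  echo "clang or llvm-dis not found; skipping" >&2
fi
```

Comparing the two .ll files side by side is a quick way to see how much of the -O0 output is stack traffic that the optimiser later removes.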

Run optimisation passes with opt

Apply a specific pass

opt -passes='mem2reg,instcombine,simplifycfg' src.ll -S -o out.ll

Standard optimisation pipelines

opt -passes='default<O2>' src.ll -S -o out.ll
opt -passes='default<O3>' src.ll -S -o out.ll

List available passes

opt --print-passes 2>&1 | less

Print IR before and after a pass

opt -passes='instcombine' --print-before=instcombine --print-after=instcombine src.ll -S -o out.ll 2>&1 | less
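
To make the effect of a single pass concrete, the sketch below runs mem2reg over a hand-written function and counts stack slots before and after. It assumes an LLVM recent enough to accept opaque pointers (the ptr type, LLVM 15 or later); the file add_one.ll and the function @add_one are hypothetical.

```shell
# Hypothetical scratch directory and IR file, for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/add_one.ll" <<'EOF'
define i32 @add_one(i32 %x) {
entry:
  %slot = alloca i32, align 4
  store i32 %x, ptr %slot, align 4
  %v = load i32, ptr %slot, align 4
  %r = add nsw i32 %v, 1
  ret i32 %r
}
EOF

# Count stack allocations in the input.
before=$(grep -c ' = alloca ' "$workdir/add_one.ll")
echo "stack slots before mem2reg: $before"

if command -v opt >/dev/null 2>&1; then
  # Run only mem2reg and write textual IR.
  opt -passes='mem2reg' "$workdir/add_one.ll" -S -o "$workdir/add_one.opt.ll"
  after=$(grep -c ' = alloca ' "$workdir/add_one.opt.ll")
  echo "stack slots after mem2reg: $after"
else
  echo "opt not found; skipping" >&2
fi
```

If the pass applies, the store/load pair through %slot should disappear and the add should use %x directly.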

Lower IR to assembly with llc

Compile IR to object file

llc -filetype=obj src.ll -o src.o

Compile to assembly

llc -filetype=asm --x86-asm-syntax=intel src.ll -o src.s

Target a specific CPU

llc -mcpu=skylake -mattr=+avx2 src.ll -o src.s

Show available targets

llc --version
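
Putting the three stages together, the sketch below walks one file from C through opt to llc. It assumes clang, opt and llc come from the same LLVM installation and target the host; dot.c and dot3 are hypothetical names.

```shell
# Hypothetical scratch directory and source file, for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/dot.c" <<'EOF'
int dot3(const int *a, const int *b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
EOF

if command -v clang >/dev/null 2>&1 && command -v opt >/dev/null 2>&1 \
   && command -v llc >/dev/null 2>&1; then
  # Front end: unoptimised IR. -disable-O0-optnone stops clang from tagging
  # functions optnone, which would make opt leave them untouched.
  clang -O0 -Xclang -disable-O0-optnone -emit-llvm -S \
    "$workdir/dot.c" -o "$workdir/dot.ll"
  # Middle end: the standard O2 pipeline.
  opt -passes='default<O2>' "$workdir/dot.ll" -S -o "$workdir/dot.opt.ll"
  # Back end: assembly for reading, object file for linking.
  llc -filetype=asm "$workdir/dot.opt.ll" -o "$workdir/dot.s"
  llc -filetype=obj "$workdir/dot.opt.ll" -o "$workdir/dot.o"
  ls -l "$workdir"
else
  echo "clang, opt or llc not found; skipping" >&2
fi
```

Keeping every intermediate file makes it possible to tell which stage introduced a surprising instruction sequence.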

Inspect IR

Key IR constructs to understand:

| Construct | Meaning |
|---|---|
| alloca | Stack allocation (pre-SSA; mem2reg promotes to registers) |
| load / store | Memory access |
| getelementptr (GEP) | Pointer arithmetic / field access |
| phi | SSA φ-node: merges values from predecessor blocks |
| call / invoke | Function call (invoke has exception edges) |
| icmp / fcmp | Integer/float comparison |
| br | Branch (conditional or unconditional) |
| ret | Return |
| bitcast | Reinterpret bits (no-op in codegen) |
| ptrtoint / inttoptr | Pointer↔integer (avoid where possible) |
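
Several of these constructs are easier to recognise in context. The sketch below writes a small hand-annotated function and, if llvm-as is available, uses it to check that the IR parses and verifies. The function @sum and the file sum.ll are hypothetical, and the ptr type assumes LLVM 15 or later.

```shell
# Hypothetical scratch directory and IR file, for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/sum.ll" <<'EOF'
; Sum the first n elements of an i32 array.
define i32 @sum(ptr %a, i32 %n) {
entry:
  %empty = icmp sle i32 %n, 0               ; signed integer comparison
  br i1 %empty, label %exit, label %loop    ; conditional branch

loop:
  ; Each value below is selected by the predecessor block we came from.
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %acc = phi i32 [ 0, %entry ], [ %acc.next, %loop ]
  %idx = sext i32 %i to i64
  ; GEP only computes the address of a[i]; it does not touch memory.
  %p = getelementptr inbounds i32, ptr %a, i64 %idx
  %v = load i32, ptr %p, align 4            ; the actual memory read
  %acc.next = add nsw i32 %acc, %v
  %i.next = add nsw i32 %i, 1
  %done = icmp sge i32 %i.next, %n
  br i1 %done, label %exit, label %loop

exit:
  %result = phi i32 [ 0, %entry ], [ %acc.next, %loop ]
  ret i32 %result
}
EOF

if command -v llvm-as >/dev/null 2>&1; then
  # llvm-as runs the IR verifier while assembling; discard the bitcode.
  llvm-as "$workdir/sum.ll" -o /dev/null && echo "sum.ll verifies"
else
  echo "llvm-as not found; skipping verification" >&2
fi
```

There is no alloca here because the function is already in SSA form; clang at -O0 would instead emit stack slots for i and the accumulator and leave promotion to mem2reg.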

Key passes

| Pass | Effect |
|---|---|
| mem2reg | Promote alloca to SSA registers |
| instcombine | Instruction combining / peephole |
| simplifycfg | CFG cleanup, dead block removal |
| loop-vectorize | Auto-vectorisation |
| slp-vectorize | Superword-level parallelism (straight-line vectorisation) |
| inline | Function inlining |
| gvn | Global value numbering (common subexpression elimination) |
| licm | Loop-invariant code motion |
| loop-unroll | Loop unrolling |
| argpromotion | Promote pointer args to values |
| sroa | Scalar Replacement of Aggregates |

Debugging missed optimisations

Why was a loop not vectorised?

clang -O2 -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize src.c

Dump pass pipeline

clang -O2 -mllvm -debug-pass=Structure src.c -o /dev/null 2>&1 | less

Dump the middle-end pipeline under the new pass manager (-debug-pass is a legacy pass manager option and reports only the codegen passes on current LLVM)

opt -passes='default<O2>' -print-pipeline-passes -disable-output src.ll

Print IR after each pass (very verbose)

opt -passes='default<O2>' -print-after-all src.ll -S 2>&1 | less
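
As an illustration of the remark flags, the sketch below compiles a loop with a loop-carried dependence, which the vectoriser is expected to reject, and prints whatever remarks clang emits. The file prefix.c and the function prefix_sum are hypothetical; the exact remark wording varies between LLVM versions, so none is shown.

```shell
# Hypothetical scratch directory and source file, for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/prefix.c" <<'EOF'
/* Each iteration reads the value written by the previous one. */
void prefix_sum(float *a, const float *b, int n) {
  for (int i = 1; i < n; i++)
    a[i] = a[i - 1] + b[i];
}
EOF

if command -v clang >/dev/null 2>&1; then
  # -Rpass-missed reports loops the pass gave up on;
  # -Rpass-analysis adds the reason the analysis recorded.
  clang -O2 -c "$workdir/prefix.c" -o /dev/null \
    -Rpass=loop-vectorize \
    -Rpass-missed=loop-vectorize \
    -Rpass-analysis=loop-vectorize
else
  echo "clang not found; skipping" >&2
fi
```

Remarks are printed with file, line and column, so they can be matched back to the specific loop in the source.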

Useful llvm tools

| Tool | Purpose |
|---|---|
| llvm-dis | Bitcode → textual IR |
| llvm-as | Textual IR → bitcode |
| llvm-link | Link multiple bitcode files |
| llvm-lto | Standalone LTO |
| llvm-nm | Symbols in bitcode/object |
| llvm-objdump | Disassemble objects |
| llvm-profdata | Merge/show PGO profiles |
| llvm-cov | Coverage reporting |
| llvm-mca | Machine code analyser (throughput/latency) |
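
A few of these tools combine naturally. The sketch below compiles two translation units to bitcode, merges them with llvm-link, and inspects the result with llvm-nm and llvm-dis. It assumes all four tools are on PATH; util.c, main.c and the function twice are hypothetical.

```shell
# Hypothetical scratch directory and source files, for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/util.c" <<'EOF'
int twice(int x) { return 2 * x; }
EOF
cat > "$workdir/main.c" <<'EOF'
int twice(int x);
int main(void) { return twice(21) == 42 ? 0 : 1; }
EOF

if command -v clang >/dev/null 2>&1 && command -v llvm-link >/dev/null 2>&1 \
   && command -v llvm-nm >/dev/null 2>&1 && command -v llvm-dis >/dev/null 2>&1; then
  # One bitcode file per translation unit.
  clang -O1 -emit-llvm -c "$workdir/util.c" -o "$workdir/util.bc"
  clang -O1 -emit-llvm -c "$workdir/main.c" -o "$workdir/main.bc"
  # Merge the modules into a single bitcode file.
  llvm-link "$workdir/util.bc" "$workdir/main.bc" -o "$workdir/whole.bc"
  # List the symbols in the merged module, then its function definitions.
  llvm-nm "$workdir/whole.bc"
  llvm-dis "$workdir/whole.bc" -o - | grep '^define'
else
  echo "LLVM tools not found; skipping" >&2
fi
```

The merged module can then be passed to opt as a single unit, which lets passes such as inline see across the original file boundary.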

For binutils equivalents, see skills/binaries/binutils.

Related skills

Use skills/compilers/clang for source-level Clang flags
Use skills/binaries/linkers-lto for LTO at link time
Use skills/profilers/linux-perf combined with llvm-mca for micro-architectural analysis
