LLVM Optimization Skill
This skill covers LLVM optimization infrastructure, pass development, and performance tuning techniques.
Optimization Pipeline Overview
Pipeline Stages
Source → Frontend → LLVM IR → Optimization Passes → CodeGen → Machine Code ↓ [Transform Passes] [Analysis Passes]
Optimization Levels
No optimization
clang -O0 source.c
Basic optimization (most optimizations enabled)
clang -O1 source.c
Full optimization (aggressive inlining, vectorization)
clang -O2 source.c
Maximum optimization (may increase code size)
clang -O3 source.c
Size optimization
clang -Os source.c # Optimize for size clang -Oz source.c # Aggressive size optimization
Core Optimization Passes
Scalar Optimizations
-
Constant Propagation: Replace variables with known constant values
-
Dead Code Elimination (DCE): Remove unreachable or unused code
-
Common Subexpression Elimination (CSE): Avoid redundant computations
-
Instruction Combining: Merge multiple instructions into simpler forms
-
Scalar Replacement of Aggregates (SROA): Break up aggregate allocations
Loop Optimizations
-
Loop Invariant Code Motion (LICM): Hoist invariant computations
-
Loop Unrolling: Duplicate loop body to reduce overhead
-
Loop Vectorization: Convert scalar loops to vector operations
-
Loop Fusion/Fission: Combine or split loops
-
Induction Variable Simplification: Optimize loop counters
Interprocedural Optimizations
-
Inlining: Replace call sites with function body
-
Dead Argument Elimination: Remove unused function parameters
-
Interprocedural Constant Propagation: Propagate constants across functions
-
Link-Time Optimization (LTO): Whole-program optimization
Writing Custom Optimization Passes
New Pass Manager (LLVM 13+)
#include "llvm/IR/PassManager.h" #include "llvm/Passes/PassBuilder.h" #include "llvm/Passes/PassPlugin.h"
struct MyOptimizationPass : public llvm::PassInfoMixin<MyOptimizationPass> { llvm::PreservedAnalyses run(llvm::Function &F, llvm::FunctionAnalysisManager &FAM) { bool Changed = false;
for (auto &BB : F) {
for (auto &I : BB) {
// Implement optimization logic
if (optimizeInstruction(I)) {
Changed = true;
}
}
}
if (Changed)
return llvm::PreservedAnalyses::none();
return llvm::PreservedAnalyses::all();
}
private: bool optimizeInstruction(llvm::Instruction &I) { // Example: Replace add x, 0 with x if (auto *BinOp = llvm::dyn_cast<llvm::BinaryOperator>(&I)) { if (BinOp->getOpcode() == llvm::Instruction::Add) { if (auto *C = llvm::dyn_cast<llvm::ConstantInt>(BinOp->getOperand(1))) { if (C->isZero()) { I.replaceAllUsesWith(BinOp->getOperand(0)); return true; } } } } return false; } };
// Plugin registration extern "C" LLVM_ATTRIBUTE_WEAK ::llvm::PassPluginLibraryInfo llvmGetPassPluginInfo() { return {LLVM_PLUGIN_API_VERSION, "MyOptPass", LLVM_VERSION_STRING, [](llvm::PassBuilder &PB) { PB.registerPipelineParsingCallback( [](llvm::StringRef Name, llvm::FunctionPassManager &FPM, llvm::ArrayRef<llvm::PassBuilder::PipelineElement>) { if (Name == "my-opt") { FPM.addPass(MyOptimizationPass()); return true; } return false; }); }}; }
Analysis Dependencies
struct MyAnalysis : public llvm::AnalysisInfoMixin<MyAnalysis> { using Result = MyAnalysisResult;
Result run(llvm::Function &F, llvm::FunctionAnalysisManager &FAM) {
// Compute analysis result
return Result();
}
static llvm::AnalysisKey Key;
};
// Using analysis in a pass llvm::PreservedAnalyses run(llvm::Function &F, llvm::FunctionAnalysisManager &FAM) { auto &DT = FAM.getResult<llvm::DominatorTreeAnalysis>(F); auto &LI = FAM.getResult<llvm::LoopAnalysis>(F); auto &AA = FAM.getResult<llvm::AAManager>(F);
// Use analysis results...
}
Instruction Patterns
Strength Reduction
// Replace expensive operations with cheaper ones // x * 2 → x << 1 // x / 4 → x >> 2 // x % 8 → x & 7
bool reduceStrength(llvm::BinaryOperator *BO) { if (BO->getOpcode() == llvm::Instruction::Mul) { if (auto *C = llvm::dyn_cast<llvm::ConstantInt>(BO->getOperand(1))) { if (C->getValue().isPowerOf2()) { unsigned Shift = C->getValue().exactLogBase2(); auto *Shl = llvm::BinaryOperator::CreateShl( BO->getOperand(0), llvm::ConstantInt::get(C->getType(), Shift)); BO->replaceAllUsesWith(Shl); return true; } } } return false; }
Algebraic Simplification
// x + 0 → x // x * 1 → x // x * 0 → 0 // x - x → 0 // x | x → x // x & 0 → 0
Dominator Tree Usage
Finding Optimization Opportunities
void optimizeWithDominators(llvm::Function &F, llvm::DominatorTree &DT) { // Use dominance for safe code motion for (auto &BB : F) { for (auto &I : BB) { if (auto *Load = llvm::dyn_cast<llvm::LoadInst>(&I)) { // Check if we can hoist this load if (canHoist(Load, DT)) { hoistInstruction(Load, DT); } } } } }
bool canHoist(llvm::Instruction *I, llvm::DominatorTree &DT) { llvm::BasicBlock *DefBB = I->getParent();
// Check all uses are dominated
for (auto *U : I->users()) {
if (auto *UI = llvm::dyn_cast<llvm::Instruction>(U)) {
if (!DT.dominates(DefBB, UI->getParent())) {
return false;
}
}
}
return true;
}
Loop Optimization Techniques
Loop Analysis
void analyzeLoops(llvm::Function &F, llvm::LoopInfo &LI) { for (auto *L : LI) { // Get loop trip count if (auto *TC = L->getTripCount()) { llvm::errs() << "Trip count: " << *TC << "\n"; }
// Check if loop is simple
if (L->isLoopSimplifyForm()) {
llvm::BasicBlock *Header = L->getHeader();
llvm::BasicBlock *Latch = L->getLoopLatch();
llvm::BasicBlock *Exit = L->getExitBlock();
}
// Get induction variables
llvm::PHINode *IV = L->getCanonicalInductionVariable();
}
}
Loop Unrolling
// Manually trigger loop unrolling #pragma unroll 4 for (int i = 0; i < N; i++) { // Loop body will be unrolled 4x }
// LLVM unroll metadata !llvm.loop.unroll.count = !{i32 4}
Vectorization
Auto-Vectorization Hints
// Enable vectorization #pragma clang loop vectorize(enable) for (int i = 0; i < N; i++) { a[i] = b[i] + c[i]; }
// Specify vector width #pragma clang loop vectorize_width(8) for (int i = 0; i < N; i++) { a[i] = b[i] * c[i]; }
SLP Vectorization
Superword Level Parallelism - vectorize straight-line code:
// Before SLP a[0] = b[0] + c[0]; a[1] = b[1] + c[1]; a[2] = b[2] + c[2]; a[3] = b[3] + c[3];
// After SLP (conceptual) <4 x float> tmp = load <4 x float> b <4 x float> tmp2 = load <4 x float> c <4 x float> result = fadd tmp, tmp2 store result to a
Debugging Optimizations
Viewing Pass Execution
Print passes being run
opt -debug-pass-manager input.ll -O2
Print IR after each pass
opt -print-after-all input.ll -O2
Print specific pass output
opt -print-after=instcombine input.ll -O2
Statistics
opt -stats input.ll -O2
Optimization Remarks
Enable all optimization remarks
clang -Rpass=.* source.c
Specific remarks
clang -Rpass=loop-vectorize source.c clang -Rpass-missed=inline source.c clang -Rpass-analysis=loop-vectorize source.c
Link-Time Optimization (LTO)
Enabling LTO
Full LTO
clang -flto source1.c source2.c -o program
Thin LTO (faster, parallel)
clang -flto=thin source1.c source2.c -o program
LTO Benefits
-
Whole-program dead code elimination
-
Interprocedural constant propagation
-
Cross-module inlining
-
Better devirtualization
Correctness Verification
Alive2
Automatic verification of LLVM optimizations:
Verify transformation correctness
alive-tv before.ll after.ll
Check specific optimization
opt -instcombine input.ll | alive-tv input.ll -
Resources
See Optimization section in README.md for specific commits and optimization-related projects.
Getting Detailed Information
When you need detailed and up-to-date resource links, tool lists, or project references, fetch the latest data from:
https://raw.githubusercontent.com/gmh5225/awesome-llvm-security/refs/heads/main/README.md
This README contains comprehensive curated lists of:
-
LLVM optimization commits and patches (Optimization section)
-
Alive2 and verification tools
-
Optimization courses and tutorials (CSCD70)