Compiler Development Skill
This skill provides comprehensive knowledge of building compilers and language implementations using the LLVM infrastructure.
Compiler Architecture Overview
Classic Three-Phase Design
Source Code → Frontend → Middle-End (Optimizer) → Backend → Machine Code ↓ ↓ ↓ AST/IR LLVM IR Passes Target Code
Frontend Development
Lexical Analysis
// Token types for a simple language enum class TokenKind { Identifier, Number, String, Keyword, Operator, Punctuation, EndOfFile };
struct Token { TokenKind kind; std::string value; SourceLocation location; };
Parser Implementation
-
Recursive Descent: Easy to implement, good error messages
-
Operator Precedence Parsing: Efficient for expression parsing
-
LALR/LR: Use tools like Bison for complex grammars
AST Design
class Expr { public: virtual ~Expr() = default; virtual llvm::Value* codegen() = 0; };
class BinaryExpr : public Expr { std::unique_ptr<Expr> LHS, RHS; char Op; public: llvm::Value* codegen() override { llvm::Value* L = LHS->codegen(); llvm::Value* R = RHS->codegen();
switch (Op) {
case '+': return Builder.CreateFAdd(L, R, "addtmp");
case '-': return Builder.CreateFSub(L, R, "subtmp");
case '*': return Builder.CreateFMul(L, R, "multmp");
case '/': return Builder.CreateFDiv(L, R, "divtmp");
}
}
};
LLVM IR Generation
Module and Context Setup
#include "llvm/IR/LLVMContext.h" #include "llvm/IR/Module.h" #include "llvm/IR/IRBuilder.h"
class CodeGen { std::unique_ptr<llvm::LLVMContext> Context; std::unique_ptr<llvm::Module> Module; std::unique_ptr<llvm::IRBuilder<>> Builder;
public: CodeGen() { Context = std::make_unique<llvm::LLVMContext>(); Module = std::make_unique<llvm::Module>("my_module", *Context); Builder = std::make_unique<llvm::IRBuilder<>>(*Context); } };
Function Generation
llvm::Function* createFunction(const std::string& name, llvm::Type* returnType, std::vector<llvm::Type*> params) { llvm::FunctionType* FT = llvm::FunctionType::get(returnType, params, false); llvm::Function* F = llvm::Function::Create( FT, llvm::Function::ExternalLinkage, name, Module.get());
llvm::BasicBlock* BB = llvm::BasicBlock::Create(*Context, "entry", F);
Builder->SetInsertPoint(BB);
return F;
}
JIT Compilation
LLVM ORC JIT
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
auto JIT = llvm::orc::LLJITBuilder().create(); if (!JIT) { handleError(JIT.takeError()); }
// Add module (*JIT)->addIRModule(llvm::orc::ThreadSafeModule( std::move(Module), std::move(Context)));
// Look up symbol and execute auto Sym = (JIT)->lookup("main"); auto MainFn = (int(*)())Sym->getAddress(); int result = MainFn();
Optimization Pass Pipeline
New Pass Manager (Recommended)
#include "llvm/Passes/PassBuilder.h"
void optimizeModule(llvm::Module& M) { llvm::PassBuilder PB; llvm::LoopAnalysisManager LAM; llvm::FunctionAnalysisManager FAM; llvm::CGSCCAnalysisManager CGAM; llvm::ModuleAnalysisManager MAM;
PB.registerModuleAnalyses(MAM);
PB.registerCGSCCAnalyses(CGAM);
PB.registerFunctionAnalyses(FAM);
PB.registerLoopAnalyses(LAM);
PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);
llvm::ModulePassManager MPM = PB.buildPerModuleDefaultPipeline(
llvm::OptimizationLevel::O2);
MPM.run(M, MAM);
}
Custom Pass Implementation
struct MyPass : public llvm::PassInfoMixin<MyPass> { llvm::PreservedAnalyses run(llvm::Function& F, llvm::FunctionAnalysisManager& FAM) { for (auto& BB : F) { for (auto& I : BB) { // Transform instructions } } return llvm::PreservedAnalyses::none(); } };
Language Implementation Patterns
Memory-Safe Languages
-
Use LLVM's memory sanitizer hooks
-
Implement bounds checking with GEP introspection
-
Reference counting or garbage collection integration
Type Systems
-
Implement type inference during AST construction
-
Generate appropriate LLVM types (i32, float, struct, ptr)
-
Handle generic types via monomorphization or boxing
Error Handling
-
Generate exception handling via LLVM's landingpad/invoke
-
Implement Result/Option types as tagged unions
-
Use LLVM's personality functions for unwinding
Notable Language Implementations
Systems Languages
-
Rust: Complex borrow checker, trait system → LLVM
-
Zig: Comptime evaluation, safety features
-
Carbon: C++ interop, modern syntax
Scripting Languages
-
Julia: JIT-compiled scientific computing
-
Crystal: Ruby-like syntax, static typing
-
Nim: Python-like, multi-backend
Domain-Specific
-
Solidity: Ethereum smart contracts
-
MLIR: Multi-level IR for ML/AI workloads
-
Halide: Image processing DSL
Development Workflow
-
Start Simple: Begin with Kaleidoscope tutorial
-
Incremental Features: Add one language feature at a time
-
Test Extensively: Unit tests for each compiler phase
-
Use LLVM Tools: opt, llc, llvm-dis for debugging IR
-
Profile and Optimize: Focus on common code patterns
Resources
Official Tutorials
-
LLVM Kaleidoscope: Building a language from scratch
-
Clang internals: Frontend implementation patterns
-
Writing an LLVM Backend: Target code generation
Community Projects
See DIY Compiler section in README.md for 100+ example implementations across different language paradigms.
Getting Detailed Information
When you need detailed and up-to-date resource links, tool lists, or project references, fetch the latest data from:
https://raw.githubusercontent.com/gmh5225/awesome-llvm-security/refs/heads/main/README.md
This README contains comprehensive curated lists of:
-
100+ DIY compiler implementations (DIY Compiler section)
-
Toolchain configurations and IDE setup
-
Compiler development tutorials and books