Once we have machine instructions (MachineInstrs), the compiler optimizes them to produce optimized MachineInstrs, and then performs register allocation on those MachineInstrs. This post skips that optimization step to focus on register allocation.
In LLVM IR, thanks to SSA form, there is an unlimited supply of virtual registers for storing values. A physical machine, however, has only a small number of registers, so we need to transform the MachineInstrs, which use an unbounded number of virtual registers, so that the registers they use fit the specific target machine, e.g. AArch64.
The Register Allocation problem consists in mapping a program Pv, that can use an unbounded number of virtual registers, to a program Pp that contains a finite (possibly small) number of physical registers. Each target architecture has a different number of physical registers. If the number of physical registers is not enough to accommodate all the virtual registers, some of them will have to be mapped into memory. These virtuals are called spilled virtuals. -- The LLVM Target-Independent Code Generator
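To make the quoted definition concrete, here is a minimal sketch of the mapping, assuming the worst case where every virtual register is live at the same time. The names ToyLocation and toyAssign are hypothetical and are not part of LLVM; a real allocator also tracks live ranges, which this toy ignores.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model (not an LLVM type): a value lives either in a
// physical register or in a stack slot.
struct ToyLocation {
  bool InRegister;   // true: physical register, false: spilled to memory
  std::size_t Index; // register number or stack slot number
};

// Map each virtual register to a physical register while any remain free.
// The rest become "spilled virtuals" that live in stack slots.
// Assumption: all virtual registers are live simultaneously.
std::map<std::string, ToyLocation>
toyAssign(const std::vector<std::string> &VirtRegs, std::size_t NumPhysRegs) {
  std::map<std::string, ToyLocation> Assignment;
  std::size_t NextReg = 0, NextSlot = 0;
  for (const std::string &V : VirtRegs) {
    if (Assignment.count(V))
      continue; // already placed
    if (NextReg < NumPhysRegs)
      Assignment[V] = {true, NextReg++};
    else
      Assignment[V] = {false, NextSlot++};
  }
  return Assignment;
}
```

Calling toyAssign({"%0", "%1", "%2"}, 2) places %0 and %1 in registers 0 and 1, while %2 becomes a spilled virtual in stack slot 0.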
Let us dive into the code that implements register allocation for our example with lldb, stopping at RegAllocFast::runOnMachineFunction.
/usr/bin/lldb -- \
./bin/swift-frontend \
-frontend \
-c \
-primary-file hello.swift \
-sdk /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk \
-color-diagnostics \
-Xllvm --global-isel=false -Xllvm --fast-isel=false
(lldb) b RegAllocFast::runOnMachineFunction
(lldb) r
(lldb) settings set frame-format "#${frame.index}: ${ansi.fg.yellow}${ansi.normal}{{${function.name-without-args}{${frame.no-debug}${function.pc-offset}}}}{ at ${ansi.fg.cyan}${line.file.basename}${ansi.normal}:${ansi.fg.yellow}${line.number}${ansi.normal}{:${ansi.fg.yellow}${line.column}${ansi.normal}}}{${function.is-optimized} [opt]}{${frame.is-artificial} [artificial]}\n"
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
* #0: (anonymous namespace)::RegAllocFast::runOnMachineFunction at RegAllocFast.cpp:1294:3
#1: llvm::MachineFunctionPass::runOnFunction at MachineFunctionPass.cpp:73:13
#2: llvm::FPPassManager::runOnFunction at LegacyPassManager.cpp:1516:27
#3: llvm::FPPassManager::runOnModule at LegacyPassManager.cpp:1552:16
#4: (anonymous namespace)::MPPassManager::runOnModule at LegacyPassManager.cpp:1617:27
#5: llvm::legacy::PassManagerImpl::run at LegacyPassManager.cpp:614:44
#6: llvm::legacy::PassManager::run at LegacyPassManager.cpp:1737:14
#7: swift::compileAndWriteLLVM at IRGen.cpp:605:14
#8: swift::performLLVM at IRGen.cpp:557:10
#9: generateCode at FrontendTool.cpp:1486:10
#10: performCompileStepsPostSILGen at FrontendTool.cpp:1637:10
#11: performCompileStepsPostSema at FrontendTool.cpp:773:17
#12: performAction(swift::CompilerInstance&, int&, swift::FrontendObserver*)::$_18::operator()(swift::CompilerInstance&) const at FrontendTool.cpp:1239:18
#13: llvm::function_ref<bool (swift::CompilerInstance&)>::callback_fn<performAction(swift::CompilerInstance&, int&, swift::FrontendObserver*)::$_18> at STLExtras.h:185:12
#14: llvm::function_ref<bool (swift::CompilerInstance&)>::operator() at STLExtras.h:203:12
#15: withSemanticAnalysis at FrontendTool.cpp:1109:10
#16: performAction at FrontendTool.cpp:1235:12
#17: performCompile at FrontendTool.cpp:1287:19
#18: swift::performFrontend at FrontendTool.cpp:2166:19
#19: run_driver at driver.cpp:153:14
#20: main at driver.cpp:348:12
#21: start + 4
The following is the code of the stopped function.
/* /wtsc/llvm-project/llvm/lib/CodeGen/RegAllocFast.cpp */
...
bool RegAllocFast::runOnMachineFunction(MachineFunction &MF) {
  ...
  // Loop over all of the basic blocks, eliminating virtual register references
  for (MachineBasicBlock &MBB : MF)
    allocateBasicBlock(MBB);
  ...
}
...
By default, for a debug build of LLVM and the Swift compiler, the register allocator is the Fast Register Allocator. It performs top-down local register allocation, the idea being that the most heavily used values should stay in registers. Moreover, 'local' means that this allocator focuses on a single basic block at a time.
Each basic block is scanned from top to bottom, and virtual registers are assigned to physical registers as they appear. There are no live registers between blocks. Everything is spilled at the end of each block. -- LLVM Mail List
The following is the allocateBasicBlock() method, which implements the register allocation algorithm of the Fast Register Allocator.
/* /wtsc/llvm-project/llvm/lib/CodeGen/RegAllocFast.cpp */
...
void RegAllocFast::allocateBasicBlock(MachineBasicBlock &MBB) {
  ...
  assert(LiveVirtRegs.empty() && "Mapping not cleared from last block?");
  MachineBasicBlock::iterator MII = MBB.begin();
  // Add live-in registers as live.
  for (const MachineBasicBlock::RegisterMaskPair &LI : MBB.liveins())
    if (MRI->isAllocatable(LI.PhysReg))
      definePhysReg(MII, LI.PhysReg, regReserved);
  ...
  // Otherwise, sequentially allocate each instruction in the MBB.
  for (MachineInstr &MI : MBB) {
    ...
    allocateInstruction(MI);
  }
  // Spill all physical registers holding virtual registers now.
  ...
  spillAll(MBB.getFirstTerminator(), /*OnlyLiveOut*/ true);
  ...
}
This method is quite straightforward. First it checks that the live virtual register mapping left over from the previously processed block has been cleared, then it marks the block's live-in registers as live. After that it allocates each MachineInstr in order, and finally it spills every virtual register that is still held in a physical register to memory.
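The four steps can be sketched in a self-contained toy model. Everything here (ToyInstr, ToyBlockAllocator, the Log strings, and the "evict the lowest-numbered virtual register" policy) is a hypothetical simplification rather than the code in RegAllocFast.cpp: the toy does not distinguish uses from definitions, does not model reloads, and keeps live-in registers reserved for the whole block.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy instruction: just the virtual registers it touches.
struct ToyInstr {
  std::vector<int> VirtRegs;
};

struct ToyBlockAllocator {
  int NumPhysRegs;
  std::map<int, int> LiveVirtRegs; // virtual register -> physical register
  std::set<int> Reserved;          // physical registers that are live-in
  std::vector<std::string> Log;    // decisions, in the order they were made

  // Return the physical register holding V, evicting another value if needed.
  int assign(int V) {
    auto It = LiveVirtRegs.find(V);
    if (It != LiveVirtRegs.end())
      return It->second; // V already lives in a register
    std::set<int> Busy = Reserved;
    for (const auto &KV : LiveVirtRegs)
      Busy.insert(KV.second);
    for (int P = 0; P < NumPhysRegs; ++P)
      if (!Busy.count(P))
        return LiveVirtRegs[V] = P; // found a free register
    // Nothing free: spill the lowest-numbered live virtual register (a toy
    // policy, not LLVM's choice) and reuse its register.
    assert(!LiveVirtRegs.empty() && "every register is reserved");
    auto Victim = LiveVirtRegs.begin();
    int P = Victim->second;
    Log.push_back("spill %" + std::to_string(Victim->first));
    LiveVirtRegs.erase(Victim);
    return LiveVirtRegs[V] = P;
  }

  void allocateBlock(const std::vector<ToyInstr> &Block,
                     const std::set<int> &LiveIns) {
    // 1. Nothing may be left over from the previous block.
    assert(LiveVirtRegs.empty() && "Mapping not cleared from last block?");
    // 2. Live-in physical registers are off limits in this toy.
    Reserved = LiveIns;
    // 3. Allocate instruction by instruction, top to bottom.
    for (const ToyInstr &MI : Block)
      for (int V : MI.VirtRegs) {
        int P = assign(V);
        Log.push_back("%" + std::to_string(V) + " -> r" + std::to_string(P));
      }
    // 4. Spill whatever is still in a register, then reset the mapping.
    for (const auto &KV : LiveVirtRegs)
      Log.push_back("spill %" + std::to_string(KV.first));
    LiveVirtRegs.clear();
  }
};
```

With two physical registers, no live-ins, and a block whose first instruction touches %1 and %2 and whose second touches %3, the log shows %1 and %2 taking r0 and r1, then %1 being spilled so that %3 can reuse r0, and finally %2 and %3 being spilled at the end of the block.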
In addition, when it allocates a MachineInstr, it uses some tricks to improve register allocation, such as more aggressive hinting by peeking at future instructions.
In the void RegAllocFast::allocateInstruction(MachineInstr &MI) method, you can see in detail how it implements the algorithm roughly outlined in Fast Register Allocation LLVM Mail List. I won't repeat it in this post.
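As a rough illustration of what hinting by peeking at future instructions can look like, here is a toy model that is not taken from RegAllocFast.cpp. It assumes that if a virtual register is later copied into a specific physical register, picking that register up front makes the later copy redundant. ToyCopy, toyHint, and toyPickRegister are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy copy instruction: DstPhysReg = COPY SrcVirtReg.
struct ToyCopy {
  int SrcVirtReg;
  int DstPhysReg;
};

// Peek at the copies that follow the definition of V. If V is later copied
// into a physical register that is currently free, return that register as a
// hint; otherwise return -1.
int toyHint(int V, const std::vector<ToyCopy> &LaterCopies,
            const std::set<int> &FreeRegs) {
  for (const ToyCopy &C : LaterCopies)
    if (C.SrcVirtReg == V && FreeRegs.count(C.DstPhysReg))
      return C.DstPhysReg;
  return -1;
}

// Prefer the hinted register; fall back to the lowest-numbered free one.
int toyPickRegister(int V, const std::vector<ToyCopy> &LaterCopies,
                    const std::set<int> &FreeRegs) {
  assert(!FreeRegs.empty() && "toy model does not handle spilling");
  int Hint = toyHint(V, LaterCopies, FreeRegs);
  return Hint != -1 ? Hint : *FreeRegs.begin();
}
```

If %5 is later copied into register 3 and registers 1, 2, and 3 are free, the toy picks register 3 for %5. A virtual register with no such copy falls back to register 1.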
There are three more register allocation algorithms in LLVM as well.
The LLVM infrastructure provides the application developer with three different register allocators:
Fast — This register allocator is the default for debug builds. It allocates registers on a basic block level, attempting to keep values in registers and reusing registers as appropriate.
Basic — This is an incremental approach to register allocation. Live ranges are assigned to registers one at a time in an order that is driven by heuristics. Since code can be rewritten on-the-fly during allocation, this framework allows interesting allocators to be developed as extensions. It is not itself a production register allocator but is a potentially useful stand-alone mode for triaging bugs and as a performance baseline.
Greedy — The default allocator. This is a highly tuned implementation of the Basic allocator that incorporates global live range splitting. This allocator works hard to minimize the cost of spill code.
PBQP — A Partitioned Boolean Quadratic Programming (PBQP) based register allocator. This allocator works by constructing a PBQP problem representing the register allocation problem under consideration, solving this using a PBQP solver, and mapping the solution back to a register assignment.
I will look through the other three algorithms in later posts as well. Stay tuned.
More References for Register Allocation in LLVM: