LLVM IR to Asm Part 1

This article shows how a simple LLVM IR example is transformed into RISCV assembly.

1 LLVM IR Example: a.ll

define i32 @main(i32 signext %argc, i8** %argv) {
  %1 = alloca i32
  store i32 1, i32* %1
  ret i32 0
}

2 The transformed RISCV Assembly: a.s

    .text
    .attribute	4, 16
    .attribute	5, "rv32i2p0"
    .file	"localvar.ll"
    .globl	main                            # -- Begin function main
    .p2align	2
    .type	main,@function
main:                                   # @main
    .cfi_startproc
# %bb.0:
    addi	sp, sp, -16
    .cfi_def_cfa_offset 16
    addi	a0, zero, 1
    sw	a0, 12(sp)
    mv	a0, zero
    addi	sp, sp, 16
    ret
.Lfunc_end0:
    .size	main, .Lfunc_end0-main
    .cfi_endproc
                                        # -- End function
    .section	".note.GNU-stack","",@progbits

3 Transformation Command:

llc \
    -march=riscv32 -relocation-model=pic -filetype=asm --fast-isel=false --global-isel=false \
    a.ll \
    -o a.s

--fast-isel=false and --global-isel=false disable FastISel and GlobalISel, so that only the DAG-based instruction selector runs and we can see how DAGISel works.
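One way to watch the DAG-based selector at work without a debugger is llc's debug output. The sketch below assumes an llc built with assertions enabled (the -debug-only option is not available otherwise) and with the RISCV target, found on PATH; the log file name isel.log is an arbitrary choice.

```shell
# Same target and selector flags as the command above.
LLC_FLAGS="-march=riscv32 -relocation-model=pic --fast-isel=false --global-isel=false"

if command -v llc >/dev/null 2>&1 && [ -f a.ll ]; then
    # -debug-only=isel prints the SelectionDAG at each stage to stderr
    # (assumption: an assertions-enabled build of llc).
    llc $LLC_FLAGS -debug-only=isel a.ll -o /dev/null 2> isel.log \
        || echo "llc rejected the flags; check for an assertions-enabled build with the RISCV target"
else
    echo "llc or a.ll not found; nothing to run"
fi
```

In such a build, isel.log typically contains a dump of the DAG before and after each of the selection phases, which can be read next to the breakpoints set later in lldb.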

4 Steps for ll-to-asm transformation

  1. llc loads a.ll -> in-memory LLVM IR representation.
  2. DAG Lowering in TargetLowering -> Initial DAGs.
  3. DAG Legalization (Type Legalization, Operation Legalization) -> Legalized DAG.
  4. Instruction Selection (DAG-Based) -> DAG with RISCV Instructions.
  5. Instruction Scheduling (pre-register allocation) -> MachineInstr.
  6. SSA-Based Optimization -> Optimized MachineInstr.
  7. Register Allocation -> MachineInstr with physical registers.
  8. Instruction Scheduling (post-register allocation) -> MachineInstr with physical registers.
  9. Prologue/Epilogue Insertion -> MachineInstr with resolved stack references.
  10. Peephole Optimization -> Optimized MachineInstr with resolved stack references.
  11. Code Emission -> RISCV Assembly.
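
The stages above correspond to passes in llc's code generation pipeline. As a rough sketch, assuming an llc with the RISCV target is on PATH and a.ll is in the current directory, the following lists the scheduled passes and dumps the code after each one. The file names passes.txt and after-all.txt are arbitrary, and the order llc reports may differ in places from the simplified list above, depending on the LLVM version.

```shell
# Same target and selector flags as the transformation command.
LLC_FLAGS="-march=riscv32 -relocation-model=pic --fast-isel=false --global-isel=false"

if command -v llc >/dev/null 2>&1 && [ -f a.ll ]; then
    # List the passes llc schedules, in order (printed to stderr).
    llc $LLC_FLAGS -debug-pass=Structure a.ll -o /dev/null 2> passes.txt \
        || echo "llc failed while listing passes"
    # Dump the IR / MachineInstr after every pass (printed to stderr).
    llc $LLC_FLAGS -print-after-all a.ll -o /dev/null 2> after-all.txt \
        || echo "llc failed while dumping per-pass output"
else
    echo "llc or a.ll not found; nothing to run"
fi
```

Searching after-all.txt for a pass name taken from passes.txt is a quick way to see what a single stage changed.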

Overview of the Swift Compiler contains a diagram of the steps above, and the related WTSC series of articles shows how a Swift source file is transformed into an executable.

Moreover, Life of an instruction in LLVM and A deeper look into the LLVM code generator also describe the transformation process in detail.

This article focuses on the backend part of the transformation for the RISCV ISA, from LLVM IR to RISCV assembly, in greater detail.

5 Three parts of the code

Ultimately, an LLVM backend is a C++ program or library, which means that all the code, and all the information fed into it, ends up as C++ source code and gets compiled. Based roughly on where we need to supply information, I separate the code into three parts.

  1. LLVM backend infrastructure source files. These are provided by the LLVM project, and for the most part we don't need to modify them.
  2. TableGen target description files. These supply information about an Instruction Set Architecture (ISA) to the LLVM backend infrastructure (instructions, registers, calling conventions, scheduling, etc.) and are transformed into C++ source files.
  3. C++ source files that we need to write under the lib/Target/<TargetISA> directory. (In our example, lib/Target/RISCV.)
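
To see the second part become C++, llvm-tblgen can be run by hand on the RISCV target description. This is only a sketch: LLVM_SRC is a hypothetical variable pointing at an llvm-project checkout, the output file names are arbitrary, and llvm-tblgen is assumed to be on PATH. The include paths are assumptions based on the usual source layout and may need adjusting for a given LLVM version.

```shell
# Hypothetical checkout location; override by exporting LLVM_SRC.
LLVM_SRC="${LLVM_SRC:-./llvm-project}"
TD_DIR="$LLVM_SRC/llvm/lib/Target/RISCV"

if command -v llvm-tblgen >/dev/null 2>&1 && [ -f "$TD_DIR/RISCV.td" ]; then
    # Each -gen-* backend turns the same .td description into one C++ fragment.
    for backend in register-info instr-info dag-isel callingconv; do
        llvm-tblgen "-gen-$backend" \
            -I "$LLVM_SRC/llvm/include" \
            -I "$LLVM_SRC/llvm/lib/Target" \
            -I "$TD_DIR" \
            "$TD_DIR/RISCV.td" -o "RISCV-$backend.inc" \
            || echo "llvm-tblgen -gen-$backend failed"
    done
else
    echo "llvm-tblgen or $TD_DIR/RISCV.td not found; nothing to run"
fi
```

The generated fragments are what the hand-written C++ files under lib/Target/RISCV include, which is how the register, instruction and calling-convention descriptions reach the compiled backend.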

Throughout the transformation, I will show which part of the code takes effect and in which file, and set a number of breakpoints in lldb to inspect the intermediate results and states.

6 LLDB source file:

I put all the breakpoints we need, with comments, into an LLDB source file. Here is lldb's description of the option that loads such a file:

--source <file>
Tells the debugger to read in and execute the lldb commands in the given file, after any file has been loaded.
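
The actual breakpoint list is not reproduced at this point. As a minimal sketch of what such a file can look like, the snippet below writes a hypothetical source.lldb that breaks on a few well-known code generation entry points. The choice of functions is an assumption for illustration, and the symbols resolve only in an llc built with debug information.

```shell
# Write an illustrative lldb command file (overwrites source.lldb in the
# current directory). Lines starting with '#' are comments to lldb as well.
cat > source.lldb <<'EOF'
# Instruction selection driver, run once per machine function.
breakpoint set --name llvm::SelectionDAGISel::runOnMachineFunction
# Type legalization and operation legalization of the DAG.
breakpoint set --name llvm::SelectionDAG::LegalizeTypes
breakpoint set --name llvm::SelectionDAG::Legalize
# RISCV-specific prologue insertion.
breakpoint set --name llvm::RISCVFrameLowering::emitPrologue
# Start llc with the arguments given after '--' on the lldb command line.
run
EOF
```

Ending the file with run makes lldb start llc right away and stop at the first breakpoint that is hit.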

The command to invoke LLDB on llc to compile a.ll to a.s:

# source.lldb: the lldb source file where we set breakpoints.
# llc: path to a debuggable llc.
# a.ll: path to a.ll if it is not in the current working directory.
# a.s: path to the llc output if it is not in the current working directory.
lldb \
    -s source.lldb -- \
    llc -march=riscv32 -relocation-model=pic -filetype=asm --fast-isel=false --global-isel=false \
    a.ll \
    -o a.s