LLVM IR to Asm Part 5


11 Step Two: DAG Lowering (under RISCV DAG->DAG Pattern Instruction Selection)

Among the passes above, RISCV DAG->DAG Pattern Instruction Selection is the one that performs DAG-based instruction selection. Its code is in RISCVDAGToDAGISel:SelectionDAGISel, and its name comes from RISCVDAGToDAGISel::getPassName().

The main goal of this step is to transform a sequence of IR instructions into a Directed Acyclic Graph (DAG) for the later pattern matching of instruction selection.

Because this step runs as a pass, we can start from runOnMachineFunction() of RISCVDAGToDAGISel:SelectionDAGISel, which calls SelectionDAGISel::runOnMachineFunction().

SelectionDAGISel::runOnMachineFunction() mainly does some preparation, initializes SelectionDAG *CurDAG and std::unique_ptr<SelectionDAGBuilder> SDB, and then calls SelectionDAGISel::SelectAllBasicBlocks() for this function.

SelectAllBasicBlocks() calls SelectionDAGISel::LowerArguments(), which in turn calls RISCVTargetLowering::LowerFormalArguments(), to lower the arguments of this function and construct the corresponding SDNodes in CurDAG.

Because DAG-based instruction selection works on one BasicBlock at a time, it iterates over all basic blocks of this function in reverse post-order and calls SelectionDAGISel::SelectBasicBlock() for each one.
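
Reverse post-order visits a block before its successors, ignoring back edges. The following is a minimal sketch of that ordering on a hypothetical adjacency-list CFG; the names CFG and reversePostOrder are made up for illustration and are not LLVM's own traversal classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG: the successors of block i are succ[i].
using CFG = std::vector<std::vector<std::size_t>>;

namespace detail {
inline void postOrder(const CFG &succ, std::size_t block,
                      std::vector<bool> &seen,
                      std::vector<std::size_t> &out) {
  seen[block] = true;
  for (std::size_t s : succ[block])
    if (!seen[s])
      postOrder(succ, s, seen, out);
  out.push_back(block); // emitted after its unvisited successors
}
} // namespace detail

// Returns the blocks reachable from `entry` in reverse post-order.
inline std::vector<std::size_t> reversePostOrder(const CFG &succ,
                                                 std::size_t entry) {
  assert(entry < succ.size() && "entry must name a block in the CFG");
  std::vector<bool> seen(succ.size(), false);
  std::vector<std::size_t> order;
  detail::postOrder(succ, entry, seen, order);
  return std::vector<std::size_t>(order.rbegin(), order.rend());
}
```

For a diamond-shaped CFG (block 0 branches to 1 and 2, both of which jump to 3), the entry block comes first and the join block comes last, so every block is visited after all of its predecessors.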

SelectionDAGISel::SelectBasicBlock() uses std::unique_ptr<SelectionDAGBuilder> SDB to visit each IR instruction and build a DAG for the given BasicBlock. There are 67 instructions in the LLVM IR instruction set (see the LLVM Language Reference Manual), defined in IR/Instruction.def.

The SelectionDAGBuilder visits each instruction and constructs the corresponding SDNodes based on which instruction it is. When the SDB encounters a ret instruction, it checks whether RISCVTargetLowering::CanLowerReturn() is true and then calls RISCVTargetLowering::LowerReturn(). When the SDB encounters a call instruction, it calls RISCVTargetLowering::LowerCall() to handle it. RISCVTargetLowering::LowerOperation() provides custom lowering for the operations that the target asks to lower itself.

There are more hooks in TargetLowering that a target can override. Together, these hooks help the SDB lower IR into a DAG for the later steps.

After the SDB finishes its job, we get the initial DAG, which is ready for the next step.

See LLVM SelectionDAG Image Generation and WTSC 11: Instruction Selection - SelectionDAG to visualize the created DAG.

The structure of a node (SDNode) of the DAG (SelectionDAG) is described in LLVM SDNode.

void SelectionDAGISel::SelectBasicBlock(BasicBlock::const_iterator Begin,
                                        BasicBlock::const_iterator End,
                                        bool &HadTailCall) {
  // Allow creating illegal types during DAG building for the basic block.
  CurDAG->NewNodesMustHaveLegalTypes = false;

  // Lower the instructions. If a call is emitted as a tail call, cease emitting
  // nodes for this block.
  for (BasicBlock::const_iterator I = Begin; I != End && !SDB->HasTailCall; ++I) {
    if (!ElidedArgCopyInstrs.count(&*I))
      SDB->visit(*I);
  }

  // Make sure the root of the DAG is up-to-date.
  CurDAG->setRoot(SDB->getControlRoot());
  HadTailCall = SDB->HasTailCall;
  SDB->resolveOrClearDbgInfo();
  SDB->clear();

  // Final step, emit the lowered DAG as machine code.
  CodeGenAndEmitDAG();
}

12 Step Three: DAG Legalization (under RISCV DAG->DAG Pattern Instruction Selection)

Once the DAG is built, SelectionDAGISel::CodeGenAndEmitDAG() performs DAG Combination, DAG Type Legalization, and DAG Operation Legalization. The DAG is then ready for the next step, Instruction Selection. See DAG-Based Instruction Selection Steps for a clear overview of the process. TargetLowering also takes part in DAG Type Legalization and DAG Operation Legalization. Later on, in the same SelectionDAGISel::CodeGenAndEmitDAG(), the compiler does instruction selection and then instruction scheduling, which are explained in the next two parts.

13 Step Four: Instruction Selection (DAG-Based) (under RISCV DAG->DAG Pattern Instruction Selection)

After the DAG for the current basic block is ready, CodeGenAndEmitDAG() calls SelectionDAGISel::DoInstructionSelection() to select instructions with DAG-based pattern matching.

It starts from the root of the graph and proceeds back toward the beginning, the entry node, calling RISCVDAGToDAGISel::Select() for each node. If Select() does not finish instruction selection for a node by itself, it in turn calls RISCVDAGToDAGISel::SelectCode(), which calls SelectionDAGISel::SelectCodeCommon() with a MatcherTable; details are in The DAG State Machine for DAG-based Instruction Selection.

If we define complex patterns for some IR instructions, we do that part of instruction selection manually in RISCVDAGToDAGISel:SelectionDAGISel, for example in RISCVDAGToDAGISel::SelectAddrFI().

RISCV ADD instruction in Tablegen is an article that shows how the RISCV ADD instruction is defined in Tablegen and how its pattern matches its IR counterpart.
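
The idea behind pattern matching can be illustrated with a toy selector. This is only a sketch under stated assumptions: Node, selectNode, and the v-prefixed virtual register names are hypothetical and unrelated to LLVM's SDNode or the generated MatcherTable. It tries the larger pattern (add x, imm) first, using the 12-bit signed immediate range of RISC-V addi, and falls back to (add x, y).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression node standing in for an SDNode (not LLVM's class).
struct Node {
  enum Kind { Reg, Const, Add } kind;
  int value;                      // register number or constant value
  std::shared_ptr<Node> lhs, rhs; // operands, used only by Add
};
using NodePtr = std::shared_ptr<Node>;

inline NodePtr reg(int n) {
  return std::make_shared<Node>(Node{Node::Reg, n, nullptr, nullptr});
}
inline NodePtr constant(int c) {
  return std::make_shared<Node>(Node{Node::Const, c, nullptr, nullptr});
}
inline NodePtr add(NodePtr a, NodePtr b) {
  return std::make_shared<Node>(Node{Node::Add, 0, std::move(a), std::move(b)});
}

// Appends pseudo-assembly for the tree rooted at `n` to `out` and returns the
// name of the register that holds its value.
inline std::string selectNode(const NodePtr &n, std::vector<std::string> &out,
                              int &nextTmp) {
  assert(n && "cannot select a null node");
  auto fresh = [&nextTmp] { return "v" + std::to_string(nextTmp++); };
  switch (n->kind) {
  case Node::Reg:
    return "x" + std::to_string(n->value);
  case Node::Const: {
    std::string d = fresh();
    out.push_back("li " + d + ", " + std::to_string(n->value));
    return d;
  }
  case Node::Add: {
    assert(n->lhs && n->rhs && "Add needs two operands");
    // Pattern (add x, imm12): covers the Add and its Const operand together.
    if (n->rhs->kind == Node::Const && n->rhs->value >= -2048 &&
        n->rhs->value <= 2047) {
      std::string a = selectNode(n->lhs, out, nextTmp);
      std::string d = fresh();
      out.push_back("addi " + d + ", " + a + ", " +
                    std::to_string(n->rhs->value));
      return d;
    }
    // Fallback pattern (add x, y): both operands are selected on their own.
    std::string a = selectNode(n->lhs, out, nextTmp);
    std::string b = selectNode(n->rhs, out, nextTmp);
    std::string d = fresh();
    out.push_back("add " + d + ", " + a + ", " + b);
    return d;
  }
  }
  return std::string();
}
```

Selecting add(add(reg(10), constant(5)), reg(11)) yields an addi for the inner node and an add for the root; a constant outside the 12-bit range is first materialized with li and then consumed by a plain add.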

14 Step Five: Instruction Scheduling (pre register allocation) (under RISCV DAG->DAG Pattern Instruction Selection)

The final step of Instruction Selection is to serialize the DAG into a sequence of MachineInstr. Instruction Scheduling does this in ScheduleDAGSDNodes::Run(SelectionDAG* dag, MachineBasicBlock* bb).

In createDefaultScheduler(), we can see which ScheduleDAGSDNodes:ScheduleDAG is used to serialize our DAG. In our case, it is ScheduleDAGRRList::Schedule() of ScheduleDAGRRList:ScheduleDAGSDNodes:ScheduleDAG, created in createBURRListDAGScheduler().

ScheduleDAGSDNodes *createDefaultScheduler(SelectionDAGISel *IS,
                                           CodeGenOpt::Level OptLevel) {
  const TargetLowering *TLI = IS->TLI;
  const TargetSubtargetInfo &ST = IS->MF->getSubtarget();

  // Try first to see if the Target has its own way of selecting a scheduler
  if (auto *SchedulerCtor = ST.getDAGScheduler(OptLevel)) {
    return SchedulerCtor(IS, OptLevel);
  }

  if (OptLevel == CodeGenOpt::None ||
      (ST.enableMachineScheduler() && ST.enableMachineSchedDefaultSched()) ||
      TLI->getSchedulingPreference() == Sched::Source)
    return createSourceListDAGScheduler(IS, OptLevel);
  if (TLI->getSchedulingPreference() == Sched::RegPressure)
    return createBURRListDAGScheduler(IS, OptLevel);
  if (TLI->getSchedulingPreference() == Sched::Hybrid)
    return createHybridListDAGScheduler(IS, OptLevel);
  if (TLI->getSchedulingPreference() == Sched::VLIW)
    return createVLIWDAGScheduler(IS, OptLevel);
  assert(TLI->getSchedulingPreference() == Sched::ILP &&
         "Unknown sched type!");
  return createILPListDAGScheduler(IS, OptLevel);
}

Therefore, the scheduler in use is ScheduleDAGRRList:ScheduleDAGSDNodes:ScheduleDAG with BURegReductionPriorityQueue.

14.1 Instruction Scheduling:

Instruction scheduling is the task of ordering the operations in a block or a procedure to make effective use of processor resources.

The scheduler takes as input a partially ordered list of operations in the target machine's assembly language; it produces as output an ordered version of the same list.

It packs operations into the available cycles and functional unit issue slots so that the code will run as quickly as possible.

The scheduler's choices are constrained by the flow of data, by the delays associated with individual operations, and by the capabilities of the target processor.

List schedulers operate on straight-line code and use a variety of priority ranking schemes to guide their choices.

List-Scheduling Algorithm:

  1. Rename to avoid antidependences.
  2. Build a Dependence Graph.
  3. Assign priorities to each operation.
  4. Iteratively select an operation and schedule it.
unsigned cycle = 1;
auto ready =  <Leaves of Dependence Graph D>;
auto active = <Empty Set>;
// Keep going while any operation is still waiting or still executing.
while (ready != <Empty Set> || active != <Empty Set>) {
  for (auto op : active) {
    if (S[op] + delay[op] < cycle) {
      active.remove(op);
      for (auto s : Successors(op, D)) {
        if (isReady(s)) ready.add(s);
      }
    }
  }
  if (ready != <Empty Set>) {
    auto op = <Highest-Priority Operation in ready>;
    ready.remove(op);
    S[op] = cycle;
    active.add(op);
  }
  cycle += 1;
}
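
The loop above can be turned into a small runnable sketch. The Op struct and listSchedule below are hypothetical names, the machine is assumed to issue one operation per cycle, and an operation issued in cycle c with delay d is assumed to have its result available at the start of cycle c + d, so the completion test here is written with <=. Adjust that test if your delay convention differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One operation in a hypothetical dependence graph.
struct Op {
  int delay;                      // cycles until the result is available
  int priority;                   // larger value is scheduled first
  std::vector<std::size_t> preds; // operations this one depends on
  std::vector<std::size_t> succs; // operations that depend on this one
};

// Returns S, where S[i] is the issue cycle of operation i (0 if never issued).
inline std::vector<int> listSchedule(const std::vector<Op> &ops) {
  const std::size_t n = ops.size();
  std::vector<int> S(n, 0);
  std::vector<bool> done(n, false); // result already available
  std::vector<std::size_t> ready, active;
  for (std::size_t i = 0; i < n; ++i)
    if (ops[i].preds.empty())
      ready.push_back(i); // leaves of the dependence graph

  int cycle = 1;
  while (!ready.empty() || !active.empty()) {
    // Retire finished operations and release successors whose operands are
    // now all available.
    std::vector<std::size_t> stillActive;
    for (std::size_t op : active) {
      if (S[op] + ops[op].delay > cycle) {
        stillActive.push_back(op);
        continue;
      }
      done[op] = true;
      for (std::size_t s : ops[op].succs) {
        assert(s < n && "successor index out of range");
        bool allDone = true;
        for (std::size_t p : ops[s].preds)
          allDone = allDone && done[p];
        if (allDone)
          ready.push_back(s);
      }
    }
    active.swap(stillActive);

    if (!ready.empty()) {
      // Pick the ready operation with the highest priority.
      std::size_t best = 0;
      for (std::size_t k = 1; k < ready.size(); ++k)
        if (ops[ready[k]].priority > ops[ready[best]].priority)
          best = k;
      const std::size_t op = ready[best];
      ready.erase(ready.begin() + static_cast<std::ptrdiff_t>(best));
      S[op] = cycle;
      active.push_back(op);
    }
    ++cycle;
  }
  return S;
}
```

With two independent loads of delay 3 feeding an add, the loads issue in cycles 1 and 2, and the add has to wait until cycle 5, when the second load's result is available.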

Computing Delays for Load Operations:

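for (auto load : BB) { if (load.isLoadInstr) delay[load] = 1; }
for (auto instr : D) {
  // Di: the nodes and edges of D that are independent of instr.
  auto Di = IndependenceGraph(instr, D);
  for (auto C : Di) {        // each connected component of Di
    auto N = MaxNumLoad(C);  // maximal number of loads on any path through C
    for (auto load : C) { if (load.isLoadInstr) delay[load] += delay[instr] / N; }
  }
}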

14.2 MIScheduler

14.2.1 Modeling Hardware execution in TD files.

  1. Instruction Operand Categorization: SchedWrite:SchedReadWrite and SchedRead:SchedReadWrite. See RISCV Operand Categories.
  2. Association between Instructions and Operand Categories: def Instr: Instruction<>, Sched<>. See RISCV Instruction and Operand Categories Associations.
  3. Processor Pipeline and Resources:
    1. Machine model for scheduling and other instruction cost heuristics (SchedMachineModel). See RISCV Rocket64 Machine model.
    2. Computation components of a processor, such as ALU and multiplier (ProcResource<n>). See RISCV Rocket64 Computation units.
  4. Associations between Operand Categories and Hardware Resources: WriteRes<OC, [ProcResource]>{SchedModel = SchedMachineModel}. See Operand Categories with Hardware resources.
  5. Refinement: InstRW<[OC], (instrs pattern)> and ReadAdvance<ReadOC, nstagesahead>.

15 Step Six: SSA-Based Machine code optimizations (48-81)

Two kinds of compiler: Debugging Compiler and Optimizing Compiler.

15.1 Optimizations

The goal of code optimization is to discover, at compile time, information about the runtime behavior of the program and to use that information to improve the code generated by the compiler.

15.2 Opportunities for optimizations

  1. Reducing the overhead of abstraction.
  2. Taking advantage of special cases.
  3. Matching the code to system resources.

15.3 Basic Block:

A basic block is a maximal-length sequence of branch-free code. Property 1: statements are executed sequentially. Property 2: if any statement executes, the entire block executes, unless a runtime exception occurs.

15.4 Region: (Regional pass, loop pass)

Regional methods operate over scopes larger than a single block but smaller than a full procedure. Examples and related concepts:

  1. Source-code control structures: e.g. a loop nest.
  2. Extended Basic Block: a set of blocks b1, b2, …, bn, where b1 has multiple CFG predecessors and each other bi has just one, which is some bj in the set.
  3. Dominator: in a CFG, x dominates y if and only if every path from the root to y includes x.
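
The dominator definition above can be computed with a simple fixed-point iteration: the entry block is dominated only by itself, and every other block is dominated by itself plus whatever dominates all of its predecessors. The sketch below assumes a hypothetical predecessor-list CFG in which every block is reachable from the entry; DomSets and computeDominators are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// dom[y][x] is true when block x dominates block y.
using DomSets = std::vector<std::vector<bool>>;

// `preds[b]` lists the CFG predecessors of block b; `entry` is the root.
inline DomSets
computeDominators(const std::vector<std::vector<std::size_t>> &preds,
                  std::size_t entry) {
  const std::size_t n = preds.size();
  assert(entry < n && "entry must name a block in the CFG");
  // Start from "every block dominates every block" and shrink the sets.
  DomSets dom(n, std::vector<bool>(n, true));
  dom[entry].assign(n, false);
  dom[entry][entry] = true;

  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t b = 0; b < n; ++b) {
      if (b == entry)
        continue;
      // Intersect the dominator sets of all predecessors, then add b itself.
      std::vector<bool> next(n, true);
      for (std::size_t p : preds[b])
        for (std::size_t x = 0; x < n; ++x)
          next[x] = next[x] && dom[p][x];
      next[b] = true;
      if (next != dom[b]) {
        dom[b] = next;
        changed = true;
      }
    }
  }
  return dom;
}
```

In a diamond-shaped CFG, the join block is dominated by the entry block and by itself, but by neither arm of the diamond, because a path exists around each arm.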

15.5 Global, Intraprocedural: (function pass, machine function pass)

15.6 Whole program, Interprocedural: (module pass, CallGraphSCC pass, Strongly Connected Components)

Interprocedural analysis and optimization occurs, at least conceptually, on the program's call graph. Examples:

  1. Inline substitution: replaces a procedure call with a copy of the body of the callee.
  2. Interprocedural constant propagation: propagates and folds information about constants throughout the entire program.

15.7 Local Optimization:

15.7.1 Local Value Numbering (LVN):

Local value numbering finds redundant expressions in a basic block and replaces the redundant evaluations with reuse of a previously computed value.

The Algorithm (uses a hash table from keys to value numbers): the algorithm traverses a basic block and assigns a distinct number to each value that the block computes. It chooses the numbers so that two expressions, ei and ej, have the same value number if and only if ei and ej have provably equal values for all possible operands of the expressions.

for i <- 0 to n - 1, where the block has n operations in the form Ti <- Li Opi Ri:

  1. get the value numbers for Li and Ri.
  2. construct a hash key from Opi and the value numbers for Li and Ri.
  3. if the hash key is already present in the table, then replace operation i with a copy of the value into Ti and associate the value number with Ti; else insert a new value number into the table at the hash key location and record that new value number for Ti.

Extensions to the LVN framework:

  1. Commutative operations.
  2. Constant folding: associates a constant value with a value number.
  3. Algebraic identities.

for i <- 0 to n - 1, where the block has n operations in the form Ti <- Li Opi Ri:

  1. get the value numbers for Li and Ri.
  2. if Li and Ri are both constant, then evaluate Li Opi Ri, assign the result to Ti, and mark Ti as constant.
  3. if Li Opi Ri matches an identity, then replace it with a copy operation or an assignment.
  4. construct a hash key from Opi and the value numbers for Li and Ri, using the value numbers in ascending order if Opi commutes.
  5. if the hash key is already present in the table, then replace operation i with a copy of the value into Ti and associate the value number with Ti; else insert a new value number into the table at the hash key location and record that new value number for Ti.
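
The steps above can be sketched in a few dozen lines. This version covers the base algorithm plus the commutative extension only (no constant folding or algebraic identities). It assumes that every destination name is new when it is defined, as in SSA form, so a name recorded as the holder of a value is never overwritten later. Instr and localValueNumbering are hypothetical names, and an ordered std::map stands in for the hash table.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

struct Instr {
  std::string dst, op, lhs, rhs; // dst <- lhs op rhs; "copy" uses only lhs
};

// Rewrites redundant operations in `block` into copies of an earlier result.
inline std::vector<Instr> localValueNumbering(const std::vector<Instr> &block) {
  std::map<std::string, int> vnOfName;                       // name -> value number
  std::map<std::tuple<std::string, int, int>, int> vnOfExpr; // (op, vn, vn) -> value number
  std::map<int, std::string> holder;                         // value number -> name holding it
  int nextVN = 0;

  // Names that are only read (block inputs) get a fresh number on first use.
  auto numberOf = [&](const std::string &name) -> int {
    auto it = vnOfName.find(name);
    if (it != vnOfName.end())
      return it->second;
    int vn = nextVN++;
    vnOfName[name] = vn;
    holder[vn] = name;
    return vn;
  };

  std::vector<Instr> out;
  for (const Instr &in : block) {
    assert(!vnOfName.count(in.dst) && "sketch assumes each destination is new");
    int l = numberOf(in.lhs);
    int r = numberOf(in.rhs);
    if ((in.op == "+" || in.op == "*") && l > r)
      std::swap(l, r); // commutative: use value numbers in ascending order
    auto key = std::make_tuple(in.op, l, r);
    auto hit = vnOfExpr.find(key);
    if (hit != vnOfExpr.end()) {
      // Redundant: reuse the earlier value instead of recomputing it.
      vnOfName[in.dst] = hit->second;
      out.push_back({in.dst, "copy", holder[hit->second], ""});
    } else {
      int vn = nextVN++;
      vnOfExpr[key] = vn;
      vnOfName[in.dst] = vn;
      holder[vn] = in.dst;
      out.push_back(in);
    }
  }
  return out;
}
```

Given a <- x + y followed by b <- y + x, the second operation becomes a copy of a because + commutes, while c <- x - y and e <- y - x stay distinct.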

15.7.2 Tree-Height Balancing:

Reorganizes expression trees to expose more instruction level parallelism.

15.8 Data Analysis:

15.8.1 Framework for Data Analysis:

  1. Data flow direction along control flow: forward or backward.
  2. Transfer function: computes statement effects. tranfer_function(node) e.g. Out(s) = Gen(s) U (In(s) - Kill(s))
  3. Meet operator: merges values from multiple incoming edges along control flow. meet(node) e.g. In(s) = {U Out(p) | p in Pred(s)}
  4. Value set: the information being passed around, often encoded as bit vectors. e.g. sets of reaching definitions.
  5. Initial Values: initialize(node);
    1. should be most conservative value.
    2. Start node often a special case.
  6. Some properties of the above to ensure termination.

15.8.2 Round-Robin Iterative Algorithm for Data Analysis:

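void round_robin_iter_alg() {
  for (auto node : Graph) {
    initialize(node);
  }
  bool changed = true;
  while (changed) {
    changed = false;
    for (auto node : Graph) {
      meet(node);                   // recompute In(node) from its predecessors
      if (tranfer_function(node))   // recompute Out(node); assumed to return
        changed = true;             // true when Out(node) changed
    }
  }
}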

15.8.3 Data Analysis Examples:

  1. Reaching Definitions:
    1. In(s): the set of definitions that reach the incoming point of statement s.
    2. Out(s): the set of definitions that reach the outgoing point of statement s.
    3. Kill(s): the set of definitions killed or redefined by statement s.
    4. Gen(s): the set of definitions generated or defined by statement s.
    5. Pred(s): the set of predecessor statements of statement s.
    6. In(s) = {U Out(p) | p in Pred(s)}
    7. Out(s: di = xxx) = (In(s) - Kill(s)) U Gen(s)
    8. RD(s) = In(s) = {U Out(p) | p in Pred(s)} = {U ((In(p) - Kill(p)) U Gen(p)) | p in Pred(s)}
    9. RD(s) = {U ((RD(p) - Kill(p)) U Gen(p)) | p in Pred(s)}
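
The equations above can be run to a fixed point with the round-robin scheme. The sketch below is a minimal instance under stated assumptions: each statement defines at most one variable, a definition is identified by the index of its statement, and Stmt and reachingDefinitions are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct Stmt {
  std::string def;                // variable defined here, empty if none
  std::vector<std::size_t> preds; // predecessor statements in the CFG
};

// Returns In, where In[s] holds the indices of the definitions reaching s.
inline std::vector<std::set<std::size_t>>
reachingDefinitions(const std::vector<Stmt> &stmts) {
  const std::size_t n = stmts.size();
  std::vector<std::set<std::size_t>> in(n), out(n); // all sets start empty
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t s = 0; s < n; ++s) {
      // Meet: In(s) is the union of Out(p) over all predecessors p.
      std::set<std::size_t> newIn;
      for (std::size_t p : stmts[s].preds) {
        assert(p < n && "predecessor index out of range");
        newIn.insert(out[p].begin(), out[p].end());
      }
      // Transfer: Out(s) = (In(s) - Kill(s)) U Gen(s).
      std::set<std::size_t> newOut;
      for (std::size_t d : newIn)
        if (stmts[s].def.empty() || stmts[d].def != stmts[s].def)
          newOut.insert(d); // d survives unless s redefines its variable
      if (!stmts[s].def.empty())
        newOut.insert(s); // Gen(s) = {s}
      if (newIn != in[s] || newOut != out[s]) {
        in[s] = newIn;
        out[s] = newOut;
        changed = true;
      }
    }
  }
  return in;
}
```

For a loop whose body redefines x, both the definition of x before the loop and the one inside the loop reach the loop header, which is the usual motivating case for iterating until nothing changes.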

16 Step Seven: Register Allocation (under Greedy Register Allocator and Virtual Register Rewriter)

17 Step Eight: Pro-/Epilogue Insertion (under Prologue/Epilogue Insertion & Frame Finalization)

18 Step Nine: Late machine code Optimizations (91-93)

20 Step Eleven: Code Emission (under RISCV Assembly Printer)

20.1 ELF File Format:

  1. ELFHeader

    // e_ident size and indices.
    enum {
      EI_MAG0 = 0,       // File identification index. 0x7f
      EI_MAG1 = 1,       // File identification index. E
      EI_MAG2 = 2,       // File identification index. L
      EI_MAG3 = 3,       // File identification index. F
      EI_CLASS = 4,      // File class. (ELFCLASS*)
      EI_DATA = 5,       // Data encoding.(ELFDATA*)
      EI_VERSION = 6,    // File version. (EV_*)
      EI_OSABI = 7,      // OS/ABI identification. (ELFOSABI*)
      EI_ABIVERSION = 8, // ABI version.
      EI_PAD = 9,        // Start of padding bytes.
      EI_NIDENT = 16     // Number of bytes in e_ident.
    };
    struct Elf32_Ehdr {
      unsigned char e_ident[EI_NIDENT]; // ELF Identification bytes (see EI_*)
      Elf32_Half e_type;                // Type of file (see ET_* below)
      Elf32_Half e_machine;   // Required architecture for this file (see EM_*)
      Elf32_Word e_version;   // Must be equal to 1
      Elf32_Addr e_entry;     // Address to jump to in order to start program
      Elf32_Off e_phoff;      // Program header table's file offset, in bytes
      Elf32_Off e_shoff;      // Section header table's file offset, in bytes
      Elf32_Word e_flags;     // Processor-specific flags (see EF_*)
      Elf32_Half e_ehsize;    // Size of ELF header, in bytes
      Elf32_Half e_phentsize; // Size of an entry in the program header table
      Elf32_Half e_phnum;     // Number of entries in the program header table
      Elf32_Half e_shentsize; // Size of an entry in the section header table
      Elf32_Half e_shnum;     // Number of entries in the section header table
      Elf32_Half e_shstrndx;  // Sect hdr table index of sect name string table
    };
    
  2. Program Header Table ([Elf32_Phdr])

    // Program header for ELF32.
    struct Elf32_Phdr {
      Elf32_Word p_type;   // Type of segment (see PT_*)
      Elf32_Off p_offset;  // File offset where segment is located, in bytes
      Elf32_Addr p_vaddr;  // Virtual address of beginning of segment
      Elf32_Addr p_paddr;  // Physical address of beginning of segment (OS-specific)
      Elf32_Word p_filesz; // Num. of bytes in file image of segment (may be zero)
      Elf32_Word p_memsz;  // Num. of bytes in mem image of segment (may be zero)
      Elf32_Word p_flags;  // Segment flags (see PF_*)
      Elf32_Word p_align;  // Segment alignment constraint
    };
    
  3. Sections
    1. Symbol Table Section

      // Symbol table entries for ELF32.
      struct Elf32_Sym {
        Elf32_Word st_name;     // Symbol name (index into string table)
        Elf32_Addr st_value;    // Value or address associated with the symbol
        Elf32_Word st_size;     // Size of the symbol
        unsigned char st_info;  // Symbol's type and binding attributes
        unsigned char st_other; // Must be zero; reserved
        Elf32_Half st_shndx;    // Which section (header table index) it's defined in
      };
      
    2. Relocation Entry Section

      // Relocation entry, without explicit addend.
      struct Elf32_Rel {
        Elf32_Addr r_offset; // Location (file byte offset, or program virtual addr)
        Elf32_Word r_info;   // Symbol table index and type of relocation to apply

        // These accessors and mutators correspond to the ELF32_R_SYM, ELF32_R_TYPE,
        // and ELF32_R_INFO macros defined in the ELF specification:
        Elf32_Word getSymbol() const { return (r_info >> 8); }
        unsigned char getType() const { return (unsigned char)(r_info & 0x0ff); }
        void setSymbol(Elf32_Word s) { setSymbolAndType(s, getType()); }
        void setType(unsigned char t) { setSymbolAndType(getSymbol(), t); }
        void setSymbolAndType(Elf32_Word s, unsigned char t) {
          r_info = (s << 8) + t;
        }
      };

  4. Section Header Table ([Elf32_Shdr])

    struct Elf32_Shdr {
      Elf32_Word sh_name;      // Section name (index into string table)
      Elf32_Word sh_type;      // Section type (SHT_*)
      Elf32_Word sh_flags;     // Section flags (SHF_*)
      Elf32_Addr sh_addr;      // Address where section is to be loaded
      Elf32_Off sh_offset;     // File offset of section data, in bytes
      Elf32_Word sh_size;      // Size of section, in bytes
      Elf32_Word sh_link;      // Section type-specific header table index link
      Elf32_Word sh_info;      // Section type-specific extra information
      Elf32_Word sh_addralign; // Section address alignment
      Elf32_Word sh_entsize;   // Size of records contained within the section
    };
    

    // Section types.
    enum : unsigned {
      SHT_NULL = 0,           // No associated section (inactive entry).
      SHT_PROGBITS = 1,       // Program-defined contents.
      SHT_SYMTAB = 2,         // Symbol table.
      SHT_STRTAB = 3,         // String table.
      SHT_RELA = 4,           // Relocation entries; explicit addends.
      SHT_HASH = 5,           // Symbol hash table.
      SHT_DYNAMIC = 6,        // Information for dynamic linking.
      SHT_NOTE = 7,           // Information about the file.
      SHT_NOBITS = 8,         // Data occupies no space in the file.
      SHT_REL = 9,            // Relocation entries; no explicit addends.
      SHT_SHLIB = 10,         // Reserved.
      SHT_DYNSYM = 11,        // Symbol table.
      SHT_INIT_ARRAY = 14,    // Pointers to initialization functions.
      SHT_FINI_ARRAY = 15,    // Pointers to termination functions.
      SHT_PREINIT_ARRAY = 16, // Pointers to pre-init functions.
      SHT_GROUP = 17,         // Section group.
      SHT_SYMTAB_SHNDX = 18,  // Indices for SHN_XINDEX entries.
      // Experimental support for SHT_RELR sections. For details, see proposal
      // at groups.google.com/forum/#!top…
      SHT_RELR = 19,         // Relocation entries; only offsets.
      SHT_LOOS = 0x60000000, // Lowest operating system-specific type.
      // Android packed relocation section types.
      // android.googlesource.com/platform/bi…
      SHT_ANDROID_REL = 0x60000001,
      SHT_ANDROID_RELA = 0x60000002,
      SHT_LLVM_ODRTAB = 0x6fff4c00,              // LLVM ODR table.
      SHT_LLVM_LINKER_OPTIONS = 0x6fff4c01,      // LLVM Linker Options.
      SHT_LLVM_CALL_GRAPH_PROFILE = 0x6fff4c02,  // LLVM Call Graph Profile.
      SHT_LLVM_ADDRSIG = 0x6fff4c03,             // List of address-significant symbols
                                                 // for safe ICF.
      SHT_LLVM_DEPENDENT_LIBRARIES = 0x6fff4c04, // LLVM Dependent Library Specifiers.
      SHT_LLVM_SYMPART = 0x6fff4c05,             // Symbol partition specification.
      SHT_LLVM_PART_EHDR = 0x6fff4c06,           // ELF header for loadable partition.
      SHT_LLVM_PART_PHDR = 0x6fff4c07,           // Phdrs for loadable partition.
      // Android's experimental support for SHT_RELR sections.
      // android.googlesource.com/platform/bi…
      SHT_ANDROID_RELR = 0x6fffff00,   // Relocation entries; only offsets.
      SHT_GNU_ATTRIBUTES = 0x6ffffff5, // Object attributes.
      SHT_GNU_HASH = 0x6ffffff6,       // GNU-style hash table.
      SHT_GNU_verdef = 0x6ffffffd,     // GNU version definitions.
      SHT_GNU_verneed = 0x6ffffffe,    // GNU version references.
      SHT_GNU_versym = 0x6fffffff,     // GNU symbol versions table.
      SHT_HIOS = 0x6fffffff,           // Highest operating system-specific type.
      SHT_LOPROC = 0x70000000,         // Lowest processor arch-specific type.
      // Fixme: All this is duplicated in MCSectionELF. Why??
      // Exception Index table
      SHT_ARM_EXIDX = 0x70000001U,
      // BPABI DLL dynamic linking pre-emption map
      SHT_ARM_PREEMPTMAP = 0x70000002U,
      // Object file compatibility attributes
      SHT_ARM_ATTRIBUTES = 0x70000003U,
      SHT_ARM_DEBUGOVERLAY = 0x70000004U,
      SHT_ARM_OVERLAYSECTION = 0x70000005U,
      SHT_HEX_ORDERED = 0x70000000,   // Link editor is to sort the entries in
                                      // this section based on their sizes
      SHT_X86_64_UNWIND = 0x70000001, // Unwind information

      SHT_MIPS_REGINFO = 0x70000006,  // Register usage information
      SHT_MIPS_OPTIONS = 0x7000000d,  // General options
      SHT_MIPS_DWARF = 0x7000001e,    // DWARF debugging section.
      SHT_MIPS_ABIFLAGS = 0x7000002a, // ABI information.

      SHT_MSP430_ATTRIBUTES = 0x70000003U,

      SHT_RISCV_ATTRIBUTES = 0x70000003U,

      SHT_HIPROC = 0x7fffffff, // Highest processor arch-specific type.
      SHT_LOUSER = 0x80000000, // Lowest type reserved for applications.
      SHT_HIUSER = 0xffffffff  // Highest type reserved for applications.
    };
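
Two details from the structures above are easy to exercise in isolation: the four magic bytes at the start of e_ident, and the way r_info packs a symbol index and a relocation type for ELF32. The helpers below are a standalone sketch with hypothetical names (hasElfMagic, makeRInfo, relSymbol, relType); they mirror the EI_MAG0 to EI_MAG3 comments and the getSymbol()/getType() accessors shown above rather than calling into LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when `ident` starts with the ELF magic bytes 0x7f 'E' 'L' 'F'.
inline bool hasElfMagic(const unsigned char *ident, std::size_t size) {
  assert((ident != nullptr || size == 0) && "null buffer must have size 0");
  return size >= 4 && ident[0] == 0x7f && ident[1] == 'E' &&
         ident[2] == 'L' && ident[3] == 'F';
}

// ELF32 r_info layout: symbol table index in the upper 24 bits,
// relocation type in the low 8 bits.
inline std::uint32_t makeRInfo(std::uint32_t sym, unsigned char type) {
  return (sym << 8) + type;
}
inline std::uint32_t relSymbol(std::uint32_t info) { return info >> 8; }
inline unsigned char relType(std::uint32_t info) {
  return static_cast<unsigned char>(info & 0xff);
}
```

Packing symbol index 5 with relocation type 2 gives 0x502, and the two accessors recover the original pair.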

20.2 MC Layer: MC Project

  1. The major components.
    1. the instruction printer (MCInstPrinter API): MCInst -> assembly line in .s file
    2. the instruction encoder (MCCodeEmitter API): MCInst -> binary code with relocations, output to a raw_ostream
    3. the instruction parser (MCTargetAsmParser:MCAsmParserExtension API): assembly line in .s file -> MCInst
    4. the instruction decoder (MCDisassembler API): binary code -> MCInst
    5. the assembly parser: .s file -> MCStreamer
    6. the assembler backend (MCAsmStreamer:MCStreamer MCAssembler MCAsmBackend API)
    7. the compiler integration
  2. MCStreamer -> MCSection -> MCFragment -> MCInst

20.3 RISCV Assembly Printer

  1. raw_fd_ostream:raw_pwrite_stream:raw_ostream
  2. LLVMTargetMachine::addAsmPrinter()

    bool LLVMTargetMachine::addAsmPrinter(PassManagerBase &PM,
                                          raw_pwrite_stream &Out,
                                          raw_pwrite_stream *DwoOut,
                                          CodeGenFileType FileType,
                                          MCContext &Context) {
      if (Options.MCOptions.MCSaveTempLabels)
        Context.setAllowTemporaryLabels(false);
      const MCSubtargetInfo &STI = *getMCSubtargetInfo();
      const MCAsmInfo &MAI = *getMCAsmInfo();
      const MCRegisterInfo &MRI = *getMCRegisterInfo();
      const MCInstrInfo &MII = *getMCInstrInfo();
      std::unique_ptr<MCStreamer> AsmStreamer;
      switch (FileType) {
      case CGFT_AssemblyFile: {
        MCInstPrinter *InstPrinter = getTarget()
          .createMCInstPrinter(getTargetTriple(),
                               MAI.getAssemblerDialect(), MAI, MII, MRI);
        // Create a code emitter if asked to show the encoding.
        std::unique_ptr<MCCodeEmitter> MCE;
        if (Options.MCOptions.ShowMCEncoding)
          MCE.reset(getTarget().createMCCodeEmitter(MII, MRI, Context));
        std::unique_ptr<MCAsmBackend> MAB(getTarget().createMCAsmBackend(STI, MRI, Options.MCOptions));
        auto FOut = std::make_unique<formatted_raw_ostream>(Out);
        MCStreamer *S = getTarget()
          .createAsmStreamer(Context, std::move(FOut), Options.MCOptions.AsmVerbose,
                             Options.MCOptions.MCUseDwarfDirectory, InstPrinter, std::move(MCE),
                             std::move(MAB), Options.MCOptions.ShowMCInst);
        AsmStreamer.reset(S);
        break;
      }
        // ...
      }
      // Create the AsmPrinter, which takes ownership of AsmStreamer if successful.
      FunctionPass *Printer =
        getTarget().createAsmPrinter(*this, std::move(AsmStreamer));
      PM.add(Printer);
      return false;
    }
    
  3. RISCV Asm Printer Registration: registers RISCVAsmPrinter with the RISCV target.
  4. RISCVAsmPrinter:AsmPrinter:MachineFunctionPass
  5. AsmPrinter::doInitialization(Module &M)
  6. AsmPrinter::doFinalization(Module &M)
  7. RISCVAsmPrinter::runOnMachineFunction(MachineFunction &MF)