Mapping the control flow to basic blocks – Basics of IR Code Generation
By Reginald Bellamy / October 18, 2022 / No Comments / Emitting the function body, Exams of IT, Generating IR from the AST, IT Certifications, Understanding the IR code
The conceptual idea of a basic block is that it is a linear sequence of instructions that are executed in that order. A basic block has exactly one entry at the beginning, and it ends with a terminator instruction, which is an instruction that transfers the control flow to another basic block, such as a branch instruction, a switch instruction, or a return instruction. See https://llvm.org/docs/LangRef.htmlterminator-instructions for a complete list of terminator instructions. A basic block can begin with phi instructions, but inside a basic block, neither phi nor branch instructions are allowed. In other words, you can only enter a basic block at the first instruction, and you can only leave a basic block at the last instruction, which is the terminator instruction. It is not possible to branch to an instruction inside a basic block or to branch to another basic block from the middle of a basic block. Please note that a simple function call with the call instruction can occur inside a basic block. Each basic block has exactly one label, marking the first instruction of the basic block. Labels are the targets of branch instructions. You can view branches as directed edges between two basic blocks, resulting in the control flow graph (CFG). A basic block can have predecessors and successors. The first basic block of a function is special in the sense that no predecessors are allowed.
As a consequence of these restrictions, control statements of the source language, such as WHILE and IF, produce several basic blocks. Let’s look at the WHILE statement. The condition of the WHILE statement controls if the loop body or the next statement is executed. The condition must be generated in a basic block of its own because there are two predecessors:
- The basic block resulting from the statement before WHILE
- The branch from the end of the loop body back to the condition
There are also two successors:
- The beginning of the loop body
- The basic block resulting from the statement following WHILE
The loop body itself has at least one basic block:
Figure 4.1 – The basic blocks of a WHILE statement
The IR code generation follows this structure. We store a pointer to the current basic block in the CGProcedure class and use an instance of llvm::IRBuilder<> to insert instructions into the basic block. First, we create the basic blocks:
void emitStmt(WhileStatement *Stmt) {
llvm::BasicBlock *WhileCondBB = llvm::BasicBlock::Create(
CGM.getLLVMCtx(), “while.cond”, Fn);
llvm::BasicBlock *WhileBodyBB = llvm::BasicBlock::Create(
CGM.getLLVMCtx(), “while.body”, Fn);
llvm::BasicBlock *AfterWhileBB = llvm::BasicBlock::Create(
CGM.getLLVMCtx(), “after.while”, Fn);
The Fn variable denotes the current function and getLLVMCtx() returns the LLVM context. Both are set later. We end the current basic block with a branch to the basic block, which will hold the condition:
Builder.CreateBr(WhileCondBB);
The basic block for the condition becomes the new current basic block. We generate the condition and end the block with a conditional branch:
setCurr(WhileCondBB);
llvm::Value *Cond = emitExpr(Stmt->getCond());
Builder.CreateCondBr(Cond, WhileBodyBB, AfterWhileBB);
Next, we generate the loop body. Finally, we add a branch back to the basic block of the condition:
setCurr(WhileBodyBB);
emit(Stmt->getWhileStmts());
Builder.CreateBr(WhileCondBB);
With that, we have generated the WHILE statement. Now that we’ve generated the WhileCondBB and Curr blocks, we can seal them:
sealBlock(WhileCondBB);
sealBlock(Curr);
The empty basic block for statements following WHILE becomes the new current basic block:
setCurr(AfterWhileBB);
}
Following this schema, you can create an emit() method for each statement of the source language.