Using AST numbering to generate IR code in SSA form – Basics of IR Code Generation
By Reginald Bellamy / November 2, 2022 / No Comments / Emitting the function body, Exams of IT, IT Certifications, Technical requirements, Understanding the IR code
To generate IR code in SSA form from the AST, we can use an approach called AST numbering. The basic idea is that for each basic block, we store the current value of local variables written in this basic block.
Note
The implementation is based on the paper Simple and Efficient Construction of Static Single Assignment Form, by Braun et al., International Conference on CompilerConstruction 2013 (CC 2013), Springer (see http://individual.utoronto.ca/dfr/ece467/braun13.pdf). In its presented form, it only works for IR code that has a structured controlled flow. The paper also describes the necessary extensions if you need to support arbitrary control flow – for example, a goto statement.
Although it is simple, we will still need several steps. We will introduce the required data structure first, and after that, we will learn how to read and write values local to a basic block. Then, we will handle values that are used in several basic blocks and conclude by optimizing the created phi instructions.
Defining the data structure to hold values
We use the BasicBlockDef struct to hold the information for a single block:
struct BasicBlockDef {
llvm::DenseMap<Decl *, llvm::TrackingVH<llvm::Value>> Defs;
// …
};
The llvm::Value LLVN class represents a value in SSA form. The Value class acts like a label on the result of a computation. It is created once, usually through an IR instruction, and then used. Various changes can occur during optimizations. For example, if the optimizer detects that the %1 and %2 values are always the same, then it can replace the use of %2 with %1. This changes the label but not the computation.
To be aware of such changes, we cannot use the Value class directly. Instead, we need a value handle. There are value handles with different functionality. To track replacement, we can use the llvm::TrackingVH<> class. As a result, the Defs member maps a declaration of the AST (a variable or a formal parameter) to its current value. Now, we need to store this information for each basic block:
llvm::DenseMap<llvm::BasicBlock *, BasicBlockDef> CurrentDef;
With this data structure, we are now able to handle local values.
Reading and writing values local to a basic block
To store the current value of a local variable in a basic block, we will create an entry in the maps:
void writeLocalVariable(llvm::BasicBlock *BB, Decl *Decl,
llvm::Value *Val) {
CurrentDef[BB].Defs[Decl] = Val;
}
The lookup of a variable’s value is a bit more complicated because the value might not be in the basic block. In this case, we need to extend the search to the predecessors using a possible recursive search:
llvm::Value *
readLocalVariable(llvm::BasicBlock *BB, Decl *Decl) {
auto Val = CurrentDef[BB].Defs.find(Decl);
if (Val != CurrentDef[BB].Defs.end())
return Val->second;
return readLocalVariableRecursive(BB, Decl);
}
The real work is searching the predecessors, which we’ll implement in the next section.