SSA – Basics of IR Code Generation
By Reginald Bellamy / July 17, 2022 / No Comments / Handling the scope of names, IT Certifications, LLVM IR REFERENCE, Technical requirements
The SSA form was developed in the late 1980s. Since then, it has been widely used in compilers because it simplifies data flow analysis and optimizations. For example, the identification of common sub-expressions inside a loop becomes much easier if the IR is in SSA form. A basic property of SSA is that it establishes def-use and use-def chains: for a single definition, you know of all uses (def-use), and for each use, you know the unique definition (use-def). This knowledge is used a lot, such as in constant propagation: if a definition is determined to be a constant, then all uses of this value can be easily replaced with that constant value.
To construct the SSA form, the algorithm from Cytron et al. (1989) is very popular, and it is also used in the LLVM implementation. Other algorithms have been developed too. An early observation is that these algorithms become simpler if the source language does not have a goto statement.
An in-depth treatment of SSA can be found in the book SSA-based Compiler Design, by F. Rastello and F. B. Tichadou, Springer 2022.
The next basic block is the body of the while loop:
while.body:
%b.loop = phi i32 [ %rem, %while.body ],
[ %b, %entry ]
%a.loop = phi i32 [ %b.loop, %while.body ],
[ %a, %entry ]
%rem = urem i32 %a.loop, %b.loop
%cmp1 = icmp eq i32 %rem, 0
br i1 %cmp1, label %return, label %while.body
Inside the loop of gcd, the a and b parameters are assigned new values. If a register can be only written once, then this is not possible. The solution is to use the special phi instruction. The phi instruction has a list of basic blocks and values as parameters. A basic block presents the incoming edge from that basic block, and the value is the value from that basic block. At runtime, the phi instruction compares the label of the previously executed basic block with the labels in the parameter list.
The value of the instruction is the value that’s associated with the label. For the first phi instruction, the value is the %rem register if the previously executed basic block was while.body. The value is %b if entry was the previously executed basic block. The values are the ones at the start of the basic block. The %b.loop register gets a value from the first phi instruction. The same register is used in the parameter list of the second phi instruction, but the value is assumed to be the one before it is changed through the first phi instruction.
After the loop’s body, the return value must be chosen:
return:
%retval = phi i32 [ %a, %entry ],
[ %b.loop, %while.body ]
ret i32 %retval
}
Again, a phi instruction is used to select the desired value. The ret instruction not only ends this basic block but also denotes the end of this function at runtime. It has the return value as a parameter.
There are some restrictions on the use of phi instructions. They must be the first instructions of a basic block. The first basic block is special: it has no previously executed block. Therefore, it cannot begin with a phi instruction.