The provided target data layout must match what the backend expects. Its purpose is to communicate the captured information to the target-independent optimization passes. For example, an optimization pass can query the data layout to get the size and alignment of a pointer. However, changing the size of a pointer in the data layout does not change the code generation in the backend.

A lot more information is provided with the target data layout. You can find more information in the reference manual at https://llvm.org/docs/LangRef.htmldata-layout.

Last, the target triple string specifies the architecture we are compiling for. This reflects the information we gave on the command line. The triple is a configuration string that usually consists of the CPU architecture, the vendor, and the operating system. More information about the environment is often added. For example, the x86_64-pc-win32 triple is used for a Windows system running on a 64-bit X86 CPU. x86_64 is the CPU architecture, pc is a generic vendor, and win32 is the operating system. The parts are connected by a hyphen. A Linux system running on an ARMv8 CPU uses aarch64-unknown-linux-gnu as its triple. aarch64 is the CPU architecture, while the operating system is linux running a gnu environment. There is no real vendor for a Linux-based system, so this part is unknown. Parts that are not known or unimportant for a specific purpose are often omitted: the aarch64-linux-gnu triple describes the same Linux system.

Next, the gcd function is defined in the IR file:
define i32 @gcd(i32 %a, i32 %b) {

This resembles the function signature in the C file. The unsigned data type is translated into the 32-bit integer type, i32. The function name is prefixed with @, and the parameter names are prefixed with %. The body of the function is enclosed in curly braces. The code of the body follows:
entry:
  %cmp = icmp eq i32 %b, 0
  br i1 %cmp, label %return, label %while.body

The IR code is organized into so-called basic blocks. A well-formed basic block is a linear sequence of instructions, which begins with an optional label and ends with a terminator instruction. So, each basic block has one entry point and one exit point. LLVM allows malformed basic blocks at construction time. The label of the first basic block is entry. The code in the block is simple: the first instruction compares the %b parameter against 0. The second instruction branches to the return label if the condition is true and to the while.body label if the condition is false.

Another characteristic of the IR code is that it is in a static single assignment (SSA) form. The code uses an unlimited number of virtual registers, but each register is only written once. The result of the comparison is assigned to the named virtual register, %cmp. This register is then used, but it is never written again. Optimizations such as constant propagation and common-sub-expression elimination work very well with the SSA form and all modern compilers are using it.

Leave a Reply

Your email address will not be published. Required fields are marked *