Getting the application binary interface right – IR Generation for High-Level Language Constructs-1 – IT Exams and Basics of IR Code Generation

With the addition of arrays and records to the code generator, you may notice that the generated code sometimes does not behave as expected. The reason is that so far we have ignored the calling conventions of the platform. Each platform defines its own rules for how one function calls another function in the same program or library. These rules are documented in the platform's application binary interface (ABI) specification. Typical information includes the following:

Are machine registers used for parameter passing? If yes, which ones?
How are aggregates such as arrays and structs passed to a function?
How are return values handled?

Conventions vary widely. On some platforms, aggregates are always passed indirectly: a copy of the aggregate is placed on the stack, and only a pointer to that copy is passed as the parameter. On other platforms, a small aggregate (say, 128 or 256 bits wide) is passed in registers, and indirect passing is used only above that threshold. Some platforms also use floating-point and vector registers for parameter passing, while others require floating-point values to be passed in integer registers.
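The size-threshold rule can be illustrated with a small C++ sketch of how a front end might classify an aggregate argument. The names PassingKind, classifyAggregate, and kAlwaysIndirect are hypothetical. The single size threshold is also a simplifying assumption: real ABIs also consider field types, alignment, and how many argument registers remain.

```cpp
#include <cstdint>

// How a (hypothetical) ABI decides to pass an aggregate argument.
enum class PassingKind { InRegisters, Indirect };

// Assumed rule: aggregates up to `thresholdBits` wide travel in registers;
// anything larger is copied to the stack and only a pointer to the copy
// is passed.
constexpr PassingKind classifyAggregate(std::uint64_t sizeInBits,
                                        std::uint64_t thresholdBits) {
    return sizeInBits <= thresholdBits ? PassingKind::InRegisters
                                       : PassingKind::Indirect;
}

// A platform that always passes (non-empty) aggregates indirectly behaves
// like a threshold of zero bits.
constexpr std::uint64_t kAlwaysIndirect = 0;
```

With a 128-bit threshold, a 64-bit struct is classified as InRegisters and a 192-bit struct as Indirect. With kAlwaysIndirect, both go through memory.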

Of course, this is all interesting low-level detail. Unfortunately, it leaks into LLVM IR. At first, this seems surprising: after all, we define the types of all parameters of a function in LLVM IR! It turns out that this is not enough. To see why, consider complex numbers. Some languages have built-in data types for complex numbers; for example, C99 has float _Complex (among others). Older versions of C have no complex number types, but you can easily define struct Complex { float re, im; } and implement arithmetic operations on it. Both types map to the same LLVM IR type, { float, float }.

Now suppose the ABI states that values of the built-in complex-number type are passed in two floating-point registers, but user-defined aggregates are always passed indirectly. The function's IR signature, which only shows { float, float }, then does not tell LLVM how to pass this particular parameter. The unfortunate consequence is that we must provide more information to LLVM, and this information is highly ABI-specific.
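The complex-number problem can be modeled in a short hypothetical C++ sketch. Both source types lower to the same two-float IR struct, so a decision based only on the IR type would have to treat them identically. Only the extra source-level flag lets the classifier apply the assumed ABI rule from the paragraph above. IRStruct, SourceType, ArgClass, and classify are illustrative names, not LLVM API.

```cpp
// Minimal stand-in for an LLVM IR struct type: just the number of float fields.
struct IRStruct {
    int floatFields;
};

constexpr bool operator==(IRStruct a, IRStruct b) {
    return a.floatFields == b.floatFields;
}

// What the front end knows about a parameter type, beyond its IR layout.
struct SourceType {
    IRStruct ir;            // lowered IR layout, e.g. { float, float }
    bool isBuiltinComplex;  // C99 float _Complex vs. a user-defined struct
};

enum class ArgClass { TwoFloatRegisters, Indirect };

// Assumed ABI rule: built-in complex values use two FP registers,
// user-defined aggregates are always passed indirectly.
constexpr ArgClass classify(const SourceType& t) {
    return t.isBuiltinComplex ? ArgClass::TwoFloatRegisters
                              : ArgClass::Indirect;
}

constexpr SourceType c99Complex{IRStruct{2}, true};    // float _Complex
constexpr SourceType userComplex{IRStruct{2}, false};  // struct Complex { float re, im; }
```

The two IR layouts compare equal, yet classify returns different results for the two types. That extra information is what parameter attributes or type rewriting must carry into LLVM IR.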

There are two ways to specify this information to LLVM: parameter attributes and type rewriting. Which one you need depends on the target platform and the code generator. The most commonly used parameter attributes are the following:

inreg specifies that the parameter is passed in a register.
byval specifies that the parameter is passed by value. The parameter must be of pointer type: a hidden copy of the pointed-to data is made, and a pointer to that copy is passed to the called function.
zeroext and signext specify that the passed integer value must be zero-extended or sign-extended to the width required by the target's ABI.
sret specifies that the parameter holds a pointer to memory that is used to return an aggregate from the function.
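The following plain C++ sketch models the effect of zeroext, signext, and sret without using any LLVM API. The extension helpers show how an 8-bit value is widened to 32 bits. The addComplexSret function shows the shape a function returning an aggregate takes after sret-style lowering. All function names here are hypothetical.

```cpp
#include <cstdint>

// zeroext: widen an 8-bit value by filling the upper 24 bits with zeros.
constexpr std::uint32_t zeroExtend8(std::uint8_t v) {
    return static_cast<std::uint32_t>(v);
}

// signext: widen an 8-bit value by replicating its top bit (bit 7).
// Values 0x80..0xFF are interpreted as -128..-1.
constexpr std::int32_t signExtend8(std::uint8_t v) {
    return static_cast<std::int32_t>(v) - ((v & 0x80) ? 256 : 0);
}

struct Complex {
    float re, im;
};

// Source-level function that returns an aggregate by value.
constexpr Complex addComplex(Complex a, Complex b) {
    return Complex{a.re + b.re, a.im + b.im};
}

// The same function after sret-style lowering: the caller allocates storage
// for the result and passes its address as a hidden first parameter, and
// the function itself returns void.
constexpr void addComplexSret(Complex* result, Complex a, Complex b) {
    *result = addComplex(a, b);
}
```

For example, the byte 0xFF becomes 255 under zeroext but -1 under signext. A caller of the lowered function reserves a Complex in its own frame and passes that object's address as the first argument.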

While all code generators support zeroext, signext, and sret attributes, only some support inreg and byval. An attribute can be added to the argument of a function with the addAttr() method. For example, to set the inreg attribute on argument Arg, you call the following:
Arg->addAttr(llvm::Attribute::InReg);

