Working with arrays, structs, and pointers – IR Generation for High-Level Language Constructs-2 – IT Exams and Basics of IR Code Generation

The getelementptr instruction is the workhorse for address calculations. As such, it needs some more explanation. The first operand, [8 x i64], is the base type the instruction is operating on. The second operand, ptr @arr, specifies the base pointer. Please note the subtle difference here: we declared an array of eight elements, but because all global values are treated as pointers, we have a pointer to the array. In C syntax, we really work with long (*arr)[8]! The consequence is that we first have to dereference the pointer before we can index the element, such as arr[0][1] in C. The third operand, i64 0, dereferences the pointer, and the fourth operand, i64 1, is the element index. The result of this computation is the address of the indexed element. Please note that no memory is touched by this instruction.
Except for structs, the index parameters do not need to be constant. Therefore, the getelementptr instruction can be used in a loop to retrieve the elements of an array. Structs are treated differently here: only constants can be used, and the type must be i32.
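The address arithmetic described above can be modeled in plain C++, without LLVM. The following is only a sketch under that assumption: gepArrayElement() and fillSquares() are hypothetical helper names, not LLVM APIs. The first helper mimics a getelementptr with the operands discussed above ([8 x i64], ptr @arr, a first index, and an element index) and computes an address without reading or writing memory. The second shows that the element index may be a loop variable instead of a constant.

```cpp
#include <cstddef>
#include <cstdint>

// Comparable to a global [8 x i64] array named @arr in LLVM IR.
static std::int64_t arr[8];

// Hypothetical helper (not an LLVM API). It models
//   getelementptr [8 x i64], ptr Base, i64 First, i64 Second
// First steps over whole arrays (0 selects the array the pointer refers to),
// Second selects the element. Only an address is computed here.
std::int64_t *gepArrayElement(std::int64_t (*Base)[8], std::ptrdiff_t First,
                              std::ptrdiff_t Second) {
  return &Base[First][Second];
}

// The element index does not have to be constant: here it is a loop variable.
void fillSquares(std::int64_t (*Base)[8]) {
  for (std::ptrdiff_t I = 0; I < 8; ++I)
    *gepArrayElement(Base, 0, I) = I * I; // the store is a separate step
}
```

For instance, gepArrayElement(&arr, 0, 1) yields the same address as &arr[1], which lies 8 bytes past the start of the array.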
With this knowledge, arrays are easily integrated into the code generator from Chapter 4, Basics of IR Code Generation. The convertType() method must be extended to create the type. If the ArrayTy variable holds the type denoter of an array, and assuming the number of elements within the array is an integer literal, we can then add the following to the convertType() method to handle arrays:

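// ArrayTypeDeclaration is the assumed name of the AST class for array types.
if (auto *ArrayTy =
        llvm::dyn_cast<ArrayTypeDeclaration>(Ty)) {
  // Convert the element type first.
  llvm::Type *Component =
      convertType(ArrayTy->getType());
  // The element count is assumed to be an integer literal
  // (IntegerLiteral is the assumed AST class name).
  Expr *Nums = ArrayTy->getNums();
  uint64_t NumElements =
      llvm::cast<IntegerLiteral>(Nums)
          ->getValue()
          .getZExtValue();
  llvm::Type *T =
      llvm::ArrayType::get(Component, NumElements);
  // TypeCache is a mapping between the original
  // TypeDeclaration (Ty) and the current Type (T).
  return TypeCache[Ty] = T;
}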
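The caching idiom in the last line, return TypeCache[Ty] = T;, can be tried out in isolation. The following is a minimal sketch that does not use LLVM: TypeDecl, LoweredType, and TypeConverter are hypothetical stand-ins for the AST node, llvm::Type, and the code generator, and the converted type is represented only by its textual spelling.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST type declaration.
struct TypeDecl {
  std::string Name;                    // spelling of a scalar type
  const TypeDecl *Component = nullptr; // element type if this is an array
  std::uint64_t NumElements = 0;       // element count if this is an array
};

// Hypothetical stand-in for llvm::Type.
struct LoweredType {
  std::string Spelling;
};

class TypeConverter {
  std::map<const TypeDecl *, const LoweredType *> TypeCache;
  std::vector<std::unique_ptr<LoweredType>> Storage; // owns the results

public:
  const LoweredType *convertType(const TypeDecl *Ty) {
    // Return the cached result if this declaration was converted before.
    auto It = TypeCache.find(Ty);
    if (It != TypeCache.end())
      return It->second;

    std::string Spelling;
    if (Ty->Component) {
      // Arrays: convert the component type first, then build the array type.
      Spelling = "[" + std::to_string(Ty->NumElements) + " x " +
                 convertType(Ty->Component)->Spelling + "]";
    } else {
      Spelling = Ty->Name;
    }
    Storage.push_back(std::make_unique<LoweredType>(LoweredType{Spelling}));
    // Same idiom as in the text: cache and return in one statement.
    return TypeCache[Ty] = Storage.back().get();
  }
};
```

Converting the same declaration twice returns the same object, which mirrors how a type cache avoids rebuilding types for every variable of the same array type.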

This type can be used to declare global variables. For local variables, we need to allocate memory for the array. We do this in the first basic block of the procedure:

for (auto *D : Proc->getDecls()) {
  if (auto *Var =
          llvm::dyn_cast<VariableDeclaration>(D)) {
    llvm::Type *Ty = mapType(Var);
    // Only aggregates (arrays and structs) need stack memory here.
    if (Ty->isAggregateType()) {
      llvm::Value *Val = Builder.CreateAlloca(Ty);
      // The following method requires a BasicBlock (Curr),
      // a VariableDeclaration (Var), and an llvm::Value (Val)
      writeLocalVariable(Curr, Var, Val);
    }
  }
}

To read and write an element, we have to generate the getelementptr instruction. This is added to the emitExpr() (reading a value) and emitStmt() (writing a value) methods. To read an element of an array, the value of the variable is read first. Then, the selectors of the variable are processed. For each index, the expression is evaluated and the value is stored. Based on this list, the address of the referenced element is calculated and the value is loaded:

auto &Selectors = Var->getSelectors();
for (auto I = Selectors.begin(), E = Selectors.end();
     I != E; ) {
  if (auto *IdxSel =
          llvm::dyn_cast<IndexSelector>(*I)) {
    llvm::SmallVector<llvm::Value *, 4> IdxList;
    while (I != E) {
      if (auto *Sel =
              llvm::dyn_cast<IndexSelector>(*I)) {
        IdxList.push_back(emitExpr(Sel->getIndex()));
        ++I;
      } else
        break;
    }
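    // Caution: Val is a pointer here, so Val->getType() is ptr. The GEP
    // source type should be the array type (with a leading zero index,
    // as described earlier), and the load type should be the element type.
    Val = Builder.CreateInBoundsGEP(Val->getType(), Val, IdxList);
    Val = Builder.CreateLoad(
        Val->getType(), Val);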
  }
  // . . . Check for additional selectors and handle
  // appropriately by generating getelementptr and load.
  else {
    llvm::report_fatal_error("Unsupported selector");
  }
}
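The control flow of this loop, which gathers a run of consecutive index selectors into one index list before emitting a single address computation, can be sketched without LLVM. In the following sketch, Selector, SelKind, and groupSelectors() are hypothetical names, and plain integers stand in for the evaluated index expressions. Unlike the code above, field selectors are placed in a group of their own instead of raising an error.

```cpp
#include <vector>

// Hypothetical stand-ins for the selector AST nodes.
enum class SelKind { Index, Field };
struct Selector {
  SelKind Kind;
  int Value; // evaluated index, or the constant field index
};

// Each returned group corresponds to one address computation.
std::vector<std::vector<int>>
groupSelectors(const std::vector<Selector> &Selectors) {
  std::vector<std::vector<int>> Groups;
  for (auto I = Selectors.begin(), E = Selectors.end(); I != E;) {
    if (I->Kind == SelKind::Index) {
      // Collect all directly following index selectors into one list.
      std::vector<int> IdxList;
      while (I != E && I->Kind == SelKind::Index) {
        IdxList.push_back(I->Value);
        ++I;
      }
      Groups.push_back(IdxList);
    } else {
      // A field selector forms a group of its own.
      Groups.push_back(std::vector<int>{I->Value});
      ++I;
    }
  }
  return Groups;
}
```

With this model, a designator such as a[1][2].f[3], where f has field index 0, is split into three groups: {1, 2}, {0}, and {3}.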

Writing to an array element uses the same code, with the exception that you do not generate a load instruction. Instead, you use the pointer as the target in a store instruction. For records, you use a similar approach. The selector for a record member contains the constant field index, named Idx. You convert this constant into a constant LLVM value:

llvm::Value *FieldIdx = llvm::ConstantInt::get(Int32Ty, Idx);

Then you can use this value in the Builder.CreateGEP() method, in the same way as for arrays.
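Why the field index of a record must be a constant can be illustrated in plain C++. The sketch below is not LLVM code: Point and gepStructField() are hypothetical names. Each field has its own type and its own offset, both fixed at compile time, so selecting a field amounts to picking one known offset, which is unlike stepping through equally sized array elements.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record with two fields of different types.
struct Point {
  std::int32_t X; // field index 0
  std::int64_t Y; // field index 1
};

// Hypothetical helper (not an LLVM API): maps a field index to the address
// of that field. Every case uses an offset known at compile time.
void *gepStructField(Point *Base, unsigned FieldIdx) {
  char *Raw = reinterpret_cast<char *>(Base);
  switch (FieldIdx) {
  case 0:
    return Raw + offsetof(Point, X);
  case 1:
    return Raw + offsetof(Point, Y);
  default:
    return nullptr; // no such field
  }
}
```

For a Point object P, gepStructField(&P, 1) yields the same address as &P.Y. As with arrays, no memory is accessed until a load or store uses that address.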
Now, you should know how to translate aggregate data types to LLVM IR. Passing values of those types in a system-compliant way requires some care, and you will learn to implement it correctly in the next section.
