Controlling visibility with linkage and name mangling – Basics of IR Code Generation

Functions (and global variables) have a linkage style attached to them. The linkage style defines the visibility of a symbol name and what should happen if more than one symbol has the same name. The most basic linkage styles are private and external. A symbol with private linkage is visible only in the current compilation unit, while a symbol with external linkage is globally available.

For a language without a proper module concept, such as C, this is adequate. With modules, we need to do more. Let's assume that we have two modules, Square and Cube, each of which provides a Root() function. If the functions are private, there is no problem: each function gets the name Root and private linkage. The situation is different if a function is exported so that it can be called from other modules. The function name alone is then not enough, because it is not unique.
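To make the problem concrete, here is a small self-contained C++ sketch. It does not use LLVM: the Linkage, Symbol, and Module types and the findClashes() function are hypothetical and only model the rule described above. Private symbols stay inside their module, while all external symbols share one global namespace, where two definitions with the same name clash.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model of linkage (not LLVM's API).
enum class Linkage { Private, External };

struct Symbol {
  std::string Name;
  Linkage Link;
};

struct Module {
  std::string Name;
  std::vector<Symbol> Symbols;
};

// Returns the names of all external symbols that are defined by more than
// one module. Private symbols never enter the global table, so two modules
// may each define a private "Root" without a conflict.
std::vector<std::string> findClashes(const std::vector<Module> &Modules) {
  std::map<std::string, int> Count; // external name -> number of definitions
  for (const Module &M : Modules)
    for (const Symbol &S : M.Symbols)
      if (S.Link == Linkage::External)
        ++Count[S.Name];
  std::vector<std::string> Clashes;
  for (const auto &Entry : Count)
    if (Entry.second > 1)
      Clashes.push_back(Entry.first);
  return Clashes;
}
```

With two private Root() functions, findClashes() reports nothing; once both are external, it reports Root. Renaming the exported functions to unique names such as _t6Square4Root and _t4Cube4Root removes the clash again.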

The solution is to tweak the name to make it globally unique. This is called name mangling. How this is done depends on the requirements and characteristics of the language. In our case, the basic idea is to combine the module name and the function name into a globally unique name. Using Square.Root as the name looks like an obvious solution, but it may lead to problems with assemblers, because the dot may have a special meaning. Instead of using a delimiter between the name components, we can get a similar effect by prefixing each name component with its length: 6Square4Root. This is not a legal identifier for LLVM, because it starts with a digit, but we can fix this by prefixing the whole name with _t (with t for tinylang): _t6Square4Root. In this way, we can create unique names for exported symbols:

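```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Twine.h"
#include <string>

std::string CGModule::mangleName(Decl *D) {
  std::string Mangled("_t");
  // Collect the name components, from the innermost declaration outward.
  llvm::SmallVector<llvm::StringRef, 4> List;
  for (; D; D = D->getEnclosingDecl())
    List.push_back(D->getName());
  // Append them outermost first, each prefixed with its length.
  while (!List.empty()) {
    llvm::StringRef Name = List.pop_back_val();
    Mangled.append(
        llvm::Twine(Name.size()).concat(Name).str());
  }
  return Mangled;
}
```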
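To experiment with the scheme outside the compiler, the following sketch re-implements it in plain C++. The Decl struct is a hypothetical stand-in for tinylang's declaration class that holds only a name and a pointer to the enclosing declaration, and demangleName() is an illustrative addition that shows why the length prefixes make a delimiter unnecessary.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for tinylang's Decl class: only the name and the
// enclosing declaration (nullptr for a top-level module) are modelled.
struct Decl {
  std::string Name;
  const Decl *Enclosing;
};

// Same scheme as CGModule::mangleName(): "_t", then every name component,
// outermost first, each prefixed with its length.
std::string mangleName(const Decl *D) {
  std::vector<std::string> List; // innermost component first
  for (; D; D = D->Enclosing)
    List.push_back(D->Name);
  std::string Mangled = "_t";
  for (auto It = List.rbegin(); It != List.rend(); ++It)
    Mangled += std::to_string(It->size()) + *It;
  return Mangled;
}

// Inverse of mangleName(): splits a mangled name back into its components.
// Returns an empty vector if the input does not follow the scheme.
std::vector<std::string> demangleName(const std::string &Mangled) {
  if (Mangled.compare(0, 2, "_t") != 0)
    return {};
  std::vector<std::string> Parts;
  std::size_t Pos = 2;
  while (Pos < Mangled.size()) {
    // Read the decimal length prefix of the next component.
    std::size_t Len = 0;
    while (Pos < Mangled.size() &&
           std::isdigit(static_cast<unsigned char>(Mangled[Pos])))
      Len = Len * 10 + static_cast<std::size_t>(Mangled[Pos++] - '0');
    if (Len == 0 || Len > Mangled.size() - Pos)
      return {}; // missing length prefix or truncated component
    Parts.push_back(Mangled.substr(Pos, Len));
    Pos += Len;
  }
  return Parts;
}
```

For a Root declaration nested in the Square module, mangleName() returns _t6Square4Root, and demangleName() turns that string back into the components Square and Root. Digits inside a name (for example, Vec3) cause no trouble, because a length prefix is only read at the start of a component.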

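If your source language supports function overloading, then you need to extend this scheme with type information. For example, to distinguish between the C++ functions int root(int) and double root(double), at least the parameter types must be encoded in the name. The Itanium C++ ABI, used by GCC and Clang on Linux, encodes only the parameter types for such ordinary functions, so the two functions become _Z4rooti and _Z4rootd.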
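A possible extension of the length-prefix scheme for a language with overloading is sketched below. The Type enumeration, the one-letter type codes, and the _ separator are assumptions made for illustration and are not part of tinylang; the letters i, d, and b happen to match the codes the Itanium C++ ABI uses for int, double, and bool. Only the parameter types are encoded, which is enough to tell the two root() functions apart.

```cpp
#include <string>
#include <vector>

// Hypothetical parameter types of an overloading-capable source language.
enum class Type { Int, Double, Bool };

// Hypothetical one-letter code for each type.
char typeCode(Type T) {
  switch (T) {
  case Type::Int:    return 'i';
  case Type::Double: return 'd';
  case Type::Bool:   return 'b';
  }
  return '?'; // not reached for valid enumerators
}

// Length-prefixed name components as before, then '_' and one code per
// parameter. A component always starts with a digit, so '_' unambiguously
// marks where the parameter list begins.
std::string mangleFunction(const std::vector<std::string> &Components,
                           const std::vector<Type> &Params) {
  std::string Mangled = "_t";
  for (const std::string &C : Components)
    Mangled += std::to_string(C.size()) + C;
  Mangled += '_';
  for (Type T : Params)
    Mangled += typeCode(T);
  return Mangled;
}
```

With this sketch, int root(int) and double root(double) become _t4root_i and _t4root_d, and a Root(int) function in the Square module becomes _t6Square4Root_i.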

You also need to think about the length of the generated name, because some linkers restrict the length of symbol names. With nested namespaces and classes in C++, mangled names can become rather long. For this reason, the Itanium C++ ABI defines a compression scheme (substitutions) that avoids repeating the same name components over and over again.
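The idea behind such compression can be shown with a much simpler variant. In the following sketch, which is an illustrative assumption and far less elaborate than the actual Itanium substitution rules, a component that already occurred earlier in the name is replaced with a back-reference S<index>_ instead of being repeated, where <index> is its position in the list of distinct components seen so far.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified back-reference compression (hypothetical): the first
// occurrence of a component is written as <length><name>; every later
// occurrence becomes "S<index>_".
std::string mangleCompressed(const std::vector<std::string> &Components) {
  std::vector<std::string> Seen; // distinct components, in order of appearance
  std::string Mangled = "_t";
  for (const std::string &C : Components) {
    std::size_t Index = 0;
    while (Index < Seen.size() && Seen[Index] != C)
      ++Index;
    if (Index < Seen.size()) {
      // Already seen: emit a short back-reference instead of the name.
      Mangled += "S" + std::to_string(Index) + "_";
    } else {
      // First occurrence: record it and emit it length-prefixed.
      Seen.push_back(C);
      Mangled += std::to_string(C.size()) + C;
    }
  }
  return Mangled;
}
```

For the components Geometry, Vector, Geometry, Vector, the sketch produces _t8Geometry6VectorS0_S1_ instead of _t8Geometry6Vector8Geometry6Vector. Because every regular component starts with a digit, a back-reference starting with S cannot be mistaken for one.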

Next, we’ll look at how to treat parameters.
