Handling the scope of names – Turning the Source File into an Abstract Syntax Tree
By Reginald Bellamy / July 13, 2021 / No Comments / IT Certifications, LLVM IR REFERENCE, Technical requirements, Understanding the IR code
Let’s have a look at the scope of names first. The scope of a name is the range where the name is visible. Like C, tinylang uses a declare-before-use model. For example, the B and X variables are declared at the module level to be of the INTEGER type:
VAR B, X: INTEGER;
Before the declaration, the variables are not known and cannot be used. That’s only possible after the declaration. Inside a procedure, more variables can be declared:
PROCEDURE Proc;
VAR B: BOOLEAN;
BEGIN
(* Statements *)
END Proc;
Inside the procedure, at the point where the comment is, a use of B refers to the B local variable while a use of X refers to the X global variable. The scope of the local variable, B, is Proc. If a name cannot be found in the current scope, then the search continues in the enclosing scope. Therefore, the X variable can be used inside the procedure. In tinylang, only modules and procedures open a new scope. Other language constructs, such as structs and classes, usually also open a scope. Predefined entities such as the INTEGER type and the TRUE literal are declared in a global scope, enclosing the scope of the module.
In tinylang, only the name is crucial. Therefore, a scope can be implemented as a mapping from a name to its declaration. A new name can only be inserted if it is not already present. For the lookup, the enclosing or parent scope must also be known. The interface (in the include/tinylang/Sema/Scope.h file) looks as follows:
ifndef TINYLANG_SEMA_SCOPE_H
define TINYLANG_SEMA_SCOPE_H
include “tinylang/Basic/LLVM.h”
include “llvm/ADT/StringMap.h”
include “llvm/ADT/StringRef.h”
namespace tinylang {
class Decl;
class Scope {
Scope *Parent;
StringMap Symbols;
public:
Scope(Scope *Parent = nullptr) : Parent(Parent) {}
bool insert(Decl *Declaration);
Decl *lookup(StringRef Name);
Scope *getParent() { return Parent; }
};
} // namespace tinylang
endif
The implementation in the lib/Sema/Scope.cpp file looks as follows:
include “tinylang/Sema/Scope.h”
include “tinylang/AST/AST.h”
using namespace tinylang;
bool Scope::insert(Decl *Declaration) {
return Symbols
.insert(std::pair(
Declaration->getName(), Declaration))
.second;
}
Please note that the StringMap::insert() method does not override an existing entry. The second member of the resulting std::pair indicates if the table was updated. This information is returned to the caller.
To implement the search for the declaration of a symbol, the lookup() method searches in the current scope and, if nothing is found, searches the scopes linked by the parent member:
Decl *Scope::lookup(StringRef Name) {
Scope *S = this;
while (S) {
StringMap::const_iterator I =
S->Symbols.find(Name);
if (I != S->Symbols.end())
return I->second;
S = S->getParent();
}
return nullptr;
}
The variable declaration is then processed as follows:
- The current scope is the module scope.
- The INTEGER type declaration is looked up. It’s an error if no declaration is found or if it is not a type declaration.
- A new AST node called VariableDeclaration is instantiated, with the important attributes being the name, B, and the type.
- The name, B, is inserted into the current scope, mapping to the declaration instance. If the name is already present in the scope, then this is an error. The content of the current scope is not changed in this case.
- The same is done for the X variable.
Two tasks are performed here. As in the calc example, AST nodes are constructed. At the same time, attributes of the node, such as the type, are computed. Why is this possible?
The semantic analyzer can fall back on two different sets of attributes. The scope is inherited from the caller. The type declaration can be computed (or synthesized) by evaluating the name of the type declaration. The language is designed in such a way that these two sets of attributes are sufficient to compute all attributes of the AST node.
An important aspect is the declare-before-use model. If a language allows the use of names before declaration, such as members inside a class in C++, then it is not possible to compute all attributes of an AST node at once. In such a case, the AST node must be constructed with only partially computed attributes or just with plain information (such as in the calc example).
The AST must then be visited one or more times to determine the missing information. In the case of tinylang (and Modula-2), it would be possible to dispense with the AST construction – the AST is indirectly represented through the call hierarchy of the parseXXX() methods. Code generation from an AST is much more common, so we construct an AST here, too.
Before we put the pieces together, we need to understand the LLVM style of using runtime type information (RTTI).