Creating the semantic analyzer – Turning the Source File into an Abstract Syntax Tree-2
By Reginald Bellamy / November 30, 2021
This must be done reliably because we do not want to add names to the wrong scope in case of a syntax error. This is a classic use of the Resource Acquisition Is Initialization (RAII) idiom in C++. Another complication comes from the fact that a procedure can recursively call itself. Therefore, the name of the procedure must be added to the current scope before it can be used. The semantic analyzer has two methods to enter and leave a scope. The scope is associated with a declaration:
void Sema::enterScope(Decl *D) {
  CurrentScope = new Scope(CurrentScope);
  CurrentDecl = D;
}
void Sema::leaveScope() {
  Scope *Parent = CurrentScope->getParent();
  delete CurrentScope;
  CurrentScope = Parent;
  CurrentDecl = CurrentDecl->getEnclosingDecl();
}
A simple helper class is used to implement the RAII idiom:
class EnterDeclScope {
  Sema &Semantics;
public:
  EnterDeclScope(Sema &Semantics, Decl *D)
      : Semantics(Semantics) {
    Semantics.enterScope(D);
  }
  ~EnterDeclScope() { Semantics.leaveScope(); }
};
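To see why the helper class makes leaving a scope reliable, here is a minimal, self-contained sketch of the same pattern. The names MiniSema, ScopeGuard, and parseBody are hypothetical stand-ins and not part of tinylang; the sketch only assumes that a parse function may return early on a syntax error:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for Sema: it only tracks how deeply scopes are nested.
struct MiniSema {
  int Depth = 0;
  std::vector<std::string> Log;
  void enterScope() { ++Depth; Log.push_back("enter"); }
  void leaveScope() { --Depth; Log.push_back("leave"); }
};

// Same shape as EnterDeclScope: enter in the constructor,
// leave in the destructor.
class ScopeGuard {
  MiniSema &S;
public:
  explicit ScopeGuard(MiniSema &S) : S(S) { S.enterScope(); }
  ~ScopeGuard() { S.leaveScope(); }
  // Copying a guard would leave the scope twice, so forbid it.
  ScopeGuard(const ScopeGuard &) = delete;
  ScopeGuard &operator=(const ScopeGuard &) = delete;
};

// Mimics a parse function that bails out early on a syntax error.
// Returns true on error, following the parser convention used here.
bool parseBody(MiniSema &S, bool HasSyntaxError) {
  ScopeGuard Guard(S);
  assert(S.Depth > 0 && "the guard must have opened a scope");
  if (HasSyntaxError)
    return true; // The destructor of Guard still runs on this path.
  return false;
}
```

Whichever return statement is taken, every call to enterScope() is matched by exactly one call to leaveScope(), so the scope depth is back at its previous value after parseBody() returns.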
When parsing a module or procedure, two interactions occur with the semantic analyzer. The first is after the name is parsed. Here, the (almost empty) AST node is constructed and a new scope is established:
bool Parser::parseProcedureDeclaration(/* … */) { /* … */
  if (consume(tok::kw_PROCEDURE)) return _errorhandler();
  if (expect(tok::identifier)) return _errorhandler();
  ProcedureDeclaration *D =
      Actions.actOnProcedureDeclaration(
          Tok.getLocation(), Tok.getIdentifier());
  EnterDeclScope S(Actions, D);
  /* … */
}
The semantic analyzer checks the name in the current scope and returns the AST node:
ProcedureDeclaration *
Sema::actOnProcedureDeclaration(SMLoc Loc, StringRef Name) {
  ProcedureDeclaration *P =
      new ProcedureDeclaration(CurrentDecl, Loc, Name);
  if (!CurrentScope->insert(P))
    Diags.report(Loc, diag::err_symbold_declared, Name);
  return P;
}
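The Scope class itself is not shown in this section. As a rough sketch of what insert() and a name lookup could look like, assume a scope is a symbol table plus a pointer to the enclosing scope. MiniScope and MiniDecl are hypothetical names, and std::unordered_map is used here only to keep the example free of LLVM headers:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical minimal declaration: only a name.
struct MiniDecl {
  std::string Name;
};

// Sketch of a scope: a symbol table plus a link to the enclosing scope.
class MiniScope {
  MiniScope *Parent;
  std::unordered_map<std::string, MiniDecl *> Symbols;

public:
  explicit MiniScope(MiniScope *Parent = nullptr) : Parent(Parent) {}
  MiniScope *getParent() const { return Parent; }

  // Returns false if the name is already declared in *this* scope.
  bool insert(MiniDecl *D) {
    assert(D && "cannot insert a null declaration");
    return Symbols.emplace(D->Name, D).second;
  }

  // Searches this scope first, then each enclosing scope in turn.
  MiniDecl *lookup(const std::string &Name) const {
    for (const MiniScope *S = this; S; S = S->Parent) {
      auto It = S->Symbols.find(Name);
      if (It != S->Symbols.end())
        return It->second;
    }
    return nullptr;
  }
};
```

Because the procedure name is inserted into the enclosing scope before the scope for the body is opened, a lookup from inside the body finds it by walking to the parent. Under these assumptions, that is what makes a recursive call resolvable.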
The real work is done after all the declarations and the procedure body have been parsed. You only need to check if the name at the end of the procedure declaration is equal to the name of the procedure and if the declaration used for the return type is a type declaration:
void Sema::actOnProcedureDeclaration(
    ProcedureDeclaration *ProcDecl, SMLoc Loc,
    StringRef Name, FormalParamList &Params, Decl *RetType,
    DeclList &Decls, StmtList &Stmts) {
  if (Name != ProcDecl->getName()) {
    Diags.report(Loc, diag::err_proc_identifier_not_equal);
    Diags.report(ProcDecl->getLocation(),
                 diag::note_proc_identifier_declaration);
  }
  ProcDecl->setDecls(Decls);
  ProcDecl->setStmts(Stmts);
  auto *RetTypeDecl =
      dyn_cast_or_null<TypeDeclaration>(RetType);
  if (!RetTypeDecl && RetType)
    Diags.report(Loc, diag::err_returntype_must_be_type,
                 Name);
  else
    ProcDecl->setRetType(RetTypeDecl);
}
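The return type check distinguishes three cases: no return type at all, a declaration that is a type, and a declaration that is something else. The following sketch imitates the behavior of dyn_cast_or_null with a hand-written kind tag, so that it compiles without LLVM. TaggedDecl, MiniTypeDecl, MiniVarDecl, castOrNull, and checkReturnType are hypothetical names, and castOrNull is a simplified imitation, not LLVM's implementation:

```cpp
#include <cassert>

// Hypothetical base class carrying a kind tag, in the style of LLVM RTTI.
struct TaggedDecl {
  enum DeclKind { DK_Type, DK_Var };
  explicit TaggedDecl(DeclKind K) : Kind(K) {}
  DeclKind getKind() const { return Kind; }

private:
  const DeclKind Kind;
};

struct MiniTypeDecl : TaggedDecl {
  MiniTypeDecl() : TaggedDecl(DK_Type) {}
  static bool classof(const TaggedDecl *D) {
    assert(D && "classof expects a non-null pointer");
    return D->getKind() == DK_Type;
  }
};

struct MiniVarDecl : TaggedDecl {
  MiniVarDecl() : TaggedDecl(DK_Var) {}
  static bool classof(const TaggedDecl *D) {
    assert(D && "classof expects a non-null pointer");
    return D->getKind() == DK_Var;
  }
};

// Simplified imitation of dyn_cast_or_null: a null input yields null,
// and so does an input of the wrong kind.
template <typename To> To *castOrNull(TaggedDecl *D) {
  return (D && To::classof(D)) ? static_cast<To *>(D) : nullptr;
}

enum class RetCheck { NoReturnType, Ok, NotAType };

// Mirrors the condition `!RetTypeDecl && RetType` from the listing above.
RetCheck checkReturnType(TaggedDecl *RetType) {
  auto *T = castOrNull<MiniTypeDecl>(RetType);
  if (!T && RetType)
    return RetCheck::NotAType; // A declaration was given, but not a type.
  return T ? RetCheck::Ok : RetCheck::NoReturnType;
}
```

Only the NotAType case corresponds to the error diagnostic. A missing return type is legal, which is why the cast result alone is not enough and the original pointer has to be tested as well.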
Some declarations are inherently present and cannot be defined by the developer. This includes the BOOLEAN and INTEGER types and the TRUE and FALSE literals. These declarations exist in the global scope and must be added programmatically. Modula-2 also predefines some procedures, such as INC or DEC, that can be added to the global scope. Given our classes, initializing the global scope is simple:
void Sema::initialize() {
  CurrentScope = new Scope();
  CurrentDecl = nullptr;
  IntegerType =
      new TypeDeclaration(CurrentDecl, SMLoc(), "INTEGER");
  BooleanType =
      new TypeDeclaration(CurrentDecl, SMLoc(), "BOOLEAN");
  TrueLiteral = new BooleanLiteral(true, BooleanType);
  FalseLiteral = new BooleanLiteral(false, BooleanType);
  TrueConst = new ConstantDeclaration(CurrentDecl, SMLoc(),
                                      "TRUE", TrueLiteral);
  FalseConst = new ConstantDeclaration(
      CurrentDecl, SMLoc(), "FALSE", FalseLiteral);
  CurrentScope->insert(IntegerType);
  CurrentScope->insert(BooleanType);
  CurrentScope->insert(TrueConst);
  CurrentScope->insert(FalseConst);
}
With this scheme, all required calculations for tinylang can be done. For example, let’s look at how to compute whether an expression results in a constant value:
• A literal or a reference to a constant declaration is a constant
• If both sides of an expression are constant, then applying the operator also yields a constant
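The two rules above can be sketched as a flag that is computed when an expression node is constructed. This is a minimal illustration under assumed names: MiniExpr, Literal, ConstRef, VarRef, InfixExpr, and bothConst are hypothetical and are not the tinylang AST classes:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical expression node that carries an "is constant" flag.
struct MiniExpr {
  bool IsConst;
  explicit MiniExpr(bool IsConst) : IsConst(IsConst) {}
  virtual ~MiniExpr() = default;
};

// Rule 1: literals and references to constant declarations are constant.
struct Literal : MiniExpr {
  int Value;
  explicit Literal(int V) : MiniExpr(true), Value(V) {}
};
struct ConstRef : MiniExpr {
  ConstRef() : MiniExpr(true) {}
};
// A reference to a variable is not constant.
struct VarRef : MiniExpr {
  VarRef() : MiniExpr(false) {}
};

// Rule 2: an operator applied to two constant operands yields a constant.
static bool bothConst(const MiniExpr *L, const MiniExpr *R) {
  assert(L && R && "operands must not be null");
  return L->IsConst && R->IsConst;
}

struct InfixExpr : MiniExpr {
  std::unique_ptr<MiniExpr> Left, Right;
  // The base class is initialized first, so L and R are still valid
  // when bothConst() inspects them.
  InfixExpr(std::unique_ptr<MiniExpr> L, std::unique_ptr<MiniExpr> R)
      : MiniExpr(bothConst(L.get(), R.get())), Left(std::move(L)),
        Right(std::move(R)) {}
};
```

Because each node computes its flag from its operands at construction time, the property propagates bottom-up through the tree without a separate pass.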