Semantic Analysis

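Semantic analysis is an essential stage of compilation. Its main purpose is to check that a program is semantically correct: that variables and functions are used with the right types, that variables are declared before use, that statements are legal, and so on. This article introduces some basic concepts of semantic analysis and ways to implement it.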

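By this point, lexical and syntactic analysis have already confirmed that the program is structurally well-formed. Semantic analysis comes next, and it covers checks on the following: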

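Arithmetic and boolean operations
Assignment and initialization (for example, int x = "hello" is a type error)
Function calls
Return statements
Exception handling (declared or caught)
Java, C++: whether access modifiers are respected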

Example

Consider the following AST classes, which the parser produces:

import java.util.List;

class Program {
	List<Function> functions;
}
class Function {
	List<Parameter> parameters;
	List<Statement> body;
}
// Very simple example!
// no new types, no global vars,
// no return types in functions,...
abstract class Statement { }
class AssignmentStatement extends Statement {
	Identifier leftSide;
	Expression rightSide;
}
class IfStatement extends Statement {
	Expression condition;
	List<Statement> thenStatements;
	List<Statement> elseStatements;
}
// ... and so on

Type Checking

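Type checking is a major part of semantic analysis. It verifies that expressions and statements use types correctly. We can do this by traversing the AST, starting from its root.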

void checkTypes(Program prog) {
	for (var func : prog.functions) {
		checkTypes(func);
	}
}

void checkTypes(Function func) {
	for (var stmt : func.body) {
		checkTypes(stmt);
	}
}

void checkTypes(Statement stmt) {
	if (stmt instanceof AssignmentStatement s) {
		// both sides of an assignment must have the same type
		var leftType = getTypeOfExpression(s.leftSide);
		var rightType = getTypeOfExpression(s.rightSide);
		if (!leftType.equals(rightType))
			throw new TypeErrorException();
	}
	else if (stmt instanceof IfStatement) {
		// do type checking of if statement
		// ...
	}
	// else if ... and so on for the other statement types
}

Symbol Table

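In the example above we implemented the check for an assignment statement such as x = 123;. Before discussing how to implement getTypeOfExpression, we need to answer a question: how do we know whether x has been declared, and what its type is? This is where the symbol table comes in.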

A symbol table is a data structure (for example, a hash map) that stores information about every identifier and its type.

For example:

int x;
double y;
void f(){
		x = 3;
}

Identifier	Type
x	int
y	double
f	() → void
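The table above can be built directly as a map from names to types. The following is only a minimal sketch: the Type, PrimitiveType, and FunctionType names are assumptions made for illustration and do not come from the AST classes shown earlier.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical type representation: either a primitive or a function type.
interface Type { }

record PrimitiveType(String name) implements Type {
    @Override public String toString() { return name; }
}

record FunctionType(List<Type> parameterTypes, Type returnType) implements Type {
    @Override public String toString() {
        // Renders e.g. "() -> void" or "(int, int) -> void".
        var params = parameterTypes.stream().map(Object::toString).toList();
        return "(" + String.join(", ", params) + ") -> " + returnType;
    }
}

class SymbolTableExample {
    static final Type INT = new PrimitiveType("int");
    static final Type DOUBLE = new PrimitiveType("double");
    static final Type VOID = new PrimitiveType("void");

    // Builds the table for: int x; double y; void f() { ... }
    static Map<String, Type> buildTable() {
        Map<String, Type> table = new HashMap<>();
        table.put("x", INT);
        table.put("y", DOUBLE);
        table.put("f", new FunctionType(List.of(), VOID));
        return table;
    }
}
```

Because records compare by value, two Type objects describing the same type are equal, which is convenient for the leftType.equals(rightType) check used earlier.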

Now we can implement getTypeOfExpression.

HashMap<String, Type> symbolTable;

Type getTypeOfExpression(Expression e) {
	if (e instanceof Identifier id) {
		// an identifier has the type recorded for it in the symbol table
		if (symbolTable.containsKey(id.name))
			return symbolTable.get(id.name);
		else
			throw new UnknownIdentifierException();
	}
	// else if ... handle the other kinds of expressions
}
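To see the pieces working together, here is a small self-contained sketch of the assignment check. It is an illustration under stated assumptions, not the article's full implementation: types are plain strings, and the IntLiteral and DoubleLiteral nodes and the exception constructors are hypothetical additions that do not appear in the AST above.

```java
import java.util.Map;

// Hypothetical minimal AST; types are plain strings to keep the sketch short.
interface Expression { }
record Identifier(String name) implements Expression { }
record IntLiteral(int value) implements Expression { }
record DoubleLiteral(double value) implements Expression { }
record AssignmentStatement(Identifier leftSide, Expression rightSide) { }

class TypeErrorException extends RuntimeException {
    TypeErrorException(String message) { super(message); }
}

class UnknownIdentifierException extends RuntimeException {
    UnknownIdentifierException(String name) { super("unknown identifier: " + name); }
}

class TypeChecker {
    private final Map<String, String> symbolTable;

    TypeChecker(Map<String, String> symbolTable) {
        this.symbolTable = symbolTable;
    }

    String getTypeOfExpression(Expression e) {
        if (e instanceof Identifier id) {
            // Identifiers get their type from the symbol table.
            String type = symbolTable.get(id.name());
            if (type == null) throw new UnknownIdentifierException(id.name());
            return type;
        } else if (e instanceof IntLiteral) {
            return "int";
        } else if (e instanceof DoubleLiteral) {
            return "double";
        }
        throw new IllegalArgumentException("unsupported expression: " + e);
    }

    void checkTypes(AssignmentStatement s) {
        String leftType = getTypeOfExpression(s.leftSide());
        String rightType = getTypeOfExpression(s.rightSide());
        // Strict equality, as in the article: no implicit conversions.
        if (!leftType.equals(rightType))
            throw new TypeErrorException("cannot assign " + rightType + " to " + leftType);
    }
}
```

With a table containing x as int, checking x = 123 passes, while x = 1.5 throws TypeErrorException and any use of an undeclared name throws UnknownIdentifierException. Note that the strict equality test also rejects assigning an int to a double; a real language would probably add conversion rules here.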

Forward Reference

Consider the following example:

int x;
double y;

void f(){
}

void g(){
...
}

Identifier	Type
x	int
y	double
f	()→ void

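While traversing the AST of f(), g() has not yet been added to the symbol table, so a reference to g() from inside f() could not be resolved. There are two ways to solve this forward-reference problem: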

Use prototypes (forward declarations), as in C:

void g();
void f(){
  g();
}
void g(){
  //...
}

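Traverse the AST multiple times:
- First pass: collect type information for global variables, functions, and so on;
- Second pass: traverse the AST again to perform type checking and the other checks.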
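The two-pass approach can be sketched as follows. This is a simplified illustration: FunctionDecl, with its list of called function names, is a hypothetical stand-in for a real function body, and every function is assumed to have the type () -> void.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical AST: a function is reduced to its name and the names it calls.
record FunctionDecl(String name, List<String> calledFunctions) { }
record Program(List<FunctionDecl> functions) { }

class UnknownIdentifierException extends RuntimeException {
    UnknownIdentifierException(String name) { super("unknown identifier: " + name); }
}

class TwoPassChecker {
    // Pass 1: record every function before looking inside any body.
    static Map<String, String> collectDeclarations(Program prog) {
        Map<String, String> symbolTable = new HashMap<>();
        for (var func : prog.functions()) {
            symbolTable.put(func.name(), "() -> void");
        }
        return symbolTable;
    }

    // Pass 2: every call must now refer to a function found in pass 1.
    static void checkBodies(Program prog, Map<String, String> symbolTable) {
        for (var func : prog.functions()) {
            for (var callee : func.calledFunctions()) {
                if (!symbolTable.containsKey(callee))
                    throw new UnknownIdentifierException(callee);
            }
        }
    }

    static void check(Program prog) {
        checkBodies(prog, collectDeclarations(prog));
    }
}
```

A program in which f calls g and g is declared after f passes this check, because g is already in the table when the body of f is examined. A call to a function that is declared nowhere still fails in the second pass.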

Scope

Consider this example:

int x;
double y;
void f(int y, int z){
	x = y;
}

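Inside f(), the declaration of the parameter y hides the global variable y: within the function, the identifier y refers to the parameter, not to the global variable.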

This example contains two different scopes:

Scope of the global variables x and y: the whole program
Scope of the parameters y and z: only inside the function f()

Implementing Multiple Scopes

The first table below is T1 (the global scope) and the second is T2 (the scope of f()).

Identifier	Type
x	int
y	double
f	(int, int) → void

Identifier	Type
y	int
z	int

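For the function f(), we create a new symbol table that is linked to the global symbol table. When checking y, we look it up in T2 first and fall back to T1 only if it is not found there.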

class SymbolTable {
	SymbolTable previousTable; // enclosing scope; null for the global table
	HashMap<String, Type> entries = new HashMap<>();
	SymbolTable(SymbolTable prev) {
		previousTable = prev;
	}
}
void checkTypes(Function f, SymbolTable globalTable) {
	// new symbol table linked to global table
	var localTable = new SymbolTable(globalTable);
	// add parameters of the function to local table
	// (the add() helper is not shown here)
	localTable.add(f.parameters);
	// use local table when type-checking body of f
	checkTypes(f.body, localTable);
	// note that we didn't modify globalTable. Outside f, the parameters
	// of f are not visible
}
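The lookup rule described above (search T2 first, then T1) is not shown in the snippet, so here is one way it could look. This is a sketch under assumptions: the add and lookup methods are hypothetical names, and types are stored as plain strings for brevity.

```java
import java.util.HashMap;
import java.util.Map;

class SymbolTable {
    private final SymbolTable previousTable; // enclosing scope; null for the global table
    private final Map<String, String> entries = new HashMap<>();

    SymbolTable(SymbolTable previousTable) {
        this.previousTable = previousTable;
    }

    // Declares a name in this scope only.
    void add(String name, String type) {
        entries.put(name, type);
    }

    // Returns the type of the innermost declaration, or null if undeclared.
    String lookup(String name) {
        for (SymbolTable table = this; table != null; table = table.previousTable) {
            String type = table.entries.get(name);
            if (type != null) return type;
        }
        return null;
    }
}

class ScopeExample {
    // T1: global scope of the example program.
    static SymbolTable globalTable() {
        var t1 = new SymbolTable(null);
        t1.add("x", "int");
        t1.add("y", "double");
        t1.add("f", "(int, int) -> void");
        return t1;
    }

    // T2: scope of f(int y, int z), linked to the given enclosing table.
    static SymbolTable localTableOfF(SymbolTable enclosing) {
        var t2 = new SymbolTable(enclosing);
        t2.add("y", "int");
        t2.add("z", "int");
        return t2;
    }
}
```

Looking up y in T2 gives int because the parameter hides the global variable, while looking up x in T2 falls through to T1 and also gives int. Looking up y in T1 still gives double, and z is not visible from T1 at all, which matches the rule that parameters are invisible outside f().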

Conclusion

Scope involves far more than this one case. In upcoming articles I will discuss how to handle the others.

The code in this article is admittedly not very elegant. Interested readers should use the Visitor pattern instead.
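As a rough idea of what that could look like, the sketch below replaces the chain of instanceof tests with double dispatch. Everything here is an assumption for illustration: the StatementVisitor interface, the accept methods, and the simplified node fields are not part of the classes shown earlier, and the visitor only collects assigned names rather than checking types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor interface: one method per statement kind.
interface StatementVisitor<R> {
    R visitAssignment(AssignmentStatement s);
    R visitIf(IfStatement s);
}

abstract class Statement {
    abstract <R> R accept(StatementVisitor<R> visitor);
}

class AssignmentStatement extends Statement {
    final String target; // simplified: only the assigned name is kept

    AssignmentStatement(String target) { this.target = target; }

    @Override <R> R accept(StatementVisitor<R> visitor) {
        return visitor.visitAssignment(this);
    }
}

class IfStatement extends Statement {
    final List<Statement> thenStatements;
    final List<Statement> elseStatements;

    IfStatement(List<Statement> thenStatements, List<Statement> elseStatements) {
        this.thenStatements = thenStatements;
        this.elseStatements = elseStatements;
    }

    @Override <R> R accept(StatementVisitor<R> visitor) {
        return visitor.visitIf(this);
    }
}

// Example visitor: collects the names assigned anywhere in a statement.
// A type checker would implement the same interface.
class AssignedNamesVisitor implements StatementVisitor<Void> {
    final List<String> names = new ArrayList<>();

    @Override public Void visitAssignment(AssignmentStatement s) {
        names.add(s.target);
        return null;
    }

    @Override public Void visitIf(IfStatement s) {
        for (var stmt : s.thenStatements) stmt.accept(this);
        for (var stmt : s.elseStatements) stmt.accept(this);
        return null;
    }
}
```

The benefit is that adding a new statement kind forces every visitor to handle it at compile time, whereas a forgotten branch in an instanceof chain goes unnoticed.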

Building a Compiler Together: Semantic Analysis in Theory and Practice (Part 1)
