Compiler Theory: Front-End Overview

This is a record of my study of compiler theory.

This post gives an overview of the front end.

Front-End Techniques

Here, the front end is the part of the compiler that analyzes and understands the source code; it does not depend on the target machine. The back end generates the target code (assembly -> binary machine code) and does depend on the target machine, because different machines have different instruction sets. The front end has three phases: lexical analysis, syntax analysis (parsing), and semantic analysis.

Lexical Analysis

Token: the smallest meaningful unit in the source code, either a word or a symbol. For example: int, main, =, -, >=, a.

We build a finite automaton (also called a finite-state automaton, FSA) from a regular grammar, and use that automaton to extract tokens.
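To make the idea concrete, here is a minimal sketch of a hand-written finite automaton that extracts tokens. It is only an illustration: the token kinds (KEYWORD, IDENT, INT, GE, GT, ASSIGN, MINUS), the keyword set, and the function name tokenize are assumptions made for this example, and a real compiler would usually generate the automaton from the regular grammar rather than write it by hand.

```python
# Hypothetical keyword set, chosen only for this example.
KEYWORDS = {"int", "char", "return"}


def tokenize(text):
    """Split text into (kind, lexeme) pairs using a small hand-written automaton."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1  # whitespace separates tokens and is skipped
            continue
        start = i
        if ch.isalpha() or ch == "_":
            # Identifier state: keep consuming letters, digits and underscores.
            while i < n and (text[i].isalnum() or text[i] == "_"):
                i += 1
            kind = "KEYWORD" if text[start:i] in KEYWORDS else "IDENT"
        elif ch.isdigit():
            # Integer state: keep consuming digits.
            while i < n and text[i].isdigit():
                i += 1
            kind = "INT"
        elif ch == ">":
            # After '>' the automaton looks one character ahead for '='.
            i += 1
            if i < n and text[i] == "=":
                i += 1
                kind = "GE"
            else:
                kind = "GT"
        elif ch == "=":
            i += 1
            kind = "ASSIGN"
        elif ch == "-":
            i += 1
            kind = "MINUS"
        else:
            raise SyntaxError(f"unexpected character {ch!r} at position {i}")
        tokens.append((kind, text[start:i]))
    return tokens
```

Calling tokenize("int a >= 10") yields a keyword, an identifier, a >= operator, and an integer, in that order. The ">" branch shows why an automaton is useful: the scanner has to read one more character before it can decide between > and >=.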

Syntax Analysis (Parsing)

Lexical analysis extracts the words; syntax analysis then recovers the logical structure of the sentences and the order in which they execute.

The goal is an abstract syntax tree (AST), which captures the structure of the program and its order of execution. There are two approaches to building this tree. Top-down parsing starts from the root node of the program and builds downward toward the leaves; a common method is recursive descent parsing, in which the parser descends into a child node as it reads new tokens and, once a construct is complete, returns from the recursion to the parent. Bottom-up parsing starts from the leaf nodes and builds upward toward the root.
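As an illustration of the top-down approach, here is a minimal sketch of a recursive descent parser for arithmetic expressions. The grammar, the function names (parse, expr, term, factor), and the choice to represent AST nodes as nested tuples of the form (operator, left, right) are all assumptions made for this example, not something prescribed by the text above.

```python
import re


def parse(text):
    """Parse an arithmetic expression into a nested-tuple AST.

    Assumed grammar for this sketch:
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NUMBER | IDENT | '(' expr ')'
    """
    # A throwaway tokenizer; characters that match nothing are silently skipped.
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())  # left-associative: old node becomes left child
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            advance()
            node = expr()  # descend into the parenthesized sub-expression
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return advance()  # a number or identifier becomes a leaf

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

Each grammar rule becomes one function, and the call stack mirrors the path from the root of the tree down to the current node. Because term is called from inside expr, multiplication ends up deeper in the tree than addition, which is how the tree encodes the order of evaluation.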

Semantic Analysis

At this point we have the abstract syntax tree, whose nodes hold tokens. These nodes also need attributes and a meaning that depends on context, and the context must not contain semantic ambiguity. Semantic analysis attaches the corresponding semantics to each node so that the compiled program behaves as intended. For example, it assigns types (int, char) to variables and checks that a variable is not redefined within the same scope.
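One common way to implement checks such as type lookup and redefinition within a scope is a scoped symbol table. The sketch below is a minimal version under stated assumptions: the names ScopedSymbolTable and SemanticError are made up for this example, and the rule that an inner scope may shadow a name from an outer scope is a language design choice (C allows it), not something every language shares.

```python
class SemanticError(Exception):
    """Raised when a declaration or use violates a semantic rule."""


class ScopedSymbolTable:
    """A stack of scopes; each scope maps a variable name to its type."""

    def __init__(self):
        self.scopes = [{}]  # the global scope; the innermost scope is last

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_name):
        """Record a variable's type, rejecting a redefinition in the same scope."""
        current = self.scopes[-1]
        if name in current:
            raise SemanticError(f"'{name}' is already defined in this scope")
        current[name] = type_name

    def lookup(self, name):
        """Return the type of the nearest visible declaration of name."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise SemanticError(f"'{name}' is not defined")
```

A semantic analyzer would walk the AST, call declare at each declaration node, call lookup at each use to attach the type to that node, and call enter_scope and exit_scope around each block.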

Summary

This was an overview of the compiler front end; I will study each part in more detail later. When semantic analysis finishes, we have a complete abstract syntax tree that the back end can use to generate target code (assembly, machine code).