Compiler Principles: Lexical Analysis


Lexical analysis has two main parts: regular grammars and finite automata (finite-state automata, FSA).

Regular Grammar

A regular grammar is the set of rules used to pick tokens out of an input string, where each kind of token is described by a regular expression. For example, from 'int a = 10' it distinguishes the keyword 'int', the variable name 'a', the assignment operator '=', and the constant '10'. A simple recognizer for this can be written with nothing more than conditional statements.

A regular expression describes the rule for one class of token, which makes the lexical rules more precise.
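As a rough illustration of one regular expression per class of token, here is a minimal Python sketch that splits 'int a = 10' into tokens. The rule names (KEYWORD, IDENT, NUMBER, ASSIGN, SKIP) and the function name tokenize are assumptions made for this example, and the rule set covers only the tokens mentioned above.

```python
import re

# Hypothetical token rules for this sketch. Order matters: KEYWORD is tried
# before IDENT, and \b keeps "int" from matching the start of "integer".
TOKEN_RULES = [
    ("KEYWORD", r"\bint\b"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"[0-9]+"),
    ("ASSIGN", r"="),
    ("SKIP", r"\s+"),
]

# One combined pattern with a named group per rule.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("int a = 10"))
```

Listing the keyword rule before the identifier rule is one simple way to resolve the overlap between the two. Real lexers often match an identifier first and then look it up in a keyword table instead.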

Finite Automaton (Finite-State Automaton, FSA)

A finite automaton is the jump structure formed by the conditional statements above that recognize the different kinds of token. For example, when recognizing 'int', after the first character 'i' there are two possible next states: the next character is 'n' (the input may still be the keyword), or it is some other letter (the input is a variable name). After 'n' there are again two possible next states, and so on until the complete keyword 'int' has been recognized. Once a keyword or a variable name has been recognized, the automaton returns to its root node (the initial state). The branches that recognize keywords are only one part of the automaton, because it must also recognize many other kinds of token (regular expressions). The number of token kinds is finite, though, so the number of states is finite as well. In short, a finite automaton presents the conditional logic for recognizing tokens as a graph, which makes it easier to follow.
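The 'int' walk-through can be sketched as a small hand-written automaton in Python. The state names (START, I, IN, INT, IDENT, NUMBER, ASSIGN) and the functions next_state and scan are hypothetical names chosen for this example. The sketch handles only ASCII identifiers, decimal integers, and '=', and it stops at the first character that has no transition.

```python
import string

# Hypothetical state names for this sketch.
START, I, IN, INT, IDENT, NUMBER, ASSIGN = (
    "START", "I", "IN", "INT", "IDENT", "NUMBER", "ASSIGN")

# Token kind reported when the automaton stops in each accepting state.
# "i" and "in" on their own are ordinary identifiers.
ACCEPTING = {I: "IDENT", IN: "IDENT", INT: "KEYWORD", IDENT: "IDENT",
             NUMBER: "NUMBER", ASSIGN: "ASSIGN"}

LETTERS = set(string.ascii_letters + "_")
DIGITS = set(string.digits)


def next_state(state, ch):
    """Return the state reached from `state` on `ch`, or None if there is no transition."""
    ident_char = ch in LETTERS or ch in DIGITS
    if state == START:
        if ch == "i":
            return I
        if ch in LETTERS:
            return IDENT
        if ch in DIGITS:
            return NUMBER
        if ch == "=":
            return ASSIGN
        return None
    if state == I:
        return IN if ch == "n" else (IDENT if ident_char else None)
    if state == IN:
        return INT if ch == "t" else (IDENT if ident_char else None)
    if state in (INT, IDENT):
        # Any further identifier character after "int" makes it a plain identifier.
        return IDENT if ident_char else None
    if state == NUMBER:
        return NUMBER if ch in DIGITS else None
    return None  # ASSIGN has no outgoing transitions


def scan(text):
    """Run the automaton repeatedly, returning to START after each token."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state, start = START, pos
        while pos < len(text):
            following = next_state(state, text[pos])
            if following is None:
                break
            state = following
            pos += 1
        if state not in ACCEPTING:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append((ACCEPTING[state], text[start:pos]))
    return tokens


print(scan("int a = 10"))
```

Each outer loop iteration starts again from START, which corresponds to going back to the root node after a token is recognized. Because the automaton keeps consuming characters while a transition exists, 'integer' ends in the IDENT state instead of being cut after 'int'.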