A rule consists of a non-terminal, called the head or left-hand side of the production, followed by a colon and then a sequence of terminals and/or non-terminals, called the body or right-hand side of the production. For example, in the rule expr : factor ((MUL | DIV) factor)*, expr is the head and factor ((MUL | DIV) factor)* is the body.
(Translator's note: this description is dense; the diagram makes it clearer.)
In the grammar shown above, tokens like MUL, DIV, and INTEGER are called terminals, and variables like expr and factor are called non-terminals. Non-terminals usually consist of a sequence of terminals and/or non-terminals.
The non-terminal on the left side of the first rule is called the start symbol. In our grammar, the start symbol is expr.
You can read the rule expr as: "An expr can be a factor optionally followed by a multiplication or division operator followed by another factor, which in turn is optionally followed by a multiplication or division operator followed by another factor, and so on."
What is a factor? In this article, a factor is just an integer.
Let's quickly go over the symbols used in the grammar and what they mean:
"|" means "or", so (MUL | DIV) means either MUL or DIV.
"(…)" groups terminals and/or non-terminals, as in (MUL | DIV).
"(…)*" means the contents of the group are matched zero or more times.
If you have worked with regular expressions before, these symbols should look very familiar.
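To make the regular-expression analogy concrete, here is a minimal sketch that writes the shape of the expr rule as a Python regular expression over raw characters. It is only an analogy: the names EXPR_SHAPE and looks_like_expr are invented for this illustration, and the interpreter in this article uses a lexer and a parser, not the re module.

```python
import re

# Hypothetical pattern mirroring the rule  expr : factor ((MUL | DIV) factor)*
#   \d+          plays the role of factor (an INTEGER)
#   [*/]         plays the role of (MUL | DIV)
#   (?: ... )*   repeats the group zero or more times
# Optional whitespace (\s*) is allowed around every piece.
EXPR_SHAPE = re.compile(r"\s*\d+\s*(?:[*/]\s*\d+\s*)*")


def looks_like_expr(text):
    """Return True if the whole string has the shape described by the expr rule."""
    return EXPR_SHAPE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["3", "3 * 7", "7 * 4 / 2", "3 *", "* 3"]:
        print(repr(sample), looks_like_expr(sample))
```

The first three samples fit the rule; the last two do not, because an operator must always sit between two integers.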
A grammar defines a language by explaining what sentences it can form. So how do you derive an arithmetic expression using the grammar? First you begin with the start symbol expr, and then you repeatedly replace a non-terminal with the body of a rule for that non-terminal until you have generated a sentence consisting solely of terminals. Those sentences form the language defined by the grammar.
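The derivation process can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not part of the interpreter: the function name derive and its operators parameter are invented here, and the (…)* group is unrolled by hand according to how many operators you pass in.

```python
def derive(operators):
    """Derive a sentence from the start symbol expr.

    `operators` lists the terminal (MUL or DIV) chosen for each repetition
    of the ((MUL | DIV) factor)* group; an empty list means zero repetitions.
    Returns every intermediate form, ending with one made only of terminals.
    """
    # Step 1: replace expr with its body, unrolling the (...)* group.
    form = ["factor"]
    for op in operators:
        if op not in ("MUL", "DIV"):
            raise ValueError("operator must be MUL or DIV")
        form += [op, "factor"]
    steps = [["expr"], list(form)]

    # Step 2: replace each factor, left to right, with its body INTEGER.
    while "factor" in form:
        form[form.index("factor")] = "INTEGER"
        steps.append(list(form))
    return steps


if __name__ == "__main__":
    for step in derive(["MUL", "DIV"]):
        print(" ".join(step))
```

Each printed line is one replacement step. The derivation stops once no non-terminal is left, which is exactly the condition described above.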
Each rule R defined in the grammar becomes a method with the same name, and references to that rule become a method call: R(). The body of the method follows the flow of the body of the rule, using the very same guidelines.
Alternatives (a1 | a2 | aN) become an if-elif-else statement.
An optional grouping (…)* becomes a while statement that can loop zero or more times.
Each token reference T becomes a call to the method eat: eat(T). The eat method consumes the token T if it matches the current lookahead token, then gets a new token from the lexer and assigns it to the current_token internal variable; if the token does not match, eat raises an error.
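Before the full listing, here is a minimal sketch of those guidelines applied to the grammar as a pure recognizer: it checks that the input matches the grammar and computes nothing. To keep it short it works on a pre-tokenized list of token type names instead of a lexer; the Recognizer class and the matches_grammar helper are names assumed for this illustration.

```python
INTEGER, MUL, DIV, EOF = "INTEGER", "MUL", "DIV", "EOF"


class Recognizer:
    """Checks a list of token types against the grammar; computes nothing."""

    def __init__(self, token_types):
        # Hypothetical stand-in for a lexer: a token list terminated by EOF.
        self.tokens = list(token_types) + [EOF]
        self.pos = 0
        self.current_token = self.tokens[self.pos]

    def eat(self, token_type):
        # Token reference T -> eat(T): consume on a match, otherwise fail.
        if self.current_token != token_type:
            raise SyntaxError(
                "expected %s, got %s" % (token_type, self.current_token)
            )
        if self.pos < len(self.tokens) - 1:
            self.pos += 1
        self.current_token = self.tokens[self.pos]

    def factor(self):
        # factor : INTEGER
        self.eat(INTEGER)

    def expr(self):
        # expr : factor ((MUL | DIV) factor)*
        self.factor()                              # rule reference -> method call
        while self.current_token in (MUL, DIV):    # (...)* -> while loop
            if self.current_token == MUL:          # (a1 | a2) -> if / elif
                self.eat(MUL)
            elif self.current_token == DIV:
                self.eat(DIV)
            self.factor()


def matches_grammar(token_types):
    """Return True if the whole token list is a valid expr."""
    recognizer = Recognizer(token_types)
    try:
        recognizer.expr()
    except SyntaxError:
        return False
    # Leftover tokens after a complete expr also count as a mismatch.
    return recognizer.current_token == EOF


if __name__ == "__main__":
    print(matches_grammar([INTEGER, MUL, INTEGER, DIV, INTEGER]))
    print(matches_grammar([INTEGER, MUL]))
```

The first call is accepted and the second is rejected, because the operator is not followed by a factor. The interpreter below has the same structure, with the arithmetic added inside expr.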
# Token types
#
# EOF (end-of-file) token is used to indicate that
# there is no more input left for lexical analysis
INTEGER, MUL, DIV, EOF = 'INTEGER', 'MUL', 'DIV', 'EOF'
class Token(object):
    def __init__(self, type, value):
        # token type: INTEGER, MUL, DIV, or EOF
        self.type = type
        # token value: non-negative integer value, '*', '/', or None
        self.value = value

    def __str__(self):
        """String representation of the class instance.

        Examples:
            Token(INTEGER, 3)
            Token(MUL, '*')
        """
        return 'Token({type}, {value})'.format(
            type=self.type,
            value=repr(self.value)
        )

    def __repr__(self):
        return self.__str__()
class Lexer(object):
    def __init__(self, text):
        # client string input, e.g. "3 * 5", "12 / 3 * 4", etc
        self.text = text
        # self.pos is an index into self.text
        self.pos = 0
        self.current_char = self.text[self.pos]

    def error(self):
        raise Exception('Invalid character')

    def advance(self):
        """Advance the `pos` pointer and set the `current_char` variable."""
        self.pos += 1
        if self.pos > len(self.text) - 1:
            self.current_char = None  # Indicates end of input
        else:
            self.current_char = self.text[self.pos]

    def skip_whitespace(self):
        while self.current_char is not None and self.current_char.isspace():
            self.advance()

    def integer(self):
        """Return a (multidigit) integer consumed from the input."""
        result = ''
        while self.current_char is not None and self.current_char.isdigit():
            result += self.current_char
            self.advance()
        return int(result)

    def get_next_token(self):
        """Lexical analyzer (also known as scanner or tokenizer)

        This method is responsible for breaking a sentence
        apart into tokens. One token at a time.
        """
        while self.current_char is not None:

            if self.current_char.isspace():
                self.skip_whitespace()
                continue

            if self.current_char.isdigit():
                return Token(INTEGER, self.integer())

            if self.current_char == '*':
                self.advance()
                return Token(MUL, '*')

            if self.current_char == '/':
                self.advance()
                return Token(DIV, '/')

            self.error()

        return Token(EOF, None)
class Interpreter(object):
    def __init__(self, lexer):
        self.lexer = lexer
        # set current token to the first token taken from the input
        self.current_token = self.lexer.get_next_token()

    def error(self):
        raise Exception('Invalid syntax')

    def eat(self, token_type):
        # compare the current token type with the passed token
        # type and if they match then "eat" the current token
        # and assign the next token to the self.current_token,
        # otherwise raise an exception.
        if self.current_token.type == token_type:
            self.current_token = self.lexer.get_next_token()
        else:
            self.error()

    def factor(self):
        """Return an INTEGER token value.

        factor : INTEGER
        """
        token = self.current_token
        self.eat(INTEGER)
        return token.value

    def expr(self):
        """Arithmetic expression parser / interpreter.

        expr   : factor ((MUL | DIV) factor)*
        factor : INTEGER
        """
        result = self.factor()

        while self.current_token.type in (MUL, DIV):
            token = self.current_token
            if token.type == MUL:
                self.eat(MUL)
                result = result * self.factor()
            elif token.type == DIV:
                self.eat(DIV)
                result = result / self.factor()

        return result
def main():
    while True:
        try:
            # To run under Python3 replace 'raw_input' call
            # with 'input'
            text = raw_input('calc> ')
        except EOFError:
            break
        if not text:
            continue
        lexer = Lexer(text)
        interpreter = Interpreter(lexer)
        result = interpreter.expr()
        print(result)


if __name__ == '__main__':
    main()