The goal is to parse strings of the following form:
- atomname * and atomindex 1,2,3
- atomname xxx,yyy or atomtype rrr,sss
- thiol
- not atomindex 1,2,3
- not (atomindex 4,5,6) or atomname *
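These strings select atoms, so each match has to be linked to a specific function call that performs the selection. All of the selection keywords (atomname, atomindex, thiol, ...) are stored in a list.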
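One way to link keywords to function calls is a dispatch table keyed by keyword. The following is only a minimal sketch: the `ATOMS` records, the `select_*` functions, the `SELECTORS` table, and the rule used for `thiol` are all hypothetical and stand in for whatever the real selection code does.

```python
from fnmatch import fnmatchcase

# Hypothetical atom records: (index, name, type). Illustrative data only.
ATOMS = [
    (1, "CA", "C"),
    (2, "SG", "S"),
    (3, "HG", "H"),
    (4, "N", "N"),
]

def select_atomname(atoms, args):
    """Indices of atoms whose name matches any argument ('*' matches all)."""
    return {i for i, name, _ in atoms
            if any(fnmatchcase(name, str(a)) for a in args)}

def select_atomindex(atoms, args):
    """Indices of atoms whose index is listed in args."""
    return {i for i, _, _ in atoms if i in args}

def select_atomtype(atoms, args):
    """Indices of atoms whose type is listed in args."""
    return {i for i, _, typ in atoms if typ in args}

def select_thiol(atoms, args):
    """Placeholder rule (an assumption): treat atoms named 'SG' as thiol sulfur."""
    return {i for i, name, _ in atoms if name == "SG"}

# Keyword -> selection function. The keyword list is derived from the
# table, so the two cannot drift apart.
SELECTORS = {
    "atomname": select_atomname,
    "atomindex": select_atomindex,
    "atomtype": select_atomtype,
    "thiol": select_thiol,
}
selkwds = list(SELECTORS)
```

With a table like this, a parsed pair such as `['atomindex', [1, 2, 3]]` can be dispatched as `SELECTORS['atomindex'](ATOMS, [1, 2, 3])`.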
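2. Solution
The steps for parsing these strings with pyparsing are:
- Define the list of selection keywords, selkwds.
- Define func_name as a MatchFirst over the selection keywords, each matched as a caseless keyword.
- Define the Boolean operators NOT, AND, and OR.
- Define keyword as any one of func_name, NOT, AND, or OR.
- Define func_call as a group made of a function name followed by a group holding its argument list.
- Define integer as a word made of digits, converted to an int.
- Define alphaword as a word that starts with a letter and continues with letters or digits.
- Define func_arg as an integer, a function call, or an alphaword, guarded by a negative lookahead so that a keyword is never taken as an argument.
- Define func_call_expr as an expression over function calls in which the Boolean operators are applied in precedence order: NOT, then AND, then OR.
The complete code example follows: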
from pyparsing import *
selkwds = "atomname atomindex atomtype thiol".split()
# build an explicit list so this also works where map() returns a lazy iterator
func_name = MatchFirst([CaselessKeyword(kw) for kw in selkwds])
NOT,AND,OR = map(CaselessKeyword,"NOT AND OR".split())
keyword = func_name | NOT | AND | OR
func_call = Forward()
integer = Word(nums).setParseAction(lambda t: int(t[0]))
alphaword = Word(alphas,alphanums)
# you have to be specific about what kind of things can be an arg,
# otherwise, an argless function call might process the next
# keyword or boolean operator as an argument;
# this kind of lookahead is commonly overlooked by those who
# assume that the parser will try to do some kind of right-to-left
# backtracking in order to implicitly find a token that could be
# mistaken for the current repetition type; pyparsing is purely
# left-to-right, and only does lookahead if you explicitly tell it to
# I assume that a func_call could be a function argument, otherwise
# there is no point in defining it as a Forward
func_arg = ~keyword + (integer | func_call | alphaword)
# add Groups to give structure to your parsed data - otherwise everything
# just runs together - now every function call parses as exactly two elements:
# the keyword and a list of arguments (which may be an empty list, but will
# still be a list)
func_call <<= Group(func_name + Group(Optional(delimitedList(func_arg) | '*')))
# don't name this func_call, it's confusing with what you've
# already defined above
# (infixNotation is the current name of operatorPrecedence)
func_call_expr = infixNotation(func_call, [(NOT, 1, opAssoc.RIGHT),
                                           (AND, 2, opAssoc.LEFT),
                                           (OR, 2, opAssoc.LEFT)])
tests = """\
atomname * and atomindex 1,2,3
atomname xxx,yyy or atomtype rrr,sss
thiol
not atomindex 1,2,3
not (atomindex 4,5,6) or atomname *""".splitlines()
for test in tests:
    print(test.strip())
    print(func_call_expr.parseString(test).asList())
    print()
The output is:
atomname * and atomindex 1,2,3
[[['atomname', ['*']], 'AND', ['atomindex', [1, 2, 3]]]]
atomname xxx,yyy or atomtype rrr,sss
[[['atomname', ['xxx', 'yyy']], 'OR', ['atomtype', ['rrr', 'sss']]]]
thiol
[['thiol', []]]
not atomindex 1,2,3
[['NOT', ['atomindex', [1, 2, 3]]]]
not (atomindex 4,5,6) or atomname *
[[['NOT', ['atomindex', [4, 5, 6]]], 'OR', ['atomname', ['*']]]]
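The nested lists above can then be walked recursively to perform the actual selection. The following is a rough sketch in plain Python, not part of the original solution: the `ATOMS` table, `call_selector`, and `evaluate` are hypothetical names, the `thiol` rule is a placeholder, and the input is one of the parsed lists shown above rather than a live pyparsing result.

```python
# Hypothetical atom table: index -> (name, type). Illustrative data only.
ATOMS = {1: ("CA", "C"), 2: ("SG", "S"), 3: ("HG", "H"), 4: ("N", "N")}

def call_selector(name, args):
    """Return the set of atom indices selected by one keyword call."""
    if name == "atomindex":
        return {i for i in ATOMS if i in args}
    if name == "atomname":
        return {i for i, (n, _) in ATOMS.items() if "*" in args or n in args}
    if name == "atomtype":
        return {i for i, (_, t) in ATOMS.items() if t in args}
    if name == "thiol":
        # Placeholder rule (an assumption): atoms named "SG" count as thiol.
        return {i for i, (n, _) in ATOMS.items() if n == "SG"}
    raise ValueError("unknown selection keyword: %r" % name)

def evaluate(node):
    """Evaluate one parsed node into a set of atom indices.

    Node shapes, as they appear in the output above:
      ['NOT', operand]                 unary negation
      [keyword, [args...]]             a single function call
      [operand, 'AND'|'OR', operand]   binary chain (possibly longer)
    """
    head = node[0]
    if head == "NOT":
        return set(ATOMS) - evaluate(node[1])
    if isinstance(head, str):
        return call_selector(head, node[1])
    # Binary chain: operands at even positions, operators at odd positions.
    result = evaluate(head)
    for op, operand in zip(node[1::2], node[2::2]):
        rhs = evaluate(operand)
        result = (result & rhs) if op == "AND" else (result | rhs)
    return result

# The parser wraps the whole expression in an outer one-element list.
parsed = [[['atomname', ['*']], 'AND', ['atomindex', [1, 2, 3]]]]
print(sorted(evaluate(parsed[0])))  # [1, 2, 3]
```

Sets keep the Boolean operators simple: NOT is a difference from the full set of indices, AND is an intersection, and OR is a union.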