Extracting Strings from Nested Brackets

Given a string with nested brackets, such as "[ this is [ hello [ who ] [what ] from the other side ] slim shady ]", how can the text inside each bracket pair be extracted?

2. Solution

Method 1: Stack

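Use a stack of character buffers, one for each bracket that is currently open. On "[", push a new empty buffer. On "]", pop the top buffer and emit its contents as one extracted string. Any other character is appended to the buffer on top of the stack. Because a buffer is emitted only when its bracket closes, inner strings come out before the strings that enclose them, and each string excludes the text of the pairs nested inside it.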

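def parse(text):
  stack = []  # one character buffer per currently open '['
  for char in text:
    if char == '[':
      stack.append([])
    elif char == ']':
      # closing a pair: emit its own text (nested pairs were emitted earlier)
      yield ''.join(stack.pop())
    elif stack:
      # characters outside any bracket are skipped instead of raising IndexError
      stack[-1].append(char)

text = "[ this is [ hello [ who ] [what ] from the other side ] slim shady ]"
print(tuple(parse(text)))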

Output:

(' who ', 'what ', ' hello   from the other side ', ' this is  slim shady ')
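
The same stack can also report how deeply each string was nested, since the stack height at the moment a bracket closes equals that pair's depth. The following is a minimal sketch under assumptions not made in the original: the name parse_with_depth is hypothetical, and raising ValueError on unbalanced brackets is only one possible policy.

```python
def parse_with_depth(text):
    """Yield (depth, content) pairs for every bracket pair, innermost first."""
    stack = []
    for index, char in enumerate(text):
        if char == '[':
            stack.append([])
        elif char == ']':
            if not stack:
                raise ValueError(f"unmatched ']' at index {index}")
            # the stack height before the pop is the depth of the closing pair
            depth = len(stack)
            yield depth, ''.join(stack.pop())
        elif stack:
            # characters outside any bracket are ignored
            stack[-1].append(char)
    if stack:
        raise ValueError(f"{len(stack)} unclosed '['")


text = "[ this is [ hello [ who ] [what ] from the other side ] slim shady ]"
for depth, content in parse_with_depth(text):
    print(depth, repr(content))
```

For the sample string this prints depth 3 for ' who ' and 'what ', depth 2 for the "hello" string, and depth 1 for the outermost string.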

Method 2: Regular Expression

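A regular expression can also do the extraction. The pattern r'\[([^\[\]]*)\]' matches a bracket pair that contains no further brackets, that is, an innermost pair. Applying it repeatedly, collecting the matches and then deleting them from the string, peels off one nesting level per pass until no brackets remain.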

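import re

s = '[ this is [ hello [ who ] [what ] from the other [side] ] slim shady ][oh my [g[a[w[d]]]]]'

result = []
pattern = r'\[([^\[\]]*)\]'  # an innermost pair: no brackets between '[' and ']'
while re.search(pattern, s):  # also terminates if the brackets are unbalanced
  result.extend(re.findall(pattern, s))
  s = re.sub(pattern, '', s)
# strip whitespace and drop empty strings; a list (not a filter object) prints its items
result = [t.strip() for t in result if t.strip()]

print(result)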

Output:

['who', 'what', 'side', 'd', 'hello   from the other', 'w', 'this is  slim shady', 'a', 'g', 'oh my']
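
To see how the repeated substitution removes one layer of innermost pairs per pass, the loop can be wrapped so that it records what each pass found. This is only an illustrative sketch: the names INNERMOST and peel_levels are hypothetical and not part of the original code. Note that the pass number counts layers from the inside out, so it is not the same as nesting depth.

```python
import re

# Matches one innermost pair: '[', a run of non-bracket characters, then ']'
INNERMOST = re.compile(r'\[([^\[\]]*)\]')


def peel_levels(s):
    """Return one list per pass, holding the innermost contents removed in that pass."""
    passes = []
    while True:
        found = INNERMOST.findall(s)
        if not found:
            break  # nothing left to peel; also stops on unbalanced input
        passes.append([t.strip() for t in found])
        s = INNERMOST.sub('', s)
    return passes


for number, contents in enumerate(peel_levels('[oh my [g[a[w[d]]]]]'), start=1):
    print(number, contents)
# 1 ['d']
# 2 ['w']
# 3 ['a']
# 4 ['g']
# 5 ['oh my']
```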

Method 3: Tree Structure

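Represent the string as a tree in which each node stands for one bracket pair and its children are the pairs nested directly inside it. Each node stores the indices of its opening and closing brackets; its own text is whatever lies between them once the spans of its children are skipped. Walking the tree then yields every extracted string.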

class BracketMatch:
  """One bracket pair in refstr; start and end are the indices of '[' and ']'."""
  def __init__(self, refstr, parent=None, start=-1, end=-1):
    self.parent = parent
    self.start = start
    self.end = end
    self.refstr = refstr
    self.nested_matches = []

  def __str__(self):
    # the root node and unclosed pairs have no complete span
    if self.start == -1 or self.end == -1:
      return ""
    cur_index = self.start + 1
    result = ""
    for child_match in self.nested_matches:
      if child_match.start != -1 and child_match.end != -1:
        # keep the text before this child, then jump past the child's ']'
        result += self.refstr[cur_index:child_match.start]
        cur_index = child_match.end + 1
    result += self.refstr[cur_index:self.end]
    return result

haystack = '[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'
root = BracketMatch(haystack)  # placeholder root; real pairs hang below it
cur_match = root
for i, char in enumerate(haystack):
  if char == '[':
    new_match = BracketMatch(haystack, cur_match, i)
    cur_match.nested_matches.append(new_match)
    cur_match = new_match
  elif char == ']':
    cur_match.end = i
    cur_match = cur_match.parent

# breadth-first traversal: outer pairs are printed before the pairs inside them
nodes_list = list(root.nested_matches)
while nodes_list:
  node = nodes_list.pop(0)
  nodes_list.extend(node.nested_matches)
  print("Match: " + str(node).strip())

Output:

Match: this is  slim shady
Match: hello   from the other side
Match: who
Match: what
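
To show the nesting explicitly, a depth-first walk can indent each string by its depth. The following is a minimal sketch rather than the original implementation: the names Node, build_tree, and render are hypothetical, the tree stores characters instead of indices, and it assumes that collapsing runs of whitespace and rejecting unbalanced input with ValueError are acceptable.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One bracket pair: its own characters plus the pairs nested inside it."""
    pieces: list = field(default_factory=list)
    children: list = field(default_factory=list)


def build_tree(text):
    """Parse text into a tree; the root stands for the text outside all brackets."""
    root = Node()
    path = [root]  # path[-1] is the pair currently being filled
    for char in text:
        if char == '[':
            node = Node()
            path[-1].children.append(node)
            path.append(node)
        elif char == ']':
            if len(path) == 1:
                raise ValueError("unmatched ']'")
            path.pop()
        else:
            path[-1].pieces.append(char)
    if len(path) != 1:
        raise ValueError("unclosed '['")
    return root


def render(node, depth=0):
    """Yield one indented line per bracket pair, parents before children."""
    for child in node.children:
        # split/join collapses the whitespace left where nested pairs were cut out
        yield '  ' * depth + ' '.join(''.join(child.pieces).split())
        yield from render(child, depth + 1)


haystack = '[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'
print('\n'.join(render(build_tree(haystack))))
# this is slim shady
#   hello from the other side
#     who
#     what
```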