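Given a string st and a substring sub, find the indices of every subsequence of st that matches sub. In other words, find all increasing index tuples s[0] < s[1] < ... < s[n] such that the characters st[s[0]], st[s[1]], ..., st[s[n]] spell out sub.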
For example, for the string abcoeubc and the substring abc, the answer should be [(0,1,2),(0,1,7),(0,6,7)].
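To make the problem statement concrete, here is a minimal brute-force sketch that tries every increasing index tuple of the right length and keeps those that spell sub. The function name brute_force_matches is hypothetical and not part of the original post; it is only practical for short strings, but it is useful as a reference for checking faster solutions.

```python
from itertools import combinations


def brute_force_matches(st, sub):
    """Return every increasing index tuple whose characters in st spell sub."""
    return [
        idx
        for idx in combinations(range(len(st)), len(sub))
        if all(st[i] == ch for i, ch in zip(idx, sub))
    ]


print(brute_force_matches("abcoeubc", "abc"))
# [(0, 1, 2), (0, 1, 7), (0, 6, 7)]
```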
2. Solution
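Regular expressions are a poor fit for this problem: a regex search reports non-overlapping matches of the pattern, so it cannot list every possible way the characters of sub can be picked from st.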
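The limitation can be seen with a small experiment. The sketch below assumes a hand-written pattern for the example (the helper name regex_matches is hypothetical) and records where each captured character was found; only one of the three valid index tuples is reported.

```python
import re


def regex_matches(st):
    """Index tuples reported by a regex that looks for a, b, c in order."""
    # One capture group per character of "abc", with lazy gaps in between.
    pattern = re.compile(r"(a).*?(b).*?(c)")
    return [tuple(m.start(g) for g in (1, 2, 3)) for m in pattern.finditer(st)]


print(regex_matches("abcoeubc"))
# [(0, 1, 2)]
```

The tuples (0, 1, 7) and (0, 6, 7) never appear, because finditer moves past the text it has already matched.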
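The problem can be solved as follows:
- Store the index of every character of st in a dictionary, keyed by character, whose values are lists of all positions where that character occurs in st.
- Loop over the characters of sub and look up each character's positions in the dictionary.
- Collect these positions into a list and return every index combination that satisfies the condition, i.e. whose indices are strictly increasing.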
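Before reading the full function, it may help to see the intermediate data these steps produce for the example string. The following is a small sketch; the helper names build_char_indexes and positions_after are assumptions made for illustration and do not appear in the original code.

```python
import bisect
from collections import defaultdict


def build_char_indexes(st):
    """Map each character of st to the sorted list of positions where it occurs."""
    positions = defaultdict(list)
    for i, ch in enumerate(st):
        positions[ch].append(i)
    return positions


def positions_after(positions, ch, previous_index):
    """Positions of ch that lie strictly after previous_index."""
    indexes = positions.get(ch, [])
    return indexes[bisect.bisect_right(indexes, previous_index):]


positions = build_char_indexes("abcoeubc")
print(positions["b"], positions["c"])      # [1, 6] [2, 7]
print(positions_after(positions, "b", 0))  # [1, 6]
print(positions_after(positions, "c", 6))  # [7]
```

Because each position list is built in increasing order, bisect can find the first usable position without scanning the whole list.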
import bisect


def find_substrings(st, sub):
    """
    Find every way sub occurs in st as a subsequence.
    Args:
        st: The string to search in.
        sub: The substring to search for.
    Returns:
        A list of tuples, where each tuple contains the indices of the characters in st that match sub.
    """
    # Create a dictionary to store the index positions of each character in st.
    char_indexes = {}
    for i, char in enumerate(st):
        char_indexes.setdefault(char, []).append(i)
    # Each partial combination is a tuple of increasing indices matching a prefix of sub.
    index_combinations = [()]
    # Iterate over the characters of sub, extending every partial combination.
    for char in sub:
        # Get the index positions of the current character of sub in st (none if it never occurs).
        current_char_indexes = char_indexes.get(char, [])
        extended = []
        for combination in index_combinations:
            # Find the first index position that is greater than the previous character's index position.
            previous = combination[-1] if combination else -1
            start = bisect.bisect_right(current_char_indexes, previous)
            # Extend with every later position, not only the nearest one.
            for index in current_char_indexes[start:]:
                extended.append(combination + (index,))
        index_combinations = extended
    # Return the list of index combinations.
    return index_combinations
# Example
st = 'abcoeubc'
sub = 'abc'
print(find_substrings(st, sub))
# [(0, 1, 2), (0, 1, 7), (0, 6, 7)]
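Let m be the length of sub and n the length of st. Building the dictionary takes O(n) time and each bisect lookup takes O(log n), so finding a single combination per occurrence of the first character of sub costs O(m * n * log n). Listing all combinations cannot be bounded this way: the number of valid combinations can grow combinatorially (for example, when st and sub consist of one repeated character), so the total running time is at least proportional to the size of the output.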
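To get a feel for how large the output can become, the number of combinations can be counted without listing them. The sketch below uses the standard dynamic-programming count of subsequence matches; the function name count_matches is hypothetical and is not part of the original solution.

```python
def count_matches(st, sub):
    """Count index combinations without listing them."""
    # ways[j] = number of ways to match the first j characters of sub so far.
    ways = [1] + [0] * len(sub)
    for ch in st:
        # Go right to left so each character of st is used at most once per match.
        for j in range(len(sub), 0, -1):
            if sub[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(sub)]


print(count_matches("abcoeubc", "abc"))   # 3
print(count_matches("a" * 20, "a" * 10))  # 184756
```

The second call shows that a 20-character string can already contain 184756 matches of a 10-character sub, which is why any method that returns every combination is bounded below by the output size.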