Two LeetCode Hard Problems a Day: (4)


30. Substring with Concatenation of All Words

(Chinese title: 串联所有单词的子串)

Problem:

You are given a string, s, and a list of words, words, that are all of the same length. Find all starting indices of substring(s) in s that is a concatenation of each word in words exactly once and without any intervening characters.

Explanation:

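The task is simple to state: among the substrings of s, find every one that is formed by concatenating all the words in words, each used exactly once, and return the starting indices of those substrings. Two conditions need attention:
1. All the words have the same length.
2. Each entry in words is used exactly once (a word listed twice must appear twice), and the words must not overlap.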
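To make the statement concrete, here is a minimal brute-force sketch. It is not the solution used in this post; the name find_substring_bruteforce is hypothetical and only serves as a reference for what the expected output looks like. It checks every start index by cutting the window into fixed-size chunks and comparing word counts.

```python
from collections import Counter


def find_substring_bruteforce(s, words):
    """Reference check (hypothetical helper): test every start index."""
    if not s or not words:
        return []
    word_len = len(words[0])
    win_size = word_len * len(words)
    target = Counter(words)  # how many times each word must appear
    result = []
    for start in range(len(s) - win_size + 1):
        window = s[start:start + win_size]
        # Cut the window into consecutive chunks of word_len characters.
        chunks = [window[k:k + word_len] for k in range(0, win_size, word_len)]
        if Counter(chunks) == target:
            result.append(start)
    return result


print(find_substring_bruteforce("barfoothefoobarman", ["foo", "bar"]))  # [0, 9]
```

Every window is recounted from scratch here, which is exactly the repeated work a sliding window avoids.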

Thoughts:

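Since we are looking for all matching substrings, DFS would certainly work, but it is usually not optimal. For string problems, the best approach is usually either DP or a sliding window. The key to lowering the time complexity is that the words already inside the window do not need to be recounted; on each slide only the word leaving the window and the word entering it have to be updated. We can run one sliding window per start offset, that is, per remainder of the start index modulo the word length, and advance by one word length each time. This way every candidate substring is visited, and none is visited twice.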
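The idea of grouping start positions by their remainder modulo the word length can be sketched as follows. The helper aligned_chunks is a hypothetical name introduced only for illustration: for each offset it lists the word-sized chunks that a window starting at that offset would step through.

```python
def aligned_chunks(s, word_len):
    """Hypothetical helper: word-sized chunks of s for each start offset."""
    lanes = {}
    for offset in range(word_len):
        # Chunks start at offset, offset + word_len, offset + 2 * word_len, ...
        # A trailing piece shorter than word_len is dropped.
        lanes[offset] = [
            s[j:j + word_len]
            for j in range(offset, len(s) - word_len + 1, word_len)
        ]
    return lanes


for offset, chunks in aligned_chunks("barfoothe", 3).items():
    print(offset, chunks)
# 0 ['bar', 'foo', 'the']
# 1 ['arf', 'oot']
# 2 ['rfo', 'oth']
```

Any start index i falls into the lane i % word_len, so sliding one window along each lane covers all start indices without revisiting any of them.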

Solution:

from collections import defaultdict

def findSubstring(s, words):
    """
    :type s: str
    :type words: List[str]
    :rtype: List[int]
    """
    if len(s) == 0 or len(words) == 0:
        return []
    wordLength = len(words[0])
    winSize = len(words) * wordLength
    # wordsSet maps each word to the number of times it must appear
    wordsSet = defaultdict(int)
    res = []
    for word in words:
        wordsSet[word] += 1
    for i in range(wordLength):
        # For each start offset i, slide a window over the whole string
        # in steps of wordLength. First build the counts for the initial window.
        left, right = i, i + winSize
        tmpSet = defaultdict(int)
        for j in range(left, right, wordLength):
            tmp = s[j:j + wordLength]
            tmpSet[tmp] += 1
        while right <= len(s):
            # Check whether the two dictionaries match
            if check(tmpSet, wordsSet):
                res.append(left)
            if right + wordLength > len(s):
                break
            # Slide by one word: drop the leftmost word, add the next one
            tmpSet[s[left:left + wordLength]] -= 1
            tmpSet[s[right:right + wordLength]] += 1
            left += wordLength
            right += wordLength
    return res

def check(tmpSet, wordsSet):
    # The window always holds exactly len(words) chunks, so it is enough
    # to verify the count of every required word.
    for word in wordsSet:
        if word not in tmpSet:
            return False
        elif tmpSet[word] != wordsSet[word]:
            return False
    return True

Additional notes:

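The final result beat 93% of submissions. There is still room for optimization, but this is close enough. For substring matching problems, a sliding window is always worth considering first.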
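As for the remaining room for optimization, one commonly used variant (an assumption on my part, not something measured in this post) avoids comparing whole dictionaries on every slide. It tracks how many words are currently in the window and shrinks from the left whenever a word appears too often. The name find_substring_counted is hypothetical.

```python
from collections import Counter


def find_substring_counted(s, words):
    """Sliding-window sketch that avoids a full dictionary comparison."""
    if not s or not words:
        return []
    word_len = len(words[0])
    num_words = len(words)
    target = Counter(words)  # required count of each word
    result = []
    for offset in range(word_len):
        left = offset      # start index of the current window
        seen = Counter()   # counts of the words inside the window
        count = 0          # number of words inside the window
        for right in range(offset, len(s) - word_len + 1, word_len):
            word = s[right:right + word_len]
            if word not in target:
                # An unknown word can never be part of a match:
                # restart the window right after it.
                seen.clear()
                count = 0
                left = right + word_len
                continue
            seen[word] += 1
            count += 1
            # Too many copies of this word: drop words from the left.
            while seen[word] > target[word]:
                seen[s[left:left + word_len]] -= 1
                left += word_len
                count -= 1
            if count == num_words:
                result.append(left)
    return result


print(find_substring_counted("barfoothefoobarman", ["foo", "bar"]))  # [0, 9]
```

Indices are returned grouped by offset rather than sorted, which the problem allows since any order is accepted.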