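Problem description:
Given a string s, consider all of its duplicated substrings: contiguous substrings of s that occur 2 or more times in s. The occurrences may overlap.
Return any duplicated substring that has the longest possible length. If s has no duplicated substring, the answer is "".
Examples:
Example 1:
Input: s = "banana"
Output: "ana"
Example 2:
Input: s = "abcd"
Output: ""
Hints:
s consists of lowercase English letters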
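Analysis:
- The task is to find the longest duplicated substring. If a duplicated substring of length L exists, then one also exists for every length smaller than L, because any prefix of a duplicated substring is itself duplicated. Conversely, once L is the maximum such length, no duplicated substring exists for any greater length. Because of this monotonicity, the first step is to binary-search the length of the longest duplicated substring.
- Once a candidate length is fixed, slide a window of that length across s and check whether any substring repeats.
Code: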
import java.util.HashSet;

class Solution {
    public String longestDupSubstring(String s) {
        int len = s.length();
        // Binary-search the candidate length in [0, len - 1].
        int start = 0;
        int end = len - 1;
        String res = "";
        while (start <= end) {
            int mid = start + (end - start + 1) / 2;
            String subStr = check(mid, s, len);
            if (!("".equals(subStr))) {
                // A duplicate of length mid exists, so try longer lengths.
                start = mid + 1;
                res = subStr;
            } else {
                end = mid - 1;
            }
        }
        return res;
    }

    // Slide a window of length mid over s; return the first substring seen twice, or "".
    private String check(int mid, String s, int strLen) {
        HashSet<String> set = new HashSet<>();
        for (int i = 0; i <= strLen - mid; i++) {
            String substring = s.substring(i, i + mid);
            if (set.contains(substring)) {
                return substring;
            } else {
                set.add(substring);
            }
        }
        return "";
    }
}
Bang: Memory Limit Exceeded. Too naive! In hindsight, the hint was obvious.
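A rough back-of-the-envelope estimate shows why the HashSet of substrings runs out of memory. It assumes the usual bound for this problem, n ≤ 3·10^4 (an assumption here, since the hints above do not restate it), and that every call to substring stores its own copy of the characters, with all windows distinct in the worst case:

```latex
\text{characters stored by one call to check()} \;=\; (n - L + 1)\cdot L \;\le\; \frac{(n+1)^2}{4}
\qquad \text{(maximum near } L \approx n/2\text{)}

n = 3\times 10^{4} \;\Longrightarrow\; (n - L + 1)\cdot L \;\approx\; 2.25\times 10^{8}\ \text{characters}
```

At one to two bytes per character this is on the order of hundreds of megabytes, and the very first binary-search probe already lands near L ≈ n/2. Storing one number per window instead of one string per window brings this down to O(n).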
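Official solution: binary search + Rabin-Karp string encoding
- Step one is the same idea: binary-search the length L of the duplicated substring.
- Step two uses Rabin-Karp string encoding to check efficiently whether s contains a duplicated substring of length L.
So what is Rabin-Karp string encoding?
The core idea: use a hash to decide whether substrings repeat. If two substrings have the same hash, treat them as duplicates (PS: don't get hung up on hash collisions for now), and the hash of the next sliding window can be computed in just O(1) time.
So how is it implemented?
- First, encode each character of s to get an array arr. Since s contains only lowercase letters in this problem, we can set arr[i] = (int)s.charAt(i) - (int)'a', which maps every letter to a number between 0 and 25. For example, the string "abcde" is encoded as the array [0,1,2,3,4].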
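- Treat a substring as a base-26 number; its decimal value is its encoding. Suppose we need the encodings of substrings of length 3. The first substring "abc" is encoded as $h_0 = 0 \times 26^2 + 1 \times 26^1 + 2 \times 26^0 = 28$. In general, with base $a = 26$ and window length $L$: $h_0 = \sum_{i=0}^{L-1} arr[i] \cdot a^{L-1-i}$
- Computing the encoding of the next window amounts to shifting the base-26 number left by one digit, then dropping the leading digit and appending the new trailing digit. For example, the second substring is "bcd", whose encoding is $h_1 = (h_0 - 0 \times 26^2) \times 26 + 3 = 731$. In general: $h_{i+1} = (h_i - arr[i] \cdot a^{L-1}) \cdot a + arr[i+L]$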
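The encoding and the rolling update described in the list above can be checked on the "abcde" example with a small sketch. The class and method names (RollingHashDemo, encode, windowHashes) are made up for illustration, and the sketch deliberately uses no modulus, so it is only valid for short windows:

```java
import java.util.Arrays;

class RollingHashDemo {
    // Map each lowercase letter to 0..25, i.e. arr[i] = s.charAt(i) - 'a'.
    static int[] encode(String s) {
        int[] arr = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            arr[i] = s.charAt(i) - 'a';
        }
        return arr;
    }

    // Base-26 value of every window of length len (1 <= len <= s.length()).
    // No modulus is applied: 26^len no longer fits in a long once len reaches 14.
    static long[] windowHashes(String s, int len) {
        int[] arr = encode(s);
        long[] hashes = new long[arr.length - len + 1];

        long highPow = 1; // 26^(len - 1), the weight of the leading digit
        for (int i = 1; i < len; i++) {
            highPow *= 26;
        }

        long h = 0; // hash of the first window, computed digit by digit
        for (int i = 0; i < len; i++) {
            h = h * 26 + arr[i];
        }
        hashes[0] = h;

        // Each later window: drop the leading digit, shift, append the new digit.
        for (int i = 1; i < hashes.length; i++) {
            h = (h - arr[i - 1] * highPow) * 26 + arr[i + len - 1];
            hashes[i] = h;
        }
        return hashes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode("abcde")));          // [0, 1, 2, 3, 4]
        System.out.println(Arrays.toString(windowHashes("abcde", 3))); // [28, 731, 1434]
    }
}
```

The values 28 for "abc" and 731 for "bcd" match the hand calculation, and each step after the first window costs a constant number of arithmetic operations.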
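This way the hash of the next substring is obtained in just O(1) time. We then store the hash values in a HashSet; if the same hash value shows up again, a duplicated substring of length L exists (ignoring collisions). Note that 26^L overflows any fixed-width integer for large L, so in practice the hash is computed modulo a large number.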
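Putting the two steps together, a minimal sketch of binary search plus Rabin-Karp might look like the following. It is not the official solution's code: the class name RabinKarpSolution, the modulus 1_000_000_007 and the helper findDuplicate are assumptions made for illustration. Unlike the "ignore collisions" simplification above, this sketch keeps the start indices per hash and confirms each candidate with regionMatches, so a collision cannot produce a wrong answer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RabinKarpSolution {
    private static final long MOD = 1_000_000_007L; // assumed modulus, any large prime works
    private static final long BASE = 26;

    public String longestDupSubstring(String s) {
        int lo = 1;
        int hi = s.length() - 1;
        String best = "";
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            int start = findDuplicate(s, mid);
            if (start >= 0) {
                // A duplicate of length mid exists: remember it and try longer.
                best = s.substring(start, start + mid);
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return best;
    }

    // Returns the start of a window of length len that also occurs earlier in s, or -1.
    static int findDuplicate(String s, int len) {
        int n = s.length();
        long highPow = 1; // BASE^(len - 1) mod MOD
        for (int i = 1; i < len; i++) {
            highPow = highPow * BASE % MOD;
        }
        long h = 0;
        for (int i = 0; i < len; i++) {
            h = (h * BASE + (s.charAt(i) - 'a')) % MOD;
        }
        // hash -> start indices of windows with that hash (indices only, not strings)
        Map<Long, List<Integer>> seen = new HashMap<>();
        seen.computeIfAbsent(h, k -> new ArrayList<>()).add(0);

        for (int i = 1; i + len <= n; i++) {
            long out = (s.charAt(i - 1) - 'a') * highPow % MOD;
            h = ((h - out + MOD) % MOD * BASE + (s.charAt(i + len - 1) - 'a')) % MOD;
            List<Integer> starts = seen.computeIfAbsent(h, k -> new ArrayList<>());
            for (int j : starts) {
                // Equal hashes are only a hint; compare the characters to be sure.
                if (s.regionMatches(j, s, i, len)) {
                    return i;
                }
            }
            starts.add(i);
        }
        return -1;
    }
}
```

For example, `new RabinKarpSolution().longestDupSubstring("banana")` returns "ana". Memory stays at O(n) per probe because only one number and one index are kept per window.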
An expert's solution:
class Solution {
    public String longestDupSubstring(String S) {
        char[] sc = S.toCharArray();
        // Check if there aren't any duplicate substrings. There can
        // only be no duplicates if the string does not have more than
        // one occurrence of any character in the string. Since the
        // string only contains lowercase characters, the string
        // length must be at most 26 characters, otherwise at least
        // one character must be duplicated.
        int longestSubstringIdx = 0;
        int longestSubstringLen = 0;
        int[] found = new int[26];
        for (int i = sc.length - 1; i >= 0; i--) {
            if (found[sc[i] - 'a']++ > 0) {
                longestSubstringIdx = i;
                longestSubstringLen = 1;
                break;
            }
        }
        if (longestSubstringLen == 0) return "";
        // Check for the same character over a large contiguous area.
        // If we find a long repeat of the same character, then we can
        // use this to set a minimum length for the longest duplicate
        // substring, and therefore we don't have to check any shorter
        // substrings.
        for (int i = sc.length - 1; i > 0; i--) {
            if (sc[i] == sc[i - 1]) {
                char c = sc[i];
                int startI = i;
                for (i = i - 2; i >= 0 && sc[i] == c; i--) { }
                i++;
                if (startI - i > longestSubstringLen) {
                    longestSubstringLen = startI - i;
                    longestSubstringIdx = i + 1;
                }
            }
        }
        if (longestSubstringLen == sc.length - 1) return S.substring(0, longestSubstringLen);
        // Build a table of two-character combined values for the
        // passed String. These combined values are formed for any
        // index into the String, by the character at the current
        // index reduced to the range [0..25] times 26, plus the
        // next character in the string reduced to the range [0..25].
        // This combined value is used to index into the array
        // twoCharHead[], which contains the index into the string of
        // the first character pair with this combined value, which is
        // also used to index into the array twoCharList[]. The
        // twoCharList[] array is a "linked list" of String indexes
        // that have the same combined values for a character pair.
        //
        // To look up all character pairs with the same combined
        // value N, start at twoCharHead[N]. This will give the
        // String index X of the first character pair with that
        // combined value. To find successive String indexes, lookup
        // in twoCharList[X] to get the new String index X. Then
        // repeatedly lookup new X values in twoCharList[X], until
        // X equals zero, which indicates the end of the character
        // pairs with the same combined value.
        short[] twoCharHead = new short[26 * 26];
        short[] twoCharList = new short[sc.length + 1];
        for (int i = sc.length - longestSubstringLen - 1; i > 0; i--) {
            int twoCharNum = (sc[i] - 'a') * 26 + sc[i + 1] - 'a';
            twoCharList[i] = twoCharHead[twoCharNum];
            twoCharHead[twoCharNum] = (short)i;
        }
        // Search the String for matching substrings that are longer
        // than the current longest substring found. Start at the
        // beginning of the string, and successively get a character
        // pair's combined value. Use that character pair's combined
        // value to find all other character pairs with the same
        // combined value. In the process, remove any character pairs
        // that occur in the String before the current character pair.
        // For two character pairs that look like they may start a
        // matching substring longer than the currently longest found
        // match, test to see if the substrings match.
        int curIdxLimit = sc.length - longestSubstringLen - 1;
        for (int i = 0; i <= curIdxLimit; i++) {
            int twoCharNum = (sc[i] - 'a') * 26 + sc[i + 1] - 'a';
            while (twoCharHead[twoCharNum] <= i && twoCharHead[twoCharNum] != 0)
                twoCharHead[twoCharNum] = twoCharList[twoCharHead[twoCharNum]];
            int compIdx = twoCharHead[twoCharNum];
            while (compIdx != 0 && compIdx <= curIdxLimit) {
                if (sc[i + longestSubstringLen] == sc[compIdx + longestSubstringLen] &&
                        sc[i + longestSubstringLen / 2] == sc[compIdx + longestSubstringLen / 2]) {
                    int lowIdx = i + 2;
                    int highIdx = compIdx + 2;
                    while (highIdx < sc.length && sc[lowIdx] == sc[highIdx]) {
                        lowIdx++;
                        highIdx++;
                    }
                    if (lowIdx - i > longestSubstringLen) {
                        longestSubstringLen = lowIdx - i;
                        longestSubstringIdx = i;
                        curIdxLimit = sc.length - longestSubstringLen - 1;
                    }
                }
                compIdx = twoCharList[compIdx];
            }
        }
        return S.substring(longestSubstringIdx, longestSubstringIdx + longestSubstringLen);
    }
}
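The twoCharHead[] / twoCharList[] pair in the solution above is easier to follow in isolation. The sketch below is a simplified illustration, not part of that solution: the class PairIndexDemo, the method positionsOf and the shorter names head and next are made up here, and the table is built over every pair in the string rather than stopping early at the current best length.

```java
import java.util.ArrayList;
import java.util.List;

class PairIndexDemo {
    // Returns every index i >= 1 at which the pair (first, second) starts in s,
    // found by walking the head/next chain. Index 0 is never stored because
    // 0 doubles as the end-of-chain marker, as in the solution above.
    static List<Integer> positionsOf(String s, char first, char second) {
        char[] sc = s.toCharArray();
        short[] head = new short[26 * 26];       // pair value -> first index with that pair
        short[] next = new short[sc.length + 1]; // index -> next index with the same pair

        // Build from the back so each chain comes out in increasing index order.
        for (int i = sc.length - 2; i > 0; i--) {
            int key = (sc[i] - 'a') * 26 + (sc[i + 1] - 'a');
            next[i] = head[key];
            head[key] = (short) i;
        }

        List<Integer> result = new ArrayList<>();
        int key = (first - 'a') * 26 + (second - 'a');
        for (int x = head[key]; x != 0; x = next[x]) {
            result.add(x);
        }
        return result;
    }
}
```

For "banana", `positionsOf("banana", 'a', 'n')` walks the chain 1, 3 and `positionsOf("banana", 'n', 'a')` walks 2, 4, while the pair "ba" at index 0 is intentionally absent. Two short arrays replace a map of lists, which is where much of the solution's speed and low memory use appear to come from.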