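TL;DR: what this post covers
String.replaceAll(String regex, String replacement) is the usual method for regex-based replacement. If the replacement contains a backslash or a dollar sign, though, the call may throw.
With a trailing $ it throws: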
java.lang.IllegalArgumentException: Illegal group reference: group index is missing
With a backslash \ as the last character it throws:
java.lang.IllegalArgumentException: character to be escaped is missing
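Test code
final String ALERT_FORMAT = "hello @name I am a rap star";
// Backslash in the middle: no exception, prints "hello Galaxy22 I am a rap star"
System.out.println(ALERT_FORMAT.replaceAll("@name","Galaxy\\22"));
// Trailing backslash: IllegalArgumentException (character to be escaped is missing)
System.out.println(ALERT_FORMAT.replaceAll("@name","Galaxy\\"));
// Trailing $: IllegalArgumentException (group index is missing); run it separately, since the line above already throws
System.out.println(ALERT_FORMAT.replaceAll("@name","Galaxy$"));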
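Because the second call in the test code throws, the third one never runs. A minimal sketch that catches the exception per case makes all three outcomes visible in one run. The class name ReplaceAllDemo and the helper tryReplace are made up for this illustration.

```java
public class ReplaceAllDemo {
    static final String ALERT_FORMAT = "hello @name I am a rap star";

    // Hypothetical helper: returns the replaced string, or the exception's
    // class name and message, so one failing case does not stop the others.
    static String tryReplace(String replacement) {
        try {
            return ALERT_FORMAT.replaceAll("@name", replacement);
        } catch (IllegalArgumentException | IndexOutOfBoundsException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Backslash in the middle escapes the '2' and is dropped from the output.
        System.out.println(tryReplace("Galaxy\\22"));
        // Trailing backslash: nothing left to escape.
        System.out.println(tryReplace("Galaxy\\"));
        // Trailing dollar sign: group reference without an index.
        System.out.println(tryReplace("Galaxy$"));
    }
}
```

The first line printed is the replaced string; the other two are the exception messages quoted at the top of this post.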
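Cause
String.replaceAll is a thin wrapper around Matcher.replaceAll. The regex string is compiled with Pattern.compile, matcher() is called on the resulting Pattern to create a Matcher, and that Matcher's replaceAll method is then invoked.
The problem lies in the Matcher. As it scans the input, it appends the characters that are not part of any match to the result string, and it uses appendReplacement() to do so. More importantly, appendReplacement() gives two characters in the replacement string a special meaning: $ starts a reference to a captured group (such as $1), and a backslash escapes the character that follows it. So when the replacement passed to String.replaceAll ends with a $, the group reference has no index. When a backslash is the last character, the Matcher expects one more character to escape and finds none. (A backslash anywhere else does not throw, but it is consumed as an escape and does not appear in the output.) Both cases throw IllegalArgumentException.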
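The following sketch spells out the delegation described above and shows what the two special characters do when they are used as intended. The class name ReplacementSyntaxDemo and the helper viaMatcher are invented for this example; Pattern.compile, matcher() and Matcher.replaceAll are the standard java.util.regex calls.

```java
import java.util.regex.Pattern;

public class ReplacementSyntaxDemo {
    // Hypothetical helper: the same chain that String.replaceAll delegates to.
    static String viaMatcher(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // $1 and $2 refer to capture groups, so the two words are swapped.
        System.out.println(viaMatcher("hello world", "(\\w+) (\\w+)", "$2 $1")); // world hello
        // A backslash in the middle escapes the next character and is itself dropped.
        System.out.println(viaMatcher("@name", "@name", "a\\b")); // ab
        // Escaping the dollar sign makes it a literal character.
        System.out.println(viaMatcher("@name", "@name", "\\$5")); // $5
    }
}
```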
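Solution
Pre-process the value before using it as a replacement and strip the dollar signs and backslashes, or use StrUtil from the Hutool utility library.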
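// Strip characters that would break, or be misread in, a replaceAll replacement string.
return StrUtil.isEmpty(content) ? StrUtil.EMPTY : content
        .replaceAll("[^\\u0000-\\uFFFF]", "")  // drop code points outside the BMP (most emoji)
        .replaceAll("[$]", "")                  // drop dollar signs
        .replaceAll("[\\\\]", "");              // drop backslashes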
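My code here also strips emoji; delete that line if you do not need it.
With Hutool, the replacement itself is a plain literal replace: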
content = StrUtil.replace(content, wildcard.getWildcard(), value);
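If the backslashes and dollar signs should survive rather than be stripped, the JDK alone offers two options that the approach above does not use: Matcher.quoteReplacement, which escapes both characters in a replacement string, and String.replace, which does no regex processing at all. This is a sketch of that alternative, not the code from this post; the class and method names are made up.

```java
import java.util.regex.Matcher;

public class LiteralReplacementDemo {
    static final String ALERT_FORMAT = "hello @name I am a rap star";

    // Keeps regex matching but treats the replacement as literal text.
    static String withQuoteReplacement(String value) {
        return ALERT_FORMAT.replaceAll("@name", Matcher.quoteReplacement(value));
    }

    // No regex at all: String.replace searches and replaces literally.
    static String withPlainReplace(String value) {
        return ALERT_FORMAT.replace("@name", value);
    }

    public static void main(String[] args) {
        System.out.println(withQuoteReplacement("Galaxy\\")); // hello Galaxy\ I am a rap star
        System.out.println(withQuoteReplacement("Galaxy$"));  // hello Galaxy$ I am a rap star
        System.out.println(withPlainReplace("Galaxy$"));      // hello Galaxy$ I am a rap star
    }
}
```

String.replace is enough when the search text is a fixed placeholder such as @name; quoteReplacement is the one to reach for when the search side really needs to be a regex.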
Source code (Matcher.replaceAll)
/**
* Replaces every subsequence of the input sequence that matches the
* pattern with the given replacement string.
*
* <p> This method first resets this matcher. It then scans the input
* sequence looking for matches of the pattern. Characters that are not
* part of any match are appended directly to the result string; each match
* is replaced in the result by the replacement string. The replacement
* string may contain references to captured subsequences as in the {@link
* #appendReplacement appendReplacement} method.
*
* <p> Note that backslashes (<tt>\</tt>) and dollar signs (<tt>$</tt>) in
* the replacement string may cause the results to be different than if it
* were being treated as a literal replacement string. Dollar signs may be
* treated as references to captured subsequences as described above, and
* backslashes are used to escape literal characters in the replacement
* string.
*
* <p> Given the regular expression <tt>a*b</tt>, the input
* <tt>"aabfooaabfooabfoob"</tt>, and the replacement string
* <tt>"-"</tt>, an invocation of this method on a matcher for that
* expression would yield the string <tt>"-foo-foo-foo-"</tt>.
*
* <p> Invoking this method changes this matcher's state. If the matcher
* is to be used in further matching operations then it should first be
* reset. </p>
*
* @param replacement
* The replacement string
*
* @return The string constructed by replacing each matching subsequence
* by the replacement string, substituting captured subsequences
* as needed
*/
public String replaceAll(String replacement) {
    reset();
    boolean result = find();
    if (result) {
        StringBuffer sb = new StringBuffer();
        do {
            appendReplacement(sb, replacement);
            result = find();
        } while (result);
        appendTail(sb);
        return sb.toString();
    }
    return text.toString();
}
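The same find / appendReplacement / appendTail loop can be written against the public Matcher API, which makes it easier to see that the replacement string is interpreted inside appendReplacement on every match. This is a rough re-creation for illustration, not the JDK code; the class name AppendLoopDemo and the method replaceAllByHand are invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendLoopDemo {
    // Hypothetical re-creation of the loop using only public Matcher methods.
    static String replaceAllByHand(String input, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Copies the unmatched text before this match, then the processed
            // replacement; $ and \ in the replacement are interpreted here.
            m.appendReplacement(sb, replacement);
        }
        // Copies whatever follows the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same inputs as the example in the Javadoc above.
        System.out.println(replaceAllByHand("aabfooaabfooabfoob", "a*b", "-")); // -foo-foo-foo-
    }
}
```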