Which Java 8 additions are hidden in Pattern?


To find out which Java 8 additions are hidden in Pattern, let's first review the Pattern class.

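Pattern lives in the java.util.regex package and is the compiled representation of a regular expression. Instances of this class are immutable and are safe for use by multiple concurrent threads.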

Code structure

[Image: code structure of the Pattern class]

Getting a Pattern instance

public static Pattern compile(String regex) {
    return new Pattern(regex, 0);
}

public static Pattern compile(String regex, int flags) {
    return new Pattern(regex, flags);
}

UNIX_LINES

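Enables Unix lines mode.
1. In this mode, only the '\n' line terminator is recognized in the behavior of ., ^, and $.
2. Unix lines mode can also be enabled via the embedded flag expression (?d).

Unix lines mode, in plain terms:
1. Most systems end a line with \n, but some, such as Windows, end it with the pair \r\n. With this mode enabled, only \n is treated as a line terminator. This affects ^, $, and the dot: by default the dot matches no line terminator at all, while in this mode the only character it refuses is \n.
2. The embedded flag expression (?d) turns Unix lines mode on from inside the pattern.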
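To make the effect of UNIX_LINES concrete, here is a small sketch that compares the default behavior with Unix lines mode on input containing \r. The class name UnixLinesDemo and its helper methods are made up for this illustration.

```java
import java.util.regex.Pattern;

class UnixLinesDemo {
    // True if the whole input is a single character that '.' accepts under the given flags.
    static boolean dotMatches(String input, int flags) {
        return Pattern.compile(".", flags).matcher(input).matches();
    }

    // True if an 'a' sits immediately before a position where '$' matches.
    static boolean aBeforeEnd(String input, int flags) {
        return Pattern.compile("a$", flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // By default a carriage return is a line terminator, so '.' rejects it.
        System.out.println(dotMatches("\r", 0));                      // false
        // In Unix lines mode only '\n' is a line terminator, so '.' accepts it.
        System.out.println(dotMatches("\r", Pattern.UNIX_LINES));     // true
        // By default '$' matches before a trailing "\r\n".
        System.out.println(aBeforeEnd("a\r\n", 0));                   // true
        // In Unix lines mode '$' only looks for a trailing '\n', and 'a' is followed by '\r'.
        System.out.println(aBeforeEnd("a\r\n", Pattern.UNIX_LINES));  // false
    }
}
```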

CASE_INSENSITIVE

Enables case-insensitive matching.
1. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched.
2. Unicode-aware case-insensitive matching can be enabled by specifying the UNICODE_CASE flag in conjunction with this flag.
3. Case-insensitive matching can also be enabled via the embedded flag expression (?i).
4. Specifying this flag may impose a slight performance penalty.

This flag makes the expression ignore case when matching.
1. By default, case-insensitive matching covers only characters in the US-ASCII charset.
2. For case-insensitive matching of Unicode characters, combine UNICODE_CASE with this flag.
3. The embedded flag expression (?i) enables the same behavior from inside the pattern.
4. Specifying this flag may cost a little performance.
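As a quick illustration, the sketch below shows the flag constant and the embedded (?i) form giving the same result for ASCII input. The class and method names are invented for the example.

```java
import java.util.regex.Pattern;

class CaseInsensitiveDemo {
    // Case-insensitive matching requested through the flag constant.
    static boolean viaFlag(String input) {
        return Pattern.compile("hello", Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // The same request expressed with the embedded flag expression (?i).
    static boolean viaEmbeddedFlag(String input) {
        return Pattern.compile("(?i)hello").matcher(input).matches();
    }

    // Default, case-sensitive matching for comparison.
    static boolean caseSensitive(String input) {
        return Pattern.compile("hello").matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(viaFlag("HeLLo"));          // true
        System.out.println(viaEmbeddedFlag("HeLLo"));  // true
        System.out.println(caseSensitive("HeLLo"));    // false
    }
}
```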

COMMENTS

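Permits whitespace and comments in pattern.
1. In this mode, whitespace is ignored, and embedded comments starting with # are ignored until the end of a line.
2. Comments mode can also be enabled via the embedded flag expression (?x).

1. In this mode, whitespace inside the regular expression is ignored. This means literal spaces, tabs, and line breaks in the pattern text, not the "\\s" escape. Comments, which run from # to the end of the line, are ignored as well.
2. The embedded flag expression (?x) enables comments mode from inside the pattern.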
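Comments mode is easiest to appreciate with a pattern spread over several lines. The following is a minimal sketch; the YEAR_MONTH pattern and the parse helper are assumptions made up for this example, not part of the Pattern API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentsDemo {
    // A pattern laid out with spaces, line breaks, and '#' comments.
    static final String YEAR_MONTH =
            "(\\d{4})  # year\n"
          + "-         # separator\n"
          + "(\\d{2})  # month";

    // Returns "year/month" if the input matches under the given flags, otherwise null.
    static String parse(String input, int flags) {
        Matcher m = Pattern.compile(YEAR_MONTH, flags).matcher(input);
        return m.matches() ? m.group(1) + "/" + m.group(2) : null;
    }

    public static void main(String[] args) {
        // With COMMENTS, the layout whitespace and the comments are dropped from the pattern.
        System.out.println(parse("2024-05", Pattern.COMMENTS));  // 2024/05
        // Without it, the spaces and '#' text are literal characters, so nothing matches.
        System.out.println(parse("2024-05", 0));                 // null
    }
}
```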

MULTILINE

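Enables multiline mode.
1. In multiline mode the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence.
2. Multiline mode can also be enabled via the embedded flag expression (?m).

1. By default the input is treated as a single line even if it contains line terminators, so ^ and $ match only at the beginning and the end of the entire input. With multiline mode enabled, ^ also matches just after a line terminator and $ also matches just before one, so each line of the input can be matched on its own.
2. The embedded flag expression (?m) enables multiline mode from inside the pattern.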
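The difference shows up as soon as the input has more than one line. This sketch counts how many times ^\w+$ is found with and without the flag; MultilineDemo and countWholeLineWords are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // Counts matches of ^\w+$ in the input under the given flags.
    static int countWholeLineWords(String input, int flags) {
        Matcher m = Pattern.compile("^\\w+$", flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "one\ntwo\nthree";
        // Default: ^ and $ anchor to the whole input, which contains '\n', so nothing matches.
        System.out.println(countWholeLineWords(input, 0));                  // 0
        // MULTILINE: ^ and $ anchor to each line, so every line is a match.
        System.out.println(countWholeLineWords(input, Pattern.MULTILINE));  // 3
    }
}
```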

LITERAL

Enables literal parsing of the pattern.
1. When this flag is specified then the input string that specifies the pattern is treated as a sequence of literal characters. Metacharacters or escape sequences in the input sequence will be given no special meaning.
2. The flags CASE_INSENSITIVE and UNICODE_CASE retain their impact on matching when used in conjunction with this flag. The other flags become superfluous.
3. There is no embedded flag character for enabling literal parsing.
Since: 1.5

Literal parsing mode, in plain terms:
1. With this flag, the string that specifies the pattern is taken as a sequence of literal characters. Metacharacters and escape sequences in it have no special meaning.
2. CASE_INSENSITIVE and UNICODE_CASE still affect matching when combined with this flag. All other flags become superfluous.
3. There is no embedded flag character that enables literal parsing.
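A short sketch of what LITERAL changes, including its combination with CASE_INSENSITIVE. The LiteralDemo class and its matches helper are hypothetical names for this example.

```java
import java.util.regex.Pattern;

class LiteralDemo {
    // Compiles the regex with the given flags and tests the whole input against it.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // As a normal regex, '.' matches any character.
        System.out.println(matches("a.b", 0, "axb"));                // true
        // With LITERAL, '.' is just a dot.
        System.out.println(matches("a.b", Pattern.LITERAL, "axb"));  // false
        System.out.println(matches("a.b", Pattern.LITERAL, "a.b"));  // true
        // CASE_INSENSITIVE keeps its effect when combined with LITERAL.
        System.out.println(matches("a.b", Pattern.LITERAL | Pattern.CASE_INSENSITIVE, "A.B"));  // true
    }
}
```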

DOTALL

Enables dotall mode.
1. In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.
2. Dotall mode can also be enabled via the embedded flag expression (?s). (The s is a mnemonic for "single-line" mode, which is what this is called in Perl.)

1. In this mode the dot matches any character, line terminators included. By default it does not match line terminators.
2. The embedded flag expression (?s) enables this mode from inside the pattern. The s stands for "single-line" mode, the name Perl uses for it.
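For instance, a pattern that uses .* to span some text stops working once the text contains a line break, unless dotall mode is on. The sketch below uses made-up names (DotallDemo, spansBold) and a made-up sample string.

```java
import java.util.regex.Pattern;

class DotallDemo {
    // True if the whole input is "<b>", anything, "</b>" under the given flags.
    static boolean spansBold(String input, int flags) {
        return Pattern.compile("<b>.*</b>", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String text = "<b>line one\nline two</b>";
        // Default: '.' stops at the '\n', so the match fails.
        System.out.println(spansBold(text, 0));               // false
        // DOTALL: '.' also matches the '\n'.
        System.out.println(spansBold(text, Pattern.DOTALL));  // true
        // The embedded flag (?s) has the same effect.
        System.out.println(Pattern.compile("(?s)<b>.*</b>").matcher(text).matches());  // true
    }
}
```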

UNICODE_CASE

Enables Unicode-aware case folding.
1. When this flag is specified then case-insensitive matching, when enabled by the CASE_INSENSITIVE flag, is done in a manner consistent with the Unicode Standard. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched.
2. Unicode-aware case folding can also be enabled via the embedded flag expression (?u).
3. Specifying this flag may impose a performance penalty.

1. In this mode, if the CASE_INSENSITIVE flag is also enabled, case-insensitive matching applies to Unicode characters as well. By default, case-insensitive matching covers only the US-ASCII charset.
2. The embedded flag expression (?u) enables Unicode-aware case folding from inside the pattern.
3. Specifying this flag may affect performance.
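A non-ASCII letter shows why UNICODE_CASE matters. In this sketch the helper name matchesIgnoringCase and the choice of the letter e with acute accent are assumptions for illustration; the letters are written as Unicode escapes so the source stays ASCII-only.

```java
import java.util.regex.Pattern;

class UnicodeCaseDemo {
    // Matches case-insensitively, optionally with Unicode-aware case folding.
    static boolean matchesIgnoringCase(String regex, String input, boolean unicodeAware) {
        int flags = Pattern.CASE_INSENSITIVE;
        if (unicodeAware) {
            flags |= Pattern.UNICODE_CASE;
        }
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String lower = "\u00E9";  // LATIN SMALL LETTER E WITH ACUTE
        String upper = "\u00C9";  // LATIN CAPITAL LETTER E WITH ACUTE

        // ASCII letters fold without UNICODE_CASE.
        System.out.println(matchesIgnoringCase("e", "E", false));      // true
        // A non-ASCII letter does not fold with CASE_INSENSITIVE alone.
        System.out.println(matchesIgnoringCase(lower, upper, false));  // false
        // Adding UNICODE_CASE makes the fold Unicode-aware.
        System.out.println(matchesIgnoringCase(lower, upper, true));   // true
    }
}
```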

CANON_EQ

Enables canonical equivalence.
1. When this flag is specified then two characters will be considered to match if, and only if, their full canonical decompositions match. The expression "a\u030A", for example, will match the string "\u00E5" when this flag is specified. By default, matching does not take canonical equivalence into account.
2. There is no embedded flag character for enabling canonical equivalence.
3. Specifying this flag may impose a performance penalty.

1. Two characters are considered to match if and only if their full canonical decompositions are identical. With this flag, for example, the expression "a\u030A" matches the string "\u00E5" (å). By default, canonical equivalence is not taken into account.
2. There is no embedded flag character that enables canonical equivalence.
3. Specifying this flag may affect performance.
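The Javadoc example can be tried directly. The sketch below only covers the documented direction (decomposed pattern, precomposed input); CanonEqDemo and its matches helper are illustrative names.

```java
import java.util.regex.Pattern;

class CanonEqDemo {
    // Compiles the regex with the given flags and tests the whole input against it.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String decomposed = "a\u030A";  // 'a' followed by COMBINING RING ABOVE
        String composed = "\u00E5";     // the precomposed letter a with ring above

        // Default: the two forms are different character sequences.
        System.out.println(matches(decomposed, 0, composed));                 // false
        // CANON_EQ: canonically equivalent forms are treated as a match.
        System.out.println(matches(decomposed, Pattern.CANON_EQ, composed));  // true
    }
}
```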

Example

Pattern p1 = Pattern.compile("a+", Pattern.CASE_INSENSITIVE);  // with a flag
Pattern p2 = Pattern.compile("a+");                            // without flags

The section above draws on the following reference:

blog.csdn.net/liupeifeng3…

Methods

// Returns this pattern's match flags.
int flags() 

// Returns the regular expression from which this pattern was compiled.
String pattern()


Pattern p = Pattern.compile("a+", Pattern.CASE_INSENSITIVE);
System.out.println(p.flags());    // 2
System.out.println(p.pattern());  // a+
System.out.println(p.toString()); // a+
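
// Splits the given input sequence around matches of this pattern.
String[] split(CharSequence input)
String[] split(CharSequence input, int limit)
// Creates a stream from the given input sequence around matches of this pattern (since 1.8).
Stream<String> splitAsStream(final CharSequence input)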

For split, see the article "The difference between Java's split(String regex, int limit) and split(String regex)".
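The linked article is not reproduced here, but the role of the limit argument in Pattern.split can be sketched briefly. SplitLimitDemo and the sample input are made up for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimitDemo {
    private static final Pattern COMMA = Pattern.compile(",");

    // split without a limit: trailing empty strings are discarded.
    static String split(String input) {
        return Arrays.toString(COMMA.split(input));
    }

    // split with an explicit limit.
    static String split(String input, int limit) {
        return Arrays.toString(COMMA.split(input, limit));
    }

    public static void main(String[] args) {
        String input = "a,b,c,,";
        System.out.println(split(input));      // [a, b, c]
        // A positive limit caps the number of pieces; the last piece keeps the rest.
        System.out.println(split(input, 2));   // [a, b,c,,]
        // A negative limit keeps trailing empty strings.
        System.out.println(split(input, -1));  // [a, b, c, , ]
    }
}
```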

String actualString = "Welcomedalianfordalian!";
Pattern pattern = Pattern.compile("dalian");
Stream<String> stream = pattern.splitAsStream(actualString);
stream.forEach(System.out::println);

Welcome
for
!

// Returns a literal pattern String for the specified String.
static String quote(String s)

System.out.println(Pattern.compile(Pattern.quote(".*")).matcher("123").matches());  // false
System.out.println(Pattern.compile(Pattern.quote(".*")).matcher("foo").matches());  // false
System.out.println(Pattern.compile(Pattern.quote(".*")).matcher(".*").matches());   // true
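One place where quote is handy is splitting on a delimiter that happens to be a regex metacharacter. This is a minimal sketch; QuoteDemo and splitOnLiteral are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuoteDemo {
    // Splits the input on the delimiter taken literally, whatever characters it contains.
    static String[] splitOnLiteral(String delimiter, String input) {
        return Pattern.compile(Pattern.quote(delimiter)).split(input);
    }

    public static void main(String[] args) {
        // Quoted: "." is a plain dot.
        System.out.println(Arrays.toString(splitOnLiteral(".", "a.b.c")));         // [a, b, c]
        // Unquoted: "." matches every character, so only empty pieces remain and all are dropped.
        System.out.println(Arrays.toString(Pattern.compile(".").split("a.b.c")));  // []
        // quote wraps the text in \Q ... \E.
        System.out.println(Pattern.quote(".*"));                                   // \Q.*\E
    }
}
```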

// Creates a predicate which can be used to match a string (since 1.8).
// The predicate uses find(), so a match anywhere in the string is enough.
Predicate<String> asPredicate()

Predicate<String> predicate = Pattern.compile("dalian").asPredicate();
System.out.println(predicate.test("welcome dalian"));

true
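Because the result is an ordinary Predicate<String>, it can be passed straight to Stream.filter. The sketch below assumes a small helper named mentioning and some sample strings, none of which come from the original post.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class AsPredicateDemo {
    // Keeps the lines in which the regex is found somewhere.
    static List<String> mentioning(String regex, String... lines) {
        Predicate<String> containsMatch = Pattern.compile(regex).asPredicate();
        return Stream.of(lines)
                .filter(containsMatch)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> hits = mentioning("dalian", "welcome dalian", "hello beijing", "dalian port");
        System.out.println(hits);  // [welcome dalian, dalian port]
    }
}
```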

// Compiles the given regular expression and attempts to match the given input against it.
static boolean matches(String regex, CharSequence input)

System.out.println(Pattern.matches("^abc.*$", "abcasdf"));                    // true
// Equivalent to:
System.out.println(Pattern.compile("^abc.*$").matcher("abcasdf").matches()); // true

// Creates a matcher that will match the given input against this pattern.
Matcher matcher(CharSequence input)

Pattern compile = Pattern.compile("^abc.*$");
Matcher matcher = compile.matcher("abcasdf");
if (matcher.matches()) {
    System.out.println(matcher.group());        // abcasdf
}
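matches() requires the whole input to match. To walk through several matches inside a longer string, the same Matcher can be driven with find() instead. The following is a rough sketch; the key=value pattern and the names MatcherDemo and pairs are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherDemo {
    // Collects every key=number pair found in the input as "key->number".
    static List<String> pairs(String input) {
        Matcher m = Pattern.compile("(\\w+)=(\\d+)").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // group(1) is the key, group(2) is the number.
            out.add(m.group(1) + "->" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pairs("a=1, bb=22"));  // [a->1, bb->22]
    }
}
```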