Notes on Basic Regular Expression Syntax

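  1. Character matching:

    • Literal characters: match themselves. For example, a matches the character "a".
    • Character classes: square brackets [] define a character class, which matches any one of the characters listed inside. For example, [abc] matches "a", "b", or "c".
    • Range classes: a hyphen - inside a character class denotes a range. For example, [a-z] matches any lowercase letter.
    • Negated classes: a caret ^ as the first character inside the brackets matches any character not listed in the class. For example, [^0-9] matches any character that is not a digit.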
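  2. Repetition:

    • *: matches the preceding expression zero or more times.
    • +: matches the preceding expression one or more times.
    • ?: matches the preceding expression zero or one time.
    • {n}: matches the preceding expression exactly n times.
    • {n,}: matches the preceding expression at least n times.
    • {n,m}: matches the preceding expression at least n times, but no more than m times.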
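  3. Special characters:

    • \d: matches any digit.
    • \w: matches any letter, digit, or underscore.
    • \s: matches any whitespace character.
    • \b: matches a word boundary.
    • .: matches any character except a newline.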
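  4. Groups and backreferences:

    • (): groups several expressions into a single unit and captures what they match.
    • (?:): non-capturing group; groups without capturing the matched text.
    • \n: backreference to an earlier capturing group, where n is the group number (for example, \1).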
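The four groups of syntax above can be tried out with a short script. This is a minimal sketch that assumes Python's standard `re` module (the notes themselves do not name an engine); all sample strings are made up for illustration.

```python
import re

# 1. Character matching
assert re.findall(r"[abc]", "aXbYcZ") == ["a", "b", "c"]    # character class
assert re.findall(r"[a-z]", "Hi 42 ok") == ["i", "o", "k"]  # range class
assert re.findall(r"[^0-9]", "a1b2") == ["a", "b"]          # negated class

# 2. Repetition (fullmatch requires the whole string to match)
assert re.fullmatch(r"ab*", "a")                  # * allows zero occurrences
assert re.fullmatch(r"ab+", "a") is None          # + needs at least one
assert re.fullmatch(r"colou?r", "color")          # ? makes "u" optional
assert re.fullmatch(r"\d{4}", "2024")             # exactly 4 digits
assert re.fullmatch(r"\d{2,}", "7") is None       # at least 2 digits
assert re.fullmatch(r"\d{2,3}", "1234") is None   # 2 to 3 digits

# 3. Special characters
assert re.findall(r"\d", "a1 b22") == ["1", "2", "2"]
assert re.findall(r"\w+", "foo_1 bar!") == ["foo_1", "bar"]
assert re.split(r"\s+", "a  b\tc") == ["a", "b", "c"]
assert re.findall(r"\bcat\b", "cat concat cat.") == ["cat", "cat"]
assert re.fullmatch(r"a.c", "a\nc") is None       # . skips newlines by default

# 4. Groups and backreferences
m = re.fullmatch(r"(\w+)-(?:\d+)-(\w+)", "ab-12-cd")
groups = m.groups()   # the non-capturing (?:\d+) is not included
# \1 must repeat exactly what group 1 captured: finds a doubled word.
repeated = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)
print(groups, repeated)
```

`fullmatch` is used for the repetition checks so that a pattern cannot succeed by matching only part of the input.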

A Cheat Sheet

| Anchor | Description | Example | Valid match | Invalid |
| --- | --- | --- | --- | --- |
| `^` | start of string or line | `^foam` | foam | bath foam |
| `\A` | start of string, in any match mode | `\Afoam` | foam | bath foam |
| `$` | end of string or line | `finish$` | finish | finnish |
| `\Z` | end of string, or character before the last newline, in any match mode | `finish\Z` | finish | finnish |
| `\z` | end of string, in any match mode | | | |
| `\G` | end of the previous match, or start of the string for the first match | `^(get\|set)\|\G\w+$` | setValue | seValue |
| `\b` | word boundary: position between a word character (`\w`) and a non-word character (`\W`) | `\bis\b` | This island is beautiful | This island isn't beautiful |
| `\B` | not a word boundary | `\Bland` | island | peninsula |
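A few of the anchors can be checked with Python's `re` module, which is an assumption here since the table follows PCRE conventions. Two differences are worth keeping in mind: in Python's `re`, `\Z` matches only at the very end of the string, and `\G` is not available, so it is left out of this sketch.

```python
import re

text = "foam\nbath foam"

# Without MULTILINE, ^ anchors only at the start of the whole string.
starts_default = re.findall(r"^\w+", text)
# With MULTILINE, ^ also matches right after each newline.
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)
# \A always means "start of string", whatever flags are set.
starts_abs = re.findall(r"\A\w+", text, re.MULTILINE)

# $ matches at the end of the string or just before a final newline.
assert re.search(r"finish$", "finish\n")
# Python-specific: \Z matches only at the very end of the string.
assert re.search(r"finish\Z", "finish\n") is None

# \b is a word boundary; \B is any position that is not one.
assert re.findall(r"\bis\b", "This island is beautiful") == ["is"]
assert re.search(r"\bis\b", "This island isn't beautiful") is None
assert re.search(r"\Bland", "island")
assert re.search(r"\Bland", "land") is None

print(starts_default, starts_multiline, starts_abs)
```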
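
| Assertion | Description | Example | Valid match | Invalid |
| --- | --- | --- | --- | --- |
| `(?=...)` | positive lookahead | `question(?=s)` | questions | question |
| `(?!...)` | negative lookahead | `answer(?!s)` | answer | answers |
| `(?<=...)` | positive lookbehind | `(?<=appl)e` | apple | application |
| `(?<!...)` | negative lookbehind | `(?<!goo)d` | mood | good |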
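The lookaround rows can be reproduced directly. The sketch below assumes Python's `re` module and reuses the table's examples; the price string at the end is invented for illustration.

```python
import re

# Positive lookahead: "question" only when "s" follows; the "s" is not consumed.
lookahead_text = re.search(r"question(?=s)", "questions").group()
assert re.search(r"question(?=s)", "question") is None

# Negative lookahead: "answer" only when it is not followed by "s".
assert re.search(r"answer(?!s)", "answer")
assert re.search(r"answer(?!s)", "answers") is None

# Positive lookbehind: an "e" directly preceded by "appl".
assert re.search(r"(?<=appl)e", "apple")
assert re.search(r"(?<=appl)e", "application") is None

# Negative lookbehind: a "d" that is not preceded by "goo".
assert re.search(r"(?<!goo)d", "mood")
assert re.search(r"(?<!goo)d", "good") is None

# Lookarounds assert context without including it in the match.
prices = re.findall(r"(?<=\$)\d+", "cost $15, qty 3, fee $7")
print(lookahead_text, prices)
```

Because lookarounds consume no characters, the matched text in the first case is `question`, not `questions`.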
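
| Char class | Description | Example | Valid match | Invalid |
| --- | --- | --- | --- | --- |
| `[ ]` | class definition | `[axf]` | a, x, f | b |
| `[ - ]` | class definition, range | `[a-c]` | a, b, c | d |
| `[ \ ]` | escape inside class | `[a-f\.]` | a, b, . | g |
| `[^ ]` | not in class | `[^abc]` | d, e | a |
| `[:class:]` | POSIX class (used inside a bracket expression) | `[[:alpha:]]+` | string | 0101 |
| `.` | match any character except newline | `b.ttle` | battle, bottle | bttle |
| `\s` | white space, `[\n\r\f\t ]` | `good\smorning` | good morning | good.morning |
| `\S` | non-white space, `[^\n\r\f\t ]` | `good\Smorning` | good.morning | good morning |
| `\d` | digit | `\d{2}` | 23 | 1a |
| `\D` | non-digit | `\D{3}` | foo, bar | fo1 |
| `\w` | word character, `[a-zA-Z0-9_]` | `\w{4}` | v411 | v4.1 |
| `\W` | non-word character, `[^a-zA-Z0-9_]` | .$%? | .$%? | .ab? |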

| Special character | Description |
| --- | --- |
| `\` | general escape |
| `\n` | new line |
| `\r` | carriage return |
| `\t` | tab |
| `\v` | vertical tab |
| `\f` | form feed |
| `\a` | alarm |
| `[\b]` | backspace |
| `\e` | escape |
| `\cchar` | Ctrl + char (e.g. `\cc` is Ctrl+c) |
| `\ooo` | three-digit octal (e.g. `\123`) |
| `\xhh` | one- or two-digit hexadecimal (e.g. `\x10`) |
| `\x{hex}` | any hexadecimal code (e.g. `\x{1234}`) |
| `\p{xx}` | character with Unicode property (e.g. `\p{Arabic}`) |
| `\P{xx}` | character without Unicode property |
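Some of these escapes can be exercised as follows. This is a sketch that assumes Python's `re` module and sticks to forms that module accepts; PCRE-specific forms such as `\e`, `\cchar`, `\x{hex}`, and `\p{xx}` are deliberately left out.

```python
import re

# Control-character escapes match the corresponding character.
assert re.fullmatch(r"a\tb", "a\tb")

# \xhh: hexadecimal code; 0x41 is "A".
assert re.fullmatch(r"\x41", "A")

# \ooo: three-digit octal code; octal 101 is also "A".
assert re.fullmatch(r"\101", "A")

# Inside a class, \b means backspace (0x08) rather than word boundary.
assert re.fullmatch(r"[\b]", "\x08")

# A backslash turns a metacharacter into a literal.
assert re.fullmatch(r"1\.5", "1.5")
assert re.fullmatch(r"1\.5", "145") is None

# re.escape adds the backslashes needed to match a string literally.
literal = "a.b*c"
escaped_ok = re.fullmatch(re.escape(literal), literal) is not None
print(escaped_ok)
```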

| Sequence | Description | Example | Valid match | Invalid |
| --- | --- | --- | --- | --- |
| `\|` | alternation | `apple\|orange` | apple, orange | melon |
| `( )` | subpattern | `foot(er\|ball)` | footer or football | footpath |
| `(?P<name>...)` | subpattern, and capture submatch into name | `(?P<greeting>hello)` | hello | hallo |
| `(?:...)` | subpattern, but does not capture submatch | `(?:hello)` | hello | hallo |
| `+` | one or more quantifier | `ye+ah` | yeah, yeeeah | yah |
| `*` | zero or more quantifier | `ye*ah` | yeeah, yeeeah, yah | yeh |
| `?` | zero or one quantifier | `yes?` | yes, ye | yess |
| `??` | zero or one, as few times as possible (lazy) | `yea??h` | yeah | yeaah |
| `+?` | one or more, lazy | `/<.+?>/g` | in `<P>foo</P>`, matches only `<P>` and `</P>` | |
| `*?` | zero or more, lazy | `/<.*?>/g` | `<html>` | |
| `{n}` | n times exactly | `fo{2}` | foo | fooo |
| `{n,m}` | from n to m times | `go{2,3}d` | good, goood | gooood |
| `{n,}` | at least n times | `go{2,}` | goo, gooo | go |
| `(?(condition)...)` | if-then pattern | `(<)?[p](?(1)>)` | `<p>`, p | `<p` |
| `(?(condition)...\|...)` | if-then-else pattern | `^(?(?=q)que\|ans)` | question, answer | |
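Named groups, lazy quantifiers, and conditionals are easier to follow when run. The sketch below assumes Python's `re` module. It uses the conditional form that tests a group number, since that is the form `re` accepts; the lookahead-based condition in the last table row is not covered.

```python
import re

# Alternation inside a subpattern.
assert re.fullmatch(r"foot(er|ball)", "football")
assert re.fullmatch(r"foot(er|ball)", "footpath") is None

# Named groups: captured text is available by name.
m = re.fullmatch(r"(?P<greeting>hello) (?P<name>\w+)", "hello world")
named = m.groupdict()

# Greedy vs. lazy: .+ takes as much as possible, .+? as little as possible.
html = "<P>foo</P>"
greedy = re.findall(r"<.+>", html)
lazy = re.findall(r"<.+?>", html)

# Conditional: if group 1 matched "<", a closing ">" is required.
cond = re.compile(r"(<)?[p](?(1)>)")
assert cond.fullmatch("<p>")
assert cond.fullmatch("p")
assert cond.fullmatch("<p") is None

print(named, greedy, lazy)
```

`findall` plays the role of the `g` flag written in the `/<.+?>/g` examples: it returns every non-overlapping match rather than only the first.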

| Pattern modifier | Description |
| --- | --- |
| `g` | global match |
| `i` | case-insensitive, match both uppercase and lowercase |
| `m` | multiple lines |
| `s` | single line (dot-all): `.` also matches newline characters |
| `x` | ignore whitespace, allows comments |
| `A` | anchored, the pattern is forced to `^` |
| `D` | dollar end only, a dollar metacharacter matches only at the end of the string |
| `S` | extra analysis performed, useful for non-anchored patterns |
| `U` | ungreedy, greedy patterns become lazy by default |
| `X` | additional functionality of PCRE (PCRE extra) |
| `J` | allow duplicate names for subpatterns |
| `u` | unicode, pattern and subject strings are treated as UTF-8 |
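The modifier letters above follow PCRE conventions. As a rough illustration in Python (an assumption, since Python's `re` uses flag constants instead of trailing letters), `i`, `m`, `s`, and `x` correspond to `re.IGNORECASE`, `re.MULTILINE`, `re.DOTALL`, and `re.VERBOSE`, while `g` is approximated by `re.findall`. The other modifiers are not covered by this sketch.

```python
import re

# i: case-insensitive matching.
assert re.fullmatch(r"hello", "HeLLo", re.IGNORECASE)

# m: ^ and $ match at every line, not only at the string ends.
lines = re.findall(r"^\w+$", "one\ntwo", re.MULTILINE)

# s: . also matches a newline.
assert re.fullmatch(r"a.b", "a\nb") is None
assert re.fullmatch(r"a.b", "a\nb", re.DOTALL)

# x: whitespace in the pattern is ignored and comments are allowed.
date = re.compile(
    r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    """,
    re.VERBOSE,
)
year, month = date.fullmatch("2024-05").groups()

# Flags can also be written inline at the start of the pattern.
assert re.fullmatch(r"(?i)hello", "HELLO")

# g: collect every match instead of only the first one.
all_digits = re.findall(r"\d", "a1b2c3")
print(lines, year, month, all_digits)
```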

Resources

onevcat.com/2022/11/swi…