Regular Expressions



What is a regular expression

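A regular expression is a way of processing strings, and it processes text one line at a time. With the help of some special symbols, regular expressions let users easily search for, delete, or replace a particular string.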

Basic regular expressions

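Locale settings that affect how characters are ordered also affect the results of a regular expression. Regular expressions also need a supporting tool program to be used, such as grep.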

How the locale affects regular expressions

Different locales encode and order characters differently.

With LANG=C: 0 1 2 3 4 ... A B C D ... Z a b c d ... z

With LANG=zh_TW: 0 1 2 3 4 ... a A b B c C d D ... z Z

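With the zh_TW order above, for instance, the range [a-z] would also match the uppercase letters A through Y. To avoid getting different results when the locale changes, there are some special character classes you can use: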

[Image: table of the special character classes, e.g. [:lower:]]
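A minimal sketch of how a range and a character class compare, using a few made-up sample lines (not from the original article) and forcing LC_ALL=C so the result does not depend on your own locale. Note that a class name must sit inside an outer bracket expression, as in [[:lower:]].

```bash
#!/usr/bin/env bash
# Force the C locale so [a-z] means exactly the 26 lowercase ASCII letters.
export LC_ALL=C

# Hypothetical sample input, one line per case.
sample=$'apple\nBanana\n42 cats\n\tindented'

# A range and the equivalent class both select lines starting with a lowercase letter.
by_range=$(printf '%s\n' "$sample" | grep '^[a-z]')
by_class=$(printf '%s\n' "$sample" | grep '^[[:lower:]]')

# [:digit:] is 0-9; [:blank:] is a space or a tab.
by_digit=$(printf '%s\n' "$sample" | grep '^[[:digit:]]')
blank_count=$(printf '%s\n' "$sample" | grep -c '^[[:blank:]]')

echo "range: $by_range | class: $by_class | digit: $by_digit | blank-led lines: $blank_count"
```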

Advanced grep options

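[root@clay ~]# grep [-A n] [-B n] [--color=auto] 'search_string' filename
Options and parameters:
-A: followed by a number n ("after"); besides the matching line, also list the n lines after it
-B: followed by a number n ("before"); besides the matching line, also list the n lines before it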

Example 1:
[root@clay ~]# cat /etc/passwd | grep -n -A3 -B2 'root'
1:root:x:0:0:root:/root:/bin/bash
2-bin:x:1:1:bin:/bin:/sbin/nologin
3-daemon:x:2:2:daemon:/sbin:/sbin/nologin
4-adm:x:3:4:adm:/var/adm:/sbin/nologin
--
8-halt:x:7:0:halt:/sbin:/sbin/halt
9-mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
10:operator:x:11:0:operator:/root:/sbin/nologin
11-games:x:12:100:games:/usr/games:/sbin/nologin
12-ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
13-nobody:x:99:99:Nobody:/:/sbin/nologin

When grep searches data for a string, it works in units of whole lines: every line that contains a match is printed in full.
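To see the -A/-B context options on a predictable input, here is a small sketch that uses seq output as a stand-in for /etc/passwd (an assumption for illustration). Matching lines are separated from their number by ":", context lines by "-", and non-adjacent groups by "--".

```bash
#!/usr/bin/env bash
# Lines 1..10, one line of context before and after the match.
ctx=$(seq 1 10 | grep -n -A1 -B1 '^5$')
printf '%s\n' "$ctx"
# 4-4  context line before the match (separator '-')
# 5:5  the matching line itself (separator ':')
# 6-6  context line after the match

# Two non-adjacent matches: grep prints '--' between the context groups.
groups=$(seq 1 10 | grep -A1 -e '^2$' -e '^8$')
printf '%s\n' "$groups"
```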

Basic regular expression exercises

Preparing the environment

[root@clay ~]# cat regular_express.txt 
"Open Source" is a good mechanism to develop programs.
apple is my favorite food.
Football game is not use feet only.
this dress doesn't fit me.
However, this dress is about $ 3183 dollars.^M
GNU is free air not free beer.^M
Her hair is very beauty.^M
I can't finish the test.^M
Oh!The soup taste good.^M
motorcycle is cheap than cat .
This window is clear.
the symbol '*' is represented as start.
Oh!     My god!
The gd software is a library for drafting programs.^M
You are the best is mean you are the no.1.
The world <Happy> is the same with "glad".
I like dog.
google is the best tools for search keyword.
goooooogle yes!
go! go! Let's go.
# I am VBird

Example 1: Search for a specific string
[root@clay ~]# grep -n 'the' regular_express.txt 
8:I can't finish the test.^M
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no.1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.

-n: show line numbers

Inverse selection

[root@clay ~]# grep -vn 'the' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
5:However, this dress is about $ 3183 dollars.^M
6:GNU is free air not free beer.^M
7:Her hair is very beauty.^M
9:Oh!The soup taste good.^M
10:motorcycle is cheap than cat .
11:This window is clear.
13:Oh!     My god!
14:The gd software is a library for drafting programs.^M
17:I like dog.
19:goooooogle yes!
20:go! go! Let's go.
21:# I am VBird
22:

-v: invert the match (show the lines that do not match)

Ignore case (-i)

[root@clay ~]# grep -in 'the' regular_express.txt 
8:I can't finish the test.^M
9:Oh!The soup taste good.^M
12:the symbol '*' is represented as start.
14:The gd software is a library for drafting programs.^M
15:You are the best is mean you are the no.1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.
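The three options above can be checked side by side on a tiny, hypothetical three-line input; this is only a sketch, not the article's data file.

```bash
#!/usr/bin/env bash
sample=$'the cat\nThe dog\na bird'   # hypothetical input

plain=$(printf '%s\n' "$sample" | grep -n 'the')      # case-sensitive, with line numbers
inverted=$(printf '%s\n' "$sample" | grep -vn 'the')  # lines that do NOT contain 'the'
nocase=$(printf '%s\n' "$sample" | grep -in 'the')    # matches 'the', 'The', 'THE', ...

printf '%s\n---\n%s\n---\n%s\n' "$plain" "$inverted" "$nocase"
```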

Example 2: Use brackets [] to search for a set of characters

If the strings you are looking for share common characters, for example both test and taste:

[root@clay ~]# grep -n 't[ae]st' regular_express.txt
8:I can't finish the test.
9:Oh!The soup taste good.

Search for the string 'oo'

[root@clay ~]# grep -n 'oo' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh!The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!

If you do not want the 'oo' to be preceded by 'g'

[root@clay ~]# grep -n '[^g]oo' regular_express.txt 
2:apple is my favorite food.
3:Football game is not use feet only.
18:google is the best tools for search keyword.
19:goooooogle yes!

[^]: inverse selection (matches any single character not listed in the brackets)

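But why do lines 18 and 19 still appear? Line 18 contains "tools", where the oo is preceded by t. Line 19 appears because in goooooogle a run of oo is preceded by another o, which is not g. Using -o shows exactly what matched: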

[root@clay ~]# grep -no '[^g]oo' regular_express.txt 
2:foo
3:Foo
18:too
19:ooo
19:ooo

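-o: print only the matched part of each line (one match per output line), which shows exactly what was matched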
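A short sketch of brackets and -o together, on hypothetical words rather than the practice file. It shows that t[ae]st accepts exactly one character from the set, and that [^g]oo needs one non-g character right before "oo", which is why "good" gives no match while "goooooogle" gives two.

```bash
#!/usr/bin/env bash
export LC_ALL=C

# Brackets: exactly one character from the set {a, e}.
set_hits=$(printf '%s\n' test taste toast | grep 't[ae]st')

# -o prints each match on its own line.
only=$(printf '%s\n' tools good goooooogle | grep -o '[^g]oo')

printf '%s\n---\n%s\n' "$set_hits" "$only"
```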

Exclude a lowercase letter before oo

[root@clay ~]# grep -n '[^a-z]oo' regular_express.txt 
3:Football game is not use feet only.

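The special character classes mentioned earlier work here too; for example, '[^[:lower:]]oo' gives the same result as '[^a-z]oo'.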

Example 3: Start-of-line and end-of-line characters ^ and $

Lines that begin with "the"

[root@clay ~]# grep -n '^the' regular_express.txt 
12:the symbol '*' is represented as start.

Lines that begin with a lowercase letter

[root@clay ~]# grep -n '^[[:lower:]]' regular_express.txt 
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than cat .
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
[root@clay ~]# grep -n '^[a-z]' regular_express.txt 
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than cat .
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.

Lines that do not begin with an English letter

[root@clay ~]# grep -n '^[^a-zA-Z]' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
21:# I am VBird

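Note: ^ means different things inside and outside []. Outside, it anchors the start of the line; as the first character inside [], it negates the set.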
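Both meanings of ^ in one minimal sketch, using hypothetical lines:

```bash
#!/usr/bin/env bash
export LC_ALL=C
sample=$'the cat\nThe dog\n# comment\n"quoted"'   # hypothetical input

# ^ outside []: the line must START with "the" (case-sensitive).
anchored=$(printf '%s\n' "$sample" | grep '^the')
# ^ inside []: the first character must NOT be a letter.
negated=$(printf '%s\n' "$sample" | grep '^[^a-zA-Z]')

printf '%s\n---\n%s\n' "$anchored" "$negated"
```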

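Lines ending with a period (the . must be escaped as \. because an unescaped . matches any single character)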

[root@clay ~]# grep -n '\.$' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
5:However, this dress is about $ 3183 dollars.
6:GNU is free air not free beer.
7:Her hair is very beauty.
8:I can't finish the test.
9:Oh!The soup taste good.
10:motorcycle is cheap than cat .
11:This window is clear.
12:the symbol '*' is represented as start.
14:The gd software is a library for drafting programs.
15:You are the best is mean you are the no.1.
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
20:go! go! Let's go.

Search for blank lines

[root@clay ~]# grep -n '^$' regular_express.txt 
22:
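A sketch contrasting \.$ with .$ and ^$, on made-up lines. It also shows one detail related to the ^M markers in the practice file: a line with a DOS ending keeps a carriage return (\r, displayed as ^M) before the newline, so its last character is not the period and \.$ does not match it.

```bash
#!/usr/bin/env bash
export LC_ALL=C
sample=$'ends with a dot.\nno dot here\n\nanother one.'   # hypothetical input

escaped=$(printf '%s\n' "$sample" | grep -c '\.$')   # a literal period at the end
any_last=$(printf '%s\n' "$sample" | grep -c '.$')   # any last character: every non-empty line
blank=$(printf '%s\n' "$sample" | grep -n '^$')      # the empty third line

# DOS line ending: the character before the newline is \r, not '.'.
dos=$(printf 'dos line.\r\n' | grep -c '\.$')

echo "escaped=$escaped any_last=$any_last blank=$blank dos=$dos"
```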

Show the effective lines of /etc/rsyslog.conf (neither blank nor comments)

[root@clay ~]# grep -v '^$' /etc/rsyslog.conf | grep -vn '^#' 
6:$ModLoad imuxsock # provides support for local system logging (e.g. via logger command)
7:$ModLoad imjournal # provides access to the systemd journal
18:$WorkDirectory /var/lib/rsyslog
20:$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
25:$IncludeConfig /etc/rsyslog.d/*.conf
28:$OmitLocalLogging on
30:$IMJournalStateFile imjournal.state
37:*.info;mail.none;authpriv.none;cron.none                /var/log/messages
39:authpriv.*                                              /var/log/secure
41:mail.*                                                  -/var/log/maillog
43:cron.*                                                  /var/log/cron
45:*.emerg                                                 :omusrmsg:*
47:uucp,news.crit                                          /var/log/spooler
49:local7.*                                                /var/log/boot.log

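-v: invert the match. Because -n is given to the second grep, the line numbers refer to the already filtered stream (with blank lines removed), not to the original file.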
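A small sketch of that numbering difference on a hypothetical six-line config. Passing both patterns to a single grep with two -e options keeps the original line numbers, because -v then drops lines that match either pattern.

```bash
#!/usr/bin/env bash
# Hypothetical config: lines 1 and 5 are comments, lines 2 and 4 are blank.
conf=$'# header\n\n$ModLoad imuxsock\n\n# another comment\n*.info /var/log/messages'

# As in the session above: -n numbers the stream AFTER the blank lines are gone.
filtered=$(printf '%s\n' "$conf" | grep -v '^$' | grep -vn '^#')

# One grep, two patterns: numbers refer to the original lines.
original=$(printf '%s\n' "$conf" | grep -vn -e '^$' -e '^#')

printf '%s\n---\n%s\n' "$filtered" "$original"
```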

Example 4: Any single character . and the repetition character *

The * in a regular expression is not the same as the * in bash: it is not a wildcard that matches anything on its own.

. (period): means "exactly one arbitrary character";

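* (asterisk): means "repeat the preceding character 0 or more times"; it is a combining form that only has meaning together with the character before it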

Any single character: find g..g (a g, any two characters, then another g)

[root@clay ~]# grep -n 'g..g' regular_express.txt 
18:google is the best tools for search keyword.

Match the preceding character 0 or more times

[root@clay ~]# grep -n 'o*' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
5:However, this dress is about $ 3183 dollars.
6:GNU is free air not free beer.
7:Her hair is very beauty.
8:I can't finish the test.
9:Oh!The soup taste good.
10:motorcycle is cheap than cat .
11:This window is clear.
12:the symbol '*' is represented as start.
13:Oh!     My god!
14:The gd software is a library for drafting programs.
15:You are the best is mean you are the no.1.
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
21:# I am VBird
22:

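Because * matches the preceding character 0 or more times, 'o*' is satisfied even by zero o's, so every line, including the empty one, is printed.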

oo* means one o followed by zero or more o's, i.e. at least one o:

[root@clay ~]# grep -n 'oo*' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
5:However, this dress is about $ 3183 dollars.
6:GNU is free air not free beer.
9:Oh!The soup taste good.
10:motorcycle is cheap than cat .
11:This window is clear.
12:the symbol '*' is represented as start.
13:Oh!     My god!
14:The gd software is a library for drafting programs.
15:You are the best is mean you are the no.1.
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.

Going one step further, ooo* means oo followed by zero or more o's, i.e. at least two consecutive o's:

[root@clay ~]# grep -n 'ooo*' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh!The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!
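The three patterns can be compared by counting matching lines on a hypothetical input that ends with an empty line; a minimal sketch:

```bash
#!/usr/bin/env bash
# Hypothetical input: four lines, the last one empty.
lines() { printf '%s\n' google hello sky ''; }

zero_or_more=$(lines | grep -c 'o*')    # every line matches, even the empty one
at_least_one=$(lines | grep -c 'oo*')   # google, hello
at_least_two=$(lines | grep -c 'ooo*')  # google only

echo "o*: $zero_or_more  oo*: $at_least_one  ooo*: $at_least_two"
```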

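To find a string that starts with g and ends with g, g*g clearly won't work: g* can match zero g's, so g*g matches any line containing a single g. Combining . with * gives g.*g, where .* means "any character, repeated zero or more times":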

[root@clay ~]# grep -n 'g.*g' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
14:The gd software is a library for drafting programs.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
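A sketch comparing g*g, g.*g and g..g on hypothetical words, plus one more detail worth knowing: .* is greedy, so with -o the match extends to the last g on the line.

```bash
#!/usr/bin/env bash
export LC_ALL=C
lines() { printf '%s\n' gag 'go! go!' dog sky google; }   # hypothetical words

g_star_g=$(lines | grep -c 'g*g')   # any line with at least one g: 4
g_dot_g=$(lines | grep -c 'g.*g')   # a g, anything, then a later g: 3
g_two_g=$(lines | grep -c 'g..g')   # exactly two characters between the g's: 1

# Greedy: the match runs to the LAST g on the line.
greedy=$(printf '%s\n' "go! go! Let's go." | grep -o 'g.*g')

echo "$g_star_g $g_dot_g $g_two_g [$greedy]"
```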

Example 5: Limit the repetition range with {}

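To limit how many times a character repeats, use {}. In basic regular expressions the braces must be escaped as \{ and \}; unescaped braces are ordinary characters. Suppose we want to find two consecutive o's (lines with longer runs also match, since they contain two o's in a row):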

[root@clay ~]# grep -n  'o\{2\}' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh!The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!

g followed by 2 to 5 o's and then another g

[root@clay ~]# grep -n 'go\{2,5\}g' regular_express.txt 
18:google is the best tools for search keyword.

For two or more o's between the g's there are two ways:

[root@clay ~]# grep -n 'gooo*g' regular_express.txt 
18:google is the best tools for search keyword.
19:goooooogle yes!
[root@clay ~]# grep -n 'go\{2,\}g' regular_express.txt 
18:google is the best tools for search keyword.
19:goooooogle yes!
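A sketch of the interval forms on hypothetical words with 1, 2 and 6 o's. The last line uses grep -E, where (as an extended regular expression) the braces are written without backslashes.

```bash
#!/usr/bin/env bash
words() { printf '%s\n' gog goog goooooog; }   # 1, 2 and 6 o's

bounded=$(words | grep 'go\{2,5\}g')   # 2 to 5 o's: goog only
open_bre=$(words | grep 'go\{2,\}g')   # 2 or more o's
star_form=$(words | grep 'gooo*g')     # the same set, written with *
ere=$(words | grep -E 'go{2,}g')       # extended syntax: no backslashes

printf '%s\n---\n%s\n' "$bounded" "$open_bre"
```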

Summary of basic regular expression characters

[Image: summary table of basic regular expression characters]

Exercise: use ls -l with grep to find the files under /etc/ whose type is symbolic link

[root@clay ~]# ls -l /etc/ | grep -n '^l' 
39:lrwxrwxrwx.  1 root root     56 Jul 12 17:51 favicon.png -> /usr/share/icons/hicolor/16x16/apps/fedora-logo-icon.png
52:lrwxrwxrwx.  1 root root     22 Jul 12 17:51 grub2.cfg -> ../boot/grub2/grub.cfg
62:lrwxrwxrwx.  1 root root     11 Jul 12 17:51 init.d -> rc.d/init.d
79:lrwxrwxrwx.  1 root root     35 Jul 12 17:53 localtime -> ../usr/share/zoneinfo/Asia/Shanghai
92:lrwxrwxrwx.  1 root root     17 Jul 12 17:50 mtab -> /proc/self/mounts
101:lrwxrwxrwx.  1 root root     21 Jul 12 17:50 os-release -> ../usr/lib/os-release
119:lrwxrwxrwx.  1 root root     10 Jul 12 17:51 rc0.d -> rc.d/rc0.d
120:lrwxrwxrwx.  1 root root     10 Jul 12 17:51 rc1.d -> rc.d/rc1.d
121:lrwxrwxrwx.  1 root root     10 Jul 12 17:51 rc2.d -> rc.d/rc2.d
122:lrwxrwxrwx.  1 root root     10 Jul 12 17:51 rc3.d -> rc.d/rc3.d
123:lrwxrwxrwx.  1 root root     10 Jul 12 17:51 rc4.d -> rc.d/rc4.d
124:lrwxrwxrwx.  1 root root     10 Jul 12 17:51 rc5.d -> rc.d/rc5.d
125:lrwxrwxrwx.  1 root root     10 Jul 12 17:51 rc6.d -> rc.d/rc6.d
127:lrwxrwxrwx.  1 root root     13 Jul 12 17:51 rc.local -> rc.d/rc.local
128:lrwxrwxrwx.  1 root root     14 Jul 12 17:50 redhat-release -> centos-release
162:lrwxrwxrwx.  1 root root     14 Jul 12 17:50 system-release -> centos-release
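The same idea can be tried in a throwaway directory, so the result does not depend on what /etc contains. This is a sketch with a hypothetical layout (one regular file, one symlink); the first character of each ls -l line is the file type. find -type l is shown as an alternative that does not parse ls output.

```bash
#!/usr/bin/env bash
export LC_ALL=C
tmp=$(mktemp -d)                    # scratch directory
touch "$tmp/regular.txt"
ln -s regular.txt "$tmp/link.txt"   # relative symbolic link

link_count=$(ls -l "$tmp" | grep -c '^l')   # 'l' = symbolic link
file_count=$(ls -l "$tmp" | grep -c '^-')   # '-' = regular file
find_links=$(find "$tmp" -type l | grep -c '')

rm -rf "$tmp"
echo "links=$link_count files=$file_count find=$find_links"
```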

The sed tool

[root@clay ~]# sed [-nefri] [action]

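-n: quiet (silent) mode. Normally sed prints every line it reads (from STDIN or a file) to the screen. With -n, only the lines explicitly output by a sed action (such as p) are shown.

-e: give the sed action directly on the command line

-f: read the sed actions from a file; -f filename runs the actions written in filename

-i: modify the file being read in place, instead of printing the result to the screen

-r: the actions use extended regular expression syntax (the default is basic regular expression syntax)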


Action syntax: [n1[,n2]]function

n1, n2: optional; they select the lines the action applies to. For example, if the action should run on lines 10 through 20, write "10,20[action]".

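function can be one of the following:
a: append. a is followed by text, which appears on a new line after the current line.

c: change. c is followed by text that replaces the lines between n1 and n2.

d: delete. Since it simply deletes, d is usually not followed by anything.

i: insert. i is followed by text, which appears on a new line before the current line.

p: print the selected data. p is usually used together with sed -n.

s: substitute text directly. s is usually combined with regular expressions, e.g. 1,20s/old/new/g.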
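All six functions on seq output, as a compact sketch. It assumes GNU sed, since one-line text for a/i/c ("2a hello") is a GNU extension. The doubled case shows why p is paired with -n: without it, the selected line is printed twice.

```bash
#!/usr/bin/env bash
# Assumes GNU sed.
printed=$(seq 1 5 | sed -n '2,3p')     # only lines 2-3
doubled=$(seq 1 2 | sed '1p')          # no -n: line 1 appears twice
deleted=$(seq 1 5 | sed '2,4d')        # everything except lines 2-4
appended=$(seq 1 3 | sed '2a hello')   # new line after line 2
inserted=$(seq 1 3 | sed '2i hello')   # new line before line 2
changed=$(seq 1 5 | sed '2,4c hello')  # lines 2-4 replaced by one line
substituted=$(echo 'old and old' | sed 's/old/new/g')  # g = every match on the line

echo "$substituted"
```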

Adding and deleting by line

Example 1: List /etc/passwd with line numbers and delete lines 2-5

[root@clay ~]# nl /etc/passwd | sed '2,5d'
     1	root:x:0:0:root:/root:/bin/bash
     6	sync:x:5:0:sync:/sbin:/bin/sync
     7	shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
     8	halt:x:7:0:halt:/sbin:/sbin/halt
     9	mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
    10	operator:x:11:0:operator:/root:/sbin/nologin
    11	games:x:12:100:games:/usr/games:/sbin/nologin
    12	ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
    13	nobody:x:99:99:Nobody:/:/sbin/nologin
    14	systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
    15	dbus:x:81:81:System message bus:/:/sbin/nologin
    16	polkitd:x:999:998:User for polkitd:/:/sbin/nologin
    17	tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
    18	sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
    19	postfix:x:89:89::/var/spool/postfix:/sbin/nologi

Delete from line 2 to the last line ($ stands for the last line)

[root@clay ~]# nl /etc/passwd | sed '2,$d'
     1	root:x:0:0:root:/root:/bin/bash

Example 2: Insert and append lines (i inserts before line 2, a appends after it)

[root@clay ~]# nl /etc/passwd | sed '2i helloworld'
     1	root:x:0:0:root:/root:/bin/bash
helloworld
     2	bin:x:1:1:bin:/bin:/sbin/nologin
     
[root@clay ~]# nl /etc/passwd | sed '2a helloworld'
     1	root:x:0:0:root:/root:/bin/bash
     2	bin:x:1:1:bin:/bin:/sbin/nologin
helloworld

Example 3: After line 2, add two lines, e.g. "helloworld" and "flyhigh" (every added line except the last must end with a backslash \)

[root@clay ~]# nl /etc/passwd | sed '2a helloworld \
> flyhigh'
     1	root:x:0:0:root:/root:/bin/bash
     2	bin:x:1:1:bin:/bin:/sbin/nologin
helloworld 
flyhigh

Example 4: Replace by line (lines 2-5 become a single "hello world" line)

[root@clay ~]# nl /etc/passwd | sed '2,5c hello world'
     1	root:x:0:0:root:/root:/bin/bash
hello world
     6	sync:x:5:0:sync:/sbin:/bin/sync

Example 5: List only lines 2-5 of /etc/passwd

[root@clay ~]# cat -n /etc/passwd | sed -n '2,5p'
     2	bin:x:1:1:bin:/bin:/sbin/nologin
     3	daemon:x:2:2:daemon:/sbin:/sbin/nologin
     4	adm:x:3:4:adm:/var/adm:/sbin/nologin
     5	lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

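Searching and replacing part of a line

The s command has the form sed 's/pattern/replacement/g'; the trailing g replaces every match on the line instead of only the first. The session below extracts the IP address of ens33 step by step: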

[root@clay ~]# ip a|grep '^.*inet .*ens33'
    inet 10.0.0.202/24 brd 10.0.0.255 scope global noprefixroute ens33
[root@clay ~]# ip a|grep '^.*inet .*ens33' | sed 's/^.*inet/ //g'
sed: -e expression #1, char 13: unknown option to `s'
[root@clay ~]# ip a|grep '^.*inet .*ens33' | sed 's/^.*inet//g'
 10.0.0.202/24 brd 10.0.0.255 scope global noprefixroute ens33
[root@clay ~]# ip a|grep '^.*inet .*ens33' | sed 's/^.*inet //g'
10.0.0.202/24 brd 10.0.0.255 scope global noprefixroute ens33
[root@clay ~]# ip a|grep '^.*inet .*ens33' | sed 's/^.*inet //g' | sed 's/b.*3'
sed: -e expression #1, char 6: unterminated `s' command
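[root@clay ~]# ip a|grep '^.*inet .*ens33' | sed 's/^.*inet //g' | sed 's/b.*3//g'
10.0.0.202/24 

The two errors above are typical: in 's/^.*inet/ //g' the extra / ends the replacement early, so sed reads the leftover / as an unknown flag; 's/b.*3' lacks the replacement and the closing /, so the s command is unterminated.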
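The same extraction can be written as a single sed call. This sketch feeds a copy of the ip a line from the session above instead of running ip, and unlike 's/b.*3//g' it also drops the trailing space. The second form keeps a captured group; \( \) and \1 are basic regular expression syntax.

```bash
#!/usr/bin/env bash
# Stand-in for one line of `ip a` output (copied from the session above).
line='    inet 10.0.0.202/24 brd 10.0.0.255 scope global noprefixroute ens33'

# Two -e expressions: strip everything up to "inet ", then from " brd" to the end.
addr=$(printf '%s\n' "$line" | sed -e 's/^.*inet //' -e 's/ brd.*$//')

# Capture the first non-space run after "inet " and print only that.
addr2=$(printf '%s\n' "$line" | sed -n 's/^.*inet \([^ ]*\) .*$/\1/p')

echo "[$addr] [$addr2]"
```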

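Next, show only the MAN settings in /etc/man_db.conf: keep the lines containing MAN, blank out the comment lines, then delete the resulting empty lines.

[root@clay ~]# cat /etc/man_db.conf | grep 'MAN'|sed 's/^#.*$//g' |sed '/^$/d'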
MANDATORY_MANPATH			/usr/man
MANDATORY_MANPATH			/usr/share/man
MANDATORY_MANPATH			/usr/local/share/man
MANPATH_MAP	/bin			/usr/share/man
MANPATH_MAP	/usr/bin		/usr/share/man
MANPATH_MAP	/sbin			/usr/share/man
MANPATH_MAP	/usr/sbin		/usr/share/man
MANPATH_MAP	/usr/local/bin		/usr/local/man
MANPATH_MAP	/usr/local/bin		/usr/local/share/man
MANPATH_MAP	/usr/local/sbin		/usr/local/man
MANPATH_MAP	/usr/local/sbin		/usr/local/share/man
MANPATH_MAP	/usr/X11R6/bin		/usr/X11R6/man
MANPATH_MAP	/usr/bin/X11		/usr/X11R6/man
MANPATH_MAP	/usr/games		/usr/share/man
MANPATH_MAP	/opt/bin		/opt/man
MANPATH_MAP	/opt/sbin		/opt/man
MANDB_MAP	/usr/man		/var/cache/man/fsstnd
MANDB_MAP	/usr/share/man		/var/cache/man
MANDB_MAP	/usr/local/man		/var/cache/man/oldlocal
MANDB_MAP	/usr/local/share/man	/var/cache/man/local
MANDB_MAP	/usr/X11R6/man		/var/cache/man/X11R6
MANDB_MAP	/opt/man		/var/cache/man/opt
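The grep plus two sed stages can also be collapsed into one sed call with -n, deleting comment lines and printing the MAN lines. A sketch on a hypothetical excerpt (not the real man_db.conf), checking that both forms agree:

```bash
#!/usr/bin/env bash
# Hypothetical excerpt: two comment lines, one unrelated line, two settings.
conf=$'# MANPATH comment\nMANDATORY_MANPATH\t/usr/man\nSECTION\t1 n l 8\n#MANPATH_MAP\t/old\nMANPATH_MAP\t/bin\t/usr/share/man'

# The pipeline from the session above.
pipeline=$(printf '%s\n' "$conf" | grep 'MAN' | sed 's/^#.*$//g' | sed '/^$/d')

# One sed: d ends the cycle for comment lines; p prints the remaining MAN lines.
single=$(printf '%s\n' "$conf" | sed -n -e '/^#/d' -e '/MAN/p')

printf '%s\n' "$single"
```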

Modifying file contents directly
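A minimal sketch of -i on a temporary file, assuming GNU sed (where a backup suffix can be attached directly, as in -i.bak). With -i, nothing is printed; the file itself is rewritten.

```bash
#!/usr/bin/env bash
# Assumes GNU sed.
tmp=$(mktemp)
printf '%s\n' 'line one.' 'line two.' '# comment' > "$tmp"

sed -i 's/\.$/!/' "$tmp"             # replace a final '.' with '!' in the file itself
sed -i '$a appended by sed' "$tmp"   # $ = last line; append a new line after it
sed -i.bak '/^#/d' "$tmp"            # delete comment lines, keep the previous version as .bak

result=$(cat "$tmp")
backup_lines=$(grep -c '' "$tmp.bak")
rm -f "$tmp" "$tmp.bak"

printf '%s\n' "$result"
```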

Extended regular expressions

[Image: table of extended regular expression characters]
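A sketch of common extended operators with grep -E (egrep is an older name for the same mode), on hypothetical words. The last line shows that GNU grep also accepts \+ in basic syntax; that is a GNU extension, not POSIX.

```bash
#!/usr/bin/env bash
export LC_ALL=C
sample=$'god\ngood\ngd\nglad\ngogo'   # hypothetical words

plus=$(printf '%s\n' "$sample" | grep -E 'go+d')       # one or more o: god, good
question=$(printf '%s\n' "$sample" | grep -E 'go?d')   # zero or one o: god, gd
either=$(printf '%s\n' "$sample" | grep -E 'gd|glad')  # alternation
group=$(printf '%s\n' "$sample" | grep -E '^(go)+$')   # the group "go" repeated
gnu_bre=$(printf '%s\n' "$sample" | grep 'go\+d')      # GNU-only basic-syntax form

printf '%s\n' "$plus"
```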

File formatting and related processing

Formatted printing: printf

[root@clay ~]# printf '%s\t %s\t %s\t %s\t %s\t \n' $(cat printf.txt)
name	 Chinese	 English	 Math	 Average	 
DmTsai	 80	 60	 92	 77.33	 
VBird	 75	 55	 80	 70.00	 
Ken	 60	 90	 70	 73.33
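The command above works because printf reuses its format string until all arguments are consumed, so five %s fields print one row per five words. A sketch of that reuse plus width and precision, with hypothetical values (77.333 is made up to show rounding); LC_ALL=C keeps the decimal separator a period.

```bash
#!/usr/bin/env bash
export LC_ALL=C

# Two conversions, four arguments: the format is applied twice.
reused=$(printf '%s-%s\n' a b c d)

# %-8s: left-justify in 8 columns; %5s: right-justify in 5; %8.2f: 2 decimals in 8 columns.
table=$(printf '%-8s %5s %8.2f\n' DmTsai 80 77.333 VBird 75 70)
first_row=${table%%$'\n'*}   # everything before the first newline

printf '%s\n' "$table"
```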