Time for the Three Musketeers to Get Together

The text-processing tool sed

Introduction

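sed is a non-interactive stream editor. It processes the text flowing through it one line at a time and prints the result to standard output. To keep the processed result, redirect the output to a file. By default sed changes only the text passing through it and leaves the source file untouched; to modify the source file as well, add the -i option.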
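
A minimal sketch of the difference between redirecting the output and editing in place. It assumes GNU sed (BSD/macOS sed expects an argument after -i), and the file names demo.txt and out.txt are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
tmp=$(mktemp -d)
printf 'hello world\n' > "$tmp/demo.txt"

# Without -i the result only goes to stdout; redirect it to keep it.
sed 's/hello/HELLO/' "$tmp/demo.txt" > "$tmp/out.txt"
original_after_redirect=$(cat "$tmp/demo.txt")   # source file is unchanged
saved_copy=$(cat "$tmp/out.txt")                 # redirected result

# With -i (GNU sed syntax, an assumption here) the source file itself is rewritten.
sed -i 's/hello/HELLO/' "$tmp/demo.txt"
original_after_inplace=$(cat "$tmp/demo.txt")

echo "$original_after_redirect"
echo "$saved_copy"
echo "$original_after_inplace"
rm -rf "$tmp"
```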

Syntax

sed [option] 'command' file

To make the examples concrete, create a file named sed.txt with the following content:

this is line 1, this is First line
this is line 2, the Second line, Empty line followed

this is line 4, this is Third line
this is line 5, this is Fifth line

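Printing (viewing)

Use the p command together with the -n option to print. -n suppresses the automatic printing of lines that were not selected; without -n, every line is printed and the selected lines are printed twice.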

[root@hangzhou01 ~]# sed -n '1p' sed.txt 
this is line 1, this is First line

[root@hangzhou01 ~]# sed -n 's/5/A/p' sed.txt 
this is line A, this is Fifth line
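
To see the effect of -n, the sketch below runs the same p command with and without it. The three-line input is made up and fed through a pipe rather than read from sed.txt.

```shell
# Three-line sample input, generated inline.
input=$(printf 'one\ntwo\nthree\n')

# With -n only the line selected by p is printed.
with_n=$(printf '%s\n' "$input" | sed -n '2p')

# Without -n every line is auto-printed, and line 2 is printed again by p.
without_n=$(printf '%s\n' "$input" | sed '2p')

echo "with -n:"
echo "$with_n"
echo "without -n:"
echo "$without_n"
```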

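Inserting (adding)

Use the i or a command to insert text: i inserts before the matched line, a appends after it.

##The 2 in the command means line 2: insert the text before line 2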
[root@hangzhou01 ~]# sed '2i insert' sed.txt 
this is line 1, this is First line
insert
this is line 2, the Second line, Empty line followed

this is line 4, this is Third line
this is line 5, this is Fifth line
##The 2 in the command means line 2: append the text after line 2
[root@hangzhou01 ~]# sed '2a insert' sed.txt 
this is line 1, this is First line
this is line 2, the Second line, Empty line followed
insert

this is line 4, this is Third line
this is line 5, this is Fifth line

##Find the line containing Second and insert the text before it
[root@hangzhou01 ~]# sed '/Second/i insert' sed.txt 
this is line 1, this is First line
insert
this is line 2, the Second line, Empty line followed

this is line 4, this is Third line
this is line 5, this is Fifth line

The r command reads a given file and inserts its contents after the matched line.

Create a text file named part.txt:
I am in beijing.
where are you?

##Insert the contents of part.txt after the empty line in sed.txt
[root@hangzhou01 ~]# sed '/^$/r part.txt' sed.txt 
this is line 1, this is First line
this is line 2, the Second line, Empty line followed

I am in beijing.
where are you?
this is line 4, this is Third line
this is line 5, this is Fifth line
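
The addresses used with i and a can also be the first line (1) or the last line ($), which gives a quick way to wrap a stream in a header and footer. This is only a sketch: the one-line "i text" / "a text" form is a GNU sed extension, and the input is invented.

```shell
# Two-line sample input, generated inline.
body=$(printf 'alpha\nbeta\n')

# 1i inserts before the first line; $a appends after the last line.
# Assumes GNU sed, which accepts the text on the same line as i or a.
framed=$(printf '%s\n' "$body" | sed -e '1i header' -e '$a footer')

echo "$framed"
```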

Deleting

The d command deletes the specified lines.

##Delete a single line
[root@hangzhou01 ~]# sed '1d' sed.txt 
this is line 2, the Second line, Empty line followed

this is line 4, this is Third line
this is line 5, this is Fifth line
##Delete lines n through m
[root@hangzhou01 ~]# sed '1,3d' sed.txt 
this is line 4, this is Third line
this is line 5, this is Fifth line
##Keep only the specified lines
[root@hangzhou01 ~]# sed '1!d' sed.txt 
this is line 1, this is First line
[root@hangzhou01 ~]# sed '1,4!d' sed.txt 
this is line 1, this is First line
this is line 2, the Second line, Empty line followed

this is line 4, this is Third line
##Delete lines containing a given keyword
[root@hangzhou01 ~]# sed '/Empty/d' sed.txt 
this is line 1, this is First line

this is line 4, this is Third line
this is line 5, this is Fifth line
##Delete empty lines
[root@hangzhou01 ~]# sed '/^$/d' sed.txt 
this is line 1, this is First line
this is line 2, the Second line, Empty line followed
this is line 4, this is Third line
this is line 5, this is Fifth line
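
One caveat with /^$/d is that it matches only lines with nothing on them at all. The sketch below, on made-up input, contrasts it with a pattern that also removes lines holding only spaces or tabs.

```shell
# Line 2 is truly empty; line 4 contains only a space and a tab.
input=$(printf 'a\n\nb\n \t\nc\n')

# /^$/d removes only the truly empty line.
strict=$(printf '%s\n' "$input" | sed '/^$/d')

# The POSIX class [[:space:]] also catches whitespace-only lines.
loose=$(printf '%s\n' "$input" | sed '/^[[:space:]]*$/d')

echo "$strict" | wc -l
echo "$loose" | wc -l
```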

Substituting

The s command replaces matched text with new text.

##Replace line with LINE, only the first match on each line
[root@hangzhou01 ~]# sed 's/line/LINE/' sed.txt 
this is LINE 1, this is First line
this is LINE 2, the Second line, Empty line followed

this is LINE 4, this is Third line
this is LINE 5, this is Fifth line
##Replace line with LINE, only the second match on each line
[root@hangzhou01 ~]# sed 's/line/LINE/2' sed.txt 
this is line 1, this is First LINE
this is line 2, the Second LINE, Empty line followed

this is line 4, this is Third LINE
this is line 5, this is Fifth LINE
##Replace line with LINE, every match
[root@hangzhou01 ~]# sed 's/line/LINE/g' sed.txt 
this is LINE 1, this is First LINE
this is LINE 2, the Second LINE, Empty LINE followed

this is LINE 4, this is Third LINE
this is LINE 5, this is Fifth LINE
##Replace this with that only where it appears at the start of a line
[root@hangzhou01 ~]# sed 's/^this/that/' sed.txt 
that is line 1, this is First line
that is line 2, the Second line, Empty line followed

that is line 4, this is Third line
that is line 5, this is Fifth line
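
The flag variants of s are easier to compare on a single short line. The sketch below does that and also shows that the delimiter does not have to be /, which helps when the text contains slashes. The sample strings are invented.

```shell
line='line line line'

# No flag: first match only. A number: that match only. g: every match.
first=$(printf '%s\n' "$line" | sed 's/line/LINE/')
second=$(printf '%s\n' "$line" | sed 's/line/LINE/2')
all=$(printf '%s\n' "$line" | sed 's/line/LINE/g')

# Using # as the delimiter avoids escaping the slashes in a path.
path=$(printf '/usr/local/bin\n' | sed 's#/usr/local#/opt#')

printf '%s\n' "$first" "$second" "$all" "$path"
```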

The c command replaces all of the addressed lines with the given text.

[root@hangzhou01 ~]# sed '2c China' sed.txt 
this is line 1, this is First line
China

this is line 4, this is Third line
this is line 5, this is Fifth line
[root@hangzhou01 ~]# sed '2,$c China' sed.txt 
this is line 1, this is First line
China

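The y command transliterates one set of characters into another. The source list and the target list must be the same length.

##Convert 1 to A, 2 to B, 3 to C, and 4 to D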
[root@hangzhou01 ~]# sed 'y/1234/ABCD/' sed.txt 
this is line A, this is First line
this is line B, the Second line, Empty line followed

this is line D, this is Third line
this is line 5, this is Fifth line
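
A small sketch of the position-by-position mapping that y performs, plus a check that sed refuses lists of unequal length. The input strings are made up, and the variable names are only for illustration.

```shell
# Each character in the first list maps to the character at the same
# position in the second list.
upper=$(printf 'abc cab\n' | sed 'y/abc/ABC/')

# Lists of different lengths are rejected: sed exits with a non-zero status.
if printf 'abc\n' | sed 'y/abc/AB/' >/dev/null 2>&1; then
  mismatch=accepted
else
  mismatch=rejected
fi

echo "$upper"
echo "$mismatch"
```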

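Using a script file

Use the -f option to specify a sed script file. A series of operations can be put in the file and then "loaded" together.

Create a script named operation.txt with the following content: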
s/this/that/
/^$/d

[root@hangzhou01 ~]# sed -f operation.txt sed.txt 
that is line 1, this is First line
that is line 2, the Second line, Empty line followed
that is line 4, this is Third line
that is line 5, this is Fifth line
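
For short scripts the same commands can be passed inline with repeated -e options instead of a file. The sketch below checks that both routes give the same result; the file names in.txt and ops.sed are hypothetical.

```shell
tmp=$(mktemp -d)
printf 'this is this\n\nthis too\n' > "$tmp/in.txt"

# Script file: one sed command per line.
printf 's/this/that/\n/^$/d\n' > "$tmp/ops.sed"
from_file=$(sed -f "$tmp/ops.sed" "$tmp/in.txt")

# The same two commands given inline.
inline=$(sed -e 's/this/that/' -e '/^$/d' "$tmp/in.txt")

echo "$from_file"
rm -rf "$tmp"
```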

The text-processing tool awk

Introduction

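awk is a column-oriented text-processing tool. It reads text line by line and treats each line as a record. Each record is split into fields by separators made up of spaces, tabs, or any run of the two, and awk can then output the individual fields. In other words, each run of non-blank characters is a field; from left to right they are the first field, the second field, and so on.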
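
The default splitting can be checked on a record that mixes spaces and tabs. In this sketch the record is invented; NF is awk's built-in count of fields in the current record.

```shell
# Leading blanks, then fields separated by mixed runs of spaces and tabs.
record=$(printf '  a \t b\t\tc  ')

# Runs of blanks count as one separator; leading and trailing blanks are ignored.
count=$(printf '%s\n' "$record" | awk '{print NF}')
second=$(printf '%s\n' "$record" | awk '{print $2}')

echo "$count"
echo "$second"
```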

Syntax

awk 'pattern {action}' file
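
In awk the pattern is written outside the braces and the action inside them; when the pattern is left out, the action runs for every record. A minimal sketch on a made-up three-line roster:

```shell
# Hypothetical roster: name, gender, age.
roster=$(printf 'ann Female 25\nbob Male 30\ncat Female 41\n')

# The pattern selects records; the action in braces runs for each selected record.
females=$(printf '%s\n' "$roster" | awk '$2 == "Female" {print $1}')

# A pattern can also be a numeric comparison on a field.
over_28=$(printf '%s\n' "$roster" | awk '$3 > 28 {print $1, $3}')

echo "$females"
echo "$over_28"
```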

To make the examples concrete, create a file named awk.txt:

john.wang  Male    30 021-1111
lucy.wang  Female  25 021-2222
jack.wang  Male    35 021-3333
lily.wang  Female  20 021-4444  ShangHai

Printing specific fields

$0 stands for the whole record (all fields); $n stands for the n-th field (n > 0).

[root@hangzhou01 ~]# awk '{print $1,$3}' awk.txt 
john.wang 30
lucy.wang 25
jack.wang 35
lily.wang 20
[root@hangzhou01 ~]# awk '{print $0}' awk.txt 
john.wang  Male    30 021-1111
lucy.wang  Female  25 021-2222
jack.wang  Male    35 021-3333
lily.wang  Female  20 021-4444	ShangHai

$NF is the last field, $(NF-1) the second-to-last, $(NF-2) the third-to-last, and so on.

[root@hangzhou01 ~]# awk '{print $NF}' awk.txt 
021-1111
021-2222
021-3333
ShangHai
[root@hangzhou01 ~]# awk '{print $(NF-1)}' awk.txt 
30
25
35
021-4444
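
Because NF itself is just the number of fields, $NF follows the length of each record. The sketch below prints NF next to $NF and $(NF-1) for two invented records of different lengths.

```shell
# Two records with 3 and 4 fields.
data=$(printf 'a b c\nd e f g\n')

# NF is the field count; $NF and $(NF-1) are the last two fields.
counts=$(printf '%s\n' "$data" | awk '{print NF, $NF, $(NF-1)}')

echo "$counts"
```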

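Specifying the field separator

By default, whitespace is the separator used to split a record into fields; the -F option sets a different separator.

##Use the default separator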
[root@hangzhou01 ~]# awk '{print $1,$2}' awk.txt 
john.wang Male
lucy.wang Female
jack.wang Male
lily.wang Female
##Set the separator to .
[root@hangzhou01 ~]# awk -F. '{print $1,$2}' awk.txt 
john wang  Male    30 021-1111
lucy wang  Female  25 021-2222
jack wang  Male    35 021-3333
lily wang  Female  20 021-4444	ShangHai
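
A common use of -F is colon-separated data. The sketch below uses a made-up passwd-style record and also shows setting the separator through the built-in FS variable, which has the same effect.

```shell
# One passwd-style record with invented values.
entry='alice:x:1001:1001:Alice:/home/alice:/bin/bash'

# -F: splits on colons; $NF is then the login shell.
user_shell=$(printf '%s\n' "$entry" | awk -F: '{print $1, $NF}')

# Assigning FS with -v is equivalent to using -F.
user_home=$(printf '%s\n' "$entry" | awk -v FS=: '{print $6}')

echo "$user_shell"
echo "$user_home"
```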

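Printing a substring

Use the substr($1,2,4) function: $1 is the field to take the substring from, 2 is the position of the first character to take (counting from 1, not 0), and 4 is the number of characters to take (this argument is optional; if omitted, the substring runs to the end).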

[root@hangzhou01 ~]# awk '{print substr($1,2,4)}' awk.txt 
ohn.
ucy.
ack.
ily.
[root@hangzhou01 ~]# awk '{print substr($1,2)}' awk.txt 
ohn.wang
ucy.wang
ack.wang
ily.wang
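
Fixed offsets only work when every value has the same shape. As a sketch of a more flexible variant, substr can be combined with awk's index function, which returns the position of a substring, to split a name at the dot. The sample value mirrors the first column of awk.txt.

```shell
name='john.wang'

# Start at character 2 and take 4 characters.
mid=$(printf '%s\n' "$name" | awk '{print substr($1, 2, 4)}')

# index($1, ".") is the position of the dot; take everything before it.
first=$(printf '%s\n' "$name" | awk '{print substr($1, 1, index($1, ".") - 1)}')

# With the length omitted, substr runs to the end of the string.
last=$(printf '%s\n' "$name" | awk '{print substr($1, index($1, ".") + 1)}')

printf '%s\n' "$mid" "$first" "$last"
```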

Printing the string length of a field

Use the length($1) function to get the length; the field goes inside the parentheses.

[root@hangzhou01 ~]# awk '{print length($1)}' awk.txt 
9
9
9
9

printf

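Formatted output

%ns

s stands for a string. If n is less than or equal to the actual length of the string, the string is output at its actual length; otherwise the string is padded with spaces on the left until the output is n characters wide.

%ni

i stands for an integer; the same rule applies.

%n.mf

f stands for a floating-point number. If m is given, exactly m decimal places are output; if .m is omitted altogether, the default is 6 decimal places; if the dot is present but m is omitted, the default is 0 decimal places. n is the total output width: if the integer part, the decimal point, and the guaranteed decimal places together exceed n, the number is output in full with those decimal places; otherwise it is padded with spaces on the left until the output is n characters wide.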
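
The width and precision rules above can be tried directly in a BEGIN block, which runs without reading any input. In this sketch the values are arbitrary, and the square brackets are there only to make the padding visible.

```shell
# %ns: padded on the left to width 6; never truncated when n is too small.
s_pad=$(awk 'BEGIN { printf "[%6s]", "abc" }')
s_long=$(awk 'BEGIN { printf "[%2s]", "abc" }')

# %ni: same padding rule for integers.
i_pad=$(awk 'BEGIN { printf "[%5i]", 42 }')

# %f: 6 decimals by default; %8.2f: 2 decimals in a width of 8; %.f: 0 decimals.
f_default=$(awk 'BEGIN { printf "[%f]", 3.14159 }')
f_fixed=$(awk 'BEGIN { printf "[%8.2f]", 3.14159 }')
f_zero=$(awk 'BEGIN { printf "[%.f]", 3.14159 }')

# Width smaller than the formatted number: the number is printed in full.
f_wide=$(awk 'BEGIN { printf "[%3.2f]", 123.456 }')

printf '%s\n' "$s_pad" "$s_long" "$i_pad" "$f_default" "$f_fixed" "$f_zero" "$f_wide"
```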

grep

Introduction

Searches files for strings that match a given pattern.

Syntax

grep [option] something dir

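Common options

grep -w something dir   whole-word match
grep -v something dir   inverted match (print the lines that do not match)
grep -r something dir   recursive search
grep -Cn something dir  print n lines of context before and after each match; n can be 1, 2, 3...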
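
A short sketch that exercises the four options on a throwaway directory. The file names a.txt and sub/b.txt and their contents are invented for the example.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/sub"
printf 'cat\ncatalog\ndog\n' > "$tmp/a.txt"
printf 'one\ntwo\ncat\nfour\nfive\n' > "$tmp/sub/b.txt"

# -w: "catalog" contains cat but not as a whole word, so only "cat" matches.
whole_word=$(grep -w cat "$tmp/a.txt")

# -v: print the lines that do NOT contain cat.
inverted=$(grep -v cat "$tmp/a.txt")

# -r: descend into sub/ as well; count the matching lines across both files.
recursive_hits=$(grep -r cat "$tmp" | wc -l | tr -d ' ')

# -C1: one line of context before and after the match.
context=$(grep -C1 cat "$tmp/sub/b.txt")

printf '%s\n' "$whole_word" "$inverted" "$recursive_hits" "$context"
rm -rf "$tmp"
```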