10 - Linux Commands - Text Processing - 1

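Outline

Command	Main function	Typical use
`cut`	Extract columns or characters from text	Pulling specific fields such as IPs or user names out of logs
`wc`	Count lines, words, and characters	Counting log entries or lines of code
`sort`	Sort text	Ordering files by size or time; sorting with duplicates removed
`uniq`	Collapse repeated lines	Counting repeated log entries; deduplicating history
`tr`	Translate or delete characters	Stripping control characters; normalizing case
`grep`	Search text	Finding configuration lines that contain a keyword; filtering logs
`sed`	Substitute and edit text	Bulk-editing configuration files; deleting empty lines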

cut

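Parameters

Option	Purpose	When to use	Notes
-b	Select by byte	Single-byte characters (English letters, digits)	Can split multibyte characters such as Chinese and produce garbage; POSIX defines -n to avoid this, but GNU cut accepts -n and ignores it
-c	Select by character	Multibyte characters (Chinese, Japanese)	Each character counts as one position only on builds with multibyte support (some distributions patch this in); upstream GNU cut has historically treated -c the same as -b
-d	Set the field delimiter	Text not separated by tabs (such as CSV)	The default delimiter is Tab
-f	Select fields	Structured data (such as /etc/passwd)	Takes field numbers; usually combined with -d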

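Examples
- Cutting by byte (-b)
  
  Command	Meaning
  `cut -b 1-5 test.txt`	Extract bytes 1-5 of each line
  `cut -b 1,3,5 test.txt`	Extract bytes 1, 3, and 5 of each line
  `cut -b 3- test.txt`	Extract from byte 3 to the end of each line
- Cutting by character (-c)
  
  Command	Meaning
  `cut -c 2 test2.txt`	Extract the 2nd character of each line (for example the Chinese character “期”)
  `cut -c 1-3 test2.txt`	Extract characters 1-3 of each line
- Cutting by field (-f with -d)
  
  Command	Meaning
  `cut -d ':' -f 1 /etc/passwd`	Use ":" as the delimiter and extract field 1 (the user name)
  `cut -d ',' -f 1,3 data.csv`	Use "," as the delimiter and extract fields 1 and 3

Usage notes
- Range syntax: N-M (a closed range), N- (from N to the end of the line), and -M (from the start through M) are all supported.
- Multibyte characters: prefer -c for Chinese and similar text; -b can cut a character in half and produce garbage.
- Required option: exactly one of -b, -c, or -f must be given, otherwise cut exits with an error.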
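
The field and byte modes above can be tried without touching real files. The following is a minimal sketch; the `sample` variable is made-up data in the /etc/passwd format, not taken from an actual system.

```shell
# Hypothetical sample records in /etc/passwd format (an assumption for this demo).
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

# Field 1 (user name), with ":" as the delimiter.
users=$(printf '%s\n' "$sample" | cut -d ':' -f 1)
printf '%s\n' "$users"

# Fields 1 and 7; cut keeps the delimiter between the selected fields.
shells=$(printf '%s\n' "$sample" | cut -d ':' -f 1,7)
printf '%s\n' "$shells"

# Bytes 1-4 of each line; safe here because the data is plain ASCII.
prefix=$(printf '%s\n' "$sample" | cut -b 1-4)
printf '%s\n' "$prefix"
```

Note that the byte cut lands in the middle of the second record ("bin:"), which is why field mode is usually the better fit for delimited data.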

wc

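Parameters

Option	Long form	Description
-l	--lines	Count lines only (counts newline characters `\n`; empty lines are included)
-w	--words	Count words (strings separated by spaces, tabs, or newlines)
-c	--bytes	Count bytes (every byte counts, including spaces and newlines; a Chinese character usually takes several bytes)
-m	--chars	Count characters (unlike `-c`: in a UTF-8 locale one Chinese character counts as 1 character, while `-c` typically reports 3 bytes for it)
-L	--max-line-length	Print the length of the longest line (newline excluded)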

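Examples

Default counts (lines, words, bytes, file name)

[root@localhost opt]# wc passwd 
 20  28 879 passwd
Output format: lines, words, bytes, file name

Count lines only

[root@localhost opt]# wc -l passwd 
20 passwd
# If the last line has no trailing newline, the count is one lower than the number of visible lines

Count several files and show a total

[root@localhost opt]# wc -l regular_express.txt  passwd 
  25 regular_express.txt
  20 passwd
  45 total

Read standard input through a pipe

[root@localhost opt]# cat passwd |wc -w
28
# Counts the words in the text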
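
The remark about a missing trailing newline is easy to check on inline data. This is a small sketch; the strings are arbitrary test input, and `tr -d ' '` is only there because some wc implementations pad the number with spaces.

```shell
# wc -l counts newline characters, not visible lines.
with_newline=$(printf 'a\nb\n' | wc -l | tr -d ' ')
without_newline=$(printf 'a\nb' | wc -l | tr -d ' ')
echo "with trailing newline:    $with_newline"
echo "without trailing newline: $without_newline"

# Words are separated by spaces, tabs, or newlines.
words=$(printf 'one two\tthree\nfour\n' | wc -w | tr -d ' ')
echo "words: $words"
```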

sort

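Parameters

Option	Description	Example use
`-b`	Ignore leading blanks	Sorting numeric lines that start with spaces
`-d`	Dictionary order: only blanks, letters, and digits are compared	Sorting words while ignoring punctuation
`-f`	Ignore case	Sorting mixed-case text
`-g`	General numeric sort (floating point, including scientific notation)	Values such as 1.5e3
`-i`	Ignore nonprinting characters	Text containing control characters
`-M`	Month sort (such as Jan < Feb)	Sorting date strings by month name
`-n`	Numeric sort	Sorting a numeric column
`-r`	Reverse the result	Largest first
`-t'X'`	Set the field separator (such as `:` or `,`)	Files not separated by blanks
`-kM,N`	Sort key from field M through field N (such as `-k4,4`)	Sorting on exactly one column
`-u`	Sort and drop duplicates	Keeping unique lines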

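Examples

Basic sorting

[root@localhost opt]# sort passwd 
adm:x:3:4:adm:/var/adm:/sbin/nologin
bin:x:1:1:bin:/bin:/sbin/nologin
chrony:x:998:996::/var/lib/chrony:/sbin/nologin
# Lines are sorted alphabetically (a-z)

[root@localhost opt]# sort  -n 1.txt 
# 1.txt is any unsorted file whose lines start with numbers; -n sorts them from smallest to largest

By column and separator

# sort -t'SEP' -kSTART,ENDn -f FILE
-t'SEP': the separator between columns (a space, a colon, and so on).
-k4,4n: sort on column 4 numerically (the ,4 ends the key at column 4; n requests numeric comparison).
-f: ignore case differences when comparing.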

[root@localhost opt]# sort  -t':' -k4,4n  passwd 
sync:x:5:0:sync:/sbin:/bin/sync
....
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
...
games:x:12:100:games:/usr/games:/sbin/nologin
....
polkitd:x:999:998:User for polkitd:/:/sbin/nologin

Reverse order (key compared as text)

[root@localhost opt]# sort  -t':' -k4,4r  passwd 
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
...
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
....
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
...
root:x:0:0:root:/root:/bin/bash
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
sync:x:5:0:sync:/sbin:/bin/sync

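# Note: without n the key is compared as text, so this is reverse order by leading character (9 down to 0), not by numeric value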

Sort by numeric value (descending)

[root@localhost opt]# sort  -t':' -k4,4nr  passwd 
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
chrony:x:998:996::/var/lib/chrony:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
...
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
...
daemon:x:2:2:daemon:/sbin:/sbin/nologin
...
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
sync:x:5:0:sync:/sbin:/bin/sync
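
The difference between the text comparison and the numeric comparison shown above is easier to see on three lines. A minimal sketch follows; the `data` records are invented, and `LC_ALL=C` is set only to make the text ordering predictable.

```shell
# Hypothetical two-column records; column 2 holds the numbers to sort on.
data='a:10
b:9
c:100'

# Key compared as text: "10" < "100" < "9".
as_text=$(printf '%s\n' "$data" | LC_ALL=C sort -t ':' -k2,2)
printf 'text order:\n%s\n' "$as_text"

# Key compared as a number: 9 < 10 < 100.
as_number=$(printf '%s\n' "$data" | LC_ALL=C sort -t ':' -k2,2n)
printf 'numeric order:\n%s\n' "$as_number"

# Numeric and reversed: 100 > 10 > 9.
as_number_desc=$(printf '%s\n' "$data" | LC_ALL=C sort -t ':' -k2,2nr)
printf 'numeric, descending:\n%s\n' "$as_number_desc"
```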

Ignore leading blanks
```
[root@localhost opt]# sort -b data.txt
```
Sort by month
```
sort -M dates.txt
```

uniq

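Parameters

Option	Purpose	Example command	What it does
`-c`	Prefix each line with its count	`uniq -c test.txt`	Prints each line with the number of consecutive repeats
`-d`	Show repeated lines only (one per group)	`uniq -d test.txt`	Prints only lines that are repeated at least once, one line per group
`-u`	Show unique lines only	`uniq -u test.txt`	Prints only lines that are not repeated
`-D`	Show all repeated lines	`uniq -D test.txt`	Prints every line of every repeated group
`-f N`	Skip the first N fields	`uniq -f 1 test.txt`	Ignores the first field of each line when comparing (fields are separated by spaces or tabs)
`-s N`	Skip the first N characters	`uniq -s 3 test.txt`	Ignores the first 3 characters of each line when comparing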
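Examples

Count repeated lines and sort by count

[root@localhost opt]#  sort test.txt  | uniq -c | sort -nr
# Sort first (uniq only compares adjacent lines), count the repeats, then order by count, highest first

Find the most frequent IP addresses in a log

grep "GET" access.log  | cut -d ' ' -f 1 | sort | uniq -c | sort -nr | head -5
# Extract the IP column, count each distinct IP, and keep the 5 with the most requests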
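
Why the pipeline sorts before calling uniq can be shown with three lines. This is a sketch on invented addresses (the `log` variable is an assumption, not real log data).

```shell
# Hypothetical list of client IPs, with a duplicate that is not adjacent.
log='10.0.0.1
10.0.0.2
10.0.0.1'

# uniq alone only merges neighbouring duplicates, so nothing is removed here.
adjacent_only=$(printf '%s\n' "$log" | uniq | wc -l | tr -d ' ')

# After sorting, equal lines are adjacent and get merged.
after_sort=$(printf '%s\n' "$log" | sort | uniq | wc -l | tr -d ' ')

# Most frequent IP: count, order by count, take the first line, keep the IP column.
top_ip=$(printf '%s\n' "$log" | sort | uniq -c | sort -nr | head -n 1 | tr -s ' ' | cut -d ' ' -f 3)

echo "lines after uniq only:   $adjacent_only"
echo "lines after sort | uniq: $after_sort"
echo "most frequent IP:        $top_ip"
```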

tr

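Parameters

Option	Long form	Description
`-c`	`--complement`	Use the complement of SET1, that is, operate on characters not in SET1
`-d`	`--delete`	Delete input characters that are in SET1
`-s`	`--squeeze-repeats`	Squeeze each run of a repeated character into a single occurrence (the first one is kept)
`-t`	`--truncate-set1`	Truncate SET1 to the length of SET2 so that surplus characters are not translated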

Examples

Case conversion

[root@localhost opt]# echo "hello" | tr 'a-z' 'A-Z'
HELLO
[root@localhost opt]# echo "hello" | tr '[:lower:]' '[:upper:]'
HELLO

Delete specific characters

[root@localhost opt]# echo "helo 1230934435" | tr -d "0-9"
helo

Squeeze repeated characters

[root@localhost opt]# echo "helo 1124433335" | tr -s "0-9"
helo 12435
# 1124433335 --> 12435: each run of a repeated digit is squeezed to a single digit

Translate characters (one character at a time, not whole strings)

[root@localhost opt]# echo "helo 1124433335" | tr '112' 'abc'
helo bbc4433335
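
The result above follows from tr mapping single characters rather than substrings. A small sketch on a made-up word:

```shell
# SET1 'an' and SET2 'xy' are read position by position: a -> x, n -> y.
# The string "an" is never searched for as a unit.
mapped=$(echo "banana" | tr 'an' 'xy')
echo "$mapped"

# When a character appears twice in SET1, GNU tr uses its last mapping,
# which matches the transcript above (1 -> b, 2 -> c). Other tr
# implementations may behave differently, so treat this as GNU-specific.
echo "1122" | tr '112' 'abc'
```

For replacing whole strings, sed's s command is the right tool.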

Remove Windows line endings

[root@localhost opt]# tr -d '\r' < windows.txt  > linux.txt
# Deletes the `^M` control character (the carriage return in Windows line endings)

Format PATH

# Replace each : with a newline
[root@localhost opt]# echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin

[root@localhost opt]# echo "$PATH" | tr ":" "\n"
/usr/local/sbin
/usr/local/bin
/usr/sbin
/usr/bin
/root/bin

grep

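grep filters text using patterns written as basic regular expressions. Usage: grep -[acinvE] 'pattern' filename

Parameters

Option	Description
-m N	Stop after N matching lines
-v	Invert the match: select lines that do not contain the pattern
-i	Ignore case
-n	Prefix each matching line with its line number
-c	Print the number of matching lines
-o	Print only the matched part of each line
-q	Quiet mode: print nothing
-A N	Also print N lines after each match
-B N	Also print N lines before each match
-C N	Also print N lines before and after each match
-e	Give a pattern; repeat it to OR several patterns: grep -e 'cat' -e 'dog' file
-w	Match whole words only
-E	Use extended regular expressions (same as egrep)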
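Regular expressions

Shell wildcards (expanded by the shell in file names; these are not regular expressions)

Wildcard	Description
*	Matches zero or more characters
?	Matches any single character
~	Home directory of the current user
~mage	Home directory of user mage
~+	Current working directory
~-	Previous working directory
[0-9]	Any digit in the range
[a-z]	A lowercase letter
[A-Z]	An uppercase letter
[wang]	Any one character from the list
[^wang]	Any one character not in the list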
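
Wildcards and regular expressions look alike but are handled by different programs. The sketch below contrasts them in a scratch directory; the file names are invented for the demo.

```shell
# Create a scratch directory with three hypothetical files.
workdir=$(mktemp -d)
touch "$workdir/a1.txt" "$workdir/a2.txt" "$workdir/b.log"

# Wildcard: the shell expands a?.txt into matching file names before echo runs.
globbed=$(cd "$workdir" && echo a?.txt)
echo "glob:  $globbed"

# Regular expression: grep receives the quoted pattern and applies it to its input.
# Here "." is the regex "any character" and "\." is a literal dot.
regexed=$(ls "$workdir" | grep -c '^a.\.txt$')
echo "regex: $regexed matching names"

rm -r "$workdir"
```

Quoting grep patterns keeps the shell from expanding them as wildcards first.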

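Character matching in regular expressions

Expression	Description
.	Any single character
[]	Any single character inside the given set or range (same syntax as in the wildcard table)
[^]	Any single character outside the given set or range
[:digit:]	Any digit, equivalent to 0-9
[:lower:]	Any lowercase letter
[:upper:]	Any uppercase letter
[:alpha:]	Any letter, upper or lower case
[:alnum:]	Any digit or letter
[:blank:]	Horizontal whitespace (space and tab)
[:space:]	Horizontal or vertical whitespace
[:punct:]	Punctuation
[:print:]	Printable characters
[:cntrl:]	Control (nonprinting) characters
[:graph:]	Printable characters other than whitespace

The [:name:] classes only work inside a bracket expression, for example [[:digit:]].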

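Repetition: a quantifier follows the character it applies to and states how many times that character must occur. Plain grep uses basic regular expressions, where ?, +, and the braces must be written \?, \+, \{ \} (the first two are GNU extensions); with grep -E they are written as shown here.

Quantifier	Description
*	The preceding character any number of times, including zero; greedy, so it matches as much as possible
.*	Any characters, any length
?	The preceding character 0 or 1 times
+	The preceding character at least once
{n}	The preceding character exactly n times
{m,n}	The preceding character at least m and at most n times
{,n}	The preceding character at most n times
{n,}	The preceding character at least n times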
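
The two spellings of the brace quantifier can be compared directly. A minimal sketch on four invented words:

```shell
# Hypothetical test words: g, then 0 to 3 o, then g.
words='gg
gog
goog
gooog'

# Basic syntax (plain grep): braces are escaped.
bre=$(printf '%s\n' "$words" | grep 'go\{2\}g')

# Extended syntax (grep -E): braces are written bare.
ere=$(printf '%s\n' "$words" | grep -E 'go{2}g')

# ? means zero or one: matches gg and gog when the whole line is anchored.
optional=$(printf '%s\n' "$words" | grep -cE '^go?g$')

# {2,} means at least two: matches goog and gooog.
at_least_two=$(printf '%s\n' "$words" | grep -cE '^go{2,}g$')

echo "BRE: $bre, ERE: $ere, go?g: $optional lines, go{2,}g: $at_least_two lines"
```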

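Anchors

Anchor	Position matched
^	Start of line; placed at the far left of the pattern
$	End of line; placed at the far right of the pattern
^PATTERN$	PATTERN must match the whole line
^$	Empty line
^[[:space:]]*$	Blank line (empty or whitespace only)
\< or \b	Start of word; placed at the left of a word pattern
\> or \b	End of word; placed at the right of a word pattern
\<PATTERN\>	Matches a whole word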
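
The anchors in the table can be checked by counting matches on a few lines. This sketch uses invented input and assumes GNU grep for the \< and \> word anchors.

```shell
# Five hypothetical lines: "the", "other", an empty line,
# a line with two spaces, and "bathe the".
input() { printf 'the\nother\n\n  \nbathe the\n'; }

empty=$(input | grep -c '^$')                  # truly empty lines
blank=$(input | grep -c '^[[:space:]]*$')      # empty or whitespace only
starts=$(input | grep -c '^the')               # lines beginning with "the"
whole_word=$(input | grep -c '\<the\>')        # "the" as a separate word

echo "empty=$empty blank=$blank starts=$starts whole_word=$whole_word"
```

"other" and "bathe" contain the letters t-h-e but not as a whole word, so only two lines count for the last pattern.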

Examples: about ten minutes of hands-on practice

11.2.3 Basic regular expression practice

Find lines containing "the"

[root@localhost opt]# grep "the" regular_express.txt 
I can't finish the test.^M
the symbol '*' is represented as start.
You are the best is mean you are the no. 1.

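# Add -n to print the line number of each match

Find lines without "the" and print their line numbers (add -v)

[root@localhost opt]# grep -vn "the" regular_express.txt

Find lines without "the", ignoring case, with line numbers

[root@localhost opt]# grep -ivn "the" regular_express.txt

Use [] to match any one character from a set (here: t, then a or e, then st)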

[root@localhost opt]# grep "t[ae]st" regular_express.txt 
I can't finish the test.^M    <-- test
Oh! The soup taste good.^M    <-- tast

To find lines containing oo

[dmtsai@study ~]$ grep -n 'oo' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!

To exclude an oo that is preceded by g, negate the set with [^]

[dmtsai@study ~]$ grep -n '[^g]oo' regular_express.txt
2:apple is my favorite food.
3:Football game is not use feet only.
18:google is the best tools for search keyword.
19:goooooogle yes!
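# Line 18 contains the unwanted goo, but it also contains too, which matches, so the line is still printed.
# Line 19 is printed as well: in goooooogle an oo can be preceded by another o, for example go(ooo)oogle.

To require that oo is not preceded by a lowercase letter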

[root@localhost opt]# grep -n "[^a-z]oo" regular_express.txt 
3:Football game is not use feet only.   <-- Football

Match lines containing a digit

[root@localhost opt]# grep '[0-9]' regular_express.txt 
However, this dress is about $ 3183 dollars.^M
You are the best is mean you are the no. 1.

Match with character classes (lowercase letters or digits)

# Lowercase letters: [[:lower:]] == [a-z]
[root@localhost opt]# grep '[[:lower:]]' regular_express.txt 
"Open Source" is a good mechanism to develop programs.
apple is my favorite food.

# Digits: [[:digit:]] == [0-9]
[root@localhost opt]# grep '[[:digit:]]' regular_express.txt 
However, this dress is about $ 3183 dollars.^M
You are the best is mean you are the no. 1.

Find lines that start with a lowercase letter

# Outside [] the ^ anchors the start of the line; inside, as in [^xxx], it negates the set
[root@localhost opt]# grep -n "^[a-z]" regular_express.txt 
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.

Find lines that do not start with an English letter

[root@localhost opt]# grep -n "^[^a-zA-Z]" regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
21:# I am VBird

Find lines ending with a period (the dot is escaped as \. because a bare . matches any character)

[root@localhost opt]# grep -n '\.$' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.

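Lines 5-9 also end with a period, but cat -An shows an extra ^M before the end of each of them: that part of the file uses Windows CRLF line endings, while Linux uses LF. A script saved this way fails with \r errors; the carriage returns can be stripped with sed -i 's/\r$//' followed by the script name.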

[root@localhost opt]# cat -An regular_express.txt | head -n 10 | tail -n 6
     5	However, this dress is about $ 3183 dollars.^M$
     6	GNU is free air not free beer.^M$
     7	Her hair is very beauty.^M$
     8	I can't finish the test.^M$
     9	Oh! The soup taste good.^M$
    10	motorcycle is cheap than car.$
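
The effect of the hidden carriage return on the end-of-line pattern can be reproduced on a single line. A sketch with an invented sentence:

```shell
# A hypothetical line with a Windows line ending (\r\n).
# The last character before the newline is \r, so '\.$' does not match.
crlf_matches=$(printf 'the end.\r\n' | grep -c '\.$' || true)

# After deleting the carriage return the period really is last.
fixed_matches=$(printf 'the end.\r\n' | tr -d '\r' | grep -c '\.$')

echo "with CR: $crlf_matches match(es), without CR: $fixed_matches match(es)"
```

The `|| true` is only there because grep exits with status 1 when nothing matches.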

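Limiting repetitions with {n}

]# grep -nE 'g{1,2}' regular_express.txt   # g repeated 1 to 2 times (without -E, write g\{1,2\})

]# grep -nE 'g{2}' regular_express.txt     # two consecutive g are required; a single g does not match

]# grep -nE 'go{2}g' regular_express.txt  # g, exactly two o, then g

Find lines that are neither empty nor start with #

[root@localhost opt]# grep -Env '^$|^#' regular_express.txt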
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.

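Regex metacharacters *, ., and ?

]# grep -n 'go*' regular_express.txt     # * matches the preceding o zero or more times

]# grep -nE 'go?' regular_express.txt    # ? matches the preceding o zero or one time (needs -E)

]# grep -n 'g.' regular_express.txt      # . matches any single character

]# grep -n 'go*g' regular_express.txt    # g, zero or more o, then g

Anchoring the start and end of a word

]# grep -n '\bg' regular_express.txt   # words starting with g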
    16:The world <Happy> is the same with "glad".
    18:google is the best tools for search keyword.

]# grep -n 'g\b' regular_express.txt   # words ending with g
    14:The gd software is a library for drafting programs.
    17:I like dog.

Grouping

A group in parentheses followed by \1 (a back-reference) matches the same character twice in a row

]# grep -nE '([a-z])\1' regular_express.txt  
    1:"Open Source" is a good mechanism to develop programs.
    2:apple is my favorite food.
    ..
    19:goooooogle yes!
    22:gxxxxxxxxxx

# Try a single letter: \1 repeats whatever group 1 matched, so (p)\1 matches pp
[root@localhost opt]# grep -nE '(p)\1' regular_express.txt
2:apple is my favorite food.
16:The world <Happy> is the same with "glad".

A doubled letter followed by g

]# grep -nE '([a-z])\1g' regular_express.txt 
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    23:gyyyyyyyg
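
The back-reference patterns above can be tried on a short word list. This is a sketch with invented words; it shows the -E spelling next to the escaped basic spelling.

```shell
# Hypothetical word list.
words='apple
banana
google
goog'

# Extended syntax: bare parentheses, \1 repeats what the group matched.
doubled=$(printf '%s\n' "$words" | grep -E '([a-z])\1')

# Basic syntax: the same pattern with escaped parentheses.
doubled_bre=$(printf '%s\n' "$words" | grep '\([a-z]\)\1')

# A doubled letter that is immediately followed by g.
doubled_then_g=$(printf '%s\n' "$words" | grep -E '([a-z])\1g')

printf 'doubled letter:\n%s\n' "$doubled"
printf 'doubled letter then g:\n%s\n' "$doubled_then_g"
```

"banana" has no letter repeated back to back, so it is never printed.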

^ combined with -v

# Basic regular expressions
	]# grep -v "^$" regular_express.txt | grep -v "^#"
	
# Extended regular expressions (-E or egrep), inverted match
	]# grep -Ev '^$|^#' regular_express.txt

sed

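sed is a lightweight stream editor on Linux, generally used to process text files. It can handle most everyday document editing. Its biggest difference from vim: sed does not need to open the file interactively, so it can edit documents directly from a script.

Syntax and parameters

sed [-nefri] '[n1,n2]action' file

Options: sed [-nefri]

Symbol	Description
-n	Quiet mode: automatic printing is suppressed, so only lines explicitly printed by sed (for example with p) are shown
-e	Give the sed script on the command line; can be omitted when there is only one script
-f	Read the sed script from a file: with -f filename, sed runs the commands stored in that file
-r	Use extended regular expressions
-i	Edit the file in place instead of printing to the terminal
n1,n2	Optional line range to process; for example 10,20 processes lines 10 through 20

sed actions, used as: sed options 'address action' file

Symbol	Description
a	Append: the text that follows is added on the line after the current line
c	Change: the text that follows replaces lines n1 through n2
d	Delete matching lines. Syntax: sed '/regexp/d'; the regular expression goes between the slashes, before d, and d normally takes nothing after it
i	Insert: the text that follows is added on the line before the current line
p	Print the selected lines; usually combined with -n (quiet mode)
s	Substitute: search and replace, similar to search and replace in vim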

Example 1

Test file

[root@localhost opt]# cat input.txt 
### start
000011112222
000011112222
### end

Insert AAA before 1111

# "s/old/new/" replaces the matched text outright; & in the replacement stands for the matched text, so AAA& inserts in front of it
[root@localhost opt]# sed "s/1111/AAA/" input.txt 
### start
0000AAA2222
0000AAA2222
### end

[root@localhost opt]# sed -i  's/1111/AAA&/' input.txt
[root@localhost opt]# cat input.txt 
### start
0000AAA11112222
0000AAA11112222
### end

Append BBB after 1111

# The new text is placed wherever it sits relative to &
[root@localhost opt]# sed -i "s/1111/&BBB/" input.txt 
[root@localhost opt]# cat input.txt 
### start
0000AAA1111BBB2222
0000AAA1111BBB2222
### end
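
The role of & can be checked without modifying any file by piping a single line through sed. A minimal sketch using the same digits as the test file:

```shell
line='000011112222'

# Plain replacement: the matched text is gone.
plain=$(printf '%s\n' "$line" | sed 's/1111/AAA/')

# & stands for the matched text, so AAA& puts AAA in front of it.
before=$(printf '%s\n' "$line" | sed 's/1111/AAA&/')

# &BBB puts BBB right after it.
after=$(printf '%s\n' "$line" | sed 's/1111/&BBB/')

printf '%s\n%s\n%s\n' "$plain" "$before" "$after"
```

Leaving out -i while experimenting keeps the original file untouched.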

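Delete everything within a range

[root@localhost opt]# cat input.txt 
### start
0000AAA1111BBB2222
0000AAA1111BBB2222
### end
# Delete every line from the one matching /### start/ through the one matching /### end/; here that covers the whole file, so it ends up empty
[root@localhost opt]# sed -i "/### start/,/### end/d" input.txt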
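
A range delete only removes the marked block when other lines surround it. The sketch below uses an invented five-line document with text before and after the markers.

```shell
# Hypothetical document: a marked block between two lines that should survive.
doc='keep 1
### start
drop me
### end
keep 2'

# Delete from the line matching "### start" through the line matching "### end",
# markers included. No -i here, so nothing on disk is changed.
kept=$(printf '%s\n' "$doc" | sed '/### start/,/### end/d')
printf '%s\n' "$kept"
```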

Example 2

Print passwd with lines 2-5 deleted; the original file is not modified

[root@localhost opt]# cat -n passwd | sed '2,5d'
     1	root:x:0:0:root:/root:/bin/bash
     6	sync:x:5:0:sync:/sbin:/bin/sync

Append the string "hello,world" after line 2

[root@localhost opt]# cat -n passwd  | sed '2a hello,world'
     1	root:x:0:0:root:/root:/bin/bash
     2	bin:x:1:1:bin:/bin:/sbin/nologin
hello,world

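Append two lines after line 2: "first line" and "second line"

# Note: GNU sed turns \n inside the appended text into a line break; the text runs to the end of the script, so no other command can follow it on the same line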
[root@localhost opt]# cat -n passwd  | sed '2a  first line \nsecond line'
     1	root:x:0:0:root:/root:/bin/bash
     2	bin:x:1:1:bin:/bin:/sbin/nologin
first line 
second line

Replace line 1 with "xxx"

[root@localhost opt]# cat -n passwd  | sed '1c xxx' 
xxx
     2	bin:x:1:1:bin:/bin:/sbin/nologin

Replace several lines with a single line

[root@localhost opt]# cat -n passwd  | sed '1,2c xxx' 
xxx
     3	daemon:x:2:2:daemon:/sbin:/sbin/nologin

Print only lines 1-2; note how p is combined with -n

# Quiet mode: only the lines sed explicitly prints are shown, nothing else
]# cat -n passwd  | sed -n '1,2p'
     1	root:x:0:0:root:/root:/bin/bash
     2	bin:x:1:1:bin:/bin:/sbin/nologin
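
The p and d actions with a line range can be compared on numbered input. A short sketch on four invented lines:

```shell
# Four hypothetical lines, one number per line.
nums() { printf '1\n2\n3\n4\n'; }

# -n suppresses automatic printing; p prints only lines 2-3.
printed=$(nums | sed -n '2,3p')

# Without -n every line is printed except the deleted range.
deleted=$(nums | sed '2,3d')

printf 'printed:\n%s\n' "$printed"
printf 'after delete:\n%s\n' "$deleted"
```

Running `sed '2,3p'` without -n would print lines 2 and 3 twice, once automatically and once because of p.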

Use ip with sed to print the IP address of a given interface

[root@localhost opt]# ip a sh ens36 | grep -E "inet[[:space:]][0-9]{1,3}" | sed 's/brd.*//g' | sed 's/[[:space:]]*inet//g'
 192.168.189.131/24 
 
]# ifconfig ens36 | grep 'inet[[:space:]]' | sed 's/^.*inet//g' | sed 's/netmask.*$//g'
 192.168.1.61

From /etc/man_db.conf, extract the MAN settings, dropping comments that start with # and empty lines

]# cat /etc/man_db.conf | grep "MAN" | sed 's/#.*$//g' | sed '/^$/d'
]# cat /etc/man_db.conf | grep "MAN" | sed 's/#.*$//g' | grep -v '^$'
]# cat /etc/man_db.conf | grep "^MAN"   # 直接以MAN开头 

# grep alone can drop the empty lines and comment lines
]# grep -Ev "^$|^#" /etc/man_db.conf

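Add # at the start of lines 1-10 of a file, then remove it again

]# sed -i '1,10s/^/#/' file.txt   # add # at the start of each line
]# sed -i '1,10s/^#//' file.txt   # remove the leading #

Without the g flag, each line is scanned from the left and only the first match on that line is replaced;
with g, every match on the line is replaced.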
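
The effect of the g flag shows up on any line with more than one match. A minimal sketch on an invented string:

```shell
# Without g: only the first match on the line is replaced.
first_only=$(echo 'a-a-a' | sed 's/a/X/')

# With g: every match on the line is replaced.
all=$(echo 'a-a-a' | sed 's/a/X/g')

# A number instead of g replaces only that occurrence (here the second).
second=$(echo 'a-a-a' | sed 's/a/X/2')

printf '%s\n%s\n%s\n' "$first_only" "$all" "$second"
```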