Commands for formatting text output:
- nl —Number lines.
- fold —Wrap each line to a specified length.
- fmt —A simple text formatter.
- pr —Format text for printing.
- printf —Format and print data.
- groff —A document formatting system.
Simple Formatting Tools
nl—Number Lines
The nl command numbers lines, much like cat -n:
[me@linuxbox ~]$ nl distros.txt | head
1 SUSE 10.2 12/07/2006
2 Fedora 10 11/25/2008
3 SUSE 11.0 06/19/2008
4 Ubuntu 8.04 04/24/2008
5 Fedora 8 11/08/2007
6 SUSE 10.3 10/04/2007
7 Ubuntu 6.10 10/26/2006
8 Fedora 7 05/31/2007
9 Ubuntu 7.10 10/18/2007
10 Ubuntu 7.04 04/19/2007
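Like cat, nl accepts either multiple file names as command-line arguments or standard input. Unlike cat, nl supports more sophisticated numbering.
nl supports the concept of logical pages when numbering, which allows it to reset the numerical sequence. Options can set the starting number to a specific value and, to a limited extent, control its format. A logical page is further divided into three sections: header, body, and footer. Within each section, line numbering may be reset or given a different style. If multiple files are specified, nl treats them as a single stream of text. Sections in the text stream are indicated by the following markup: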
Table 21-1: nl Markup
| Markup | Meaning |
|---|---|
| `\:\:\:` | Start of logical-page header |
| `\:\:` | Start of logical-page body |
| `\:` | Start of logical-page footer |
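Each markup element must appear alone on its own line. When nl processes a markup element from the table above, it removes it from the text stream.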
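A minimal sketch of logical pages, assuming GNU nl with its default delimiters. The section text ("Page one header" and so on) and the variable name are made up for illustration. The stream contains two logical pages; because numbering restarts at the beginning of each logical page, both "alpha" and "gamma" should be numbered 1.

```shell
# Build a stream with two logical pages. printf '%s\n' prints each
# argument on its own line without interpreting the backslashes.
numbered=$(printf '%s\n' \
    '\:\:\:' 'Page one header' \
    '\:\:'   'alpha' 'beta' \
    '\:'     'Page one footer' \
    '\:\:\:' 'Page two header' \
    '\:\:'   'gamma' | nl)

# Header and footer lines are not numbered by default (style n);
# only the body lines carry numbers.
printf '%s\n' "$numbered"
```

Adding -p to the nl command would keep the count running across the two pages instead of restarting it.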
Table 21-2: Common nl Options
| Option | Meaning |
|---|---|
| -b style | Set body numbering to style, where style is one of the following: · a Number all lines. · t Number only non-blank lines. This is the default. · n None. · pregexp Number only lines matching basic regular expression regexp. |
| -f style | Set footer numbering to style . Default is n (none). |
| -h style | Set header numbering to style . Default is n (none). |
| -i number | Set page numbering increment to number . Default is 1. |
| -n format | Set numbering format to format, where format is one of the following: · ln Left justified, without leading zeros. · rn Right justified, without leading zeros. This is the default. · rz Right justified, with leading zeros. |
| -p | Do not reset page numbering at the beginning of each logical page. |
| -s string | Add string to the end of each line number to create a separator. Default is a single tab character. |
| -v number | Set first line number of each logical page to number . Default is 1. |
| -w width | Set width of the line number field to width . Default is 6. |
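The options in the table can be combined. The following sketch uses made-up three-line and two-line inputs; the variable names are only for illustration.

```shell
# Number every line, including the blank one (-b a), with zero-padded
# numbers (-n rz) in a 3-character field (-w 3), followed by ": " (-s).
numbered_all=$(printf 'first\n\nthird\n' | nl -b a -n rz -w 3 -s ': ')
printf '%s\n' "$numbered_all"

# Start counting at 10 (-v) and step by 5 (-i).
stepped=$(printf 'a\nb\n' | nl -v 10 -i 5 -w 2 -s ' ')
printf '%s\n' "$stepped"
```

With the default -b t, the blank second line would be left unnumbered and "third" would become line 2.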
nl can be added to the report pipeline. To do so, save the following revised sed script as distros-nl.sed:
# sed script to produce Linux distributions report
1 i\
\\:\\:\\:\
\
Linux Distributions Report\
\
Name Ver. Released\
---- ---- --------\
\\:\\:
s/\([0-9]\{2\}\)\/\([0-9]\{2\}\)\/\([0-9]\{4\}\)$/\3-\1-\2/
$ a\
\\:\
\
End Of Report
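The markup is written with doubled backslashes because sed normally interprets a backslash as an escape character. Run the following command to produce the report: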
[me@linuxbox ~]$ sort -k 1,1 -k 2n distros.txt | sed -f distros-nl.sed | nl
Linux Distributions Report
Name Ver. Released
---- ---- --------
1 Fedora 5 2006-03-20
2 Fedora 6 2006-10-24
3 Fedora 7 2007-05-31
4 Fedora 8 2007-11-08
5 Fedora 9 2008-05-13
6 Fedora 10 2008-11-25
7 SUSE 10.1 2006-05-11
8 SUSE 10.2 2006-12-07
9 SUSE 10.3 2007-10-04
10 SUSE 11.0 2008-06-19
11 Ubuntu 6.06 2006-06-01
12 Ubuntu 6.10 2006-10-26
13 Ubuntu 7.04 2007-04-19
14 Ubuntu 7.10 2007-10-18
15 Ubuntu 8.04 2008-04-24
16 Ubuntu 8.10 2008-10-30
End Of Report
fold—Wrap Each Line to a Specified Length
[me@linuxbox ~]$ echo "The quick brown fox jumped over the lazy dog." | fold -w 12
The quick br
own fox jump
ed over the
lazy dog.
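The text sent by echo is broken into segments according to the width given with -w. In the command above we specify a line width of 12 characters. If no width is specified, the default is 80 characters. Adding the -s option makes fold break each line at the last available space before the width is reached: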
[me@linuxbox ~]$ echo "The quick brown fox jumped over the lazy dog." | fold -w 12 -s
The quick
brown fox
jumped over
the lazy
dog.
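The two wrapping modes can be compared side by side. This sketch reuses the sentence from the examples above; the longest helper function is hypothetical and only reports the longest line in its input.

```shell
text="The quick brown fox jumped over the lazy dog."

# Hard wrap: cut at exactly 12 columns, even in the middle of a word.
hard=$(printf '%s\n' "$text" | fold -w 12)

# Soft wrap: -s breaks at the last space that still fits in 12 columns.
soft=$(printf '%s\n' "$text" | fold -w 12 -s)

# Hypothetical helper: print the length of the longest input line.
longest() {
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }'
}

echo "hard wrap: $(printf '%s\n' "$hard" | grep -c '') lines, longest $(printf '%s\n' "$hard" | longest)"
echo "soft wrap: $(printf '%s\n' "$soft" | grep -c '') lines, longest $(printf '%s\n' "$soft" | longest)"
```

Soft wrapping produces more lines because it gives up part of each line to avoid splitting words, but neither mode exceeds the requested width.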
fmt—A Simple Text Formatter
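fmt reads files or standard input and performs paragraph formatting on the text stream. It fills and joins lines of text while preserving blank lines and indentation.
The following text is taken from the fmt documentation at www.gnu.org/software/co…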
fmt reads from the specified file arguments (or standard input if none are given), and writes to standard output.
By default, blank lines, spaces between words, and indentation are preserved in the output; successive input lines with different indentation are not joined; tabs are expanded on input and introduced on output.
fmt prefers breaking lines at the end of a sentence, and tries to avoid line breaks after the first word of a sentence or before the last word of a sentence. A sentence break is defined as either the end of a paragraph or a word ending in any of ‘.?!’, followed by two spaces or end of line, ignoring any intervening parentheses or quotes. Like TeX, fmt reads entire “paragraphs” before choosing line breaks; the algorithm is a variant of that given by Donald E. Knuth and Michael F. Plass in “Breaking Paragraphs Into Lines”, Software—Practice & Experience 11, 11 (November 1981), 1119–1184.
Copy this text into a file named fmt-info.txt. To reformat it to fit a column 50 characters wide, use the following command:
[me@linuxbox ~]$ fmt -w 50 fmt-info.txt | head
fmt reads from the specified file arguments
(or standard input if none are given), and
writes to standard output.
By default, blank lines, spaces between
words, and indentation are preserved in the
output; successive input lines with different
indentation are not joined; tabs are expanded
on input and introduced on output.
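fmt also provides the -c option (crown margin mode), which preserves the indentation of the first two lines of each paragraph: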
[me@linuxbox ~]$ fmt -cw 50 fmt-info.txt
fmt reads from the specified file arguments
(or standard input if none are given), and writes
to standard output.
By default, blank lines, spaces between words,
and indentation are preserved in the output;
successive input lines with different indentation
are not joined; tabs are expanded on input and
introduced on output.
fmt prefers breaking lines at the end of a
sentence, and tries to avoid line breaks after
Table 21-3: fmt Options
| Option | Description |
|---|---|
| -c | Operate in crown margin mode. This preserves the indentation of the first two lines of a paragraph. Subsequent lines are aligned with the indentation of the second line. |
| -p string | Format only those lines beginning with the prefix string. After formatting, the contents of string are prefixed to each reformatted line. This option can be used to format text in source code comments. For example, any programming language or configuration file that uses a # character to delineate a comment could be formatted by specifying -p '# ' so that only the comments will be formatted. See the example below. |
| -s | Split-only mode. In this mode, lines will be split only to fit the specified column width. Short lines will not be joined to fill lines. This mode is useful when formatting text, such as code, where joining is not desired. |
| -u | Perform uniform spacing. This will apply traditional "typewriter-style" formatting to the text. This means a single space between words and two spaces between sentences. This mode is useful for removing justification, that is, forced alignment to both the left and right margins. |
| -w width | Format text to fit within a column width characters wide. The default is 75 characters. Note: fmt actually formats lines slightly shorter than the specified width to allow for line balancing. |
The -p option above is particularly interesting. To demonstrate how it works, first create the file fmt-code.txt:
[me@linuxbox ~]$ cat > fmt-code.txt
# This file contains code with comments.
# This line is a comment.
# Followed by another comment line.
# And another.
This, on the other hand, is a line of code.
And another line of code.
And another.
The first four lines of the file are comments beginning with #. fmt can reformat just the comments:
[me@linuxbox ~]$ fmt -w 50 -p '# ' fmt-code.txt
# This file contains code with comments.
# This line is a comment. Followed by another
# comment line. And another.
This, on the other hand, is a line of code.
And another line of code.
And another.
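The adjacent comment lines have been joined, while blank lines and lines that do not begin with the specified prefix are left unchanged.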
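The same idea can be tried on a throwaway file. This is a sketch that assumes GNU coreutils fmt (the -p option has a different meaning in some other implementations). The file contents and variable names are made up for illustration.

```shell
# Create a temporary file with one overlong comment, one short comment,
# a blank line, and a long line of "code" that lacks the prefix.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
# This comment is deliberately written as one long line so that fmt has something to wrap at a narrow width.
# Short follow-up comment.

echo "this line of code is not touched by fmt because it lacks the prefix"
EOF

# Reformat only the lines that begin with "# ", to a 40-column width.
formatted=$(fmt -w 40 -p '# ' "$tmpfile")
printf '%s\n' "$formatted"

rm -f "$tmpfile"
```

The comment text is rewrapped across several lines, each starting with "# ", while the long code line passes through unchanged even though it is wider than 40 columns.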
pr—Format Text for Printing
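pr paginates text. It splits the input into pages and adds a header showing the modification time, file name, and page number to each page: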
[me@linuxbox ~]$ pr -l 15 -w 65 distros.txt
2012-12-11 18:27 distros.txt Page 1
SUSE 10.2 12/07/2006
Fedora 10 11/25/2008
SUSE 11.0 06/19/2008
Ubuntu 8.04 04/24/2008
Fedora 8 11/08/2007
2012-12-11 18:27 distros.txt Page 2
SUSE 10.3 10/04/2007
Ubuntu 6.10 10/26/2006
Fedora 7 05/31/2007
Ubuntu 7.10 10/18/2007
Ubuntu 7.04 04/19/2007
The command above uses -l (page length) and -w (page width) to define a page that is 65 characters wide and 15 lines long.
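The page arithmetic can be checked with a small experiment. pr reserves five lines for the header and five for the trailer on each page, so a 15-line page holds 5 lines of text, which matches the output above. The sketch below feeds ten numbered lines from seq and uses -h to supply a header title; the title "Demo Report" and the variable names are made up for illustration.

```shell
# Paginate ten lines of input on 15-line pages with a custom header title.
paged=$(seq 1 10 | pr -l 15 -h "Demo Report")

# Each page header contains the title, so counting it counts the pages.
pages=$(printf '%s\n' "$paged" | grep -c 'Demo Report')
echo "pages: $pages"
```

Ten input lines at 5 text lines per page should yield two pages.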
printf—Format and Print Data
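printf does not accept standard input, but it is widely used nonetheless.
printf (short for "print formatted") originated in the C programming language and has since been implemented in many languages, including the shell. In fact, printf is a bash builtin.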
printf "format" arguments
For example:
[me@linuxbox ~]$ printf "I formatted the string: %s\n" foo
I formatted the string: foo
The conversion specifiers for each data type are listed in the following table:
Table 21-4: Common printf Data-Type Specifiers
| Specifier | Description |
|---|---|
| d | Format a number as a signed decimal integer. |
| f | Format and output a floating-point number. |
| o | Format an integer as an octal number. |
| s | Format a string. |
| x | Format an integer as a hexadecimal number using lowercase a–f where needed. |
| X | Same as x , but use uppercase letters. |
| % | Print a literal % symbol (i.e., specify “%%”). |
For example, the following prints 380 with several different conversion specifiers:
[me@linuxbox ~]$ printf "%d, %f, %o, %s, %x, %X\n" 380 380 380 380 380 380
380, 380.000000, 574, 380, 17c, 17C
Because the format string contains six conversion specifiers, six corresponding arguments must follow it.
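When the argument count does not match the number of specifiers, the shell's printf does not fail. It reuses the format string for any extra arguments, and it treats missing arguments as zero or an empty string. A short sketch, with made-up argument values and variable names:

```shell
# Four arguments, two specifiers: the format is applied twice.
pairs=$(printf '%s=%d\n' width 80 height 24)
printf '%s\n' "$pairs"

# One argument, two specifiers: the missing %d argument is treated as 0.
padded=$(printf '%s:%d\n' only)
printf '%s\n' "$padded"
```

The reuse behavior is convenient for printing lists, but it also means a miscounted argument list produces odd output rather than an error.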
Optional components can be added to a conversion specifier to adjust the output. The general form is:
%[flags][width][.precision]conversion_specification
Table 21-5: printf Conversion-Specification Components
| Component | Description |
|---|---|
| flags | There are five different flags: · # Use the alternate format for output. This varies by data type. For o (octal number) conversion, the output is prefixed with 0 (zero). For x and X (hexadecimal number) conversions, the output is prefixed with 0x or 0X respectively. · 0 (zero) Pad the output with zeros. This means that the field will be filled with leading zeros, as in 000380. · - (dash) Left-align the output. By default, printf right-aligns output. · (space) Produce a leading space for positive numbers. · + (plus sign) Sign positive numbers. By default, printf signs only negative numbers. |
| width | A number specifying the minimum field width. |
| .precision | For floating-point numbers, specify the number of digits of precision to be output after the decimal point. For string conversion, precision specifies the number of characters to output. |
The following table shows some examples:
Table 21-6: printf Conversion Specification Examples
| Argument | Format | Result | Notes |
|---|---|---|---|
| 380 | "%d" | 380 | Simple formatting of an integer |
| 380 | "%#x" | 0x17c | Integer formatted as a hexadecimal number using the alternate format flag |
| 380 | "%05d" | 00380 | Integer formatted with leading zeros (padding) and a minimum field width of five characters |
| 380 | "%05.5f" | 380.00000 | Number formatted as a floating-point number with padding and 5 decimal places of precision. Since the specified minimum field width (5) is less than the actual width of the formatted number, the padding has no effect. |
| 380 | "%010.5f" | 0380.00000 | Increasing the minimum field width to 10 makes the padding visible. |
| 380 | "%+d" | +380 | The + flag signs a positive number. |
| 380 | "%-d" | 380 | The - flag left-aligns the formatting. |
| abcdefghijk | "%5s" | abcdefghijk | A string is formatted with a minimum field width. |
| abcdefghijk | "%.5s" | abcde | By applying precision to a string, it is truncated. |
printf is mostly used in scripts to format tabular data. For example, to output fields separated by tab characters:
[me@linuxbox ~]$ printf "%s\t%s\t%s\n" str1 str2 str3
str1 str2 str3
To output neatly formatted numbers:
[me@linuxbox ~]$ printf "Line: %05d %15.3f Result: %+15d\n" 1071 3.14156295 32589
Line: 01071           3.142 Result:          +32589
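In a script, field widths are what turn separate printf calls into an aligned table. The following is a minimal sketch; the print_row function name and the chosen column widths are made up for illustration, and the rows reuse values from distros.txt.

```shell
# Hypothetical helper: one table row with a left-aligned 10-column name,
# a right-aligned 6-column version, and the release date.
print_row() {
    printf '%-10s %6s  %s\n' "$1" "$2" "$3"
}

table=$(
    print_row Name   Ver.  Released
    print_row Fedora 10    2008-11-25
    print_row SUSE   11.0  2008-06-19
    print_row Ubuntu 8.04  2008-04-24
)
printf '%s\n' "$table"
```

Because every row goes through the same format string, the columns line up regardless of how long each individual value is, as long as it fits within the field width.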