
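Pro Bash Programming (Part 3)

Original: Pro Bash Programming

License: CC BY-NC-SA 4.0

10. Writing Bug-Free Scripts and Debugging the Rest

The programmer who has never written a buggy program is a figment of someone's imagination. Bugs are the bane of a programmer's existence. They range from simple typing errors to bad coding to faulty logic. Some are easily fixed; others can take hours of hunting.

At one extreme are syntax errors that prevent a script from completing or from running at all. These may involve a missing character: a space, a bracket or brace, a quotation mark. It may be a mistyped command or variable name. It may be a missing keyword, such as then after elif.

At the other extreme are errors in logic. It may be counting from 1 when you should have started at 0, or it may be using -gt (greater than) when it should have been -ge (greater than or equal to). It may be a faulty formula (isn't Fahrenheit to Celsius (F - 32) * 1.8?) or using the wrong field in a data record (I thought the shell was field 5 in /etc/passwd!).

In between those extremes, common errors include trying to operate on the wrong type of data (either the program itself supplies the wrong data or an external source does) and failing to check that a command succeeded before proceeding to the next step.

This chapter looks at various techniques for getting a program to do what it is supposed to do, including the shell options for checking and tracing a script's progress, strategically placed debugging instructions, and, most important, preventing bugs in the first place.

Prevention Is Better Than Cure

It is far better to avoid introducing bugs than to remove them. There is no way to guarantee bug-free scripts, but a few precautions can reduce the frequency of bugs considerably. Making your code easy to read helps. So does documenting it, so that you know what it is for, what it expects, what results it produces, and so on.

Structure Your Programs

The term structured programming is applied to various programming paradigms, but they all involve modular programming: breaking the problem down into manageable parts. In developing a large application with the shell, this means using functions, separate scripts, or a combination of both.

Even a short program can benefit from some structure; it should contain discrete sections:

Comments
Initialization of variables
Function definitions
Runtime configuration (parse options, read configuration files, and so on)
Sanity check (are all values reasonable?)
Process information (calculate, slice and dice lines, I/O, and so on)

Using this outline, all the components of a short but complete script are presented in the following sections. There are errors in the script as presented; these will be found and corrected using various debugging techniques.

Comments should include metadata about the script, including a description, a synopsis of how to call the command or function, the author, the date of creation, the date of last revision, the version number, the options, and any other information needed to run the command successfully, as in this example: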

#:       Title: wfe - List words ending with PATTERN
#:    Synopsis: wfe [-c|-h|-v] REGEX
#:        Date: 2009-04-13
#:     Version: 1.0
#:      Author: Chris F.A. Johnson
#:     Options: -c - Include compound words
#:              -h - Print usage information
#:              -v - Print version number

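The #: is used to introduce these comments so that grep '^#:' wfe will extract all the metadata.

Initialization of Variables

First, define some variables containing metadata. There is some duplication with the preceding comments, but these variables may be needed later: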

## Script metadata
scriptname=${0##*/}
description="List words ending with REGEX"
usage="$scriptname [-c|-h|-v] REGEX"
date_of_creation=2009-04-13
version=1.0
author="Chris F.A. Johnson"

Next, define the default values, file locations, and other information needed by this script:

## File locations
dict=$HOME
wordfile=$dict/singlewords
conpoundfile=$dict/Compounds

## Default is not to show compound words
compounds=

## Regular expression supplied on the command line
pattern=$1

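Function Definitions

There are three functions that are part of all of the original author's scripts (apart from quick-and-dirty one-offs). They are die, usage, and version; they may be included in the script itself or in a function library sourced by the script. They have not been included in the scripts in this book; that would be needlessly repetitive. Examples of these are: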

## Function definitions
die() #@ DESCRIPTION: print error message and exit with supplied return code
{     #@ USAGE: die STATUS [MESSAGE]
  error=$1
  shift
  [ -n "$*" ] && printf "%s\n" "$*" >&2
  exit "$error"
}

usage() #@ DESCRIPTION: print usage information
{       #@ USAGE: usage
        #@ REQUIRES: variable defined: $scriptname
  printf "%s - %s\n" "$scriptname" "$description"
  printf "USAGE: %s\n" "$usage"
}

version() #@ DESCRIPTION: print version information
{         #@ USAGE: version
          #@ REQUIRES: variables defined: $scriptname, $author and $version
  printf "%s version %s\n" "$scriptname" "$version"
  printf "by %s, %d\n" "$author"  "${date_of_creation%%-*"
}

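Any other functions follow right after these generic functions.

Runtime Configuration and Options

Chapter 12 covers runtime configuration, and the different methods that can be used, in depth. Much of the time, all you need to do is parse the command-line options: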

## parse command-line options, -c, -h, and -v
while getopts chv var
do
  case $var in
    c) compounds=$compoundfile ;;
    h) usage; exit ;;
    v) version; exit ;;
  esac
done
shift $(( $OPTIND - 1 ))

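Process Information

As is often the case in a short script, the actual work of the script is relatively short; setting up parameters and checking the validity of data take up the greater part of the program: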

## Search $wordfile and $compounds if it is defined
{
  cat "$wordfile"
  if [ -n "$compounds" ]
  then
    cut -f1 "$compounds"
  fi
} | grep -i ".$regex$" |
 sort -fu ## Case-insensitive sort; remove duplicates

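Here, cat is necessary because the second file, whose location is stored in the compounds variable, cannot be given as an argument to grep: it is more than a list of words. The file has three tab-separated fields: the phrase with spaces and other non-alphabetic characters removed and the following letter capitalized, the original phrase, and the lengths as they would appear in a cryptic crossword puzzle: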

corkScrew       cork-screw      (4-5)
groundCrew      ground crew     (6,4)
haveAScrewLoose have a screw loose      (4,1,5,5)

If it were just a simple word list, like singlewords, the pipeline could be replaced by a simple command:

grep -i ".$regex$" "$wordfile" ${compounds:+"$compounds"}

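The grep command searches the files given on the command line for lines that match a regular expression. The -i option tells grep to treat uppercase and lowercase letters as equivalent.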
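The ${compounds:+"$compounds"} expansion in that command supplies the second file name only when compounds is non-empty. A minimal sketch of the effect follows; show_args is a hypothetical helper, not part of wfe, that simply counts its arguments:

```shell
show_args() { printf '%d argument(s)\n' "$#"; }   ## hypothetical helper: count arguments

compounds=
show_args wordfile ${compounds:+"$compounds"}     ## 1 argument(s): nothing is added
show_args wordfile "$compounds"                   ## 2 argument(s): an empty string is passed

compounds="/tmp/my Compounds"                     ## illustrative path containing a space
show_args wordfile ${compounds:+"$compounds"}     ## 2 argument(s): the name stays one word
```

With a plain "$compounds", grep would be handed an empty file name and complain; the :+ form avoids that without an if statement.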

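Document Your Code

Chris Johnson, the first author of this book, notes:

Until fairly recently, my own documentation habits left a lot to be desired. In my scripts directory, there are more than 900 programs written over the past 15 years or so. There are more than 90 function libraries. About 20 scripts are called by cron, and a dozen more are called by those scripts. There are probably about 100 scripts that I use regularly, with "regularly" being anything from several times a day to once or twice a year.

The rest are scripts under development, abandoned scripts, scripts that didn't work out, and scripts that I no longer have any idea what they are for. I don't know what they are for because I didn't include any documentation, not even a one-line description. I don't know whether they work, whether I decided I really didn't need that script, or anything else about them.

For many of them, I can tell what they do from their names. In others, the code is straightforward, and the purpose is obvious. But there are still many scripts whose purpose I don't know. I will probably end up duplicating some of them when I need that task again. When I do, they'll have at least minimal documentation.

The same is true of many developers, especially with code snippets. There is software that can help you organize your snippets, but nothing beats documenting them and adding comments, to-do notes, and so on.

Format Your Code Consistently

There are various models for pretty-printing code, and some people are quite vociferous in defense of their particular style. I have my own preferences (which you'll have noticed from the scripts in this book), but consistency is more important than whether you indent two, four, or six spaces per level. That there is indentation matters more than how much. I would say that two spaces (which is what I use) is the minimum and that eight is the outside limit, if not too much.

Similarly, it doesn't matter whether you put then on the same line as if. Either of these is fine: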

if [ "$var" = "yes" ]; then
  echo "Proceeding"
fi

if [ "$var" = "yes" ]
then
  echo "Proceeding"
fi

The same goes for other loops and for function definitions. I prefer this format:

funcname()
{
  : body here
}

Others prefer this format:

funcname() {
  : body here
}

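As long as the formatting is consistent and makes the structure clear, it doesn't matter which format you use.

The K.I.S.S. Principle

Simplicity aids in understanding the intent of your program, but it's not just keeping code as short as possible that counts. When someone posted the following question, my first thought was, "That will be a complicated regex." My second was that I wouldn't use a regular expression:

I need a regular expression to express financial quantities in American notation. They have a leading dollar sign and an optional string of asterisks, a string of decimal digits, and a fractional part consisting of a decimal point (.) and two decimal digits. The string to the left of the decimal point can be a single zero. Otherwise, it must not start with a zero. If there are more than three digits to the left of the decimal point, groups of three digits must be separated by commas. Example: $**2,345.67.

I'd break the task into several discrete steps and code each one separately. For example, the first check would be: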

amount='$**2,345.67'
case $amount in
  \$[*0-9]*) ;; ## OK (dollar sign followed by asterisks or digits), do nothing
  *) exit 1 ;;
esac

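By the time the tests are finished, there will be a lot more code than there would be in a regular expression, but it will be easier to understand and to change if the requirements change.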
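One way the remaining steps might look is sketched here. The function name valid_amount, the order of the checks, and the sample values are assumptions made for illustration; only the first case statement comes from the text above.

```shell
valid_amount() #@ USAGE: valid_amount STRING   (hypothetical name)
{
  local amount=$1 whole group
  case $amount in
    \$[*0-9]*) ;;                       ## step 1: dollar sign, then asterisk or digit
    *) return 1 ;;
  esac
  amount=${amount#\$}                   ## drop the dollar sign
  amount=${amount#"${amount%%[!*]*}"}   ## drop the leading asterisks, if any
  case $amount in
    *.[0-9][0-9]) ;;                    ## step 2: must end with a point and two digits
    *) return 1 ;;
  esac
  whole=${amount%.??}                   ## everything left of the decimal point
  case $whole in
    0) return 0 ;;                      ## a single zero is allowed
    "" | 0* | *[!0-9,]*) return 1 ;;    ## empty, leading zero, or stray character
    ,* | *, | *,,*) return 1 ;;         ## misplaced commas
  esac
  case $whole in
    *,*) ;;                             ## commas present: check the groups below
    ? | ?? | ???) return 0 ;;           ## up to three digits need no comma
    *) return 1 ;;                      ## four or more digits without a comma
  esac
  group=${whole%%,*}                    ## first group: one to three digits
  [ ${#group} -le 3 ] || return 1
  whole=${whole#*,}
  while :                               ## every later group: exactly three digits
  do
    group=${whole%%,*}
    [ ${#group} -eq 3 ] || return 1
    case $whole in
      *,*) whole=${whole#*,} ;;
      *) return 0 ;;
    esac
  done
}

for amount in '$**2,345.67' '$0.99' '$2345.67' '$007.00' '$1,23.45'
do
  if valid_amount "$amount"
  then
    printf '%-12s valid\n' "$amount"
  else
    printf '%-12s invalid\n' "$amount"
  fi
done
```

Each requirement maps onto one short block, so a change such as allowing three decimal places would touch only the step 2 pattern and the ${amount%.??} expansion.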

Grouping Commands

Rather than redirect each of several lines, group them with braces and use a single redirection. I saw this in a forum recently:

echo "user odad odd" > ftp.txt
echo "prompt" >> ftp.txt
echo "cd $i" >> ftp.txt
echo "ls -ltr" >> ftp.txt
echo "bye" >> ftp.txt

I recommended doing this instead:

{
  echo "user odad odd"
  echo "prompt"
  echo "cd $i"
  echo "ls -ltr"
  echo "bye"
} > ftp.txt

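Test As You Go

Rather than save all the debugging until the end, make it an integral part of developing a program. Each section should be tested as it is written. As an example, let's look at a function I wrote as part of a chess program. No, it's not a chess-playing program (though it could be when it's finished); that would be excruciatingly slow in the shell. It's a set of functions for preparing instructional material.

It needs to be able to convert one form of chess notation to another and to list all possible moves for any piece on the board. It needs to be able to tell whether a move is legal and to create a new board position after a move has been made. At its most basic level, it has to be able to convert a square in standard algebraic notation (SAN) to its numeric rank and file. That's what this function does.

The SAN format for naming a square is a lowercase letter representing the file and a number representing the rank. Files are rows of squares running from White's side of the board to Black's. Ranks are rows of squares running from left to right. The square in White's left-hand corner is a1; the one in Black's is h8. To calculate possible moves, these need to be converted to numeric rank and file: a1 is converted to rank=1 and file=1; h8 becomes rank=8 and file=8.

It's a simple function, but it demonstrates how to test a function. The function receives the name of a square as an argument and stores the rank and file in those variables. If the square is not valid, it sets both rank and file to 0 and returns an error: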

split_square() #@ DESCRIPTION: convert SAN square to numeric rank and file
{              #@ USAGE: split_square SAN-SQUARE
  local square=$1
  rank=${square#?}
  case $square in
    a[1-8]) file=1;; ## Conversion of file to number
    b[1-8]) file=2;; ## and checking that the rank is
    c[1-8]) file=3;; ## a valid number are done in a
    d[1-8]) file=4;; ## single look-up
    e[1-8]) file=5;;
    f[1-8]) file=6;; ## If the rank is not valid,
    g[1-8]) file=7;; ## it falls through to the default
    h[1-8]) file=8;;
    *) file=0
       rank=0
       return 1      ## Not a valid square
       ;;
  esac
  return 0
}

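To test this function, it is passed all possible legitimate squares as well as some that are not. It prints the name of the square and the file and rank numbers: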

test_split_square()
{
  local f r
  for f in {a..i}
  do
    for r in {1..9}
    do
      split_square "$f$r"
      printf "$f$r %d-%d  " "$file" "$rank"
    done
    echo
  done
}

When the test is run, the output is as follows:

a1 1-1  a2 1-2  a3 1-3  a4 1-4  a5 1-5  a6 1-6  a7 1-7  a8 1-8  a9 0-0
b1 2-1  b2 2-2  b3 2-3  b4 2-4  b5 2-5  b6 2-6  b7 2-7  b8 2-8  b9 0-0
c1 3-1  c2 3-2  c3 3-3  c4 3-4  c5 3-5  c6 3-6  c7 3-7  c8 3-8  c9 0-0
d1 4-1  d2 4-2  d3 4-3  d4 4-4  d5 4-5  d6 4-6  d7 4-7  d8 4-8  d9 0-0
e1 5-1  e2 5-2  e3 5-3  e4 5-4  e5 5-5  e6 5-6  e7 5-7  e8 5-8  e9 0-0
f1 6-1  f2 6-2  f3 6-3  f4 6-4  f5 6-5  f6 6-6  f7 6-7  f8 6-8  f9 0-0
g1 7-1  g2 7-2  g3 7-3  g4 7-4  g5 7-5  g6 7-6  g7 7-7  g8 7-8  g9 0-0
h1 8-1  h2 8-2  h3 8-3  h4 8-4  h5 8-5  h6 8-6  h7 8-7  h8 8-8  h9 0-0
i1 0-0  i2 0-0  i3 0-0  i4 0-0  i5 0-0  i6 0-0  i7 0-0  i8 0-0  i9 0-0

All squares with the plain 0-0 are invalid.
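Reading a grid by eye works for 81 squares, but the same test can also check itself. The following is a sketch rather than part of the original program: check_split_square and the failures counter are hypothetical names, and split_square is repeated in condensed form only so that the block runs on its own.

```shell
split_square()  ## condensed copy of the function above
{
  local square=$1
  rank=${square#?}
  case $square in
    a[1-8]) file=1 ;; b[1-8]) file=2 ;; c[1-8]) file=3 ;; d[1-8]) file=4 ;;
    e[1-8]) file=5 ;; f[1-8]) file=6 ;; g[1-8]) file=7 ;; h[1-8]) file=8 ;;
    *) file=0 rank=0; return 1 ;;
  esac
}

check_split_square() #@ USAGE: check_split_square SQUARE EXPECTED-FILE-RANK
{
  split_square "$1" || :            ## only the variables are compared here
  if [ "$file-$rank" != "$2" ]
  then
    printf 'FAIL: %s gave %s-%s, expected %s\n' "$1" "$file" "$rank" "$2" >&2
    failures=$(( failures + 1 ))
  fi
}

failures=0
check_split_square a1 1-1
check_split_square e4 5-4
check_split_square h8 8-8
check_split_square a9 0-0           ## rank out of range
check_split_square i1 0-0           ## file out of range
check_split_square a10 0-0          ## too many characters
printf '%d failure(s)\n' "$failures"
```

A run that prints 0 failure(s) needs no further inspection, which makes the test cheap to repeat after every change to the function.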

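Debugging a Script

In the wfe script, presented section by section earlier, there are a few bugs. Let's run that script and see what happens. The script is in $HOME/bin, which is in your PATH, and it can therefore be called by its name alone. Before that, however, a good first step is to check the script with the -n option. This tests for any syntax errors without actually executing the code: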

$ bash -n wfe
/home/jayant/bin/wfe-sh: wfe: line 70: unexpected EOF while looking for matching '"'
/home/jayant/bin/wfe-sh: wfe: line 72: syntax error: unexpected end of file

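The error message says that there's a missing quotation mark ("). It has reached the end of the file without finding it. That means it could be missing anywhere in the file. After a quick (or not so quick) glance through the file, it's not clear where it should be.

When that happens, I start removing sections from the bottom of the file until the error disappears. I remove the last section; the error is still there. I remove the option parsing, and the error hasn't disappeared. I remove the last function definition, version(), and the error has gone. The error must be in that function; where is it?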

version() #@ DESCRIPTION: print script's version information
{         #@ USAGE: version
          #@ REQUIRES: variables defined: $scriptname, $author and $version
  printf "%s version %s\n" "$scriptname" "$version"
  printf "by %s, %d\n" "$author"  "${date_of_creation%%-*"
}

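There are no mismatched quotation marks, so some other closing character must be missing and causing the problem. After a quick glance, I see that the last variable expansion is missing a closing brace. Fixed, it becomes "${date_of_creation%%-*}". Another check with -n, and it gets a clean bill of health. Now it's time to run it: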

$ wfe
bash: /home/jayant/bin/wfe: Permission denied

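Oops! We forgot to make the script executable. This doesn't usually happen with a main script; it happens more often with scripts that are called by another script. Change the permissions and try again: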

$ chmod +x /home/jayant/bin/wfe
$ wfe
cat: /home/jayant/singlewords: No such file or directory

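Did you download the two files, singlewords and Compounds? If so, where did you put them? In the script, they are declared to be in $dict, which is defined as $HOME. If you put them somewhere else, such as in a subdirectory named words, change that line in the script. Let's make a directory, words, and put them in there: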

mkdir $HOME/words &&
cd $HOME/words &&
wget http://cfaj.freeshell.org/wordfinder/singlewords &&
wget http://cfaj.freeshell.org/wordfinder/Compounds

In the script, change the assignment of dict to reflect the actual location of these files:

dict=$HOME/words

Let's try again:

$ wfe
a
aa
Aachen
aalii
aardvark
*.... 183,758 words skipped ....*
zymotic
zymotically
zymurgy
Zyrian
zythum

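We forgot to tell the program what we are searching for. The script ought to have checked that an argument was supplied, but we forgot to include a sanity-check section. Add this before the search is done (after the line shift $(( $OPTIND - 1 ))):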

## Check that user entered a search term
if [ -z "$pattern" ]
then
  {
    echo "Search term missing"
    usage
  } >&2
  exit 1
fi

Now, try again:

$ wfe
Search term missing
wfe - List words ending with REGEX
USAGE: wfe [-c|-h|-v] REGEX

That's better. Now let's really look for some words:

$ wfe drow
a
aa
Aachen
aalii
aardvark
*.... 183,758 words skipped ....*
zymotic
zymotically
zymurgy
Zyrian
zythum

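There's still something wrong.

One of the most useful debugging tools is set -x, which prints each command with its expanded arguments as it is executed. Each line is preceded by the value of the PS4 variable. The default value of PS4 is "+ "; we'll change it to include the number of the line being executed. Put these two lines before the final section of the script: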

export PS4='+ $LINENO: ' ## single quotes prevent $LINENO being expanded immediately
set -x

Try again:

$ wfe drow
++ 77: cat /home/jayant/singlewords
++ 82: grep -i '.$'
++ 83: sort -fu
++ 78: '[' -n '' ']' ## Ctrl-C pressed to stop entire word list being printed

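On line 82, you can see that the pattern entered on the command line is missing. How did that happen? It should be grep -i '.drow$'. Line 82 in the script reads as follows: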

} | grep -i ".$regex$" |

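What happened to the value of regex? Comment out set -x, and add the set -u option at the top of the script. This option treats unset variables as an error when they are expanded. Run the script again to check whether regex is set: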

$ wfe drow
/home/jayant/bin/wfe: line 84: regex: unbound variable

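Why is regex unset? Look at the earlier part of the script to see which variable was used to hold the command-line argument. Oh! It was pattern, not regex. You have to be consistent, and regex is a better description of its contents, so let's use that. Change all instances of pattern to regex. You should do it in the comments at the top as well. Now try it: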

$ wfe drow
windrow

Success! Now add compound words and phrases to the mix with the -c option:

$ wfe -c drow
/home/jayant/bin/wfe: line 58: compoundfile: unbound variable

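Here we go again! Surely we assigned the Compounds file in the file locations section. Take a look; yes, it's there, around line 23. Wait a minute, there's a typo: conpoundfile=$dict/Compounds. Change con to com. Keep your fingers crossed: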

$ wfe -c drow
$

What? Nothing? Not even windrow? It's time for set -x to see what's going on. Uncomment that line, and run the script again:

$ wfe -c drow
++ 79: cat /home/jayant/singlewords
++ 84: grep -i '.-c$'
++ 85: sort -fu
++ 80: '[' -n /home/jayant/Compounds ']'
++ 82: cut -f1 /home/jayant/Compounds

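At least that's easy to figure out. We assigned regex before processing the options, and it snarfed the first argument, the -c option. Move the assignment down to after the getopts section, specifically, to after the shift command. (You'll probably want to comment out set -x as well.):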

shift $(( $OPTIND - 1 ))
## Regular expression supplied on the command line
regex=$1

Are there any more problems?

$ wfe -c drow
skidRow
windrow

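That looks good. It might seem like a lot of work for a small script, but it seems longer in the telling than in the doing, especially once you get used to doing it, or, better still, to getting it right in the first place.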
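The three shell options used in this walkthrough (-n, -u, and -x) can also be tried in isolation. The sketch below is not part of wfe: the throwaway file, the variable names, and the sample commands are assumptions chosen only to make each option's effect visible.

```shell
tmp=$(mktemp) || exit 1               ## throwaway script file; the name comes from mktemp
printf '%s\n' 'greeting="hello' 'echo "$greeting"' > "$tmp"   ## unbalanced quotation marks

## -n: parse without executing; a syntax error gives a non-zero exit status
if bash -n "$tmp" 2>/dev/null
then
  syntax=ok
else
  syntax=broken
fi

## -u: expanding an unset variable is an error (run in a child shell here)
if bash -uc ': "$not_set_anywhere"' 2>/dev/null
then
  unset_check=passed
else
  unset_check=failed
fi

## -x: trace each command, prefixed by the expanded value of PS4
trace=$( { PS4='+ $LINENO: '; set -x; word=drow; : "$word"; } 2>&1 )

rm -f "$tmp"
printf '%s\n' "syntax=$syntax" "unset_check=$unset_check" "$trace"
```

Because the trace is produced inside a command substitution, neither PS4 nor set -x leaks into the surrounding shell.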

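Summary

Bugs are inevitable, but with care, most can be prevented. When they do materialize, there are shell options to help trace the problem.

Exercises

What is wrong with if [ $var=x ]? What should it be? Why does it give the result that it does?
Write a function, valid_square(), that returns successfully if its sole argument is a valid SAN chessboard square and fails if it is not. Write a function to test that it works.

11. Programming for the Command Line

This book is about programming with the shell, not about using it at the command line. You will not find information here about editing the command line, creating a command prompt (the PS1 variable), or retrieving commands from your interactive history. This chapter is about scripts that are mostly more useful at the command line than in other scripts.

Many of the scripts presented in this chapter are shell functions. Some of them have to be, because they change the environment. Others are functions because they are used often and are quicker that way. Others are both functions and stand-alone scripts.

Manipulating the Directory Stack

The cd command remembers the previous working directory, and cd - will return to it. There is another command that will change the directory and remember an unlimited number of directories: pushd. The directories are stored in an array, DIRSTACK. To return to a previous directory, popd pulls the top entry off DIRSTACK and makes that the current directory. I use two functions that make handling DIRSTACK easier, and I've added a third one here just for the sake of completeness.

Note: The names of some of the functions created in this chapter are the same as or similar to commands available in Bash. The reason is to let you keep using your existing shell scripts without any changes while still gaining some additional functionality.

cd

The cd function replaces the builtin command of the same name. The function uses the builtin command pushd to change the directory and store the new directory on DIRSTACK. If no directory is given, pushd uses $HOME. If changing the directory fails, cd prints an error message, and the function returns with a failing exit code (Listing 11-1).

Listing 11-1. cd, change directory, saving location on the directory stack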

cd() #@ Change directory, storing new directory on DIRSTACK
{
  local dir error          ## variables for directory and return code

  while :                  ## ignore all options
  do
    case $1 in
      --) break ;;
      -*) shift ;;
      *) break ;;
    esac
  done

  dir=$1

  if [ -n "$dir" ]         ## if a $dir is not empty
  then
    pushd "$dir"           ## change directory
  else
    pushd "$HOME"          ## go HOME if nothing on the command line
  fi 2>/dev/null           ## error message should come from cd, not pushd

  error=$?     ## store pushd's exit code

  if [ $error -ne 0 ]      ## failed, print error message
  then
    builtin cd "$dir"      ## let the builtin cd provide the error message
  fi
  return "$error"          ## leave with pushd's exit code
} > /dev/null

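Standard output is redirected to the bit bucket because pushd prints the contents of DIRSTACK, and the only other output is sent to standard error (>&2).

Note: A replacement for a standard command, such as cd, should accept anything that the original accepts. In the case of cd, the options -L and -P are accepted, even though they are ignored. That said, I do sometimes ignore options without even making provision for them, especially if they are ones I never use.

pd

The pd function is here for the sake of completeness (Listing 11-2). It is a lazy man's way of calling popd; I don't use it.

Listing 11-2. pd, return to previous directory with popd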

pd ()
{
    popd
} >/dev/null ### for the same reason as cd

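cdm

The reason I don't use pd isn't that I'm not lazy. Far from it, but I prefer to leave DIRSTACK intact so that I can move back and forth between directories. For that reason, I use a menu that presents all the directories in DIRSTACK.

The cdm function sets the input field separator (IFS) to a single newline (NL or LF) to ensure that the output of the dirs builtin command keeps file names together after word splitting (Listing 11-3). File names containing a newline would still cause problems; names with spaces are an annoyance, but names with newlines are an abomination.

The function loops through the names in DIRSTACK (for dir in $(dirs -l -p)), adding each one to an array, item, unless it is already there. This array is then used as the arguments to the menu function (discussed next), which must be sourced before cdm can be used.

The dirs Builtin Command

The dirs builtin command lists the directories in the DIRSTACK array. By default, it lists them on a single line with the value of HOME represented by a tilde. The -l option expands ~ to $HOME, and -p prints the directories one per line.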
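The interplay of pushd, popd, dirs, and DIRSTACK can be watched in a few lines. This is only a sketch: the two directories are throwaway ones created with mktemp, and the variable names are illustrative.

```shell
start=$PWD
one=$(mktemp -d) || exit 1     ## two throwaway directories (names come from mktemp)
two=$(mktemp -d) || exit 1

pushd "$one" > /dev/null       ## pushd prints the stack, so silence it
pushd "$two" > /dev/null
depth=${#DIRSTACK[@]}          ## number of entries now on the stack
dirs -l -p                     ## one per line: $two, $one, then the starting directory

popd > /dev/null               ## back to $one
popd > /dev/null               ## back to the starting directory
rmdir "$one" "$two"
```

Each pushd adds an entry at the top of the stack, and each popd removes one, so the shell ends up where it started.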

Listing 11-3. cdm, select new directory from a menu of those already visited

cdm() #@ select new directory from a menu of those already visited
{
  local dir IFS=$'\n' item
  for dir in $(dirs -l -p)             ## loop through directories in DIRSTACK[@]
  do
    [ "$dir" = "$PWD" ] && continue    ## skip current directory
    case ${item[*]} in
      *"$dir:"*) ;;                    ## $dir already in array; do nothing
      *) item+=( "$dir:cd '$dir'" ) ;; ## add $dir to array
    esac
  done
  menu "${item[@]}" Quit:              ## pass array to menu function
}

When run, the menu looks like this:

$ cdm

    1. /public/music/magnatune.com
    2. /public/video
    3. /home/jayant
    4. /home/jayant/tmp/qwe rty uio p
    5. /home/jayant/tmp
    6. Quit

 (1 to 6) ==>

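menu

The calling syntax for the menu function comes from 9menu, which was part of the Plan 9 operating system. Each argument contains two colon-separated fields: the item to be displayed and the command to be executed. If there is no colon in an argument, it is used both as the display and as the command: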

$ menu who date "df:df ."

    1. who
    2. date
    3. df

 (1 to 3) ==> 3
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda5             48070472  43616892   2011704  96% /home
$ menu who date "df: df ."

    1. who
    2. date
    3. df

 (1 to 3) ==> 1
jayant    tty8         Jun 18 14:00 (:1)
jayant    tty2         Jun 21 18:10

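A for loop numbers and prints the menu; read gets the response; and a case statement checks the response for the exit characters q, Q, or 0. Finally, indirect expansion retrieves the selected item, further expansion extracts the command, and eval executes it: eval "${!num#*:}" (Listing 11-4).

Listing 11-4. menu, print menu and execute associated command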

menu()
{
  local IFS=$' \t\n'                        ## Use default setting of IFS
  local num n=1 opt item cmd
  echo

  ## Loop though the command-line arguments
  for item
  do
    printf "  %3d. %s\n" "$n" "${item%%:*}"
    n=$(( $n + 1 ))
  done
  echo

  ## If there are fewer than 10 items, set option to accept key without ENTER
  if [ $# -lt 10 ]
  then
    opt=-sn1
  else
    opt=
  fi
  read -p " (1 to $#) ==> " $opt num         ## Get response from user

  ## Check that user entry is valid
  case $num in
    [qQ0] | "" ) return ;;                   ## q, Q or 0 or "" exits
    *[!0-9]* | 0*)                           ## invalid entry
       printf "\aInvalid response: %s\n" "$num" >&2
       return 1
       ;;
  esac
  echo

  if [ "$num" -le "$#" ]   ## Check that number is <= to the number of menu items
  then
    eval "${!num#*:}"      ## Execute it using indirect expansion
  else
    printf "\aInvalid response: %s\n" "$num" >&2
    return 1
  fi
}
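The indirect expansion at the heart of menu can be tried on its own, without the prompt or the eval. In this sketch, the positional parameters are set by hand to the arguments of the earlier example; nothing is executed, only printed:

```shell
set -- "who" "date" "df:df ."    ## the same arguments as in the menu example
num=3
printf '%s\n' "${!num}"          ## df:df .  (the third positional parameter)
printf '%s\n' "${!num%%:*}"      ## df       (label: text before the first colon)
printf '%s\n' "${!num#*:}"       ## df .     (command: text after the first colon)
num=1
printf '%s\n' "${!num#*:}"       ## who      (no colon, so nothing is removed)
```

Since eval runs whatever follows the colon, menu should only ever be given arguments that the script itself has built, as cdm does.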

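Filesystem Functions

These functions vary from laziness (giving a short name to a longer command) to adding functionality to standard commands (cp and mv). They list, copy, or move files or create directories.

The POSIX specification requires no single-letter commands, and there is only one that is found on most Unixes: w, which shows who is logged on and what they are doing. I have defined a number of single-letter functions:

a: Lists the currently playing music track
c: Clears the screen (sometimes quicker or easier than ^L)
d: Prints the date with date "+%A, %-d %B %Y %-I:%M:%S %P (%H:%M:%S)"
k: Is equivalent to man -k, or apropos
t: For the Amiga and MS-DOS command type, invokes less
v and V: Lower and raise the sound volume, respectively
x: Logs out

And there's the one I use most, which pipes a long file listing through less, as shown in Listing 11-5.

Listing 11-5. l, list files in long format, piped through less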

l()
{
  ls -lA "$@" | less        ## the -A option is specific to GNU and *BSD versions
}

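lsr

The commands I use most frequently are l, cd, xx.sh, cdm, and lsr; xx.sh is a file for throwaway scripts. I keep adding new ones at the top; lsr displays the most recent files (or, with the -o option, the oldest files). The default setting is to show ten files, but that can be changed with the -n option.

The script in Listing 11-6 uses the -t (or -tr) option to ls and pipes the result to head.

Listing 11-6. lsr, list most recently modified files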

num=10                                           ## number of files to print
short=0                                          ## set to 1 for short listing
timestyle='--time-style=+ %d-%b-%Y %H:%M:%S '    ## GNU-specific time format

opts=Aadn:os

while getopts $opts opt
do
  case $opt in
      a|A|d) ls_opts="$ls_opts -$opt" ;;  ## options passed to ls
      n) num=$OPTARG ;;                   ## number of files to display
      o) ls_opts="$ls_opts -r" ;;         ## show oldest files, not newest
      s) short=$(( $short + 1 )) ;;
  esac
done
shift $(( $OPTIND - 1 ))

case $short in
    0) ls_opts="$ls_opts -l -t" ;;        ## long listing, use -l
    *) ls_opts="$ls_opts -t" ;;           ## short listing, do not use -l
esac

ls $ls_opts "$timestyle" "$@" | {         ## quoted so the format stays one word
    read                                  ## In bash, the same as: IFS= read -r REPLY
    case $REPLY in
        total*) ;;                        ## do not display the 'total' line
        *) printf "%s\n" "$REPLY" ;;
    esac
    cat
} | head -n$num

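cp, mv

Before switching my desktop to GNU/Linux, I used an Amiga. Its copy command would copy a file to the current directory if no destination was given. This function gives the same ability to cp (Listing 11-7). The -b option is GNU specific, so remove it if you are using a different version of cp.

Listing 11-7. cp, copy, using the current directory if no destination is given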

cp()
{
  local final
  if [ $# -eq 1 ]                  ## Only one arg,
  then
    command cp -b "$1" .           ## so copy it to the current directory
  else
    final=${!#}
    if [ -d "$final" ]             ## if last arg is a directory
    then
      command cp -b "$@"           ## copy all the files into it
    else
      command cp -b "$@" .         ## otherwise, copy to the current directory
    fi
  fi
}

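The mv function is identical except that it has mv wherever cp appears in that function.

Laziness is the order of the day with the md function (Listing 11-8). It calls mkdir with the -p option to create intermediate directories if they don't exist. With the -c option, md creates the directory (if it doesn't already exist) and then cds into it. Because of the -p option, no error is generated if the directory exists.

Listing 11-8. md, create a new directory and intermediate directories, and optionally cd into it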

md() { #@ create new directory, including intermediate directories if necessary
  case $1 in
     -c) mkdir -p "$2" && cd "$2" ;;
     *) mkdir -p "$@" ;;
  esac
}

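Miscellaneous Functions

I use the next two functions a great deal, but they don't fit into any category.

pr1

I have the pr1 function as both a function and a stand-alone script (Listing 11-9). It prints each of its arguments on a separate line. By default, it limits the length to the number of columns in the terminal, truncating lines as necessary.

There are two options, -w and -W. The former removes the truncation, so lines always print in full, wrapping to the next line when necessary. The latter specifies a width at which to truncate lines.

Listing 11-9. pr1, function to print its arguments one to a line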

pr1() #@ Print arguments one to a line
{
  case $1 in
    -w) pr_w=                   ## width specification modifier
        shift
        ;;
    -W) pr_w=${2}
        shift 2
        ;;
    -W*) pr_w=${1#??}
         shift
         ;;
    *) pr_w=-.${COLUMNS:-80}    ## default to number of columns in window
       ;;
  esac
  printf "%${pr_w}s\n" "$@"
 }

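The script version (Listing 11-10) uses getopts; I didn't use it in the function because I wanted it to be POSIX compliant.

Listing 11-10. pr1, script to print its arguments one to a line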

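w=-.${COLUMNS:-80}       ## default: truncate at the number of columns in window
while getopts wW: opt
do
  case $opt in
    w) w= ;;             ## no truncation
    W) w=$OPTARG ;;      ## width specification supplied by the user
  esac
done
shift $(( $OPTIND - 1 ))

printf "%${w}s\n" "$@"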

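calc

Bash lacks the capacity for arithmetic with decimal fractions, so I wrote this function (Listing 11-11) to use awk to do the dirty work. Note that characters special to the shell must be escaped or quoted on the command line. This applies particularly to the multiplication symbol, *.

Listing 11-11. calc, print result of arithmetic expression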

calc() #@ Perform arithmetic, including decimal fractions
{
  local result=$(awk 'BEGIN { OFMT="%f"; print '"$*"'; exit}')
  case $result in
    *.*0) result=${result%"${result##*[!0]}"} ;;
  esac
  printf "%s\n" "$result"
}

The case statement removes trailing zeros after a decimal point.
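A few sample calls make the quoting requirement and the zero trimming concrete. The function is repeated from Listing 11-11 only so that the block runs on its own; the expressions are arbitrary examples.

```shell
calc() #@ Perform arithmetic, including decimal fractions (copy of Listing 11-11)
{
  local result=$(awk 'BEGIN { OFMT="%f"; print '"$*"'; exit}')
  case $result in
    *.*0) result=${result%"${result##*[!0]}"} ;;
  esac
  printf "%s\n" "$result"
}

calc 1.5 + 2.25      ## 3.75      (awk prints 3.750000; the zeros are trimmed)
calc 10 / 4          ## 2.5
calc "2 * 3.5"       ## 7         (quoted so the shell does not expand *)
calc 1 / 3           ## 0.333333  (no trailing zero, so nothing is trimmed)
```

Without the quotation marks, the shell would replace * with the names of the files in the current directory before calc ever saw it.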

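Managing Man Pages

I use three functions related to man pages. The first searches a man page for a pattern or string, the second looks up a POSIX man page, and the third is equivalent to man -k.

sman

The sman function calls up a man page and searches for a given string. It assumes that less is the default pager (Listing 11-12).

Listing 11-12. sman, call up a man page and search for a pattern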

sman() #@ USAGE: sman command search_pattern
{
  LESS="$LESS${2:+ +/$2}" man "$1"
}

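sus

When I want to check the portability of a given command or, more usually, to check which options are specified by POSIX, I use sus. It stores a copy of the POSIX man page locally so that it doesn't need to be fetched on subsequent queries (Listing 11-13).

Listing 11-13. sus, look up a man page in the POSIX spec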

sus()
{
    local html_file=/usr/share/sus/$1.html    ## adjust to taste
    local dir=9699919799
    local sus_dir=http://www.opengroup.org/onlinepubs/$dir/utilities/
    [ -f "$html_file" ] ||
      lynx -source  $sus_dir${1##*/}.html > $html_file ##>/dev/null 2>&1
    lynx -dump -nolist $html_file | ${PAGER:-less}
}

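Here, lynx is a text-mode web browser. Though normally used interactively to access the Web, its -source and -dump directives can also be used in scripts.

The k function saves all the typing of apropos or man -k. It actually does a little more: it filters the result so that only user commands (from the first section of the man pages) are shown. System and kernel functions, file specifications, and so on, are not shown (Listing 11-14).

Listing 11-14. k, list commands whose short descriptions include a search string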

k() #@ USAGE: k string
{
    man -k "$@" | grep '(1'
}

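Games

What's a command line without games? Boring, that's what! I have written a number of games using the shell. They include yahtzee (Figure 11-1), a game that uses five dice; maxit (Figure 11-2), based on an arithmetic game for the Commodore 64; and, of course, tic-tac-toe (Figure 11-3). All these games are too large to include their scripts in this book, but sections of them (such as the yahtzee dice) will be demonstrated in later chapters. The one game that I can include here is the fifteen puzzle.

Figure 11-1. The game of yahtzee, in which the player attempts to get runs, a full house, or three, four, or five of a kind

Figure 11-2. The game of maxit, in which one player selects from a row and the other from a column

Figure 11-3. The ubiquitous game of tic-tac-toe

The fifteen Puzzle

The fifteen puzzle consists of 15 numbered, sliding tiles in a frame; the object is to arrange them in ascending order like this: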

        +----+----+----+----+
        |    |    |    |    |
        |  1 |  2 |  3 |  4 |
        |    |    |    |    |
        +----+----+----+----+
        |    |    |    |    |
        |  5 |  6 |  7 |  8 |
        |    |    |    |    |
        +----+----+----+----+
        |    |    |    |    |
        |  9 | 10 | 11 | 12 |
        |    |    |    |    |
        +----+----+----+----+
        |    |    |    |    |
        | 13 | 14 | 15 |    |
        |    |    |    |    |
        +----+----+----+----+

In this script (Listing 11-15), the tiles are moved with the cursor keys.

Listing 11-15. fifteen, place tiles in ascending order

########################################
## Meta data
########################################

scriptname=${0##*/}
description="The Fifteen Puzzle"
author="Chris F.A. Johnson"
created=2009-06-20

########################################
## Variables
########################################

board=( {1..15} "" )         ## The basic board array
target=( "${board[@]}" )     ## A copy for comparison (the target)
empty=15                     ## The empty square
last=0                       ## The last move made
A=0 B=1 C=2 D=3              ## Indices into array of possible moves
topleft='\e[0;0H'            ## Move cursor to top left corner of window
nocursor='\e[?25l'           ## Make cursor invisible
normal='\e[0m\e[?12l\e[?25h' ## Resume normal operation

## Board layout is a printf format string
## At its most basic, it could be a simple:

fmt="$nocursor$topleft

     %2s  %2s  %2s  %2s

     %2s  %2s  %2s  %2s

     %2s  %2s  %2s  %2s

     %2s  %2s  %2s  %2s

"

## I prefer this ASCII board
fmt="\e[?25l\e[0;0H\n
\t+----+----+----+----+
\t|    |    |    |    |
\t| %2s | %2s | %2s | %2s |
\t|    |    |    |    |
\t+----+----+----+----+
\t|    |    |    |    |
\t| %2s | %2s | %2s | %2s |
\t|    |    |    |    |
\t+----+----+----+----+
\t|    |    |    |    |
\t| %2s | %2s | %2s | %2s |
\t|    |    |    |    |
\t+----+----+----+----+
\t|    |    |    |    |
\t| %2s | %2s | %2s | %2s |
\t|    |    |    |    |
\t+----+----+----+----+\n\n"

########################################
###  Functions
########################################

print_board() #@ What the name says
{
  printf "$fmt" "${board[@]}"
}

borders() #@ List squares bordering on the empty square
{
  ## Calculate x/y co-ordinates of the empty square
  local x=$(( ${empty:=0} % 4 ))  y=$(( $empty / 4 ))

  ## The array, bordering, has 4 elements, corresponding to the 4 directions
  ## If a move in any direction would be off the board, that element is empty
  ##
  unset bordering     ## clear array before setting it
  [ $y -lt 3 ] && bordering[$A]=$(( $empty + 4 ))
  [ $y -gt 0 ] && bordering[$B]=$(( $empty - 4 ))
  [ $x -gt 0 ] && bordering[$C]=$(( $empty - 1 ))
  [ $x -lt 3 ] && bordering[$D]=$(( $empty + 1 ))
}

check() #@ Check whether puzzle has been solved
{
  ## Compare current board with target
  if [ "${board[*]}" = "${target[*]}" ]
  then
    ## Puzzle is completed, print message and exit
    print_board
    printf "\a\tCompleted in %d moves\n\n"  "$moves"
    exit
  fi
}

move() #@ Move the square in $1
{
  movelist="$empty $movelist"    ## add current empty square to the move list
  moves=$(( $moves + 1 ))        ## increment move counter
  board[$empty]=${board[$1]}     ## put $1 into the current empty square
  board[$1]=""                   ## remove number from new empty square
  last=$empty                    ## .... and put it in old empty square
  empty=$1                       ## set new value for empty-square pointer
}

random_move() #@ Move one of the squares in the arguments
{
  ## The arguments to random_move are the squares that can be moved
  ## (as generated by the borders function)
  local sq
  while :
  do
    sq=$(( $RANDOM % $# + 1 ))
    sq=${!sq}
    [ $sq -ne ${last:-666} ] &&   ## do not undo last move
       break
  done
  move "$sq"
}

shuffle() #@ Mix up the board using legitimate moves (to ensure solvable puzzle)
{
  local n=0 max=$(( $RANDOM % 100 + 150 ))   ## number of moves to make
  while [ $(( n += 1 )) -lt $max ]
  do
    borders                                  ## generate list of possible moves
    random_move "${bordering[@]}"            ## move to one of them at random
  done
}

########################################
### End of functions
########################################

trap 'printf "$normal"' EXIT                 ## return terminal to normal state on exit

########################################
### Instructions and initialization
########################################

clear
print_board
echo
printf "\t%s\n" "$description" "by $author, ${created%%-*}" ""
printf "
 Use the cursor keys to move the tiles around.

 The game is finished when you return to the
 position shown above.

 Try to complete the puzzle in as few moves
 as possible.

        Press \e[1mENTER\e[0m to continue
"
shuffle                                    ## randomize board
moves=0                                    ## reset move counter
read -s                                    ## wait for user
clear                                      ## clear the screen

########################################
### Main loop
########################################

while :
do
  borders
  print_board
  printf "\t   %d move" "$moves"
  [ $moves -ne 1 ] && printf "s"
  check

  ## read a single character without waiting for <ENTER>
  read -sn1 -p $'        \e[K' key

  ## The cursor keys generate three characters: ESC, [ and A, B, C, or D;
  ## this loop will run three times for each press of a cursor key
  ## but will not do anything until it receives a letter
  ## from the cursor key (or entered directly with A etc.), or a 'q' to exit
  case $key in
    A) [ -n "${bordering[$A]}" ] && move "${bordering[$A]}" ;;
    B) [ -n "${bordering[$B]}" ] && move "${bordering[$B]}" ;;
    C) [ -n "${bordering[$C]}" ] && move "${bordering[$C]}" ;;
    D) [ -n "${bordering[$D]}" ] && move "${bordering[$D]}" ;;
    q) echo; break ;;
  esac
done
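The arithmetic in the borders function can be checked apart from the game. The following sketch uses a hypothetical helper, neighbours, that applies the same tests but prints the neighbouring squares instead of filling the bordering array; squares are numbered 0 to 15, row by row.

```shell
neighbours() #@ USAGE: neighbours INDEX   (hypothetical helper)
{
  local empty=$1 x=$(( $1 % 4 )) y=$(( $1 / 4 )) list=
  [ $y -lt 3 ] && list="$list $(( empty + 4 ))"   ## square below
  [ $y -gt 0 ] && list="$list $(( empty - 4 ))"   ## square above
  [ $x -gt 0 ] && list="$list $(( empty - 1 ))"   ## square to the left
  [ $x -lt 3 ] && list="$list $(( empty + 1 ))"   ## square to the right
  printf '%s\n' "${list# }"
}

neighbours 15    ## 11 14    (bottom right corner: only two neighbours)
neighbours 0     ## 4 1      (top left corner)
neighbours 5     ## 9 1 4 6  (interior square: all four directions)
```

The remainder of the index divided by 4 gives the column, and the integer quotient gives the row, which is why a one-dimensional array is enough to represent the board.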

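Summary

The scripts provided in this chapter are a smattering of the possibilities for using scripts at the command line. Where the environment needs to be changed (as in cd and cdm), the scripts must be shell functions. These are usually kept in $HOME/.bashrc or in a file sourced by .bashrc.

Even games can be programmed without needing a GUI interface.

Exercises

Modify the menu function to accept its parameters from a file.
Rewrite the pr1 function as prx, which will behave in the manner of pr4 from Chapter 8 but will take an option for any number of columns.
Add a getopts section to the fifteen game that allows the user to select between three different board formats. Write a third format.

12. Runtime Configuration

When I download e-mail from three or four different POP3 servers, I don't use a different script for each one. When I open a terminal to ssh to a remote computer (half a dozen of them) with a different background color for each, I use the same script for every connection. To upload files to my web sites (I look after six sites), I use the same script for all of them.

You can configure a script's behavior in several ways when you run it. This chapter looks at seven methods: initialized variables, command-line options and arguments, menus, Q&A dialogue, configuration files, multiple names for one script, and environment variables. These methods are not mutually exclusive; in fact, they are often combined. A command-line option could tell the script to use a different configuration file or to present the user with a menu.

Defining Variables

If the runtime requirements for a script rarely change, hard-coded variables may be all the configuration you need. You can set them when the script is installed. When a change is needed, the parameters can be changed quickly with a text editor.

Listing 12-1. Example of initialized default variables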

## File locations
dict=/usr/share/dict
wordfile=$dict/singlewords
compoundfile=$dict/Compounds

## Default is not to show compound words
compounds=no

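If the variables need changing often, one or more of the other methods can be added.

Command-Line Options and Arguments

The most common method for changing runtime behavior uses command-line options. As shown in Listing 12-2, all the values defined earlier can be modified at the command line.

Listing 12-2. Parse command-line options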

while getopts d:w:f:c var
do
  case "$var" in
    c) compounds=1 ;;
    d) dict=$OPTARG ;;
    w) wordfile=$OPTARG ;;
    f) compoundfile=$OPTARG ;;
  esac
done

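Menus

For a user unfamiliar with a piece of software, a menu is a good way to allow runtime changes. In the menu example shown in Listing 12-3, the selections are numbered from 1 to 4, and q exits the menu.

Listing 12-3. Set parameters via menu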

while :  ## loop until user presses 'q'
do
  ## print menu
  printf "\n\n%s\n" "$bar"
  printf "  Dictionary parameters\n"
  printf "%s\n\n" "$bar"
  printf "  1. Directory containing dictionary: %s\n" "$dict"
  printf "  2. File containing word list: %s\n" "$wordfile"
  printf "  3. File containing compound words and phrases: %s\n" "$compoundfile"
  printf "  4. Include compound words and phrases in results? %s\n" "$compounds"
  printf "  q. %s\n" "Exit menu"
  printf "\n%s\n\n" "$bar"

  ## get user response
  read -sn1 -p "Select (1,2,3,4,q): " input
  echo

  ## interpret user response
  case $input in
    1) read -ep "Enter dictionary directory: " dict ;;
    2) read -ep "Enter word-list file: " wordfile ;;
    3) read -ep "Enter compound-word file: " compoundfile ;;
    4) [ "$compounds" = y ] && compounds=n || compounds=y ;;
    q) break ;;
    *) printf "\n\aInvalid selection: %c\n" "$input" >&2
    sleep 2
    ;;
  esac
done

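Q&A Dialogue

A question-and-answer function cycles through all the parameters, prompting the user to enter a value for each one (Listing 12-4). This can get tedious for the user, and it is probably best used when there are no defaults, when there are very few parameters to enter, or when values need to be entered for a new configuration file.

Listing 12-4. Set variables by question and answer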

read -ep "Directory containing dictionary: " dict
read -ep "File containing word list: " wordfile
read -ep "File containing compound words and phrases: " compoundfile
read -sn1 -p "Include compound words and phrases in results (y/n)? " compounds
echo
read -ep "Save parameters (y/n)? " save
case $save in
  y|Y) read -ep "Enter path to configuration file: " configfile
   {
    printf '%-30s ## %s\n' \
      "dict=$dict" "Directory containing dictionary" \
      "wordfile=$wordfile" "File containing word list" \
      "compoundfile=$compoundfile" "File containing compound words and phrases" \
      "compounds=$compounds" "Include compound words and phrases in results?"
   } > "${configfile:-/dev/tty}"
esac

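Configuration Files

Configuration files can use any format, but it's easiest to make them shell scripts that can be sourced. The example file shown in Listing 12-5 can be sourced, but it can also provide more information.

Listing 12-5. Configuration file, words.cfg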

dict=/usr/share/dict        ## directory containing dictionary files
wordfile=singlewords        ## file containing word list
compoundfile=Compounds      ## file containing compound words and phrases
compounds=no                ## include compound words and phrases in results?

The words.cfg file can be sourced with either of these two commands:

. words.cfg
source words.cfg

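Rather than sourcing the file, you can parse it in various ways (Listing 12-6). In bash-4.x, you can read the file into an array and extract the variables and comments using parameter expansion, the expansion being applied to each element of the array.

Listing 12-6. Parsing configuration file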

IFS=$'\n'
file=words.cfg
settings=( $( < "$file") )         ## store file in array, 1 line per element
eval "${settings[@]%%#*}"          ## extract and execute the assignments
comments=( "${settings[@]#*## }" ) ## store comments in array

The comments array contains just the comments, and the assignments can be extracted from settings with "${settings[@]%%#*}":

$ printf "%s\n" "${comments[@]}"
directory containing dictionary files
file containing word list
file containing compound words and phrases
include compound words and phrases in results?

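You can also read the file in a loop to set the variables and, by displaying the comments, provide information about the variables it contains (Listing 12-7).

Listing 12-7. Parsing configuration file with comments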

while read assignment x comment
do
  if [ -n "$assignment" ]
  then
    printf "%20s: %s\n" "${assignment#*=}"  "$comment"
    eval "$assignment"
  fi
done < "$file"

The following is the result:

     /usr/share/dict: directory containing dictionary files
         singlewords: file containing word list
           Compounds: file containing compound words and phrases
                  no: include compound words and phrases in results?

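Configuration files can be made as complex as you like, but parsing them then falls more properly under the category of data processing, which is the subject of Chapter 13.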
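Both parsers above pass text from the file to eval, so they execute whatever the file contains. When the file might be edited by someone else, a more defensive reader may be worth the extra lines. The following is a sketch of one possible approach, not the book's method: load_config is a hypothetical name, and the list of accepted settings is assumed to be the four variables of words.cfg.

```shell
load_config() #@ USAGE: load_config FILE   (hypothetical name)
{
  local line name value
  while IFS= read -r line || [ -n "$line" ]   ## also accept a last line with no newline
  do
    line=${line%%#*}                          ## strip the comment
    line=${line%"${line##*[! ]}"}             ## strip trailing spaces
    case $line in
      *=*) name=${line%%=*} value=${line#*=} ;;
      *) continue ;;                          ## blank or comment-only line
    esac
    case $name in
      dict|wordfile|compoundfile|compounds)
        printf -v "$name" '%s' "$value" ;;    ## assign without eval
      *) printf 'Ignoring unknown setting: %s\n' "$name" >&2 ;;
    esac
  done < "$1"
}

cfg=$(mktemp) || exit 1                       ## throwaway stand-in for words.cfg
printf '%s\n' \
  'dict=/usr/share/dict        ## directory containing dictionary files' \
  'compounds=no                ## include compound words and phrases in results?' \
  'colour=$(date)              ## not a known setting, so it is ignored' > "$cfg"
load_config "$cfg"
rm -f "$cfg"
printf '%s\n' "dict=$dict" "compounds=$compounds"
```

The trade-off is that values are taken literally: quotation marks and expansions in the file are not interpreted, and a value cannot contain a # character.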

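Scripts with Several Names

By storing the same file under different names, you can avoid command-line options and menus. The script in Listing 12-8 opens a terminal and connects to different remote computers using a secure shell. The terminal's colors, the machine to log on to, and the name of the remote user are all determined by the name of the script.

Listing 12-8. bashful, connect to remote computer via ssh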

scriptname=${0##*/}

## default colours
bg=#ffffcc     ## default background: pale yellow
fg=#000000     ## default foreground: black

user=bashful   ## default user name
term=xterm     ## default terminal emulator (I prefer rxvt)

case $scriptname in
  sleepy)
     bg=#ffffff
     user=sleepy
     host=sleepy.example.com
     ;;
  sneezy)
     fg=#aa0000
     bg=#ffeeee
     host=sneezy.example.org
     ;;
  grumpy)
     fg=#006600
     bg=#eeffee
     term=rxvt
     host=cfajohnson.example.com
     ;;
  dopey)
     host=127.0.0.1
     ;;
  *) echo "$scriptname: Unknown name" >&2
     exit 1
     ;;
esac

"$term" -fg "$fg" -bg "$bg" -e ssh -l "$user" "$host"

To create the multiple names for the same file, create links with ln (Listing 12-9).

Listing 12-9. Make Multiple Links to bashful Script

cd "$HOME/bin" &&
for name in sleepy sneezy grumpy dopey
do
  ln -s bashful "$name"           ## you can leave out the -s option if you like
done
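
The mechanism can be tried without a terminal emulator or a remote host. The following sketch uses a throwaway directory; the file name greet, the link names hello and goodbye, and the messages are all invented for the demonstration. The links are run through bash explicitly so that the example does not depend on execute permission in the temporary directory.

## Sketch: one file, several names (all names and messages are made up)
dir=$(mktemp -d) || exit 1

cat > "$dir/greet" <<'EOF'
case ${0##*/} in
  hello)   printf "%s\n" "Hello, world" ;;
  goodbye) printf "%s\n" "Goodbye, world" ;;
  *)       printf "%s: Unknown name\n" "${0##*/}" >&2; exit 1 ;;
esac
EOF

## Symbolic links give the same file two more names
ln -s greet "$dir/hello"
ln -s greet "$dir/goodbye"

hello_output=$(bash "$dir/hello")       ## $0 ends in "hello"
goodbye_output=$(bash "$dir/goodbye")   ## $0 ends in "goodbye"
greet_status=0
bash "$dir/greet" 2>/dev/null || greet_status=$?  ## "greet" is not a known name

printf "%s\n" "$hello_output" "$goodbye_output" "exit status: $greet_status"
rm -rf "$dir"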

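Environment Variables

You can also pass settings to a program using variables. These can be either exported or defined on the same line as the command. In the latter case, the variable is defined for that command only.

You alter the behavior of the program by checking the value of a variable or even just its existence. I use this technique most often to adjust the output of a script using verbose. This would be a typical line in a script: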

[ ${verbose:-0} -gt 0 ] && printf "%s\n" "Finished parsing options"

The script would be called with the following:

verbose=1 myscriptname
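
As a small check of the "for that command only" behavior, the sketch below runs the same test in a child bash process with and without the assignment. The check string stands in for a real script and is an assumption of this example.

## Sketch: a variable set on the command line exists only for that command
check='[ "${verbose:-0}" -gt 0 ] && echo "verbose is on" || echo "verbose is off"'

unset verbose
with_setting=$(verbose=1 bash -c "$check")   ## defined for this command only
without_setting=$(bash -c "$check")          ## not defined for this one
afterward=${verbose-unset}                   ## the calling shell never had it

printf "%s\n" "$with_setting" "$without_setting" "$afterward"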

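You can see an example in the upload script presented in the next section.

All Together Now

The following is the program I use to update all my web sites. It finds new or modified files in a directory hierarchy, stores them in a tarball, and uploads them to a web site on a (usually) remote computer. I have shell access on all the sites I use, so I can use a secure shell, ssh, to transfer the files and unpack them with tar on the site: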

ssh -p "$port" -l "$user" "$host" \
      "cd \"$dest\" || exit;tar -xpzf -" < "$tarfile" &&
        touch "$syncfile"

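All of my sites use authentication keys (created with ssh-keygen), so no password is required and the script can be run as a cron job.

This program uses all the techniques mentioned earlier except for multiple names. It is more than you would usually use in a single program, but it is a good illustration.

The user can select whether to use command-line options, a menu, a Q&A dialog, or a configuration file to adjust the settings, or the user can even use the defaults. Command-line options are available for all settings:

-c configfile: Reads settings from configfile
-h host: Specifies the URL or IP address of the remote computer
-p port: Specifies the SSH port to use
-d dest: Specifies the destination directory on the remote host
-u user: Specifies the user's login name on the remote computer
-a archivedir: Specifies the local directory in which to store archive files
-f syncfile: Specifies the file whose timestamp is the cutoff point
-s source: Specifies the local directory to upload

There are three more options that control the script itself:

-t: Tests only, displaying the final settings; does not archive or upload
-m: Presents the user with the menu
-q: Uses the Q&A dialog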

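In the following sections, we will look at the script in detail, section by section.

Note: This is a book about professional Bash scripting and the ways of using scripts. Writing a script is not necessarily the best solution.

There are a few other options, not necessarily based on Bash scripting, that were created just to achieve administrative results. There is a wrapper of perl scripts called Cluster SSH (open source) that allows you to send commands to multiple servers at the same time and is GUI based. There is another called Puppet, which is quite popular.

Script Information

Note that parameter expansion is used to pull the script name from $0, not the external command basename (Listing 12-10).

Listing 12-10. upload, Archive and Upload Files to Remote Computer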

scriptname=${0##*/}
description="Archive new or modified files and upload to web site"
author="Chris F.A. Johnson"
version=1.0
copyright=2009  ## year printed by the qa function

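Default Configuration

Besides setting the variables, an array containing the names of the variables and their descriptions is created (Listing 12-11). This is used by the menu and qa (question and answer) functions for labels and prompts.

Listing 12-11. Default Values and the varinfo Array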

## archive and upload settings
host=127.0.0.1                        ## Remote host (URL or IP address)
port=22                               ## SSH port
dest=work/upload                      ## Destination directory
user=jayant                           ## Login name on remote system
source=$HOME/public_html/oz-apps.com  ## Local directory to upload
archivedir=$HOME/work/webarchives     ## Directory to store archive files
syncfile=.sync                        ## File to touch with time of last upload

## array containing variables and their descriptions
varinfo=( "" ## Empty element to emulate 1-based array
  "host:Remote host (URL or IP address)"
  "port:SSH port"
  "dest:Destination directory"
  "user:Login name on remote system"
  "source:Local directory to upload"
  "archivedir:Directory to store archive files"
  "syncfile:File to touch with time of last upload"
)

## These may be changed by command-line options
menu=0          ## do not print a menu
qa=0            ## do not use question and answer
test=0          ## 0 = upload for real; 1 = don't archive/upload, show settings
configfile=     ## if defined, the file will be sourced
configdir=$HOME/.config  ## default location for configuration files
sleepytime=2    ## delay in seconds after printing messages

## Bar to print across top and bottom of menu (and possibly elsewhere)
bar=================================================================
bar=$bar$bar$bar$bar   ## make long enough for any terminal window
menuwidth=${COLUMNS:-80}

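Screen Variables

These variables use the ISO-6429 standard, which is now all but universal in terminals and terminal emulators. This is discussed in detail in Chapter 14. When printed to the terminal, these escape sequences perform the actions indicated in the comments.

Listing 12-12. Define Screen Manipulation Variables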

topleft='\e[0;0H'     ## Move cursor to top left corner of screen
clearEOS='\e[J'       ## Clear from cursor position to end of screen
clearEOL='\e[K'       ## Clear from cursor position to end of line

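Function Definitions

There are five functions, two of which, menu and qa, allow the user to change the settings. User input is accepted by the readline function, which uses the -i option to read if the shell version is bash-4.x or later. If the test option is used, the print_config function outputs the settings in a format that is suitable for a configuration file, complete with comments.

Function: die

The program exits via the die function when a command fails (Listing 12-13).

Listing 12-13. Define die Function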

die() #@ Print error message and exit with error code
{     #@ USAGE: die [errno [message]]

  error=${1:-1}   ## exits with 1 if error number not given
  shift
  [ -n "$*" ] &&
    printf "%s%s: %s\n" "$scriptname" ${version:+" ($version)"} "$*" >&2
  exit "$error"
}
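
To see what die does without ending the current shell, it can be called inside a subshell. This is a sketch only: the function body is copied from Listing 12-13, while the scriptname and version values and the error message are invented for the demonstration.

## Sketch: exercising die in a subshell (names and message are made up)
scriptname=demo
version=1.0

die() #@ copied from Listing 12-13
{
  error=${1:-1}   ## exits with 1 if error number not given
  shift
  [ -n "$*" ] &&
    printf "%s%s: %s\n" "$scriptname" ${version:+" ($version)"} "$*" >&2
  exit "$error"
}

status=0
## The inner parentheses confine the exit to a subshell;
## 2>&1 lets the command substitution capture the message
message=$( (die 3 "Cannot open file") 2>&1 ) || status=$?

printf "%s\n" "$message" "exit status: $status"

In a script, the usual form is simply: somecommand || die 3 "Cannot open file"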

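Function: menu

The menu function uses its command-line arguments to populate the menu (Listing 12-14). Each argument contains a variable name and a description of the variable separated by a colon.

The Upload Settings Menu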

================================================================================

    UPLOAD SETTINGS

================================================================================

    1: Remote host (URL or IP address) (127.0.0.1)
    2: ssh port (22)
    3: Destination directory (work/upload)
    4: Login name on remote system (jayant)
    5: Local directory to upload (/home/jayant/public_html/oz-apps.com)
    6: Directory to store archive files (/home/jayant/work/webarchives)
    7: File to touch with time of last upload (.sync)
    q: Quit menu, start uploading
    0: Exit upload

================================================================================

Select 1..7 or 'q/0'

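The function enters an infinite loop, from which the user exits by selecting q or 0. Within the loop, menu clears the screen and then cycles through each argument, storing it in item. It extracts the variable name and description using parameter expansion: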

var=${item%%:*}
description=${item#*:}

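The value of each var is obtained through indirect expansion, ${!var}, and is included in the menu labels. The field width for the menu number is ${#max}, that is, the length of the highest item number.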
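
The splitting and the indirect expansion can be tried on their own. In this sketch the two items are taken from the varinfo array shown earlier; the items and labels arrays are names made up for the example.

## Sketch: build menu labels with ${item%%:*}, ${item#*:} and ${!var}
host=127.0.0.1
port=22
items=( "host:Remote host (URL or IP address)" "port:SSH port" )
max=${#items[@]}
labels=()
menunum=1

for item in "${items[@]}"
do
  var=${item%%:*}          ## variable name: text before the first colon
  description=${item#*:}   ## description: text after the first colon
  ## ${!var} is the value of the variable whose name is stored in var
  printf -v label "%${#max}d: %s (%s)" "$menunum" "$description" "${!var}"
  labels+=( "$label" )
  menunum=$(( menunum + 1 ))
done

printf "%s\n" "${labels[@]}"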

Listing 12-14. Define menu Function

menu() #@ Print menu, and change settings according to user input
{
  local max=$#
  local menutitle="UPLOAD SETTINGS"
  local readopt

  if [ $max -lt 10 ]
  then             ## if fewer than ten items,
    readopt=-sn1   ## allow single key entry
  else
    readopt=
  fi

  printf "$topleft$clearEOS"  ## Move to top left and clear screen

  while : ## infinite loop
  do

    #########################################################
    ## display menu
    ##
    printf "$topleft"  ## Move cursor to top left corner of screen

    ## print menu title between horizontal bars the width of the screen
    printf "\n%s\n" "${bar:0:$menuwidth}"
    printf "    %s\n" "$menutitle"
    printf "%s\n\n" "${bar:0:$menuwidth}"

    menunum=1

    ## loop through the positional parameters
    for item
    do
      var=${item%%:*}          ## variable name
      description=${item#*:}   ## variable description

      ## print item number, description and value
      printf "   %${#max}d: %s (%s)$clearEOL\n" \
                 "$menunum" "$description" "${!var}"

      menunum=$(( $menunum + 1 ))
    done

    ## … and menu adds its own items
    printf "   %${##}s\n" "q: Quit menu, start uploading" \
                      "0: Exit $scriptname"

    printf "\n${bar:0:$menuwidth}\n"   ## closing bar

    printf "$clearEOS\n" ## Clear to end of screen
    ##
    #########################################################

    #########################################################
    ## User selection and parameter input
    ##

    read -p " Select 1..$max or 'q' " $readopt x
    echo

    [ "$x" = q ] && break  ## User selected Quit
    [ "$x" = 0 ] && exit   ## User selected Exit

    case $x in
      *[!0-9]* | "")
              ## contains non digit or is empty
              printf "\a %s - Invalid entry\n" "$x" >&2
              sleep "$sleepytime"
              ;;
      *) if [ $x -gt $max ]
         then
           printf "\a %s - Invalid entry\n" "$x" >&2
           sleep "$sleepytime"
           continue
         fi

         var=${!x%%:*}
         description=${!x#*:}

         ## prompt user for new value
         printf "      %s$clearEOL\n" "$description"
         readline value "        >> "  "${!var}"

         ## if user did not enter anything, keep old value
         if [ -n "$value" ]
         then
           eval "$var=\$value"
         else
           printf "\a Not changed\n" >&2
           sleep "$sleepytime"
         fi
         ;;
    esac
    ##
    #########################################################

  done
}

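Function: qa

The qa function takes the same arguments as menu, but instead of putting them into a menu, it prompts the user for a new value for each variable (Listing 12-15). When it has run through all the command-line arguments, which it splits up in the same manner as menu, it calls the menu function for verification and editing of the values. Also like menu, it uses readline to get the input and keeps the original value if nothing is entered.

Listing 12-15. Define qa Function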

qa() #@ Question and answer dialog for variable entry
{
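  ## Print the header before declaring the locals; a local "description"
  ## would hide the script's own description variable
  printf "\n %s - %s\n" "$scriptname" "$description"
  printf " by %s, copyright %d\n"  "$author" "$copyright"
  local item var description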
  echo
  if [ ${BASH_VERSINFO[0]} -ge 4 ]
  then
    printf " %s\n" "You may edit existing value using the arrow keys."
  else
    printf " %s\n" "Press the up arrow to bring existing value" \
                   "to the cursor for editing with the arrow keys"
  fi
  echo

  for item
  do
    ## split $item into variable name and description
    var=${item%%:*}
    description=${item#*:}
    printf "\n %s\n" "$description"
    readline value "   >> " "${!var}"
    [ -n "$value" ] && eval "$var=\$value"
  done

  menu "$@"
}

The dialog looks like this:

$ upload -qt

 upload - Archive new or modified files and upload to web site
 by Chris F.A. Johnson, copyright 2009

 You may edit existing value using the arrow keys.

 Remote host (URL or IP address)
   >> oz-apps.com

 SSH port
   >> 99

 Destination directory
   >> public_html

 Login name on remote system
   >> jayant

 Local directory to upload
   >> /home/jayant/public_html/oz-apps.com

 Directory to store archive files
   >> /home/jayant/work/webarchives

 File to touch with time of last upload
   >> .sync

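Function: print_config

The print_config function prints all the variables listed in the varinfo array to the standard output in a format suitable for a configuration file, as described earlier in this chapter. Although probably not necessary in this program, it encloses the assignment value in double quotes and escapes double quotes in the value using bash's search-and-replace parameter expansion: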

$ var=location
$ val='some"where'
$ printf "%s\n" "$var=\"${val//\"/\\\"}\""
location="some\"where"
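
One way to convince yourself that the escaping is sufficient for a value like this is a round trip: write the line to a file, source it, and compare. The sketch below does that with a temporary file; it covers only embedded double quotes, and a value containing a dollar sign, backquote, or backslash would need more care.

## Sketch: write an escaped assignment, source it back, compare
var=location
val='some"where'
cfg=$(mktemp) || exit 1

printf "%s\n" "$var=\"${val//\"/\\\"}\"" > "$cfg"

unset location
. "$cfg"          ## executes: location="some\"where"
rm -f "$cfg"

if [ "$location" = "$val" ]
then
  result="round trip OK"
else
  result="value changed"
fi
printf "%s\n" "$result"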

See the section on parsing command-line options, later in this chapter, for an example of the output of print_config.

Listing 12-16. Define print_config Function

print_config() #@ Print values in a format suitable for a configuration file
{
  local item var description

  [ -t 1 ] && echo  ## print blank line if output is to a terminal

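  for item in "${varinfo[@]}"
  do
    [ -n "$item" ] || continue  ## skip the empty placeholder element
    var=${item%%:*}
    description=${item#*:}
    ## expand the value of $var here, escaping any double quotes in it
    printf "%-35s ## %s\n" "$var=\"${!var//\"/\\\"}\"" "$description"
  done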

  [ -t 1 ] && echo  ## print blank line if output is to a terminal
}

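Function: readline

If you are using bash-4.x or later, the readline function places a value before the cursor for you to edit (Listing 12-17). With an earlier version of bash, it puts the value into the history so that you can bring it up with the up arrow (or Ctrl+P) and then edit it.

Listing 12-17. Define readline Function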

readline() #@ get line from user with editing of current value
{          #@ USAGE var [prompt] [default]
  local var=${1?} prompt=${2:-  >>> } default=$3

  if [ ${BASH_VERSINFO[0]} -ge 4 ]
  then
    read -ep "$prompt" ${default:+-i "$default"} "$var"
  else
    history -s "$default"
    read -ep "$prompt" "$var"
  fi
}

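Parse Command-Line Options

You can set the seven configuration variables with the a, d, f, h, p, s, and u options. In addition, you can specify a configuration file with the c option. A test run, which prints the configuration information but does not attempt to create a tarball or upload any files, can be triggered with the t option. The m and q options offer the user a menu and a question-and-answer dialog, respectively.

If a host is given as an option, a config file name is built using a standard formula. If the file exists, it is assigned to the configfile variable so that the parameters will be loaded from it. Usually this is all that needs to be added to the command line for this purpose (Listing 12-18).

Listing 12-18. Parse Command-Line Options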

while getopts c:h:p:d:u:a:s:f:mqt var
do
  case "$var" in
    c) configfile=$OPTARG ;; 
    h) host=$OPTARG
       hostconfig=$configdir/$scriptname.$host.cfg
       [ -f "$hostconfig" ] &&
         configfile=$hostconfig
       ;;
    p) port=$OPTARG ;;
    s) source=$OPTARG ;;
    d) dest=$OPTARG ;;
    u) user=$OPTARG ;;
    a) archivedir=$OPTARG ;;
    f) syncfile=$OPTARG ;;

    t) test=1 ;; ## show configuration, but do not archive or upload

    m) menu=1 ;;
    q) qa=1 ;;
  esac
done
shift $(( $OPTIND - 1 ))

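Using options and redirection, this program can create new configuration files. Here, parameters are given on the command line, and defaults are used for those that are not.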

$ upload -t -h www.example.com -p 666 -u paradigm -d public_html \
   -s $HOME/public_html/www.example.com > www.example.com.cfg
$ cat www.example.com.cfg
host="www.example.com"              ## Remote host (URL or IP address)
port="666"                          ## SSH port
dest="public_html"                  ## Destination directory
user="paradigm"                     ## Login name on remote system
source="/home/jayant/public_html/www.example.com" ## Local directory to upload
archivedir="/home/jayant/work/webarchives" ## Directory to store archive files
syncfile=".sync"                    ## File to touch with time of last upload

Bits and Pieces

Listing 12-19 shows the rest of the script.

Listing 12-19. The Rest of the Script

## If a configuration file is defined, try to load it
if [ -n "$configfile" ]
then
  if [ -f "$configfile" ]
  then
    ## exit if problem with config file
    . "$configfile" || die 1 Configuration error
  else
    ## Exit if configuration file is not found.
    die 2 "Configuration file, $configfile, not found"
  fi
fi

## Execute menu or qa if defined
## (element 0 of varinfo is an empty placeholder, so start at element 1)
if [ $menu -eq 1 ]
then
  menu "${varinfo[@]:1}"
elif [ $qa -eq 1 ]
then
  qa "${varinfo[@]:1}"
fi

## Create datestamped filename for tarball
tarfile=$archivedir/$host.$(date +%Y-%m-%dT%H:%M:%S.tgz)

if [ $test -eq 0 ]
then
  cd "$source" || die 4
fi

## verbose must be set (or not) in the environment or on the command line
if [ ${verbose:-0} -gt 0 ]
then
  printf "\nArchiving and uploading new files in directory: %s\n\n" "$PWD"
  opt=v
else
  opt=
fi

## IFS=$'\n' # uncomment this line if you have spaces in filenames (shame on you!)

if [ ${test:-0} -eq 0 ]
then
  remote_command="cd \"$dest\" || exit;tar -xpzf -"

  ## Archive files newer than $syncfile
  tar cz${opt}f "$tarfile" $( find . -type f -newer "$syncfile") &&

    ## Execute tar on remote computer with input from $tarfile
    ssh -p "$port" -l "$user" "$host" "$remote_command" < "$tarfile" &&

       ## if ssh is successful
       touch "$syncfile"

else ## test mode
  print_config
fi

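Summary

This chapter demonstrated seven methods of altering the runtime behavior of a script. If changes will be rare, variables defined in the script may be adequate. When that is not enough, command-line options (parsed with getopts) are often sufficient.

You can use a menu or a question-and-answer dialog for runtime configuration, and you can create configuration files to be sourced on demand. Using differently named files for the same script can save typing. In some cases, setting a variable in the shell's environment is enough.

Exercises

Add code to the upload script that checks that all variables have been set to legitimate values (e.g., that port is an integer).
Write a usage or help function, and add it to the upload script.
Add an option to the upload script to save the configuration once it has been set.
Write a script that creates a configuration file in the same form as words.cfg, prompting the user for the information to put in it.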

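13. Data Processing

Data manipulation includes a wide range of actions, far more than can be adequately covered in a single chapter. However, most actions are just the application of techniques already covered in earlier chapters. Arrays are a basic data structure, and although the syntax was covered in Chapter 5 and they were used in the fifteen puzzle code in Chapter 11, I have not yet explained their uses. Parameter expansion has been used in a number of chapters, but its application to parsing data structures has not been discussed.

This chapter covers different ways of using strings and arrays, how to parse character-delimited records into their individual fields, and how to read a data file. There are two function libraries for manipulating two-dimensional grids, and there are functions for sorting and searching arrays.

Arrays

Arrays are not included in the POSIX shell, but bash has used indexed arrays since version 2.0, and in version 4.0, associative arrays were added. Indexed arrays are assigned and referenced using integer subscripts; associative arrays use strings. There is no preset limit to the number of elements an array can contain; they are limited only by available memory.

Holes in an Indexed Array

If some elements of an indexed array are unset, the array is left with holes and becomes a sparse array. It is then impossible to traverse the array merely by incrementing an index. There are various ways of dealing with such an array. To demonstrate, let us create an array and poke some holes in it: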

array=( a b c d e f g h i j )
unset array[2] array[4] array[6] array[8]

The array now contains six elements instead of the original ten:

$ sa "${array[@]}"
:a:
:b:
:d:
:f:
:h:
:j:

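One way to iterate through all the remaining elements is to expand them as arguments to for. In this method, there is no way of knowing what the subscript for each element is: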

for i in "${array[@]}"
do
  : do something with each element, $i, here
done

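For a packed array (one with no holes), the index can start at 0 and be incremented to get the next element. With a sparse (or any) array, the ${!array[@]} expansion lists the subscripts: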

$ echo "${!array[@]}"
0 1 3 5 7 9

This expansion can be used as the argument to for:

for i in "${!array[@]}"
do
  : do something with ${array[$i]} here
done

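That solution does not provide a method of referring to the next element. You can save the previous element, but you cannot get the value of the next one. To do that, you could put the list of subscripts into an array and use its elements to reference the original array. It is much simpler to pack the array, removing the holes: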

$ array=( "${array[@]}" )
$ echo "${!array[@]}"
0 1 2 3 4 5

Note that this will convert an associative array to an indexed array.
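
When the original subscripts have to be kept, the list-of-subscripts approach mentioned above can be sketched as follows. The subscripts and pairs arrays are names chosen for this illustration; each element of pairs joins an element of the sparse array with the one that follows it.

## Sketch: look ahead in a sparse array through a packed list of subscripts
array=( a b c d e f g h i j )
unset 'array[2]' 'array[4]' 'array[6]' 'array[8]'

subscripts=( "${!array[@]}" )    ## packed list of the remaining subscripts
pairs=()

for (( n = 0; n < ${#subscripts[@]} - 1; n++ ))
do
  this=${subscripts[n]}          ## subscript of the current element
  next=${subscripts[n+1]}        ## subscript of the following element
  pairs+=( "${array[this]}${array[next]}" )
done

printf "%s\n" "${pairs[*]}"      ## prints: ab bd df fh hj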

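Using an Array for Sorting

Ordering data alphabetically (or numerically) is not usually a task for the shell. The sort command is a very flexible and efficient tool that can handle most sorting needs. There are, however, a couple of cases where sorting is best done by the shell.

The most obvious is file name expansion, in which the result of expanding wildcards is always sorted alphabetically. This is useful, for example, when working with date-stamped files. If the date stamp uses the standard ISO format, YYYY-MM-DD, or a compressed version, YYYYMMDD, the files will automatically be sorted in date order. If you have files in the format log.YYYYMMDD, this loops through them in chronological order: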

for file in log.*    ## loop through files in chronological order
do
   : do whatever
done

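There is no need to use ls; the shell sorts the wildcard expansion.

With bash-4.x, another expansion is sorted alphabetically: associative arrays with single-character subscripts. This ordering is a by-product of how bash stores associative arrays rather than documented behavior, so check it on the version you are running before relying on it: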

$ declare -A q
$ q[c]=1 q[d]=2 q[a]=4
$ sa "${q[@]}"
:4:
:1:
:2:

This leads to the writing of a function that sorts the letters of a word (Listing 13-1).

Listing 13-1. lettersort, Sort Letters in a Word Alphabetically

lettersort() #@ Sort letters in $1, store in $2
{
  local letter string
  declare -A letters
  string=${1:?}
  while [ -n "$string" ]
  do
    letter=${string:0:1}
    letters["$letter"]=${letters["$letter"]}$letter
    string=${string#?}
  done
  printf -v "${2:-_LETTERSORT}" "%s" "${letters[@]}"
}

What is the point, you ask? Take a look at these examples:

$ lettersort triangle; printf "%s\n" "$_LETTERSORT"
aegilnrt
$ lettersort integral; printf "%s\n" "$_LETTERSORT"
aegilnrt

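When the letters are sorted, you can see that the two words contain the same letters. Therefore, they are anagrams of each other. Try this process with the words altering, alerting, and relating.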
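
Because lettersort depends on the order in which bash happens to expand an associative array, a version that does its own ordering may be preferable where that order cannot be relied on. The following sketch is not from the library above: sortletters and _SORTLETTERS are hypothetical names, and the function simply inserts each letter into its place in a string.

sortletters() #@ Sort letters in $1, store in $2 (hypothetical helper)
{
  local string=${1:?} sorted= letter n
  while [ -n "$string" ]
  do
    letter=${string:0:1}       ## take the first letter...
    string=${string#?}         ## ...and remove it from the word
    n=0
    ## step past every sorted letter that does not come after this one
    while [ $n -lt ${#sorted} ] && [[ ! ${sorted:n:1} > $letter ]]
    do
      n=$(( n + 1 ))
    done
    sorted=${sorted:0:n}$letter${sorted:n}   ## insert the letter at position n
  done
  printf -v "${2:-_SORTLETTERS}" "%s" "$sorted"
}

sortletters triangle first
sortletters integral second
[ "$first" = "$second" ] && printf "%s\n" "anagrams: $first"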

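Insertion Sort Function

If you really want to do your sorting in the shell, you can. The function in Listing 13-2 is slower than the external sort command when there are more than 15 to 20 elements (the exact numbers will vary depending on your computer, its load, and so on). It inserts each element into the correct position in an array and then prints the resulting array.

Note: The sort command is a program written in C, optimized for speed, and compiled, whereas a script written in bash is interpreted at run time. Whether sort or your own script is the better choice depends on the number of elements you are sorting and on the way your script is structured.

Listing 13-2. isort, Sort Command-Line Arguments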

isort()
{
  local -a a
  a=( "$1" ) ## put first argument in array for initial comparison
  shift      ## remove first argument
  for e      ## for each of the remaining arguments…
  do
    if [ "$e" \< "${a[0]}" ]                ## does it precede the first element?
    then
      a=( "$e" "${a[@]}" )                  ## if yes, put it first
    elif [ "$e" \> "${a[${#a[@]}-1]}" ]     ## if no, does it go at the end?
    then
      a=( "${a[@]}" "$e" )                  ## if yes, put it at the end
    else                                    ## otherwise,
      n=0
      while [ "${a[$n]}" \< "$e" ]          ## find where it goes
      do
        n=$(( $n + 1 ))
      done
      a=( "${a[@]:0:n}" "$e" "${a[@]:n}" )  ## and put it there
    fi
  done
  printf "%s\n" "${a[@]}"
}

To put Canada's ten provincial capitals in alphabetical order, you would use this:

$ isort "St. John's" Halifax Fredericton Charlottetown "Quebec City" \
                       Toronto Winnipeg Regina Edmonton Victoria
Charlottetown
Edmonton
Fredericton
Halifax
Quebec City
Regina
St. John's
Toronto
Victoria
Winnipeg

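Searching an Array

As with the isort function, this function is designed for use with relatively small arrays. If the array contains more than a certain number of elements (50? 60? 70?), it is faster to pipe it through grep. The function in Listing 13-3 takes the name of an array and a search string as arguments and stores elements that contain the search string in a new array, _asearch_elements.

Listing 13-3. asearch, Search Elements of an Array for a String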

asearch() #@ Search for substring in array; results in array _asearch_elements
{         #@ USAGE: asearch arrayname string
  local arrayname=$1 substring=$2  array

  eval "array=( \"\${$arrayname[@]}\" )"

  case ${array[*]} in
    *"$substring"*) ;;  ## it's there; drop through
    *) return 1 ;;      ## not there; return error
  esac

  unset _asearch_elements
  for subscript in "${!array[@]}"
  do
    case ${array[$subscript]} in
      *"$substring"*)
               _asearch_elements+=( "${array[$subscript]}" )
               ;;
    esac
  done
}

To see the function in action, put the provincial capitals from the previous section into an array and call asearch:

$ capitals=( "St. John's" Halifax Fredericton Charlottetown "Quebec City"
                       Toronto Winnipeg Regina Edmonton Victoria )
$ asearch capitals Hal && printf "%s\n"  "${_asearch_elements[@]}"
Halifax
$ asearch capitals ict && printf "%s\n"  "${_asearch_elements[@]}"
Fredericton
Victoria

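Reading an Array into Memory

There are various ways of reading a file into an array with bash. The most obvious, and also the slowest, is a while read loop: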

unset array
while read line
do
  array+=( "$line" )
done < "$kjv"         ## kjv is defined in Chapter 8

A faster method, which is still portable, uses the external command cat:

IFS=$'\n'             ## split on newlines, so each line is a separate element
array=( $(cat "$kjv") )

In bash, cat is unnecessary:

array=( $( < "$kjv" ) )    ## IFS is still set to a newline

With bash-4.x, a new built-in command, mapfile, is even faster:

mapfile -t array < "$kjv"

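Options to mapfile allow you to select the line at which to start reading (actually, it is the number of lines to skip before starting to read), the number of lines to read, and the index at which to start populating the array. If no array name is given, the variable MAPFILE is used.

The following are seven options to mapfile:

-n num: Reads no more than num lines
-O index: Begins populating the array at element index
-s num: Discards the first num lines
-t: Removes the trailing newline from each line
-u fd: Reads from input stream fd instead of the standard input
-C callback: Evaluates the shell command callback every N lines, where N is set by -c N
-c N: Specifies the number of lines between each evaluation of callback; the default is 5000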
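
A few of these options can be tried on a small throwaway file. In the sketch below, the file contents, the array names, and the progress function are all invented for the demonstration; the callback merely counts how often it is called.

## Sketch: mapfile with -s, -n, -O and -C/-c on a six-line file
tmp=$(mktemp) || exit 1
printf "%s\n" one two three four five six > "$tmp"

mapfile -t -s 2 -n 3 middle < "$tmp"      ## skip 2 lines, then read 3

target=( zero )
mapfile -t -O 1 target < "$tmp"           ## element 0 is kept; filling starts at 1

count=0
progress() { count=$(( count + 1 )); }    ## mapfile passes the index and the line
mapfile -t -C progress -c 2 all < "$tmp"  ## evaluate progress every 2 lines

rm -f "$tmp"
printf "%s\n" "${middle[*]}" "${target[*]}" "${#all[@]} lines read"

Note that without -O, mapfile clears the array before filling it.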

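With older versions of bash, you can use sed to extract ranges of lines from a file; with bash-4.x, you can use mapfile. Listing 13-4 defines a function that uses mapfile if the version of bash is 4.x or greater and uses sed if not.

Listing 13-4. getlines, Store a Range of Lines from a File in an Array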

if [ "${BASH_VERSINFO[0]}" -ge 4 ]
then
  getlines() #@ USAGE: getlines file start num arrayname
  {
    mapfile -t -s$(( $2 - 1 )) -n ${3:?} "$4" < "$1"
  }
else
  getlines() #@ USAGE: getlines file start num arrayname
  {
    local IFS=$'\n' getlinearray arrayname=${4:?}
    getlinearray=( $(sed -n "$2,$(( $2 - 1 + $3 )) p" "$1") )
    eval "$arrayname=( \"\${getlinearray[@]}\" )"
  }
fi

Process substitution and external utilities can be used with mapfile to extract portions of a file using different criteria:

mapfile -t exodus < <(grep ^Exodus: "$kjv")     ## store the book of Exodus
mapfile -t books < <(cut -d: -f1 "$kjv" | uniq) ## store names of all books in KJV

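Tip: You can also use readarray to read the data from a file into an array; it is basically a synonym for mapfile.

Two-Dimensional Grids

Programmers often have to deal with two-dimensional grids. As a constructor of crossword puzzles, I need to convert a grid from a puzzle file into a format that my clients' publications can import into desktop publishing software. As a chess tutor, I need to convert chess positions into a format I can use in worksheets for my students. In games such as tic-tac-toe, maxit, and fifteen (from Chapter 11), the game board is a grid.

The obvious structure to use is a two-dimensional array. Because bash has only one-dimensional arrays, a workaround is needed to simulate two dimensions. This can be done with an array, a string, an array of strings, or a "poor man's" array (see Chapter 9).

For a chess diagram, an associative array could be used, with the squares identified using standard algebraic notation (SAN), a1, b1 through g8, h8: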

declare -A chessboard
chessboard["a1"]=R
chessboard["a2"]=P
: ... 60 squares skipped
chessboard["g8"]=n
chessboard["h8"]=r

A structure that I have used on a few occasions is an array in which each element is a string representing a rank:

chessboard=(
  RNBQKBNR
  PPPPPPPP
 "        "
 "        "
 "        "
 "        "
  pppppppp
  rnbqkbnr
)

When using bash, my preference is a simple indexed array:

chessboardarray=(
R N B Q K B N R
P P P P P P P P
"" "" "" "" "" "" "" ""
"" "" "" "" "" "" "" ""
"" "" "" "" "" "" "" ""
"" "" "" "" "" "" "" ""
p p p p p p p p
r n b q k b n r
)

Or, in a POSIX shell, it could be a single string:

chessboard="RNBQKBNRPPPPPPPP                                pppppppprnbqkbnr"

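Next, two function libraries are discussed, one for dealing with grids in a single string and the other for grids stored in arrays.

Working with Single-String Grids

I have a function library, stringgrid-funcs, for dealing with two-dimensional grids stored in a single string. There is a function that initializes all the elements of a grid to a given character and one that calculates the index into the string from x and y coordinates. One function fetches the character in the string using x/y, and another places a character into the grid at x/y. Finally, there are functions to print a grid, starting with either the first row or the last row. These functions work only with square grids.

Function: initgrid

Given the name of the grid (that is, the variable name), the size, and optionally the character with which to fill it, initgrid (Listing 13-5) creates a grid with the parameters supplied. If no character is supplied, a space is used.

Listing 13-5. initgrid, Create a Grid and Fill It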

initgrid() #@ Fill N x N grid with a character
{          #@ USAGE: initgrid gridname size [character]
  ## If a parameter is missing, it's a programming error, so exit
  local grid gridname=${1:?} char=${3:- } size
  export gridsize=${2:?}                ## set gridsize globally

  size=$(( $gridsize ** 2 ))            ## total number of characters in grid
  printf -v grid "%$size.${size}s" " "  ## print string of spaces to variable
  eval "$gridname=\${grid// /"$char"}"  ## replace spaces with desired character
}

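The length of the string is the square of the grid size. A string of that length is created using a width specification in printf, with the -v option saving it to a variable. Pattern substitution then replaces the spaces with the requested character, and the result is assigned to the variable whose name was supplied as an argument.

This and the other functions in this library use the ${var:?} expansion, which displays an error and exits the script if there is no value for the parameter. This is appropriate because it is a programming error, not a user error, if a parameter is missing. Even if it is missing because the user failed to supply it, it is still a programming error; the script should have checked that a value had been entered.

A tic-tac-toe grid is a string of nine spaces. For something this simple, the initgrid function is hardly necessary, but it is a useful abstraction: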

$ . stringgrid-funcs
$ initgrid ttt 3
$ sa "$ttt"       ## The sa script/function has been used in previous chapters
:         :

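Function: gridindex

To convert x and y coordinates into the corresponding position in the grid string, subtract 1 from the row number, multiply it by the gridsize, and add the column; then subtract 1 more, because string indexes start at 0. Listing 13-6, gridindex, is a simple formula that could be used inline when needed, but the abstraction makes using string grids easier and localizes the formula so that if there is a change, it only needs fixing in one place.

Listing 13-6. gridindex, Calculate Index from Row and Column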

gridindex() #@ Store row/column's index into string in var or $_GRIDINDEX
{        #@ USAGE: gridindex row column [gridsize [var]]
  local row=${1:?} col=${2:?}

  ## If gridsize argument is not given, take it from definition in calling script
  local gridsize=${3:-$gridsize}
  printf -v "${4:-_GRIDINDEX}" "%d" "$(( ($row - 1) * $gridsize + $col - 1))"
}

What is the index of row 2, column 3 in a tic-tac-toe grid string?

$ gridindex 2 3    ## gridsize=3
$ echo "$_GRIDINDEX"
5
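
The calculation can also be run in the other direction, turning a string index back into a row and a column. The function below is not part of stringgrid-funcs; gridrowcol, _GRIDROW, and _GRIDCOL are hypothetical names used only for this sketch.

gridrowcol() #@ Convert string index to row and column (hypothetical helper)
{            #@ USAGE: gridrowcol index [gridsize]
  local index=${1:?} gridsize=${2:-$gridsize}
  _GRIDROW=$(( index / gridsize + 1 ))   ## whole rows before this one, plus 1
  _GRIDCOL=$(( index % gridsize + 1 ))   ## offset within the row, plus 1
}

gridsize=3
gridrowcol 5
printf "row %d, column %d\n" "$_GRIDROW" "$_GRIDCOL"   ## row 2, column 3

## Check against the forward formula used by gridindex
index=$(( (_GRIDROW - 1) * gridsize + _GRIDCOL - 1 ))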

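Function: putgrid

To change a character in the grid string, putgrid (Listing 13-7) takes four arguments: the name of the variable containing the string, the row and column coordinates, and the new character. It splits the string into the part before the character and the part after it using bash's substring parameter expansion. It then sandwiches the new character between the two parts and assigns the composite string to the gridname variable. (Compare this with the _overlay function in Chapter 7.)

Listing 13-7. putgrid, Insert Character in Grid at Specified Row and Column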

putgrid() #@ Insert character into grid at row and column
{         #@ USAGE: putgrid gridname row column char
  local gridname=$1        ## grid variable name
  local left right         ## string to left and right of character to be changed
  local index              ## result from gridindex function
  local char=${4:?}        ## character to place in grid
  local grid=${!gridname}  ## get grid string through indirection

  gridindex ${2:?} ${3:?} "$gridsize" index

  left=${grid:0:index}
  right=${grid:index+1}
  grid=$left$4$right
  eval "$gridname=\$grid"
}

The following is the code for the first move in a tic-tac-toe game:

$ putgrid ttt 1 2 X
$ sa "$ttt"
: X       :

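Function: getgrid

The opposite of putgrid is getgrid (Listing 13-8). It returns the character in a given position. Its arguments are the grid name (I could have used the string itself, because nothing is being assigned to it, but the grid name is used for consistency), the coordinates, and the name of the variable in which to store the character. All four arguments are required; the function exits with an error if any of them is missing.

Listing 13-8. getgrid, Get Character at Row and Column Location in Grid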

getgrid() #@ Get character from grid in row Y, column X
{         #@ USAGE: getgrid gridname row column var
  : ${1:?} ${2:?} ${3:?} ${4:?}
  local grid=${!1}
  gridindex "$2" "$3"
  eval "$4=\${grid:_GRIDINDEX:1}"
}

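This snippet returns the piece in the square e1. A chess utility would convert the square to coordinates and then call the getgrid function. Here it is used directly: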

$ gridsize=8
$ chessboard="RNBQKBNRPPPPPPPP                                pppppppprnbqkbnr"
$ getgrid chessboard 1 5 e1
$ sa "$e1"
:K:

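Function: showgrid

This function (Listing 13-9) extracts rows from a string grid using substring expansion and the gridsize variable and prints them to the standard output.

Listing 13-9. showgrid, Print a Grid from a String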

showgrid() #@ print grid in rows to stdout
{          #@ USAGE: showgrid gridname [gridsize]
  local grid=${!1:?} gridsize=${2:-$gridsize}
  local row    ## the row to be printed, then removed from local copy of grid

  while [ -n "$grid" ]  ## loop until there's nothing left
  do
    row=${grid:0:gridsize}        ## get first $gridsize characters from grid
    printf "\t:%s:\n" "$row"      ## print the row
    grid=${grid#"$row"}           ## remove $row from front of grid
  done
}

Here another move is added to the tic-tac-toe board, which is then displayed:

$ gridsize=3    ## reset gridsize after changing it for the chessboard
$ putgrid ttt 2 2 O ## add O's move in the center square
$ showgrid ttt  ## print it
        : X :
        : O :
        :   :

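Function: rshowgrid

For most grids, counting begins in the top left corner. For others, such as a chessboard, it starts in the lower left corner. To display a chessboard, the rshowgrid function extracts and displays rows starting from the end of the string rather than from the beginning.

In Listing 13-10, substring expansion is used with a negative number.

Listing 13-10. rshowgrid, Print a Grid in Reverse Order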

rshowgrid() #@ print grid to stdout in reverse order
{           #@ USAGE: rshowgrid grid [gridsize]
  local grid gridsize=${2:-$gridsize} row
  grid=${!1:?}
  while [ -n "$grid" ]
  do
    ## Note space before minus sign
    ## to distinguish it from default value substitution
    row=${grid: -$gridsize}   ## get last row from grid
    printf "\t:%s:\n" "$row"  ## print it
    grid=${grid%"$row"}       ## remove it
  done
}

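Here, rshowgrid is used to display the first move of a chess game. (For those who are interested, the opening is called Bird's Opening. It is not often played, but I have been using it successfully for 45 years.)

$ gridsize=8
$ chessboard="RNBQKBNRPPPPPPPP                                pppppppprnbqkbnr"
$ putgrid chessboard 2 6 ' '
$ putgrid chessboard 4 6 P
$ rshowgrid chessboard
        :rnbqkbnr:
        :pppppppp:
        :        :
        :        :
        :     P  :
        :        :
        :PPPPP PP:
        :RNBQKBNR:

These output functions can be augmented by piping the output through a utility such as sed or awk or even replaced with a custom function for specific uses. I find that the chessboard looks better when piped through sed to add some spacing:

$ rshowgrid chessboard | sed 's/./& /g' ## add a space after every character
         : r n b q k b n r :
         : p p p p p p p p :
         :                 :
         :                 :
         :           P     :
         :                 :
         : P P P P P   P P :
         : R N B Q K B N R :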

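Two-Dimensional Grids Using Arrays

For many grids, a single string is more than adequate (and is portable to other shells), but an array-based grid affords more flexibility. In the fifteen puzzle in Chapter 11, the board is stored in an array. It is printed with printf using a format string that can easily be changed to give it a different look. The tic-tac-toe grid in an array could be as follows: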

$ ttt=( "" X "" "" O "" "" X "" )

This is the format string:

$ fmt="
     |   |
   %1s | %1s | %1s
 ----+---+----
   %1s | %1s | %1s
 ----+---+----
   %1s | %1s | %1s
     |   |

  "

And this is the result when printed:

$ printf "$fmt" "${ttt[@]}"

     |   |
     | X |
 ----+---+----
     | O |
 ----+---+----
     | X |
     |   |

If the format string is changed to this:

fmt="

       _/     _/
    %1s  _/  %1s  _/  %1s
       _/     _/
 _/_/_/_/_/_/_/_/_/_/
       _/     _/
    %1s  _/  %1s  _/  %1s
       _/     _/
 _/_/_/_/_/_/_/_/_/_/
       _/     _/
    %1s  _/  %1s  _/  %1s
       _/     _/

"

the output will look like this:

       _/     _/
       _/  X  _/
       _/     _/
 _/_/_/_/_/_/_/_/_/_/
       _/     _/
       _/  O  _/
       _/     _/
 _/_/_/_/_/_/_/_/_/_/
       _/     _/
       _/  X  _/
       _/     _/

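The same output could be achieved with a single-string grid, but it would require looping over every character in the string. An array is a group of elements that can be addressed individually or all at once, depending on the need.

The functions in arraygrid-funcs mirror those in stringgrid-funcs. In fact, the gridindex function is identical to the one in stringgrid-funcs, so it is not repeated here. As with the string grid functions, some of them expect the size of the grid to be available in a variable, agridsize.

Function: initagrid

Most of the functions for array grids are simpler than their single-string counterparts. A notable exception is initagrid (Listing 13-11), which is longer and slower, because it needs a loop instead of a simple assignment. The entire array may be specified as arguments, and any unused array elements will be initialized to an empty string.

Listing 13-11. initagrid, Initialize a Grid Array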

initagrid() #@ Fill N x N grid with supplied data (or placeholders if none)
{           #@ USAGE: initagrid gridname size [character ...]
  ## If a required parameter is missing, it's a programming error, so exit
  local grid gridname=${1:?} char=${3:- } size
  export agridsize=${2:?}             ## set agridsize globally

  size=$(( $agridsize * $agridsize )) ## total number of elements in grid

  shift 2        ## Remove first two arguments, gridname and agridsize
  grid=( "$@" )  ## What's left goes into the array

  while [ ${#grid[@]} -lt $size ]
  do
    grid+=( "" )
  done

  eval "$gridname=( \"\${grid[@]}\" )"
}

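Function: putagrid

Changing a value in an array is a straightforward assignment. Unlike changing a character in a string, there is no need to tear it apart and put it back together. All that is needed is the index calculated from the coordinates. This function (Listing 13-12) requires agridsize to be defined.

Listing 13-12. putagrid, Replace a Grid Element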

putagrid() #@ Replace character in grid at row and column
{          #@ USAGE: putagrid gridname row column char
  local left right pos grid gridname=$1
  local value=${4:?} index
  gridindex ${2:?} ${3:?} "$agridsize" index   ## calculate the index
  eval "$gridname[index]=\$value"              ## assign the value
}

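Function: getagrid

Given the x and y coordinates, getagrid fetches the value at that position and stores it in a supplied variable (Listing 13-13).

Listing 13-13. getagrid, Extract an Entry from a Grid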

getagrid() #@ Get entry from grid in row Y, column X
{          #@ USAGE: getagrid gridname row column var
  : ${1:?} ${2:?} ${3:?} ${4:?}
  local grid

  eval "grid=( \"\${$1[@]}\" )"
  gridindex "$2" "$3"
  eval "$4=\${grid[$_GRIDINDEX]}"
}

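Function: showagrid

The function showagrid (Listing 13-14) prints an array grid using the supplied format string, typically with each row of the grid on a separate line.

Listing 13-14. showagrid, Print a Grid from an Array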

showagrid() #@ print grid to stdout
{           #@ USAGE: showagrid gridname format [agridsize]
  local gridname=${1:?} grid
  local format=${2:?}
  local agridsize=${3:-${agridsize:?}} row

  eval "grid=( \"\${$1[@]}\" )"
  printf "$format" "${grid[@]}"
}

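Function: rshowagrid

The function rshowagrid (Listing 13-15) prints each row of an array grid on a separate line in reverse order.

Listing 13-15. rshowagrid, Print an Array Grid in Reverse Order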

rshowagrid() #@ print grid to stdout in reverse order
{            #@ USAGE: rshowagrid gridname format [agridsize]
  local format=${2:?} temp grid
  local agridsize=${3:-$agridsize} row
  eval "grid=( \"\${$1[@]}\" )"
  while [ "${#grid[@]}" -gt 0 ]
  do
    ## Note space before minus sign
    ## to distinguish it from default value substitution
    printf "$format" "${grid[@]: -$agridsize}"
    grid=( "${grid[@]:0:${#grid[@]}-$agridsize}" )
  done
}

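Data File Formats

Data files are used for many purposes and come in many different flavors, which are divided into two main types: line oriented and block oriented. In line-oriented files, each line is a complete record, usually with fields separated by a certain character. In block-oriented files, each record can span many lines, and there may be more than one block in a file. In some formats, a record is more than one block (a chess game in PGN format, for example, is two blocks separated by a blank line).

The shell is not the best language for working with large files of data; it is better when working with individual records. However, there are utilities such as sed and awk that can work efficiently with large files and extract records to pass to the shell. This section deals with processing single records.

Line-Based Records

Line-based records are those where each line in the file is a complete record. It will usually be divided into fields by a delimiting character, but sometimes the fields are defined by length: the first 20 characters are the name, the next 20 are the first line of the address, and so on.

When the files are large, the processing is usually done by an external utility such as sed or awk. Sometimes an external utility will be used to select a few records for the shell to process. This snippet searches the password file for users whose shell is bash and feeds the results to the shell to perform some (unspecified) checks: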

grep 'bash$' /etc/passwd |
while read line
do
  : perform some checking here
done

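Delimiter-Separated Values

Most single-line records will have fields delimited by a certain character. In /etc/passwd, the delimiter is a colon. In other files, the delimiter may be a tab, a tilde, or, very commonly, a comma. For these records to be useful, they must be split into their separate fields.

When records are received on an input stream, the easiest way to split them is to change IFS and read each field into its own variable: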

grep 'bash$' /etc/passwd |
while IFS=: read user passwd uid gid name homedir shell
do
  printf "%16s: %s\n" \
      User       "$user" \
      Password   "$passwd" \
      "User ID"  "$uid" \
      "Group ID" "$gid" \
      Name       "$name" \
"Home directory" "$homedir" \
      Shell      "$shell"

  read < /dev/tty
done

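Sometimes it is not possible to split a record as it is read, such as when the record is needed in its entirety as well as split into its constituent fields. In such cases, the entire line can be read into a single variable and then split later using any of several techniques. For all of these, the examples here will use the root entry from /etc/passwd: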

record=root:x:0:0:root:/root:/bin/bash

The fields can be extracted one at a time using parameter expansion:

for var in user passwd uid gid name homedir shell
do
  eval "$var=\${record%%:*}"  ## extract the first field
  record=${record#*:}         ## and take it off the record
done

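As long as the delimiting character is not found within any field, records can be split by setting IFS to the delimiter. When doing this, file name expansion should be turned off (with set -f) to avoid expanding any wildcard characters. The fields can be stored in an array, and variables can be set to reference them: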

IFS=:
set -f
data=( $record )
user=0
passwd=1
uid=2
gid=3
name=4
homedir=5
shell=6

The variable names are the names of the fields, which can then be used to retrieve values from the data array:

$ echo;printf "%16s: %s\n" \
      User       "${data[$user]}" \
      Password   "${data[$passwd]}" \
      "User ID"  "${data[$uid]}" \
      "Group ID" "${data[$gid]}" \
      Name       "${data[$name]}" \
"Home directory" "${data[$homedir]}" \
      Shell      "${data[$shell]}"

            User: root
        Password: x
         User ID: 0
        Group ID: 0
            Name: root
  Home directory: /root
           Shell: /bin/bash

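It is more usual to assign each field to a scalar variable. This function (Listing 13-16) takes a passwd record, splits it on colons, and assigns the fields to variables.

Listing 13-16. split_passwd, Split a Record from /etc/passwd into Fields and Assign to Variables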

split_passwd() #@ USAGE: split_passwd RECORD
{
  local opts=$-    ## store current shell options
  local IFS=:
  local record=${1:?} array

  set -f                                  ## Turn off filename expansion
  array=( $record )                       ## Split record into array
  case $opts in *f*);; *) set +f;; esac   ## Turn on expansion if previously set

  user=${array[0]}
  passwd=${array[1]}
  uid=${array[2]}
  gid=${array[3]}
  name=${array[4]}
  homedir=${array[5]}
  shell=${array[6]}
}

The same thing can be accomplished using a here document (Listing 13-17).

Listing 13-17. split_passwd, Split a Record from /etc/passwd into Fields and Assign to Variables

split_passwd()
{
  IFS=: read user passwd uid gid name homedir shell <<.
$1
.
}
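
As a usage sketch, the here-document version can be called with any colon-separated record. The record below is invented for illustration, and the function is repeated (with read -r added) so that the example stands alone:

```shell
split_passwd()
{
  IFS=: read -r user passwd uid gid name homedir shell <<.
$1
.
}

## Sample record (invented for illustration)
split_passwd "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin"
printf '%s uses %s\n' "$user" "$shell"
```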

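More generally, any character-delimited record can be split into a variable for each field with this function (Listing 13-18).

Listing 13-18. split_record: Split a Record by Reading Variables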

split_record() #@ USAGE: split_record record delimiter var ...
{
  local record=${1:?} IFS=${2:?} ## record and delimiter must be provided
  : ${3:?}                       ## at least one variable is required
  shift 2                        ## remove record and delimiter, leaving variables

  ## Read record into a list of variables using a 'here document'
  read "$@" <<.
$record
.
}

Using record as defined earlier, here is the output, with each value printed between colons by the sa function:

$ split_record "$record" : user passwd uid gid name homedir shell
$ sa "$user" "$passwd" "$uid" "$gid" "$name" "$homedir" "$shell"
:root:
:x:
:0:
:0:
:root:
:/root:
:/bin/bash:
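
Because the delimiter is an argument, the same function handles other separators. The following sketch uses a comma-separated record whose layout (date, white, black, result) is made up for this example; split_record is repeated from Listing 13-18 so that the block runs on its own:

```shell
split_record() #@ USAGE: split_record record delimiter var ...
{
  local record=${1:?} IFS=${2:?} ## record and delimiter must be provided
  : ${3:?}                       ## at least one variable is required
  shift 2                        ## remove record and delimiter, leaving variables

  read "$@" <<.
$record
.
}

## Invented sample record: date, white, black, result
split_record "2009.06.07,torchess,FidelCastro,1-0" , date white black result
printf '%s beat %s\n' "$white" "$black"
```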

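Fixed-Length Fields

Less common than delimited fields are fixed-length fields. They are not used often, but when they are, they can be parsed by looping through a list of name=width strings, which is how many text editors import data from fixed-length field data files: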

line="John           123 Fourth Street   Toronto     Canada                "
for nw in name=15 address=20 city=12 country=22
do
  var=${nw%%=*}                 ## variable name precedes the equals sign
  width=${nw#*=}                ## field width follows it
  eval "$var=\${line:0:width}"  ## extract field
  line=${line:width}            ## remove field from the record
done
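
Fields extracted this way keep their padding, so name holds John followed by 11 spaces. The following sketch of one way to strip the trailing spaces uses a nested parameter expansion. The record is built with printf so that the widths are exact, and the temporary variable field is introduced for this example only:

```shell
## Build a 69-character sample record with exact field widths
printf -v line '%-15s%-20s%-12s%-22s' John "123 Fourth Street" Toronto Canada

for nw in name=15 address=20 city=12 country=22
do
  var=${nw%%=*}                      ## variable name precedes the equals sign
  width=${nw#*=}                     ## field width follows it
  field=${line:0:width}              ## extract field, padding included
  field=${field%"${field##*[! ]}"}   ## strip trailing spaces
  printf -v "$var" '%s' "$field"     ## assign to the named variable
  line=${line:width}                 ## remove field from the record
done

printf '[%s]\n' "$name" "$address" "$city" "$country"
```

The inner expansion, ${field##*[! ]}, yields only the run of trailing spaces, and the outer one removes that run from the end of the field.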

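Block File Formats

Among the many types of block data files, the portable game notation (PGN) chess file is a good one to work with. It stores one or more chess games in a format that is both human readable and machine readable. Nearly all chess programs can read and write this format.

Each game begins with a seven-tag roster that identifies where and when the game was played, who played it, and the result. This is followed by a blank line and then the moves of the game.

Here is a PGN chess game file (from http://cfaj.freeshell.org/Fidel.pgn):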

[Event "ICS rated blitz match"]
[Site "69.36.243.188"]
[Date "2009.06.07"]
[Round "-"]
[White "torchess"]
[Black "FidelCastro"]
[Result "1-0"]

1. f4 c5 2. e3 Nc6 3. Bb5 Qc7 4. Nf3 d6 5. b3 a6 6. Bxc6+ Qxc6 7. Bb2 Nf6
8. O-O e6 9. Qe1 Be7 10. d3 O-O 11. Nbd2 b5 12. Qg3 Kh8 13. Ne4 Nxe4 14.
Qxg7#
{FidelCastro checkmated} 1-0

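You can use a while loop to read the tags and then mapfile to get the moves of the game. The gettag function extracts the value from each tag and assigns it to a variable with the tag's name (Listing 13-19).

Listing 13-19. readpgn: Parse a PGN Game and Print the Game in a Column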

pgnfile="${1:?}"
header=0
game=0

gettag() #@ create a variable with the same name and value as the tag
{
  local tagline=$1
  tag=${tagline%% *}        ## get line before the first space
  tag=${tag#?}              ## remove the open bracket
  IFS='"' read a val b <<.  ## get the 2nd field, using " as delimiter
   $tagline
.

  eval "$tag=\$val"
}

{
  while IFS= read -r line
  do
    case $line in
      \[*) gettag "$line" ;;
      "") [ -n "$Event" ] && break;;  ## skip blank lines at beginning of file
    esac
  done
  mapfile -t game                     ## read remainder of the file
} < "$pgnfile"

## remove blank lines from end of array
while [ -z "${game[${#game[@]}-1]}" ]
do
  unset "game[${#game[@]}-1]"   ## quoted so the subscript is not treated as a glob pattern
done

## print the game with header
echo "Event: $Event"
echo "Date:  $Date"
echo
set -f
printf "%4s  %-10s %-10s\n" "" White Black  ""  ========== ========== \
          "" "$White" "$Black" ${game[@]:0:${#game[@]}-1}
printf "%s\n" "${game[${#game[@]}-1]}"
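
Because gettag passes text from the file to eval, a malformed tag name could produce surprising results. The following is a sketch of a more defensive variant, not part of the original script: the name gettag_safe is made up for this example, the value is extracted with parameter expansion instead of a here document, and the tag name is checked before anything is assigned.

```shell
gettag_safe() #@ hypothetical variant of gettag; returns 1 on an invalid tag name
{
  local tagline=$1 tag val
  tag=${tagline%% *}        ## text before the first space, e.g. [Event
  tag=${tag#\[}             ## remove the open bracket
  val=${tagline#*\"}        ## drop everything up to the first quote
  val=${val%\"*}            ## drop the last quote and what follows it

  case $tag in              ## accept only valid variable names
    "" | [!A-Za-z_]* | *[!A-Za-z0-9_]*) return 1 ;;
  esac

  printf -v "$tag" '%s' "$val"
}

gettag_safe '[White "torchess"]'
gettag_safe '[Result "1-0"]'
printf '%s %s\n' "$White" "$Result"
```

A line such as [Bad-Name "x"] is rejected with a nonzero status rather than being assigned.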

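Summary

This chapter has only scratched the surface of the possibilities for data manipulation, but it should provide techniques that solve some of your needs and hints for solving others. Much of the chapter involved that most basic of programming structures, the array. Techniques were shown for working with single-line, character-delimited records, along with basic techniques for working with blocks of data in files.

Exercises

Modify the isort and asearch functions to use sort and grep, respectively, if the array exceeds a certain size.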
Write a function that transposes rows and columns in a grid (either a single-string grid or an array). For example, transform these:
```
123
456
789
```
into these:
```
147
256
369
```
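Convert some of the grid functions, either the string versions or the array versions, to work with grids that are not square, for example, 6 × 3.
Convert the code that parses fixed-width records into a function that accepts the line of data as its first argument, followed by the varname=width list.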