Translation | A Little Story About the `yes` Unix Command

Original article: A Little Story About the `yes` Unix Command

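A note up front: a nervous first translation

This is my first attempt at translating an article from another language. Understanding a text and translating it well are two different things; what you see here is version 3.0...

Thanks to 依云 for explaining the 信雅达 principles of translation (faithfulness, elegance, expressiveness) and for the generous annotations, to 依云 and 传奇老师 for the final proofreading, and to H 老师 for sharing the article~

If you spot anything mistranslated in this article, corrections are very welcome. Thank you (///▽///)

Translation begins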

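What is the simplest Unix command you know?

There is echo, which prints a string to standard output, and true, which always terminates with an exit code of 0.

Among the heaps of simple Unix commands there is also yes. If you run yes without arguments, you get an endless stream of y characters separated by newlines: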

y
y
y
y
(...you get the idea)

What seems pointless at first turns out to be very useful:

yes | sh 糟心的安装.sh

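Have you ever installed a program that required you to type "y" and press Enter to continue? yes is your savior. It will dutifully keep the installer going, so you can carry on watching Pootie Tang (a musical comedy film).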

Writing yes

Hmm, here is a basic version of yes written in BASIC:

10 PRINT "y"
20 GOTO 10

And here is the same thing implemented in Python:

while True:
    print("y")

Looks simple? Not so fast! It turns out this program runs very slowly.

python yes.py | pv -r > /dev/null
[4.17MiB/s]

Compare that with the version that ships with my Mac:

yes | pv -r > /dev/null
[34.2MiB/s]

So I tried to write a faster version in Rust. This is my first attempt:

use std::env;

fn main() {
  let expletive = env::args().nth(1).unwrap_or("y".into());
  loop {
    println!("{}", expletive);
  }
}

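Some explanations:

The string we print in the loop is the first command-line argument and is called expletive. I learned this word from the yes manpage.
unwrap_or extracts the expletive from the arguments. In case the argument is not set, we use "y" as the default.
The into() method converts the default from a string slice (&str) into an owned string on the heap (String).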
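
The argument handling described above can be isolated into a small function. This is only a sketch: pick_expletive is a hypothetical helper name that does not appear in the article, introduced so the logic can be exercised with any list of arguments rather than the real command line.

```rust
/// Hypothetical helper (not in the article): picks the word to repeat
/// from an iterator of command-line arguments.
fn pick_expletive<I>(mut args: I) -> String
where
    I: Iterator<Item = String>,
{
    // Index 0 is the program name, so the first real argument is at index 1.
    // `"y".into()` converts the `&str` literal into an owned `String`,
    // which is the type `unwrap_or` needs as its fallback here.
    args.nth(1).unwrap_or("y".into())
}

fn main() {
    // Prints the chosen word once instead of looping forever.
    let expletive = pick_expletive(std::env::args());
    println!("{}", expletive);
}
```

Note that unwrap_or builds its fallback value even when an argument is present; unwrap_or_else(|| "y".into()) would defer that allocation, though for a single short string the difference is negligible.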

Let's test it:

cargo run --release | pv -r > /dev/null
   Compiling yes v0.1.0
    Finished release [optimized] target(s) in 1.0 secs
     Running `target/release/yes`
[2.35MiB/s]

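Hmm, that does not look like much of an improvement. It is even slower than the Python version. That surprised me, so I decided to look at the source code of a C implementation of yes.

Here is the very first C version of the program, released with Version 7 Unix and written by Ken Thompson on January 10, 1979: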

main(argc, argv)
char **argv;
{
  for (;;)
    printf("%s\n", argc>1? argv[1]: "y");
}

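No magic here.

Compare that with the 128-line version from GNU coreutils, which is mirrored on GitHub. Even after 25 years it is still under active development; the last code change was about a year ago. And it is much faster: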

# brew install coreutils
gyes | pv -r > /dev/null 
[854MiB/s]

The important part comes at the end:

/* Repeatedly output the buffer until there is a write error; then fail.  */
while (full_write (STDOUT_FILENO, buf, bufused) == bufused)
  continue;

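Aha, so they simply use a buffer to make writes faster. The constant BUFSIZ defines the size of this buffer; it is chosen per system so that I/O is efficient (further reading: https://www.gnu.org/software/libc/manual/html_node/Controlling-Buffering.html). On my system it is defined as 1024 bytes; in practice I got better performance with 8192 bytes.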
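
To see why a buffer matters, it helps to count how often the underlying writer is actually called. The sketch below is an illustration, not the article's code: CountingSink is a hypothetical stand-in for the operating system that only counts calls to write. It compares a line-buffered writer (Rust's standard output handle is line-buffered in current implementations, which is a likely reason the println! version was so slow) with a block-buffered BufWriter.

```rust
use std::io::{self, BufWriter, LineWriter, Write};

/// Hypothetical sink (not in the article): discards data and counts
/// calls to `write`, standing in for the number of write system calls.
#[derive(Default)]
struct CountingSink {
    calls: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `lines` copies of "y\n" through a line-buffered writer,
/// which forwards data to the sink whenever it sees a newline.
fn calls_line_buffered(lines: usize) -> usize {
    let mut sink = CountingSink::default();
    {
        let mut out = LineWriter::new(&mut sink);
        for _ in 0..lines {
            writeln!(out, "y").unwrap();
        }
        out.flush().unwrap();
    }
    sink.calls
}

/// Writes `lines` copies of "y\n" through a block buffer of `capacity`
/// bytes, which forwards data only when the buffer fills or is flushed.
fn calls_block_buffered(lines: usize, capacity: usize) -> usize {
    let mut sink = CountingSink::default();
    {
        let mut out = BufWriter::with_capacity(capacity, &mut sink);
        for _ in 0..lines {
            writeln!(out, "y").unwrap();
        }
        out.flush().unwrap();
    }
    sink.calls
}

fn main() {
    println!("line-buffered:  {} write calls", calls_line_buffered(1000));
    println!("block-buffered: {} write calls", calls_block_buffered(1000, 8192));
}
```

With line buffering the sink is called roughly once per line, while the 8192-byte buffer absorbs all 1000 short lines before anything is forwarded. Fewer, larger writes are the whole trick.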

Here is my improved Rust version:

use std::env;
use std::io::{self, BufWriter, Write};

const BUFSIZE: usize = 8192;

fn main() {
  let expletive = env::args().nth(1).unwrap_or("y".into());
  let mut writer = BufWriter::with_capacity(BUFSIZE, io::stdout());
  loop {
    writeln!(writer, "{}", expletive).unwrap();
  }
}

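The important part is that the buffer size is a multiple of four, to ensure memory alignment.

Now it runs at 51.3MiB/s, faster than the version that ships with my system, but still far slower than the 10.2GiB/s reported in a Reddit post I found.

Update

Once again, the Rust community did not disappoint.

As soon as this post hit the Rust subreddit, Reddit user nwydo pointed out an earlier discussion of the same performance question. Here is the optimized code from that discussion, which breaks the 3GB/s mark on my machine: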

use std::env;
use std::io::{self, Write};
use std::process;
use std::borrow::Cow;

use std::ffi::OsString;
pub const BUFFER_CAPACITY: usize = 64 * 1024;

pub fn to_bytes(os_str: OsString) -> Vec<u8> {
  use std::os::unix::ffi::OsStringExt;
  os_str.into_vec()
}

fn fill_up_buffer<'a>(buffer: &'a mut [u8], output: &'a [u8]) -> &'a [u8] {
  if output.len() > buffer.len() / 2 {
    return output;
  }

  let mut buffer_size = output.len();
  buffer[..buffer_size].clone_from_slice(output);

  while buffer_size < buffer.len() / 2 {
    let (left, right) = buffer.split_at_mut(buffer_size);
    right[..buffer_size].clone_from_slice(left);
    buffer_size *= 2;
  }

  &buffer[..buffer_size]
}

fn write(output: &[u8]) {
  let stdout = io::stdout();
  let mut locked = stdout.lock();
  let mut buffer = [0u8; BUFFER_CAPACITY];

  let filled = fill_up_buffer(&mut buffer, output);
  while locked.write_all(filled).is_ok() {}
}

fn main() {
  write(&env::args_os().nth(1).map(to_bytes).map_or(
    Cow::Borrowed(
      &b"y\n"[..],
    ),
    |mut arg| {
      arg.push(b'\n');
      Cow::Owned(arg)
    },
  ));
  process::exit(1);
}

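Now that is a whole different ballgame!

We prepare a pre-filled string buffer that is reused in every loop iteration.
Standard output is protected by a lock, so instead of constantly acquiring and releasing it, we acquire it once and hold it for all writes.
We use the platform-native std::ffi::OsString and std::borrow::Cow to avoid unnecessary allocations.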
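
Two of these ideas are easy to check in isolation. The following is a simplified sketch, not the optimized code itself: line_to_repeat and prefill are hypothetical names, and prefill copies the line once per slot instead of using the doubling strategy of fill_up_buffer.

```rust
use std::borrow::Cow;

/// Hypothetical helper: returns the line to repeat. The default is a
/// borrowed static byte string (no allocation); a user-supplied argument
/// is reused as the owned buffer and gets a newline appended.
fn line_to_repeat(arg: Option<Vec<u8>>) -> Cow<'static, [u8]> {
    match arg {
        None => Cow::Borrowed(&b"y\n"[..]),
        Some(mut bytes) => {
            bytes.push(b'\n');
            Cow::Owned(bytes)
        }
    }
}

/// Hypothetical helper: fills `buffer` with as many whole copies of
/// `line` as fit and returns the filled prefix. A simpler variant of
/// the doubling approach, used here only for illustration.
fn prefill<'a>(buffer: &'a mut [u8], line: &[u8]) -> &'a [u8] {
    if line.is_empty() {
        return &buffer[..0];
    }
    let copies = buffer.len() / line.len();
    for chunk in buffer.chunks_exact_mut(line.len()) {
        chunk.copy_from_slice(line);
    }
    &buffer[..copies * line.len()]
}

fn main() {
    let line = line_to_repeat(None);
    let mut buffer = [0u8; 16];
    let filled = prefill(&mut buffer, &line);
    println!("{} bytes prefilled", filled.len());
}
```

Once the buffer is filled, the hot loop only has to hand the same slice to the locked stdout handle over and over, with no formatting and no allocation per line.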

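The only thing I could contribute was removing an unnecessary mut.

Here is a summary of what I learned:

The seemingly trivial yes program turns out not to be so trivial after all. It uses output buffering and memory alignment to improve performance. Re-implementing Unix tools is fun, and I appreciate the nifty tricks that make our computers fast.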

Original text

A Little Story About the `yes` Unix Command

What's the simplest Unix command you know? There's echo, which prints a string to stdout, and true, which always terminates with an exit code of 0.

Among the rows of simple Unix commands, there's also yes. If you run it without arguments, you get an infinite stream of y's, separated by a newline:

y
y
y
y
(...you get the idea)

What seems to be pointless in the beginning turns out to be pretty helpful:

yes | sh boring_installation.sh

Ever installed a program, which required you to type "y" and hit enter to keep going? yes to the rescue! It will carefully fulfill this duty, so you can keep watching Pootie Tang.

Writing yes

Here's a basic version in... uhm... BASIC.

10 PRINT "y"
20 GOTO 10

And here's the same thing in Python:

while True:
    print("y")

Simple, eh? Not so quick! Turns out, that program is quite slow.

python yes.py | pv -r > /dev/null
[4.17MiB/s]

Compare that with the built-in version on my Mac:

yes | pv -r > /dev/null
[34.2MiB/s]

So I tried to write a quicker version in Rust. Here's my first attempt:

use std::env;

fn main() {
  let expletive = env::args().nth(1).unwrap_or("y".into());
  loop {
    println!("{}", expletive);
  }
}

Some explanations:

The string we want to print in a loop is the first command line parameter and is named expletive. I learned this word from the yes manpage.
I use unwrap_or to get the expletive from the parameters. In case the parameter is not set, we use "y" as a default.
The default parameter gets converted from a string slice (&str) into an owned string on the heap (String) using into().

Let's test it.

cargo run --release | pv -r > /dev/null
   Compiling yes v0.1.0
    Finished release [optimized] target(s) in 1.0 secs
     Running `target/release/yes`
[2.35MiB/s]

Whoops, that doesn't look any better. It's even slower than the Python version! That caught my attention, so I looked around for the source code of a C implementation.

Here's the very first version of the program, released with Version 7 Unix and famously authored by Ken Thompson on Jan 10, 1979:

main(argc, argv)
char **argv;
{
  for (;;)
    printf("%s\n", argc>1? argv[1]: "y");
}

No magic here.

Compare that to the 128-line-version from the GNU coreutils, which is mirrored on Github. After 25 years, it is still under active development! The last code change happened around a year ago. That's quite fast:

# brew install coreutils
gyes | pv -r > /dev/null 
[854MiB/s]

The important part is at the end:

/* Repeatedly output the buffer until there is a write error; then fail.  */
while (full_write (STDOUT_FILENO, buf, bufused) == bufused)
  continue;

Aha! So they simply use a buffer to make write operations faster. The buffer size is defined by a constant named BUFSIZ, which gets chosen on each system so as to make I/O efficient (see here). On my system, that was defined as 1024 bytes. I actually had better performance with 8192 bytes.

I've extended my Rust program:

use std::env;
use std::io::{self, BufWriter, Write};

const BUFSIZE: usize = 8192;

fn main() {
    let expletive = env::args().nth(1).unwrap_or("y".into());
    let mut writer = BufWriter::with_capacity(BUFSIZE, io::stdout());
    loop {
        writeln!(writer, "{}", expletive).unwrap();
    }
}

The important part is, that the buffer size is a multiple of four, to ensure memory alignment.

Running that gave me 51.3MiB/s. Faster than the version, which comes with my system, but still way slower than the results from this Reddit post that I found, where the author talks about 10.2GiB/s.

Update

Once again, the Rust community did not disappoint. As soon as this post hit the Rust subreddit, user nwydo pointed out a previous discussion on the same topic. Here's their optimized code, that breaks the 3GB/s mark on my machine:

use std::env;
use std::io::{self, Write};
use std::process;
use std::borrow::Cow;

use std::ffi::OsString;
pub const BUFFER_CAPACITY: usize = 64 * 1024;

pub fn to_bytes(os_str: OsString) -> Vec<u8> {
  use std::os::unix::ffi::OsStringExt;
  os_str.into_vec()
}

fn fill_up_buffer<'a>(buffer: &'a mut [u8], output: &'a [u8]) -> &'a [u8] {
  if output.len() > buffer.len() / 2 {
    return output;
  }

  let mut buffer_size = output.len();
  buffer[..buffer_size].clone_from_slice(output);

  while buffer_size < buffer.len() / 2 {
    let (left, right) = buffer.split_at_mut(buffer_size);
    right[..buffer_size].clone_from_slice(left);
    buffer_size *= 2;
  }

  &buffer[..buffer_size]
}

fn write(output: &[u8]) {
  let stdout = io::stdout();
  let mut locked = stdout.lock();
  let mut buffer = [0u8; BUFFER_CAPACITY];

  let filled = fill_up_buffer(&mut buffer, output);
  while locked.write_all(filled).is_ok() {}
}

fn main() {
  write(&env::args_os().nth(1).map(to_bytes).map_or(
    Cow::Borrowed(
      &b"y\n"[..],
    ),
    |mut arg| {
      arg.push(b'\n');
      Cow::Owned(arg)
    },
  ));
  process::exit(1);
}

Now that's a whole different ballgame!

We prepare a filled string buffer, which will be reused for each loop.
Stdout is protected by a lock. So, instead of constantly acquiring and releasing it, we keep it all the time.
We use the platform-native std::ffi::OsString and std::borrow::Cow to avoid unnecessary allocations.

The only thing, that I could contribute was removing an unnecessary mut. 😅

Lessons learned

The trivial program yes turns out not to be so trivial after all. It uses output buffering and memory alignment to improve performance. Re-implementing Unix tools is fun and makes me appreciate the nifty tricks, which make our computers fast.
