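HTTP status codes with async Rust

This blog post is a direct follow-up to my previous post on different levels of async in Rust. It is worth reading that one before diving in here.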

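We can make our programs asynchronous by using non-blocking I/O calls. But last time we only saw examples that ran completely sequentially, which defeats the whole purpose of async. Let's change that with something more sophisticated.

A few months ago, I needed to make sure that every URL on a domain resolved to a real web page (a 200 status code) or redirected somewhere else that had a real web page. To do that, I needed a program that would:

- Read all of the URLs in a text file, one URL per line
- Produce a CSV file containing each URL and its status code

To keep this simple, we're going to take a lot of shortcuts, such as:

- Hard-coding the input file path for the URLs
- Printing the CSV output to standard output
- Using a simple println! to generate the CSV output instead of a library
- Allowing any error to crash the entire program
  - In fact, as you'll see later, we really treat this as a requirement: if any HTTP request fails, the program must terminate with an error code, so that we know something went wrong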
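One consequence of the println! shortcut is that nothing escapes the URL, so a URL containing a comma or a double quote would produce a malformed row. Here is a minimal sketch of the quoting a CSV library would normally take care of, assuming RFC 4180-style rules. The helper names csv_field and csv_row are hypothetical and are not used anywhere else in this post.

```rust
/// Quote a field only when it contains a character that would break
/// a CSV row: a comma, a double quote, or a line break.
/// (Hypothetical helper; the programs in this post print URLs unescaped.)
pub fn csv_field(raw: &str) -> String {
    if raw.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        // Embedded double quotes are escaped by doubling them.
        format!("\"{}\"", raw.replace('"', "\"\""))
    } else {
        raw.to_string()
    }
}

/// Build one "URL,Status" row.
pub fn csv_row(url: &str, status: u16) -> String {
    format!("{},{}", csv_field(url), status)
}
```

For the URLs used in this post the output is identical to the plain println! version, which is why the shortcut is good enough here.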

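For the curious: the original version of this was a really short Haskell program with these same properties. A few weeks ago, for fun, I rewrote it in Rust in two different ways, which ultimately led to these two blog posts.

Fully blocking

Like last time, I recommend following along with my code. I'll kick things off with cargo new httpstatus. Then, to avoid further edits to our Cargo.toml, let's add all of our dependencies up front: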

[dependencies]
tokio = { version = "0.2.22", features = ["full"] }
reqwest = { version = "0.10.8", features = ["blocking"] }
async-channel = "1.4.1"
is_type = "0.2.1"

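That features = ["blocking"] should hopefully grab your attention. The reqwest library provides an optional, fully blocking API. That seems like a good place to start. Here's a nice, simple program that does what we need: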

// To use .lines() before, just like last time
use std::io::BufRead;

// We'll return _some_ kind of an error
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open the file for input
    let file = std::fs::File::open("urls.txt")?;
    // Make a buffered version so we can read lines
    let buffile = std::io::BufReader::new(file);

    // CSV header
    println!("URL,Status");

    // Create a client so we can make requests
    let client = reqwest::blocking::Client::new();

    for line in buffile.lines() {
        // Error handling on reading the lines in the file
        let line = line?;
        // Make a request and send it, getting a response
        let resp = client.get(&line).send()?;
        // Print the status code
        println!("{},{}", line, resp.status().as_u16());
    }
    Ok(())
}

Thanks to Rust's ? syntax, error handling here is pretty easy. In fact, there are basically no gotchas here; reqwest makes this code really easy to write.
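The reason ? composes so easily in main is that any error type implementing std::error::Error converts into Box<dyn std::error::Error>. Here is a small standard-library-only sketch of that conversion, with no HTTP involved. The function first_status is made up for illustration.

```rust
use std::error::Error;
use std::io::BufRead;

/// Read the first line from `input` and parse it as a status code.
/// Two unrelated error types can occur here (std::io::Error from reading,
/// std::num::ParseIntError from parsing); `?` converts both into
/// Box<dyn Error>, just like the `?` uses in main above.
pub fn first_status(input: impl BufRead) -> Result<u16, Box<dyn Error>> {
    let mut lines = input.lines();
    let line = match lines.next() {
        // Reading the line may fail with an std::io::Error
        Some(line) => line?,
        // A plain string can also be turned into a boxed error
        None => return Err("input was empty".into()),
    };
    // Parsing may fail with a std::num::ParseIntError
    let status = line.trim().parse::<u16>()?;
    Ok(status)
}
```

Byte slices implement BufRead, so first_status("404\n".as_bytes()) is enough to try it out without touching the file system.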

Once you put together a urls.txt file, such as the following:

https://www.wikipedia.org
https://www.wikipedia.org/path-the-does-not-exist
http://wikipedia.org

you'll hopefully get output like this:

URL,Status
https://www.wikipedia.org,200
https://www.wikipedia.org/path-the-does-not-exist,404
http://wikipedia.org,200

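The logic above is pretty easy to follow, and hopefully the inline comments explain anything confusing. With that in mind, let's up our game a bit.

Ditching the blocking API

Let's first move away from the blocking API in reqwest, but still keep the program fully sequential. This involves four relatively minor changes to the code, all called out in the comments below: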

use std::io::BufRead;

// First change: add the Tokio runtime
#[tokio::main]
// Second: turn this into an async function
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = std::fs::File::open("urls.txt")?;
    let buffile = std::io::BufReader::new(file);

    println!("URL,Status");

    // Third change: Now we make an async Client
    let client = reqwest::Client::new();

    for line in buffile.lines() {
        let line = line?;

        // Fourth change: We need to .await after send()
        let resp = client.get(&line).send().await?;

        println!("{},{}", line, resp.status().as_u16());
    }
    Ok(())
}

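The program is still fully sequential: we send a request, wait for its response, and only then move on to the next URL. But we're at least ready to start playing with different async approaches.

Where blocking is fine

If you remember from last time, we had a bit of a philosophical discussion on the nature of blocking, ultimately concluding that some blocking in a program is OK. In order to both simplify what we do here, as well as provide some real-world recommendations, let's list all of the blocking I/O we're doing:

- Opening the file urls.txt
- Reading lines from that file
- Outputting to stdout with println!
- Implicitly closing the file descriptor

Note that, even though we're running our HTTP requests sequentially right now, those are in fact using non-blocking I/O. That's why I haven't included anything HTTP-related in the list above. We'll start tackling the sequential nature next.

Returning to the four blocking I/O calls above, I'm going to make a bold statement: don't bother making them non-blocking. It's not actually terribly difficult to do the file I/O using tokio (we saw how last time). But we get virtually no benefit from doing so. The latency of local disk access, especially for a file as small as urls.txt, and especially when compared with a bunch of HTTP requests, is negligible.

Feel free to disagree with me, or to make these calls non-blocking as an exercise. But I'm going to focus on higher-value targets.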

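Concurrent requests

The real problem here is that our HTTP requests are sequential. Instead, we'd much prefer to make our requests concurrently. If we assume there are 100 URLs and that each request takes 1 second (hopefully an overestimate), a sequential algorithm can at best finish in 100 seconds. A concurrent algorithm, however, could in theory finish all 100 requests in just 1 second. In reality that's pretty unlikely to happen, but it's completely reasonable to expect a significant speedup, depending on network conditions, the number of hosts you're connecting to, and other similar factors.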
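To put numbers on that estimate, here is a back-of-the-envelope model. It assumes every request takes exactly the same time and ignores all overhead, which real networks will not honor. The helper best_case_secs is hypothetical and not part of the program.

```rust
/// Idealized wall-clock time, in seconds, for `requests` requests that
/// each take `secs_per_request`, with at most `concurrency` in flight.
/// Assumes identical request times and zero scheduling overhead.
pub fn best_case_secs(requests: u64, secs_per_request: u64, concurrency: u64) -> u64 {
    assert!(concurrency > 0, "need at least one request in flight");
    // Requests run in "waves" of `concurrency`; round the wave count up.
    let waves = (requests + concurrency - 1) / concurrency;
    waves * secs_per_request
}
```

With 100 one-second requests, a concurrency of 1 gives the sequential 100 seconds, a concurrency of 100 gives the theoretical 1 second, and a modest concurrency of 4 already brings the ideal figure down to 25 seconds.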

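So how exactly do we do concurrency with tokio? The most basic answer is the tokio::spawn function. This spawns a new task in the tokio runtime. It's similar in principle to spawning a new system thread, except that running and scheduling are managed by the runtime instead of the operating system. Let's take a first stab at spawning each HTTP request into its own task: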

tokio::spawn(async move {
    let resp = client.get(&line).send().await?;

    println!("{},{}", line, resp.status().as_u16());
});

This looks nice, but we have a problem:

error[E0277]: the `?` operator can only be used in an async block that returns `Result` or `Option` (or another type that implements `std::ops::Try`)
  --> src\main.rs:16:24
   |
15 |           tokio::spawn(async move {
   |  _________________________________-
16 | |             let resp = client.get(&line).send().await?;
   | |                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot use the `?` operator in an async block that returns `()`
17 | |
18 | |             println!("{},{}", line, resp.status().as_u16());
19 | |         });
   | |_________- this function should return `Result` or `Option` to accept `?`

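Our task doesn't return a Result, and therefore has no way to complain about errors. This is actually indicative of a far more serious problem, which we'll get to later. But for now, let's just pretend errors won't happen and cheat a bit with .unwrap():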

let resp = client.get(&line).send().await.unwrap();

This also fails, now with an ownership issue:

error[E0382]: use of moved value: `client`
  --> src\main.rs:15:33
   |
10 |       let client = reqwest::Client::new();
   |           ------ move occurs because `client` has type `reqwest::async_impl::client::Client`, which does not implement the `Copy` trait

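This one is easier to address. The Client is being shared by multiple tasks, but each task needs its own clone of the Client. If you read the docs, you'll see that this is recommended behavior:

The Client holds a connection pool internally, so it is advised that you create one and reuse it.

You do not have to wrap the Client in an Rc or Arc to reuse it, because it already uses an Arc internally.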
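That quoted advice can be pictured with a toy type. This is only an illustration of the handle-around-an-Arc pattern the docs describe, not reqwest's actual internals. FakeClient, Shared, and count_from_threads are all invented for the example, and OS threads stand in for tasks so that it runs without dependencies.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for the shared state a real client would hold,
/// such as its connection pool.
struct Shared {
    requests_made: Mutex<u32>,
}

/// A cheap-to-clone handle: cloning copies the Arc pointer,
/// not the state behind it.
#[derive(Clone)]
pub struct FakeClient {
    shared: Arc<Shared>,
}

impl FakeClient {
    pub fn new() -> Self {
        FakeClient {
            shared: Arc::new(Shared { requests_made: Mutex::new(0) }),
        }
    }

    /// Pretend to send a request by bumping the shared counter.
    pub fn fake_get(&self) {
        *self.shared.requests_made.lock().unwrap() += 1;
    }

    /// Total requests made through this handle and all of its clones.
    pub fn requests_made(&self) -> u32 {
        *self.shared.requests_made.lock().unwrap()
    }
}

/// Give each thread its own clone, the same way each spawned task gets
/// its own clone of the Client, and report the shared total afterwards.
pub fn count_from_threads(threads: u32) -> u32 {
    let client = FakeClient::new();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let client = client.clone(); // moved into the thread below
            std::thread::spawn(move || client.fake_get())
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    client.requests_made()
}
```

Every clone sees the same counter, which is the property that makes cloning a handle per task both cheap and correct.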

Once we add this line before our tokio::spawn, our code will compile:

let client = client.clone();

Unfortunately, things fail pretty spectacularly at runtime:

URL,Status
thread 'thread 'tokio-runtime-workerthread 'tokio-runtime-worker' panicked at '' panicked at 'tokio-runtime-workercalled `Result::unwrap()` on an `Err` value: reqwest::Error { kind: Request, url: "https://www.wikipedia.org/path-the-does-not-exist", source: hyper::Error(Connect, ConnectError("dns error", Custom { kind: Interrupted, error: JoinError::Cancelled })) }called `Result::unwrap()` on an `Err` value: reqwest::Error { kind: Request, url: "https://www.wikipedia.org/", source: hyper::Error(Connect, ConnectError("dns error", Custom { kind: Interrupted, error: JoinError::Cancelled })) }' panicked at '', ', called `Result::unwrap()` on an `Err` value: reqwest::Error { kind: Request, url: "http://wikipedia.org/", source: hyper::Error(Connect, ConnectError("dns error", Custom { kind: Interrupted, error: JoinError::Cancelled })) }src\main.rssrc\main.rs', ::src\main.rs1717:::241724

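That's a big error message, but the important bit for us is the bunch of JoinError::Cancelled values all over the place.

Wait for me!

Let's talk through what's happening in our program:

1. Start the Tokio runtime
2. Create a Client
3. Open the file and start reading it line by line
4. For each line:
   - Spawn a new task
   - That task starts making non-blocking I/O calls
   - The task goes to sleep, to be rescheduled when data is ready
   - When all is said and done, print out the CSV line
5. Reach the end of the main function, which triggers the runtime to shut down

The problem is that we reach (5) long before we finish (4). When this happens, all in-flight I/O is cancelled, which leads to the error messages we saw above. Instead, we need to make sure we wait for each task to complete before we exit. The easiest way to do this is to call .await on the result of the tokio::spawn call. (Those results are called JoinHandles, by the way.) However, doing so immediately would completely defeat the purpose of our concurrent work, since we would once again be sequential!

Instead, we want to spawn all of the tasks and then wait for them all to complete. One easy way to achieve this is to put all of the JoinHandles into a Vec. Let's look at the code. And since we've made a bunch of changes since our last complete code dump, I'll show you the full current state of our source file: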

use std::io::BufRead;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = std::fs::File::open("urls.txt")?;
    let buffile = std::io::BufReader::new(file);

    println!("URL,Status");

    let client = reqwest::Client::new();

    let mut handles = Vec::new();

    for line in buffile.lines() {
        let line = line?;

        let client = client.clone();
        let handle = tokio::spawn(async move {
            let resp = client.get(&line).send().await.unwrap();

            println!("{},{}", line, resp.status().as_u16());
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.await?;
    }
    Ok(())
}

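We finally have a concurrent program! This is actually pretty good, but it has two flaws we'd like to fix:

- It doesn't properly handle errors, instead just using .unwrap(). I mentioned this above, and said that our usage of .unwrap() was indicative of a "far more serious problem." That problem is that the main thread never noticed the result values of the spawned tasks, which is really the core problem that led to the cancellation we discussed above. It's always nice when type-driven error messages point out a runtime bug in our code!
- There's no limit on the number of concurrent tasks we'll spawn. Ideally, we'd rather have a job queue approach, with a dedicated number of worker tasks. That would let our program behave better as we increase the number of URLs in the input file.

Note that in the program above, it would be possible to skip the spawns, collect a Vec of Futures, and then await those. However, that would once again end up running sequentially. Spawning allows all of those Futures to run concurrently and to be polled by the tokio runtime itself. It would also be possible to use join_all to poll all of the Futures, but it has some performance issues. So it's best to stick with tokio::spawn.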
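The reason un-spawned Futures make no progress on their own is that a Rust Future does nothing until something polls it. Here is a standard-library-only sketch of that laziness, driving a future by hand with a waker that does nothing. NoopWaker and observe_laziness are names invented here, and real code would leave the polling to the tokio runtime.

```rust
use std::cell::Cell;
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that ignores wake-ups. That is enough here because the
/// future below completes on its first poll.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Returns whether the async block had run (before polling, after polling).
pub fn observe_laziness() -> (bool, bool) {
    let ran = Cell::new(false);

    // Creating the future runs none of its body.
    let mut fut = Box::pin(async {
        ran.set(true);
    });
    let ran_before_poll = ran.get();

    // Polling is what actually executes the body. A runtime such as
    // tokio does this for every spawned task.
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let finished = matches!(fut.as_mut().poll(&mut cx), Poll::Ready(()));
    assert!(finished, "a future with no .await inside finishes in one poll");

    (ran_before_poll, ran.get())
}
```

Seen this way, a Vec of un-spawned Futures is just a list of work nobody has started, and awaiting them in a loop starts them one after another.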

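Let's address the simpler of the two problems first: proper error handling.

Error handling

The basic concept of error handling is that we want errors from the spawned tasks to be detected in the main task, and to then cause the application to exit. One way to handle that is to return the Err values directly from the spawned tasks and then pick them up via the JoinHandle that spawn returns. That sounds nice, but a naive implementation would check the error responses one at a time, in order. Instead, we'd rather fail early, by detecting that (for example) the 57th request failed and immediately terminating the application.

You could do some kind of "tell me which is the first JoinHandle that's ready," but that's not the way I initially implemented it, and some quick Googling indicated you'd have to be careful about which library functions you use. Instead, we'll try a different approach, using an mpsc (multi-producer, single-consumer) channel.

Here's the basic idea. Let's pretend there are 100 URLs in the file. We'll spawn 100 tasks. Each of those tasks will write a single value onto the mpsc channel: a Result<(), Error>. Then, in the main task, we'll read 100 values off of the channel. If any of them are Err, we exit the program immediately. Otherwise, if we read off 100 Ok values, we exit successfully.

Before we read the file, we don't know how many lines it will contain, so we're going to use an unbounded channel. That isn't generally recommended practice, but it ties in closely with my second complaint above: we're spawning a separate task for each line in the file instead of doing something more intelligent, like a job queue. In other words, if we can safely spawn N tasks, we can safely have an unbounded channel holding N values.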
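Before the tokio version, here is the same counting idea sketched with OS threads and std::sync::mpsc, so that it runs without any dependencies. The jobs are plain numbers standing in for URLs, and the rule that one particular job fails is an arbitrary assumption made for the demonstration. The name first_error is hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawn one thread per job. Each thread sends exactly one
/// Result<(), String> on the channel. The caller reads results as they
/// arrive and returns at the first Err instead of waiting for the rest.
pub fn first_error(jobs: Vec<u32>, failing_job: Option<u32>) -> Result<usize, String> {
    let (tx, rx) = mpsc::channel();
    let count = jobs.len();

    for job in jobs {
        let tx = tx.clone(); // each thread gets its own sender
        thread::spawn(move || {
            let msg = if Some(job) == failing_job {
                Err(format!("job {} failed", job))
            } else {
                Ok(())
            };
            // If the receiver is already gone, another job failed first
            // and the caller has returned, so the send error is ignored.
            let _ = tx.send(msg);
        });
    }

    // Drop the original sender so the channel closes once every
    // thread has sent its value.
    drop(tx);

    let mut finished = 0;
    for msg in rx {
        msg?; // fail fast on the first error
        finished += 1;
    }
    assert_eq!(finished, count);
    Ok(finished)
}
```

Results arrive in completion order rather than spawn order, which is what lets the caller react to a failure without first waiting on every earlier job.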

Alright, let's see the code in question!

use std::io::BufRead;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = std::fs::File::open("urls.txt")?;
    let buffile = std::io::BufReader::new(file);

    println!("URL,Status");

    let client = reqwest::Client::new();

    // Create the channel. tx will be the sending side (each spawned task),
    // and rx will be the receiving side (the main task after spawning).
    let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel();

    // Keep track of how many lines are in the file, and therefore
    // how many tasks we spawned
    let mut count = 0;

    for line in buffile.lines() {
        let line = line?;

        let client = client.clone();
        // Each spawned task gets its own copy of tx
        let tx = tx.clone();
        tokio::spawn(async move {
            // Use a map to say: if the request went through
            // successfully, then print it. Otherwise:
            // keep the error
            let msg = client.get(&line).send().await.map(|resp| {
                println!("{},{}", line, resp.status().as_u16());
            });
            // And send the message to the channel. We ignore errors here.
            // An error during sending would mean that the receiving side
            // is already closed, which would indicate either programmer
            // error, or that our application is shutting down because
            // another task generated an error.
            let _ = tx.send(msg);
        });

        // Increase the count of spawned tasks
        count += 1;
    }

    // Drop the sending side, so that we get a None when
    // calling rx.recv() one final time. This allows us to
    // test some extra assertions below
    std::mem::drop(tx);

    let mut i = 0;
    loop {
        match rx.recv().await {
            // All senders are gone, which must mean that
            // we're at the end of our loop
            None => {
                assert_eq!(i, count);
                break Ok(());
            }
            // Something finished successfully, make sure
            // that we haven't reached the final item yet
            Some(Ok(())) => {
                assert!(i < count);
            }
            // Oops, an error! Time to exit!
            Some(Err(e)) => {
                assert!(i < count);
                return Err(From::from(e));
            }
        }
        i += 1;
    }
}

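With this in place, we now have a proper concurrent program that does error handling correctly. Nice! Before we move on to job queues, let's clean this up a bit.

Workers

The previous code works well. It allows us to spawn multiple worker tasks and then wait for all of them to complete, handling errors when they occur. Let's generalize this! We're doing this now since it will make the final step in this blog post much easier.

We'll put all of the code for this in a separate module of our project. The code will be mostly the same as what we had before, except that we'll have a nice struct to hold onto our data, and we'll be more explicit about the error type. Put this code into src/workers.rs: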

use is_type::Is; // fun trick, we'll look at it below
use std::future::Future;
use tokio::sync::mpsc;

/// Spawn and then run workers to completion, handling errors
pub struct Workers<E> {
    count: usize,
    tx: mpsc::UnboundedSender<Result<(), E>>,
    rx: mpsc::UnboundedReceiver<Result<(), E>>,
}

impl<E: Send + 'static> Workers<E> {
    /// Create a new Workers value
    pub fn new() -> Self {
        let (tx, rx) = mpsc::unbounded_channel();
        Workers { count: 0, tx, rx }
    }

    /// Spawn a new task to run inside this Workers
    pub fn spawn<T>(&mut self, task: T)
    where
        // Make sure we can run the task
        T: Future + Send + 'static,
        // And a weird trick: make sure that the output
        // from the task is Result<(), E>
        // Equality constraints would make this much nicer
        // See: https://github.com/rust-lang/rust/issues/20041
        T::Output: Is<Type = Result<(), E>>,
    {
        // Get a new copy of the send side
        let tx = self.tx.clone();
        // Spawn a new task
        tokio::spawn(async move {
            // Run the provided task and get its result
            let res = task.await;
            // Send the result to the channel
            // This should never fail, so we panic if something goes wrong
            match tx.send(res.into_val()) {
                Ok(()) => (),
                // could use .unwrap, but that would require Debug constraint
                Err(_) => panic!("Impossible happened! tx.send failed"),
            }
        });
        // One more worker to wait for
        self.count += 1;
    }

    /// Finish running all of the workers, exiting when the first one errors or all of them complete
    pub async fn run(mut self) -> Result<(), E> {
        // Make sure we don't wait for ourself here
        std::mem::drop(self.tx);
        // How many workers have completed?
        let mut i = 0;

        loop {
            match self.rx.recv().await {
                None => {
                    assert_eq!(i, self.count);
                    break Ok(());
                }
                Some(Ok(())) => {
                    assert!(i < self.count);
                }
                Some(Err(e)) => {
                    assert!(i < self.count);
                    return Err(e);
                }
            }
            i += 1;
        }
    }
}
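A note on the Is trick imported at the top of this module: it is a way of saying "the task's output must be exactly Result<(), E>." As far as I can tell, the same requirement can also be written as an associated-type binding on Future itself. Here is a dependency-free sketch under that assumption. TaskList is a hypothetical type that only stores the futures, and I have not checked whether type inference in main behaves identically with this spelling.

```rust
use std::future::Future;
use std::pin::Pin;

/// A boxed task whose output is pinned down to Result<(), E>.
type BoxedTask<E> = Pin<Box<dyn Future<Output = Result<(), E>> + Send>>;

/// Hypothetical holder for tasks that all share one error type.
pub struct TaskList<E> {
    tasks: Vec<BoxedTask<E>>,
}

impl<E> TaskList<E> {
    pub fn new() -> Self {
        TaskList { tasks: Vec::new() }
    }

    /// Accept any future whose output is Result<(), E>. The
    /// `Output = ...` binding does the job of the `Is` constraint.
    pub fn push<T>(&mut self, task: T)
    where
        T: Future<Output = Result<(), E>> + Send + 'static,
    {
        self.tasks.push(Box::pin(task));
    }

    /// Number of tasks stored so far.
    pub fn len(&self) -> usize {
        self.tasks.len()
    }
}
```

A future with any other output type, such as async { 5 }, is rejected at compile time by the bound on push.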

Now in src/main.rs, we get to focus on just our business logic... and error handling. Have a look at the new contents:

// Indicate that we have another module
mod workers;

use std::io::BufRead;

/// Create a new error type to handle the two ways errors can happen.
#[derive(Debug)]
enum AppError {
    IO(std::io::Error),
    Reqwest(reqwest::Error),
}

// And now implement some boilerplate From impls to support ? syntax
impl From<std::io::Error> for AppError {
    fn from(e: std::io::Error) -> Self {
        AppError::IO(e)
    }
}

impl From<reqwest::Error> for AppError {
    fn from(e: reqwest::Error) -> Self {
        AppError::Reqwest(e)
    }
}

#[tokio::main]
async fn main() -> Result<(), AppError> {
    let file = std::fs::File::open("urls.txt")?;
    let buffile = std::io::BufReader::new(file);

    println!("URL,Status");

    let client = reqwest::Client::new();
    let mut workers = workers::Workers::new();

    for line in buffile.lines() {
        let line = line?;
        let client = client.clone();
        // Use workers.spawn, and no longer worry about results
        // ? works just fine inside!
        workers.spawn(async move {
            let resp = client.get(&line).send().await?;
            println!("{},{}", line, resp.status().as_u16());
            Ok(())
        })
    }

    // Wait for the workers to complete
    workers.run().await
}

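There's more noise around error handling, but overall the code is easier to understand. Now that we have that out of the way, we're finally ready to tackle the last piece of this...

Job queue

Let's review again how we did error handling with workers. We set up a channel that allows each worker task to send its results to a single receiver, the main task. We used mpsc, or "multi-producer single-consumer." That matches what we just described, right?

Well, a job queue is kind of similar. We want a single task that reads lines from the file and feeds them into a channel. Then we want multiple workers to read values from that channel. That is "single-producer multi-consumer." Unfortunately, tokio doesn't provide such a channel out of the box. After I asked on Twitter, I was recommended to use async-channel, which provides a "multi-producer multi-consumer" channel. That works for us!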
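To make the shape of a job queue concrete before the async version, here is a sketch using only the standard library: one producer thread, a bounded channel, and several worker threads. std's channel has a single Receiver, so this sketch shares it behind an Arc<Mutex<..>> as a stand-in for the cloneable receiver that async-channel provides. The job itself (adding up string lengths) and the name process_all are placeholders.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Feed `lines` through a bounded queue to `workers` threads and
/// return the total number of bytes seen across all lines.
pub fn process_all(lines: Vec<String>, workers: usize) -> usize {
    assert!(workers > 0, "need at least one worker");
    // Bounded queue: the producer blocks once 2 * workers jobs are waiting.
    let (tx, rx) = mpsc::sync_channel::<String>(workers * 2);
    // std's Receiver cannot be cloned, so the workers share one.
    let rx = Arc::new(Mutex::new(rx));

    // Single producer: fill the queue, then drop tx to close it.
    let producer = thread::spawn(move || {
        for line in lines {
            tx.send(line).expect("all workers exited early");
        }
    });

    // Multiple consumers: each loops until the queue is closed and empty.
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut bytes = 0;
                loop {
                    // The lock is released at the end of this statement.
                    let job = rx.lock().unwrap().recv();
                    match job {
                        Ok(line) => bytes += line.len(),
                        Err(_) => break bytes, // channel closed
                    }
                }
            })
        })
        .collect();

    producer.join().unwrap();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

The number of workers caps how much work is in flight, and the bounded queue caps how far the producer can run ahead, regardless of how many lines the input has.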

Thanks to our work before with the Workers struct refactor, this is now pretty easy. Let's have a look at the modified main function:

#[tokio::main]
async fn main() -> Result<(), AppError> {
    let file = std::fs::File::open("urls.txt")?;
    let buffile = std::io::BufReader::new(file);

    println!("URL,Status");

    // Feel free to set this to any number (> 0) you want
    // At a value of 4, this could comfortably fit in OS threads
    // But tasks are certainly up to the challenge, and will scale
    // up more nicely for large numbers and more complex applications
    const WORKERS: usize = 4;
    let client = reqwest::Client::new();
    let mut workers = workers::Workers::new();
    // Buffers double the size of the number of workers are common
    let (tx, rx) = async_channel::bounded(WORKERS * 2);

    // Spawn the task to fill up the queue
    workers.spawn(async move {
        for line in buffile.lines() {
            let line = line?;
            tx.send(line).await.unwrap();
        }
        Ok(())
    });

    // Spawn off the individual workers
    for _ in 0..WORKERS {
        let client = client.clone();
        let rx = rx.clone();
        workers.spawn(async move {
            loop {
                match rx.recv().await {
                    // uses Err to represent a closed channel due to tx being dropped
                    Err(_) => break Ok(()),
                    Ok(line) => {
                        let resp = client.get(&line).send().await?;
                        println!("{},{}", line, resp.status().as_u16());
                    }
                }
            }
        })
    }

    // Wait for the workers to complete
    workers.run().await
}

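And just like that, we have a concurrent job queue! It's everything we could have wanted!

Conclusion

I'll admit, when I wrote last week's post, I didn't expect to end up going this deep into the topic. But once I started playing with solutions, I decided I wanted to implement a full job queue for this.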