The state of async Rust: how far have async functions in traits come?

This article is a full translation and adaptation of the English blog post Catching up with async Rust.

A long-awaited milestone

In December 2023, something the Rust ecosystem had been waiting on for years finally happened: async fn in traits was stabilized.

Standalone async functions have been supported since Rust 1.39:

pub async fn read_hosts() -> eyre::Result<Vec<u8>> {
    // ...
}

So have async functions in impl blocks:

impl HostReader {
    pub async fn read_hosts(&self) -> eyre::Result<Vec<u8>> {
        // ...
    }
}

Writing async fn in a trait, however, was not allowed for a long time:

use std::io;

trait AsyncRead {
    async fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

Compile this on Rust 1.74 or earlier and the compiler tells you so in plain terms:

error[E0706]: functions in traits cannot be declared `async`
  |
  = note: `async` trait functions are not currently supported
  = note: consider using the `async-trait` crate: https://crates.io/crates/async-trait

The old workaround: the `async-trait` macro

Before native support arrived, the community's standard approach was the async-trait crate:

use std::io;

#[async_trait::async_trait]
trait AsyncRead {
    async fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

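It works, but at a cost: the macro rewrites the trait definition (and every implementation) to return a pinned, boxed future.

What is a boxed future? A future allocated on the heap. Why is that necessary? The answer starts with the size of a future.

The size of a future

The two async functions below return futures of different sizes: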

async fn foo() {
    tokio::time::sleep(std::time::Duration::from_secs(1)).await;
    println!("done");
}

async fn bar() {
    let mut a = [0u8; 72];
    tokio::time::sleep(std::time::Duration::from_secs(1)).await;
    for _ in 0..10 {
        a[0] += 1;
    }
    println!("done");
}

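bar has one thing foo does not: the 72-byte array a. The array cannot be dropped while the function is suspended at .await, so it has to be stored as part of the future's state. Measuring confirms it:

foo: 128 bytes
bar: 200 bytes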
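The same effect can be reproduced without tokio. The sketch below is only an illustration: small and large are hypothetical stand-ins for foo and bar, and std::future::ready replaces tokio::time::sleep, so the byte counts will differ from the numbers above and may vary between compiler versions.

```rust
use std::future::ready;
use std::mem::size_of_val;

// Hypothetical stand-in for `foo`: nothing has to survive the await point.
async fn small() {
    ready(()).await;
}

// Hypothetical stand-in for `bar`: the array is used after the await point,
// so it has to be stored inside the future.
async fn large() {
    let mut a = [0u8; 72];
    ready(()).await;
    a[0] += 1;
    let _ = std::hint::black_box(a);
}

/// Returns the sizes of both futures without running them.
fn sizes() -> (usize, usize) {
    // Calling an async fn only builds the future; no code inside it runs yet.
    (size_of_val(&small()), size_of_val(&large()))
}

fn main() {
    let (small_size, large_size) = sizes();
    println!("small: {small_size} bytes");
    println!("large: {large_size} bytes");
}
```

Whatever the exact figures, the future for large has to be at least 72 bytes, because the array is still needed after the await point.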

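Differing future sizes lead to a fundamental problem: when the compiler handles a function call, it must know the size of the return value in advance so that it can reserve enough stack space.

A concrete example: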

fn main() {
    let step1: u64 = 0;
    let _foo = foo();
    let step2: u64 = 0;
    // ...
}

In the corresponding assembly, the function starts by reserving 256 bytes of stack:

sansioex::main:
    sub sp, sp, #256
    stp x20, x19, [sp, #224]
    stp x29, x30, [sp, #240]

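The locals step1, _foo (the future returned by foo()), and step2 sit one after another on the stack. The distance between step1 and step2 is the size of step1 (8 bytes) plus the size of _foo (128 bytes), 136 bytes in total: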

cargo run --quiet
distance in bytes between before and after: 136

The stack layout looks roughly like this:

[step1: 8 bytes][_foo: 128 bytes][step2: 8 bytes]
                ↑
         distance between step1 and step2 = 136

Why Box?

Now back to traits. Suppose we have a &mut dyn AsyncRead and call read on it:

async fn use_read(r: &mut dyn AsyncRead) {
    let mut buf = [0; 1024];
    let fut = r.read(&mut buf);  // how big is fut?
    fut.await;
}

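Behind r there could be any type that implements AsyncRead, and each type's read returns a future of a different size. The compiler cannot know ahead of time how much stack space to reserve for fut.

In theory the size of the future could be encoded in the vtable, looked up first, and the space allocated afterwards. That is roughly the idea behind the "unsized locals" feature, and it is where the long-term plans point. For now, though, the only workable approach is to put the future in a Box.

A Box is a pointer, so on a 64-bit target it is always 8 bytes. Box<dyn Future>, however, is a fat pointer: it also carries a vtable pointer to the function table needed for dynamic dispatch. That makes it 16 bytes: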

fn main() {
    let _foo: Pin<Box<dyn Future<Output = ()>>> = Box::pin(foo());
    let _bar: Pin<Box<dyn Future<Output = ()>>> = Box::pin(bar());

    println!("Size of foo: {} bytes", std::mem::size_of_val(&_foo));
    println!("Size of bar: {} bytes", std::mem::size_of_val(&_bar));
}

Size of foo: 16 bytes
Size of bar: 16 bytes

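Two futures of very different sizes each take only 16 bytes on the stack once boxed: one pointer to the actual data and one pointer to the vtable.

How dynamic dispatch works under the hood

LLDB lets us inspect the memory layout of _foo directly: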

(lldb) p _foo
(core::pin::Pin<alloc::boxed::Box<...>>) {
  __pointer = {
    pointer = 0x0000600001a24100   // points to the future data on the heap
    vtable  = 0x0000000100084068   // points to the function table
  }
}

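Dumping the vtable shows a series of 64-bit words: addresses of the actual function implementations, along with the size (0x80 = 128 bytes, matching foo's future) and alignment (0x8) of the value: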

(lldb) x/8gx .__pointer.vtable
0x100084068: 0x0000000100004ae4 0x0000000000000080
0x100084078: 0x0000000000000008 0x0000000100004c58
0x100084088: 0x0000000100004a60 0x00000000000000c8
...

One of those addresses points to our async function (its closure implementation):

(lldb) image lookup -a 0x0000000100004c58
  Summary: sansioex`sansioex::foo::_{{closure}}::... at main.rs:59

The vtable also holds other entries, such as the drop function. This is how dyn Trait dynamic dispatch works: Box<String> needs only 8 bytes, while Box<dyn Display> needs 16:

alloc::boxed::Box<alloc::string::String>      8 bytes
alloc::boxed::Box<dyn core::fmt::Display>    16 bytes
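These numbers are easy to check with std::mem::size_of. The following is a small sketch; pointer_sizes is a hypothetical helper name, and the sizes are expressed relative to usize so the check also holds on targets that are not 64-bit.

```rust
use std::fmt::Display;
use std::future::Future;
use std::mem::size_of;
use std::pin::Pin;

/// Returns the sizes of a thin Box, a fat Box, and a pinned boxed future.
fn pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<Box<String>>(),
        size_of::<Box<dyn Display>>(),
        size_of::<Pin<Box<dyn Future<Output = ()>>>>(),
    )
}

fn main() {
    let (thin, fat, pinned) = pointer_sizes();
    println!("Box<String>:                       {thin} bytes");
    println!("Box<dyn Display>:                  {fat} bytes");
    println!("Pin<Box<dyn Future<Output = ()>>>: {pinned} bytes");
}
```

A thin Box is one pointer wide; any Box<dyn Trait>, pinned or not, is two pointers wide: data plus vtable.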

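Returning a boxed, dynamically dispatched future is exactly what the async-trait macro does: it turns each async fn in the trait into a regular function that returns Pin<Box<dyn Future>>. The expanded signature looks roughly like this: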

fn read<'life0, 'life1, 'async_trait>(
    &'life0 mut self,
    buf: &'life1 mut [u8],
) -> ::core::pin::Pin<
    Box<
        dyn ::core::future::Future<Output = io::Result<usize>>
            + ::core::marker::Send
            + 'async_trait,
    >,
>
// ...

Not pretty, but the return type is a fixed 16 bytes no matter how large the implementor's future is.
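To make the trade-off concrete, here is a hand-written approximation of the boxed form, using only the standard library. It is a sketch, not the macro's actual output: the single lifetime 'a, the BoxFuture alias, the Zeros type, and the tiny block_on helper are all assumptions made for illustration, and the Send bound is left out.

```rust
use std::future::Future;
use std::io;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// The boxed return type: always a fat pointer, whatever the future's size.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + 'a>>;

/// Hand-written, dyn-compatible counterpart of the `async fn read` trait.
trait AsyncRead {
    fn read<'a>(&'a mut self, buf: &'a mut [u8]) -> BoxFuture<'a, io::Result<usize>>;
}

/// Hypothetical reader that fills the buffer with zeros.
struct Zeros;

impl AsyncRead for Zeros {
    fn read<'a>(&'a mut self, buf: &'a mut [u8]) -> BoxFuture<'a, io::Result<usize>> {
        // The async block is moved to the heap; callers only see the Box.
        Box::pin(async move {
            buf.fill(0);
            Ok(buf.len())
        })
    }
}

/// Waker that does nothing; enough for futures that never really suspend.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for illustration only.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Reads through a trait object, which the boxed signature makes possible.
fn read_via_dyn(reader: &mut dyn AsyncRead, buf: &mut [u8]) -> io::Result<usize> {
    block_on(reader.read(buf))
}

fn main() {
    let mut buf = [1u8; 8];
    let n = read_via_dyn(&mut Zeros, &mut buf).unwrap();
    println!("read {n} bytes: {buf:?}");
}
```

Every call to read allocates, which is the price paid for a return type whose size the caller knows in advance.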

Rust 1.75: native async fn in traits

Starting with Rust 1.75, the async-trait macro is no longer required. This code compiles as is:

use std::io;

trait AsyncRead {
    async fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

The trait can be implemented for any type:

impl AsyncRead for () {
    async fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        let a = [0u8; 72];
        tokio::time::sleep(std::time::Duration::from_secs(1)).await;
        Ok(a[3] as _)
    }
}

This time the returned future is not boxed:

fn main() {
    let mut s = ();
    let mut buf = [0u8; 72];
    let fut = s.read(&mut buf);
    print_type_name_and_size(&fut);
}

<() as sansioex::AsyncRead>::read::{{closure}}   224 bytes

Had the future been boxed, this would print 16 bytes.
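The snippets here call print_type_name_and_size without showing it. A plausible definition (an assumption, not necessarily the original helper) together with a tokio-free variant of the example might look like the sketch below. std::future::ready stands in for the sleep, so the printed size will not match the figure above, and the printed type name depends on the crate name. Native async fn in traits needs Rust 1.75 or later.

```rust
use std::any::type_name;
use std::future::ready;
use std::io;
use std::mem::size_of;

/// Assumed shape of the helper: the static type's name and size in bytes.
fn type_name_and_size<T>(_value: &T) -> (&'static str, usize) {
    (type_name::<T>(), size_of::<T>())
}

fn print_type_name_and_size<T>(value: &T) {
    let (name, size) = type_name_and_size(value);
    println!("{name}   {size} bytes");
}

trait AsyncRead {
    async fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

impl AsyncRead for () {
    async fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let mut a = [0u8; 72];
        a[3] = buf.len() as u8;
        ready(()).await; // `a` is needed after this await point
        Ok(a[3] as usize)
    }
}

/// Builds the future without running it and reports its size.
fn unboxed_future_size() -> usize {
    let mut s = ();
    let mut buf = [0u8; 72];
    let fut = s.read(&mut buf);
    type_name_and_size(&fut).1
}

fn main() {
    let mut s = ();
    let mut buf = [0u8; 72];
    let fut = s.read(&mut buf);
    print_type_name_and_size(&fut);
}
```

Because the array lives inside the future, the reported size is well above the 16 bytes a boxed future would occupy.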

Not fully solved yet: dyn compatibility

That all looks great. But what happens if we try to call it through a Box<dyn AsyncRead>?

fn use_async_read(mut r: Box<dyn AsyncRead>) {
    let mut buf = [0u8; 72];
    let fut = r.read(&mut buf);  // the compiler rejects this
    // ...
}

The compiler reports:

error[E0038]: the trait `AsyncRead` cannot be made into an object
  |
note: for a trait to be "dyn-compatible" it needs to allow building a vtable...
  --> src/main.rs:36:14
  |
  | async fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
  |          ^^^^ ...because method `read` is `async`

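The root cause is the same old problem: r could be any type, each type returns a future of a different size, and the vtable does not store that information, so dynamic dispatch is impossible.

async fn is not the only thing that gets in the way of dyn. A method that takes self by value cannot be called on a trait object either, because the size of the value behind it is unknown: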

trait EatSelf {
    fn nomnomnom(self) {}  // cannot be dispatched through dyn
}

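Taking Box<Self> is fine, though, because a Box is a pointer with a fixed size. Other smart pointers and references are fine as well:

use std::{pin::Pin, rc::Rc, sync::Arc};

// all of these are dyn-compatible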
trait TraitMethods {
    fn by_ref(self: &Self) {}
    fn by_ref_mut(self: &mut Self) {}
    fn by_box(self: Box<Self>) {}
    fn by_rc(self: Rc<Self>) {}
    fn by_arc(self: Arc<Self>) {}
    fn by_pin(self: Pin<&Self>) {}
    fn nested_pin(self: Pin<Arc<Self>>) {}
}

Being dyn-incompatible does not make the trait unusable. impl AsyncRead (that is, generics) still works:

fn use_reader(_reader: impl AsyncRead) {}
// equivalent to
fn use_reader<R: AsyncRead>(_reader: R) {}

You just cannot use &dyn AsyncRead for now.

Associated types: what async fn really is

Writing async fn in a trait is syntactic sugar. This:

trait AsyncRead {
    async fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

is equivalent to:

trait AsyncRead {
    fn read(&mut self, buf: &mut [u8]) -> impl Future<Output = io::Result<usize>>;
}

And return-position impl Trait is in turn roughly equivalent to an associated type:

trait AsyncRead {
    type ReadFuture: Future<Output = io::Result<usize>>;
    fn read(&mut self, buf: &mut [u8]) -> Self::ReadFuture;
}

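This is why async fn in a trait is not dyn-compatible: it introduces an implicit associated type, and the concrete value of that associated type is unknown under dynamic dispatch.

On nightly Rust, #![feature(impl_trait_in_assoc_type)] lets you spell out that associated type explicitly: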

#![feature(impl_trait_in_assoc_type)]

use std::{future::Future, io};

trait AsyncRead {
    type ReadFuture: Future<Output = io::Result<usize>>;
    fn read(&mut self, buf: &mut [u8]) -> Self::ReadFuture;
}

impl AsyncRead for () {
    // unstable feature: the concrete type of ReadFuture is inferred from the body of read
    type ReadFuture = impl Future<Output = io::Result<usize>>;

    fn read(&mut self, _buf: &mut [u8]) -> Self::ReadFuture {
        async move {
            let a = [0u8; 72];
            tokio::time::sleep(std::time::Duration::from_secs(1)).await;
            Ok(a[3] as _)
        }
    }
}

Running on nightly:

<() as sansioex::AsyncRead>::read::{{closure}}   200 bytes

A classic case: tower's Service trait

This pattern will be familiar to anyone who has used tower (via hyper). tower's Service trait has a Future associated type:

pub trait Service<Request> {
    type Response;
    type Error;
    type Future: Future<Output = Result<Self::Response, Self::Error>>;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>>;
    fn call(&mut self, req: Request) -> Self::Future;
}

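When implementing this trait there are broadly three options:

1. Hand-write a Future and avoid heap allocation (the most complex)
2. Set type Future to Pin<Box<dyn Future<...>>> (simple, but it allocates)
3. Use nightly and enable #![feature(impl_trait_in_assoc_type)]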
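The first two options work on stable Rust. The sketch below uses a local Service trait that mirrors tower's shape rather than importing tower; poll_ready is left out for brevity, and AddOne, AddOneBoxed, AddOneFuture and block_on are hypothetical names introduced for illustration.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Local trait shaped like tower's Service (poll_ready omitted).
trait Service<Request> {
    type Response;
    type Error;
    type Future: Future<Output = Result<Self::Response, Self::Error>>;
    fn call(&mut self, req: Request) -> Self::Future;
}

/// Option 1: a hand-written, nameable future with no heap allocation.
struct AddOneFuture(i32);

impl Future for AddOneFuture {
    type Output = Result<i32, ()>;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        Poll::Ready(Ok(self.0 + 1))
    }
}

struct AddOne;

impl Service<i32> for AddOne {
    type Response = i32;
    type Error = ();
    type Future = AddOneFuture;
    fn call(&mut self, req: i32) -> Self::Future {
        AddOneFuture(req)
    }
}

/// Option 2: box an async block; simple, but it allocates on every call.
struct AddOneBoxed;

impl Service<i32> for AddOneBoxed {
    type Response = i32;
    type Error = ();
    type Future = Pin<Box<dyn Future<Output = Result<i32, ()>>>>;
    fn call(&mut self, req: i32) -> Self::Future {
        Box::pin(async move { Ok(req + 1) })
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for illustration only.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    println!("hand-written: {:?}", block_on(AddOne.call(41)));
    println!("boxed:        {:?}", block_on(AddOneBoxed.call(41)));
}
```

Option 1 keeps the future nameable and allocation-free but needs a manual poll implementation; option 2 trades an allocation for the convenience of an async block.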

Simplifying the Service trait with async fn

Now that Rust 1.75 has stabilized async fn in traits, we can imagine a leaner Service trait:

trait Service<Request> {
    type Response;
    type Error;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>>;
    async fn call(&mut self, request: Request) -> Result<Self::Response, Self::Error>;
}

Implementing a no-op service becomes very natural:

impl<Request> Service<Request> for () {
    type Response = ();
    type Error = ();

    fn poll_ready(&mut self, _cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        Poll::Ready(Ok(()))
    }

    async fn call(&mut self, _request: Request) -> Result<Self::Response, Self::Error> {
        Ok(())
    }
}

A logging middleware that prints each request is just as clear:

impl<S, Request> Service<Request> for LogRequest<S>
where
    S: Service<Request>,
    Request: std::fmt::Debug,
{
    type Response = S::Response;
    type Error = S::Error;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.inner.poll_ready(cx)
    }

    async fn call(&mut self, request: Request) -> Result<Self::Response, Self::Error> {
        println!("{:?}", request);
        self.inner.call(request).await
    }
}

None of these futures is boxed, and everything works on stable Rust 1.75:

#[tokio::main]
async fn main() {
    let mut service = LogRequest { inner: () };
    let fut = service.call(());
    print_type_name_and_size(&fut);
    fut.await.unwrap();
}

<sansioex::LogRequest<()> as sansioex::Service<()>>::call::{{closure}}   32 bytes
()

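This simplified trait does come with several important limitations, though. We'll go through them one by one.

Limitation 1: the return type cannot be named

The original tower Service trait lets you give the future a concrete type name (type Future = ...) and build further combinators on top of that name. Take the Either service:

// Either for the original tower Service (simplified)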
impl<A, B, Request> Service<Request> for Either<A, B>
where
    A: Service<Request>,
    B: Service<Request, Response = A::Response, Error = A::Error>,
{
    type Response = A::Response;
    type Error = A::Error;
    type Future = EitherResponseFuture<A::Future, B::Future>;

    fn call(&mut self, request: Request) -> Self::Future {
        match self {
            Either::Left(service) => EitherResponseFuture {
                kind: Kind::Left { inner: service.call(request) },
            },
            Either::Right(service) => EitherResponseFuture {
                kind: Kind::Right { inner: service.call(request) },
            },
        }
    }
}

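With our simplified Service trait there is no future type to name, yet Either turns out to be simpler to implement, because the match can live inside the async body:

// Either for the simplified Service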
impl<A, B, Request> Service<Request> for Either<A, B>
where
    A: Service<Request>,
    B: Service<Request, Response = A::Response, Error = A::Error>,
{
    type Response = A::Response;
    type Error = A::Error;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        match self {
            Either::Left(service) => service.poll_ready(cx),
            Either::Right(service) => service.poll_ready(cx),
        }
    }

    async fn call(&mut self, request: Request) -> Result<Self::Response, Self::Error> {
        match self {
            Either::Left(service) => service.call(request).await,
            Either::Right(service) => service.call(request).await,
        }
    }
}

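In this particular case, that counts in the simplified trait's favor.

Aside: Rust's lifetime mechanics

Before moving to the next limitation, a quick refresher on Rust lifetimes.

Rust needs to know which input a returned reference borrows from. Written out explicitly (lifetime elision would infer the same thing for this signature):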

fn substring<'s>(input: &'s str, start: usize, end: usize) -> &'s str {
    &input[start..end]
}

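This tells the compiler that the return value's lifetime is tied to input. With that information the compiler can reject use-after-free bugs like this one: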

fn main() {
    let s = String::from("Hello, world!");
    let t = substring(&s, 0, 5);
    drop(s);       // free s
    println!("{t}"); // uses t, but t borrows from s, so this is an error
}

error[E0505]: cannot move out of `s` because it is borrowed

If the function returns a value that does not borrow from its input (a String, say), using t after drop(s) is perfectly fine:

fn substring(input: &str, start: usize, end: usize) -> String {
    input[start..end].to_string()
}
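Put together as a complete program, the owned-return version compiles and runs. This is a small sketch that reuses the main function from the failing example above:

```rust
/// Returns an owned copy, so the result does not borrow from `input`.
fn substring(input: &str, start: usize, end: usize) -> String {
    input[start..end].to_string()
}

fn main() {
    let s = String::from("Hello, world!");
    let t = substring(&s, 0, 5);
    drop(s); // fine: `t` owns its data and no longer depends on `s`
    println!("{t}"); // prints "Hello"
}
```

The cost is an extra allocation and copy; the benefit is that the result is no longer tied to the lifetime of the input.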

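Limitation 2: implicit lifetime capture

async fn in traits has a trap that is easy to fall into: the future implicitly captures the reference to self.

With the original tower Service trait, the future returned by call cannot borrow self. Consider implementing Service<i32> for i32 so that the service adds its own value to the request: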

#![feature(impl_trait_in_assoc_type)]

impl Service<i32> for i32 {
    type Response = i32;
    type Error = ();
    type Future = impl Future<Output = Result<Self::Response, Self::Error>>;

    fn call(&mut self, request: i32) -> Self::Future {
        async move { Ok(*self + request) }  // fails to compile!
    }
}

The compiler reports:

error[E0700]: hidden type for `<i32 as Service<i32>>::Future` captures lifetime
              that does not appear in bounds

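The future captures &mut self, but the declaration of type Future has no lifetime parameter, so the two contradict each other.

The fix is to bring in GATs (generic associated types) so that the associated type can carry a lifetime: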

pub trait Service<Request> {
    type Response;
    type Error;
    type Future<'a>: Future<Output = Result<Self::Response, Self::Error>> + 'a
    where
        Self: 'a;

    fn call(&mut self, request: Request) -> Self::Future<'_>;
}

GATs were only stabilized in Rust 1.65, which is why tower's Service trait does not use them: they did not exist when the trait was designed.

With the GAT version of the trait, the future can borrow self:

impl Service<i32> for i32 {
    type Response = i32;
    type Error = ();
    type Future<'a> = impl Future<Output = Result<Self::Response, Self::Error>> + 'a;

    fn call(&mut self, request: i32) -> Self::Future<'_> {
        async move { Ok(*self + request) }
    }
}
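The GAT-based impl shown here still relies on nightly, because it uses impl Trait in an associated type. On stable, the same GAT trait can be implemented by naming a boxed future instead, at the cost of an allocation per call. A sketch, with hypothetical add_via_service and block_on helpers standing in for a real caller and runtime:

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

pub trait Service<Request> {
    type Response;
    type Error;
    type Future<'a>: Future<Output = Result<Self::Response, Self::Error>> + 'a
    where
        Self: 'a;

    fn call(&mut self, request: Request) -> Self::Future<'_>;
}

impl Service<i32> for i32 {
    type Response = i32;
    type Error = ();
    // A nameable type on stable: a boxed future that may borrow for 'a.
    type Future<'a> = Pin<Box<dyn Future<Output = Result<i32, ()>> + 'a>>;

    fn call(&mut self, request: i32) -> Self::Future<'_> {
        // The async block captures `self` by reference, which the GAT allows.
        Box::pin(async move { Ok(*self + request) })
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for illustration only.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Hypothetical caller: runs one request to completion.
fn add_via_service(service: &mut i32, request: i32) -> Result<i32, ()> {
    block_on(service.call(request))
}

fn main() {
    let mut service: i32 = 2024;
    println!("{:?}", add_via_service(&mut service, -34));
}
```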

The original tower trait, however, requires that the future not borrow self, so the only option is:

fn call(&mut self, request: i32) -> Self::Future {
    let this = *self;  // copy the value out up front
    async move { Ok(this + request) }
}

By contrast, the simplified Service trait with async fn borrows self naturally:

impl Service<i32> for i32 {
    type Response = i32;
    type Error = ();

    async fn call(&mut self, request: i32) -> Result<Self::Response, Self::Error> {
        Ok(*self + request)  // can borrow self directly
    }
}

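The price: because the future returned by call borrows self, you cannot have several requests in flight at once. Rust does not allow two mutable borrows at the same time: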

#[tokio::main]
async fn main() {
    let mut service: i32 = 2024;

    let fut1 = service.call(-34);  // first mutable borrow
    let fut2 = service.call(-25);  // second mutable borrow: compile error!

    let (response1, response2) = tokio::try_join!(fut1, fut2).unwrap();
}

error[E0499]: cannot borrow `service` as mutable more than once at a time
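What does still work with the async fn version is issuing requests one after another, since each future is finished (and its borrow released) before the next call starts. A tokio-free sketch, assuming Rust 1.75 or later; the trait omits poll_ready, and sequential and block_on are hypothetical helpers:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

trait Service<Request> {
    type Response;
    type Error;
    async fn call(&mut self, request: Request) -> Result<Self::Response, Self::Error>;
}

impl Service<i32> for i32 {
    type Response = i32;
    type Error = ();

    async fn call(&mut self, request: i32) -> Result<Self::Response, Self::Error> {
        Ok(*self + request) // borrows self for as long as the future lives
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for illustration only.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Runs two requests back to back; only one mutable borrow exists at a time.
fn sequential(service: &mut i32) -> (Result<i32, ()>, Result<i32, ()>) {
    let first = block_on(service.call(-34)); // borrow ends when the future completes
    let second = block_on(service.call(-25)); // a fresh borrow is fine now
    (first, second)
}

fn main() {
    let mut service: i32 = 2024;
    let (first, second) = sequential(&mut service);
    println!("Got responses: {first:?}, {second:?}");
}
```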

Limitation 3: no way to require a `'static` lifetime

Suppose we want the future to be 'static (that is, to hold no borrows). We can change the trait to return impl Future + 'static:

use std::future::Future;

pub trait Service<Request> {
    type Response;
    type Error;

    fn call(
        &mut self,
        request: Request,
    ) -> impl Future<Output = Result<Self::Response, Self::Error>> + 'static;
}

But the implementation written earlier with async fn call immediately fails:

error[E0477]: the type `impl Future<...>` does not fulfill the required lifetime
note: type must satisfy the static lifetime as required by this binding

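Why? With async fn syntax the future captures &mut self by default, and the lifetime of &mut self is not 'static.

To fix this we have to give up the async fn syntax, return impl Future + 'static by hand, and copy whatever we need out of self up front: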

impl Service<i32> for i32 {
    type Response = i32;
    type Error = ();

    fn call(
        &mut self,
        request: i32,
    ) -> impl Future<Output = Result<Self::Response, Self::Error>> + 'static {
        let this = *self;
        async move { Ok(this + request) }
    }
}

Now several futures can be in flight at the same time, which is also a prerequisite for spawning them onto tokio:

#[tokio::main]
async fn main() {
    let mut service: i32 = 2024;

    let fut1 = service.call(-34);
    let fut2 = service.call(-25);

    let (response1, response2) = tokio::try_join!(fut1, fut2).unwrap();
    println!("Got responses: {response1:?}, {response2:?}");
}

Got responses: 1990, 1999
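The 'static guarantee can also be checked at compile time without a runtime. The sketch below repeats the trait and impl so that it stands alone; assert_static, two_in_flight and block_on are hypothetical helpers, and the example assumes a compiler recent enough (Rust 1.75 or later) to accept two such futures alive at once, as the program above does.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

pub trait Service<Request> {
    type Response;
    type Error;

    fn call(
        &mut self,
        request: Request,
    ) -> impl Future<Output = Result<Self::Response, Self::Error>> + 'static;
}

impl Service<i32> for i32 {
    type Response = i32;
    type Error = ();

    fn call(
        &mut self,
        request: i32,
    ) -> impl Future<Output = Result<Self::Response, Self::Error>> + 'static {
        let this = *self; // copy out of self so the future holds no borrow
        async move { Ok(this + request) }
    }
}

/// Compiles only if `T` holds no non-'static borrows.
fn assert_static<T: 'static>(_value: &T) {}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for illustration only.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Creates both futures before running either of them.
fn two_in_flight(service: &mut i32) -> (Result<i32, ()>, Result<i32, ()>) {
    let fut1 = service.call(-34);
    let fut2 = service.call(-25); // accepted: fut1 does not borrow `service`
    assert_static(&fut1);
    assert_static(&fut2);
    (block_on(fut1), block_on(fut2))
}

fn main() {
    let mut service: i32 = 2024;
    let (response1, response2) = two_in_flight(&mut service);
    println!("Got responses: {response1:?}, {response2:?}");
}
```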

Limitation 4: the Send bound cannot be guaranteed at the trait level

With + 'static in place, can we call tokio::spawn directly?

let fut1 = tokio::spawn(service.call(-34));
let fut2 = tokio::spawn(service.call(-25));

Got responses: Ok(1990), Ok(1999)

Surprisingly, this compiles: the compiler can see the concrete type i32 and knows that its future really is Send.

But if the future holds a non-Send value:

fn call(&mut self, request: i32) -> impl Future<...> + 'static {
    let this = *self;
    let something_not_send = std::rc::Rc::new(());
    async move {
        let _woops = something_not_send;  // Rc is not Send
        Ok(this + request)
    }
}

compilation fails:

error: future cannot be sent between threads safely
  = help: `Rc<()>` is not `Send`

The bigger problem shows up when spawning through the Service trait in a generic function:

async fn do_the_spawning<S>(service: &mut S)
where
    S: Service<i32>,
{
    let fut1 = tokio::spawn(service.call(-34));  // compile error!
    let fut2 = tokio::spawn(service.call(-25));
    // ...
}

Even with Send bounds on Response and Error, the compiler still complains:

error[E0277]: `impl Future<...> + 'static` cannot be sent between threads safely
  = help: the trait `Send` is not implemented for `impl Future<...> + 'static`

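The reason: the trait definition never says the future must be Send, and on stable Rust the call site has no way to add that requirement.

The fix is to add the Send bound explicitly in the trait definition: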

pub trait Service<Request> {
    type Response;
    type Error;

    fn call(
        &mut self,
        request: Request,
    ) -> impl Future<Output = Result<Self::Response, Self::Error>> + Send + 'static;
}

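That has a cost of its own: the trait is now stricter, and implementations whose futures are not Send no longer satisfy it. The trait applies to fewer cases.

Compared with the original tower Service trait, this simplified version is clearly less flexible.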
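With Send + 'static declared in the trait, generic code can hand the futures to other threads. The sketch below uses std::thread::spawn in place of tokio::spawn so that it needs no external crate; spawn_calls and block_on are hypothetical helpers, and the trait is repeated so the example stands alone.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

pub trait Service<Request> {
    type Response;
    type Error;

    fn call(
        &mut self,
        request: Request,
    ) -> impl Future<Output = Result<Self::Response, Self::Error>> + Send + 'static;
}

impl Service<i32> for i32 {
    type Response = i32;
    type Error = ();

    fn call(
        &mut self,
        request: i32,
    ) -> impl Future<Output = Result<Self::Response, Self::Error>> + Send + 'static {
        let this = *self;
        async move { Ok(this + request) }
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for illustration only.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Generic over any service: relies only on what the trait promises.
fn spawn_calls<S>(service: &mut S) -> (Result<S::Response, S::Error>, Result<S::Response, S::Error>)
where
    S: Service<i32>,
    S::Response: Send + 'static,
    S::Error: Send + 'static,
{
    // Each future is Send + 'static by the trait's contract, so it may move
    // into a thread even though `S` itself is an unknown type here.
    let fut1 = service.call(-34);
    let handle1 = thread::spawn(move || block_on(fut1));
    let fut2 = service.call(-25);
    let handle2 = thread::spawn(move || block_on(fut2));
    (handle1.join().unwrap(), handle2.join().unwrap())
}

fn main() {
    let mut service: i32 = 2024;
    let (response1, response2) = spawn_calls(&mut service);
    println!("Got responses: {response1:?}, {response2:?}");
}
```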

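Where things stand

The points above can be collected into a comparison table:

Feature	`async-trait` macro	Rust 1.75 native `async fn`	Returning `impl Future`
Requires heap allocation	Yes	No	No
dyn-compatible	Yes	No	No
Can borrow self	Yes	Yes	Needs manual handling
Can require `'static`	Yes	Not easily	Yes
Can require `Send`	Yes (the default; opt out with `?Send`)	Not easily	Yes
Concurrent requests	No with `&mut self` (the boxed future borrows self)	No	Yes
Available on stable	Yes	Yes	Yes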

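Solutions in progress

The async Rust WG is working to fill these gaps. Two crates are worth watching:

trait-variant: lets you declare both the Send and the non-Send version of a trait in one place, cutting duplication: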

#[trait_variant::make(HttpService: Send)]
pub trait LocalHttpService {
    async fn fetch(&self, url: String) -> String;
}

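This generates two traits automatically, LocalHttpService (no Send bound) and HttpService (with a Send bound), so you don't have to write both by hand.

dynosaur: enables dynamic dispatch for traits that contain async fn, filling the current dyn-compatibility gap.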
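The underlying idea can be sketched by hand: keep the native async fn trait for static dispatch and add a dyn-compatible companion trait that boxes the future. The code below is not dynosaur's API or its generated code; DynAsyncRead, read_boxed, Fill, read_all and block_on are hypothetical names chosen for this illustration.

```rust
use std::future::Future;
use std::io;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

trait AsyncRead {
    async fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

/// Hypothetical dyn-compatible companion: same operation, boxed future.
trait DynAsyncRead {
    fn read_boxed<'a>(
        &'a mut self,
        buf: &'a mut [u8],
    ) -> Pin<Box<dyn Future<Output = io::Result<usize>> + 'a>>;
}

// Every AsyncRead gets the boxed form for free.
impl<T: AsyncRead> DynAsyncRead for T {
    fn read_boxed<'a>(
        &'a mut self,
        buf: &'a mut [u8],
    ) -> Pin<Box<dyn Future<Output = io::Result<usize>> + 'a>> {
        Box::pin(self.read(buf))
    }
}

/// Hypothetical reader that fills the buffer with one byte value.
struct Fill(u8);

impl AsyncRead for Fill {
    async fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        buf.fill(self.0);
        Ok(buf.len())
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for illustration only.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Reads from each trait object in turn and sums the byte counts.
fn read_all(readers: &mut [Box<dyn DynAsyncRead>], buf: &mut [u8]) -> io::Result<usize> {
    let mut total = 0;
    for reader in readers.iter_mut() {
        total += block_on(reader.read_boxed(buf))?;
    }
    Ok(total)
}

fn main() {
    let mut readers: Vec<Box<dyn DynAsyncRead>> = vec![Box::new(Fill(0)), Box::new(Fill(7))];
    let mut buf = [0u8; 4];
    let total = read_all(&mut readers, &mut buf).unwrap();
    println!("read {total} bytes, buffer is now {buf:?}");
}
```

Callers that know the concrete type keep the unboxed future from AsyncRead; only callers that go through dyn pay for the allocation.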

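The end state the author looks forward to most is dyn async traits: Box<dyn Trait> natively supporting traits that contain async fn, with no macros or wrapper layers at all. The feature is still being designed and implemented, but the direction is clear.

Closing thoughts

The story of async traits in Rust is a history of solving hard problems step by step.

From the async-trait era of "heap allocation in exchange for convenience", to Rust 1.75's "no allocation, but new constraints", to the dyn async traits work now under way: each step moves in the right direction, and each step also brings new trade-offs and new limitations to the surface.

Putting async fn into traits looks like a syntax question, but underneath it involves future sizes, lifetime capture, vtable design for dynamic dispatch, and the propagation of Send bounds. These problems are intertwined, and none can be solved in isolation.

That is why Rust chose to solve them one at a time rather than wait until everything was solved and ship it all at once.

Original post: Catching up with async Rust, fasterthanli.me. Companion video: YouTube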
