rust-unofficial.github.io/too-many-li…

Overview

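This chapter introduces some basic Rust syntax by building a stack on top of a linked list.
First, we create a library with cargo new --lib lists. Then we add a new module to it in a file named first.rs, and add the following line to lib.rs: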

pub mod first;

The source tree for this chapter looks like this:

├── src
│   ├── first.rs
│   ├── lib.rs

Layout

First, we need to define a linked list. Here is a definition in functional style:

List a = Empty | Elem a (List a)

This reads: a list is either empty, or an element followed by a list. It is a recursive definition and a sum type. In Rust, sum types are expressed with enum, so we can naturally write the following:

pub enum List {
    Empty,
    Elem(i32, List),
}

Compiling this with cargo build produces the following error:

> cargo build

error[E0072]: recursive type `first::List` has infinite size
 --> src/first.rs:4:1
  |
4 | pub enum List {
  | ^^^^^^^^^^^^^ recursive type has infinite size
5 |     Empty,
6 |     Elem(i32, List),
  |               ---- recursive without indirection
  |
  = help: insert indirection (e.g., a `Box`, `Rc`, or `&`) at some point to make `first::List` representable

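The error says that the size of List cannot be determined at compile time: List contains itself by value, so its size would be infinite. Rust must know the size of a type at compile time in order to store it by value. We therefore need to insert some indirection, and the smart pointer std::boxed::Box does the job (see the Box documentation). The following example uses the names Cons and Nil instead of Elem and Empty, and is generic over T: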

#[derive(Debug)]
enum List<T> {
    Cons(T, Box<List<T>>),
    Nil,
}

fn main() {
    let list: List<i32> = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
    println!("{:?}", list);
}

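This main function prints Cons(1, Cons(2, Nil)). The Debug implementation of Box simply forwards to its contents, so the boxes do not show up in the output.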

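However, this definition still has some problems. Consider a list with two elements:

[] = Stack (allocated on the stack)
() = Heap (allocated on the heap)
// This is how our list is laid out in memory
[Elem A, ptr] -> (Elem B, ptr) -> (Empty, *junk*)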

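There are two problems here:

We allocate a node that is empty.
Not all of our nodes are allocated on the heap.

Let's look at the first problem. Consider this layout instead: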

[ptr] -> (Elem A, ptr) -> (Elem B, *null*)

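In this layout every node lives on the heap, and the trailing empty node is gone. To see why this matters, we need to look at how an enum is laid out in memory.
In general, suppose we have an enum: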

enum Foo {
    D1(T1),
    D2(T2),
    ...
    Dn(Tn),
}

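Foo needs to store an integer that says which variant (D1, D2, .. Dn) it currently holds. This is the tag of the enum. On top of that it needs enough space for its largest variant, plus any padding required for alignment.
So even though Empty carries only a single bit of information, it still takes up as much space as an Elem, because it has to be ready to become an Elem at any time.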
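The tag overhead can be observed with std::mem::size_of. This is only a minimal sketch: the enum Foo with payloads u8 and u64 and the helper foo_sizes are hypothetical names chosen for illustration, and the exact size of Foo is left to the compiler.

```rust
use std::mem::size_of;

// Hypothetical enum for illustration: two payloads of different sizes,
// neither of which has a spare bit pattern to hide the tag in.
#[allow(dead_code)]
enum Foo {
    D1(u8),
    D2(u64),
}

/// Returns (size of Foo, size of its largest payload), in bytes.
fn foo_sizes() -> (usize, usize) {
    (size_of::<Foo>(), size_of::<u64>())
}

fn main() {
    let (whole, largest) = foo_sizes();
    // The enum has to hold the largest payload plus a tag (and padding),
    // no matter which variant is currently stored.
    println!("Foo: {} bytes, largest payload: {} bytes", whole, largest);
}
```

Every Foo value occupies the same number of bytes, whether it currently holds the small D1 or the large D2.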
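As for the second problem, it does not matter when pushing or popping elements, but it does matter when splitting and merging lists. Consider splitting a list in each of the two layouts: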

layout 1:

[Elem A, ptr] -> (Elem B, ptr) -> (Elem C, ptr) -> (Empty *junk*)

split off C:

[Elem A, ptr] -> (Elem B, ptr) -> (Empty *junk*)
[Elem C, ptr] -> (Empty *junk*)

layout 2:

[ptr] -> (Elem A, ptr) -> (Elem B, ptr) -> (Elem C, *null*)

split off C:

[ptr] -> (Elem A, ptr) -> (Elem B, *null*)
[ptr] -> (Elem C, *null*)

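Splitting in layout 2 only requires copying B's pointer to the stack and nulling out the old value. Layout 1 does the same thing but also has to copy C from the heap to the stack. Merging is the same process in reverse. One important property of a linked list is that you can insert or remove an element without moving the elements themselves. Layout 1 clearly breaks that property. So let's rewrite the type to follow layout 2: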

pub enum List {
    Empty,
    ElemThenEmpty(i32),
    ElemThenNotEmpty(i32, Box<List>),
}

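This seems to solve the problem of allocating an empty node. In Rust, however, it actually wastes more space, because the previous layout could take advantage of the null pointer optimization.
Let's see how that optimization works: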

enum Foo {
    A,
    B(ContainsANonNullPtr),
}

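The null pointer optimization eliminates the space needed for the tag. If the variant is A, the whole enum is set to all zero bits. Otherwise the variant is B. This works because B can never be all zeros: it contains a non-null pointer.
Because of this, &, &mut, Box, Rc, Arc, Vec and several other types can be put in an Option with no overhead.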
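A quick way to see this is to compare sizes with std::mem::size_of. The sketch below only checks Box and &, for which the standard library documents this layout. The helper option_is_free is a hypothetical name introduced for illustration.

```rust
use std::mem::size_of;

/// Hypothetical helper: true if wrapping T in Option costs no extra space.
fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

fn main() {
    // Box and & are never null, so None can be encoded as the null pointer.
    println!("Option<Box<i32>> is free: {}", option_is_free::<Box<i32>>());
    println!("Option<&i32> is free: {}", option_is_free::<&i32>());
    // Every bit pattern of u64 is a valid value, so Option<u64> needs a tag.
    println!("Option<u64> is free: {}", option_is_free::<u64>());
}
```

The same reasoning applies to an enum with one empty variant and one variant holding a Box, which is the shape our list is heading towards.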
So how do we avoid the extra junk, allocate uniformly, and still benefit from the null pointer optimization? While enums let us declare a type that can contain one of several values, structs let us declare a type that contains many values at once. Let's split our list into two types: a List and a Node.

struct Node {
    elem: i32,
    next: List,
}

pub enum List {
    Empty,
    More(Box<Node>),
}

There is still one problem. The enum is public, so its variants are public too, which means Node would have to be public as well. We don't want to expose Node, so we wrap the list in one more struct:

pub struct List {
    head: Link,
}

enum Link {
    Empty,
    More(Box<Node>),
}

struct Node {
    elem: i32,
    next: Link,
}

With that, the layout of our list is essentially done.

New

In Rust, methods for a type are declared in an impl block:

impl List {
    // TODO, make code happen
}

A function is declared like this:

fn foo(arg1: Type1, arg2: Type2) -> ReturnType {
    // body
}

The first thing we want is a function that constructs a list:

impl List {
    pub fn new() -> Self {
        List { head: Link::Empty }
    }
}

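A few things to note:

Self is an alias for the type that follows impl.
We create an instance of a struct in much the same way we declare it, except that we provide values for the fields instead of their types.
We refer to the variants of an enum with ::.
The last expression of a function is implicitly returned. You can still use return to return early.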
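The points above can be seen together in a small standalone sketch. The types Light and Lamp and the function switched_on are hypothetical, made up for this illustration, and are not part of the list code.

```rust
// Hypothetical types used only for this illustration.
#[derive(Debug, PartialEq)]
enum Light {
    Off,
    On,
}

#[derive(Debug, PartialEq)]
struct Lamp {
    state: Light,
}

impl Lamp {
    // `Self` is an alias for `Lamp`, the type after `impl`.
    fn new() -> Self {
        // Struct literal: field names paired with values.
        // The enum variant is reached through `::`.
        Lamp { state: Light::Off } // last expression, implicitly returned
    }

    fn switched_on() -> Self {
        // An explicit `return` works too.
        return Lamp { state: Light::On };
    }
}

fn main() {
    println!("{:?} / {:?}", Lamp::new(), Lamp::switched_on());
}
```

Both constructors behave the same way. The implicit form without return is the usual style for the final expression of a function.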

Ownership 101

Methods in Rust are a special kind of function: they take self as their first parameter.

fn foo(self, arg2: Type2) -> ReturnType {
    // body
}

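self comes in three main forms: self, &mut self, and &self. They correspond to the three main forms of ownership in Rust:

self - value
&mut self - mutable reference
&self - shared (immutable) reference

A value represents true ownership. You can do whatever you want with a value: move it, destroy it, mutate it, or lend it out through a reference. When you pass something by value, it moves to a new location, and the old location can no longer be used to access it.
A mutable reference represents temporary exclusive access to a value that you do not own. You can do anything you like with the value through a mutable reference, as long as you do not take ownership of it away. A particularly useful example is swapping the value for another one.
A shared reference represents temporary shared access to a value. Because the access is shared, mutation is not allowed.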
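The three forms can be compared side by side in a short sketch. Counter and its methods are hypothetical names used only for illustration.

```rust
// Hypothetical type used only to illustrate the three forms of `self`.
struct Counter {
    count: u32,
}

impl Counter {
    // &self: shared borrow, read-only access.
    fn get(&self) -> u32 {
        self.count
    }

    // &mut self: exclusive borrow, may mutate the value in place.
    fn increment(&mut self) {
        self.count += 1;
    }

    // self: takes ownership, so the caller cannot use the Counter afterwards.
    fn into_inner(self) -> u32 {
        self.count
    }
}

fn run() -> (u32, u32) {
    let mut c = Counter { count: 0 };
    c.increment();
    c.increment();
    let seen = c.get();
    let last = c.into_inner();
    // c.get(); // would not compile: `c` was moved by `into_inner`
    (seen, last)
}

fn main() {
    println!("{:?}", run());
}
```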

Push

Let's push a value onto the list. Since push mutates the list, it takes &mut self.

impl List {
    pub fn push(&mut self, elem: i32) {
        // TODO
    }
}

First, we need to create a new node:

    pub fn push(&mut self, elem: i32) {
        let new_node = Node {
            elem: elem,
            next: ?????
        };
    }

What should next be? It should be the entire old list:

impl List {
    pub fn push(&mut self, elem: i32) {
        let new_node = Node {
            elem: elem,
            next: self.head,
        };
    }
}

> cargo build
error[E0507]: cannot move out of borrowed content
  --> src/first.rs:19:19
   |
19 |             next: self.head,
   |                   ^^^^^^^^^ cannot move out of borrowed content

Rust tells us what went wrong. We are trying to move self.head into next, but Rust does not want us to do that. This would leave self only partially initialized when we end the borrow and "give it back" to its rightful owner. That would be rude, and Rust is very polite, so it does not allow it. What if we put something back?

pub fn push(&mut self, elem: i32) {
    let new_node = Box::new(Node {
        elem: elem,
        next: self.head,
    });

    self.head = Link::More(new_node);
}

> cargo build
error[E0507]: cannot move out of borrowed content
  --> src/first.rs:19:19
   |
19 |             next: self.head,
   |                   ^^^^^^^^^ cannot move out of borrowed content

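Clearly this does not help. We need some other way to get at the head of the list. Indy suggests std::mem::replace. This function lets us steal a value out of a borrow by replacing it with another value (from the documentation: "Moves src into the referenced dest, returning the previous dest value."). To call it as mem::replace, add use std::mem; at the top of first.rs.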

pub fn push(&mut self, elem: i32) {
    let new_node = Box::new(Node {
        elem: elem,
        next: mem::replace(&mut self.head, Link::Empty),
    });

    self.head = Link::More(new_node);
}

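Here we temporarily replace self.head with Link::Empty, and then replace it again with the new head of the list.
And that's it! Well, honestly, we should probably test it. Right now the simplest way to do that is to write pop and make sure it produces the right results.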
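To see what mem::replace does in isolation, here is a minimal sketch that uses a String instead of our Link type. The helper take_name is a hypothetical function written for this illustration.

```rust
use std::mem;

// Hypothetical helper: moves the String out from behind a mutable
// reference and leaves an empty String in its place.
fn take_name(slot: &mut String) -> String {
    mem::replace(slot, String::new())
}

fn main() {
    let mut name = String::from("head");
    let old = take_name(&mut name);
    // `old` now owns the original text, and `name` is still a valid String.
    println!("old = {:?}, name = {:?}", old, name);
}
```

The borrowed location is never left uninitialized, which is exactly why the compiler accepts this where a plain move out of self.head was rejected.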

Pop

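Like push, pop wants to mutate the list. Unlike push, it needs to return something. pop also has to handle a corner case: what if the list is empty? To represent a result that may be absent, we use Option: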

pub fn pop(&mut self) -> Option<i32> {
    // TODO
}

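Option<T> is an enum that represents a value that may or may not exist. It is either Some(T) or None. The T in Option<T> is a generic type parameter, which means you can make an Option for any type.
So how do we tell whether the list is Empty or More? With pattern matching: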

pub fn pop(&mut self) -> Option<i32> {
    match self.head {
        Link::Empty => {
            // TODO
        }
        Link::More(node) => {
            // TODO
        }
    };
}

> cargo build

error[E0308]: mismatched types
  --> src/first.rs:27:30
   |
27 |     pub fn pop(&mut self) -> Option<i32> {
   |            ---               ^^^^^^^^^^^ expected enum `std::option::Option`, found ()
   |            |
   |            this function's body doesn't return
   |
   = note: expected type `std::option::Option<i32>`
              found type `()`

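pop has to return a value, and we are not returning one yet. We could return None, but in this case it is better to use unimplemented!() to indicate that the function is not finished. unimplemented!() is a macro that panics when the program reaches it.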

pub fn pop(&mut self) -> Option<i32> {
    match self.head {
        Link::Empty => {
            // TODO
        }
        Link::More(node) => {
            // TODO
        }
    };
    unimplemented!()
}

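An unconditional panic is an example of a diverging function. Diverging functions never return to the caller, so they can be used wherever a value of any type is expected.
Note that we did not need to write return in our program. The last expression in a function (without a trailing semicolon) is implicitly its return value. We can still return early with return, as in other C-like languages.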

> cargo build

error[E0507]: cannot move out of borrowed content
  --> src/first.rs:28:15
   |
28 |         match self.head {
   |               ^^^^^^^^^
   |               |
   |               cannot move out of borrowed content
   |               help: consider borrowing here: `&self.head`
...
32 |             Link::More(node) => {
   |                        ---- data moved here
   |
note: move occurs because `node` has type `std::boxed::Box<first::Node>`, which does not implement the `Copy` trait
  --> src/first.rs:32:24
   |
32 |             Link::More(node) => {
   |                        ^^^^

By default, a pattern match tries to move its contents into the new branch, but we cannot do that here because we do not own the value.

help: consider borrowing here: `&self.head`

Rust suggests adding a reference to the match. Let's try it:

pub fn pop(&mut self) -> Option<i32> {
    match &self.head {
        Link::Empty => {
            // TODO
        }
        Link::More(node) => {
            // TODO
        }
    };
    unimplemented!()
}

> cargo build

warning: unused variable: `node`
  --> src/first.rs:32:24
   |
32 |             Link::More(node) => {
   |                        ^^^^ help: consider prefixing with an underscore: `_node`
   |
   = note: #[warn(unused_variables)] on by default

warning: field is never used: `elem`
  --> src/first.rs:13:5
   |
13 |     elem: i32,
   |     ^^^^^^^^^
   |
   = note: #[warn(dead_code)] on by default

warning: field is never used: `next`
  --> src/first.rs:14:5
   |
14 |     next: Link,
   |     ^^^^^^^^^^

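It compiles again! We want to return an Option, so let's create a variable for the result. In the Empty case we return None. In the More case we return Some(i32) and change the head of the list. Let's write that and compile again: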

pub fn pop(&mut self) -> Option<i32> {
    let result;
    match &self.head {
        Link::Empty => {
            result = None;
        }
        Link::More(node) => {
            result = Some(node.elem);
            self.head = node.next;
        }
    };
    result
}

   Compiling lists v0.1.0 (/Users/ABeingessner/dev/temp/lists)
error[E0507]: cannot move out of borrowed content
  --> src/first.rs:35:29
   |
35 |                 self.head = node.next;
   |                             ^^^^^^^^^ cannot move out of borrowed content

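We are trying to move out of node when all we have is a shared reference to it.
We should probably step back and think about what we are trying to do. We want to: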

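Check whether the list is empty.
If it is empty, return None.
If it is not empty:
- remove the head of the list
- remove its elem
- replace the list's head with its next
- return Some(elem)

The key point is that we need the head of the list to point at next, but node is a reference, and we cannot move a value out from behind a reference into another variable. So the only option is to use replace again: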

pub fn pop(&mut self) -> Option<i32> {
    let result;
    match mem::replace(&mut self.head, Link::Empty) {
        Link::Empty => {
            result = None;
        }
        Link::More(node) => {
            result = Some(node.elem);
            self.head = node.next;
        }
    };
    result
}

> cargo build

   Finished dev [unoptimized + debuginfo] target(s) in 0.22s

Of course, we can write this more concisely:

pub fn pop(&mut self) -> Option<i32> {
    match mem::replace(&mut self.head, Link::Empty) {
        Link::Empty => None,
        Link::More(node) => {
            self.head = node.next;
            Some(node.elem)
        }
    }
}

Testing

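Now we need to test our stack. Rust and cargo support testing out of the box: we just write an ordinary function and annotate it with #[test].
Generally, we put the test code after the code it tests. We normally create a new namespace for the tests to avoid conflicts with the rest of the code. We can use mod to create a new module inline within a file: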

// in first.rs

mod test {
    #[test]
    fn basics() {
        // TODO
    }
}

We run it with cargo test:

> cargo test
   Compiling lists v0.1.0 (/Users/ABeingessner/dev/temp/lists)
    Finished dev [unoptimized + debuginfo] target(s) in 1.00s
     Running /Users/ABeingessner/dev/lists/target/debug/deps/lists-86544f1d97438f1f

running 1 test
test first::test::basics ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Now let's write the test case. We use the assert_eq! macro, which compares two values and panics if they are not equal.

#[cfg(test)] // compile this module only when running tests
mod test {
    use super::List;
    #[test]
    fn basics() {
        let mut list = List::new();

        // Check empty list behaves right
        assert_eq!(list.pop(), None);

        // Populate list
        list.push(1);
        list.push(2);
        list.push(3);

        // Check normal removal
        assert_eq!(list.pop(), Some(3));
        assert_eq!(list.pop(), Some(2));

        // Push some more just to make sure nothing's corrupted
        list.push(4);
        list.push(5);

        // Check normal removal
        assert_eq!(list.pop(), Some(5));
        assert_eq!(list.pop(), Some(4));

        // Check exhaustion
        assert_eq!(list.pop(), Some(1));
        assert_eq!(list.pop(), None);
    }
}

Drop

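We have built a working stack and tested that it behaves correctly.
Do we need to worry about cleaning up our list? Rust cleans up resources automatically using destructors. A type has a destructor if it implements the Drop trait. Traits are Rust's version of interfaces. Here is the Drop trait: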

pub trait Drop {
    fn drop(&mut self);
}

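You do not actually need to implement Drop yourself if your type contains types that already implement Drop and all you want is for their destructors to be called.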
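A small sketch can make this concrete. The types Noisy and Wrapper and the function count_drops are hypothetical names invented for this illustration. Noisy counts how many times its destructor runs, and Wrapper has no Drop implementation of its own.

```rust
use std::cell::Cell;

// Hypothetical type: bumps a shared counter when it is dropped.
struct Noisy<'a> {
    drops: &'a Cell<u32>,
}

impl<'a> Drop for Noisy<'a> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// No Drop impl here: the compiler drops each field for us.
struct Wrapper<'a> {
    _first: Noisy<'a>,
    _second: Noisy<'a>,
}

fn count_drops() -> u32 {
    let drops = Cell::new(0);
    {
        let _w = Wrapper {
            _first: Noisy { drops: &drops },
            _second: Noisy { drops: &drops },
        };
        // `_w` goes out of scope here, which runs both field destructors.
    }
    drops.get()
}

fn main() {
    println!("destructors run: {}", count_drops());
}
```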
Let's consider this simple list:

list -> A -> B -> C

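When list is dropped, it tries to drop A, which tries to drop B, which tries to drop C. This is clearly recursive code, and recursive code can overflow the stack.
Some of you might be thinking "this is clearly tail recursive, and any decent language would ensure that such code wouldn't blow the stack". This is, in fact, incorrect! To see why, let's write out what the compiler has to do, by manually implementing Drop for our List the way the compiler would: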

impl Drop for List {
    fn drop(&mut self) {
        // NOTE: you can't actually explicitly call `drop` in real Rust code;
        // we're pretending to be the compiler!
        self.head.drop(); // tail recursive - good!
    }
}

impl Drop for Link {
    fn drop(&mut self) {
        match *self {
            Link::Empty => {} // Done!
            Link::More(ref mut boxed_node) => {
                boxed_node.drop(); // tail recursive - good!
            }
        }
    }
}

impl Drop for Box<Node> {
    fn drop(&mut self) {
        self.ptr.drop(); // uh oh, not tail recursive!
        deallocate(self.ptr);
    }
}

impl Drop for Node {
    fn drop(&mut self) {
        self.next.drop();
    }
}

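We cannot drop the contents of the Box after deallocating it, so there is no way to drop in a tail-recursive manner! Instead, we have to manually write an iterative drop for List that hoists nodes out of their boxes.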

impl Drop for List {
    fn drop(&mut self) {
        let mut cur_link = mem::replace(&mut self.head, Link::Empty);
        // `while let` == "do this thing until this pattern doesn't match"
        while let Link::More(mut boxed_node) = cur_link {
            cur_link = mem::replace(&mut boxed_node.next, Link::Empty);
            // boxed_node goes out of scope and gets dropped here;
            // but its Node's `next` field has been set to Link::Empty
            // so no unbounded recursion occurs.
        }
    }
}

> cargo test

     Running target/debug/lists-5c71138492ad4b4a

running 1 test
test first::test::basics ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured
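The iterative Drop can be exercised on a long list. This is a minimal sketch that repeats the list definition so it stands alone. The length of 100,000 elements and the helper build_and_drop are illustrative assumptions, not part of the original chapter.

```rust
use std::mem;

pub struct List {
    head: Link,
}

enum Link {
    Empty,
    More(Box<Node>),
}

#[allow(dead_code)] // `elem` is stored but never read in this sketch
struct Node {
    elem: i32,
    next: Link,
}

impl List {
    pub fn new() -> Self {
        List { head: Link::Empty }
    }

    pub fn push(&mut self, elem: i32) {
        let new_node = Box::new(Node {
            elem,
            next: mem::replace(&mut self.head, Link::Empty),
        });
        self.head = Link::More(new_node);
    }
}

impl Drop for List {
    fn drop(&mut self) {
        let mut cur_link = mem::replace(&mut self.head, Link::Empty);
        while let Link::More(mut boxed_node) = cur_link {
            // Detach the rest of the list before this node is freed,
            // so freeing one node never recurses into the next one.
            cur_link = mem::replace(&mut boxed_node.next, Link::Empty);
        }
    }
}

// Hypothetical helper: builds a list of `len` elements, drops it,
// and returns how many elements were pushed.
fn build_and_drop(len: i32) -> i32 {
    let mut list = List::new();
    for i in 0..len {
        list.push(i);
    }
    drop(list); // runs the iterative destructor above
    len
}

fn main() {
    println!("dropped a list of {} nodes", build_and_drop(100_000));
}
```

With the loop in place, the stack depth used while dropping stays constant regardless of the list length. Without the manual Drop, the compiler-generated destructor would recurse once per node, which may overflow the stack for a long enough list.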

Final Code

pub struct List {
    head: Link,
}

enum Link {
    Empty,
    More(Box<Node>),
}

struct Node {
    elem: i32,
    next: Link,
}

impl List {
    pub fn new() -> Self {
        List {head: Link::Empty}
    }
    
    pub fn push(&mut self, elem: i32) {
        let new_node = Node {
            elem,
            next: std::mem::replace(&mut self.head, Link::Empty)
        };
        self.head = Link::More(Box::new(new_node));
    }

    pub fn pop(&mut self) -> Option<i32> {
        let result;
        match std::mem::replace(&mut self.head, Link::Empty) {
            Link::Empty => {
                result = None;
            }
            Link::More(node) => {
                result = Some(node.elem);
                self.head = node.next;
            }
        };
        result
    }
}

impl Drop for List {
    fn drop(&mut self) {
        let mut cur_link = std::mem::replace(&mut self.head, Link::Empty);

        while let Link::More(mut boxed_node) = cur_link {
            cur_link = std::mem::replace(&mut boxed_node.next, Link::Empty);
        }
    }
}

Summary

Enum optimization. If an enum has one variant without data and another variant that contains a non-null pointer, the null pointer optimization eliminates the space needed for the tag.
std::mem::replace. This function lets us steal a value out of a borrow by replacing it with another value, so the borrowed location is never left uninitialized.
Recursive drop. Take a simple list like: list → A → B → C. When list gets dropped, it will try to drop A, which will try to drop B, which will try to drop C. This is recursive code, and it can blow the stack on a long list. Thus, we implement the Drop trait manually and drop the nodes in a loop.

Learn Rust With Entirely Too Many Linked Lists (translation): Chapter 1