8-25. [Memory Management] What is the underlying principle of Copy-on-Write (COW)?

In Swift, Copy-on-Write (COW) is a performance optimization used by value types that keep their contents in heap storage, such as Array, Dictionary, and Set. Copies of such a value share one memory block for as long as they are only read; the expensive copy is performed only when one of them is actually written to.

Here is the underlying operational mechanism of COW:


1. Core Structure: Value Types Wrapping Reference Types

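Although Array is a struct (a value type), it internally holds a reference to a heap-allocated buffer. For native Swift arrays this is an instance of an internal class such as _ContiguousArrayStorage, which is an implementation detail and may change between releases.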

  • Initial Assignment: When you execute let b = a, Swift copies only the struct itself. Since the struct contains just a pointer to the buffer, b and a now point to the same storage address on the heap.
  • Reference Counting: At this stage, the strong reference count of the heap buffer is conceptually 2 (the optimizer may remove retain/release pairs it can prove are redundant).
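
The two points above can be observed directly. The following is a minimal sketch, not a description of the standard library's internals: bufferAddress is a hypothetical helper written for this example, and the pointer-sized layout of Array is an observation about current 64-bit toolchains rather than a documented guarantee.

```swift
// Hypothetical helper: returns the address of the array's element buffer
// without mutating the array (so it cannot trigger a copy by itself).
func bufferAddress(_ array: [Int]) -> UnsafeRawPointer? {
    array.withUnsafeBufferPointer { UnsafeRawPointer($0.baseAddress) }
}

let a = Array(0..<10_000)
let b = a  // copies only the struct, not the 10,000 elements

// The struct itself is one pointer wide, however many elements it holds.
let inlineSize = MemoryLayout<[Int]>.size
let pointerSize = MemoryLayout<UnsafeRawPointer>.size

// Both values refer to the same heap buffer.
let sharesBuffer = bufferAddress(a) == bufferAddress(b)

print(inlineSize == pointerSize, sharesBuffer)
```

The address is used only for comparison here; it should not be dereferenced outside the closure that produced it.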

2. Detection Mechanism: isKnownUniquelyReferenced

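This check is the "brain" of COW. Before performing any modification (like append or a subscript assignment), the collection's own implementation checks whether its buffer is uniquely referenced. The standard library exposes the same kind of check publicly as the function isKnownUniquelyReferenced.

  • Mechanism: It reports whether the heap object is known to have exactly one strong reference.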

  • Logical Branching:

    • Count is 1: The buffer is held only by the current variable. Swift mutates it in place, with no allocation and no copying.
    • Count is > 1: Other variables share this buffer. To avoid changing their values, Swift allocates a new heap buffer, copies the elements into it, and then applies the modification to the new buffer. This copies the buffer, not everything reachable from it: if the elements are class references, the references are copied and the objects they point to stay shared.
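
A short sketch of both branches, assuming nothing beyond the public isKnownUniquelyReferenced function. Buffer and Node are hypothetical classes made up for the example; withExtendedLifetime is used only to make sure the second reference is still alive when the check runs.

```swift
final class Buffer {            // hypothetical stand-in for a storage class
    var values: [Int] = []
}

var owner = Buffer()
// Only `owner` refers to the object.
let uniqueAtFirst = isKnownUniquelyReferenced(&owner)

// A second strong reference to the same object.
let other = owner
let uniqueWhileShared = withExtendedLifetime(other) {
    isKnownUniquelyReferenced(&owner)
}

// The copy made on write duplicates the buffer, not the referenced objects.
final class Node {
    var tag: String
    init(tag: String) { self.tag = tag }
}

let original = [Node(tag: "x")]
var copy = original
copy.append(Node(tag: "y"))     // shared buffer, so this write copies it
let elementStillShared = original[0] === copy[0]

print(uniqueAtFirst, uniqueWhileShared, elementStillShared)
```

The last line of the example is why "deep copy" is a misleading label for this step: after the write, both arrays still refer to the same Node object at index 0.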

3. The COW Lifecycle Flow

Suppose we have an array A:

  1. Sharing Phase: Execute B = A. A and B share memory. Reference Count = 2.
  2. Triggering Modification: Execute B.append(1).
  3. Underlying Decision: B checks and finds Reference Count > 1.
  4. Executing Copy: B allocates new memory, copies the shared contents into it, and repoints its own reference to the new buffer. Now A points to the old memory (count returns to 1), and B points to the new memory (count is 1).
  5. Subsequent Optimization: If B.append(2) is executed next, B's reference count is already 1, so the write happens in place with no COW copy. A new allocation occurs only if the buffer has run out of capacity and needs to grow.
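
The five steps can be traced roughly as follows. This is a sketch under assumptions: bufferAddress is a hypothetical helper defined for the example, lowercase a and b stand in for A and B, and the reserveCapacity call is added only so that capacity growth cannot be mistaken for a COW copy in step 5.

```swift
// Hypothetical helper: address of the element buffer, read without mutation.
func bufferAddress(_ array: [Int]) -> UnsafeRawPointer? {
    array.withUnsafeBufferPointer { UnsafeRawPointer($0.baseAddress) }
}

let a = [10, 20, 30]
var b = a                                   // 1. sharing phase
let sharedAfterAssignment = bufferAddress(a) == bufferAddress(b)

b.append(1)                                 // 2-4. the write triggers the copy
let separatedAfterWrite = bufferAddress(a) != bufferAddress(b)

b.reserveCapacity(b.count + 1)              // make sure one more element fits
let addressBeforeSecondAppend = bufferAddress(b)
b.append(2)                                 // 5. unique now, so in place
let stableAfterSecondWrite = bufferAddress(b) == addressBeforeSecondAppend

print(a, b)
print(sharedAfterAssignment, separatedAfterWrite, stableAfterSecondWrite)
```

Throughout the trace, a keeps its original three elements, which is the value-semantics guarantee that COW is there to preserve.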

4. Why is COW only for specific value types?

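  • Built-in Types: COW is not a language feature that every value type gets automatically; it is implemented by individual types. In the standard library these are mainly the collection types (Array, Dictionary, Set) and String. These types can hold large amounts of data, where copying on every assignment would cause significant performance degradation.
  • Custom Structs: Ordinary structs (e.g., a Point containing a few Ints) have no COW behavior of their own. All stored members are copied on assignment, because copying a small amount of data is usually cheaper than managing reference counts and heap allocations. A struct that has an Array member still benefits from that member's COW, since copying the struct copies only the array's buffer reference.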
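
The difference can be sketched like this. Point comes from the text above; Track and bufferAddress are hypothetical names introduced for the example.

```swift
struct Point {
    var x: Int
    var y: Int
}

struct Track {                  // hypothetical struct with a collection member
    var name: String
    var samples: [Int]
}

// Hypothetical helper: address of the element buffer, read without mutation.
func bufferAddress(_ array: [Int]) -> UnsafeRawPointer? {
    array.withUnsafeBufferPointer { UnsafeRawPointer($0.baseAddress) }
}

// A small struct is stored inline and copied member by member.
let p = Point(x: 1, y: 2)
var q = p
q.x = 99
let pointIsInline = MemoryLayout<Point>.size == 2 * MemoryLayout<Int>.size

// Copying Track copies the array's buffer reference, not its elements.
let t1 = Track(name: "lap", samples: Array(0..<1_000))
var t2 = t1
let samplesSharedAfterCopy = bufferAddress(t1.samples) == bufferAddress(t2.samples)

t2.samples[0] = -1              // the member's own COW kicks in here
let samplesSeparateAfterWrite = bufferAddress(t1.samples) != bufferAddress(t2.samples)

print(p.x, q.x, t1.samples[0], t2.samples[0])
```

No extra code was needed in Track to get this behavior; it comes entirely from the Array member.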

5. How to implement COW for custom types?

If you define a struct that contains a vast amount of data, you can implement COW manually by keeping the data in a private class instance and checking that the instance is uniquely referenced before each write:

Swift

struct LargeData {
    // Reference type that holds the actual data on the heap.
    // ([Int] already has COW of its own; it is used here only for illustration.)
    private final class Storage {
        var data: [Int]
        init(data: [Int]) { self.data = data }
    }

    private var _storage = Storage(data: Array(0...1000))

    var data: [Int] {
        get { _storage.data }
        set {
            // Core step: check whether this value holds the only reference
            if isKnownUniquelyReferenced(&_storage) {
                _storage.data = newValue // Unique reference: mutate in place
            } else {
                // Shared: switch to fresh storage so other copies are unaffected
                _storage = Storage(data: newValue)
            }
        }
    }
}
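
A slightly fuller variant of the same pattern, with a mutating method that edits the storage in place once it is unique. This is a sketch under assumptions rather than the canonical way to do it: SampleBuffer, makeStorageUnique, and storageID are hypothetical names, and storageID exists only so the example can show which storage object is in use.

```swift
struct SampleBuffer {
    private final class Storage {
        var values: [Int]
        init(values: [Int]) { self.values = values }
    }

    private var storage: Storage

    init(_ values: [Int]) { storage = Storage(values: values) }

    var values: [Int] { storage.values }

    // For demonstration only: identifies which Storage object is in use.
    var storageID: ObjectIdentifier { ObjectIdentifier(storage) }

    // Copies the storage first if another value still shares it.
    private mutating func makeStorageUnique() {
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(values: storage.values)
        }
    }

    mutating func append(_ value: Int) {
        makeStorageUnique()
        storage.values.append(value)
    }
}

let first = SampleBuffer([1, 2, 3])
var second = first
let sharedBeforeWrite = first.storageID == second.storageID

second.append(4)                          // shared, so storage is copied first
let separateAfterWrite = first.storageID != second.storageID

let idAfterFirstWrite = second.storageID
second.append(5)                          // unique now, so no new storage
let reusedOnSecondWrite = second.storageID == idAfterFirstWrite

print(first.values, second.values)
```

Routing every mutating method through one uniqueness helper keeps the check in a single place, so a new method cannot forget it.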

Summary

COW applies the idea of lazy evaluation to copying: the copy is deferred until a write makes it necessary. By combining a uniqueness check on the reference count with shared heap storage, it reconciles the "high cost of copying" with "value semantics safety" when value types are passed around.