Go Internals: How Slices Are Implemented

Defining and Initializing Slices

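// 1. Declare with var (the result is a nil slice)
var slice1 []int
// 2. Short declaration with a literal
slice2 := []int{1, 2, 3}
// 3. Slice an existing array
arr1 := [3]int{1, 2, 3}
slice3 := arr1[0:2] // half-open range: index 0 is included, index 2 is not
// 4. Initialize with make
slice4 := make([]int, 0)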
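
The four forms above differ mainly in length, capacity, and whether the result is nil. The following is a small sketch that prints those properties; the helper name describe is made up for this illustration.

```go
package main

import "fmt"

// describe is a hypothetical helper that reports the length, capacity,
// and nil-ness of a slice.
func describe(s []int) string {
	return fmt.Sprintf("len=%d cap=%d nil=%t", len(s), cap(s), s == nil)
}

func main() {
	var s1 []int         // declared only: a nil slice
	s2 := []int{1, 2, 3} // literal
	arr := [3]int{1, 2, 3}
	s3 := arr[0:2]       // the capacity runs to the end of the array
	s4 := make([]int, 0) // empty, but not nil

	fmt.Println(describe(s1)) // len=0 cap=0 nil=true
	fmt.Println(describe(s2)) // len=3 cap=3 nil=false
	fmt.Println(describe(s3)) // len=2 cap=3 nil=false
	fmt.Println(describe(s4)) // len=0 cap=0 nil=false
}
```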

How Slices Work and Their Underlying Implementation

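Internally, a slice holds a pointer into an underlying array and refers to one contiguous segment of that array. This is why slices are usually described as a reference type: copying a slice copies only the small header, not the elements it points to.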

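// src/runtime/slice.go: the underlying data structure of a slice is a struct.
// On a 64-bit platform the slice header occupies 24 bytes in total.
type slice struct {
    array unsafe.Pointer // pointer to the underlying array, 8 bytes
    len   int            // length of the slice, 8 bytes
    cap   int            // capacity of the slice (cap >= len), 8 bytes
}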
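
The header size can be checked from ordinary code. This is a minimal sketch using unsafe.Sizeof; the helper headerWords is a name invented for this example, and the byte count assumes a 64-bit platform (the word count should hold on any platform).

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords is a hypothetical helper: it returns how many machine words
// a []int header occupies (pointer + len + cap).
func headerWords() uintptr {
	var s []int
	return unsafe.Sizeof(s) / unsafe.Sizeof(uintptr(0))
}

func main() {
	var s []int
	// Sizeof measures the header only, never the elements.
	fmt.Println(unsafe.Sizeof(s)) // 24 on 64-bit platforms
	fmt.Println(headerWords())    // 3
}
```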

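Tip: several slices can point to the same underlying array at once, so modifying an element through one slice may affect the others. Let's verify this with code: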

package main

import "fmt"

func main() {
  arr := [5]int{1, 2, 3, 4, 5}

  s1 := arr[0:3]
  s2 := arr[0:3]
  fmt.Printf("s1: %p\n", s1) // s1: 0xc00001e158
  fmt.Printf("s2: %p\n", s2) // s2: 0xc00001e158

  s1[0] = 100
  fmt.Println("s1: ", s1) // [100 2 3]
  fmt.Println("s2: ", s2) // [100 2 3]
}

Slicing a Slice

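Taking a new slice from an existing slice is also called reslicing. The new slice and the old slice share the same underlying array, so a change made to the array through either one is visible through the other. The exception is when an append causes the new or the old slice to grow: once growth happens, that slice gets a fresh block of memory and the two are no longer related. Let's verify this with code: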

package main

import "fmt"

func main() {
  sliceArr := []int{1, 2, 3, 4, 5, 7, 8, 9}

  s1 := sliceArr[1:7]
  s2 := s1[2:5]

  fmt.Println("s1: ", s1) // s1:  [2 3 4 5 7 8]
  fmt.Println("s2: ", s2) // s2:  [4 5 7]

  s2[1] = 100
  fmt.Println("s1: ", s1)             // s1:  [2 3 4 100 7 8]
  fmt.Println("s2: ", s2)             // s2:  [4 100 7]
  fmt.Println("sliceArr: ", sliceArr) // sliceArr:  [1 2 3 4 100 7 8 9]

  fmt.Println(len(s2), cap(s2))                    // s2 has length 3 and capacity 5
  fmt.Printf("s2_address_before_append: %p\n", s2) // s2_address_before_append: 0xc0000ac058
  // Appending 3 elements needs length 6, which exceeds the capacity of 5, so s2
  // grows into newly allocated memory. From here on, operations on s2 no longer
  // affect sliceArr or s1.
  s2 = append(s2, 200, 300, 400)
  fmt.Printf("s2_address_after_append: %p\n", s2) // s2_address_after_append: 0xc0000bc000
  fmt.Println("s1: ", s1)                         // s1:  [2 3 4 100 7 8]
  fmt.Println("s2: ", s2)                         // s2:  [4 100 7 200 300 400]
  fmt.Println("sliceArr: ", sliceArr)             // sliceArr:  [1 2 3 4 100 7 8 9]
}
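
The opposite case is worth seeing as well: when an append fits within the existing capacity, no new array is allocated and the appended value lands in the shared array. A minimal sketch, with the function name appendWithinCap made up for this illustration:

```go
package main

import "fmt"

// appendWithinCap is a hypothetical demo function. It appends to a prefix of
// base while spare capacity remains, and returns both slices afterwards.
func appendWithinCap() (base, head []int) {
	base = []int{1, 2, 3, 4, 5}
	head = base[:2]         // len 2, cap 5: three spare slots
	head = append(head, 99) // fits in the capacity, so it overwrites base[2]
	return base, head
}

func main() {
	base, head := appendWithinCap()
	fmt.Println(base) // [1 2 99 4 5]
	fmt.Println(head) // [1 2 99]
}
```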

How Slices Grow

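Growth usually happens when elements are appended to a slice and its capacity is not sufficient. Elements are appended with the built-in append function, which is declared and documented in the Go source as follows: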

// src/builtin/builtin.go

// The append built-in function appends elements to the end of a slice. If
// it has sufficient capacity, the destination is resliced to accommodate the
// new elements. If it does not, a new underlying array will be allocated.
// Append returns the updated slice. It is therefore necessary to store the
// result of append, often in the variable holding the slice itself:
//  slice = append(slice, elem1, elem2)
//  slice = append(slice, anotherSlice...)
// As a special case, it is legal to append a string to a byte slice, like this:
//  slice = append([]byte("hello "), "world"...)
func append(slice []Type, elems ...Type) []Type

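As the comment says, append adds elements to the end of a slice, and if the slice's capacity is not enough it allocates a new underlying array. Elements can be passed one by one, or a whole slice can be passed and expanded with "...". As a special case, appending a string directly to a byte slice is legal.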

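So when handling large amounts of data, it is best to allocate enough space up front whenever possible, which cuts down on the cost of memory allocation and data copying.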
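
The effect of preallocating can be observed by counting how often the capacity changes while appending. This is a rough sketch; countGrowths is a name made up for this example, and the count for the non-preallocated case depends on the Go version, so no exact figure is given for it.

```go
package main

import "fmt"

// countGrowths is a hypothetical helper: it appends n ints to s and returns
// how many times the capacity changed, i.e. how many reallocations occurred.
func countGrowths(s []int, n int) int {
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, i)
		if cap(s) != before {
			growths++
		}
	}
	return growths
}

func main() {
	// Starting from a nil slice, append has to reallocate repeatedly.
	fmt.Println(countGrowths(nil, 2048))
	// With the capacity reserved up front, no reallocation is needed.
	fmt.Println(countGrowths(make([]int, 0, 2048), 2048)) // 0
}
```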

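So what rule does growth actually follow? The commonly cited rule, which describes Go releases before 1.18, has two cases:

1. When the slice's capacity is below 1024, the growth factor is 2: the capacity doubles.

2. When the slice's capacity is above 1024, the growth factor is 1.25: each growth adds one quarter of the previous capacity.

Let's verify this: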

package main

import "fmt"

func main() {
  slice1 := make([]int, 0)
  oldCap := cap(slice1)
  for i := 0; i < 2048; i++ {
    slice1 = append(slice1, i)
    newCap := cap(slice1)
    if newCap != oldCap {
      fmt.Printf("oldCap: %d ---- newCap: %d\n", oldCap, newCap)
      oldCap = newCap
    }
  }
}
// Printed output (the exact values depend on the Go version):
//oldCap: 0 ---- newCap: 1
//oldCap: 1 ---- newCap: 2
//oldCap: 2 ---- newCap: 4
//oldCap: 4 ---- newCap: 8
//oldCap: 8 ---- newCap: 16
//oldCap: 16 ---- newCap: 32
//oldCap: 32 ---- newCap: 64
//oldCap: 64 ---- newCap: 128
//oldCap: 128 ---- newCap: 256
//oldCap: 256 ---- newCap: 512
//oldCap: 512 ---- newCap: 1024
//oldCap: 1024 ---- newCap: 1280
//oldCap: 1280 ---- newCap: 1696
//oldCap: 1696 ---- newCap: 2304
​
​

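We can see that below 1024 the capacity really does double, and just past 1024 we get 1280/1024 = 1.25, which is still a one-quarter increase. But once the capacity reaches 1696 we have 1696/1280 = 1.325, and then 2304/1696 ≈ 1.358, so the growth factor no longer looks like a fixed value as the capacity gets larger. To understand why, we need to read the actual growth function in the source. We can run go tool compile -S main.go to dump the assembly and find out which function is called.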

go tool compile -S main.go | grep CALL
// Terminal output
0x0089 00137 (main.go:13)       CALL    runtime.convT64(SB)
0x00a4 00164 (main.go:13)       CALL    runtime.convT64(SB)
0x00c0 00192 (main.go:13)       CALL    runtime.convT64(SB)
0x00e0 00224 (main.go:13)       CALL    runtime.convT64(SB)
0x01d4 00468 ($GOROOT/src/fmt/print.go:213)     CALL    fmt.Fprintf(SB)
0x0219 00537 (main.go:10)       CALL    runtime.growslice(SB) // this is the growth function
0x0255 00597 (main.go:5)        CALL    runtime.morestack_noctxt(SB)
0x0079 00121 (<autogenerated>:1)        CALL    runtime.efaceeq(SB)
0x00a0 00160 (<autogenerated>:1)        CALL    runtime.morestack_noctxt(SB)
​

The assembly shows that growing a slice calls the growslice function. Next, let's find that function in the source: /src/runtime/slice.go

// growslice handles slice growth during append.
// It is passed the slice element type, the old slice, and the desired new minimum capacity,
// and it returns a new slice with at least that capacity, with the old data
// copied into it.
// The new slice's length is set to the old slice's length,
// NOT to the new requested capacity.
// This is for codegen convenience. The old slice's length is used immediately
// to calculate where to write new values during an append.
// TODO: When the old backend is gone, reconsider this decision.
// The SSA backend might prefer the new length or to return only ptr/cap and save stack space.
func growslice(et *_type, old slice, cap int) slice {
  if raceenabled {
    callerpc := getcallerpc()
    racereadrangepc(old.array, uintptr(old.len*int(et.size)), callerpc, abi.FuncPCABIInternal(growslice))
  }
  if msanenabled {
    msanread(old.array, uintptr(old.len*int(et.size)))
  }
  if asanenabled {
    asanread(old.array, uintptr(old.len*int(et.size)))
  }
​
  if cap < old.cap {
    panic(errorString("growslice: cap out of range"))
  }
​
  if et.size == 0 {
    // append should not create a slice with nil pointer but non-zero len.
    // We assume that append doesn't need to preserve old.array in this case.
    return slice{unsafe.Pointer(&zerobase), old.len, cap}
  }
​
  newcap := old.cap
  doublecap := newcap + newcap
  if cap > doublecap {
    newcap = cap
  } else {
    const threshold = 256
    if old.cap < threshold {
      newcap = doublecap
    } else {
      // Check 0 < newcap to detect overflow
      // and prevent an infinite loop.
      for 0 < newcap && newcap < cap {
        // Transition from growing 2x for small slices
        // to growing 1.25x for large slices. This formula
        // gives a smooth-ish transition between the two.
        newcap += (newcap + 3*threshold) / 4
      }
      // Set newcap to the requested cap when
      // the newcap calculation overflowed.
      if newcap <= 0 {
        newcap = cap
      }
    }
  }
​
  var overflow bool
  var lenmem, newlenmem, capmem uintptr
  // Specialize for common values of et.size.
  // For 1 we don't need any division/multiplication.
  // For goarch.PtrSize, compiler will optimize division/multiplication into a shift by a constant.
  // For powers of 2, use a variable shift.
  switch {
  case et.size == 1:
    lenmem = uintptr(old.len)
    newlenmem = uintptr(cap)
    capmem = roundupsize(uintptr(newcap))
    overflow = uintptr(newcap) > maxAlloc
    newcap = int(capmem)
  case et.size == goarch.PtrSize:
    lenmem = uintptr(old.len) * goarch.PtrSize
    newlenmem = uintptr(cap) * goarch.PtrSize
    capmem = roundupsize(uintptr(newcap) * goarch.PtrSize)
    overflow = uintptr(newcap) > maxAlloc/goarch.PtrSize
    newcap = int(capmem / goarch.PtrSize)
  case isPowerOfTwo(et.size):
    var shift uintptr
    if goarch.PtrSize == 8 {
      // Mask shift for better code generation.
      shift = uintptr(sys.Ctz64(uint64(et.size))) & 63
    } else {
      shift = uintptr(sys.Ctz32(uint32(et.size))) & 31
    }
    lenmem = uintptr(old.len) << shift
    newlenmem = uintptr(cap) << shift
    capmem = roundupsize(uintptr(newcap) << shift)
    overflow = uintptr(newcap) > (maxAlloc >> shift)
    newcap = int(capmem >> shift)
  default:
    lenmem = uintptr(old.len) * et.size
    newlenmem = uintptr(cap) * et.size
    capmem, overflow = math.MulUintptr(et.size, uintptr(newcap))
    capmem = roundupsize(capmem)
    newcap = int(capmem / et.size)
  }
​
  // The check of overflow in addition to capmem > maxAlloc is needed
  // to prevent an overflow which can be used to trigger a segfault
  // on 32bit architectures with this example program:
  //
  // type T [1<<27 + 1]int64
  //
  // var d T
  // var s []T
  //
  // func main() {
  //   s = append(s, d, d, d, d)
  //   print(len(s), "\n")
  // }
  if overflow || capmem > maxAlloc {
    panic(errorString("growslice: cap out of range"))
  }
​
  var p unsafe.Pointer
  if et.ptrdata == 0 {
    p = mallocgc(capmem, nil, false)
    // The append() that calls growslice is going to overwrite from old.len to cap (which will be the new length).
    // Only clear the part that will not be overwritten.
    memclrNoHeapPointers(add(p, newlenmem), capmem-newlenmem)
  } else {
    // Note: can't use rawmem (which avoids zeroing of memory), because then GC can scan uninitialized memory.
    p = mallocgc(capmem, et, true)
    if lenmem > 0 && writeBarrier.enabled {
      // Only shade the pointers in old.array since we know the destination slice p
      // only contains nil pointers because it has been cleared during alloc.
      bulkBarrierPreWriteSrcOnly(uintptr(p), uintptr(old.array), lenmem-et.size+et.ptrdata)
    }
  }
  memmove(p, old.array, lenmem)
​
  return slice{p, old.len, newcap}
}

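The source confirms the overall strategy: double the capacity for small slices and move toward roughly 1.25x for large ones. Note that this listing uses a threshold of 256 and the smoother formula newcap += (newcap + 3*threshold) / 4, which is what Go 1.18 and later ship; the 1024 boundary and the sample output earlier correspond to older releases. In either version, the computed capacity is then passed through roundupsize, which rounds the allocation up to a size the memory allocator actually hands out. The final cap can therefore be larger than the computed value, which explains the 1.325 and 1.358 ratios seen in the test above. I have not studied the allocator's size classes in detail, so I won't go further into that part here.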
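
To make the branch logic easier to follow, here is a simplified sketch of only the capacity calculation from the growslice listing above. The function name nextCap is made up for this illustration, and it deliberately leaves out the roundupsize step and the overflow checks on memory size, so its results are the capacity before rounding and may be smaller than what cap() reports in a real program.

```go
package main

import "fmt"

// nextCap is a hypothetical, simplified port of the newcap calculation in
// the growslice listing (threshold 256). It ignores roundupsize, so the
// runtime's final capacity can be larger than the value returned here.
func nextCap(oldCap, needed int) int {
	newcap := oldCap
	doublecap := newcap + newcap
	if needed > doublecap {
		// Doubling is still not enough: use the requested capacity.
		return needed
	}
	const threshold = 256
	if oldCap < threshold {
		// Small slices simply double.
		return doublecap
	}
	// Large slices grow by (newcap + 3*threshold)/4 per step,
	// which approaches 1.25x as newcap gets large.
	for 0 < newcap && newcap < needed {
		newcap += (newcap + 3*threshold) / 4
	}
	if newcap <= 0 {
		// The calculation overflowed: fall back to the requested capacity.
		return needed
	}
	return newcap
}

func main() {
	fmt.Println(nextCap(4, 10))      // 10
	fmt.Println(nextCap(128, 129))   // 256
	fmt.Println(nextCap(256, 257))   // 512
	fmt.Println(nextCap(512, 513))   // 832
	fmt.Println(nextCap(1024, 1025)) // 1472
}
```

As the capacity grows, the added amount (newcap + 768)/4 gets closer and closer to newcap/4, which is where the "about 1.25x" description comes from.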

Other Properties

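In a for-range loop, v is only a copy of each element, so modifying it directly does not affect the original slice; to modify the slice itself, index into it, as in s[k]++. To duplicate a slice completely, call the copy function. Note that the destination must have the same element type as the source, and that copy transfers only min(len(dst), len(src)) elements, so the destination needs a length at least equal to the source's for a full copy.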

package main

import "fmt"

// AddOne1 increments only the loop variable v, which is a copy of each element.
func AddOne1(s []int) {
  for _, v := range s {
    v++
  }
}

// AddOne2 increments the elements in place by indexing into the slice.
func AddOne2(s []int) {
  for k := range s {
    s[k]++
  }
}

func main() {
  slice1 := []int{1, 2, 3, 4, 5}
  AddOne1(slice1)
  fmt.Println(slice1) // [1 2 3 4 5]
  AddOne2(slice1)
  fmt.Println(slice1) // [2 3 4 5 6]

  copySlice := make([]int, 5)
  count := copy(copySlice, slice1)
  fmt.Println(count, copySlice) // 5 [2 3 4 5 6]
  AddOne2(slice1)
  fmt.Println(slice1, copySlice) // [3 4 5 6 7] [2 3 4 5 6]
}
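
What copy does when the lengths differ can be shown directly. A small sketch follows; the helper copyInto is a name invented for this example.

```go
package main

import "fmt"

// copyInto is a hypothetical helper: it copies src into a fresh destination
// of length dstLen and returns the number of elements copied along with the
// destination.
func copyInto(dstLen int, src []int) (int, []int) {
	dst := make([]int, dstLen)
	n := copy(dst, src) // copies min(len(dst), len(src)) elements
	return n, dst
}

func main() {
	src := []int{1, 2, 3, 4, 5}

	fmt.Println(copyInto(3, src)) // 3 [1 2 3]
	fmt.Println(copyInto(7, src)) // 5 [1 2 3 4 5 0 0]
	fmt.Println(copyInto(0, src)) // 0 []
}
```

The last case is the usual pitfall: copying into a slice of length 0 copies nothing, even if that slice has spare capacity.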

A Few Common Interview Questions

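What is the difference between make and new?

make is used to initialize slices, maps, and channels, and returns a value of the type itself.

new is mainly used to allocate value types such as int or a struct, and returns a pointer to the type.

Both allocate memory.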

package main

import "fmt"

func main() {
  s1 := make([]int, 1)

  s2 := new([]int)

  fmt.Printf("s1: %v_%p --- s2: %v_%p", s1, s1, s2, s2)
  // s1: [0]_0xc000020098 --- s2: &[]_0xc00000c030
}
​

Let's look at the assembly for the code above with go tool compile -S main.go | grep CALL:

   0x004c 00076 (main.go:19)       CALL    runtime.makeslice(SB)
   0x0066 00102 (main.go:21)       CALL    runtime.newobject(SB)
   0x0090 00144 (main.go:23)       CALL    runtime.convTslice(SB)
   0x0116 00278 ($GOROOT/src/fmt/print.go:213)     CALL    fmt.Fprintf(SB)
   0x012b 00299 (main.go:17)       CALL    runtime.morestack_noctxt(SB)
   0x0079 00121 (<autogenerated>:1)        CALL    runtime.efaceeq(SB)
   0x00a0 00160 (<autogenerated>:1)        CALL    runtime.morestack_noctxt(SB)
​

We can see that in this program make and new actually call makeslice and newobject respectively, and both end up calling mallocgc to allocate memory.

// src/runtime/slice.go
func makeslice(et *_type, len, cap int) unsafe.Pointer {
  mem, overflow := math.MulUintptr(et.size, uintptr(cap))
  if overflow || mem > maxAlloc || len < 0 || len > cap {
    // NOTE: Produce a 'len out of range' error instead of a
    // 'cap out of range' error when someone does make([]T, bignumber).
    // 'cap out of range' is true too, but since the cap is only being
    // supplied implicitly, saying len is clearer.
    // See golang.org/issue/4085.
    mem, overflow := math.MulUintptr(et.size, uintptr(len))
    if overflow || mem > maxAlloc || len < 0 {
      panicmakeslicelen()
    }
    panicmakeslicecap()
  }
​
  return mallocgc(mem, et, true)
}
​
// src/runtime/malloc.go
// implementation of new builtin
// compiler (both frontend and SSA backend) knows the signature
// of this function
func newobject(typ *_type) unsafe.Pointer {
  return mallocgc(typ.size, typ, true)
}
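
The practical difference between the two shows up in what you get back. This is a minimal sketch; the function name newThenAppend is made up for this illustration.

```go
package main

import "fmt"

// newThenAppend is a hypothetical demo function: it allocates a slice header
// with new, appends v through the pointer, and returns the resulting slice.
func newThenAppend(v int) []int {
	p := new([]int)    // p is a *[]int pointing at a nil slice
	*p = append(*p, v) // append works on a nil slice
	return *p
}

func main() {
	p := new([]int)
	fmt.Println(*p == nil, len(*p)) // true 0

	s := make([]int, 1)           // a usable slice with one zeroed element
	fmt.Println(s == nil, len(s)) // false 1
	s[0] = 7                      // indexing is safe because len(s) is 1

	fmt.Println(newThenAppend(1)) // [1]
}
```

Indexing (*p)[0] right after new would panic, because the slice it points to has length 0; make is the one that returns a slice ready for indexing.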
​
​

Coding Questions

This one tests how slices work: after a slice grows, it is backed by newly allocated memory.

package main

import "fmt"

// Both prints of x show the same value: appending 60 to y exceeds its capacity,
// so y moves to a newly allocated array and is no longer tied to x.
func main() {
  x := []int64{1, 2, 3}
  y := x[:2]                // y => [1 2]
  y = append(y, 50)         // y => [1 2 50], written into x's array
  y = append(y, 60)         // y => [1 2 50 60], y grows into a new array
  fmt.Printf("x = %v\n", x) // x = [1 2 50]
  y[0] = 20                 // y => [20 2 50 60]
  fmt.Printf("x = %v\n", x) // x = [1 2 50]
}
​

This one mainly tests the difference between arrays and slices: an array is a value type, while a slice is a reference type.

package main

import "fmt"

func main() {
  a := [3]int{1, 2, 3}

  for k, v := range a {
    if k == 0 {
      a[0], a[1] = 100, 200
      fmt.Println(a) // [100 200 3]
    }
    // An array is a value type: assigning or passing it copies the whole array.
    // range iterates over a copy of a, so v is not affected by the writes above.
    a[k] = 100 + v
  }
  fmt.Println(a) // [101 102 103]

  b := []int{1, 2, 3}
  for k, v := range b {
    if k == 0 {
      b[0], b[1] = 100, 200
      fmt.Println(b) // [100 200 3]
    }
    // v is a copy of the element, so v = 1 when k = 0. The slice refers to the
    // shared array, so v = 200 when k = 1 and v = 3 when k = 2.
    b[k] = 100 + v
  }
  fmt.Println(b) // [101 300 103]
}
​

That wraps up my notes on how slices work. If you spot any mistakes or omissions, please let me know so I can correct them. Thanks!