slice

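A slice is a growable dynamic array whose element type can be any type, although all elements of a given slice share the same type. If, like me, you come from C++, you can think of it as roughly the counterpart of C++'s vector.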

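First, a slice in Go behaves like a reference type: if you pass it to a function and the function modifies its elements, the caller's slice changes too, not a copy of the data. From my C++ experience, I guessed that it reaches its array through a pointer internally, so even though the argument itself is copied, the elements it points to can still be modified. Let's test that: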

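package main

import "fmt"

// accumulation adds the running sum of the already-updated elements to each element, in place.
func accumulation(nums []int) {
    sum := 0
    for index := range nums {
        nums[index] += sum
        sum += nums[index]
    }
}

func main() {
    var nums = []int{1, 2, 3}
    accumulation(nums[:])
    for _, num := range nums {
        fmt.Printf("%d ", num) // prints: 1 3 7
    }
}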

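As we can see, the elements of the slice were modified, which shows that changes made through the parameter reach the caller's data. To confirm my guess, let's look at the source code of slice: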

type slice struct {
	array unsafe.Pointer
	len   int        // number of elements actually stored
	cap   int        // number of elements the slice can hold before it must grow
}
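To make the consequences of this three-field header concrete, here is a minimal sketch. The helper names setFirst and appendOne are made up for illustration; the only assumption is what the struct above shows, namely that passing a slice copies the header, so writes through the pointer are shared while a changed len stays local to the callee.

```go
package main

import "fmt"

// setFirst writes through the copied header's pointer, so the caller sees the change.
func setFirst(s []int, v int) {
	s[0] = v
}

// appendOne only updates the callee's copy of the header; the caller's len is untouched.
func appendOne(s []int, v int) {
	s = append(s, v)
}

func main() {
	nums := make([]int, 3, 8)
	setFirst(nums, 42)
	appendOne(nums, 7)
	fmt.Println(nums, len(nums), cap(nums)) // [42 0 0] 3 8
	// The appended value did land in the shared array, just beyond len(nums).
	fmt.Println(nums[:4]) // [42 0 0 7]
}
```

So "reference type" is only half the story: the elements are shared, but the length and capacity are per-header values.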

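As we can see, a slice consists of a pointer, a length, and a capacity. Its implementation is much like C++'s vector, except that C++ typically builds its dynamic array from three pointers. So how does a slice grow dynamically, and what is its growth policy? If a slice is given only the memory it needs each time, allocation has to happen again and again; if it is given too much memory at once, space is wasted. Striking a balance between the two is the key concern of a growth policy.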

func growslice(oldPtr unsafe.Pointer, newLen, oldCap, num int, et *_type) slice {
// The preliminary checks are omitted here; we go straight to the core of the growth policy.
	newcap := oldCap
	doublecap := newcap + newcap
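	// First try doubling the capacity; if the requested length still exceeds
	// the doubled capacity, use the requested length as the new capacity.
	// newLen is the number of elements the slice must hold after the append.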
	if newLen > doublecap {
		newcap = newLen
	} else {
		const threshold = 256
		// Slices whose old capacity is below 256 simply double.
		if oldCap < threshold {
			newcap = doublecap
		} else {
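			// Keep growing while the new capacity is positive and still below newLen.
			// Each step adds (newcap + 3*threshold) / 4 to the current capacity.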
			for 0 < newcap && newcap < newLen {
				// Transition from growing 2x for small slices
				// to growing 1.25x for large slices. This formula
				// gives a smooth-ish transition between the two.
				newcap += (newcap + 3*threshold) / 4
			}
			// Set newcap to the requested cap when
			// the newcap calculation overflowed.
			if newcap <= 0 {
				newcap = newLen
			}
		}
	}

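In other words: small slices double their capacity, while large slices follow a smooth transition in which the growth factor approaches 1.25x as the capacity gets larger.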
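The branch structure above can be restated as a small standalone function. This is only a sketch of the excerpt: the name nextCap is hypothetical, and the real growslice goes on to round the result up to an allocator size class, so the capacities you observe with cap() may be larger than what this function returns.

```go
package main

import "fmt"

// nextCap mirrors the capacity calculation in the growslice excerpt above.
// It deliberately ignores the size-class rounding performed afterwards.
func nextCap(oldCap, newLen int) int {
	const threshold = 256
	newcap := oldCap
	doublecap := newcap + newcap
	if newLen > doublecap {
		return newLen
	}
	if oldCap < threshold {
		return doublecap
	}
	for 0 < newcap && newcap < newLen {
		newcap += (newcap + 3*threshold) / 4
	}
	if newcap <= 0 { // the addition overflowed
		return newLen
	}
	return newcap
}

func main() {
	fmt.Println(nextCap(4, 5))     // 8: small slices double
	fmt.Println(nextCap(4, 100))   // 100: the request exceeds double the old capacity
	fmt.Println(nextCap(256, 257)) // 512: 256 + (256+768)/4
	fmt.Println(nextCap(512, 513)) // 832: 512 + (512+768)/4
}
```

The last two lines show the transition: at a capacity of 256 the step is still a doubling, and the factor shrinks toward 1.25 as the capacity grows.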

Pitfalls when using slices

What does the following code print?

package main

import "fmt"

func main() {
    a := make([]int, 4)
    a = append(a, 1)
    fmt.Println(a)
}

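Note that this prints [0 0 0 0 1], not [1]. When make receives only a length argument, it sets both len and cap to that value, so the slice already holds four zero-valued elements and append places the new element after them.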
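If the intent was to reserve room for four elements without creating them, the usual fix is to pass a length of 0 and a capacity of 4. A minimal sketch comparing the two (the helper names withLen and withCap are illustrative):

```go
package main

import "fmt"

// withLen reproduces the pitfall: len 4 means four zero elements already exist.
func withLen() []int {
	a := make([]int, 4)
	return append(a, 1)
}

// withCap only reserves space: len 0, cap 4.
func withCap() []int {
	a := make([]int, 0, 4)
	return append(a, 1)
}

func main() {
	fmt.Println(withLen()) // [0 0 0 0 1]
	fmt.Println(withCap()) // [1]
}
```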

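Slices support slicing expressions, and slicing is a shallow copy: the new slice shares the original's underlying array. This can lead to some surprising behavior, as in the following example: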

package main

import "fmt"

func log(s []int) {
    fmt.Printf("len(s) = %d, cap(s) = %d\n", len(s), cap(s))
}

func main() {
    s1 := make([]int, 3, 4)
    s1[0] = 10
    s1[1] = 20
    s1[2] = 30
    s2 := s1[1:]
    log(s1)
    log(s2)
    fmt.Println(&s1[1] == &s2[0])
    s2 = append(s2, 40, 50)
    log(s1)
    log(s2)
    fmt.Println(&s1[1] == &s2[0])
}

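Let's analyze the result. First, slicing is a shallow copy: both slices share the same underlying array and differ only in their pointer, len, and cap fields, which is why the first comparison prints true. Second, when append needs more room than the capacity allows, the slice grows and its elements are copied into a new underlying array. Here s2 has cap 3, so appending two elements to its two existing ones forces a reallocation; afterwards the second comparison prints false, and s1 and s2 are two completely independent slices.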
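The same sharing can be observed through the element values rather than their addresses. The following sketch reuses the s1/s2 setup from the example; the function name shareThenDetach is made up for illustration.

```go
package main

import "fmt"

// shareThenDetach writes through s2 once while it shares s1's array,
// and once after append has moved s2 to its own array.
func shareThenDetach() (s1, s2 []int) {
	s1 = make([]int, 3, 4)
	s1[0], s1[1], s1[2] = 10, 20, 30
	s2 = s1[1:] // len 2, cap 3, same array as s1

	s2[0] = 21 // still shared: visible as s1[1]

	s2 = append(s2, 40, 50) // needs len 4 > cap 3, so s2 gets a new array
	s2[0] = 99              // no longer visible through s1
	return s1, s2
}

func main() {
	s1, s2 := shareThenDetach()
	fmt.Println(s1) // [10 21 30]
	fmt.Println(s2) // [99 30 40 50]
}
```

When a sub-slice must never write into its parent's spare capacity, the full slice expression s1[1:3:3] caps its capacity so that the first append already reallocates.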

Ways to initialize a slice

package main

import "fmt"

func main() {
    var s []string
    log(1, s)         // empty = true, nil = true

    s = []string(nil)
    log(2, s)         // empty = true, nil = true

    s = []string{}
    log(3, s)         // empty = true, nil = false

    s = make([]string, 0)
    log(4, s)         // empty = true, nil = false
}

func log(i int, s []string) {
    fmt.Printf("%d: empty=%t\tnil=%t\n", i, len(s) == 0, s == nil)
}

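The output shows that the first two forms produce a nil slice: nothing is allocated and the whole slice header is zero. The last two forms produce a non-nil empty slice: the header's pointer is set to a non-nil value, while len and cap are both 0.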
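For len, append, and range the two kinds of empty slice behave the same, but the difference is observable elsewhere. One place is the standard encoding/json package, which encodes a nil slice as null and a non-nil empty slice as []. A small sketch (the helper encode is illustrative):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encode returns the JSON form of s, or an error message.
func encode(s []string) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	var nilSlice []string
	emptySlice := []string{}
	fmt.Println(encode(nilSlice))   // null
	fmt.Println(encode(emptySlice)) // []
}
```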

struct

For a struct like the one below, there are several ways to initialize it:

type Point struct {
    X int
    Y int
}

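// Positional initialization: fragile when fields are added or reordered later, not recommended!
p1 := Point{1, 2}
// Initialize by field name in a composite literal
p2 := Point{X: 3, Y: 4}
// Declare first, then assign the fields through the dot operator
var p3 Point
p3.X = 3
p3.Y = 4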

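Initializing through the dot operator is unfriendly to nested structs, though, because every access has to go through an extra intermediate variable. In that case we can use an anonymous (embedded) struct field.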

type Point struct {
    X int
    Y int
}

// An anonymous (embedded) field is written as just the type, without a field name
type Circle struct {
    Point
    Radius int
}

func main() {
    var c Circle
    c.X = 1
    c.Y = 2
    c.Radius = 3
    fmt.Println(c)
}

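The catch is that this syntactic sugar only works for access through the dot operator. It does not work when we initialize the fields with a composite literal: a struct literal must follow the shape of the type declaration.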

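// Neither of the following two initializations compiles
var c1 = Circle{1, 2, 3}
var c2 = Circle{X: 1, Y: 2, Radius: 3}
// The literal must follow the structure of the type declaration
// Positional initialization
var c3 = Circle{Point{1, 2}, 5}
// Initialization by field name: the embedded field has no explicit name in Circle, so its type name Point serves as the field name
var c4 = Circle{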
    Point: Point{
       X: 2,
       Y: 3,
    },
    Radius: 5,
}

map

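A map is a data structure with O(1) average lookup time, usually implemented as a hash table. In C++, std::map is typically implemented as a red-black tree with O(log N) lookup; it is not as fast, but its keys are kept in order, which suits some use cases better. std::unordered_map is implemented as a hash table that resolves collisions by separate chaining, with a default maximum load factor of 1.0. Go's map also resolves collisions by chaining. So what advantages does chaining have over open addressing or rehashing?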

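Chaining handles collisions simply and does not suffer from clustering: keys that hash to different slots never collide with each other, so the average search length is short
The nodes of each chain are allocated dynamically, which suits the common case where the table size cannot be determined in advance
Open addressing and rehashing keep all data in the array itself, so they need a small load factor to limit collisions, which tends to waste space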
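To illustrate the chaining idea in its textbook form, here is a minimal sketch of a fixed-size chained hash table. Everything in it (the chainMap type, its methods, the choice of FNV-1a from the standard hash/fnv package) is an assumption made for the example; it is not how the Go runtime lays out its map, which chains whole buckets rather than single nodes.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// entry is one node of a bucket's chain.
type entry struct {
	key  string
	val  int
	next *entry
}

// chainMap is a fixed-size table of chains (this sketch never resizes).
type chainMap struct {
	buckets []*entry
}

func newChainMap(n int) *chainMap {
	return &chainMap{buckets: make([]*entry, n)}
}

// index hashes the key and reduces it to a bucket number.
func (m *chainMap) index(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(len(m.buckets)))
}

// Put updates an existing key or prepends a new node to the chain.
func (m *chainMap) Put(key string, val int) {
	i := m.index(key)
	for e := m.buckets[i]; e != nil; e = e.next {
		if e.key == key {
			e.val = val
			return
		}
	}
	m.buckets[i] = &entry{key: key, val: val, next: m.buckets[i]}
}

// Get walks the chain of the key's bucket.
func (m *chainMap) Get(key string) (int, bool) {
	for e := m.buckets[m.index(key)]; e != nil; e = e.next {
		if e.key == key {
			return e.val, true
		}
	}
	return 0, false
}

func main() {
	m := newChainMap(2) // a tiny table, so chains form quickly
	m.Put("a", 1)
	m.Put("b", 2)
	m.Put("c", 3)
	m.Put("a", 10)
	fmt.Println(m.Get("a")) // 10 true
	fmt.Println(m.Get("z")) // 0 false
}
```

With three keys in two buckets at least one chain holds more than one node, yet lookups stay correct because colliding keys simply extend the chain.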

With that in mind, let's look at how Go implements its map.

Structure of a map

(Figure: overall structure of a map.)

hmap

As we can see, a map is internally just an hmap struct:

// A header for a Go map.
type hmap struct {
	// Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
	// Make sure this stays in sync with the compiler's definition.
	// number of elements
	count     int // # live cells == size of map.  Must be first (used by len() builtin)
	// status flags
	flags     uint8
	// the number of hash buckets is 2^B
	B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
	// approximate number of overflow buckets
	noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
	// random hash seed
	hash0     uint32 // hash seed
	// the hash buckets
	buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
	// the old hash buckets; data is migrated out of them step by step while the map grows
	oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
	// growth progress: buckets with an index below this value have already been migrated to the new buckets
	nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)
	// optional fields, described in detail below
	extra *mapextra // optional fields
}

bmap

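The role of each important field is already given in the comments. The concrete bmap type is derived at compile time, because a map's key and value types are not fixed.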

// MapBucketType makes the map bucket type given the type of the map.
func MapBucketType(t *types.Type) *types.Type {
	if t.MapType().Bucket != nil {
		return t.MapType().Bucket
	}

	keytype := t.Key()
	elemtype := t.Elem()
	types.CalcSize(keytype)
	types.CalcSize(elemtype)
	// Keys or elems larger than 128 bytes are stored through a pointer
	if keytype.Size() > MAXKEYSIZE {
		keytype = types.NewPtr(keytype)
	}
	if elemtype.Size() > MAXELEMSIZE {
		elemtype = types.NewPtr(elemtype)
	}

	field := make([]*types.Field, 0, 5)

	// The first field is: uint8 topbits[BUCKETSIZE].
	arr := types.NewArray(types.Types[types.TUINT8], BUCKETSIZE)
	field = append(field, makefield("topbits", arr))

	arr = types.NewArray(keytype, BUCKETSIZE)
	arr.SetNoalg(true)
	keys := makefield("keys", arr)
	field = append(field, keys)

	arr = types.NewArray(elemtype, BUCKETSIZE)
	arr.SetNoalg(true)
	elems := makefield("elems", arr)
	field = append(field, elems)

	// If keys and elems have no pointers, the map implementation
	// can keep a list of overflow pointers on the side so that
	// buckets can be marked as having no pointers.
	// Arrange for the bucket to have no pointers by changing
	// the type of the overflow field to uintptr in this case.
	// See comment on hmap.overflow in runtime/map.go.
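	// If the key or value contains pointers, overflow has type unsafe.Pointer.
	// If neither contains pointers, overflow has type uintptr.
	// The difference: the GC does not treat a uintptr as a reference and does not scan it.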
	otyp := types.Types[types.TUNSAFEPTR]
	if !elemtype.HasPointers() && !keytype.HasPointers() {
		otyp = types.Types[types.TUINTPTR]
	}
	overflow := makefield("overflow", otyp)
	field = append(field, overflow)

	// link up fields
	bucket := types.NewStruct(field[:])
	bucket.SetNoalg(true)
	types.CalcSize(bucket)

	// Check invariants that map code depends on.
	if !types.IsComparable(t.Key()) {
		base.Fatalf("unsupported map key type for %v", t)
	}
	if BUCKETSIZE < 8 {
		base.Fatalf("bucket size %d too small for proper alignment %d", BUCKETSIZE, 8)
	}
	if uint8(keytype.Alignment()) > BUCKETSIZE {
		base.Fatalf("key align too big for %v", t)
	}
	if uint8(elemtype.Alignment()) > BUCKETSIZE {
		base.Fatalf("elem align %d too big for %v, BUCKETSIZE=%d", elemtype.Alignment(), t, BUCKETSIZE)
	}
	if keytype.Size() > MAXKEYSIZE {
		base.Fatalf("key size too large for %v", t)
	}
	if elemtype.Size() > MAXELEMSIZE {
		base.Fatalf("elem size too large for %v", t)
	}
	if t.Key().Size() > MAXKEYSIZE && !keytype.IsPtr() {
		base.Fatalf("key indirect incorrect for %v", t)
	}
	if t.Elem().Size() > MAXELEMSIZE && !elemtype.IsPtr() {
		base.Fatalf("elem indirect incorrect for %v", t)
	}
	if keytype.Size()%keytype.Alignment() != 0 {
		base.Fatalf("key size not a multiple of key align for %v", t)
	}
	if elemtype.Size()%elemtype.Alignment() != 0 {
		base.Fatalf("elem size not a multiple of elem align for %v", t)
	}
	if uint8(bucket.Alignment())%uint8(keytype.Alignment()) != 0 {
		base.Fatalf("bucket align not multiple of key align %v", t)
	}
	if uint8(bucket.Alignment())%uint8(elemtype.Alignment()) != 0 {
		base.Fatalf("bucket align not multiple of elem align %v", t)
	}
	if keys.Offset%keytype.Alignment() != 0 {
		base.Fatalf("bad alignment of keys in bmap for %v", t)
	}
	if elems.Offset%elemtype.Alignment() != 0 {
		base.Fatalf("bad alignment of elems in bmap for %v", t)
	}

	// Double-check that overflow field is final memory in struct,
	// with no padding at end.
	if overflow.Offset != bucket.Size()-int64(types.PtrSize) {
		base.Fatalf("bad offset of overflow in bmap for %v, overflow.Offset=%d, bucket.Size()-int64(types.PtrSize)=%d",
			t, overflow.Offset, bucket.Size()-int64(types.PtrSize))
	}

	t.MapType().Bucket = bucket

	bucket.StructType().Map = t
	return bucket
}

The derived bmap structure looks like this:

bmap struct {
    tophash [8]uint8
    keys   [8]keyType
    elems  [8]elemType
    overflow *bucket
}

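tophash stores the top eight bits of each key's hash
Each bmap bucket holds eight key/value pairs
Keys and values are stored separately, all keys first and then all values, each group contiguous, which helps with memory alignment
overflow points to the overflow bucket; its type is unsafe.Pointer or uintptr depending on whether the key or value contains pointers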
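The way a hash value is split between hmap.B and tophash can be sketched as follows. This is an illustration under stated assumptions rather than the runtime's code: it assumes a 64-bit hash, that the low B bits select the bucket, and that tophash values below a small constant (called minTopHash here and set to 5, which is what I recall from runtime/map.go) are reserved as cell-state markers.

```go
package main

import "fmt"

// minTopHash is an assumption: smaller tophash values are reserved as markers.
const minTopHash = 5

// bucketIndex selects a bucket using the low B bits of the hash.
func bucketIndex(hash uint64, B uint8) uint64 {
	return hash & (uint64(1)<<B - 1)
}

// topHash extracts the top eight bits and shifts reserved values out of the way.
func topHash(hash uint64) uint8 {
	top := uint8(hash >> 56)
	if top < minTopHash {
		top += minTopHash
	}
	return top
}

func main() {
	hash := uint64(0xAB00000000000007)
	fmt.Println(bucketIndex(hash, 2))     // 3: the low two bits of ...0111
	fmt.Printf("%#x\n", topHash(hash))    // 0xab
	fmt.Println(topHash(uint64(1) << 56)) // 6: 1 is reserved, so it becomes 1+5
}
```

Comparing the one-byte tophash first lets a lookup skip most non-matching cells in a bucket without comparing full keys.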

mapextra

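As mentioned above, the type of bmap's overflow field depends on whether the key or value contains pointers. When neither does, overflow is a uintptr, which the GC does not count as a reference, so unnecessary scanning is avoided. But then the memory it refers to could be reclaimed by the GC. For that reason, the pointers to the overflow buckets are kept together in the mapextra struct. Let's look at its structure in detail: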

// mapextra holds fields that are not present on all maps.
type mapextra struct {
	// If both key and elem do not contain pointers and are inline, then we mark bucket
	// type as containing no pointers. This avoids scanning such maps.
	// However, bmap.overflow is a pointer. In order to keep overflow buckets
	// alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
	// overflow and oldoverflow are only used if key and elem do not contain pointers.
	// overflow contains overflow buckets for hmap.buckets.
	// oldoverflow contains overflow buckets for hmap.oldbuckets.
	// The indirection allows to store a pointer to the slice in hiter.
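	// Each field is itself a pointer to a slice of *bmap.
	// They reference all overflow buckets of the current buckets and of the old buckets, respectively.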
	overflow    *[]*bmap
	oldoverflow *[]*bmap
  // nextOverflow holds a pointer to a free overflow bucket.
	// Points into the preallocated overflow buckets, at the next free one
	nextOverflow *bmap
}

Creating a map

Three functions are involved in creating a map:

makemap_small() *hmap
makemap64() *hmap
makemap() *hmap

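The first one handles small maps. According to the source comment below, makemap_small() is used for make(map[k]v), and for make(map[k]v, hint) when hint is known at compile time to be at most bucketCnt (8 in this version of the runtime) and the map has to be allocated on the heap. It only sets up the hmap; the hash buckets that actually store the data are allocated later, when they are first needed: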

// makemap_small implements Go map creation for make(map[k]v) and
// make(map[k]v, hint) when hint is known to be at most bucketCnt
// at compile time and the map needs to be allocated on the heap.
func makemap_small() *hmap {
	// Only the hmap itself is allocated here; no buckets are allocated
	h := new(hmap)
	h.hash0 = fastrand()
	return h
}

makemap64, in turn, simply calls makemap() to do the allocation:

func makemap64(t *maptype, hint int64, h *hmap) *hmap {
	if int64(int(hint)) != hint {
		hint = 0
	}
	return makemap(t, int(hint), h)
}

Let's look at the implementation of makemap() in detail:

// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.
func makemap(t *maptype, hint int, h *hmap) *hmap {
	mem, overflow := math.MulUintptr(uintptr(hint), t.Bucket.Size_)
	if overflow || mem > maxAlloc {
		hint = 0
	}

	// initialize Hmap
	// allocate memory for the hmap
	if h == nil {
		h = new(hmap)
	}
	// initialize the hash seed
	h.hash0 = fastrand()

	// Find the size parameter B which will hold the requested # of elements.
	// For hint < 0 overLoadFactor returns false since hint < bucketCnt.
	B := uint8(0)
	// Increase B until 2^B buckets can hold hint elements within the load factor
	for overLoadFactor(hint, B) {
		B++
	}
	h.B = B

	// allocate initial hash table
	// if B == 0, the buckets field is allocated lazily later (in mapassign)
	// If hint is large zeroing this memory could take a while.
	if h.B != 0 {
		var nextOverflow *bmap
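		// allocate the regular buckets and the preallocated overflow buckets
		h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
		// the preallocated overflow buckets are recorded in hmap's extra field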
		if nextOverflow != nil {
			h.extra = new(mapextra)
			h.extra.nextOverflow = nextOverflow
		}
	}

	return h
}
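The loop that picks B is easier to follow with concrete numbers. The sketch below re-creates overLoadFactor as I understand it from the runtime of this era; the constants (8 cells per bucket, a load factor of 13/2 = 6.5) and the helper pickB are assumptions of the example, not quotations from the excerpt above.

```go
package main

import "fmt"

// Assumed constants: 8 cells per bucket and a load factor of 13/2 = 6.5.
const (
	bucketCnt     = 8
	loadFactorNum = 13
	loadFactorDen = 2
)

// overLoadFactor reports whether count items in 1<<B buckets exceed the load factor.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uintptr(count) > loadFactorNum*((uintptr(1)<<B)/loadFactorDen)
}

// pickB repeats the loop from makemap: grow B until the hint fits.
func pickB(hint int) uint8 {
	B := uint8(0)
	for overLoadFactor(hint, B) {
		B++
	}
	return B
}

func main() {
	for _, hint := range []int{0, 8, 9, 14, 100} {
		B := pickB(hint)
		fmt.Printf("hint=%d -> B=%d (%d buckets)\n", hint, B, 1<<B)
	}
}
```

Under these assumptions a hint of up to 8 keeps B at 0, which matches the comment in makemap that the single bucket is then allocated lazily.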

Growing

Go distinguishes two kinds of map growth:

Same-size growth
Doubling growth

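The load factor depends on the number of buckets (that is, on B). If the load factor is too high, B is too small, so B is incremented by 1 and the number of buckets doubles. This is doubling growth.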

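Otherwise, there may simply be too many overflow buckets even though the map does not hold that much data, which hurts lookup performance. This happens, for example, when many entries are inserted and then deleted, leaving a lot of overflow buckets behind. In that case it is enough to compact the sparse data while keeping the number of buckets unchanged. This is same-size growth. As in Redis, growth is incremental: the migration of data is spread across subsequent accesses to the map.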
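The choice between the two kinds of growth can be sketched as a small decision function. The helper growKind is hypothetical, and the two predicates are re-created from my reading of the runtime of this era (load factor 6.5, and "too many" meaning roughly as many overflow buckets as regular buckets), so treat the thresholds as assumptions.

```go
package main

import "fmt"

// Assumed constants: 8 cells per bucket and a load factor of 13/2 = 6.5.
const (
	bucketCnt     = 8
	loadFactorNum = 13
	loadFactorDen = 2
)

// overLoadFactor reports whether count items in 1<<B buckets exceed the load factor.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uintptr(count) > loadFactorNum*((uintptr(1)<<B)/loadFactorDen)
}

// tooManyOverflowBuckets reports whether there are about as many
// overflow buckets as regular buckets (capped at 2^15).
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
	if B > 15 {
		B = 15
	}
	return noverflow >= uint16(1)<<(B&15)
}

// growKind decides what an insert into a map with the given state would trigger.
func growKind(count int, B uint8, noverflow uint16) string {
	switch {
	case overLoadFactor(count+1, B):
		return "double" // B+1, twice as many buckets
	case tooManyOverflowBuckets(noverflow, B):
		return "same-size" // same B, entries are compacted
	default:
		return "none"
	}
}

func main() {
	fmt.Println(growKind(13, 1, 0)) // double: 14 items exceed 6.5 * 2 buckets
	fmt.Println(growKind(5, 1, 2))  // same-size: few items, but 2 overflow buckets for 2 buckets
	fmt.Println(growKind(5, 1, 0))  // none
}
```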

How to debug functions in the runtime

Use go build -gcflags -S hello.go to view the generated assembly and find out which runtime function is called
Use dlv to debug

dlv debug hello.go

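// set a breakpoint at the entry of the main package
b main.main

// set a breakpoint in the called function found in the assembly output
b /usr/local/go/src/runtime/map.go:306

// common commands
r  // restart the program
n  // step to the next line
print xx  // print the value of a variable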
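For completeness, here is a hypothetical hello.go to try these commands on. It is not the author's original file. It passes a non-constant hint to make, which, for the runtime version discussed here, I would expect to show up in the assembly as a call to runtime.makemap; check the -S output on your own Go version before setting the breakpoint.

```go
// hello.go: a hypothetical program to step through with dlv.
package main

import "fmt"

// buildMap creates a map with a size hint and fills it with n entries.
func buildMap(n int) map[int]string {
	m := make(map[int]string, n)
	for i := 0; i < n; i++ {
		m[i] = fmt.Sprint(i)
	}
	return m
}

func main() {
	m := buildMap(100)
	fmt.Println(len(m)) // 100
}
```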


Difference between make and new

new in Go does two things:

Allocates memory
Initializes the variable to its zero value

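It then returns a pointer to the corresponding type. Unlike C++, Go has no concept of constructors, so new does not call a constructor to initialize the object after allocating memory the way C++ does.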

make also allocates memory, but unlike new, it is only used for

chan
map
slice

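From the analysis above, these three types are already reference types, so make returns the type itself rather than a pointer to it (there is no need for an extra level of indirection); the value internally contains a pointer to where its data is actually stored. Besides the type argument, make also accepts optional integer arguments. Take slice as an example: internally it holds a pointer to the stored data plus the len and cap fields, and we can pass those two values to make to specify its length and capacity. For example: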

// For a plain int, allocate with new and then assign; this is much like v := 1; a := &v
a := new(int)
*a = 1
fmt.Println(*a)
// For a reference type such as a slice, new only allocates and zeroes the header (the data pointer and the two fields)
b := new([]int)
fmt.Printf("len(*b) = %d, cap(*b) = %d\n", len(*b), cap(*b))
*b = append(*b, 77)
fmt.Printf("len(*b) = %d, cap(*b) = %d\n", len(*b), cap(*b))
// make can additionally allocate the backing memory according to cap, somewhat like the difference between a shallow and a deep copy
c := make([]int, 0)
fmt.Printf("len(c) = %d, cap(c) = %d\n", len(c), cap(c))
c = append(c, 1)
fmt.Printf("len(c) = %d, cap(c) = %d\n", len(c), cap(c))

(Figure: memory allocated by new versus make. The figure is not entirely accurate; in practice new([]int) is almost equivalent to make([]int, 0).)
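The gap between the two is wider for maps than for slices, which the following sketch tries to show. The helper writePanics is made up for the example; it relies only on two language facts I am sure of: new returns a pointer to a zero value (a nil map or nil slice here), and storing into a nil map panics while appending to a nil slice does not.

```go
package main

import "fmt"

// writePanics reports whether storing into m panics.
func writePanics(m map[string]int) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	m["k"] = 1
	return false
}

func main() {
	p := new(map[string]int) // pointer to a nil map
	m := make(map[string]int)
	fmt.Println(*p == nil, m == nil) // true false
	fmt.Println(writePanics(*p))     // true: assignment to entry in nil map
	fmt.Println(writePanics(m))      // false

	s := new([]int) // pointer to a nil slice; append still works
	*s = append(*s, 77)
	fmt.Println(*s) // [77]
}
```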

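In my humble opinion, the main reason for having two separate built-in functions is that Go has no concept of constructors, so we cannot control what construction happens after new has allocated the memory.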

Implementation of basic data structures in Go