阅读 39

golang map源码剖析

具体源码在 runtime/map.go下面,有1300+行代码

golang 的map底层使用哈希表实现的,源码中对应是一个结构体,具体如下

// A header for a Go map.
type hmap struct {
	// Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
	// Make sure this stays in sync with the compiler's definition.
	count     int // # live cells == size of map.  Must be first (used by len() builtin)  map中元素的个数
	flags     uint8  //标识当前map是在读或者写的状态,每次操作会更新这个值
	B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)  桶的个数相关参数,2^B 是桶的个数
	noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details  溢出桶的个数
	hash0     uint32 // hash seed   hash的种子,用方法生成的随机值,每个map都不一样,所以同一个key在不同map位置也不一样
	buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.   桶数组的地址指针,指向桶连续存储地址的初始位置
	oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing  扩容时旧的桶的数组指针
	nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)  扩容搬迁进度

	extra *mapextra // optional fields  记录溢出相关信息
}
复制代码

每个map里面有若干桶,hmap里面只记录了桶的地址,找到具体桶内元素需要知道桶的结构,桶在源码中用bmap结构体表示,具体如下

// A bucket for a Go map.
type bmap struct {
    tophash [bucketCnt]uint8 # bucketCnt为8 存储哈希值的高8位
    data    byte[1]  //key value数据:key/key/key/.../value/value/value...
    overflow *bmap   //溢出bucket的地址
}
复制代码

根据哈希函数将key生成一个哈希值,其中低位哈希用来判断桶位置,高位哈希用来确定在桶中哪个cell。低位哈希就是哈希值的低B位,hmap结构体中的B,比如B为5,2^5=32,即该map有32个桶,只需要取哈希值的低5位就可以确定当前key-value落在哪个桶(bucket)中;高位哈希即tophash,是指哈希值的高8bits,根据tophash来确定key在桶中的位置。每个桶可以存储8对key-value,存储结构不是key/value/key/value...,而是key/key..value/value,这样可以避免字节对齐时的padding,节省内存空间。

当不同的key根据哈希得到的tophash和低位hash都一样,发生哈希碰撞,这个时候就体现overflow pointer字段的作用了。桶溢出时,就需要把key-value对存储在overflow bucket(溢出桶),overflow pointer就是指向overflow bucket的指针。如果overflow bucket也溢出了呢?那就再给overflow bucket新建一个overflow bucket,用指针串起来就形成了链式结构

初始化map

// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.
func makemap(t *maptype, hint int, h *hmap) *hmap {
	mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
	if overflow || mem > maxAlloc {
		hint = 0
	}

	// initialize Hmap
	if h == nil {
		h = new(hmap)
	}
	h.hash0 = fastrand()//赋随机值

	// Find the size parameter B which will hold the requested # of elements.
	// For hint < 0 overLoadFactor returns false since hint < bucketCnt.
	B := uint8(0)
	for overLoadFactor(hint, B) {
		B++
	}
	h.B = B//个数赋值

	// allocate initial hash table
	// if B == 0, the buckets field is allocated lazily later (in mapassign)
	// If hint is large zeroing this memory could take a while.
	if h.B != 0 {
		var nextOverflow *bmap
		h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
		if nextOverflow != nil {
			h.extra = new(mapextra)
			h.extra.nextOverflow = nextOverflow
		}
	}

	return h
}
复制代码

查找map中的元素

// mapaccess1 returns a pointer to h[key].  Never returns nil, instead
// it will return a reference to the zero object for the elem type if
// the key is not in the map.
// NOTE: The returned pointer may keep the whole map live, so don't
// hold onto it for very long.
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	if raceenabled && h != nil {
		callerpc := getcallerpc()
		pc := funcPC(mapaccess1)
		racereadpc(unsafe.Pointer(h), callerpc, pc)
		raceReadObjectPC(t.key, key, callerpc, pc)
	}
	if msanenabled && h != nil {
		msanread(key, t.key.size)
	}
	if h == nil || h.count == 0 {
		if t.hashMightPanic() {
			t.hasher(key, 0) // see issue 23734
		}
		return unsafe.Pointer(&zeroVal[0])
	}
	if h.flags&hashWriting != 0 {//若干目前标记是写,报panic不能并发读写
		throw("concurrent map read and map write")
	}
	hash := t.hasher(key, uintptr(h.hash0))//计算key的哈希值
	m := bucketMask(h.B)//计算桶链首地址
	b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
	if c := h.oldbuckets; c != nil {
		if !h.sameSizeGrow() {
			// There used to be half as many buckets; mask down one more power of two.
			m >>= 1
		}
		oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
		if !evacuated(oldb) {
			b = oldb
		}
	}
	top := tophash(hash)// 计算tophash值
bucketloop:
	for ; b != nil; b = b.overflow(t) {//遍历每一个桶
		for i := uintptr(0); i < bucketCnt; i++ {//根据高8位找在桶内的元素
			if b.tophash[i] != top {//tophash有8个元素,存储的key高8位信息
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			if t.key.equal(key, k) {//最后的判断key相等
				e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				if t.indirectelem() {
					e = *((*unsafe.Pointer)(e))
				}
				return e//找到直接返回
			}
		}
	}
	return unsafe.Pointer(&zeroVal[0])//没找到,返回对应类型的零值
}

复制代码

map的查找有两种用法,底层方法都是用的上面的mapaccess1,在mapaccess1返回参数的基础上在加一些判断,返回参数是一个,如下

m := make(map[string]int)

print(m["1"]) -- 结果为: 0 (返回int对应的默认值)
复制代码

对应的源码,很简单,直接调用mapaccess1然后返回

func mapaccess1_fat(t *maptype, h *hmap, key, zero unsafe.Pointer) unsafe.Pointer {
	e := mapaccess1(t, h, key)
	if e == unsafe.Pointer(&zeroVal[0]) {
		return zero
	}
	return e
}
复制代码

还有一种返回参数是两个,第二个参数标识map是否存在这个key,不存在,第一个参数就是默认零值

m := make(map[string]int)

val, ok := m["1"]
print(val, ok)  -- 结果为:0, false
复制代码

对应的源码,相比第一种就是判断如果是默认零值就多返回一个参数false,否则返回true

func mapaccess2_fat(t *maptype, h *hmap, key, zero unsafe.Pointer) (unsafe.Pointer, bool) {
	e := mapaccess1(t, h, key)
	if e == unsafe.Pointer(&zeroVal[0]) {
		return zero, false
	}
	return e, true
}
复制代码

上面两种用法的源码mapaccess1_fat,和mapaccess2_fat是我猜测的,看源码感觉应该是这样的,没有验证过

map还有一种用发,使用for range遍历map元素,这里有一个考点是map乱序,1.map底层散列表在扩容场景下会重新散列,遍历顺序有变化 2.go设计者为了不让用户不依赖,每次遍历的桶的起点是不一样的

map 扩容策略
负载因子 = 键数量/bucket数量
哈希因子过小,说明空间利用率低
哈希因子过大,说明冲突严重,存取效率低

参考 www.bilibili.com/read/cv5588… blog.csdn.net/fengshenyun…

文章分类
后端
文章标签