Maps in Go
A map is an unordered collection of key/value pairs; each entry consists of a key and a value.
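As a quick illustration of the definition above, the following minimal sketch inserts, deletes, and looks up entries. The map name `ages` and the helper `sortedKeys` are hypothetical names chosen for this example; the keys are sorted before printing because map iteration order is not specified.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys (hypothetical helper) returns the keys of m in ascending
// order, because ranging over a map yields keys in unspecified order.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	ages := map[string]int{"alice": 30, "bob": 25}
	ages["carol"] = 41  // insert a new key
	delete(ages, "bob") // remove a key

	// The two-value form reports whether the key is present.
	if v, ok := ages["bob"]; !ok {
		fmt.Println("bob not found, zero value:", v)
	}
	for _, k := range sortedKeys(ages) {
		fmt.Println(k, ages[k])
	}
}
```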
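How map is implemented
Go implements map as a hash table. At runtime a map value is a pointer to an hmap struct (this describes the classic runtime/map.go implementation that predates the Swiss-table design introduced in Go 1.24):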
// A header for a Go map.
type hmap struct {
	// Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
	// Make sure this stays in sync with the compiler's definition.
	count     int // # live cells == size of map. Must be first (used by len() builtin)
	flags     uint8
	B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
	noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
	hash0     uint32 // hash seed
	buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
	oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
	nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)
	extra *mapextra // optional fields
}
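- count: the number of key/value pairs in the map; this is the value len() returns
- flags: status bits, for example whether a write is in progress (used to detect concurrent map writes) or whether an iterator is active
- B: the base-2 logarithm of the bucket count, so the map has 2^B buckets
- noverflow: the approximate number of overflow buckets
- hash0: the random hash seed
- buckets: pointer to the bucket array, which has 2^B elements; may be nil when count is 0
- oldbuckets: pointer to the previous bucket array; non-nil only while the map is growing
- nevacuate: evacuation progress counter; every old bucket with a lower index has already been evacuated, so this is the next old bucket to migrate
- extra: pointer to a mapextra struct that records overflow-bucket information (optional)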
Each bucket is a bmap struct and stores at most 8 key/value pairs:
// A bucket for a Go map.
type bmap struct {
	// tophash generally contains the top byte of the hash value
	// for each key in this bucket. If tophash[0] < minTopHash,
	// tophash[0] is a bucket evacuation state instead.
	tophash [bucketCnt]uint8
	// Followed by bucketCnt keys and then bucketCnt elems.
	// NOTE: packing all the keys together and then all the elems together makes the
	// code a bit more complicated than alternating key/elem/key/elem/... but it allows
	// us to eliminate padding which would be needed for, e.g., map[int64]int8.
	// Followed by an overflow pointer.
}
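- tophash: an array of length 8 (bucketCnt). When an entry is stored, the top 8 bits of the key's hash are saved in the matching slot of this array. Values below minTopHash are reserved as state markers rather than hash bytes; in particular, a tophash[0] below minTopHash records the evacuation state of the bucket. The reserved values are: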
emptyRest = 0 // this cell is empty, and there are no more non-empty cells at higher indexes or overflows.
emptyOne = 1 // this cell is empty
evacuatedX = 2 // key/elem is valid. Entry has been evacuated to first half of larger table.
evacuatedY = 3 // same as above, but evacuated to second half of larger table.
evacuatedEmpty = 4 // cell is empty, bucket is evacuated.
minTopHash = 5 // minimum tophash for a normal filled cell.
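- overflow: once all 8 slots of the bucket are full, overflow points to an overflow bucket, which is another bmap
Note: to keep the top 8 bits of a key's hash from being confused with these state values, the tophash function adds minTopHash to the top byte whenever it is less than minTopHash: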
func tophash(hash uintptr) uint8 {
	top := uint8(hash >> (goarch.PtrSize*8 - 8))
	if top < minTopHash {
		top += minTopHash
	}
	return top
}
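To see the adjustment in action, here is a standalone sketch of the same function. It assumes a 64-bit platform, so the hypothetical constant `ptrSize` stands in for `goarch.PtrSize`, and the hash is taken as a `uint64` instead of `uintptr`; the sample hash values are made up for illustration.

```go
package main

import "fmt"

const (
	ptrSize    = 8 // assumption: 64-bit platform (stands in for goarch.PtrSize)
	minTopHash = 5 // smallest tophash value for a normally filled cell
)

// tophash extracts the top byte of the hash and shifts it out of the
// reserved range [0, minTopHash) so it cannot be mistaken for a state marker.
func tophash(hash uint64) uint8 {
	top := uint8(hash >> (ptrSize*8 - 8))
	if top < minTopHash {
		top += minTopHash
	}
	return top
}

func main() {
	// Top byte 0xAB (171) is already >= minTopHash and is kept as is.
	fmt.Println(tophash(0xAB00000000000000))
	// Top byte 2 would collide with evacuatedX, so it becomes 2 + 5 = 7.
	fmt.Println(tophash(0x0200000000000000))
}
```

One consequence of this design is that a few distinct top bytes share a tophash value (for example 2 and 7), which is harmless because tophash is only a fast pre-filter before the full key comparison.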
mapextra
Information about overflow buckets is recorded in the mapextra struct:
type mapextra struct {
	// If both key and elem do not contain pointers and are inline, then we mark bucket
	// type as containing no pointers. This avoids scanning such maps.
	// However, bmap.overflow is a pointer. In order to keep overflow buckets
	// alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
	// overflow and oldoverflow are only used if key and elem do not contain pointers.
	// overflow contains overflow buckets for hmap.buckets.
	// oldoverflow contains overflow buckets for hmap.oldbuckets.
	// The indirection allows to store a pointer to the slice in hiter.
	overflow    *[]*bmap
	oldoverflow *[]*bmap
	// nextOverflow holds a pointer to a free overflow bucket.
	nextOverflow *bmap
}
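- overflow: addresses of the overflow buckets currently used by hmap.buckets (only used when neither keys nor elems contain pointers)
- oldoverflow: addresses of the overflow buckets used by the old buckets (hmap.oldbuckets) during growth
- nextOverflow: pointer to the next free overflow bucket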
As shown in the figure:
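How map growth works
Incremental growth
Go grows a map incrementally. Once growth starts, the existing entries are not moved all at once. Instead, each insert, update, or delete evacuates at most two old buckets into the new bucket array. When every old bucket has been evacuated, oldbuckets is set back to nil, so checking whether oldbuckets is nil tells whether the migration has finished.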
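The idea can be sketched with a toy model. Everything below is a simplification invented for illustration, not the runtime's code: `toyMap`, `newToyMap`, `startGrow`, `growWork`, and `evacuate` are hypothetical names, buckets are plain slices of hash values, and an `evacuated` flag slice replaces the tophash state markers. Each write evacuates the old bucket it is about to touch plus the bucket at `nevacuate`.

```go
package main

import "fmt"

// toyMap is a hypothetical, heavily simplified stand-in for hmap.
type toyMap struct {
	buckets    [][]uint64 // current bucket array (stores raw hashes)
	oldbuckets [][]uint64 // non-nil only while growing
	evacuated  []bool     // which old buckets have been moved
	nevacuate  int        // next old bucket that still needs moving
}

func newToyMap(n int) *toyMap { return &toyMap{buckets: make([][]uint64, n)} }

func (m *toyMap) growing() bool { return m.oldbuckets != nil }

// startGrow doubles the bucket array but moves nothing yet.
func (m *toyMap) startGrow() {
	m.oldbuckets = m.buckets
	m.buckets = make([][]uint64, 2*len(m.oldbuckets))
	m.evacuated = make([]bool, len(m.oldbuckets))
	m.nevacuate = 0
}

// evacuate moves one old bucket and advances the progress counter.
func (m *toyMap) evacuate(i int) {
	if m.evacuated[i] {
		return
	}
	mask := uint64(len(m.buckets) - 1)
	for _, h := range m.oldbuckets[i] {
		m.buckets[h&mask] = append(m.buckets[h&mask], h)
	}
	m.evacuated[i] = true
	for m.nevacuate < len(m.oldbuckets) && m.evacuated[m.nevacuate] {
		m.nevacuate++
	}
	if m.nevacuate == len(m.oldbuckets) {
		m.oldbuckets = nil // all old buckets moved: growth is finished
	}
}

// growWork evacuates at most two old buckets per write.
func (m *toyMap) growWork(bucket int) {
	m.evacuate(bucket & (len(m.oldbuckets) - 1))
	if m.growing() {
		m.evacuate(m.nevacuate)
	}
}

func (m *toyMap) insert(h uint64) {
	if m.growing() {
		m.growWork(int(h & uint64(len(m.buckets)-1)))
	}
	i := h & uint64(len(m.buckets)-1)
	m.buckets[i] = append(m.buckets[i], h)
}

func main() {
	m := newToyMap(4)
	for h := uint64(0); h < 8; h++ {
		m.insert(h)
	}
	m.startGrow()
	for h := uint64(8); m.growing(); h++ {
		m.insert(h)
		fmt.Println("inserted", h, "nevacuate:", m.nevacuate, "growing:", m.growing())
	}
}
```

Spreading the work across writes keeps the cost of any single operation bounded, at the price of lookups having to consider the old bucket array while growth is in progress.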
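Doubling growth
Doubling growth is triggered when the load factor (count / number of buckets) exceeds 6.5. An entry's bucket is selected by ANDing its hash with n-1, where n is the bucket count, and the same operation determines where an old bucket's entries land in the new array. Suppose there are 4 old buckets, so there are 8 new buckets after growth. Every hash in old bucket 0 must end in 00 (xxxxxx00), because n-1 = 3 = 00000011. In the new array those entries can only land in bucket 0 or bucket 4, because n-1 = 7 = 00000111. If the third-lowest bit of the hash is 1, then 00000111 & xxxxx100 = 00000100 = 4; if it is 0, then 00000111 & xxxxx000 = 00000000 = 0.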
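The bit arithmetic above can be checked with a short sketch. `bucketIndex` is a hypothetical helper written for this example, and the sample hashes are arbitrary values whose two lowest bits are 00, so they all sit in old bucket 0.

```go
package main

import "fmt"

// bucketIndex (hypothetical helper) returns hash & (2^B - 1), i.e. the
// low B bits of the hash, which select the bucket.
func bucketIndex(hash uint64, B uint8) uint64 {
	return hash & (uint64(1)<<B - 1)
}

func main() {
	const oldB, newB = 2, 3 // 4 old buckets, 8 new buckets

	// All of these hashes end in 00, so they share old bucket 0.
	// After doubling, bit 2 decides between new bucket 0 and new bucket 4.
	for _, h := range []uint64{0b0000, 0b0100, 0b1000, 0b1100} {
		fmt.Printf("hash=%04b old=%d new=%d\n",
			h, bucketIndex(h, oldB), bucketIndex(h, newB))
	}
}
```

In general, an entry in old bucket i moves either to new bucket i or to new bucket i + (old bucket count), depending on the single extra hash bit that the larger mask exposes.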
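Same-size growth
When the load factor does not exceed 6.5 but there are too many overflow buckets, same-size growth is triggered: a new bucket array with the same number of buckets is created and the existing key/value pairs are migrated into it. "Too many" means that the number of overflow buckets is at least 2^B when there are at most 2^15 regular buckets, or at least 2^15 when there are more than 2^15 regular buckets, that is:
B <= 15 and noverflow >= 2^B, or B > 15 and noverflow >= 2^15
This usually happens after many key/value pairs have been deleted, which leaves a large number of overflow buckets even though the load factor is low. Migrating the same entries into new buckets repacks the sparse key/value pairs more compactly, which keeps lookups and inserts fast. That is the purpose of same-size growth.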
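The two growth triggers described above can be sketched as a pair of predicates. This is modeled loosely on the runtime's checks rather than copied from it: the function names and the integer form of 6.5 (13/2) are choices made for this sketch, and the `count > bucketCnt` guard, which keeps a map that still fits in one bucket from growing, is an added assumption not stated in the text above.

```go
package main

import "fmt"

const bucketCnt = 8 // entries per bucket

// overLoadFactor reports whether count entries in 2^B buckets exceed
// the 6.5 load factor (written as 13/2 to stay in integer arithmetic).
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uint64(count) > 13*(uint64(1)<<B)/2
}

// tooManyOverflowBuckets applies the rule from the text:
// B <= 15: noverflow >= 2^B; B > 15: noverflow >= 2^15.
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
	if B > 15 {
		B = 15
	}
	return noverflow >= uint16(1)<<B
}

func main() {
	// 4 buckets (B = 2) hold up to 26 entries before doubling.
	fmt.Println("double at 26 entries:", overLoadFactor(26, 2))
	fmt.Println("double at 27 entries:", overLoadFactor(27, 2))

	// Low load factor but as many overflow buckets as regular ones.
	fmt.Println("same-size growth:", !overLoadFactor(10, 2) && tooManyOverflowBuckets(4, 2))
}
```

Capping the threshold at 2^15 matches the fact that noverflow is a uint16 and only an approximate count for large maps.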