The underlying implementation and growth mechanism of map in Go


Maps in Go

A map is an unordered collection of key/value pairs; each entry consists of a key and a value.
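
As a quick illustration of "unordered", the sketch below performs the basic map operations and sorts the keys before printing, because the iteration order of a Go map is unspecified. The helper name sortedKeys and the sample data are made up for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of m in ascending order. Iteration order over
// a Go map is unspecified, so callers that need a stable order must sort.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	ages := map[string]int{"carol": 41, "alice": 30, "bob": 25}
	ages["dave"] = 19     // insert
	delete(ages, "carol") // delete

	if age, ok := ages["alice"]; ok { // lookup with a presence check
		fmt.Println("alice:", age)
	}
	fmt.Println(len(ages), sortedKeys(ages))
}
```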

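Underlying implementation of map

Go implements map as a hash table. Under the hood, a map value is a pointer to an hmap struct (the layout shown here is the bucket-based runtime implementation; Go 1.24 replaced it with a Swiss-table design):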

// A header for a Go map.
type hmap struct {
   // Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
   // Make sure this stays in sync with the compiler's definition.
   count     int // # live cells == size of map.  Must be first (used by len() builtin)
   flags     uint8
   B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
   noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
   hash0     uint32 // hash seed

   buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
   oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
   nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

   extra *mapextra // optional fields
}
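  • count: the number of key/value pairs in the map
  • flags: state bits, including whether the map is currently being written to
  • B: log2 of the number of buckets; the map has 2^B buckets
  • noverflow: the approximate number of overflow buckets
  • hash0: the random hash seed
  • buckets: pointer to the bucket array, which holds 2^B buckets
  • oldbuckets: pointer to the previous bucket array during growth; non-nil only while the map is growing
  • nevacuate: during growth, the index of the next old bucket to evacuate
  • extra: pointer to a mapextra, which records information about overflow buckets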
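
To make B and flags more concrete, here is a minimal sketch that derives the bucket count from B, selects a bucket with the low B bits of a hash, and tests the write flag. The header type and its methods are hypothetical names for this example; the value 4 for hashWriting follows the runtime source and should be treated as an assumption if your Go version differs.

```go
package main

import "fmt"

const hashWriting = 4 // flag bit: a goroutine is writing to the map (value assumed from the runtime source)

// header keeps only the hmap fields this sketch needs.
type header struct {
	count int
	flags uint8
	B     uint8
}

// bucketCount returns the number of regular buckets, 2^B.
func (h *header) bucketCount() uintptr { return uintptr(1) << h.B }

// bucketIndex selects a bucket using the low B bits of the hash.
func (h *header) bucketIndex(hash uintptr) uintptr { return hash & (h.bucketCount() - 1) }

// writing reports whether the write flag is set.
func (h *header) writing() bool { return h.flags&hashWriting != 0 }

func main() {
	h := &header{B: 3}
	fmt.Println(h.bucketCount(), h.bucketIndex(0b1011_0110), h.writing())

	h.flags |= hashWriting // mark the map as being written to
	fmt.Println(h.writing())
}
```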

Each bucket is a bmap structure and stores at most 8 key/value pairs:

// A bucket for a Go map.
type bmap struct {
   // tophash generally contains the top byte of the hash value
   // for each key in this bucket. If tophash[0] < minTopHash,
   // tophash[0] is a bucket evacuation state instead.
   tophash [bucketCnt]uint8
   // Followed by bucketCnt keys and then bucketCnt elems.
   // NOTE: packing all the keys together and then all the elems together makes the
   // code a bit more complicated than alternating key/elem/key/elem/... but it allows
   // us to eliminate padding which would be needed for, e.g., map[int64]int8.
   // Followed by an overflow pointer.
}
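  • tophash: an array of length 8. When an entry is stored in the map, the top 8 bits of the key's hash are saved in this array. A value below minTopHash is not a hash byte but a state marker; in particular, if tophash[0] is less than minTopHash it describes the evacuation state of the bucket. The state values are: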
emptyRest      = 0 // this cell is empty, and there are no more non-empty cells at higher indexes or overflows.
emptyOne       = 1 // this cell is empty
evacuatedX     = 2 // key/elem is valid.  Entry has been evacuated to first half of larger table.
evacuatedY     = 3 // same as above, but evacuated to second half of larger table.
evacuatedEmpty = 4 // cell is empty, bucket is evacuated.
minTopHash     = 5 // minimum tophash for a normal filled cell.
  • overflow: once all 8 slots of the current bucket are full, overflow points to an overflow bucket, which is itself a bmap
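
The 8-slot bucket with an overflow pointer can be sketched as below. This is a deliberately simplified model, not the runtime layout: the bucket type, its fields, and the put/get/chainLen methods are invented for illustration, keys are strings, tophash bytes are left out, and deletion is not modeled.

```go
package main

import "fmt"

const bucketCnt = 8 // slots per bucket, as in the runtime

// bucket is a simplified stand-in for bmap: fixed-size key and value arrays
// plus a pointer to an overflow bucket.
type bucket struct {
	used     int
	keys     [bucketCnt]string
	vals     [bucketCnt]int
	overflow *bucket
}

// put updates key if it is already in the chain; otherwise it stores the
// pair in the first free slot, allocating an overflow bucket when every
// bucket in the chain is full. Slots fill in order because nothing is deleted.
func (b *bucket) put(key string, val int) {
	for cur := b; ; cur = cur.overflow {
		for i := 0; i < cur.used; i++ {
			if cur.keys[i] == key {
				cur.vals[i] = val
				return
			}
		}
		if cur.used < bucketCnt {
			cur.keys[cur.used], cur.vals[cur.used] = key, val
			cur.used++
			return
		}
		if cur.overflow == nil {
			cur.overflow = &bucket{}
		}
	}
}

// get walks the bucket and its overflow chain looking for key.
func (b *bucket) get(key string) (int, bool) {
	for cur := b; cur != nil; cur = cur.overflow {
		for i := 0; i < cur.used; i++ {
			if cur.keys[i] == key {
				return cur.vals[i], true
			}
		}
	}
	return 0, false
}

// chainLen counts the bucket itself plus its overflow buckets.
func (b *bucket) chainLen() int {
	n := 0
	for cur := b; cur != nil; cur = cur.overflow {
		n++
	}
	return n
}

func main() {
	b := &bucket{}
	for i := 0; i < 10; i++ {
		b.put(fmt.Sprintf("k%d", i), i)
	}
	v, ok := b.get("k9")
	fmt.Println(b.chainLen(), v, ok)
}
```

Ten keys do not fit into one 8-slot bucket, so the ninth insertion allocates an overflow bucket and the chain grows to two buckets.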


Note: to avoid ambiguity between the top 8 bits of a key's hash and the state values, the tophash function adds minTopHash to the top byte whenever that byte is less than minTopHash.

func tophash(hash uintptr) uint8 {
   top := uint8(hash >> (goarch.PtrSize*8 - 8))
   if top < minTopHash {
      top += minTopHash
   }
   return top
}
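
The following standalone sketch shows the effect of that adjustment on a few sample hashes. It assumes a 64-bit platform (ptrSize = 8 stands in for goarch.PtrSize) and uses the name topHash and made-up hash values for illustration.

```go
package main

import "fmt"

const (
	minTopHash = 5 // minimum tophash for a normal filled cell
	ptrSize    = 8 // assumption: 64-bit platform (goarch.PtrSize in the runtime)
)

// topHash mirrors the runtime's tophash for a 64-bit hash: take the top
// byte and shift it out of the range reserved for state values.
func topHash(hash uint64) uint8 {
	top := uint8(hash >> (ptrSize*8 - 8))
	if top < minTopHash {
		top += minTopHash
	}
	return top
}

func main() {
	hashes := []uint64{
		0xAB00_0000_0000_1234, // top byte 0xAB is kept as is
		0x0200_0000_0000_0000, // top byte 2 collides with evacuatedX, so it is shifted
		0x0000_0000_0000_FFFF, // top byte 0 collides with emptyRest, so it is shifted
	}
	for _, h := range hashes {
		fmt.Printf("hash=%#016x tophash=%d\n", h, topHash(h))
	}
}
```

A side effect of this scheme is that stored tophash values 5 through 9 each cover two possible top bytes (for example, top bytes 0 and 5 both map to 5), which is harmless because tophash is only a quick filter applied before the full key comparison.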

mapextra

Information about overflow buckets is recorded in the mapextra struct:

type mapextra struct {
   // If both key and elem do not contain pointers and are inline, then we mark bucket
   // type as containing no pointers. This avoids scanning such maps.
   // However, bmap.overflow is a pointer. In order to keep overflow buckets
   // alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
   // overflow and oldoverflow are only used if key and elem do not contain pointers.
   // overflow contains overflow buckets for hmap.buckets.
   // oldoverflow contains overflow buckets for hmap.oldbuckets.
   // The indirection allows to store a pointer to the slice in hiter.
   overflow    *[]*bmap
   oldoverflow *[]*bmap

   // nextOverflow holds a pointer to a free overflow bucket.
   nextOverflow *bmap
}
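  • overflow: addresses of the overflow buckets currently used by buckets (only used when neither key nor elem contains pointers)
  • oldoverflow: addresses of the overflow buckets used by the old buckets during growth
  • nextOverflow: pointer to the next free overflow bucket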


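How map growth works

Incremental growth

Go maps grow incrementally. Once a growth has started, the keys are not moved all at once. Instead, every insert, update, or delete migrates at most two old buckets into the new bucket array. When all old buckets have been evacuated, oldbuckets is set to nil, so checking whether oldbuckets is nil tells whether a growth is still in progress.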
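
The incremental scheme can be sketched with a toy table. In the runtime, each write during growth evacuates the old bucket that the key maps to, plus the bucket at nevacuate; the model below copies that idea only. The table type, its fields other than oldbuckets and nevacuate, and all function names are invented for this example, keys are small integers that act as their own hash, and lookups in the old array during growth are not modeled.

```go
package main

import "fmt"

// table is a toy model of a growing map, not the runtime's data layout.
type table struct {
	buckets    [][]uint
	oldbuckets [][]uint // non-nil only while growing
	evacuated  []bool   // per old bucket: has it been migrated?
	nevacuate  int      // every old bucket below this index has been evacuated
}

func newTable(n int) *table { return &table{buckets: make([][]uint, n)} }

func (t *table) growing() bool { return t.oldbuckets != nil }

// startGrow doubles the bucket array and keeps the old one for lazy migration.
func (t *table) startGrow() {
	t.oldbuckets = t.buckets
	t.buckets = make([][]uint, 2*len(t.oldbuckets))
	t.evacuated = make([]bool, len(t.oldbuckets))
	t.nevacuate = 0
}

// evacuate moves every key of one old bucket into the new bucket array and
// finishes the growth once no old bucket is left.
func (t *table) evacuate(old int) {
	if t.evacuated[old] {
		return
	}
	mask := uint(len(t.buckets) - 1)
	for _, k := range t.oldbuckets[old] {
		t.buckets[k&mask] = append(t.buckets[k&mask], k)
	}
	t.oldbuckets[old] = nil
	t.evacuated[old] = true
	for t.nevacuate < len(t.evacuated) && t.evacuated[t.nevacuate] {
		t.nevacuate++
	}
	if t.nevacuate == len(t.evacuated) {
		t.oldbuckets, t.evacuated = nil, nil // growth finished
	}
}

// growWork migrates at most two old buckets: the one the key maps to, and
// the next one that has not been evacuated yet.
func (t *table) growWork(key uint) {
	t.evacuate(int(key & uint(len(t.oldbuckets)-1)))
	if t.growing() {
		t.evacuate(t.nevacuate)
	}
}

// insert does a share of the migration first, then stores the key.
func (t *table) insert(key uint) {
	if t.growing() {
		t.growWork(key)
	}
	i := key & uint(len(t.buckets)-1)
	t.buckets[i] = append(t.buckets[i], key)
}

func main() {
	t := newTable(4)
	for k := uint(0); k < 12; k++ {
		t.insert(k)
	}
	t.startGrow()
	for k := uint(12); t.growing(); k++ {
		t.insert(k)
		fmt.Printf("inserted %d, old buckets evacuated: %d, still growing: %v\n",
			k, t.nevacuate, t.growing())
	}
}
```

With 4 old buckets, the first write after startGrow migrates two buckets and each later write migrates one more, so the growth finishes after three writes in this run.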

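Doubling growth

When the load factor (count / number of buckets) exceeds 6.5, the map doubles its bucket array. An element's bucket is determined by hash & (n-1), where n is the number of buckets. Suppose there are 4 old buckets, so there are 8 new ones. Every hash stored in old bucket 0 has the form xxxxxx00, because n-1 = 3 = 00000011. In the new array n-1 = 7 = 00000111, so an element of old bucket 0 can only end up in new bucket 0 or new bucket 4. If the third-lowest bit of the hash is 1, then 00000111 & xxxxx100 = 00000100 = 4; if it is 0, then 00000111 & xxxxx000 = 00000000 = 0. In general, old bucket i splits into new bucket i (the first half, evacuatedX) and new bucket i + 2^B (the second half, evacuatedY).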
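
The load-factor check and the bucket split can be sketched as follows. overLoadFactor is modeled on the runtime function of the same name, which expresses 6.5 as 13/2 and also requires more than 8 entries; oldAndNewIndex is a hypothetical helper written for this example.

```go
package main

import "fmt"

const (
	bucketCnt     = 8
	loadFactorNum = 13 // 13/2 = 6.5
	loadFactorDen = 2
)

// overLoadFactor reports whether count entries in 2^b buckets exceed the
// 6.5 load factor. Maps with at most 8 entries never trigger it.
func overLoadFactor(count int, b uint8) bool {
	return count > bucketCnt && uintptr(count) > loadFactorNum*((uintptr(1)<<b)/loadFactorDen)
}

// oldAndNewIndex returns the bucket index of hash before and after doubling
// from 2^b to 2^(b+1) buckets.
func oldAndNewIndex(hash uintptr, b uint8) (oldIdx, newIdx uintptr) {
	oldIdx = hash & (uintptr(1)<<b - 1)
	newIdx = hash & (uintptr(1)<<(b+1) - 1)
	return oldIdx, newIdx
}

func main() {
	// With B = 2 (4 buckets) the threshold is 6.5 * 4 = 26 entries.
	fmt.Println(overLoadFactor(26, 2), overLoadFactor(27, 2))

	// Both hashes live in old bucket 0; the third-lowest bit decides
	// whether they stay in bucket 0 or move to bucket 4.
	for _, h := range []uintptr{0b1_1000, 0b1_0100} {
		oldIdx, newIdx := oldAndNewIndex(h, 2)
		fmt.Printf("hash=%05b old=%d new=%d\n", h, oldIdx, newIdx)
	}
}
```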

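Same-size growth

When the load factor does not exceed 6.5 but there are too many overflow buckets, a same-size growth is triggered: a new bucket array with the same number of buckets as the old one is created, and the existing key/value pairs are migrated into it. "Too many" depends on B:
  • B <= 15 (at most 2^15 regular buckets): noverflow >= 2^B
  • B > 15 (more than 2^15 regular buckets): noverflow >= 2^15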
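
The two conditions can be combined into one decision, sketched below. tooManyOverflowBuckets and overLoadFactor are modeled on the runtime functions of the same names; growthKind and its string results are hypothetical and exist only to make the decision visible.

```go
package main

import "fmt"

const (
	bucketCnt     = 8
	loadFactorNum = 13 // 13/2 = 6.5
	loadFactorDen = 2
)

// overLoadFactor reports whether count entries in 2^b buckets exceed the
// 6.5 load factor.
func overLoadFactor(count int, b uint8) bool {
	return count > bucketCnt && uintptr(count) > loadFactorNum*((uintptr(1)<<b)/loadFactorDen)
}

// tooManyOverflowBuckets reports whether noverflow is too large for 2^b
// regular buckets. The threshold is 2^b, capped at 2^15.
func tooManyOverflowBuckets(noverflow uint16, b uint8) bool {
	if b > 15 {
		b = 15
	}
	return noverflow >= uint16(1)<<(b&15)
}

// growthKind says which kind of growth, if any, the map state calls for.
// The load-factor check comes first, so doubling wins when both apply.
func growthKind(count int, noverflow uint16, b uint8) string {
	switch {
	case overLoadFactor(count, b):
		return "double"
	case tooManyOverflowBuckets(noverflow, b):
		return "same-size"
	default:
		return "none"
	}
}

func main() {
	fmt.Println(growthKind(27, 0, 2)) // load factor above 6.5
	fmt.Println(growthKind(10, 4, 2)) // few entries, but 4 overflow buckets for 4 buckets
	fmt.Println(growthKind(10, 1, 2)) // neither condition holds
}
```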

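This typically happens after many key/value pairs have been inserted and then deleted: the overflow buckets remain but are sparsely filled, so the number of overflow buckets is high while the load factor stays low. Migrating the same number of key/value pairs into new buckets rearranges the scattered entries and packs them more compactly, which keeps lookups and writes fast. That is the purpose of same-size growth.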