Golang map source code analysis


map source code analysis

Basic structure

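The bucket array is created based on hint, and each element of the array is a bucket (the basic layout is shown in the figure below). If $B < 4$, the array holds $2^B$ buckets. Otherwise it holds $2^B + 2^{B-4}$ buckets; the extra ones are preallocated overflow buckets, which resolve collisions by being chained to a regular bucket as a linked list.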
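As a rough sketch of this sizing rule: the helper below is hypothetical (the name bucketArrayLen is not from the runtime) and assumes the extra overflow buckets are preallocated only when B >= 4, as in the runtime's makeBucketArray. It ignores the allocator's size-class rounding, which can make the real array slightly larger.

```go
package main

import "fmt"

// bucketArrayLen is a hypothetical helper that returns how many buckets are
// allocated for a given B: 2^B regular buckets, plus 2^(B-4) preallocated
// overflow buckets once B >= 4. Size-class rounding is ignored.
func bucketArrayLen(b uint8) uint64 {
	n := uint64(1) << b
	if b >= 4 {
		n += uint64(1) << (b - 4) // preallocated overflow buckets
	}
	return n
}

func main() {
	for _, b := range []uint8{0, 3, 4, 5, 10} {
		fmt.Printf("B=%d -> %d buckets\n", b, bucketArrayLen(b))
	}
}
```

For small maps the extra allocation is skipped because overflow buckets are unlikely to be needed.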

image.png

Growth

Growth conditions

Too many elements

const (
	// Maximum average load of a bucket that triggers growth is 6.5.
	// Represent as loadFactorNum/loadFactorDen, to allow integer math.
	loadFactorNum = 13
	loadFactorDen = 2
)

func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}

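If the average bucket holds more than 6.5 elements, the map contains too many elements and must grow to reduce collisions. This case is a doubling growth: the bucket array doubles in size.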
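To make the 6.5 threshold concrete, here is a standalone sketch of the same check. It assumes bucketCnt is 8 (the number of slots per bucket) and writes bucketShift(B) out as 1<<B; these substitutions are for illustration and not the runtime's exact definitions.

```go
package main

import "fmt"

const (
	bucketCnt     = 8 // slots per bucket (assumed value for this sketch)
	loadFactorNum = 13
	loadFactorDen = 2
)

// overLoadFactor mirrors the runtime check, with bucketShift(B) replaced
// by a plain shift so the example is self-contained.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt &&
		uintptr(count) > loadFactorNum*((uintptr(1)<<B)/loadFactorDen)
}

func main() {
	// With B=3 there are 8 buckets, so growth starts above 13*(8/2) = 52 elements.
	for _, c := range []int{8, 9, 52, 53} {
		fmt.Printf("count=%d B=3 grow=%v\n", c, overLoadFactor(c, 3))
	}
}
```

The `count > bucketCnt` guard means a map whose elements all fit in a single bucket never grows, whatever B is.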

Too many overflow buckets

// tooManyOverflowBuckets reports whether noverflow buckets is too many for a map with 1<<B buckets.
// Note that most of these overflow buckets must be in sparse use;
// if use was dense, then we'd have already triggered regular map growth.
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
	// If the threshold is too low, we do extraneous work.
	// If the threshold is too high, maps that grow and shrink can hold on to lots of unused memory.
	// "too many" means (approximately) as many overflow buckets as regular buckets.
	// See incrnoverflow for more details.
	if B > 15 {
		B = 15
	}
	// The compiler doesn't see here that B < 16; mask B to generate shorter shift code.
	return noverflow >= uint16(1)<<(B&15)
}

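When the number of overflow buckets is roughly equal to the size of the bucket array, the overflow chains hold too many buckets (if the map simply contained too many elements, doubling growth would already have been triggered). In this case a same-size growth is performed.

The noverflow field of hmap records the number of overflow buckets and has type uint16 (uint16 is used to keep hmap small). As a result, once B reaches 16 the exact number of overflow buckets can no longer be represented, and noverflow becomes an approximate count.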

// incrnoverflow increments h.noverflow.
// noverflow counts the number of overflow buckets.
// This is used to trigger same-size map growth.
// See also tooManyOverflowBuckets.
// To keep hmap small, noverflow is a uint16.
// When there are few buckets, noverflow is an exact count.
// When there are many buckets, noverflow is an approximate count.
func (h *hmap) incrnoverflow() {
	// We trigger same-size map growth if there are
	// as many overflow buckets as buckets.
	// We need to be able to count to 1<<h.B.
	if h.B < 16 {
		h.noverflow++
		return
	}
	// Increment with probability 1/(1<<(h.B-15)).
	// When we reach 1<<15 - 1, we will have approximately
	// as many overflow buckets as buckets.
	mask := uint32(1)<<(h.B-15) - 1
	// Example: if h.B == 18, then mask == 7,
	// and fastrand & 7 == 0 with probability 1/8.
	if fastrand()&mask == 0 {
		h.noverflow++
	}
}

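The base array has $2^B$ buckets, and the largest power of two that fits in a uint16 is $2^{15}$. If noverflow is incremented with probability $1/2^{B-15}$ each time, then by the time noverflow reaches $2^{15}$ the real number of overflow buckets is approximately $2^{15} \cdot 2^{B-15} = 2^B$.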

How growth proceeds

Incremental growth

func growWork(t *maptype, h *hmap, bucket uintptr) {
	// make sure we evacuate the oldbucket corresponding
	// to the bucket we're about to use
	evacuate(t, h, bucket&h.oldbucketmask())

	// evacuate one more oldbucket to make progress on growing
	if h.growing() {
		evacuate(t, h, h.nevacuate)
	}
}
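
Growth is carried out incrementally, in two parts:

  1. On every assign or delete, the old bucket that the current key maps to is evacuated, together with all of its overflow buckets.

  2. In addition, buckets are evacuated in order starting from the first one, one bucket (with all of its overflow buckets) per operation. Progress is tracked in hmap.nevacuate; when hmap.nevacuate reaches the end of the old array, the evacuation is complete.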
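The bookkeeping behind these two parts (evacuate the old bucket being written to, then evacuate one more bucket in order) can be sketched with a toy model. The toyMap type and writesUntilDone helper are hypothetical; no keys or values are moved, and only the evacuated flags and the nevacuate cursor are tracked.

```go
package main

import "fmt"

// toyMap models only the bookkeeping of incremental evacuation.
type toyMap struct {
	evacuated []bool // one flag per old bucket (overflow chain included)
	nevacuate int    // every old bucket below this index is evacuated
}

func (m *toyMap) growing() bool { return m.nevacuate < len(m.evacuated) }

// evacuate marks one old bucket as moved, then advances the cursor past
// any buckets that are already done.
func (m *toyMap) evacuate(old int) {
	m.evacuated[old] = true
	for m.nevacuate < len(m.evacuated) && m.evacuated[m.nevacuate] {
		m.nevacuate++
	}
}

// growWork is called for a write that lands in new-array index `bucket`.
func (m *toyMap) growWork(bucket int) {
	m.evacuate(bucket & (len(m.evacuated) - 1)) // part 1: the bucket in use
	if m.growing() {
		m.evacuate(m.nevacuate) // part 2: one more bucket, in order
	}
}

// writesUntilDone counts how many writes to `bucket` finish the growth.
func writesUntilDone(nOld, bucket int) int {
	m := &toyMap{evacuated: make([]bool, nOld)}
	writes := 0
	for m.growing() {
		m.growWork(bucket)
		writes++
	}
	return writes
}

func main() {
	fmt.Println(writesUntilDone(8, 5))
}
```

Because part 2 always makes progress, the number of writes needed to finish a growth is bounded by the number of old buckets, even if every write hits the same bucket.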

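Where a bucket is evacuated to

In a doubling growth, each element may move to either the first half or the second half of the new array. The choice is made by evaluating $hash \& 2^{B-1} \neq 0$, where $B$ is the value after growth: if the result is true the element goes to the second half of the new array, otherwise to the first half. Put simply, the highest of the low B bits of the hash decides the destination: if that bit is 1 the element lands in the second half of the array, otherwise in the first half.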
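A short sketch of this rule: the destination helper below is hypothetical and works on plain integers rather than the runtime's bucket pointers. It assumes newB >= 1 is the value of B after doubling, so the old array has 2^(newB-1) buckets.

```go
package main

import "fmt"

// destination reports where a key with the given hash moves during a
// doubling growth. newB is B after growth and must be at least 1.
func destination(hash uint64, newB uint8) (oldIdx, newIdx uint64, upperHalf bool) {
	newbit := uint64(1) << (newB - 1) // size of the old array
	oldIdx = hash & (newbit - 1)      // bucket index before growth
	upperHalf = hash&newbit != 0      // the bit that decides the half
	newIdx = oldIdx
	if upperHalf {
		newIdx += newbit // same offset, second half of the new array
	}
	return
}

func main() {
	for _, h := range []uint64{0b0101, 0b1101} {
		o, n, up := destination(h, 4)
		fmt.Printf("hash=%04b old=%d new=%d upperHalf=%v\n", h, o, n, up)
	}
}
```

The resulting newIdx is the same as hash & (2^newB - 1), so entries from one old bucket can only end up in two possible new buckets.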