Golang Source Code (5): Memory Management


1. Memory Model and Memory Allocation

1.1 Storage Model

Memory: a broad concept that covers RAM (random-access memory), ROM (read-only memory), and cache (high-speed cache).

Main memory: the RAM, i.e. the physical memory modules.

1.2 Virtual Memory

A memory-management technique for computer systems. It gives an application the illusion of a large, contiguous block of usable memory, while in reality its physical memory is usually scattered across multiple fragments, with some parts temporarily stored on external disk and swapped back in when needed. ---- Wikipedia

The 4KB page size is a product of practical experience. Advantages of paged management:

  • High memory utilization: memory is divided into fixed-size blocks, so an allocation does not need contiguous physical memory;

  • Less fragmentation: fixed-size pages eliminate external fragmentation, though internal fragmentation remains (a page may not be completely filled);

  • Easy memory sharing;

  • Easy dynamic memory allocation.
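
To make the page arithmetic concrete, here is a minimal sketch (my addition, with an arbitrary example address) of how a virtual address splits into a page number and an in-page offset under 4KB paging:

package main

import "fmt"

func main() {
 const pageShift = 12 // 4KB pages: 1 << 12
 addr := uintptr(0x7f3a1234) // hypothetical virtual address
 page := addr >> pageShift           // index of the page
 offset := addr & (1<<pageShift - 1) // position inside the page
 fmt.Printf("page=%#x offset=%#x\n", page, offset)
}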

1.3 TcMalloc (Thread-Caching Malloc)

TCMalloc is an efficient memory allocator, an open-source library developed by Google. Its main goal is to provide a fast, scalable, and efficient allocator for high-performance applications. It has the following characteristics:

  1. Fast, uncontended allocation and deallocation for most objects.

  2. Flexible use of memory: freed memory can be reused for differently sized objects or returned to the operating system.

  3. Low per-object overhead, achieved by allocating pages of same-sized objects.

  4. Sampling that gives insight into how the application uses memory.

(Figure: the TCMalloc architecture, split into front end, middle end, and back end.)

TCMalloc is divided into three parts: the front end, the middle end, and the back end. Their responsibilities are as follows.

1.3.1 Front End

The front end handles requests up to a particular size (typically under 256KB). It holds a cache of memory that only one thread can access at a time, so it needs no locks, which makes most allocations and frees fast. If the front end's cache has no memory of the required size, it requests a batch from the middle end to refill the cache. If the middle end is exhausted, or the requested size exceeds the maximum the front end handles, the request goes to the back end to satisfy the large allocation.

Per-thread cache: free objects are partitioned per thread.

Each thread has its own pool of free objects, so small-object allocation and deallocation involve no lock contention. The free objects of each size class are kept on a linked list; if the per-thread cache runs out of memory of a given size, it requests a batch from the middle end to refill.
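
A minimal sketch of that idea with hypothetical types (this is not TCMalloc's actual code, and the size-class count is an assumption): one intrusive singly linked free list per size class, popped without any locking because the cache belongs to a single thread:

package main

// freeObject threads the free list through the free memory itself.
type freeObject struct{ next *freeObject }

const numSizeClasses = 86 // assumed count; the real value depends on the TCMalloc build

type threadCache struct {
 freeLists [numSizeClasses]*freeObject
}

// alloc pops a free object of the given size class; nil means the class is
// empty and a batch would be fetched from the middle end.
func (tc *threadCache) alloc(sc int) *freeObject {
 obj := tc.freeLists[sc]
 if obj != nil {
  tc.freeLists[sc] = obj.next
 }
 return obj
}

func main() {
 var tc threadCache
 tc.freeLists[3] = &freeObject{} // pretend the middle end gave us one object
 println(tc.alloc(3) != nil)     // true, served with no locks
}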

A possible problem?

Cached memory grows with the number of threads.

Per-CPU cache: free objects are partitioned per CPU.

Each CPU core pre-allocates a slab of contiguous memory, consisting of a contiguous run of headers followed by a contiguous run of objects; each header records three pointer positions:

Begin: the offset where this size class's region starts within the slab.

End: the offset where this size class's region ends within the slab.

Cur: the end of this size class's free-object pointers within the region; [cur+1, end] is the still-unused space.
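
A toy sketch of the header bookkeeping (the begin/cur/end names mirror the description above; the exact layout and pop direction are my assumptions):

package main

// sizeClassHeader describes one size class's window inside a CPU's slab.
type sizeClassHeader struct {
 begin, cur, end uint16 // offsets of pointer slots within the slab
}

// pop takes one cached object slot; [begin, cur) holds free-object
// pointers, and [cur, end) is still-unused capacity.
func (h *sizeClassHeader) pop() (slot uint16, ok bool) {
 if h.cur == h.begin {
  return 0, false // empty: refill from the middle end
 }
 h.cur--
 return h.cur, true
}

func main() {
 h := sizeClassHeader{begin: 0, cur: 2, end: 8} // two cached objects
 s, ok := h.pop()
 println(s, ok) // 1 true
}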

How is thread contention avoided?

Via the Restartable Sequences (rseq) system call. In short, before running, the application registers a critical section with the kernel and then starts executing it. If the thread is preempted, it necessarily enters kernel mode; on the way back to user mode, the kernel checks whether the program counter lies inside the registered critical section. If it does, the critical section was interrupted mid-execution and must be restarted from the top (a retry), after first rolling back any half-completed modifications.

1.3.2 Middle End

The middle end supplies memory to the front end and releases memory back to the back end. It consists of transfer caches and central free lists; each size class has its own transfer cache and central free list.

**Transfer cache:** accessed whenever the front end requests or releases memory. It maintains an array of pointers to free objects. The transfer cache is shared between threads, so it must be locked.

**Central free list:** manages memory as spans. A request for one or more objects is satisfied by the central free list by extracting objects from spans until the request is met; if the spans do not have enough free objects, more spans are requested from the back end.

The radix tree is described in detail later.

1.3.3 Back End

Responsibilities:

  • managing large amounts of unused memory;

  • fetching memory from the operating system when no suitably sized memory is available to satisfy an allocation;

  • returning unneeded memory to the operating system.

It consists of the legacy page heap and the hugepage-aware page heap.

Legacy page heap: manages memory in blocks of the TCMalloc page size.

It is essentially an array of free lists. Each element of the array is a singly linked free list; in the usual case k < 256, every node on the free list at array index k manages exactly k pages. To allocate k pages, go directly to index k and pop one node off that free list; if it has no free pages, try the free list at the next index, and if none has space, obtain memory directly via mmap.

When memory is returned, the returned pages are merged with adjacent pages wherever they form a contiguous region, and the combined run is linked into the free list matching its page count.
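
A minimal sketch of the free-list-array structure (hypothetical types; run splitting and coalescing are omitted):

package main

const maxPages = 256

// pageRun is a run of contiguous pages, linked into a free list.
type pageRun struct {
 next   *pageRun
 npages int
}

type pageHeap struct {
 free [maxPages]*pageRun // free[k] links runs of exactly k pages
}

// alloc returns a run of at least k pages, preferring an exact fit;
// a real heap would split an oversized run and re-file the remainder.
func (h *pageHeap) alloc(k int) *pageRun {
 for i := k; i < maxPages; i++ {
  if r := h.free[i]; r != nil {
   h.free[i] = r.next
   return r
  }
 }
 return nil // would fall back to mmap
}

func main() {
 var h pageHeap
 h.free[8] = &pageRun{npages: 8}
 println(h.alloc(5).npages) // 8: first fit from a larger class
}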

Hugepage-aware page heap:

A traditional page is 4KB; on x86 a hugepage is 2MB. The hugepage-aware heap has three parts: HugeCache, HugeFiller, and HugeRegion.

HugeFiller: handles allocations that fit within a single hugepage.

HugeCache: handles allocation requests larger than one hugepage, aligned up to hugepage boundaries.

HugeRegion: also handles requests larger than one hugepage, but for cases that would otherwise create heavy fragmentation. For example, a 2.1MB request served by HugeCache would allocate 4MB, wasting 1.9MB; HugeRegion does not require rounding allocations up to hugepage multiples.
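
Checking that arithmetic in plain Go (my addition, not TCMalloc code):

package main

import "fmt"

func main() {
 const hugePage = 2 << 20   // 2MB hugepage
 req := uintptr(2100 << 10) // ~2.1MB request
 rounded := (req + hugePage - 1) / hugePage * hugePage // HugeCache rounds up
 fmt.Printf("%d MB allocated, %.2f MB wasted\n",
  rounded>>20, float64(rounded-req)/(1<<20)) // 4 MB allocated, 1.95 MB wasted
}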

2. Go's Memory Model

2.1 Memory Model

Go's memory model is based on the TCMalloc design described above.

2.2 Core Concepts

Memory management unit (mspan)

  • Go also manages memory in pages, but a Go page is 8KB

  • mspan is the smallest unit of Go's memory management;

  • an mspan consists of contiguous pages, its size an integer multiple of the page size, and it is graded into a size class accordingly;

  • an mspan is further subdivided into objects, with the allocCache (uint64) bitmap marking whether each object is free;

  • mspans of the same class form a linked list owned by one mcentral and guarded by a single mutex;

Source: runtime/mheap.go

type mspan struct {
 next *mspan     
 prev *mspan    
 list *mSpanList 

 startAddr uintptr 
 npages    uintptr 

 manualFreeList gclinkptr 
 
 freeindex uintptr
 nelems uintptr // number of object in the span.
 allocCache uint64

 allocBits  *gcBits
 gcmarkBits *gcBits

 sweepgen              uint32
 divMul                uint32        // for divide by elemsize
 allocCount            uint16        // number of allocated objects
 spanclass             spanClass     // size class and noscan (uint8)
 state                 mSpanStateBox // mSpanInUse etc; accessed atomically (get/set methods)
 needzero              uint8         // needs to be zeroed before allocation
 allocCountBeforeCache uint16        // a copy of allocCount that is stored just before this span is cached
 elemsize              uintptr       // computed from sizeclass or from npages
 limit                 uintptr       // end of data in span
 speciallock           mutex         // guards specials list
 specials              *special      // linked list of special records sorted by offset.
 
 freeIndexForScan uintptr
}

Span class (spanClass)

  • spanClass is divided into 68 classes (size class 0 handles large objects; classes 1-67 handle small objects)

  • spanClass is a uint8: the high 7 bits hold the size class and the lowest bit is the noscan flag, set when objects contain no pointers and therefore need no scanning during GC

Source: runtime/mheap.go

type spanClass uint8

const (
 numSpanClasses = _NumSizeClasses << 1
 tinySpanClass  = spanClass(tinySizeClass<<1 | 1)
)

// makeSpanClass packs the size class and the noscan bit into a spanClass
func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
 return spanClass(sizeclass<<1) | spanClass(bool2int(noscan))
}

// shift off the noscan bit to recover the size class
func (sc spanClass) sizeclass() int8 {
 return int8(sc >> 1)
}

// noscan reports whether objects in this span are pointer-free (GC can skip scanning them)
func (sc spanClass) noscan() bool {
 return sc&1 != 0
}
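
A quick round-trip through these helpers (a self-contained copy with toy values):

package main

import "fmt"

type spanClass uint8

func bool2int(x bool) int {
 if x {
  return 1
 }
 return 0
}

func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
 return spanClass(sizeclass<<1) | spanClass(bool2int(noscan))
}

func main() {
 sc := makeSpanClass(5, true)
 fmt.Println(uint8(sc), int8(sc>>1), sc&1 != 0) // 11 5 true
}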

Thread cache (mcache)

  • mcache is a cache private to each P

  • mcache caches a mapping from spanClass to mspan

  • mcache owns a tiny allocator that serves pointer-free allocations smaller than 16B

Source: runtime/mcache.go

type mcache struct {
 nextSample uintptr // trigger heap sample after allocating this many bytes
 scanAlloc  uintptr // bytes of scannable heap allocated

 // per-P state for the tiny (micro-object) allocator
 tiny       uintptr
 tinyoffset uintptr
 tinyAllocs uintptr

 // mspans cached by this mcache, indexed by spanClass
 alloc [numSpanClasses]*mspan // spans to allocate from, indexed by spanClass

 stackcache [_NumStackOrders]stackfreelist
 flushGen uint32
}

Central cache (mcentral)

  • mcentral is not thread-private, so access requires locking

  • each mcentral serves one spanClass and holds two mspan sets: full and partial

  • full and partial are two-element arrays to support GC: one slot holds swept spans, the other unswept, with the roles swapping each sweep cycle (see the accessor sketch after the struct)

Source: runtime/mcentral.go

type mcentral struct {
 // the spanClass this mcentral serves
 spanclass spanClass 
 // spans with free objects (swept/unswept pair)
 partial [2]spanSet
 // spans with no free objects (swept/unswept pair)
 full    [2]spanSet
}

type spanSet struct {
 spineLock mutex
 spine     unsafe.Pointer // *[N]*spanSetBlock, accessed atomically
 spineLen  uintptr        // Spine array length, accessed atomically
 spineCap  uintptr        // Spine array cap, accessed under lock
 
 index headTailIndex
}
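
The swept/unswept roles are selected by the parity of sweepgen/2 (sweepgen advances by 2 per GC cycle); the accessors in runtime/mcentral.go look like this:

// partialUnswept returns the spanSet which holds partially-filled
// unswept spans for this sweepgen.
func (c *mcentral) partialUnswept(sweepgen uint32) *spanSet {
 return &c.partial[1-sweepgen/2%2]
}

// partialSwept returns the spanSet which holds partially-filled
// swept spans for this sweepgen.
func (c *mcentral) partialSwept(sweepgen uint32) *spanSet {
 return &c.partial[sweepgen/2%2]
}

The full set has matching fullUnswept/fullSwept accessors; incrementing sweepgen flips which slot counts as "unswept", so the sets swap roles without moving any spans.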

Global heap (mheap)

  • mheap owns the page allocator (implemented on radix trees); memory is managed in pages of 8KB each

  • a bitmap marks page usage: whether a page has been handed to an mspan, not whether it is assigned to a specific object

  • heapArena records the page-to-mspan mapping

  • when memory runs low, mheap requests more from the operating system, in units of one heapArena (64MB)

Source: runtime/mheap.go

type mheap struct {
 // lock: broadly speaking, all mspans are protected by mheap_.lock
 lock mutex

 _ uint32 

 // page allocator, built on a radix tree
 pages pageAlloc
 
 sweepgen uint32

 // all spans ever created by this mheap; each mspan appears exactly once
 allspans []*mspan // all spans out there

 pagesInUse         atomic.Uint64 // pages of spans in stats mSpanInUse
 pagesSwept         atomic.Uint64 // pages swept this cycle
 pagesSweptBasis    atomic.Uint64 // pagesSwept to use as the origin of the sweep ratio
 sweepHeapLiveBasis uint64        // value of gcController.heapLive to use as the origin of sweep ratio; written with lock, read without
 sweepPagesPerByte  float64       // proportional sweep ratio; written with lock, read without


 reclaimIndex atomic.Uint64
 reclaimCredit atomic.Uintptr

 // heapArena holds the heap's metadata, indexed via arenaIdx
 // on 64-bit operating systems the array shape is [1][1<<22]
 // each heapArena covers 64MB
 arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena

 heapArenaAlloc linearAlloc
 arenaHints *arenaHint
 arena linearAlloc
 allArenas []arenaIdx
 sweepArenas []arenaIdx
 markArenas []arenaIdx
 curArena struct {
  base, end uintptr
 }

 _ uint32 // ensure 64-bit alignment of central
 // one mcentral per spanClass
 central [numSpanClasses]struct {
  mcentral mcentral
  pad      [cpu.CacheLinePadSize - unsafe.Sizeof(mcentral{})%cpu.CacheLinePadSize]byte
 }

 spanalloc             fixalloc // allocator for span*
 cachealloc            fixalloc // allocator for mcache*
 specialfinalizeralloc fixalloc // allocator for specialfinalizer*
 specialprofilealloc   fixalloc // allocator for specialprofile*
 specialReachableAlloc fixalloc // allocator for specialReachable
 speciallock           mutex    // lock for special record allocators.
 arenaHintAlloc        fixalloc // allocator for arenaHints

 unused *specialfinalizer // never set, just here to force the specialfinalizer type into DWARF
}


Free-page allocator (pageAlloc)

  • Built on radix trees; each tree manages 16GB of memory, and there are 2^14 trees in total

  • the trees use a bitmap to mark whether each memory page is free: 0 means free, 1 means occupied by an mspan

Source: runtime/mpagealloc.go

const (
 summaryLevels = 5
)

type pageAlloc struct {
 // five-level radix tree of summaries
 summary [summaryLevels][]pallocSum
 
 chunks [1 << pallocChunksL1Bits]*[1 << pallocChunksL2Bits]pallocData

 searchAddr offAddr
 start, end chunkIdx

 inUse addrRanges

 _ uint32 // Align scav so it's easier to reason about alignment within scav.

 // scav stores the scavenger state.
 scav struct {
  index scavengeIndex
  released uintptr
  _ uint32 // Align assistTime for atomics on 32-bit platforms.
  assistTime atomic.Int64
 }

 // points to mheap_.lock. This level of indirection makes it possible
 // to test pageAlloc independently of the runtime allocator.
 mheapLock *mutex
 sysStat *sysMemStat

 summaryMappedReady uintptr
 test bool
}

Radix tree

  • A tree structure built from pallocSum values.

  • pallocSum is a uint64; the highest bit flags a completely free summary, and the remaining bits pack three fields: start, max, and end.

  • Each parent pallocSum has 8 children; the root pallocSum's bitmap covers 2^21 pages, and each level down covers 1/8 of its parent (worked out in the sketch below).

  • The memory mapped by one root pallocSum is 2^21 × 8KB = 16GB.

  • When mheap searches for pages: if the entry's start run alone satisfies the request, the search succeeds; otherwise, if max satisfies it, descend into that entry; otherwise consider end. At non-root levels, the search also tries joining the current entry's end run with the next sibling entry's start run.
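
Working the geometry out level by level (my arithmetic, following the bullets above):

package main

import "fmt"

func main() {
 const pageSize = 8 << 10 // 8KB Go pages
 for l := 0; l < 5; l++ {
  pages := uintptr(1) << (21 - 3*l) // each level covers 1/8 of its parent
  fmt.Printf("level %d: 2^%d pages = %d MB per summary\n",
   l, 21-3*l, pages*pageSize>>20)
 }
 // level 0: 16384 MB (16GB) ... level 4: 4 MB (one palloc chunk)
}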

Source: runtime/mpagealloc.go

const (
 pallocSumBytes = unsafe.Sizeof(pallocSum(0))

 // maxPackedValue is the maximum value that any of the three fields in
 // the pallocSum may take on.
 maxPackedValue    = 1 << logMaxPackedValue
 // = 21 on 64-bit platforms
 logMaxPackedValue = logPallocChunkPages + (summaryLevels-1)*summaryLevelBits

 freeChunkSum = pallocSum(uint64(pallocChunkPages) |
  uint64(pallocChunkPages<<logMaxPackedValue) |
  uint64(pallocChunkPages<<(2*logMaxPackedValue)))
)

type pallocSum uint64

// packPallocSum packs start, max, and end into a single uint64
func packPallocSum(start, max, end uint) pallocSum {
 if max == maxPackedValue {
  return pallocSum(uint64(1 << 63))
 }
 return pallocSum((uint64(start) & (maxPackedValue - 1)) |
  ((uint64(max) & (maxPackedValue - 1)) << logMaxPackedValue) |
  ((uint64(end) & (maxPackedValue - 1)) << (2 * logMaxPackedValue)))
}

// start extracts the start value from a packed sum.
func (p pallocSum) start() uint {
 if uint64(p)&uint64(1<<63) != 0 {
  return maxPackedValue
 }
 return uint(uint64(p) & (maxPackedValue - 1))
}

// max extracts the max value from a packed sum.
func (p pallocSum) max() uint {
 if uint64(p)&uint64(1<<63) != 0 {
  return maxPackedValue
 }
 return uint((uint64(p) >> logMaxPackedValue) & (maxPackedValue - 1))
}

// end extracts the end value from a packed sum.
func (p pallocSum) end() uint {
 if uint64(p)&uint64(1<<63) != 0 {
  return maxPackedValue
 }
 return uint((uint64(p) >> (2 * logMaxPackedValue)) & (maxPackedValue - 1))
}

// unpack unpacks all three values from the summary.
func (p pallocSum) unpack() (uint, uint, uint) {
 if uint64(p)&uint64(1<<63) != 0 {
  return maxPackedValue, maxPackedValue, maxPackedValue
 }
 return uint(uint64(p) & (maxPackedValue - 1)),
  uint((uint64(p) >> logMaxPackedValue) & (maxPackedValue - 1)),
  uint((uint64(p) >> (2 * logMaxPackedValue)) & (maxPackedValue - 1))
}
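
A round-trip through the packing (a self-contained copy of the helper with toy values; unpacking mirrors start()/max()/end()):

package main

import "fmt"

const logMaxPackedValue = 21
const maxPackedValue = 1 << logMaxPackedValue

type pallocSum uint64

func packPallocSum(start, max, end uint) pallocSum {
 if max == maxPackedValue {
  return pallocSum(uint64(1 << 63))
 }
 return pallocSum((uint64(start) & (maxPackedValue - 1)) |
  ((uint64(max) & (maxPackedValue - 1)) << logMaxPackedValue) |
  ((uint64(end) & (maxPackedValue - 1)) << (2 * logMaxPackedValue)))
}

func main() {
 s := packPallocSum(3, 10, 5)
 fmt.Printf("%#x\n", uint64(s)) // three 21-bit fields packed side by side
 fmt.Println(uint64(s)&(maxPackedValue-1),
  (uint64(s)>>logMaxPackedValue)&(maxPackedValue-1),
  (uint64(s)>>(2*logMaxPackedValue))&(maxPackedValue-1)) // 3 10 5
}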

heapArena

  • Maps pages to mspans: during GC, an address is resolved to its page, and the page to its owning mspan (see the lookup sketch after the struct)

  • A heapArena contains 8192 pages, i.e. 64MB of memory, which is also the minimum unit in which mheap requests memory from the operating system

Source: runtime/mheap.go

type heapArena struct {
 bitmap [heapArenaBitmapBytes]byte
 // page-to-mspan mapping
 spans [pagesPerArena]*mspan

 pageInUse [pagesPerArena / 8]uint8
 pageMarks [pagesPerArena / 8]uint8
 pageSpecials [pagesPerArena / 8]uint8

 checkmarks *checkmarksMap
 zeroedBase uintptr
}
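
A self-contained toy of the lookup (hypothetical types that mirror the runtime's scheme, with the two-level arena array flattened into a map for brevity):

package main

const (
 pageSize       = 8 << 10                   // 8KB
 heapArenaBytes = 64 << 20                  // 64MB per arena
 pagesPerArena  = heapArenaBytes / pageSize // 8192
)

type mspan struct{ base uintptr }

type heapArena struct {
 spans [pagesPerArena]*mspan // page index -> owning span
}

var arenas map[uintptr]*heapArena // toy stand-in for mheap_.arenas

// spanOf maps an address to its mspan: first to the 64MB arena, then to
// the page slot inside that arena.
func spanOf(p uintptr) *mspan {
 ha := arenas[p/heapArenaBytes]
 if ha == nil {
  return nil
 }
 return ha.spans[p/pageSize%pagesPerArena]
}

func main() {
 arenas = map[uintptr]*heapArena{0: new(heapArena)}
 s := &mspan{base: 3 * pageSize}
 arenas[0].spans[3] = s
 println(spanOf(3*pageSize+100) == s) // true: the address resolves to its span
}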

2.3 Object Allocation Flow

Depending on the size of the object being created, allocation proceeds as follows (a small demonstration follows the list):

  1. Take memory from the tiny allocator in the P's mcache (no lock).

  2. Take memory from the mspan cached in the P's mcache for the spanClass (no lock).

  3. Find the mcentral for the spanClass, pull an mspan from it into the mcache, then allocate from that mspan (per-spanClass lock).

  4. Go to mheap: take free pages, assemble them into an mspan, install it into the mcache, then allocate from it (global lock).

  5. Request memory from the operating system to refill mheap, then repeat step 4.
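
For orientation, a few allocations and the path each would take, assuming the runtime's thresholds (tiny: pointer-free and under 16B; small: up to 32KB; large: above that) and ignoring escape analysis, which may keep non-escaping values on the stack entirely:

package main

func main() {
 x := new(int32) // 4B, no pointers: tiny allocator in the mcache
 type pair struct{ a, b int64 }
 y := new(pair)            // 16B: small-object path via spanClass lookup
 z := make([]byte, 64<<10) // 64KB > 32KB: large object, straight from mheap
 _, _, _ = x, y, z
}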

Source: runtime/malloc.go

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
 // (elided)...
 // acquire the current M and disable preemption
 mp := acquirem()
 if mp.mallocing != 0 {
  throw("malloc deadlock")
 }
 if mp.gsignal == getg() {
  throw("malloc during signal")
 }
 mp.mallocing = 1

 shouldhelpgc := false
 dataSize := userSize
 c := getMCache(mp)  // get the P's mcache
 if c == nil {
  throw("mallocgc called without a P or outside bootstrapping")
 }
 var span *mspan
 var x unsafe.Pointer
 noscan := typ == nil || typ.ptrdata == 0
 // In some cases block zeroing can profitably (for latency reduction purposes)
 // be delayed till preemption is possible; delayedZeroing tracks that state.
 delayedZeroing := false
 // tiny or small object
 if size <= maxSmallSize {
  // tiny object: noscan and smaller than 16B
  if noscan && size < maxTinySize {
   // current free offset within the tiny block
   off := c.tinyoffset
   // align the offset to 8, 4, or 2 bytes depending on the object size
   if size&7 == 0 {
    off = alignUp(off, 8)
   } else if goarch.PtrSize == 4 && size == 12 {
    off = alignUp(off, 8)
   } else if size&3 == 0 {
    off = alignUp(off, 4)
   } else if size&1 == 0 {
    off = alignUp(off, 2)
   }
    // the object fits in the current tiny block
   if off+size <= maxTinySize && c.tiny != 0 {
    // The object fits into existing tiny block.
    x = unsafe.Pointer(c.tiny + off)
    c.tinyoffset = off + size
    c.tinyAllocs++
    mp.mallocing = 0
    releasem(mp)
    return x
   }
   
    // take the tiny span cached in the mcache
    span = c.alloc[tinySpanClass]
    // fast path: grab a free object from the cached span
   v := nextFreeFast(span)
   if v == 0 {
     // slow path: refill from mcentral, falling back to mheap
    v, span, shouldhelpgc = c.nextFree(tinySpanClass)
   }
   x = unsafe.Pointer(v)
   (*[2]uint64)(x)[0] = 0
   (*[2]uint64)(x)[1] = 0
   // See if we need to replace the existing tiny block with the new one
   // based on amount of remaining free space.
   if !raceenabled && (size < c.tinyoffset || c.tiny == 0) {
    // Note: disabled when race detector is on, see comment near end of this function.
    c.tiny = uintptr(x)
    c.tinyoffset = size
   }
   size = maxTinySize
  } else {
    // small object: map the size to a size class
   var sizeclass uint8
   if size <= smallSizeMax-8 {
    sizeclass = size_to_class8[divRoundUp(size, smallSizeDiv)]
   } else {
    sizeclass = size_to_class128[divRoundUp(size-smallSizeMax, largeSizeDiv)]
   }
   size = uintptr(class_to_size[sizeclass])
    // build the spanClass from the size class and noscan
   spc := makeSpanClass(sizeclass, noscan)
    // take the span cached for this spanClass
    span = c.alloc[spc]
    // fast path allocation from the cached span
   v := nextFreeFast(span)
   if v == 0 {
     // slow path: refill from mcentral, falling back to mheap
    v, span, shouldhelpgc = c.nextFree(spc)
   }
   x = unsafe.Pointer(v)
   if needzero && span.needzero != 0 {
    memclrNoHeapPointers(unsafe.Pointer(v), size)
   }
  }
 } else {
  shouldhelpgc = true
   // large object: allocate directly from mheap
  span = c.allocLarge(size, noscan)
  span.freeindex = 1
  span.allocCount = 1
  size = span.elemsize
  x = unsafe.Pointer(span.base())
  if needzero && span.needzero != 0 {
   if noscan {
    delayedZeroing = true
   } else {
    memclrNoHeapPointers(x, size)
    // We've in theory cleared almost the whole span here,
    // and could take the extra step of actually clearing
    // the whole thing. However, don't. Any GC bits for the
    // uncleared parts will be zero, and it's just going to
    // be needzero = 1 once freed anyway.
   }
  }
 }
  // (elided)...
}

Source: runtime/malloc.go

func nextFreeFast(s *mspan) gclinkptr {
 // count trailing zeros of the allocCache bitmap to find the first free object
 theBit := sys.Ctz64(s.allocCache)
 if theBit < 64 {
  result := s.freeindex + uintptr(theBit)
  if result < s.nelems {
   freeidx := result + 1
   if freeidx%64 == 0 && freeidx != s.nelems {
    return 0
   }
   s.allocCache >>= uint(theBit + 1)
   s.freeindex = freeidx
   s.allocCount++
    // compute the object's address
   return gclinkptr(result*s.elemsize + s.base())
  }
 }
 return 0
}
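
sys.Ctz64 counts trailing zeros: in allocCache, bit i set means the object at freeindex+i is free, so the lowest set bit is the next free slot. A small demo using math/bits, the public equivalent (toy bitmap value):

package main

import (
 "fmt"
 "math/bits"
)

func main() {
 var allocCache uint64 = 0b10100 // objects at +2 and +4 are free
 theBit := bits.TrailingZeros64(allocCache)
 fmt.Println("next free object at freeindex +", theBit) // +2
 allocCache >>= uint(theBit + 1) // consume it, as nextFreeFast does
 fmt.Printf("cache now %#b\n", allocCache) // 0b10: old +4 is now at new freeindex+1
}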


func (c *mcache) nextFree(spc spanClass) (v gclinkptr, s *mspan, shouldhelpgc bool) {
 s = c.alloc[spc]
 shouldhelpgc = false
 // index of the next free object in the cached span
 freeIndex := s.nextFreeIndex()
 if freeIndex == s.nelems {
  // The span is full.
  if uintptr(s.allocCount) != s.nelems {
   println("runtime: s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
   throw("s.allocCount != s.nelems && freeIndex == s.nelems")
  }
   // refill from mcentral (and, failing that, mheap)
  c.refill(spc)
  shouldhelpgc = true
  s = c.alloc[spc]
   // retry on the freshly cached span
  freeIndex = s.nextFreeIndex()
 }

 if freeIndex >= s.nelems {
  throw("freeIndex is not valid")
 }

 v = gclinkptr(freeIndex*s.elemsize + s.base())
 s.allocCount++
 if uintptr(s.allocCount) > s.nelems {
  println("s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
  throw("s.allocCount > s.nelems")
 }
 return
}

func (c *mcache) refill(spc spanClass) {
 // Return the current cached span to the central lists.
 s := c.alloc[spc]

 if uintptr(s.allocCount) != s.nelems {
  throw("refill of span with free space remaining")
 }
 if s != &emptymspan {
  // Mark this span as no longer cached.
  if s.sweepgen != mheap_.sweepgen+3 {
   throw("bad sweepgen in refill")
  }
  mheap_.central[spc].mcentral.uncacheSpan(s)

  // Count up how many slots were used and record it.
  stats := memstats.heapStats.acquire()
  slotsUsed := int64(s.allocCount) - int64(s.allocCountBeforeCache)
  atomic.Xadd64(&stats.smallAllocCount[spc.sizeclass()], slotsUsed)

  // Flush tinyAllocs.
  if spc == tinySpanClass {
   atomic.Xadd64(&stats.tinyAllocCount, int64(c.tinyAllocs))
   c.tinyAllocs = 0
  }
  memstats.heapStats.release()

  // Count the allocs in inconsistent, internal stats.
  bytesAllocated := slotsUsed * int64(s.elemsize)
  gcController.totalAlloc.Add(bytesAllocated)

  // Clear the second allocCount just to be safe.
  s.allocCountBeforeCache = 0
 }

 // Get a new cached span from the central lists.
 s = mheap_.central[spc].mcentral.cacheSpan()
 if s == nil {
  throw("out of memory")
 }

 if uintptr(s.allocCount) == s.nelems {
  throw("span has no free space")
 }

 // Indicate that this span is cached and prevent asynchronous
 // sweeping in the next sweep phase.
 s.sweepgen = mheap_.sweepgen + 3

 // Store the current alloc count for accounting later.
 s.allocCountBeforeCache = s.allocCount
 usedBytes := uintptr(s.allocCount) * s.elemsize
 gcController.update(int64(s.npages*pageSize)-int64(usedBytes), int64(c.scanAlloc))
 c.scanAlloc = 0

 c.alloc[spc] = s
}

Source: runtime/mcentral.go

// Allocate a span to use in an mcache.
func (c *mcentral) cacheSpan() *mspan {
 // Deduct credit for this span allocation and sweep if necessary.
 spanBytes := uintptr(class_to_allocnpages[c.spanclass.sizeclass()]) * _PageSize
 deductSweepCredit(spanBytes, 0)

 traceDone := false
 if trace.enabled {
  traceGCSweepStart()
 }

 spanBudget := 100

 var s *mspan
 var sl sweepLocker  // sweep locker

 // Try partial swept spans first.
 sg := mheap_.sweepgen
 if s = c.partialSwept(sg).pop(); s != nil {
  goto havespan
 }

 sl = sweep.active.begin()
 // search the two span sets for a usable span
 if sl.valid {
  // Now try partial unswept spans.
  for ; spanBudget >= 0; spanBudget-- {
   s = c.partialUnswept(sg).pop()
   if s == nil {
    break
   }
   if s, ok := sl.tryAcquire(s); ok {
    // We got ownership of the span, so let's sweep it and use it.
    s.sweep(true)
    sweep.active.end(sl)
    goto havespan
   }
  }
  
  // then try full unswept spans
  for ; spanBudget >= 0; spanBudget-- {
   s = c.fullUnswept(sg).pop()
   if s == nil {
    break
   }
   if s, ok := sl.tryAcquire(s); ok {
    // We got ownership of the span, so let's sweep it.
    s.sweep(true)
    // Check if there's any free space.
    freeIndex := s.nextFreeIndex()
    if freeIndex != s.nelems {
     s.freeindex = freeIndex
     sweep.active.end(sl)
     goto havespan
    }
    // Add it to the swept list, because sweeping didn't give us any free space.
    c.fullSwept(sg).push(s.mspan)
   }
   // See comment for partial unswept spans.
  }
  sweep.active.end(sl)
 }
 if trace.enabled {
  traceGCSweepDone()
  traceDone = true
 }

 // We failed to get a span from the mcentral so get one from mheap.
 s = c.grow()
 if s == nil {
  return nil
 }

 // At this point s is a span that should have free slots.
havespan:
 if trace.enabled && !traceDone {
  traceGCSweepDone()
 }
 n := int(s.nelems) - int(s.allocCount)
 if n == 0 || s.freeindex == s.nelems || uintptr(s.allocCount) == s.nelems {
  throw("span has no free objects")
 }
 freeByteBase := s.freeindex &^ (64 - 1)
 whichByte := freeByteBase / 8
 // Init alloc bits cache.
 s.refillAllocCache(whichByte)

 // Adjust the allocCache so that s.freeindex corresponds to the low bit in
 // s.allocCache.
 s.allocCache >>= s.freeindex % 64

 return s
}

// grow allocates a new empty span from the heap and initializes it for c's size class.
func (c *mcentral) grow() *mspan {
 npages := uintptr(class_to_allocnpages[c.spanclass.sizeclass()])
 size := uintptr(class_to_size[c.spanclass.sizeclass()])

 s := mheap_.alloc(npages, c.spanclass)
 if s == nil {
  return nil
 }

 // Use division by multiplication and shifts to quickly compute:
 // n := (npages << _PageShift) / size
 n := s.divideByElemSize(npages << _PageShift)
 s.limit = s.base() + size*n
 heapBitsForAddr(s.base()).initSpan(s)
 return s
}

Source: runtime/mheap.go

// alloc allocates a new span of npage pages from the GC'd heap.
// spanclass indicates the span's size class and scannability.
func (h *mheap) alloc(npages uintptr, spanclass spanClass) *mspan {
 // Don't do any operations that lock the heap on the G stack.
 // It might trigger stack growth, and the stack growth code needs
 // to be able to allocate heap.
 var s *mspan
 // switch from g to g0 (run on the system stack)
 systemstack(func() {
  // To prevent excessive heap growth, before allocating n pages
  // we need to sweep and reclaim at least n pages.
  if !isSweepDone() {
   h.reclaim(npages)
  }
  s = h.allocSpan(npages, spanAllocHeap, spanclass)
 })
 return s
}

func (h *mheap) allocSpan(npages uintptr, typ spanAllocType, spanclass spanClass) (s *mspan) {
 // Function-global state.
 gp := getg()
 base, scav := uintptr(0), uintptr(0)
 growth := uintptr(0)

 needPhysPageAlign := physPageAlignedStacks && typ == spanAllocStack && pageSize < physPageSize
 
 pp := gp.m.p.ptr()
 if !needPhysPageAlign && pp != nil && npages < pageCachePages/4 {
  c := &pp.pcache

  // If the cache is empty, refill it.
  if c.empty() {
   lock(&h.lock)
   *c = h.pages.allocToCache()
   unlock(&h.lock)
  }

  // Try to allocate from the cache.
  base, scav = c.alloc(npages)
  if base != 0 {
   s = h.tryAllocMSpan()
   if s != nil {
    goto HaveSpan
   }
  }
 }
 // take the heap lock
 lock(&h.lock)

 if needPhysPageAlign {
  // Overallocate by a physical page to allow for later alignment.
  extraPages := physPageSize / pageSize
   // search the radix tree for a run of free pages
  base, _ = h.pages.find(npages + extraPages)
  if base == 0 {
   var ok bool
    // try to grow the heap by requesting memory from the OS
   growth, ok = h.grow(npages + extraPages)
   if !ok {
    unlock(&h.lock)
    return nil
   }
   base, _ = h.pages.find(npages + extraPages)
   if base == 0 {
    throw("grew heap, but no adequate free space found")
   }
  }
  base = alignUp(base, physPageSize)
  scav = h.pages.allocRange(base, npages)
 }
 if base == 0 {
  // Try to acquire a base address.
  base, scav = h.pages.alloc(npages)
  if base == 0 {
   var ok bool
   growth, ok = h.grow(npages)
   if !ok {
    unlock(&h.lock)
    return nil
   }
   base, scav = h.pages.alloc(npages)
   if base == 0 {
    throw("grew heap, but no adequate free space found")
   }
  }
 }
 if s == nil {
  // We failed to get an mspan earlier, so grab
  // one now that we have the heap lock.
  s = h.allocMSpanLocked()
 }
 unlock(&h.lock)

HaveSpan:
 // At this point, both s != nil and base != 0, and the heap
 // lock is no longer held. Initialize the span.
 s.init(base, npages)
 if h.allocNeedsZero(base, npages) {
  s.needzero = 1
 }
 nbytes := npages * pageSize
 if typ.manual() {
  s.manualFreeList = 0
  s.nelems = 0
  s.limit = s.base() + s.npages*pageSize
  s.state.set(mSpanManual)
 } else {
  // We must set span properties before the span is published anywhere
  // since we're not holding the heap lock.
  s.spanclass = spanclass
  if sizeclass := spanclass.sizeclass(); sizeclass == 0 {
   s.elemsize = nbytes
   s.nelems = 1
   s.divMul = 0
  } else {
   s.elemsize = uintptr(class_to_size[sizeclass])
   s.nelems = nbytes / s.elemsize
   s.divMul = class_to_divmagic[sizeclass]
  }

  // Initialize mark and allocation structures.
  s.freeindex = 0
  s.freeIndexForScan = 0
  s.allocCache = ^uint64(0) // all 1s indicating all free.
  s.gcmarkBits = newMarkBits(s.nelems)
  s.allocBits = newAllocBits(s.nelems)
  
  atomic.Store(&s.sweepgen, h.sweepgen)
  s.state.set(mSpanInUse)
 }

 // Publish the span in various locations: record the page-to-mspan mapping.
 h.setSpans(s.base(), npages, s)

 // (elided)...

 return s
}

Source: runtime/mpagealloc.go

func (p *pageAlloc) find(npages uintptr) (uintptr, offAddr) {
 // the heap lock must be held
 assertLockHeld(p.mheapLock)

 // current level.
 i := 0
 
 firstFree := struct {
  base, bound offAddr
 }{
  base:  minOffAddr,
  bound: maxOffAddr,
 }

 foundFree := func(addr offAddr, size uintptr) {
  if firstFree.base.lessEqual(addr) && addr.add(size-1).lessEqual(firstFree.bound) {
   // This range fits within the current firstFree window, so narrow
   // down the firstFree window to the base and bound of this range.
   firstFree.base = addr
   firstFree.bound = addr.add(size - 1)
  } else if !(addr.add(size-1).lessThan(firstFree.base) || firstFree.bound.lessThan(addr)) {
   // This range only partially overlaps with the firstFree range,
   // so throw.
   print("runtime: addr = ", hex(addr.addr()), ", size = ", size, "\n")
   print("runtime: base = ", hex(firstFree.base.addr()), ", bound = ", hex(firstFree.bound.addr()), "\n")
   throw("range partially overlaps")
  }
 }
 lastSum := packPallocSum(0, 0, 0)
 lastSumIdx := -1

nextLevel:
 // walk the levels from the root down
 for l := 0; l < len(p.summary); l++ {
  // For the root level, entriesPerBlock is the whole level.
  entriesPerBlock := 1 << levelBits[l]
  logMaxPages := levelLogPages[l]

  i <<= levelBits[l]

  // Slice out the block of entries we care about.
  entries := p.summary[l][i : i+entriesPerBlock]

  j0 := 0
  if searchIdx := offAddrToLevelIndex(l, p.searchAddr); searchIdx&^(entriesPerBlock-1) == i {
   j0 = searchIdx & (entriesPerBlock - 1)
  }

  var base, size uint
  for j := j0; j < len(entries); j++ {
   sum := entries[j]
   if sum == 0 {
    // A full entry means we broke any streak and
    // that we should skip it altogether.
    size = 0
    continue
   }

   foundFree(levelIndexToOffAddr(l, i+j), (uintptr(1)<<logMaxPages)*pageSize)
   
   s := sum.start()
    // start satisfies the request; done
   if size+s >= uint(npages) {
    if size == 0 {
     base = uint(j) << logMaxPages
    }
    // We hit npages; we're done!
    size += s
    break
   }
    // max fits inside this entry; descend one level
   if sum.max() >= uint(npages) {
    i += j
    lastSumIdx = i
    lastSum = sum
    continue nextLevel
   }
    // otherwise restart the run from this entry's end, to be joined with the next sibling's start
   if size == 0 || s < 1<<logMaxPages {
    size = sum.end()
    base = uint(j+1)<<logMaxPages - size
    continue
   }
   // The entry is completely free, so continue the run.
   size += 1 << logMaxPages
  }
  if size >= uint(npages) {
   // We found a sufficiently large run of free pages straddling
   // some boundary, so compute the address and return it.
   addr := levelIndexToOffAddr(l, i).add(uintptr(base) * pageSize).addr()
   return addr, p.findMappedAddr(firstFree.base)
  }
  if l == 0 {
   // We're at level zero, so that means we've exhausted our search.
   return 0, maxSearchAddr()
  }
  
  print("runtime: summary[", l-1, "][", lastSumIdx, "] = ", lastSum.start(), ", ", lastSum.max(), ", ", lastSum.end(), "\n")
  print("runtime: level = ", l, ", npages = ", npages, ", j0 = ", j0, "\n")
  print("runtime: p.searchAddr = ", hex(p.searchAddr.addr()), ", i = ", i, "\n")
  print("runtime: levelShift[level] = ", levelShift[l], ", levelBits[level] = ", levelBits[l], "\n")
  for j := 0; j < len(entries); j++ {
   sum := entries[j]
   print("runtime: summary[", l, "][", i+j, "] = (", sum.start(), ", ", sum.max(), ", ", sum.end(), ")\n")
  }
  throw("bad summary data")
 }
 // the final level identifies a chunk; search inside it
 ci := chunkIdx(i)
 j, searchIdx := p.chunkOf(ci).find(npages, 0)
 if j == ^uint(0) {
  // We couldn't find any space in this chunk despite the summaries telling
  // us it should be there. There's likely a bug, so dump some state and throw.
  sum := p.summary[len(p.summary)-1][i]
  print("runtime: summary[", len(p.summary)-1, "][", i, "] = (", sum.start(), ", ", sum.max(), ", ", sum.end(), ")\n")
  print("runtime: npages = ", npages, "\n")
  throw("bad summary data")
 }

 // compute the resulting address
 addr := chunkBase(ci) + uintptr(j)*pageSize

 searchAddr := chunkBase(ci) + uintptr(searchIdx)*pageSize
 foundFree(offAddr{searchAddr}, chunkBase(ci+1)-searchAddr)
 return addr, p.findMappedAddr(firstFree.base)
}

3. Summary

At its core, Go's allocator is TCMalloc: a tiered hierarchy (mcache → mcentral → mheap → operating system) that keeps the hot allocation paths lock-free, uses size classes to bound fragmentation, and leans on a radix tree to locate free pages quickly.