Preface

This chapter walks through the overall flow of Netty's pooled memory, covering both allocation and release.

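I. PooledByteBufAllocator

When the PooledByteBufAllocator class is loaded, its static initializer (<clinit>) runs and derives several key sizing parameters from configuration: the page size (8 KiB by default), the chunk tree depth (11 by default), and the arena array length (at most cores * 2, capped by available memory). The page size and the tree depth together determine the chunk size: 8 KiB << 11 = 16 MiB.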

// 1. Page size, 8 KiB by default
int defaultPageSize = SystemPropertyUtil.getInt("io.netty.allocator.pageSize", 8192);
// ...
DEFAULT_PAGE_SIZE = defaultPageSize;

// 2. Chunk tree depth (maxOrder), 11 by default
int defaultMaxOrder = SystemPropertyUtil.getInt("io.netty.allocator.maxOrder", 11);
// ...
DEFAULT_MAX_ORDER = defaultMaxOrder;

// 3. Chunk size: 16 MiB = page size << tree depth
final int defaultChunkSize = DEFAULT_PAGE_SIZE << DEFAULT_MAX_ORDER;

final Runtime runtime = Runtime.getRuntime();
// 4. Arena array length: at most cores * 2, capped by available memory
final int defaultMinNumArena = NettyRuntime.availableProcessors() * 2;
DEFAULT_NUM_HEAP_ARENA = Math.max(0,
                                  SystemPropertyUtil.getInt(
                                      "io.netty.allocator.numHeapArenas",
                                      (int) Math.min(
                                          defaultMinNumArena,
                                          runtime.maxMemory() / defaultChunkSize / 2 / 3)));
DEFAULT_NUM_DIRECT_ARENA = Math.max(0,
                                    SystemPropertyUtil.getInt(
                                        "io.netty.allocator.numDirectArenas",
                                        (int) Math.min(
                                            defaultMinNumArena,
                                            PlatformDependent.maxDirectMemory() / defaultChunkSize / 2 / 3)));

// 5. Length of the MPSC queue in each MemoryRegionCache of the per-thread PoolThreadCache, per size class
DEFAULT_TINY_CACHE_SIZE = SystemPropertyUtil.getInt("io.netty.allocator.tinyCacheSize", 512);
DEFAULT_SMALL_CACHE_SIZE = SystemPropertyUtil.getInt("io.netty.allocator.smallCacheSize", 256);
DEFAULT_NORMAL_CACHE_SIZE = SystemPropertyUtil.getInt("io.netty.allocator.normalCacheSize", 64);
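
The arithmetic behind these defaults is easy to check by hand. The following is a minimal sketch, not Netty code: the class PoolSizing and its method names are made up for illustration, and the memory limit is passed in as a plain number instead of being read from the runtime.

```java
// Hypothetical helper that mirrors the default sizing arithmetic shown above.
final class PoolSizing {

    // Chunk size = page size << tree depth.
    static int chunkSize(int pageSize, int maxOrder) {
        return pageSize << maxOrder;
    }

    // Arena count = min(cores * 2, maxMemory / chunkSize / 2 / 3), never negative.
    static int defaultArenas(int cores, long maxMemory, int chunkSize) {
        long byCores = (long) cores * 2;
        long byMemory = maxMemory / chunkSize / 2 / 3;
        return (int) Math.max(0, Math.min(byCores, byMemory));
    }

    public static void main(String[] args) {
        int chunk = chunkSize(8192, 11);
        System.out.println("chunk size = " + chunk);
        // With 8 cores and a 1 GiB limit, the memory term is the smaller one.
        System.out.println("arenas = " + defaultArenas(8, 1L << 30, chunk));
    }
}
```

Read literally, the `/ 2 / 3` term keeps the arena count low enough that three chunks per arena would fit into half of the memory limit.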

The PooledByteBufAllocator constructor assigns a few member fields and builds the PoolArena arrays.

public PooledByteBufAllocator(boolean preferDirect, int nHeapArena, int nDirectArena, int pageSize, int maxOrder,
                              int tinyCacheSize, int smallCacheSize, int normalCacheSize,
                              boolean useCacheForAllThreads, int directMemoryCacheAlignment) {
    super(preferDirect);
    threadCache = new PoolThreadLocalCache(useCacheForAllThreads);
    this.tinyCacheSize = tinyCacheSize;
    this.smallCacheSize = smallCacheSize;
    this.normalCacheSize = normalCacheSize;
    chunkSize = validateAndCalculateChunkSize(pageSize, maxOrder);
    int pageShifts = validateAndCalculatePageShifts(pageSize);
    // Build the heap arena array
    if (nHeapArena > 0) {
        heapArenas = newArenaArray(nHeapArena);
        for (int i = 0; i < heapArenas.length; i ++) {
            PoolArena.HeapArena arena = new PoolArena.HeapArena(this,
                                                                pageSize, maxOrder, pageShifts, chunkSize,
                                                                directMemoryCacheAlignment);
            heapArenas[i] = arena;
        }
    }
    // Build the direct arena array
    if (nDirectArena > 0) {
        directArenas = newArenaArray(nDirectArena);
        for (int i = 0; i < directArenas.length; i ++) {
            PoolArena.DirectArena arena = new PoolArena.DirectArena(
                this, pageSize, maxOrder, pageShifts, chunkSize, directMemoryCacheAlignment);
            directArenas[i] = arena;
        }
    }
}

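Once the PooledByteBufAllocator has been constructed, its PoolArenas exist as well; recall the Arena structure covered earlier in this series.

The allocator can hand out either heap buffers or direct buffers. Here we follow newDirectBuffer, which allocates direct memory.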

private final PoolThreadLocalCache threadCache;
@Override
protected ByteBuf newDirectBuffer(int initialCapacity, int maxCapacity) {
    // 1. Get the current thread's cache and the PoolArena bound to it
    PoolThreadCache cache = threadCache.get();
    PoolArena<ByteBuffer> directArena = cache.directArena;

    final ByteBuf buf;
    if (directArena != null) {
        // 2. Pooled path
        buf = directArena.allocate(cache, initialCapacity, maxCapacity);
    } else {
        // Unpooled path
        buf = PlatformDependent.hasUnsafe() ?
            UnsafeByteBufUtil.newUnsafeDirectByteBuf(this, initialCapacity, maxCapacity) :
            new UnpooledDirectByteBuf(this, initialCapacity, maxCapacity);
    }
    // 3. If leak detection is enabled, wrap the ByteBuf (not covered here)
    return toLeakAwareBuffer(buf);
}

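Step 1 above reads the PoolThreadCache instance from the PoolThreadLocalCache thread-local. If the current thread has no PoolThreadCache yet, PoolThreadLocalCache#initialValue is triggered; it picks the least-used arena (leastUsedArena), that is, the arena currently bound to the fewest thread caches, and binds it to the current thread.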

@Override
protected synchronized PoolThreadCache initialValue() {
    // Each thread picks, from the shared heapArenas and directArenas arrays,
    // the least-used arena as its own
    final PoolArena<byte[]> heapArena = leastUsedArena(heapArenas);
    final PoolArena<ByteBuffer> directArena = leastUsedArena(directArenas);

    final Thread current = Thread.currentThread();
    if (useCacheForAllThreads || current instanceof FastThreadLocalThread) {
        // Build the PoolThreadCache
        final PoolThreadCache cache = new PoolThreadCache(
            heapArena, directArena, tinyCacheSize, smallCacheSize, normalCacheSize,
            DEFAULT_MAX_CACHED_BUFFER_CAPACITY, DEFAULT_CACHE_TRIM_INTERVAL);
        return cache;
    }
    // ...
}
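
To make the "least-used arena" selection concrete, here is a small self-contained sketch. It is not the Netty implementation: LeastUsedArenaSketch, Arena and bind are illustrative names, and the sketch assumes that usage is measured by a per-arena counter of bound thread caches that is incremented when a thread binds to the arena.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative model: each arena counts how many thread caches are bound to it.
final class LeastUsedArenaSketch {

    static final class Arena {
        final String name;
        final AtomicInteger numThreadCaches = new AtomicInteger();

        Arena(String name) {
            this.name = name;
        }
    }

    // Return the arena with the smallest counter (the first one wins on ties).
    static Arena leastUsed(Arena[] arenas) {
        if (arenas == null || arenas.length == 0) {
            return null;
        }
        Arena min = arenas[0];
        for (int i = 1; i < arenas.length; i++) {
            if (arenas[i].numThreadCaches.get() < min.numThreadCaches.get()) {
                min = arenas[i];
            }
        }
        return min;
    }

    // Bind a new thread cache to the least-used arena and bump its counter.
    static Arena bind(Arena[] arenas) {
        Arena arena = leastUsed(arenas);
        if (arena != null) {
            arena.numThreadCaches.incrementAndGet();
        }
        return arena;
    }

    public static void main(String[] args) {
        Arena[] arenas = { new Arena("a0"), new Arena("a1") };
        for (int i = 0; i < 4; i++) {
            System.out.println("thread " + i + " -> " + bind(arenas).name);
        }
    }
}
```

With this rule, threads spread roughly evenly over the arenas, which keeps lock contention on any single arena low.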

II. Main flow of pooled memory allocation

PoolArena#allocate first calls PoolArena.DirectArena#newByteBuf to create a pooled buffer object.

PooledByteBuf<T> allocate(PoolThreadCache cache, int reqCapacity, int maxCapacity) {
    // Create a pooled ByteBuf
    PooledByteBuf<T> buf = newByteBuf(maxCapacity);
    // Allocate memory for the ByteBuf
    allocate(cache, buf, reqCapacity);
    return buf;
}

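Next, the inner allocate method assigns a chunk and a handle to the PooledByteBuf (recall that the handle encodes the offset information inside the chunk's memory block). This method is the main allocation flow; which steps a request goes through depends on its size class.

(Figure: memory allocation flow.)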

private void allocate(PoolThreadCache cache, PooledByteBuf<T> buf, final int reqCapacity) {
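    // Normalize the requested capacity (implementation covered earlier):
    // round up to a power of two from 512 B upward, or to a multiple of 16 below 512 B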
    final int normCapacity = normalizeCapacity(reqCapacity);
    // Normalized capacity < page size (8 KiB)
    if (isTinyOrSmall(normCapacity)) {
        int tableIdx;
        PoolSubpage<T>[] table;
        boolean tiny = isTiny(normCapacity);
        // Level 1: try the per-thread cache (PoolThreadCache)
        // Normalized capacity < 512 B
        if (tiny) {
            // Try the MemoryRegionCache in PoolThreadCache
            if (cache.allocateTiny(this, buf, reqCapacity, normCapacity)) {
                // was able to allocate out of the cache so move on
                return;
            }
            tableIdx = tinyIdx(normCapacity);
            table = tinySubpagePools;
        }
        // Normalized capacity >= 512 B
        else {
            // Try the MemoryRegionCache in PoolThreadCache
            if (cache.allocateSmall(this, buf, reqCapacity, normCapacity)) {
                return;
            }
            tableIdx = smallIdx(normCapacity);
            table = smallSubpagePools;
        }
        // Level 2: try tinySubpagePools or smallSubpagePools
        final PoolSubpage<T> head = table[tableIdx];

        synchronized (head) {
            final PoolSubpage<T> s = head.next;
            if (s != head) {
                long handle = s.allocate();
                s.chunk.initBufWithSubpage(buf, null, handle, reqCapacity, cache);
                return;
            }
        }
        // Level 3: normal allocation path
        synchronized (this) {
            allocateNormal(buf, reqCapacity, normCapacity, cache);
        }
        return;
    }
    // Normalized capacity <= chunk size (16 MiB)
    if (normCapacity <= chunkSize) {
        // Level 1: try the per-thread cache (PoolThreadCache)
        if (cache.allocateNormal(this, buf, reqCapacity, normCapacity)) {
            return;
        }
        // Level 2: normal allocation path
        synchronized (this) {
            allocateNormal(buf, reqCapacity, normCapacity, cache);
            ++allocationsNormal;
        }
    }
    // Normalized capacity > 16 MiB: allocate unpooled
    else {
        allocateHuge(buf, reqCapacity);
    }
}

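Tiny and Small requests pass through up to four levels: thread cache -> subpage pool -> ChunkList -> new chunk; the last two make up allocateNormal. Allocating from the subpage pool requires locking the empty head node of the linked list for that subpage size, because subpages are linked and unlinked by multiple threads (PoolChunk#allocateSubpage and PoolChunk#free). The ChunkList and new-chunk steps are wrapped together in allocateNormal and run under the arena lock, because the ChunkLists are also touched by multiple threads.

Normal requests pass through three levels. They skip the subpage pool because a Normal size is at least one page, so it is served directly by a chunk rather than by a subpage.

Huge requests are served only by a special, unpooled chunk.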
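
The size-class boundaries used by the flow above can be summarized in a few lines. This is a simplified sketch under stated assumptions, not the Netty source: SizeClassSketch is an illustrative name, the page and chunk sizes are the defaults (8 KiB and 16 MiB), memory alignment is ignored, and requests of a chunk or more are assumed to be left unnormalized.

```java
// Simplified model of capacity normalization and size-class selection.
final class SizeClassSketch {

    enum SizeClass { TINY, SMALL, NORMAL, HUGE }

    static final int PAGE_SIZE = 8192;             // default page size
    static final int CHUNK_SIZE = PAGE_SIZE << 11; // 16 MiB

    static int normalize(int reqCapacity) {
        if (reqCapacity >= CHUNK_SIZE) {
            return reqCapacity; // assumed: huge requests are not rounded
        }
        if (reqCapacity >= 512) {
            // Round up to the next power of two.
            int n = reqCapacity - 1;
            n |= n >>> 1;
            n |= n >>> 2;
            n |= n >>> 4;
            n |= n >>> 8;
            n |= n >>> 16;
            return n + 1;
        }
        // Below 512 B: round up to the next multiple of 16.
        return (reqCapacity & 15) == 0 ? reqCapacity : (reqCapacity & ~15) + 16;
    }

    static SizeClass classify(int reqCapacity) {
        int norm = normalize(reqCapacity);
        if (norm < 512) {
            return SizeClass.TINY;   // thread cache -> tiny subpage pool -> chunk
        }
        if (norm < PAGE_SIZE) {
            return SizeClass.SMALL;  // thread cache -> small subpage pool -> chunk
        }
        if (norm <= CHUNK_SIZE) {
            return SizeClass.NORMAL; // thread cache -> chunk
        }
        return SizeClass.HUGE;       // dedicated unpooled chunk
    }

    public static void main(String[] args) {
        int[] requests = { 100, 600, 5000, CHUNK_SIZE, CHUNK_SIZE + 1 };
        for (int req : requests) {
            System.out.println(req + " -> " + normalize(req) + " " + classify(req));
        }
    }
}
```

Note that a 5000-byte request normalizes to 8192 bytes, so it is already a Normal request even though it is smaller than a page.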

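III. Huge allocations

A Huge request is one whose normalized capacity exceeds 16 MiB, which is more than a regular chunk can hold. Netty handles it by creating a special chunk so that the rest of the logic can be reused.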

private void allocateHuge(PooledByteBuf<T> buf, int reqCapacity) {
    // 1. Create a special PoolChunk
    PoolChunk<T> chunk = newUnpooledChunk(reqCapacity);
    // 2. Initialize the PooledByteBuf (initUnpooled delegates to init0)
    buf.initUnpooled(chunk, reqCapacity);
}

1. Creating the special chunk

PoolArena.DirectArena#newUnpooledChunk directly calls a dedicated PoolChunk constructor.

@Override
protected PoolChunk<ByteBuffer> newUnpooledChunk(int capacity) {
    // Direct-memory cache alignment defaults to 0
    if (directMemoryCacheAlignment == 0) {
        // Create a JDK ByteBuffer; a regular chunk is 16 MiB, but here the capacity is a Huge size above 16 MiB
        ByteBuffer byteBuffer = allocateDirect(capacity);
        // Build the special PoolChunk
        return new PoolChunk<ByteBuffer>(this, byteBuffer, capacity, 0);
    }
    // ... omitted
}

This PoolChunk constructor exists only for Huge allocations. Many key fields are null (the memoryMap tree, for example), and unpooled is set to true.

/** Creates a special chunk that is not pooled. */
PoolChunk(PoolArena<T> arena, T memory, int size, int offset) {
    unpooled = true;
    this.arena = arena;
    this.memory = memory;
    this.offset = offset;
    memoryMap = null;
    depthMap = null;
    subpages = null;
    subpageOverflowMask = 0;
    pageSize = 0;
    pageShifts = 0;
    maxOrder = 0;
    unusable = (byte) (maxOrder + 1);
    chunkSize = size;
    log2ChunkSize = log2(chunkSize);
    maxSubpageAllocs = 0;
    cachedNioBuffers = null;
}

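2. Initializing the buffer

The PooledByteBuf init method was covered earlier together with chunk and subpage allocation. There is also initUnpooled, used only for Huge allocations: the memory comes from the special chunk, so most attributes are simply absent.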

// Tiny/Small (subpage allocation) and Normal (chunk allocation)
void init(PoolChunk<T> chunk, ByteBuffer nioBuffer,
          long handle, int offset, int length, int maxLength, PoolThreadCache cache) {
    init0(chunk, nioBuffer, handle, offset, length, maxLength, cache);
}
// Huge (special chunk allocation)
void initUnpooled(PoolChunk<T> chunk, int length) {
    init0(chunk, null, 0, chunk.offset, length, length, null);
}
// Both methods above end up here
private void init0(PoolChunk<T> chunk, ByteBuffer nioBuffer,
                   long handle, int offset, int length, int maxLength, PoolThreadCache cache) {
    this.chunk = chunk;
    memory = chunk.memory;
    tmpNioBuf = nioBuffer;
    allocator = chunk.arena.parent;
    this.cache = cache;
    this.handle = handle;
    this.offset = offset;
    this.length = length;
    this.maxLength = maxLength;
}

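IV. Normal allocation path

PoolArena#allocateNormal is the normal allocation path, which obtains memory from a chunk. The whole method must run inside a synchronized block because it manipulates the qXXX lists, which are shared across threads.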

private final PoolChunkList<T> q050;
private final PoolChunkList<T> q025;
private final PoolChunkList<T> q000;
private final PoolChunkList<T> qInit;
private final PoolChunkList<T> q075;
private final PoolChunkList<T> q100;
// Method must be called inside synchronized(this) { ... } block
private void allocateNormal(PooledByteBuf<T> buf, int reqCapacity, int normCapacity, PoolThreadCache threadCache) {
    // 1. Try the PoolChunks already held by the PoolChunkLists
    if (q050.allocate(buf, reqCapacity, normCapacity, threadCache) ||
        q025.allocate(buf, reqCapacity, normCapacity, threadCache) ||
        q000.allocate(buf, reqCapacity, normCapacity, threadCache) ||
        qInit.allocate(buf, reqCapacity, normCapacity, threadCache) ||
        q075.allocate(buf, reqCapacity, normCapacity, threadCache)) {
        return;
    }
    // 2. Create a new PoolChunk and add it to qInit
    PoolChunk<T> c = newChunk(pageSize, maxOrder, pageShifts, chunkSize);
    boolean success = c.allocate(buf, reqCapacity, normCapacity, threadCache);
    assert success;
    qInit.add(c);
}

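Only step 1, allocation through PoolChunkList, is examined here; step 2, allocation from a PoolChunk, was covered earlier.

1. PoolChunkList revisited

PoolChunkList instances are distinguished by a chunk usage range (minUsage, maxUsage). Chunks within the same usage range are kept in the same PoolChunkList instance as a linked list (head). The arena holds one PoolChunkList per usage range (qXXX), and these are linked to each other through the prevList and nextList pointers.

Initially the head pointer of every PoolChunkList in the arena is null. A newly created chunk joins the qInit list, and as its usage fluctuates it moves back and forth between the PoolChunkLists (q000 to q100).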

final class PoolChunkList<T> implements PoolChunkListMetric {
    // Owning arena
    private final PoolArena<T> arena;
    // Next PoolChunkList (higher usage)
    private final PoolChunkList<T> nextList;
    // Previous PoolChunkList (lower usage)
    private PoolChunkList<T> prevList;
    // Lower bound of the chunk usage range
    private final int minUsage;
    // Upper bound of the chunk usage range
    private final int maxUsage;
    // Largest allocation a chunk in this list can serve (derived from minUsage)
    private final int maxCapacity;
    // Head of the chunk linked list, initially null
    private PoolChunk<T> head;
    // Lower threshold on free bytes; a chunk at or below it moves to nextList
    private final int freeMinThreshold;
    // Upper threshold on free bytes; a chunk above it moves to prevList
    private final int freeMaxThreshold;
}

2. allocate

boolean allocate(PooledByteBuf<T> buf, int reqCapacity, int normCapacity, PoolThreadCache threadCache) {
    // Give up if the normalized capacity exceeds maxCapacity
    if (normCapacity > maxCapacity) {
        return false;
    }
    // Walk the linked list with a cursor
    for (PoolChunk<T> cur = head; cur != null; cur = cur.next) {
        // Try to allocate from the PoolChunk under the cursor
        if (cur.allocate(buf, reqCapacity, normCapacity, threadCache)) {
            // On success, check whether the chunk's free bytes dropped to or below this list's lower threshold
            if (cur.freeBytes <= freeMinThreshold) {
                // If so, unlink it from this PoolChunkList
                remove(cur);
                // and add it to the PoolChunkList for the next usage range
                nextList.add(cur);
            }
            return true;
        }
    }
    return false;
}
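
The relocation rule can be reduced to two comparisons on the chunk's free bytes. The sketch below is a simplified model rather than Netty's code: UsageBandSketch is an illustrative name, and the two thresholds are computed with a plain integer formula from the usage bounds, which is an assumption made for this example and may differ from Netty's exact rounding.

```java
// Simplified model of one usage band and its relocation thresholds.
final class UsageBandSketch {

    final int minUsage;   // lower usage bound in percent
    final int maxUsage;   // upper usage bound in percent
    final long chunkSize; // bytes per chunk

    UsageBandSketch(int minUsage, int maxUsage, long chunkSize) {
        this.minUsage = minUsage;
        this.maxUsage = maxUsage;
        this.chunkSize = chunkSize;
    }

    // Free bytes at or below this value: chunk is too full for this band.
    long freeMinThreshold() {
        return chunkSize * (100 - maxUsage) / 100;
    }

    // Free bytes above this value: chunk is too empty for this band.
    long freeMaxThreshold() {
        return chunkSize * (100 - minUsage) / 100;
    }

    // +1 = move to the next (fuller) band, -1 = move to the previous band, 0 = stay.
    int direction(long freeBytes) {
        if (freeBytes <= freeMinThreshold()) {
            return 1;
        }
        if (freeBytes > freeMaxThreshold()) {
            return -1;
        }
        return 0;
    }

    public static void main(String[] args) {
        long chunk = 16L * 1024 * 1024;
        UsageBandSketch band = new UsageBandSketch(25, 75, chunk);
        System.out.println("free 4 MiB  -> " + band.direction(4L << 20));
        System.out.println("free 8 MiB  -> " + band.direction(8L << 20));
        System.out.println("free 13 MiB -> " + band.direction(13L << 20));
    }
}
```

The first comparison is the one applied after a successful allocation; the second is applied when memory is returned to the chunk.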

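PoolChunkList#allocate first checks whether this list's maxCapacity can satisfy the normalized capacity, then walks its chunk list until Chunk#allocate succeeds on one of the chunks.

On success, the chunk's remaining allocatable bytes (freeBytes) shrink and may drop to or below this list's freeMinThreshold, in which case the PoolChunk must move to the PoolChunkList for the next usage range.

PoolChunkList#remove first unlinks the node.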

private void remove(PoolChunk<T> cur) {
    if (cur == head) {
        head = cur.next;
        if (head != null) {
            head.prev = null;
        }
    } else {
        PoolChunk<T> next = cur.next;
        cur.prev.next = next;
        if (next != null) {
            next.prev = cur.prev;
        }
    }
}

Then the add method of the next PoolChunkList is called, which ends up in add0 to link the PoolChunk in. A newly added PoolChunk always becomes the head of the list.

void add0(PoolChunk<T> chunk) {
    chunk.parent = this;
    if (head == null) {
        head = chunk;
        chunk.prev = null;
        chunk.next = null;
    } else {
        chunk.prev = null;
        chunk.next = head;
        head.prev = chunk;
        head = chunk;
    }
}
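
The remove and add0 methods above are an ordinary doubly linked list with insertion at the head. The stand-alone sketch below reproduces the same pointer updates on a toy node type so that they can be run and inspected; ChunkListSketch and Node are illustrative names, not Netty classes.

```java
// Toy doubly linked list with head insertion, mirroring the pointer updates above.
final class ChunkListSketch {

    static final class Node {
        final String name;
        Node prev;
        Node next;

        Node(String name) {
            this.name = name;
        }
    }

    Node head;

    // Insert at the head; the previous head (if any) becomes the second node.
    void addFirst(Node n) {
        n.prev = null;
        n.next = head;
        if (head != null) {
            head.prev = n;
        }
        head = n;
    }

    // Unlink a node that is known to be in this list.
    void remove(Node n) {
        if (n == head) {
            head = n.next;
            if (head != null) {
                head.prev = null;
            }
        } else {
            n.prev.next = n.next;
            if (n.next != null) {
                n.next.prev = n.prev;
            }
        }
    }

    // Names from head to tail, separated by spaces.
    String dump() {
        StringBuilder sb = new StringBuilder();
        for (Node cur = head; cur != null; cur = cur.next) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(cur.name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ChunkListSketch list = new ChunkListSketch();
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        list.addFirst(a);
        list.addFirst(b);
        list.addFirst(c);
        System.out.println(list.dump());
        list.remove(b);
        System.out.println(list.dump());
    }
}
```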

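V. Memory release

When the caller is done with a ByteBuf and calls AbstractReferenceCountedByteBuf#release(int), the subclass's deallocate method is invoked once the reference count reaches 0 (inside ReferenceCountUpdater this shows up as the raw counter becoming odd).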

// ReferenceCountUpdater implements the reference counting
private static final ReferenceCountUpdater<AbstractReferenceCountedByteBuf> updater = new ReferenceCountUpdater<AbstractReferenceCountedByteBuf>() {
    // ...
};
@Override
public boolean release(int decrement) {
    return handleRelease(updater.release(this, decrement));
}

private boolean handleRelease(boolean result) {
    if (result) {
        deallocate();
    }
    return result;
}
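
The remark about the counter becoming odd can be illustrated with a small model. This is a sketch of the general idea only, under the assumption that the raw value stores twice the real count while the object is live and an odd value once it has been released; EvenOddRefCountSketch is an illustrative class, and overflow handling and the retain/release variants taking an argument are left out.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative even/odd reference counter: raw = 2 * refCnt while live, odd when released.
final class EvenOddRefCountSketch {

    private final AtomicInteger raw = new AtomicInteger(2); // real count 1

    int refCnt() {
        int r = raw.get();
        return (r & 1) != 0 ? 0 : r >>> 1;
    }

    void retain() {
        int old = raw.getAndAdd(2);
        if ((old & 1) != 0) {
            raw.getAndAdd(-2); // undo the increment on a released object
            throw new IllegalStateException("already released");
        }
    }

    // Returns true only for the release that drops the real count to 0.
    boolean release() {
        for (;;) {
            int r = raw.get();
            if ((r & 1) != 0) {
                throw new IllegalStateException("already released");
            }
            if (r == 2) {
                if (raw.compareAndSet(2, 1)) {
                    return true; // caller would now run deallocate()
                }
            } else if (raw.compareAndSet(r, r - 2)) {
                return false;
            }
        }
    }

    public static void main(String[] args) {
        EvenOddRefCountSketch cnt = new EvenOddRefCountSketch();
        cnt.retain();
        System.out.println(cnt.release()); // one reference still held
        System.out.println(cnt.release()); // last reference dropped
        System.out.println(cnt.refCnt());
    }
}
```

One benefit of such an encoding is that retain can use a single atomic add and detect a released object afterwards by looking at the lowest bit.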

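PooledByteBuf#deallocate, the pooled implementation, is shown below. It clears the member fields, and recycle returns the PooledByteBuf instance to the object pool. The important part is PoolArena#free.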

@Override
protected final void deallocate() {
    if (handle >= 0) {
        final long handle = this.handle;
        this.handle = -1;
        memory = null;
        chunk.arena.free(chunk, tmpNioBuf, handle, maxLength, cache);
        tmpNioBuf = null;
        chunk = null;
        recycle();
    }
}
private void recycle() {
    recyclerHandle.recycle(this);
}

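In PoolArena#free, if the chunk is a special Huge chunk (over 16 MiB, unpooled = true), the underlying memory (a ByteBuffer or a byte array) is released directly. Otherwise the region is first offered to the PoolThreadCache.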

void free(PoolChunk<T> chunk, ByteBuffer nioBuffer, long handle, int normCapacity, PoolThreadCache cache) {
    // Huge: release the underlying memory directly
    if (chunk.unpooled) {
        destroyChunk(chunk);
    }
    // Everything else
    else {
        SizeClass sizeClass = sizeClass(normCapacity);
        // Try to put it into the thread cache
        if (cache != null && cache.add(this, chunk, nioBuffer, handle, normCapacity, sizeClass)) {
            return;
        }
        // Release
        freeChunk(chunk, handle, sizeClass, nioBuffer, false);
    }
}

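In the following four cases, execution continues into PoolChunk#free.

- The MPSC queue of the matching MemoryRegionCache in PoolThreadCache is full and cannot cache any more entries.
- Caching succeeded, and later the FastThreadLocal holding the PoolThreadCache is cleared: calling FastThreadLocal#remove triggers the onRemoval hook, which frees the whole PoolThreadCache.
- Caching succeeded, and later the finalize method of PoolThreadCache runs, which frees the whole PoolThreadCache.
- Caching succeeded, and later the PoolThreadCache reaches 8192 allocations, at which point a trim is performed.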
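
The first and last cases can be mimicked with a bounded queue and an allocation counter. The following is a deliberately simplified sketch: BoundedCacheSketch is an illustrative name, handles are plain long values, an ArrayBlockingQueue stands in for the MPSC queue, and the trim here drops every cached entry, which is an assumption made for brevity and is cruder than a real trimming policy.

```java
import java.util.concurrent.ArrayBlockingQueue;

// Illustrative bounded cache: add() fails when full, trim runs every N allocations.
final class BoundedCacheSketch {

    private final ArrayBlockingQueue<Long> queue;
    private final int trimInterval;
    private int allocations;
    int returnedToChunk; // entries that had to go back to the chunk

    BoundedCacheSketch(int size, int trimInterval) {
        this.queue = new ArrayBlockingQueue<>(size);
        this.trimInterval = trimInterval;
    }

    // false means the queue is full and the caller must free to the chunk instead.
    boolean add(long handle) {
        return queue.offer(handle);
    }

    // Returns a cached handle or null, and trims after every trimInterval calls.
    Long allocate() {
        Long handle = queue.poll();
        if (++allocations >= trimInterval) {
            allocations = 0;
            trim();
        }
        return handle;
    }

    // Simplified trim: hand every cached entry back to the chunk.
    private void trim() {
        returnedToChunk += queue.size();
        queue.clear();
    }

    public static void main(String[] args) {
        BoundedCacheSketch cache = new BoundedCacheSketch(2, 8192);
        System.out.println(cache.add(1L)); // cached
        System.out.println(cache.add(2L)); // cached
        System.out.println(cache.add(3L)); // queue full, free to chunk instead
    }
}
```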

Execution then enters PoolChunk#free, which was covered earlier.

void free(long handle, ByteBuffer nioBuffer) {
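    // The low 32 bits of the handle are the memoryMap index
    int memoryMapIdx = memoryMapIdx(handle);
    // The high 32 bits of the handle are the subpage bitmap index
    int bitmapIdx = bitmapIdx(handle);
    // A non-zero bitmap index means the memory came from a subpage, so return it to the subpage
    if (bitmapIdx != 0) {
        // Find the head node of the PoolSubpage list for this element size
        PoolSubpage<T> subpage = subpages[subpageIdx(memoryMapIdx)];
        PoolSubpage<T> head = arena.findSubpagePoolHead(subpage.elemSize);
        synchronized (head) {
            // true:  the subpage is still in use, so it need not be returned to the chunk as a whole
            // false: the subpage has been removed from the arena's subpage pool and will not be used again,
            //        so memoryMapIdx must be returned to the chunk
            if (subpage.free(head, bitmapIdx & 0x3FFFFFFF)) {
                return;
            }
        }
    }

    // If the bitmap index is 0, or the subpage is no longer in use, return the run to the chunk's memoryMap

    // Increase the allocatable bytes
    freeBytes += runLength(memoryMapIdx);
    // Reset memoryMap[memoryMapIdx] to its original value, depth[memoryMapIdx]
    setValue(memoryMapIdx, depth(memoryMapIdx));
    // Update the ancestors of memoryMap[memoryMapIdx] bottom-up
    updateParentsFree(memoryMapIdx);
    // Cache the ByteBuffer for reuse; next time only its indices need resetting, which reduces GC pressure from frequent allocation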
    if (nioBuffer != null && cachedNioBuffers != null &&
        cachedNioBuffers.size() < PooledByteBufAllocator.DEFAULT_MAX_CACHED_BYTEBUFFERS_PER_CHUNK) {
        cachedNioBuffers.offer(nioBuffer);
    }
}
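
The handle layout used by this method can be tried out in isolation. The sketch below is an illustration, not Netty code: HandleSketch and its helper names are made up, and the marker bit 0x4000000000000000L set on subpage handles is an assumption inferred from the 0x3FFFFFFF mask above (it would keep the high half non-zero even for slot 0).

```java
// Illustrative packing of a 64-bit handle: low 32 bits = memoryMap index,
// high 32 bits = subpage bitmap index plus an assumed marker bit.
final class HandleSketch {

    // A run of whole pages: the high 32 bits stay zero.
    static long pageHandle(int memoryMapIdx) {
        return memoryMapIdx;
    }

    // A subpage slot: marker bit | slot index in the high half | memoryMap index.
    static long subpageHandle(int slot, int memoryMapIdx) {
        return 0x4000000000000000L | (long) slot << 32 | memoryMapIdx;
    }

    static int memoryMapIdx(long handle) {
        return (int) handle;
    }

    static int bitmapIdx(long handle) {
        return (int) (handle >>> 32);
    }

    // Strip the marker bit to recover the slot index inside the subpage.
    static int slot(long handle) {
        return bitmapIdx(handle) & 0x3FFFFFFF;
    }

    public static void main(String[] args) {
        long h = subpageHandle(0, 2048);
        System.out.println(memoryMapIdx(h));   // memoryMap index of the leaf page
        System.out.println(bitmapIdx(h) != 0); // still recognizable as a subpage handle
        System.out.println(slot(h));           // slot index after masking
    }
}
```

With this layout, the free path needs only the handle to decide whether the memory goes back to a subpage or directly to the chunk's memoryMap.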

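Because the chunk's allocatable memory has changed, the chunk may again move between the ChunkLists. The move0 method below is invoked recursively (through move); as the chunk's usage falls it may eventually reach 0, at which point no ChunkList accepts it and false is returned.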

// PoolChunkList.java
boolean free(PoolChunk<T> chunk, long handle, ByteBuffer nioBuffer) {
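    // Return the memory to the subpage or chunk, see above
    chunk.free(handle, nioBuffer);
    // Adjust which PoolChunkList the chunk belongs to
    // The recursion returns false when there is no previous PoolChunkList, meaning the chunk is unused and can be released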
    if (chunk.freeBytes > freeMaxThreshold) {
        remove(chunk);
        return move0(chunk);
    }
    return true;
}

private boolean move0(PoolChunk<T> chunk) {
    // No previous list: this is q000 and the chunk's usage is 0, so the chunk is no longer needed
    if (prevList == null) {
        assert chunk.usage() == 0;
        return false;
    }
    // Recurse
    return prevList.move(chunk);
}

private boolean move(PoolChunk<T> chunk) {
    if (chunk.freeBytes > freeMaxThreshold) {
        return move0(chunk);
    }
    // ... 
}

Finally, back in PoolArena#freeChunk: if no PoolChunkList accepts the PoolChunk after its usage changed, the chunk is destroyed and its memory is released.

void freeChunk(PoolChunk<T> chunk, long handle, SizeClass sizeClass, ByteBuffer nioBuffer, boolean finalizer) {
    final boolean destroyChunk;
    synchronized (this) {
        // ...
        // PoolChunkList#free
        destroyChunk = !chunk.parent.free(chunk, handle, nioBuffer);
    }
    // No PoolChunkList accepted the chunk, so release the whole chunk
    if (destroyChunk) {
        destroyChunk(chunk);
    }
}

destroyChunk is an abstract method of PoolArena that subclasses implement. The DirectArena implementation is shown below; underneath, it simply frees the JDK ByteBuffer.

@Override
protected void destroyChunk(PoolChunk<ByteBuffer> chunk) {
    if (PlatformDependent.useDirectBufferNoCleaner()) {
        PlatformDependent.freeDirectNoCleaner(chunk.memory);
    } else {
        PlatformDependent.freeDirectBuffer(chunk.memory);
    }
}

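Summary

In its static initializer (<clinit>), PooledByteBufAllocator derives key sizing parameters from configuration: the page size (8 KiB), the chunk tree depth (11), and the arena array length (at most cores * 2). The page size and the tree depth together determine the chunk size (16 MiB). The PooledByteBufAllocator constructor builds the PoolArena arrays.
When allocating, PooledByteBufAllocator reads the PoolThreadCache instance from the PoolThreadLocalCache thread-local. If the current thread has no PoolThreadCache yet, PoolThreadLocalCache#initialValue is triggered and binds the least-used arena to the current thread.
Tiny and Small requests pass through four levels: thread cache -> subpage pool -> ChunkList -> new chunk. Normal requests pass through three levels and skip the subpage pool, because a Normal size is at least one page and is served directly by a chunk rather than by a subpage. Huge requests are served only by a special, unpooled chunk.
Netty requests memory from the system in units of one chunk (16 MiB), and each chunk joins a PoolChunkList. Depending on its usage, a chunk moves back and forth between the PoolChunkList nodes for the different usage ranges (qXXX). After an allocation, the chunk's free bytes are compared with the lower threshold of its current PoolChunkList, and the chunk moves to the next PoolChunkList if it is at or below that threshold. Conversely, when the chunk's free memory grows (for example, when page-level memory is returned to it), the chunk may move to the previous PoolChunkList.
Memory is released when a ByteBuf's reference count reaches 0. Mirroring the four levels of allocation, release also passes through four levels: thread cache -> subpage pool -> chunk -> system. Huge memory is released straight back to the system, and Normal memory does not pass through the subpage pool.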
