Introduction: a summary of the design and principles of Netty memory management

Memory size classes

[Figure: Netty memory management, size classes]

Netty divides memory sizes into four classes (the pooled ones correspond to the enum io.netty.buffer.PoolArena.SizeClass):

Tiny: blocks of 0B, 16B, 32B, 48B, ..., 496B, 32 sizes in total.
Small: blocks of 512B, 1K, 2K, 4K, 4 sizes in total.
Normal: blocks of 8K, 16K, 32K, 64K, ..., 16M, each a power of two, 12 sizes in total.
Huge: blocks larger than 16M.

Allocation strategies differ by size class. Tiny and Small are handled in much the same way. Tiny, Small, and Normal are all served from the pool, while Huge allocations bypass the pool.
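
The classification can be sketched in a few lines of Java. This is a simplified illustration rather than Netty's own code: the class SizeClassSketch and its methods are hypothetical, and it assumes the default 8K page and 16M chunk, with requests rounded up to a multiple of 16 below 512B and to a power of two from 512B upward.

```java
final class SizeClassSketch {
    enum SizeClass { TINY, SMALL, NORMAL, HUGE }

    static final int PAGE_SIZE = 8 * 1024;           // assumed default page size (8K)
    static final int CHUNK_SIZE = 16 * 1024 * 1024;  // assumed default chunk size (16M)

    /** Rounds a request up to the block size that would be handed out. */
    static int normalize(int reqCapacity) {
        if (reqCapacity < 0) {
            throw new IllegalArgumentException("reqCapacity: " + reqCapacity);
        }
        if (reqCapacity > CHUNK_SIZE) {
            return reqCapacity;                       // Huge: served as requested
        }
        if (reqCapacity >= 512) {                     // Small and Normal: next power of two
            int highest = Integer.highestOneBit(reqCapacity);
            return highest == reqCapacity ? reqCapacity : highest << 1;
        }
        return (reqCapacity + 15) & ~15;              // Tiny: next multiple of 16
    }

    /** Maps a normalized size to its size class. */
    static SizeClass classify(int normCapacity) {
        if (normCapacity < 512) return SizeClass.TINY;
        if (normCapacity < PAGE_SIZE) return SizeClass.SMALL;
        if (normCapacity <= CHUNK_SIZE) return SizeClass.NORMAL;
        return SizeClass.HUGE;
    }

    public static void main(String[] args) {
        for (int req : new int[] {100, 513, 5000, CHUNK_SIZE + 1}) {
            int norm = normalize(req);
            System.out.println(req + " -> " + norm + " " + classify(norm));
        }
    }
}
```

A 100-byte request, for example, is rounded to 112 bytes and lands in Tiny, while a 5000-byte request is rounded to 8K and becomes the smallest Normal block.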

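Memory allocation units

Netty defines several allocation units: Chunk, Page, and Subpage.

Chunk: the unit in which Netty requests memory from the operating system. A Chunk is 16M by default, so Netty normally requests 16M from the operating system at a time. A Chunk can be viewed as a collection of Pages.
Page: the unit in which a Chunk manages its memory. A Page is 8K by default. A Page can be viewed as a collection of Subpages.
Subpage: a Page is split into equally sized sub-blocks for allocation, and each such sub-block is a Subpage. Subpages serve Tiny and Small requests.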

[Figure: Netty memory management, allocation units]
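
The relationship between the three units is plain arithmetic. The sketch below is only an illustration with hypothetical names (AllocationUnitSketch, pagesPerChunk, elementsPerPage) and assumes the default 16M Chunk and 8K Page.

```java
final class AllocationUnitSketch {
    static final int PAGE_SIZE = 8 * 1024;           // assumed default page size (8K)
    static final int CHUNK_SIZE = 16 * 1024 * 1024;  // assumed default chunk size (16M)

    /** Number of Pages that one Chunk is divided into. */
    static int pagesPerChunk() {
        return CHUNK_SIZE / PAGE_SIZE;
    }

    /** Number of equally sized sub-blocks a Page holds for a given Tiny or Small size. */
    static int elementsPerPage(int elemSize) {
        if (elemSize <= 0 || elemSize >= PAGE_SIZE) {
            throw new IllegalArgumentException("elemSize: " + elemSize);
        }
        return PAGE_SIZE / elemSize;
    }

    public static void main(String[] args) {
        System.out.println("pages per chunk: " + pagesPerChunk());
        System.out.println("16B elements per page: " + elementsPerPage(16));
        System.out.println("4K elements per page: " + elementsPerPage(4096));
    }
}
```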

Memory pool architecture

[Figure: Netty memory management, memory pool architecture]

PoolArena

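Netty creates a fixed number of PoolArenas to serve allocations. Each thread is bound to one PoolArena, chosen as the arena that currently has the fewest threads bound to it. The number of PoolArenas depends on the number of CPU cores, and having several of them reduces contention.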

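Strictly speaking, a thread is bound to a PoolThreadCache, and the PoolThreadCache is bound to a PoolArena when it is initialized. The logic lives in io.netty.buffer.PooledByteBufAllocator.PoolThreadLocalCache#initialValue.

PoolArena has two implementations:
- HeapArena: allocates heap memory.
- DirectArena: allocates off-heap (direct) memory.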
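
How a new thread cache might pick its arena can be sketched as follows. This is an assumption-laden illustration, not the allocator's code: ArenaBindingSketch, Arena, and bind are hypothetical names, and the policy shown (take the arena with the fewest bound thread caches) is inferred from the numThreadCaches counter that PoolArena maintains.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ArenaBindingSketch {
    /** Hypothetical stand-in for a PoolArena; only the bookkeeping counter is modeled. */
    static final class Arena {
        final String name;
        final AtomicInteger numThreadCaches = new AtomicInteger();

        Arena(String name) {
            this.name = name;
        }
    }

    /** Binds the calling thread's cache to the arena with the fewest bound caches. */
    static synchronized Arena bind(Arena[] arenas) {
        if (arenas == null || arenas.length == 0) {
            return null;
        }
        Arena min = arenas[0];
        for (int i = 1; i < arenas.length; i++) {
            if (arenas[i].numThreadCaches.get() < min.numThreadCaches.get()) {
                min = arenas[i];
            }
        }
        min.numThreadCaches.incrementAndGet();
        return min;
    }

    public static void main(String[] args) {
        Arena[] arenas = {new Arena("arena-0"), new Arena("arena-1")};
        for (int i = 0; i < 3; i++) {
            System.out.println("cache " + i + " bound to " + bind(arenas).name);
        }
    }
}
```

Spreading caches evenly keeps the number of threads that can contend on one arena's lock as small as possible.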

PoolChunkList

PoolArena sorts its PoolChunks into 6 PoolChunkLists according to their memory usage (a PoolChunkList is simply a collection of PoolChunks):
- qInit: PoolChunks with usage in [0%, 25%)
- q000: PoolChunks with usage in [1%, 50%)
- q025: PoolChunks with usage in [25%, 75%)
- q050: PoolChunks with usage in [50%, 100%)
- q075: PoolChunks with usage in [75%, 100%)
- q100: PoolChunks with usage of 100%
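As memory is allocated and released, a PoolChunk moves among these 6 PoolChunkLists.
The ranges of adjacent PoolChunkLists deliberately overlap. A PoolChunk changes lists all the time, and if the thresholds met exactly, for example 1% to 50% and 50% to 75%, a PoolChunk whose usage hovers around 50% would bounce back and forth between two lists and waste cycles.
During allocation the lists are tried in the order q050 -> q025 -> q000 -> qInit -> q075 (q100 is skipped because its chunks are full), and the first one with enough free memory serves the request. Starting at q050 keeps the chunks better utilized.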

// io.netty.buffer.PoolArena#allocateNormal
private void allocateNormal(PooledByteBuf<T> buf, int reqCapacity, int normCapacity) {
    if (q050.allocate(buf, reqCapacity, normCapacity) || q025.allocate(buf, reqCapacity, normCapacity) ||
        q000.allocate(buf, reqCapacity, normCapacity) || qInit.allocate(buf, reqCapacity, normCapacity) ||
        q075.allocate(buf, reqCapacity, normCapacity)) {
        return;
    }

    // Add a new chunk.
    PoolChunk<T> c = newChunk(pageSize, maxOrder, pageShifts, chunkSize);
    boolean success = c.allocate(buf, reqCapacity, normCapacity);
    assert success;
    qInit.add(c);
}
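
The effect of the overlapping ranges can be checked with a small simulation. It is a sketch under simplifying assumptions: one chunk, usage given directly as percentages, qInit left out, and a hypothetical helper countMoves that moves the chunk up when usage reaches a list's upper bound and down when it falls below the lower bound. The second set of ranges, which meet exactly, is invented for comparison.

```java
final class ChunkListHysteresisSketch {
    /**
     * Replays usage samples (in percent) for one chunk and counts how often it changes lists.
     * List i accepts usage in [min[i], max[i]).
     */
    static int countMoves(int[] min, int[] max, int startList, int[] usageSamples) {
        int list = startList;
        int moves = 0;
        for (int usage : usageSamples) {
            while (list < max.length - 1 && usage >= max[list]) {
                list++;   // too full for this list: move to the next one
                moves++;
            }
            while (list > 0 && usage < min[list]) {
                list--;   // too empty for this list: move to the previous one
                moves++;
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        int[] samples = {49, 51, 49, 51, 49, 51};

        // q000, q025, q050, q075, q100 with overlapping ranges
        int[] min = {1, 25, 50, 75, 100};
        int[] max = {50, 75, 100, 100, Integer.MAX_VALUE};
        System.out.println("overlapping ranges: " + countMoves(min, max, 0, samples) + " move(s)");

        // hypothetical ranges that meet exactly at 50% and 75%
        int[] tightMin = {0, 50, 75};
        int[] tightMax = {50, 75, Integer.MAX_VALUE};
        System.out.println("contiguous ranges: " + countMoves(tightMin, tightMax, 0, samples) + " move(s)");
    }
}
```

With usage oscillating between 49% and 51%, the overlapping ranges cause a single move into q025, whereas the contiguous ranges move the chunk on almost every sample.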

PoolChunk

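A PoolChunk can be thought of as a collection of Pages. Page is only an abstract concept: in Netty a Page is a sub-block of memory managed by a PoolChunk, and each such sub-block is represented by a PoolSubpage.
Netty splits a PoolChunk into 2048 Pages and manages them with the buddy algorithm, which yields a full binary tree. The memory of every child node belongs to its parent node.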

final class PoolChunk<T> implements PoolChunkMetric {
    final PoolArena<T> arena;
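    final T memory; // the memory that holds the data
    private final byte[] memoryMap; // allocation state of each node in the full binary tree, array length 4096
    private final byte[] depthMap; // depth of each node in the full binary tree, array length 4096
    private final PoolSubpage<T>[] subpages; // the 2048 8K blocks managed by the PoolChunk
    private int freeBytes; // remaining free bytes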
    PoolChunkList<T> parent;
    PoolChunk<T> prev;
    PoolChunk<T> next;

    // other code omitted
}
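
The buddy-style search over memoryMap can be sketched on a much smaller tree. The class below is a simplified illustration, not the PoolChunk implementation: BuddyTreeSketch, allocateNode, and pageOffset are names chosen for this sketch, the tree has 8 leaf pages instead of 2048, and freeing is left out. Each entry of memoryMap holds the shallowest depth at which the node's subtree still has a completely free node.

```java
final class BuddyTreeSketch {
    private final int maxOrder;      // depth of the leaf (page) level
    private final byte unusable;     // marker: nothing free in this subtree
    private final byte[] memoryMap;  // index 0 unused, root is index 1

    BuddyTreeSketch(int maxOrder) {
        if (maxOrder < 0 || maxOrder > 20) {
            throw new IllegalArgumentException("maxOrder: " + maxOrder);
        }
        this.maxOrder = maxOrder;
        this.unusable = (byte) (maxOrder + 1);
        this.memoryMap = new byte[1 << (maxOrder + 1)];
        int id = 1;
        for (int d = 0; d <= maxOrder; d++) {
            for (int p = 0; p < (1 << d); p++) {
                memoryMap[id++] = (byte) d;   // initially every node is free at its own depth
            }
        }
    }

    /** Allocates a free node at depth d and returns its id, or -1 if none is left. */
    int allocateNode(int d) {
        if (d < 0 || d > maxOrder) {
            throw new IllegalArgumentException("depth: " + d);
        }
        int id = 1;
        if (memoryMap[id] > d) {
            return -1;                        // no free node at depth d anywhere
        }
        for (int depth = 0; depth < d; depth++) {
            id <<= 1;                         // try the left child first
            if (memoryMap[id] > d) {
                id ^= 1;                      // left subtree cannot serve it: use the sibling
            }
        }
        memoryMap[id] = unusable;
        while (id > 1) {                      // propagate the new state to all ancestors
            int parent = id >>> 1;
            memoryMap[parent] = (byte) Math.min(memoryMap[id], memoryMap[id ^ 1]);
            id = parent;
        }
        return id;
    }

    /** Offset of a node's memory, counted in pages from the start of the chunk. */
    int pageOffset(int id) {
        int depth = 31 - Integer.numberOfLeadingZeros(id);
        return (id - (1 << depth)) << (maxOrder - depth);
    }

    public static void main(String[] args) {
        BuddyTreeSketch tree = new BuddyTreeSketch(3);   // 8 pages
        int onePage = tree.allocateNode(3);
        int twoPages = tree.allocateNode(2);
        System.out.println("1 page  -> node " + onePage + ", page offset " + tree.pageOffset(onePage));
        System.out.println("2 pages -> node " + twoPages + ", page offset " + tree.pageOffset(twoPages));
    }
}
```

Because a parent always stores the minimum of its two children, the search never has to backtrack: one comparison per level decides whether to stay left or switch to the sibling.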

PoolSubpage

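Netty has no dedicated Page class; a Page is represented by a PoolSubpage.
For small allocations, that is, requests smaller than one 8K Page, the memory is managed by a PoolSubpage.
A PoolSubpage splits the memory it manages into equally sized elements and records in a bitmap whether each element is in use; each bit is either 0 or 1.
On the first small allocation of a given size, a usable PoolSubpage is taken from a PoolChunk and added to the matching tinySubpagePools or smallSubpagePools entry of the PoolArena. Later small allocations of that size are served directly from tinySubpagePools or smallSubpagePools, which makes allocation faster.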

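Allocation is faster because, when tinySubpagePools or smallSubpagePools has enough memory, only the head node of the matching size class has to be locked. When it does not, the whole PoolArena object must be locked to allocate, which lowers concurrency.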
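
The bitmap bookkeeping can be illustrated with a minimal sketch. It is not the PoolSubpage code: SubpageBitmapSketch and its methods are hypothetical, it tracks only element indexes (no real memory), and it always hands out the lowest free index.

```java
final class SubpageBitmapSketch {
    private final long[] bitmap;     // one bit per element: 1 = in use, 0 = free
    private final int maxNumElems;
    private int numAvail;

    SubpageBitmapSketch(int pageSize, int elemSize) {
        if (elemSize <= 0 || elemSize > pageSize) {
            throw new IllegalArgumentException("elemSize: " + elemSize);
        }
        this.maxNumElems = pageSize / elemSize;
        this.numAvail = maxNumElems;
        this.bitmap = new long[(maxNumElems + 63) >>> 6];
    }

    /** Marks the lowest free element as used and returns its index, or -1 if the page is full. */
    int allocate() {
        if (numAvail == 0) {
            return -1;
        }
        for (int i = 0; i < bitmap.length; i++) {
            long free = ~bitmap[i];
            if (free != 0) {
                int bit = Long.numberOfTrailingZeros(free);
                bitmap[i] |= 1L << bit;
                numAvail--;
                return (i << 6) | bit;
            }
        }
        return -1;
    }

    /** Clears the bit of a previously allocated element. */
    void free(int idx) {
        if (idx < 0 || idx >= maxNumElems || (bitmap[idx >>> 6] & (1L << (idx & 63))) == 0) {
            throw new IllegalArgumentException("not allocated: " + idx);
        }
        bitmap[idx >>> 6] &= ~(1L << (idx & 63));
        numAvail++;
    }

    public static void main(String[] args) {
        SubpageBitmapSketch page = new SubpageBitmapSketch(8192, 4096);  // two 4K elements
        System.out.println(page.allocate());
        System.out.println(page.allocate());
        System.out.println(page.allocate());  // page is full
    }
}
```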

PoolThreadCache & MemoryRegionCache

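When memory is released, Netty does not return it to the PoolChunk right away. It is kept in the PoolThreadCache, so the next allocation of the same size class can take it straight from there. PoolThreadCache caches Tiny, Small, and Normal blocks, and keeps heap and off-heap memory apart.

Size class	Sizes	Array length
Tiny	0B,16B,32B,48B......496B	32
Small	512B,1K,2K,4K	4
Normal	8K,16K,32K	3

Note: in the cache, the Normal size class covers only 8K to 32K, not 8K to 16M.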

final class PoolThreadCache {
    final PoolArena<byte[]> heapArena;
    final PoolArena<ByteBuffer> directArena;
    private final MemoryRegionCache<byte[]>[] tinySubPageHeapCaches;
    private final MemoryRegionCache<byte[]>[] smallSubPageHeapCaches;
    private final MemoryRegionCache<ByteBuffer>[] tinySubPageDirectCaches;
    private final MemoryRegionCache<ByteBuffer>[] smallSubPageDirectCaches;
    private final MemoryRegionCache<byte[]>[] normalHeapCaches;
    private final MemoryRegionCache<ByteBuffer>[] normalDirectCaches;
    
    private abstract static class MemoryRegionCache<T> {
        private final int size;
        private final Queue<Entry<T>> queue;
        private final SizeClass sizeClass;
        private int allocations;
        
        static final class Entry<T> {
            final Handle<Entry<?>> recyclerHandle;
            PoolChunk<T> chunk;
            ByteBuffer nioBuffer;
            long handle = -1;
        }
    }

    // other code omitted
}

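A MemoryRegionCache is essentially a queue. When memory is released, the block is added to the queue, and the next allocation of the same size class takes a free block straight from the queue.
Note that Entry objects are managed by the lightweight object pool io.netty.util.Recycler, which is an interesting topic in its own right. In everyday work, when objects are expensive to create and worth reusing, io.netty.util.Recycler can be used to manage them as well.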
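
The queue behaviour can be sketched as below. This is a single-threaded simplification with hypothetical names (RegionCacheSketch, add, allocate, NO_HANDLE); a block is represented only by its long handle, and java.util.ArrayDeque stands in for the queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class RegionCacheSketch {
    static final long NO_HANDLE = -1;   // returned on a cache miss

    private final int size;             // maximum number of cached blocks
    private final Queue<Long> queue = new ArrayDeque<>();

    RegionCacheSketch(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size: " + size);
        }
        this.size = size;
    }

    /** On release: keeps the block if there is room; otherwise the caller must return it to the arena. */
    boolean add(long handle) {
        if (queue.size() >= size) {
            return false;
        }
        return queue.offer(handle);
    }

    /** On allocation: hands out a cached block, or NO_HANDLE if the cache is empty. */
    long allocate() {
        Long handle = queue.poll();
        return handle == null ? NO_HANDLE : handle;
    }

    public static void main(String[] args) {
        RegionCacheSketch cache = new RegionCacheSketch(2);
        System.out.println(cache.add(10L));     // cached
        System.out.println(cache.add(11L));     // cached
        System.out.println(cache.add(12L));     // cache full, not cached
        System.out.println(cache.allocate());   // oldest cached handle
    }
}
```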

Memory allocation

PoolArena is the core class for memory allocation and contains the overall allocation flow. The central method is:

// io.netty.buffer.PoolArena#allocate(io.netty.buffer.PoolThreadCache, io.netty.buffer.PooledByteBuf<T>, int)
private void allocate(PoolThreadCache cache, PooledByteBuf<T> buf, final int reqCapacity) {
    final int normCapacity = normalizeCapacity(reqCapacity);
    if (isTinyOrSmall(normCapacity)) { // capacity < pageSize
        int tableIdx;
        PoolSubpage<T>[] table;
        boolean tiny = isTiny(normCapacity);
        if (tiny) { // < 512
            if (cache.allocateTiny(this, buf, reqCapacity, normCapacity)) {
                // was able to allocate out of the cache so move on
                return;
            }
            tableIdx = tinyIdx(normCapacity);
            table = tinySubpagePools;
        } else {
            if (cache.allocateSmall(this, buf, reqCapacity, normCapacity)) {
                // was able to allocate out of the cache so move on
                return;
            }
            tableIdx = smallIdx(normCapacity);
            table = smallSubpagePools;
        }

        final PoolSubpage<T> head = table[tableIdx];

        /**
         * Synchronize on the head. This is needed as {@link PoolChunk#allocateSubpage(int)} and
         * {@link PoolChunk#free(long)} may modify the doubly linked list as well.
         */
        synchronized (head) {
            final PoolSubpage<T> s = head.next;
            if (s != head) {
                assert s.doNotDestroy && s.elemSize == normCapacity;
                long handle = s.allocate();
                assert handle >= 0;
                s.chunk.initBufWithSubpage(buf, null, handle, reqCapacity);
                incTinySmallAllocation(tiny);
                return;
            }
        }
        synchronized (this) {
            allocateNormal(buf, reqCapacity, normCapacity);
        }

        incTinySmallAllocation(tiny);
        return;
    }
    if (normCapacity <= chunkSize) {
        if (cache.allocateNormal(this, buf, reqCapacity, normCapacity)) {
            // was able to allocate out of the cache so move on
            return;
        }
        synchronized (this) {
            allocateNormal(buf, reqCapacity, normCapacity);
            ++allocationsNormal;
        }
    } else {
        // Huge allocations are never served via the cache so just call allocateHuge
        allocateHuge(buf, reqCapacity);
    }
}

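As the code shows, Tiny, Small, and Normal requests are first served from the PoolThreadCache bound to the current thread, and the PoolArena takes over only if that fails. Huge requests are allocated directly, without pooling.
For Tiny and Small requests the order is: the current thread's cache -> the head node of the matching subpagePools entry in the PoolArena -> a PoolChunk managed by the PoolArena. The lock gets coarser at each step, and performance drops accordingly.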

[Figure: Netty memory management, page 2]

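One more detail: the source code frequently uses a long variable named handle. Its high 32 bits give the index of a PoolSubpage element in the bitmap, and its low 32 bits give the position of the node in the full binary tree managed by the PoolChunk.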
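
Packing two indexes into one long can be sketched like this. The names HandleSketch, pageHandle, subpageHandle, and elementIndex are hypothetical; the marker bit 0x4000000000000000L, used here so that a subpage handle with bitmap index 0 differs from a plain page handle, is an assumption of this sketch.

```java
final class HandleSketch {
    private static final long SUBPAGE_FLAG = 0x4000000000000000L;  // assumed marker bit

    /** Handle for a whole-page (Normal) allocation: only the tree node index is stored. */
    static long pageHandle(int memoryMapIdx) {
        checkIndex(memoryMapIdx);
        return memoryMapIdx;
    }

    /** Handle for a subpage element: bitmap index in the high half, tree node index in the low half. */
    static long subpageHandle(int memoryMapIdx, int bitmapIdx) {
        checkIndex(memoryMapIdx);
        checkIndex(bitmapIdx);
        return SUBPAGE_FLAG | ((long) bitmapIdx << 32) | memoryMapIdx;
    }

    static int memoryMapIdx(long handle) {
        return (int) handle;            // low 32 bits
    }

    static int bitmapIdx(long handle) {
        return (int) (handle >>> 32);   // high 32 bits, marker bit included
    }

    static boolean isSubpage(long handle) {
        return bitmapIdx(handle) != 0;
    }

    /** Element index inside the subpage, with the marker bit masked off. */
    static int elementIndex(long handle) {
        return bitmapIdx(handle) & 0x3FFFFFFF;
    }

    private static void checkIndex(int idx) {
        if (idx < 0 || idx > 0x3FFFFFFF) {
            throw new IllegalArgumentException("index: " + idx);
        }
    }

    public static void main(String[] args) {
        long handle = subpageHandle(2049, 5);
        System.out.println("node " + memoryMapIdx(handle) + ", element " + elementIndex(handle));
    }
}
```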

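Memory release

When memory is released

After 8192 allocation attempts through a PoolThreadCache (the default threshold, counted whether or not the cache could serve the request), Netty checks how often each cache has been used and may release cached memory.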

// io.netty.buffer.PoolThreadCache#allocate
private boolean allocate(MemoryRegionCache<?> cache, PooledByteBuf buf, int reqCapacity) {
    if (cache == null) {
        // no cache found so just return false here
        return false;
    }
    boolean allocated = cache.allocate(buf, reqCapacity);
    if (++ allocations >= freeSweepAllocationThreshold) {
        allocations = 0;
        trim();
    }
    return allocated;
}
void trim() {
    trim(tinySubPageDirectCaches);
    trim(smallSubPageDirectCaches);
    trim(normalDirectCaches);
    trim(tinySubPageHeapCaches);
    trim(smallSubPageHeapCaches);
    trim(normalHeapCaches);
}

// which ends up calling io.netty.buffer.PoolThreadCache.MemoryRegionCache#trim
public final void trim() {
    int free = size - allocations;
    allocations = 0;

    // We not even allocated all the number that are
    if (free > 0) {
        free(free, false);
    }
}
private  void freeEntry(Entry entry, boolean finalizer) {
    PoolChunk chunk = entry.chunk;
    long handle = entry.handle;
    ByteBuffer nioBuffer = entry.nioBuffer;

    if (!finalizer) {
        // recycle now so PoolChunk can be GC'ed. This will only be done if this is not freed because of
        // a finalizer.
        entry.recycle();
    }

    chunk.arena.freeChunk(chunk, handle, sizeClass, nioBuffer, finalizer);
}
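
The interplay of the two counters can be sketched in isolation. The class below is a simplified, single-cache model with hypothetical names (TrimSketch, sweepThreshold, released): one counter tracks allocation attempts and triggers trim, the other tracks cache hits since the last trim and decides how many entries to hand back.

```java
import java.util.ArrayDeque;

final class TrimSketch {
    private final int size;             // capacity of the cache queue
    private final int sweepThreshold;   // stands in for freeSweepAllocationThreshold
    private final ArrayDeque<Long> queue = new ArrayDeque<>();
    private int hitsSinceTrim;          // plays the role of MemoryRegionCache.allocations
    private int attempts;               // plays the role of PoolThreadCache.allocations
    int released;                       // entries handed back to the arena by trim()

    TrimSketch(int size, int sweepThreshold) {
        this.size = size;
        this.sweepThreshold = sweepThreshold;
    }

    /** Caches a freed block if there is room. */
    boolean release(long handle) {
        return queue.size() < size && queue.offer(handle);
    }

    /** Tries to serve an allocation from the cache; every attempt counts toward the sweep. */
    boolean allocate() {
        boolean hit = queue.poll() != null;
        if (hit) {
            hitsSinceTrim++;
        }
        if (++attempts >= sweepThreshold) {
            attempts = 0;
            trim();
        }
        return hit;
    }

    /** Frees up to (size - hits) entries: the less the cache was used, the more it gives back. */
    void trim() {
        int free = size - hitsSinceTrim;
        hitsSinceTrim = 0;
        for (; free > 0 && queue.poll() != null; free--) {
            released++;
        }
    }

    public static void main(String[] args) {
        TrimSketch cache = new TrimSketch(4, 3);
        for (long h = 1; h <= 4; h++) {
            cache.release(h);
        }
        for (int i = 0; i < 3; i++) {
            cache.allocate();           // the third attempt triggers trim()
        }
        System.out.println("released by trim: " + cache.released);
    }
}
```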

When a thread exits, all memory cached for that thread is reclaimed. PoolThreadCache overrides finalize() and runs the cache release logic before it is destroyed.

// io.netty.buffer.PoolThreadCache#finalize
/// TODO: In the future when we move to Java9+ we should use java.lang.ref.Cleaner.
@Override
protected void finalize() throws Throwable {
    try {
        super.finalize();
    } finally {
        free(true);
    }
}

/**
 *  Should be called if the Thread that uses this cache is about to exist to release resources out of the cache
 */
void free(boolean finalizer) {
    // As free() may be called either by the finalizer or by FastThreadLocal.onRemoval(...) we need to ensure
    // we only call this one time.
    if (freed.compareAndSet(false, true)) {
        int numFreed = free(tinySubPageDirectCaches, finalizer) +
                free(smallSubPageDirectCaches, finalizer) +
                free(normalDirectCaches, finalizer) +
                free(tinySubPageHeapCaches, finalizer) +
                free(smallSubPageHeapCaches, finalizer) +
                free(normalHeapCaches, finalizer);

        if (numFreed > 0 && logger.isDebugEnabled()) {
            logger.debug("Freed {} thread-local buffer(s) from thread: {}", numFreed,
                    Thread.currentThread().getName());
        }

        if (directArena != null) {
            directArena.numThreadCaches.getAndDecrement();
        }

        if (heapArena != null) {
            heapArena.numThreadCaches.getAndDecrement();
        }
    }
}

The same free method is also called from io.netty.buffer.PooledByteBufAllocator.PoolThreadLocalCache#onRemoval:

// io.netty.buffer.PooledByteBufAllocator.PoolThreadLocalCache#onRemoval
@Override
protected void onRemoval(PoolThreadCache threadCache) {
    threadCache.free(false);
}
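
Because the release can be triggered from two places, the compareAndSet guard is what keeps it from running twice. A minimal sketch of that pattern, with a hypothetical class FreeOnceSketch and a plain counter in place of the real caches:

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class FreeOnceSketch {
    private final AtomicBoolean freed = new AtomicBoolean();
    private int cachedBuffers;   // stands in for the contents of the thread-local caches

    FreeOnceSketch(int cachedBuffers) {
        this.cachedBuffers = cachedBuffers;
    }

    /** Releases the cached buffers exactly once and returns how many this call released. */
    int free() {
        if (!freed.compareAndSet(false, true)) {
            return 0;            // another caller already released everything
        }
        int numFreed = cachedBuffers;
        cachedBuffers = 0;
        return numFreed;
    }

    public static void main(String[] args) {
        FreeOnceSketch cache = new FreeOnceSketch(7);
        System.out.println("first call freed " + cache.free());
        System.out.println("second call freed " + cache.free());
    }
}
```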
