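Preface

Most places that use pooling also use per-thread caching (the HikariCP connection pool is one example) to make handing out pooled instances faster. This chapter looks at how Netty's memory pool uses a thread cache.

We first walk through the inheritance hierarchy of Netty's Allocator, which leads to the thread cache, PoolThreadCache.

The focus is then on PoolThreadCache's data structures and how it allocates memory.

I. Allocator

1. ByteBufAllocator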

public interface ByteBufAllocator {
    // Default allocator
    ByteBufAllocator DEFAULT = ByteBufUtil.DEFAULT_ALLOCATOR;

    /**
     * Allocate a {@link ByteBuf}. If it is a direct or heap buffer
     * depends on the actual implementation.
     */
    ByteBuf buffer();
    ByteBuf buffer(int initialCapacity);
    ByteBuf buffer(int initialCapacity, int maxCapacity);

    /**
     * Allocate a {@link ByteBuf}, preferably a direct buffer which is suitable for I/O.
     */
    ByteBuf ioBuffer();
    ByteBuf ioBuffer(int initialCapacity);
    ByteBuf ioBuffer(int initialCapacity, int maxCapacity);

    /**
     * Allocate a heap {@link ByteBuf}.
     */
    ByteBuf heapBuffer();
    ByteBuf heapBuffer(int initialCapacity);
    ByteBuf heapBuffer(int initialCapacity, int maxCapacity);

    /**
     * Allocate a direct {@link ByteBuf}.
     */
    ByteBuf directBuffer();
    ByteBuf directBuffer(int initialCapacity);
    ByteBuf directBuffer(int initialCapacity, int maxCapacity);

    /**
     * Allocate a {@link CompositeByteBuf}.
     * If it is a direct or heap buffer depends on the actual implementation.
     */
    CompositeByteBuf compositeBuffer();
    CompositeByteBuf compositeBuffer(int maxNumComponents);
    CompositeByteBuf compositeHeapBuffer();
    CompositeByteBuf compositeHeapBuffer(int maxNumComponents);
    CompositeByteBuf compositeDirectBuffer();
    CompositeByteBuf compositeDirectBuffer(int maxNumComponents);

    /**
     * Returns {@code true} if direct {@link ByteBuf}'s are pooled
     */
    boolean isDirectBufferPooled();

    /**
     * Calculate the new capacity of a {@link ByteBuf} that is used when a {@link ByteBuf} needs to expand by the
     * {@code minNewCapacity} with {@code maxCapacity} as upper-bound.
     */
    int calculateNewCapacity(int minNewCapacity, int maxCapacity);
 }

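The ByteBufAllocator interface defines the contract an Allocator must fulfil. An Allocator must be able to:

- Create a Buffer: either heap or direct memory may be used.
- Create an IOBuffer: a Buffer suited to I/O, preferably backed by direct memory.
- Create a Heap/Direct Buffer: every Allocator must be able to create a Buffer from heap memory and from direct memory.
- Create a composite Buffer: either heap or direct memory may be used. The composite Buffer is one of Netty's zero-copy features.
- Compute a Buffer's expanded capacity: pick the actual new capacity within the range [minNewCapacity, maxCapacity].

The interface also defines the default Allocator: ByteBufAllocator DEFAULT = ByteBufUtil.DEFAULT_ALLOCATOR. Unless io.netty.allocator.type says otherwise, this resolves to the pooled allocator PooledByteBufAllocator.DEFAULT; on Android the default is unpooled.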

public final class ByteBufUtil {
    static final ByteBufAllocator DEFAULT_ALLOCATOR;
    static {
        String allocType = SystemPropertyUtil.get(
                "io.netty.allocator.type", PlatformDependent.isAndroid() ? "unpooled" : "pooled");
        allocType = allocType.toLowerCase(Locale.US).trim();
        ByteBufAllocator alloc;
        if ("unpooled".equals(allocType)) {
            alloc = UnpooledByteBufAllocator.DEFAULT;
        } else if ("pooled".equals(allocType)) {
            alloc = PooledByteBufAllocator.DEFAULT;
        } else {
            alloc = PooledByteBufAllocator.DEFAULT;
        }
        DEFAULT_ALLOCATOR = alloc;
    }
}
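To make the selection rule easier to experiment with, here is a minimal sketch that reproduces the branching with plain strings instead of Netty types. The class AllocatorTypeSketch and its resolve method are hypothetical names used only for illustration; they are not part of Netty.

```java
import java.util.Locale;

// Sketch only: mirrors the selection logic with strings instead of allocator instances.
final class AllocatorTypeSketch {

    // Hypothetical helper: 'configured' plays the role of the io.netty.allocator.type property.
    static String resolve(String configured, boolean isAndroid) {
        String type = configured != null ? configured : (isAndroid ? "unpooled" : "pooled");
        type = type.toLowerCase(Locale.US).trim();
        // Any unrecognized value falls back to "pooled", like the final else branch above.
        return "unpooled".equals(type) ? "unpooled" : "pooled";
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, false));         // pooled
        System.out.println(resolve(" UNPOOLED ", false)); // unpooled
        System.out.println(resolve("bogus", true));       // pooled
    }
}
```

In a real application the same choice would be made by starting the JVM with -Dio.netty.allocator.type=unpooled.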

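2. AbstractByteBufAllocator

Buffer creation skeleton

AbstractByteBufAllocator uses its directByDefault field to choose between heap and direct memory when creating both plain and composite Buffers.

Creating a plain Buffer: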

@Override
public ByteBuf buffer(int initialCapacity, int maxCapacity) {
    if (directByDefault) {
        return directBuffer(initialCapacity, maxCapacity);
    }
    return heapBuffer(initialCapacity, maxCapacity);
}

Creating a composite Buffer:

@Override
public CompositeByteBuf compositeBuffer() {
    if (directByDefault) {
        return compositeDirectBuffer();
    }
    return compositeHeapBuffer();
}
@Override
public CompositeByteBuf compositeDirectBuffer() {
    return compositeDirectBuffer(DEFAULT_MAX_COMPONENTS);
}
@Override
public CompositeByteBuf compositeDirectBuffer(int maxNumComponents) {
    return toLeakAwareBuffer(new CompositeByteBuf(this, true, maxNumComponents));
}

An IOBuffer is direct when Unsafe is available or direct buffers are pooled, and heap otherwise:

@Override
public ByteBuf ioBuffer(int initialCapacity, int maxCapacity) {
    if (PlatformDependent.hasUnsafe() || isDirectBufferPooled()) {
        return directBuffer(initialCapacity, maxCapacity);
    }
    return heapBuffer(initialCapacity, maxCapacity);
}

Finally, AbstractByteBufAllocator leaves two methods to subclasses: creating a Buffer from heap memory and creating one from direct memory.

/**
* Create a heap {@link ByteBuf} with the given initialCapacity and maxCapacity.
*/
protected abstract ByteBuf newHeapBuffer(int initialCapacity, int maxCapacity);

/**
* Create a direct {@link ByteBuf} with the given initialCapacity and maxCapacity.
*/
protected abstract ByteBuf newDirectBuffer(int initialCapacity, int maxCapacity);
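The skeleton described above is a template method: the base class decides between heap and direct memory, and the subclass only supplies the two creation hooks. A rough sketch of that shape, using java.nio.ByteBuffer as a stand-in for ByteBuf, could look like this. SketchAllocator and SketchUnpooledAllocator are made-up names, and nothing here is pooled.

```java
import java.nio.ByteBuffer;

// Sketch of the template-method skeleton; ByteBuffer stands in for ByteBuf.
abstract class SketchAllocator {
    private final boolean directByDefault;

    SketchAllocator(boolean preferDirect) {
        this.directByDefault = preferDirect;
    }

    // Skeleton method: picks heap or direct, then delegates to the subclass hook.
    final ByteBuffer buffer(int capacity) {
        return directByDefault ? newDirectBuffer(capacity) : newHeapBuffer(capacity);
    }

    protected abstract ByteBuffer newHeapBuffer(int capacity);

    protected abstract ByteBuffer newDirectBuffer(int capacity);
}

// Hypothetical unpooled subclass: allocates a fresh buffer on every call.
final class SketchUnpooledAllocator extends SketchAllocator {
    SketchUnpooledAllocator(boolean preferDirect) {
        super(preferDirect);
    }

    @Override
    protected ByteBuffer newHeapBuffer(int capacity) {
        return ByteBuffer.allocate(capacity);
    }

    @Override
    protected ByteBuffer newDirectBuffer(int capacity) {
        return ByteBuffer.allocateDirect(capacity);
    }

    public static void main(String[] args) {
        System.out.println(new SketchUnpooledAllocator(true).buffer(16).isDirect());  // true
        System.out.println(new SketchUnpooledAllocator(false).buffer(16).isDirect()); // false
    }
}
```

A pooled allocator would keep the same base class and change only what the two hooks do.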

Wrapping Buffers for leak detection

Besides the Allocator skeleton, AbstractByteBufAllocator provides two static methods that wrap a Buffer in a leak-aware Buffer.

// Plain Buffer
protected static ByteBuf toLeakAwareBuffer(ByteBuf buf) {
    ResourceLeakTracker<ByteBuf> leak;
    switch (ResourceLeakDetector.getLevel()) {
        case SIMPLE:
            leak = AbstractByteBuf.leakDetector.track(buf);
            if (leak != null) {
                buf = new SimpleLeakAwareByteBuf(buf, leak);
            }
            break;
        case ADVANCED:
        case PARANOID:
            leak = AbstractByteBuf.leakDetector.track(buf);
            if (leak != null) {
                buf = new AdvancedLeakAwareByteBuf(buf, leak);
            }
            break;
        default:
            break;
    }
    return buf;
}
// Composite Buffer
protected static CompositeByteBuf toLeakAwareBuffer(CompositeByteBuf buf) {
    ResourceLeakTracker<ByteBuf> leak;
    switch (ResourceLeakDetector.getLevel()) {
        case SIMPLE:
            leak = AbstractByteBuf.leakDetector.track(buf);
            if (leak != null) {
                buf = new SimpleLeakAwareCompositeByteBuf(buf, leak);
            }
            break;
        case ADVANCED:
        case PARANOID:
            leak = AbstractByteBuf.leakDetector.track(buf);
            if (leak != null) {
                buf = new AdvancedLeakAwareCompositeByteBuf(buf, leak);
            }
            break;
        default:
            break;
    }
    return buf;
}

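Computing the Buffer's expanded capacity

calculateNewCapacity picks a suitable new capacity within the requested range [minNewCapacity, maxCapacity]. It applies a different strategy depending on whether minNewCapacity is below, equal to, or above the 4 MB threshold.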

public int calculateNewCapacity(int minNewCapacity, int maxCapacity) {
    final int threshold = CALCULATE_THRESHOLD; // 4 MiB page
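    // Lower bound is exactly 4 MB: return 4 MB
    if (minNewCapacity == threshold) {
        return threshold;
    }
    // Lower bound is above 4 MB: round down to a multiple of 4 MB, then add one more 4 MB step, capped at maxCapacity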
    if (minNewCapacity > threshold) {
        int newCapacity = minNewCapacity / threshold * threshold;
        if (newCapacity > maxCapacity - threshold) {
            newCapacity = maxCapacity;
        } else {
            newCapacity += threshold;
        }
        return newCapacity;
    }
    // Lower bound is below 4 MB: double from 64 until the value is at least minNewCapacity, capped at maxCapacity
    int newCapacity = 64;
    while (newCapacity < minNewCapacity) {
        newCapacity <<= 1;
    }
    return Math.min(newCapacity, maxCapacity);
}
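A few concrete inputs make the three branches easier to follow. The sketch below copies the branching into a standalone class (CapacitySketch is a made-up name) and assumes the threshold is 4 MB, as stated above.

```java
// Standalone copy of the branching above, for trying out inputs.
final class CapacitySketch {
    static final int THRESHOLD = 4 * 1024 * 1024; // 4 MB, the assumed value of CALCULATE_THRESHOLD

    static int calculateNewCapacity(int minNewCapacity, int maxCapacity) {
        if (minNewCapacity == THRESHOLD) {
            return THRESHOLD;
        }
        if (minNewCapacity > THRESHOLD) {
            // Round down to a multiple of 4 MB, then step up by 4 MB unless that would exceed the cap.
            int newCapacity = minNewCapacity / THRESHOLD * THRESHOLD;
            if (newCapacity > maxCapacity - THRESHOLD) {
                newCapacity = maxCapacity;
            } else {
                newCapacity += THRESHOLD;
            }
            return newCapacity;
        }
        // Below the threshold: double from 64 until the request fits.
        int newCapacity = 64;
        while (newCapacity < minNewCapacity) {
            newCapacity <<= 1;
        }
        return Math.min(newCapacity, maxCapacity);
    }

    public static void main(String[] args) {
        int mb = 1024 * 1024;
        System.out.println(calculateNewCapacity(100, Integer.MAX_VALUE));    // 128
        System.out.println(calculateNewCapacity(100, 100));                  // 100
        System.out.println(calculateNewCapacity(5 * mb, Integer.MAX_VALUE)); // 8388608 (8 MB)
        System.out.println(calculateNewCapacity(5 * mb, 6 * mb));            // 6291456 (6 MB)
    }
}
```

Small buffers therefore grow by doubling, while large buffers grow in fixed 4 MB steps, which avoids doubling an already large allocation.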

3. PooledByteBufAllocator

Here we only look at PooledByteBufAllocator's fields and set the allocation flow aside for now; the goal is just to introduce Netty's thread cache.

private final PoolArena<byte[]>[] heapArenas;
private final PoolArena<ByteBuffer>[] directArenas;
private final PoolThreadLocalCache threadCache;

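heapArenas: array of PoolArena, responsible for creating heap Buffers.
directArenas: array of PoolArena, responsible for creating direct Buffers.
threadCache: a FastThreadLocal instance holding the current thread's PoolThreadCache, which in turn holds the thread's PoolArenas.

Both Arena arrays usually have a length of twice the number of cores, unless available memory imposes a smaller limit; the calculation lives in a static initializer block.

// Default cap on the Arena count = number of cores * 2
final int defaultMinNumArena = NettyRuntime.availableProcessors() * 2;
// Default Chunk size: 16 MB
final int defaultChunkSize = DEFAULT_PAGE_SIZE << DEFAULT_MAX_ORDER;
// Heap Arena array length calculation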
DEFAULT_NUM_HEAP_ARENA = Math.max(0,
                                  SystemPropertyUtil.getInt(
                                      "io.netty.allocator.numHeapArenas",
                                      (int) Math.min(
                                          defaultMinNumArena,
                                          runtime.maxMemory() / defaultChunkSize / 2 / 3)));
// Direct Arena array length calculation
DEFAULT_NUM_DIRECT_ARENA = Math.max(0,
                                    SystemPropertyUtil.getInt(
                                        "io.netty.allocator.numDirectArenas",
                                        (int) Math.min(
                                            defaultMinNumArena,
                                            PlatformDependent.maxDirectMemory() / defaultChunkSize / 2 / 3)));
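The formula is easier to read with numbers plugged in. The sketch below assumes a page size of 8192 and a max order of 11 (which gives the 16 MB Chunk mentioned above) and ignores the system property override; ArenaCountSketch and defaultNumArenas are hypothetical names.

```java
// Sketch of min(cores * 2, memory / chunkSize / 2 / 3) with assumed defaults.
final class ArenaCountSketch {
    static final int PAGE_SIZE = 8192;                    // assumed default page size
    static final int MAX_ORDER = 11;                      // assumed default max order
    static final int CHUNK_SIZE = PAGE_SIZE << MAX_ORDER; // 16777216 bytes = 16 MB

    static int defaultNumArenas(int cores, long maxMemoryBytes) {
        long byMemory = maxMemoryBytes / CHUNK_SIZE / 2 / 3;
        return (int) Math.max(0L, Math.min(cores * 2L, byMemory));
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        System.out.println(defaultNumArenas(8, gb));     // 10: limited by memory
        System.out.println(defaultNumArenas(4, gb));     // 8: limited by cores
        System.out.println(defaultNumArenas(8, 4 * gb)); // 16: limited by cores
        // Values for the current JVM; the output depends on the machine.
        Runtime rt = Runtime.getRuntime();
        System.out.println(defaultNumArenas(rt.availableProcessors(), rt.maxMemory()));
    }
}
```

As far as I can tell, the two divisors are meant to keep the pool below roughly half of the available memory, assuming about three Chunks per Arena.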

II. PoolThreadLocalCache

PoolThreadLocalCache is an inner class of PooledByteBufAllocator and extends FastThreadLocal. The part to focus on is its initialValue method, which constructs the PoolThreadCache instance for the thread.

final class PoolThreadLocalCache extends FastThreadLocal<PoolThreadCache> {
    private final boolean useCacheForAllThreads;

    PoolThreadLocalCache(boolean useCacheForAllThreads) {
        this.useCacheForAllThreads = useCacheForAllThreads;
    }

    @Override
    protected synchronized PoolThreadCache initialValue() {
        // Each thread picks, from the shared heapArenas and directArenas arrays,
        // the least used Arena as its own Arena
        final PoolArena<byte[]> heapArena = leastUsedArena(heapArenas);
        final PoolArena<ByteBuffer> directArena = leastUsedArena(directArenas);
        final Thread current = Thread.currentThread();
        if (useCacheForAllThreads || current instanceof FastThreadLocalThread) {
            // Construct the PoolThreadCache
            final PoolThreadCache cache = new PoolThreadCache(
                heapArena, directArena, tinyCacheSize, smallCacheSize, normalCacheSize,
                DEFAULT_MAX_CACHED_BUFFER_CAPACITY, DEFAULT_CACHE_TRIM_INTERVAL);
            return cache;
        }
        return new PoolThreadCache(heapArena, directArena, 0, 0, 0, 0, 0);
    }
 }

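Because PoolThreadLocalCache is an inner class of PooledByteBufAllocator, it can read heapArenas and directArenas. leastUsedArena picks the Arena referenced by the fewest thread caches as the Arena for the current thread's cache. Each Arena tracks in numThreadCaches how many threads currently share it.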

private <T> PoolArena<T> leastUsedArena(PoolArena<T>[] arenas) {
    if (arenas == null || arenas.length == 0) {
        return null;
    }
    PoolArena<T> minArena = arenas[0];
    for (int i = 1; i < arenas.length; i++) {
        PoolArena<T> arena = arenas[i];
        if (arena.numThreadCaches.get() < minArena.numThreadCaches.get()) {
            minArena = arena;
        }
    }
    return minArena;
}

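Threads pick their Arena on a least-used basis, and one Arena can serve several threads. When the Allocator allocates memory, it uses the Arena held by the current thread's PoolThreadCache, so a sensible ratio of threads to Arenas reduces contention on the Arena locks.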
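The binding of threads to Arenas can be tried out without Netty. In the sketch below, java.lang.ThreadLocal stands in for FastThreadLocal and a small Arena class keeps only the numThreadCaches counter. ArenaBindingSketch and bindSequentially are invented names, and the counter is incremented inside the selection method for brevity, whereas the real increment happens in the PoolThreadCache constructor.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

final class ArenaBindingSketch {
    // Stand-in for PoolArena: only tracks how many thread caches reference it.
    static final class Arena {
        final int id;
        final AtomicInteger numThreadCaches = new AtomicInteger();
        Arena(int id) { this.id = id; }
    }

    private final Arena[] arenas;
    // ThreadLocal plays the role of FastThreadLocal in this sketch.
    private final ThreadLocal<Arena> bound = ThreadLocal.withInitial(this::bindLeastUsed);

    ArenaBindingSketch(int numArenas) {
        arenas = new Arena[numArenas];
        for (int i = 0; i < numArenas; i++) {
            arenas[i] = new Arena(i);
        }
    }

    // synchronized so that two threads initializing at once see consistent counters
    private synchronized Arena bindLeastUsed() {
        Arena min = arenas[0];
        for (int i = 1; i < arenas.length; i++) {
            if (arenas[i].numThreadCaches.get() < min.numThreadCaches.get()) {
                min = arenas[i];
            }
        }
        min.numThreadCaches.incrementAndGet();
        return min;
    }

    // Starts threads one after another and records which Arena each one was bound to.
    static int[] bindSequentially(int numArenas, int numThreads) {
        ArenaBindingSketch sketch = new ArenaBindingSketch(numArenas);
        int[] chosen = new int[numThreads];
        for (int t = 0; t < numThreads; t++) {
            final int slot = t;
            Thread thread = new Thread(() -> chosen[slot] = sketch.bound.get().id);
            thread.start();
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(bindSequentially(2, 3))); // [0, 1, 0]
    }
}
```

With two Arenas and three threads, the third thread has to share an Arena with the first, which is exactly the situation where the Arena lock matters.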

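III. PoolThreadCache

PoolThreadCache instances are managed through FastThreadLocal. Each one manages the memory blocks cached for the current thread; because only that thread allocates from its cache, no lock contention arises and allocation is fast. (An Arena is different: although held by a PoolThreadCache, it is still accessed by multiple threads, for example when there are fewer Arenas than threads.)

1. Fields

// Heap Arena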
final PoolArena<byte[]> heapArena;
// Direct Arena
final PoolArena<ByteBuffer> directArena;
// Heap memory caches
private final MemoryRegionCache<byte[]>[] tinySubPageHeapCaches;
private final MemoryRegionCache<byte[]>[] smallSubPageHeapCaches;
private final MemoryRegionCache<byte[]>[] normalHeapCaches;
// Direct memory caches
private final MemoryRegionCache<ByteBuffer>[] tinySubPageDirectCaches;
private final MemoryRegionCache<ByteBuffer>[] smallSubPageDirectCaches;
private final MemoryRegionCache<ByteBuffer>[] normalDirectCaches;
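// Counter of allocation attempts made through this PoolThreadCache (hits and misses)
private int allocations;
// Threshold: once allocations reaches it, trim runs and tries to return cached blocks to the Chunk
private final int freeSweepAllocationThreshold;
// Flag ensuring the full release of this instance's resources runs only once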
private final AtomicBoolean freed = new AtomicBoolean();

2. Constructor

The constructor initializes the fields listed above.

PoolThreadCache(PoolArena<byte[]> heapArena, PoolArena<ByteBuffer> directArena,
                    int tinyCacheSize, int smallCacheSize, int normalCacheSize,
                    int maxCachedBufferCapacity, int freeSweepAllocationThreshold) {
    this.freeSweepAllocationThreshold = freeSweepAllocationThreshold;
    this.heapArena = heapArena;
    this.directArena = directArena;
    if (directArena != null) {
        // cacheSize=512, array length 32
        tinySubPageDirectCaches = createSubPageCaches(
            tinyCacheSize, PoolArena.numTinySubpagePools, SizeClass.Tiny);
        // cacheSize=256, array length 4
        smallSubPageDirectCaches = createSubPageCaches(
            smallCacheSize, directArena.numSmallSubpagePools, SizeClass.Small);
        numShiftsNormalDirect = log2(directArena.pageSize);
        // cacheSize=64; maxCachedBufferCapacity=32768 determines the array length
        normalDirectCaches = createNormalCaches(
            normalCacheSize, maxCachedBufferCapacity, directArena);
        directArena.numThreadCaches.getAndIncrement();
    } else {
        // No directArea is configured so just null out all caches
        tinySubPageDirectCaches = null;
        smallSubPageDirectCaches = null;
        normalDirectCaches = null;
        numShiftsNormalDirect = -1;
    }
    if (heapArena != null) {
        // Create the caches for the heap allocations
        // ...
    } else {
        // No heapArea is configured so just null out all caches
        // ...
    }
}

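freeSweepAllocationThreshold: set by io.netty.allocator.cacheTrimInterval, the trim threshold of the thread cache, 8192 by default. After the PoolThreadCache has performed 8192 allocations it tries to trim its caches, returning memory to the Chunk or Subpage.
heapArena/directArena: the Arena chosen by PoolThreadLocalCache, the one used by the fewest threads.
XXXCaches arrays: arrays of MemoryRegionCache instances, similar to the Subpage pools in an Arena.
- **Size class:** tinySubPageDirectCaches holds tiny blocks, smallSubPageDirectCaches holds small blocks, and normalDirectCaches holds normal blocks.
- **Array length:** the tiny and small arrays reuse the lengths of the Arena's Subpage pool arrays, so the array index can likewise be derived from the size class. The length of the normal array is computed from maxCachedBufferCapacity.
- **cacheSize:** the capacity of the mpsc (Multiple Producer Single Consumer) queue inside each MemoryRegionCache. It caps how many memory blocks of one size (one index in the MemoryRegionCache array) a thread cache (PoolThreadCache) can hold.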
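The array lengths quoted in the constructor comments can be reproduced with a little arithmetic. The sketch below is my own reconstruction, not Netty code: the tiny and small formulas are chosen to match the lengths 32 and 4 given above for an 8192-byte page, and the normal formula is an assumption (one slot per power of two from the page size up to maxCachedBufferCapacity). CacheLayoutSketch and its methods are hypothetical names.

```java
// Reconstruction of the cache array lengths under the stated assumptions.
final class CacheLayoutSketch {
    static int log2(int v) {
        return 31 - Integer.numberOfLeadingZeros(v);
    }

    // Tiny sizes are multiples of 16 below 512, which gives 512 / 16 = 32 slots.
    static int tinySlots() {
        return 512 >>> 4;
    }

    // Small sizes are 512, 1024, ... up to half a page: 4 slots for an 8192-byte page.
    static int smallSlots(int pageSize) {
        return log2(pageSize) - 9;
    }

    // Assumption: one slot per power of two from pageSize up to the capped maximum.
    static int normalSlots(int pageSize, int chunkSize, int maxCachedBufferCapacity) {
        int max = Math.min(chunkSize, maxCachedBufferCapacity);
        return Math.max(1, log2(max / pageSize) + 1);
    }

    public static void main(String[] args) {
        int pageSize = 8192;
        int chunkSize = 16 * 1024 * 1024;
        System.out.println(tinySlots());                             // 32
        System.out.println(smallSlots(pageSize));                    // 4
        System.out.println(normalSlots(pageSize, chunkSize, 32768)); // 3 (8 KB, 16 KB, 32 KB)
    }
}
```

Under these assumptions only Normal blocks of up to 32 KB are ever cached per thread; larger blocks always go back to the Arena.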

IV. MemoryRegionCache

1. Fields and constructor

private abstract static class MemoryRegionCache<T> {
    private final int size;
    private final Queue<Entry<T>> queue;
    private final SizeClass sizeClass;
    private int allocations;
    MemoryRegionCache(int size, SizeClass sizeClass) {
        // Normalize to a power of two
        this.size = MathUtil.safeFindNextPositivePowerOfTwo(size);
        // Build the mpsc queue; with Unsafe available this is an MpscArrayQueue
        queue = PlatformDependent.newFixedMpscQueue(this.size);
        // Size class
        this.sizeClass = sizeClass;
    }
}

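size: the cache limit of this instance, i.e. the capacity of queue.
queue: mpsc queue, the container for memory blocks (it actually stores a chunk reference plus the matching handle).
sizeClass: the size class, e.g. Tiny, Small, Normal.
allocations: number of successful allocations since the last trim; trim resets it to 0.

2. Implementations

MemoryRegionCache is abstract; subclasses implement initBuf, which initializes a PooledByteBuf (recall PoolChunk#initBuf).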

 /**
 * Init the {@link PooledByteBuf} using the provided chunk and handle with the capacity restrictions.
 */
protected abstract void initBuf(PoolChunk<T> chunk, ByteBuffer nioBuffer, long handle,
                                        PooledByteBuf<T> buf, int reqCapacity, PoolThreadCache threadCache);

SubPageMemoryRegionCache

SubPageMemoryRegionCache initializes Tiny and Small blocks and ultimately delegates to PoolChunk#initBufWithSubpage.

private static final class SubPageMemoryRegionCache<T> extends MemoryRegionCache<T> {
    SubPageMemoryRegionCache(int size, SizeClass sizeClass) {
        super(size, sizeClass);
    }

    @Override
    protected void initBuf(
        PoolChunk<T> chunk, ByteBuffer nioBuffer, long handle, PooledByteBuf<T> buf, int reqCapacity,
        PoolThreadCache threadCache) {
        chunk.initBufWithSubpage(buf, nioBuffer, handle, reqCapacity, threadCache);
    }
}

NormalMemoryRegionCache

NormalMemoryRegionCache initializes Normal blocks and ultimately delegates to PoolChunk#initBuf.

private static final class NormalMemoryRegionCache<T> extends MemoryRegionCache<T> {
    NormalMemoryRegionCache(int size) {
        super(size, SizeClass.Normal);
    }

    @Override
    protected void initBuf(
        PoolChunk<T> chunk, ByteBuffer nioBuffer, long handle, PooledByteBuf<T> buf, int reqCapacity,
        PoolThreadCache threadCache) {
        chunk.initBuf(buf, nioBuffer, handle, reqCapacity, threadCache);
    }
}

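3. Entry

MemoryRegionCache stores its cached memory blocks in the mpsc queue as Entry elements. An Entry simply bundles the following fields.

chunk: the 16 MB Chunk the block belongs to.
handle: location information; the high 32 bits are the subpage bitmap index, the low 32 bits the index into the chunk's memoryMap.
nioBuffer: a cached ByteBuffer instance that avoids GC pressure from repeatedly creating objects, like PooledByteBuf#tmpNioBuf; once cached, only the ByteBuffer's pointers need to be repositioned before each use.
recyclerHandle: the object recycler Handle; the object pool is covered later.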

static final class Entry<T> {
    final Handle<Entry<?>> recyclerHandle;
    PoolChunk<T> chunk;
    ByteBuffer nioBuffer;
    long handle = -1;
}
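Since handle packs two 32-bit indexes into one long, a small pack/unpack sketch may help. It follows only the layout described above (bitmap index in the high half, memoryMap index in the low half); any marker bits the real implementation may set are left out, and HandleSketch is a made-up name.

```java
// Sketch of the high/low 32-bit layout of a handle.
final class HandleSketch {
    static long pack(int bitmapIdx, int memoryMapIdx) {
        // Mask the low half so a negative int cannot spill into the high half.
        return ((long) bitmapIdx << 32) | (memoryMapIdx & 0xFFFFFFFFL);
    }

    static int bitmapIdx(long handle) {
        return (int) (handle >>> 32);
    }

    static int memoryMapIdx(long handle) {
        return (int) handle;
    }

    public static void main(String[] args) {
        long handle = pack(3, 2049);
        System.out.println(bitmapIdx(handle));    // 3
        System.out.println(memoryMapIdx(handle)); // 2049
    }
}
```

Storing both indexes in one primitive means an Entry needs no extra object to describe where its block lives inside the Chunk.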

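V. Using PoolThreadCache

1. Caching a memory block

When a client releases a ByteBuf, Netty tries to put its memory block into the current thread's PoolThreadCache so the same thread can reuse it on a later allocation. If the method returns false, the thread cache could not take the block (no matching cache exists or the mpsc queue is full), and the block is returned to the Chunk, the shared area.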

boolean add(PoolArena<?> area, PoolChunk chunk, ByteBuffer nioBuffer,
            long handle, int normCapacity, SizeClass sizeClass) {
    // Select the MemoryRegionCache by SizeClass and normCapacity
    MemoryRegionCache<?> cache = cache(area, normCapacity, sizeClass);
    if (cache == null) {
        return false;
    }
    // Wrap in an Entry and put it into the cache's queue
    return cache.add(chunk, nioBuffer, handle);
}

First, the parameters determine which MemoryRegionCache the block is cached in.

private MemoryRegionCache<?> cache(PoolArena<?> area, int normCapacity, SizeClass sizeClass) {
    switch (sizeClass) {
        case Normal:
            return cacheForNormal(area, normCapacity);
        case Small:
            return cacheForSmall(area, normCapacity);
        case Tiny:
            return cacheForTiny(area, normCapacity);
        default:
            throw new Error();
    }
}

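For the Small size class, cacheForSmall is called. It first derives idx, the index into the Small array, from the normalized size. It then checks whether the Arena is direct or heap and returns the MemoryRegionCache at idx from the matching array.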

private MemoryRegionCache<?> cacheForSmall(PoolArena<?> area, int normCapacity) {
    int idx = PoolArena.smallIdx(normCapacity);
    if (area.isDirect()) {
        return cache(smallSubPageDirectCaches, idx);
    }
    return cache(smallSubPageHeapCaches, idx);
}
private static <T> MemoryRegionCache<T> cache(MemoryRegionCache<T>[] cache, int idx) {
    if (cache == null || idx > cache.length - 1) {
        return null;
    }
    return cache[idx];
}

// PoolArena.java
// Map the normalized size to an index into the Small array
static int smallIdx(int normCapacity) {
    int tableIdx = 0;
    int i = normCapacity >>> 10;
    while (i != 0) {
        i >>>= 1;
        tableIdx ++;
    }
    return tableIdx;
}
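Running smallIdx on the four Small sizes shows how the index is derived. The sketch copies smallIdx and the bounds-checked lookup into a standalone class; SmallIdxSketch and lookup are invented names, and a String array stands in for the MemoryRegionCache array.

```java
// Standalone copy of smallIdx plus the bounds-checked array lookup.
final class SmallIdxSketch {
    static int smallIdx(int normCapacity) {
        int tableIdx = 0;
        int i = normCapacity >>> 10;
        while (i != 0) {
            i >>>= 1;
            tableIdx++;
        }
        return tableIdx;
    }

    // Same guard as cache(MemoryRegionCache[], int): out-of-range indexes yield null.
    static <T> T lookup(T[] caches, int idx) {
        if (caches == null || idx > caches.length - 1) {
            return null;
        }
        return caches[idx];
    }

    public static void main(String[] args) {
        String[] caches = {"512B", "1KB", "2KB", "4KB"};
        System.out.println(lookup(caches, smallIdx(512)));  // 512B
        System.out.println(lookup(caches, smallIdx(1024))); // 1KB
        System.out.println(lookup(caches, smallIdx(2048))); // 2KB
        System.out.println(lookup(caches, smallIdx(4096))); // 4KB
        System.out.println(lookup(caches, smallIdx(8192))); // null: no Small slot for a full page
    }
}
```

A null result is what makes add and allocate return false, so sizes without a cache slot simply bypass the thread cache.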

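Once a MemoryRegionCache is selected, add tries to enqueue the block into its queue. On failure it immediately recycles the Entry instance to the object pool and returns false; otherwise it returns true.

public final boolean add(PoolChunk<T> chunk, ByteBuffer nioBuffer, long handle) {
    // Build an Entry and try to put it into the mpsc queue
    Entry<T> entry = newEntry(chunk, nioBuffer, handle);
    boolean queued = queue.offer(entry);
    // The queue is full, so recycle the Entry right away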
    if (!queued) {
        entry.recycle();
    }
    // Report whether the Entry was enqueued
    return queued;
}
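The bounded behaviour of add can be imitated with a JDK queue. In the sketch below an ArrayBlockingQueue stands in for the fixed-size mpsc queue and a Long stands in for an Entry; BoundedCacheSketch and nextPowerOfTwo are hypothetical names, and the rounding helper is a simplified substitute for MathUtil.safeFindNextPositivePowerOfTwo that ignores very large inputs.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

// Sketch of a size-limited cache whose add reports whether the block was kept.
final class BoundedCacheSketch {
    final int size;
    final Queue<Long> queue; // stand-in for the fixed-size mpsc queue

    BoundedCacheSketch(int requestedSize) {
        this.size = nextPowerOfTwo(requestedSize);
        this.queue = new ArrayBlockingQueue<>(this.size);
    }

    // Simplified rounding: 3 -> 4, 4 -> 4, 5 -> 8.
    static int nextPowerOfTwo(int v) {
        if (v <= 1) {
            return 1;
        }
        return Integer.highestOneBit(v - 1) << 1;
    }

    // offer returns false instead of blocking or throwing when the queue is full.
    boolean add(long handle) {
        return queue.offer(handle);
    }

    public static void main(String[] args) {
        BoundedCacheSketch cache = new BoundedCacheSketch(3); // rounded up to 4
        for (long handle = 0; handle < 5; handle++) {
            System.out.println(cache.add(handle)); // true four times, then false
        }
    }
}
```

When add returns false, the caller falls back to freeing the block through the Arena, so a full cache never loses memory.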

static final class Entry<T> {
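    // Object recycler
    final Handle<Entry<?>> recyclerHandle;
    // Owning Chunk (16 MB memory block)
    PoolChunk<T> chunk;
    // Cached ByteBuffer instance, avoids GC pressure from repeatedly creating objects
    ByteBuffer nioBuffer;
    // Location: high 32 bits are the subpage bitmap index, low 32 bits the chunk's memoryMap index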
    long handle = -1;

    void recycle() {
        // Clear the fields
        chunk = null;
        nioBuffer = null;
        handle = -1;
        // Return the Entry to the object pool
        recyclerHandle.recycle(this);
    }
}

2. Allocating a memory block

An allocation first tries the current thread's PoolThreadCache, calling PoolThreadCache#allocateTiny, PoolThreadCache#allocateSmall, or PoolThreadCache#allocateNormal.

boolean allocateTiny(PoolArena<?> area, PooledByteBuf<?> buf, int reqCapacity, int normCapacity) {
    return allocate(cacheForTiny(area, normCapacity), buf, reqCapacity);
}
boolean allocateSmall(PoolArena<?> area, PooledByteBuf<?> buf, int reqCapacity, int normCapacity) {
    return allocate(cacheForSmall(area, normCapacity), buf, reqCapacity);
}
boolean allocateNormal(PoolArena<?> area, PooledByteBuf<?> buf, int reqCapacity, int normCapacity) {
    return allocate(cacheForNormal(area, normCapacity), buf, reqCapacity);
}

The allocateXXX method is chosen by size class; the normalized capacity normCapacity then selects the MemoryRegionCache instance (see cacheForSmall above), and control passes to allocate.

private boolean allocate(MemoryRegionCache<?> cache, PooledByteBuf buf, int reqCapacity) {
    if (cache == null) {
        return false;
    }
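    // Try to take a usable memory block (identified by chunk and handle) from the mpsc queue and initialize the buffer
    boolean allocated = cache.allocate(buf, reqCapacity, this);
    // Increment the PoolThreadCache allocation counter;
    // after 8192 allocations, trigger a trim that returns memory blocks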
    if (++ allocations >= freeSweepAllocationThreshold) {
        allocations = 0;
        trim();
    }
    return allocated;
}

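PoolThreadCache's allocate calls MemoryRegionCache's allocate, which takes an Entry from the mpsc queue and uses it to initialize the buffer. It also increments allocations; when the count reaches 8192, the counter is reset and a trim runs.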

public final boolean allocate(PooledByteBuf<T> buf, int reqCapacity, PoolThreadCache threadCache) {
    // Try to take an Entry from the queue
    Entry<T> entry = queue.poll();
    if (entry == null) {
        return false;
    }
    // Run the subclass's initBuf (see the MemoryRegionCache implementations)
    initBuf(entry.chunk, entry.nioBuffer, entry.handle, buf, reqCapacity, threadCache);
    // Return the Entry object to the object pool
    entry.recycle();
    // Increment the MemoryRegionCache allocation counter
    ++ allocations;
    return true;
}

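Note that the PoolThreadCache counter increases whether or not the allocation succeeds (as long as a matching MemoryRegionCache exists), while the MemoryRegionCache counter increases only on success, i.e. when an element was taken from the mpsc queue. Whether trim runs depends on the PoolThreadCache counter.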
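The difference between the two counters is easy to miss, so here is a small sketch that keeps both in one class. It uses a tiny threshold of 4 instead of 8192 and only counts trims rather than performing them; CounterSketch and its fields are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of the two counters: attempts at thread level, hits at region level.
final class CounterSketch {
    final Queue<Long> queue = new ArrayDeque<>();
    final int threshold;
    int threadAllocations; // like PoolThreadCache.allocations: every attempt counts
    int regionAllocations; // like MemoryRegionCache.allocations: only hits count
    int trims;             // how often the threshold was reached

    CounterSketch(int threshold) {
        this.threshold = threshold;
    }

    boolean allocate() {
        boolean hit = queue.poll() != null;
        if (hit) {
            regionAllocations++;
        }
        if (++threadAllocations >= threshold) {
            threadAllocations = 0;
            trims++; // a real trim would also reset regionAllocations; left untouched here
        }
        return hit;
    }

    public static void main(String[] args) {
        CounterSketch cache = new CounterSketch(4);
        cache.queue.add(42L); // one cached block
        for (int i = 0; i < 4; i++) {
            cache.allocate(); // one hit followed by three misses
        }
        System.out.println(cache.regionAllocations); // 1
        System.out.println(cache.trims);             // 1
    }
}
```

Because misses also advance the thread-level counter, a cache that is almost never hit still gets trimmed on schedule.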

3. Trimming

Every time PoolThreadCache has executed allocate 8192 times, a trim is triggered.

void trim() {
    trim(tinySubPageDirectCaches);
    trim(smallSubPageDirectCaches);
    trim(normalDirectCaches);
    trim(tinySubPageHeapCaches);
    trim(smallSubPageHeapCaches);
    trim(normalHeapCaches);
}
private static void trim(MemoryRegionCache<?>[] caches) {
    if (caches == null) {
        return;
    }
    for (MemoryRegionCache<?> c: caches) {
        trim(c);
    }
}

private static void trim(MemoryRegionCache<?> cache) {
    if (cache == null) {
        return;
    }
    cache.trim();
}

trim assumes that a MemoryRegionCache with few successful allocations since the last trim is underused, and in that case actually calls free.

/**
* Free up cached {@link PoolChunk}s if not allocated frequently enough.
*/
public final void trim() {
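    // size minus the successful allocations since the last trim: the maximum number of entries to release
    int free = size - allocations;
    // Reset the counter
    allocations = 0;
    // Fewer successful allocations than the queue capacity counts as not allocated frequently enough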
    if (free > 0) {
        free(free, false);
    }
}

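trim keeps a thread from holding memory blocks in its PoolThreadCache for a long time without using them, which would effectively leak that memory. For example, a 512 B block is allocated once and released, but release only hands it to a thread's PoolThreadCache. After another 8191 allocations of 16 B blocks, trim is triggered and returns the earlier 512 B block to the Chunk.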

@Test
public void test01xxxvx() {
    PooledByteBufAllocator allocator = (PooledByteBufAllocator) ByteBufAllocator.DEFAULT;
    allocator.newDirectBuffer(512, Integer.MAX_VALUE).release();
    for (int i = 0; i < 8191; i++) {
        allocator.newDirectBuffer(5, Integer.MAX_VALUE);
    }
}
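The trim decision itself can also be tried in isolation. The sketch below keeps only size, the queue, and the allocations counter, and releases at most size - allocations entries, as in the trim method above. TrimSketch is a made-up name and releasing an entry here just means dropping it from the queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of the trim rule: release at most (size - allocations) cached entries.
final class TrimSketch {
    final int size;
    final Queue<Long> queue = new ArrayDeque<>();
    int allocations; // successful allocations since the last trim

    TrimSketch(int size) {
        this.size = size;
    }

    // Returns how many entries were released.
    int trim() {
        int free = size - allocations;
        allocations = 0;
        int freed = 0;
        while (freed < free && queue.poll() != null) {
            freed++;
        }
        return freed;
    }

    public static void main(String[] args) {
        TrimSketch idle = new TrimSketch(4);
        idle.queue.add(1L);
        idle.queue.add(2L);
        System.out.println(idle.trim()); // 2: nothing was allocated, so everything is released

        TrimSketch busy = new TrimSketch(4);
        busy.queue.add(1L);
        busy.queue.add(2L);
        busy.queue.add(3L);
        busy.allocations = 5;
        System.out.println(busy.trim()); // 0: used often enough, entries stay cached
    }
}
```

A cache that served at least size allocations since the last trim keeps all of its entries, so frequently used sizes are not penalized.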

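4. Returning memory blocks

PoolThreadCache's free method returns memory blocks to the Chunk, where other threads can obtain them again.

It is called in two situations:

The GC finds the PoolThreadCache instance unreachable and runs its finalize method, inherited from Object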

@Override
protected void finalize() throws Throwable {
    try {
        super.finalize();
    } finally {
        free(true);
    }
}

FastThreadLocal's remove method is called, which fires the PoolThreadLocalCache#onRemoval hook

@Override
protected void onRemoval(PoolThreadCache threadCache) {
    threadCache.free(false);
}

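PoolThreadCache's free method makes sure it runs only once, and releases the resources of every MemoryRegionCache array.

// Flag ensuring the full release of this instance's resources runs only once
private final AtomicBoolean freed = new AtomicBoolean();
void free(boolean finalizer) {
    // Both finalize and onRemoval call free; the CAS guarantees the body runs only once
    if (freed.compareAndSet(false, true)) {
        // Release the resources of every MemoryRegionCache array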
        int numFreed = free(tinySubPageDirectCaches, finalizer) +
            free(smallSubPageDirectCaches, finalizer) +
            free(normalDirectCaches, finalizer) +
            free(tinySubPageHeapCaches, finalizer) +
            free(smallSubPageHeapCaches, finalizer) +
            free(normalHeapCaches, finalizer);
        // Decrement the count of thread caches using each Arena
        if (directArena != null) {
            directArena.numThreadCaches.getAndDecrement();
        }

        if (heapArena != null) {
            heapArena.numThreadCaches.getAndDecrement();
        }
    }
}
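The run-once guarantee rests entirely on compareAndSet. A minimal sketch of the same guard, with a counter in place of the actual release work, might look as follows. FreeOnceSketch and releases are illustrative names.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a one-shot release guarded by a compare-and-set flag.
final class FreeOnceSketch {
    private final AtomicBoolean freed = new AtomicBoolean();
    final AtomicInteger releases = new AtomicInteger(); // counts how often the body ran

    void free() {
        // Only the first caller flips the flag from false to true and enters the body.
        if (freed.compareAndSet(false, true)) {
            releases.incrementAndGet();
        }
    }

    public static void main(String[] args) {
        FreeOnceSketch cache = new FreeOnceSketch();
        cache.free(); // e.g. triggered by onRemoval
        cache.free(); // e.g. triggered later by finalize
        System.out.println(cache.releases.get()); // 1
    }
}
```

Without the guard, the second call would decrement numThreadCaches again and skew the least-used selection for new threads.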

Finally we reach MemoryRegionCache's free method, which polls Entry objects from the mpsc queue in a loop and calls freeEntry on each.

public final int free(boolean finalizer) {
    return free(Integer.MAX_VALUE, finalizer);
}

private int free(int max, boolean finalizer) {
    int numFreed = 0;
    for (; numFreed < max; numFreed++) {
        Entry<T> entry = queue.poll();
        if (entry != null) {
            freeEntry(entry, finalizer);
        } else {
            // all cleared
            return numFreed;
        }
    }
    return numFreed;
}

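Unless free was entered from finalize, the Entry is recycled to the object pool here. The memory block is then handed to Arena.freeChunk, which returns the block at the given handle to its Chunk (see the previous chapter).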

private  void freeEntry(Entry entry, boolean finalizer) {
    PoolChunk chunk = entry.chunk;
    long handle = entry.handle;
    ByteBuffer nioBuffer = entry.nioBuffer;
    // If the release was not triggered by Object's finalize, return the entry object to the object pool
    if (!finalizer) {
        entry.recycle();
    }
    // Return the memory block to the Chunk
    chunk.arena.freeChunk(chunk, handle, sizeClass, nioBuffer, finalizer);
}

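Summary

PooledByteBufAllocator is the pooled memory allocator; it holds a PoolThreadLocalCache instance and is responsible for allocating memory.
PoolThreadLocalCache extends FastThreadLocal and stores the PoolThreadCache of the current thread. In initialValue it binds the least used Arena to the current thread and constructs the PoolThreadCache.
PoolThreadCache is managed by FastThreadLocal and caches and allocates the memory blocks held by the current thread.
- Caching memory blocks: when a client releases a ByteBuf, Netty tries to put its block into the current thread's PoolThreadCache.
- Allocating memory blocks: when a client requests memory, Netty first tries the current thread's PoolThreadCache.
- Trimming: after every 8192 calls to allocate, PoolThreadCache triggers a trim so that a thread does not hold unused memory blocks for long, which would effectively leak memory.
- Returning memory blocks: when a PoolThreadCache is reclaimed (by GC or because its FastThreadLocal is removed), it returns its cached blocks to the Chunk.
MemoryRegionCache is the smallest unit of thread cache management; internally it keeps an mpsc queue that stores memory blocks.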
