Buffer


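I. NIO Buffer

  1. In NIO, all data is read and written through a buffer.
  2. A Buffer is essentially a wrapper around an array of bytes or of another primitive type.

1. Key Buffer fields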

// Invariants: mark <= position <= limit <= capacity
private int mark = -1;
private int position = 0;
private int limit;
private int capacity;

// Used only by direct buffers
// NOTE: hoisted here for speed in JNI GetDirectBufferAddress
long address;

1.1 mark

Records a position. mark() saves the current position, and a later reset() moves position back to the marked value.

/**
 * Sets this buffer's mark at its position.
 *
 * @return  This buffer
 */
public final Buffer mark() {
    mark = position;
    return this;
}

/**
 * Resets this buffer's position to the previously-marked position.
 *
 * <p> Invoking this method neither changes nor discards the mark's
 * value. </p>
 *
 * @return  This buffer
 *
 * @throws  InvalidMarkException
 *          If the mark has not been set
 */
public final Buffer reset() {
    int m = mark;
    if (m < 0)
        throw new InvalidMarkException();
    position = m;
    return this;
}
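
To make the mark/reset pair concrete, here is a minimal sketch using the public ByteBuffer API. The class name MarkResetDemo and the sample bytes are only illustrative.

```java
import java.nio.ByteBuffer;

class MarkResetDemo {
    // Reads one byte, marks, reads two more, then resets and reads again.
    static byte[] readWithMark() {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {10, 20, 30, 40});
        byte first = buf.get();   // reads 10, position 0 -> 1
        buf.mark();               // mark = 1
        buf.get();                // reads 20, position = 2
        buf.get();                // reads 30, position = 3
        buf.reset();              // position goes back to the mark (1)
        byte again = buf.get();   // reads 20 a second time
        return new byte[] {first, again};
    }

    public static void main(String[] args) {
        byte[] r = readWithMark();
        System.out.println(r[0] + ", " + r[1]);
    }
}
```

Calling reset() on a buffer whose mark was never set (or was discarded by clear(), flip() or rewind()) throws InvalidMarkException, as the source above shows.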

1.2 position

Index of the next element to be read or written.

1.3 limit

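  1. Index of the first element that must not be read or written.
  2. In write mode (a freshly allocated buffer, or after clear()), limit equals capacity.
  3. In read mode (after flip()), limit is the position the writer had reached, that is, the end of the data in the buffer.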

1.4 capacity

The maximum number of elements the buffer can hold.

1.5 address

Used only by direct buffers; it holds the native memory address of the buffer's storage.
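
The following sketch traces position, limit and capacity through a write followed by a flip. It only uses the public ByteBuffer API; the class name IndexDemo and the helper state() are made up for this illustration.

```java
import java.nio.ByteBuffer;

class IndexDemo {
    // Formats the three indices as "position/limit/capacity".
    static String state(ByteBuffer b) {
        return b.position() + "/" + b.limit() + "/" + b.capacity();
    }

    static String[] trace() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        String fresh = state(buf);      // write mode: limit == capacity
        buf.put((byte) 1).put((byte) 2).put((byte) 3);
        String written = state(buf);    // position advanced by 3
        buf.flip();
        String flipped = state(buf);    // read mode: limit == old position
        return new String[] {fresh, written, flipped};
    }

    public static void main(String[] args) {
        for (String s : trace()) {
            System.out.println(s);      // prints 0/8/8, then 3/8/8, then 0/3/8
        }
    }
}
```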

2. Key Buffer APIs

// Resets the indices so the buffer can be filled again (for example by the next Channel read); the content is not erased
public final Buffer clear() {
    position = 0;
    limit = capacity;
    mark = -1;
    return this;
}
// Switches from writing to reading: limit becomes the current position, then position goes back to 0
public final Buffer flip() {
    limit = position;
    position = 0;
    mark = -1;
    return this;
}
// Unlike flip(), rewind() leaves limit unchanged, so the same content can be read again
public final Buffer rewind() {
    position = 0;
    mark = -1;
    return this;
}
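
A small sketch of how the three calls behave together, using a heap ByteBuffer. FlipRewindClearDemo is an illustrative name, not part of the JDK.

```java
import java.nio.ByteBuffer;

class FlipRewindClearDemo {
    static String run() {
        ByteBuffer buf = ByteBuffer.allocate(4);
        buf.put((byte) 'a').put((byte) 'b');    // position = 2, limit = 4

        StringBuilder out = new StringBuilder();
        buf.flip();                             // limit = 2, position = 0
        while (buf.hasRemaining()) {
            out.append((char) buf.get());       // reads "ab"
        }

        buf.rewind();                           // position = 0, limit stays 2
        while (buf.hasRemaining()) {
            out.append((char) buf.get());       // reads "ab" again
        }

        buf.clear();                            // position = 0, limit = capacity
        out.append(buf.limit());                // limit is 4 again
        out.append((char) buf.get(0));          // old content is still there
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());              // prints abab4a
    }
}
```

Note that clear() only resets the indices. The old bytes stay in the array until they are overwritten.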

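3. Key Buffer implementations

3.1 ByteBuffer

  1. A heap ByteBuffer is backed by a byte array (the hb field below); a direct ByteBuffer has no backing array.
  2. slice() shares the content of the original buffer: a write through either buffer is visible in the other, because both use the same underlying storage.
  3. duplicate() and slice() are both effectively shallow copies: position, limit and mark are independent, while the content is shared.
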
public abstract class ByteBuffer
    extends Buffer
    implements Comparable<ByteBuffer>
{
    final byte[] hb;                  // Non-null only for heap buffers
    final int offset;
    boolean isReadOnly;                 // Valid only for heap buffers
    
    // Creates a ByteBuffer backed by direct (off-heap) memory
    public static ByteBuffer allocateDirect(int capacity) {
        return new DirectByteBuffer(capacity);
    }
    
    // Creates a ByteBuffer backed by heap memory
    public static ByteBuffer allocate(int capacity) {
        if (capacity < 0)
            throw new IllegalArgumentException();
        return new HeapByteBuffer(capacity, capacity);
    }
    
    /**
     * Creates a new byte buffer whose content is a shared subsequence of
     * this buffer's content.
     *
     * <p> The content of the new buffer will start at this buffer's current
     * position.  Changes to this buffer's content will be visible in the new
     * buffer, and vice versa; the two buffers' position, limit, and mark
     * values will be independent.
     *
     * <p> The new buffer's position will be zero, its capacity and its limit
     * will be the number of bytes remaining in this buffer, and its mark
     * will be undefined.  The new buffer will be direct if, and only if, this
     * buffer is direct, and it will be read-only if, and only if, this buffer
     * is read-only.  </p>
     *
     * @return  The new byte buffer
     */
    public abstract ByteBuffer slice();

    /**
     * Creates a new byte buffer that shares this buffer's content.
     *
     * <p> The content of the new buffer will be that of this buffer.  Changes
     * to this buffer's content will be visible in the new buffer, and vice
     * versa; the two buffers' position, limit, and mark values will be
     * independent.
     *
     * <p> The new buffer's capacity, limit, position, and mark values will be
     * identical to those of this buffer.  The new buffer will be direct if,
     * and only if, this buffer is direct, and it will be read-only if, and
     * only if, this buffer is read-only.  </p>
     *
     * @return  The new byte buffer
     */
    public abstract ByteBuffer duplicate();
    
    /**
     * Relative <i>get</i> method.  Reads the byte at this buffer's
     * current position, and then increments the position.
     */
    public abstract byte get();
    
    /**
     * <p> Writes the given byte into this buffer at the current
     * position, and then increments the position. </p>
     */
    public abstract ByteBuffer put(byte b);

}
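
The sharing behavior of slice() and duplicate() described in the Javadoc above can be checked with a short sketch. SliceDuplicateDemo is an illustrative name.

```java
import java.nio.ByteBuffer;

class SliceDuplicateDemo {
    static int[] run() {
        ByteBuffer original = ByteBuffer.allocate(8);
        original.position(2);

        ByteBuffer slice = original.slice();      // starts at original[2]; position 0, capacity 6
        ByteBuffer dup = original.duplicate();    // same indices as original; capacity 8

        slice.put(0, (byte) 42);                  // absolute put, lands in original[2]
        dup.position(5);                          // moves only the duplicate's position

        return new int[] {
            original.get(2),                      // 42: content is shared
            slice.capacity(),                     // 6: remaining bytes of original
            dup.capacity(),                       // 8: same as original
            original.position()                   // 2: indices are independent
        };
    }

    public static void main(String[] args) {
        for (int v : run()) {
            System.out.println(v);
        }
    }
}
```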

3.2 IntBuffer

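  1. A heap IntBuffer is backed by an int array.
  2. Besides IntBuffer there are LongBuffer, FloatBuffer, and so on.
  3. These are all abstract classes; the concrete behavior lives in the heap-based and direct-memory-based implementations described below.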
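
As a quick illustration, an IntBuffer is used exactly like a ByteBuffer, only with int elements. The sketch below assumes a heap buffer created with IntBuffer.allocate; IntBufferDemo is an illustrative name.

```java
import java.nio.IntBuffer;

class IntBufferDemo {
    static int sum() {
        IntBuffer ints = IntBuffer.allocate(3);   // heap buffer backed by an int[]
        ints.put(1).put(2).put(3);
        ints.flip();                              // switch to read mode
        int total = 0;
        while (ints.hasRemaining()) {
            total += ints.get();
        }
        return total;
    }

    static boolean backedByArray() {
        return IntBuffer.allocate(3).hasArray();  // true for a heap buffer
    }

    public static void main(String[] args) {
        System.out.println(sum());                // prints 6
        System.out.println(backedByArray());      // prints true
    }
}
```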

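3.3 DirectByteBuffer and HeapByteBuffer

  1. Both HeapByteBuffer and DirectByteBuffer work by moving the Buffer indices.
  2. HeapByteBuffer reads and writes its array by index; DirectByteBuffer uses Unsafe to read and write memory addresses directly.
  3. The get method below illustrates the difference.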

3.3.1 HeapByteBuffer.get

final byte[] hb;                  // Non-null only for heap buffers
protected int ix(int i) {
    // offset is 0 by default, so i maps straight to the array index of position
    return i + offset;
}

public byte get() {
    return hb[ix(nextGetIndex())];
}

final int nextGetIndex() {                          // package-private
    if (position >= limit)
        throw new BufferUnderflowException();
    return position++;
}

3.3.2 DirectByteBuffer.get

// Computes the memory address of element i
private long ix(int i) {
    return address + ((long)i << 0);
}

public byte get() {
    // For background on Unsafe see https://tech.meituan.com/2019/02/14/talk-about-java-magic-class-unsafe.html
    return ((unsafe.getByte(ix(nextGetIndex()))));
}
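
From the caller's side the two implementations look the same; only the storage differs. The sketch below compares them through the public API. HeapVsDirectDemo is an illustrative name, and the hasArray() result for the direct buffer reflects OpenJDK behavior.

```java
import java.nio.ByteBuffer;

class HeapVsDirectDemo {
    static String describe(ByteBuffer b) {
        return (b.isDirect() ? "direct" : "heap") + ", hasArray=" + b.hasArray();
    }

    // Writes one byte and reads it back through the relative get().
    static byte roundTrip(ByteBuffer b) {
        b.put((byte) 7);
        b.flip();
        return b.get();
    }

    static String heap() {
        return describe(ByteBuffer.allocate(4));
    }

    static String direct() {
        return describe(ByteBuffer.allocateDirect(4));
    }

    static boolean sameResult() {
        return roundTrip(ByteBuffer.allocate(4)) == roundTrip(ByteBuffer.allocateDirect(4));
    }

    public static void main(String[] args) {
        System.out.println(heap());        // heap, hasArray=true
        System.out.println(direct());      // direct, hasArray=false (on OpenJDK)
        System.out.println(sameResult());  // true
    }
}
```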

II. Netty ByteBuf

1. Key ByteBuf fields

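int readerIndex; // read index
int writerIndex; // write index
private int markedReaderIndex; // reader index saved by markReaderIndex()
private int markedWriterIndex; // writer index saved by markWriterIndex()
private int maxCapacity; // maximum capacity

  1. Separate read and write indices make the buffer easier to work with: no flip() is needed to switch between writing and reading.
  2. The diagram below shows how the indices relate to each other.

     +-------------------+------------------+------------------+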
     | discardable bytes |  readable bytes  |  writable bytes  |
     |                   |     (CONTENT)    |                  |
     +-------------------+------------------+------------------+
     |                   |                  |                  |
     0      <=      readerIndex   <=   writerIndex    <=    capacity

2. Key ByteBuf APIs

// Reads one byte at readerIndex and advances readerIndex by 1
@Override
public byte readByte() {
    checkReadableBytes0(1);
    int i = readerIndex;
    byte b = _getByte(i);
    readerIndex = i + 1;
    return b;
}

// Reads 4 bytes starting at readerIndex as an int and advances readerIndex by 4
@Override
public int readInt() {
    checkReadableBytes0(4);
    int v = _getInt(readerIndex);
    readerIndex += 4;
    return v;
}

// Writes src.length bytes starting at writerIndex and advances writerIndex by the same amount
@Override
public ByteBuf writeBytes(byte[] src) {
    writeBytes(src, 0, src.length);
    return this;
}

// Number of bytes that can currently be read
@Override
public int readableBytes() {
    return writerIndex - readerIndex;
}

// Number of bytes that can currently be written
@Override
public int writableBytes() {
    return capacity() - writerIndex;
}
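
To show the two-index idea without pulling in Netty, here is a deliberately tiny stand-in written in plain Java. TwoIndexBuffer is a hypothetical class for illustration only, not Netty's implementation: it has a fixed capacity, no pooling and no reference counting, and it assumes big-endian byte order for readInt.

```java
class TwoIndexBuffer {
    private final byte[] data;
    private int readerIndex;   // next byte to read
    private int writerIndex;   // next byte to write

    TwoIndexBuffer(int capacity) {
        data = new byte[capacity];
    }

    TwoIndexBuffer writeByte(int b) {
        if (writableBytes() < 1) throw new IndexOutOfBoundsException("buffer full");
        data[writerIndex++] = (byte) b;
        return this;
    }

    byte readByte() {
        if (readableBytes() < 1) throw new IndexOutOfBoundsException("nothing to read");
        return data[readerIndex++];
    }

    // Reads 4 bytes as a big-endian int and advances readerIndex by 4.
    int readInt() {
        if (readableBytes() < 4) throw new IndexOutOfBoundsException("need 4 bytes");
        int v = ((data[readerIndex] & 0xff) << 24)
              | ((data[readerIndex + 1] & 0xff) << 16)
              | ((data[readerIndex + 2] & 0xff) << 8)
              | (data[readerIndex + 3] & 0xff);
        readerIndex += 4;
        return v;
    }

    int readableBytes() { return writerIndex - readerIndex; }

    int writableBytes() { return data.length - writerIndex; }

    public static void main(String[] args) {
        TwoIndexBuffer buf = new TwoIndexBuffer(8);
        buf.writeByte(9).writeByte(0).writeByte(0).writeByte(1).writeByte(2);
        System.out.println(buf.readableBytes());  // 5
        System.out.println(buf.readByte());       // 9
        System.out.println(buf.readInt());        // 258 (0x00000102)
        System.out.println(buf.writableBytes());  // 3
    }
}
```

Writing and reading can be interleaved freely because each operation only touches its own index.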

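3. Key ByteBuf implementations

ByteBuf has many subclasses. They can be classified roughly along three dimensions:

  1. Pooled and Unpooled: a pooled ByteBuf wraps a contiguous region taken from memory that was allocated in advance, much like a thread pool or a connection pool. Netty provides both pooled and unpooled ByteBuf implementations.
  2. Unsafe and non-Unsafe: Unsafe (sun.misc.Unsafe) is a low-level JDK class that can obtain memory addresses and read and write memory through them directly. The Unsafe ByteBuf variants access their data this way; the non-Unsafe variants go through ordinary array or ByteBuffer access.
  3. Direct and Heap: Direct means off-heap memory, allocated through the JDK's low-level API outside the JVM heap, and it has to be released manually. Heap means the space is allocated on the JVM heap.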

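4. Zero copy in ByteBuf

4.1 Zero copy in the traditional sense

Zero copy in the traditional sense happens at the operating-system level: it covers techniques that avoid copying data back and forth between kernel space and user space.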

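4.1.1 How data is read and written

  1. The kernel reads the data from disk into a kernel buffer.
  2. The CPU copies the data from the kernel buffer into the application buffer.
  3. When the data is written out, the CPU copies it from the application buffer back into a kernel buffer.
  4. The kernel buffer is then flushed to disk (or, for network output, the data is copied from the kernel socket buffer to the network card's buffer).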

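4.1.2 Solution

  1. Java's FileChannel.transferTo can avoid the two CPU copies in steps 2 and 3. On Linux it is typically implemented with the sendfile system call, so the data never enters user space. Memory mapping (mmap) is a separate zero-copy technique, exposed in Java as FileChannel.map.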
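
A minimal sketch of FileChannel.transferTo, copying one file to another. The class name TransferToDemo, the temporary files and the sample content are assumptions made for this example; whether the operating system actually performs the transfer without a user-space copy depends on the platform and the channel types.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class TransferToDemo {
    // Copies src to dst through transferTo and returns the number of bytes moved.
    static long copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long done = 0;
            while (done < size) {
                // transferTo may move fewer bytes than requested, so loop.
                done += in.transferTo(done, size - done, out);
            }
            return done;
        }
    }

    // Creates two temporary files, copies one into the other, and cleans up.
    static long demo() {
        try {
            Path src = Files.createTempFile("src", ".bin");
            Path dst = Files.createTempFile("dst", ".bin");
            try {
                Files.write(src, "hello world".getBytes(StandardCharsets.UTF_8));
                long n = copy(src, dst);
                String copied = new String(Files.readAllBytes(dst), StandardCharsets.UTF_8);
                if (!copied.equals("hello world")) {
                    throw new IllegalStateException("content mismatch");
                }
                return n;
            } finally {
                Files.deleteIfExists(src);
                Files.deleteIfExists(dst);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints 11
    }
}
```

The application never allocates a byte array or ByteBuffer for the payload; it only hands the two channels to the JDK.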

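4.2 Zero copy in Netty

  1. The zero copy Netty offers with ByteBuf happens entirely in user space; its "Zero-Copy" is mostly about avoiding copies when data is manipulated in user space.
  2. It shows up mainly in the following places.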

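4.2.1 CompositeByteBuf

  1. A composite ByteBuf; its components can be direct ByteBufs, heap ByteBufs, or a mix of both.
  2. To merge two buffers with plain NIO, you would allocate a new array of size header.size + body.size and copy both arrays into it.
  3. Netty's CompositeByteBuf combines the two buffers logically: the two original ByteBufs still point at their original memory, so no copy is made.
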
ByteBuf header = ...
ByteBuf body = ...

CompositeByteBuf compositeByteBuf = Unpooled.compositeBuffer();
compositeByteBuf.addComponents(true, header, body);
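
The idea behind a composite buffer can be sketched in plain Java. CompositeView below is a hypothetical class written for this article, not Netty's CompositeByteBuf: it only keeps references to the component arrays and translates a logical index into a component and an offset.

```java
class CompositeView {
    private final byte[][] parts;
    private final int length;

    CompositeView(byte[]... parts) {
        this.parts = parts;              // keeps references only; no bytes are copied
        int n = 0;
        for (byte[] p : parts) {
            n += p.length;
        }
        this.length = n;
    }

    int length() {
        return length;
    }

    // Maps a logical index onto the component that holds it.
    byte getByte(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        for (byte[] p : parts) {
            if (index < p.length) {
                return p[index];
            }
            index -= p.length;
        }
        throw new AssertionError("unreachable");
    }

    public static void main(String[] args) {
        byte[] header = {1, 2};
        byte[] body = {3, 4, 5};
        CompositeView view = new CompositeView(header, body);
        System.out.println(view.length());    // 5
        System.out.println(view.getByte(3));  // 4, second byte of body
        body[1] = 9;                          // change the original array
        System.out.println(view.getByte(3));  // 9, because storage is shared
    }
}
```

The price of avoiding the copy is a slightly more expensive index lookup, since each access first has to locate the right component.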

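4.2.2 Zero copy through wrapping

To turn a byte array into a ByteBuf the usual way, you allocate a new buffer and copy the bytes into it. Netty's Unpooled.wrappedBuffer wraps the existing array instead: the ByteBuf shares the byte array, which avoids the copy.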

byte[] bytes = ...
ByteBuf byteBuf = Unpooled.wrappedBuffer(bytes);
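
The JDK has an analogous operation, ByteBuffer.wrap, which makes the sharing easy to verify without a Netty dependency. The sketch below uses the JDK call as a stand-in for Unpooled.wrappedBuffer; WrapDemo is an illustrative name.

```java
import java.nio.ByteBuffer;

class WrapDemo {
    static int[] run() {
        byte[] bytes = {1, 2, 3};
        ByteBuffer wrapped = ByteBuffer.wrap(bytes);   // no copy: the buffer uses bytes directly

        bytes[0] = 9;                                  // change through the array
        wrapped.put(1, (byte) 8);                      // change through the buffer

        return new int[] {
            wrapped.get(0),                            // 9: buffer sees the array write
            bytes[1],                                  // 8: array sees the buffer write
            wrapped.array() == bytes ? 1 : 0           // 1: same array instance
        };
    }

    public static void main(String[] args) {
        for (int v : run()) {
            System.out.println(v);
        }
    }
}
```

Because the storage is shared, the caller must not reuse the array for something else while the wrapping buffer is still in use.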

4.2.3 Zero copy through slice

slice splits one ByteBuf into several ByteBufs that share the same underlying storage.

ByteBuf byteBuf = ...
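ByteBuf header = byteBuf.slice(0, 5);   // 5 bytes starting at index 0
ByteBuf body = byteBuf.slice(5, 10);    // 10 bytes starting at index 5; the second argument is a length, not an end index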

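5. Differences between the NIO and Netty buffers

  1. ByteBuf tracks reading and writing with two separate indices, while ByteBuffer has a single position; this makes the ByteBuf API simpler to use, since no flip() is needed.
  2. ByteBuf supports composite buffers (CompositeByteBuf), while a ByteBuffer is limited to a single backing store.
  3. A ByteBuf grows automatically (up to maxCapacity); a ByteBuffer has a fixed capacity.
  4. ByteBuf supports pooling and is reference counted, so its memory is released or returned to the pool only once the last reference has been released.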
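
The fixed-capacity point is easy to demonstrate on the JDK side. In the sketch below a full ByteBuffer rejects one more byte, and growing it means allocating a bigger buffer and copying by hand. FixedCapacityDemo and its grow helper are illustrative names, not JDK API.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

class FixedCapacityDemo {
    // Returns true if writing past the capacity is rejected.
    static boolean overflows() {
        ByteBuffer buf = ByteBuffer.allocate(2);
        buf.put((byte) 1).put((byte) 2);
        try {
            buf.put((byte) 3);                 // no room left
            return false;
        } catch (BufferOverflowException e) {
            return true;
        }
    }

    // Manual "expansion": allocate a larger buffer and copy the old content.
    static ByteBuffer grow(ByteBuffer old, int newCapacity) {
        ByteBuffer bigger = ByteBuffer.allocate(newCapacity);
        old.flip();                            // switch old buffer to read mode
        bigger.put(old);                       // copy everything written so far
        return bigger;
    }

    static int growAndWrite() {
        ByteBuffer buf = ByteBuffer.allocate(2);
        buf.put((byte) 1).put((byte) 2);
        ByteBuffer bigger = grow(buf, 4);
        bigger.put((byte) 3);                  // fits now
        return bigger.position();
    }

    public static void main(String[] args) {
        System.out.println(overflows());       // true
        System.out.println(growAndWrite());    // 3
    }
}
```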