ByteBuffer and Channel in File I/O


Overview

BIO

Let's start with a piece of code.

import java.io.File;
import java.io.FileInputStream;

public class BIOTest {
    public static void main(String[] args) throws Exception {
        File file = new File("haha.txt");
        // try-with-resources closes the stream even if a read fails
        try (FileInputStream fi = new FileInputStream(file)) {
            byte[] bytes = new byte[10];

            int c;
            // read() returns the number of bytes read, or -1 at end of file
            while ((c = fi.read(bytes)) != -1) {
                String str = new String(bytes, 0, c);

                System.out.println(str);
            }
        }
    }
}

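This is a very simple file read; under the hood it calls a native method of FileInputStream.

For a single read, the 10 bytes travel along the path disk -> kernel -> off-heap memory -> user heap, which is three copies.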

NIO

That many copies hurts performance. Is there a way to reduce the number of copies?

import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class NIOTest {
    public static void main(String[] args) throws Exception {
        // Closing the stream also closes the channel obtained from it
        try (FileInputStream fi = new FileInputStream("haha.txt")) {
            FileChannel channel = fi.getChannel();
            // Allocate an off-heap (direct) buffer
            ByteBuffer byteBuffer = ByteBuffer.allocateDirect(1);
            int c;
            while ((c = channel.read(byteBuffer)) != -1) {
                byteBuffer.flip();   // switch the buffer from writing to reading
                byte[] bytes = new byte[byteBuffer.remaining()];
                byteBuffer.get(bytes);
                System.out.println(new String(bytes));
                byteBuffer.clear();  // reset the buffer for the next read
            }
        }
    }
}


mmap

Is there anything faster still?

import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapTest {

    public static void main(String[] args) throws Exception {
        // Backslashes must be escaped inside a Java string literal
        File file = new File("F:\\workspace\\zc-chat\\src\\main\\resources\\haha.txt");
        long len = 10;
        byte[] ds = new byte[(int) len];
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            // Map the first len bytes of the file into memory;
            // the file must be at least len bytes long for a read-only mapping
            MappedByteBuffer mappedByteBuffer = raf.getChannel()
                    .map(FileChannel.MapMode.READ_ONLY, 0, len);
            for (int offset = 0; offset < len; offset++) {
                byte b = mappedByteBuffer.get();
                ds[offset] = b;
            }
        }
        System.out.println(new String(ds));
    }
}

Of course, the above is the theoretical picture. In an actual test copying a 1 GB file, the I/O approaches above showed very little difference.
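The mmap listing above depends on a fixed path and copies one byte at a time. As a minimal sketch, the same idea can be tried against a temporary file, mapping the whole file and copying it out with a single bulk get. The class name MmapReadDemo and the helpers readMapped and roundTrip are hypothetical names introduced only for this illustration, and the sketch assumes the file is smaller than 2 GB, since a single mapping cannot exceed Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of the original article.
final class MmapReadDemo {

    // Maps the whole file read-only and copies it out with one bulk get.
    static String readMapped(Path path) {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[mapped.remaining()];
            mapped.get(out);
            return new String(out, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes text to a temporary file and reads it back through a mapping.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("haha", ".txt");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
            return readMapped(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello mmap"));
    }
}
```

The mapping stays valid after the channel is closed, which is why the sketch can close the channel right after the bulk get without any extra cleanup step.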

FileChannel

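As the code above shows, the most obvious difference between NIO and BIO is that NIO obtains a FileChannel from the FileInputStream and then reads from that FileChannel into a ByteBuffer. Let's look at FileChannel first.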

FileChannel inherits from many interfaces; we will go into each one when it comes up. The main thing to know is that a FileChannel is a Channel.

// A channel can represent a hardware device, a file, or a network socket. Channels are thread-safe.
public interface Channel extends Closeable {

    public boolean isOpen();

    public void close() throws IOException;
}

From the examples we can see that a FileChannel supports read and write operations. Let's look at the internal implementation.

public int read(ByteBuffer var1) throws IOException {
    this.ensureOpen();
    if (!this.readable) {
        throw new NonReadableChannelException();
    } else {
        synchronized(this.positionLock) {
            int var3 = 0;
            int var4 = -1;

            try {
                this.begin();
                var4 = this.threads.add();
                if (!this.isOpen()) {
                    byte var12 = 0;
                    return var12;
                } else {
                    do {
                        // Read the contents behind fd into the ByteBuffer
                        var3 = IOUtil.read(this.fd, var1, -1L, this.nd);
                    } while(var3 == -3 && this.isOpen());

                    int var5 = IOStatus.normalize(var3);
                    return var5;
                }
            } finally {
                this.threads.remove(var4);
                this.end(var3 > 0);

                assert IOStatus.check(var3);

            }
        }
    }
}

As you can see, the core is really the read method of IOUtil.

static int read(FileDescriptor var0, ByteBuffer var1, long var2, NativeDispatcher var4) throws IOException {
    if (var1.isReadOnly()) {
        throw new IllegalArgumentException("Read-only buffer");
    } else if (var1 instanceof DirectBuffer) {
        // Direct buffer: call the native method so the kernel copies the data from disk into kernel space, then straight into the direct buffer
        return readIntoNativeBuffer(var0, var1, var2, var4);
    } else {
        // Java heap buffer: first obtain a temporary block of off-heap memory
        ByteBuffer var5 = Util.getTemporaryDirectBuffer(var1.remaining());

        int var7;
        try {
            // Same path as before: disk to kernel, then kernel to the off-heap buffer
            int var6 = readIntoNativeBuffer(var0, var5, var2, var4);
            var5.flip();
            if (var6 > 0) {
                // Then copy from the off-heap buffer into the heap buffer
                var1.put(var5);
            }

            var7 = var6;
        } finally {
            Util.offerFirstTemporaryDirectBuffer(var5);
        }
        return var7;
    }
}

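As the source shows, when a channel reads data it checks whether the ByteBuffer is a DirectBuffer. If it is not, the channel obtains a temporary direct buffer and copies the data one extra time. Channel writes work the same way. So a Channel can be understood as a file or a network socket, and reads go through a ByteBuffer. Next, let's look at ByteBuffer.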
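From the caller's side, the two branches of IOUtil.read look identical: the same channel.read call works with either kind of buffer, and only the number of internal copies differs. The following is a minimal sketch of that, assuming a temporary file and text that fits in 64 bytes; the class name HeapVsDirectRead and the helpers readWith and readBoth are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of the original article.
final class HeapVsDirectRead {

    // Fills buf from the start of the file, then decodes what was read as UTF-8.
    static String readWith(Path path, ByteBuffer buf) {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            // Keep reading until the buffer is full or end of file is reached.
            while (buf.hasRemaining() && ch.read(buf) != -1) { }
            buf.flip();
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return new String(out, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes text (assumed to fit in 64 bytes) to a temporary file and
    // reads it back once through a heap buffer and once through a direct buffer.
    static String[] readBoth(String text) {
        try {
            Path tmp = Files.createTempFile("haha", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                String viaHeap = readWith(tmp, ByteBuffer.allocate(64));
                String viaDirect = readWith(tmp, ByteBuffer.allocateDirect(64));
                return new String[] { viaHeap, viaDirect };
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] results = readBoth("hello channel");
        System.out.println("heap:   " + results[0]);
        System.out.println("direct: " + results[1]);
    }
}
```

Both calls return the same text. With the heap buffer, the channel performs the extra copy through a temporary direct buffer on the caller's behalf.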

ByteBuffer

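A ByteBuffer can be understood as a region of memory. Depending on where that memory lives, it is either a HeapByteBuffer (on-heap memory) or a DirectByteBuffer (off-heap memory, also called direct memory). A channel's read and write methods ultimately fill or drain a ByteBuffer, whose own accessors are get and put, so we will focus on how the two kinds of ByteBuffer implement them.

Buffer encapsulates four indices (mark, position, limit, and capacity) together with the operations on them, which drive reading and writing.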

public abstract class Buffer {
    // Invariants: mark <= position <= limit <= capacity
    private int mark = -1;
    private int position = 0;
    private int limit;
    private int capacity;
}
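To make the indices concrete, here is a small sketch that prints position, limit, and capacity after each step of a typical write-then-read cycle. The class name BufferIndexDemo and the helper state are hypothetical names introduced for this illustration; the ByteBuffer calls themselves are the standard java.nio API.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class, not part of the original article.
final class BufferIndexDemo {

    // Formats the three indices that are observable through public getters.
    static String state(ByteBuffer buf) {
        return "pos=" + buf.position() + " lim=" + buf.limit() + " cap=" + buf.capacity();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        System.out.println("new:   " + state(buf));   // pos=0 lim=8 cap=8

        buf.put((byte) 'a').put((byte) 'b').put((byte) 'c');
        System.out.println("put:   " + state(buf));   // pos=3 lim=8 cap=8

        buf.flip();  // limit = position, position = 0: switch from writing to reading
        System.out.println("flip:  " + state(buf));   // pos=0 lim=3 cap=8

        buf.get();   // reading one byte advances position by one
        System.out.println("get:   " + state(buf));   // pos=1 lim=3 cap=8

        buf.clear(); // position = 0, limit = capacity; the bytes are not erased
        System.out.println("clear: " + state(buf));   // pos=0 lim=8 cap=8
    }
}
```

This is the same flip, get, clear sequence that the NIO read loop earlier in the article relies on.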

Next, let's look at ByteBuffer.

public abstract class ByteBuffer
    extends Buffer
    implements Comparable<ByteBuffer>
{
    // Used only by heap buffers
    final byte[] hb;                  // Non-null only for heap buffers
    final int offset;
    boolean isReadOnly;   
    
    // Create a direct (off-heap) buffer
    public static ByteBuffer allocateDirect(int capacity) {
        return new DirectByteBuffer(capacity);
    }

    // Create a heap buffer
    public static ByteBuffer allocate(int capacity) {
        if (capacity < 0)
            throw new IllegalArgumentException();
        return new HeapByteBuffer(capacity, capacity);
    }
    // Read one byte
    public abstract byte get();
    // Write one byte
    public abstract ByteBuffer put(byte b);
}

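As you can see, ByteBuffer provides two factory methods, one for heap memory and one for off-heap memory, and leaves the read and write methods abstract for its subclasses to implement.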

class HeapByteBuffer
    extends ByteBuffer
{
    // The byte array on the Java heap
    protected final byte[] hb;
    protected final int offset;


    public byte get() {
        return hb[ix(nextGetIndex())];
    }

    public ByteBuffer put(byte x) {

        hb[ix(nextPutIndex())] = x;
        return this;
    }
}

class DirectByteBuffer extends MappedByteBuffer 
        implements DirectBuffer
{

    // Think of this as a pointer-like handle for operating on raw memory
    protected static final Unsafe unsafe = Bits.unsafe();
    
    DirectByteBuffer(int cap) {                   // package-private
        // Initialize Buffer's four indices: mark, position, limit, capacity
        super(-1, 0, cap, cap);
        boolean pa = VM.isDirectMemoryPageAligned();
        int ps = Bits.pageSize();
        long size = Math.max(1L, (long)cap + (pa ? ps : 0));
        Bits.reserveMemory(size, cap);

        long base = 0;
        try {
            // Allocate the direct (off-heap) memory
            base = unsafe.allocateMemory(size);
        } catch (OutOfMemoryError x) {
            Bits.unreserveMemory(size, cap);
            throw x;
        }
        unsafe.setMemory(base, size, (byte) 0);
        if (pa && (base % ps != 0)) {
            // Round up to page boundary
            address = base + ps - (base & (ps - 1));
        } else {
            address = base;
        }
        cleaner = Cleaner.create(this, new Deallocator(base, size, cap));
        att = null;
    }

    
    public ByteBuffer put(byte x) {
        unsafe.putByte(ix(nextPutIndex()), ((x)));
        return this;
    }

    public byte get() {
        return ((unsafe.getByte(ix(nextGetIndex()))));
    }
}

As the source shows, HeapByteBuffer manages a byte array on the heap, while DirectByteBuffer uses the Unsafe class to manage a block of off-heap memory.
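The difference is also visible through the public API, without reading JDK internals. The sketch below, with the hypothetical class name BufferKindDemo and helper describe, shows that a heap buffer exposes its backing array while a direct buffer does not, even though get and put behave the same on both.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class, not part of the original article.
final class BufferKindDemo {

    // Reports which kind of buffer this is and whether it exposes a backing array.
    static String describe(ByteBuffer buf) {
        return (buf.isDirect() ? "direct" : "heap") + ", hasArray=" + buf.hasArray();
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(4);
        ByteBuffer direct = ByteBuffer.allocateDirect(4);

        // The same relative put works on both kinds of buffer.
        heap.put((byte) 42);
        direct.put((byte) 42);

        System.out.println(describe(heap));    // heap, hasArray=true
        System.out.println(describe(direct));  // direct, hasArray=false

        // A heap buffer's write is visible through its backing byte array.
        System.out.println(heap.array()[0]);   // 42
        // A direct buffer has no accessible array; use an absolute get instead.
        System.out.println(direct.get(0));     // 42
    }
}
```

Calling array() on the direct buffer here would throw UnsupportedOperationException, so code that has to accept both kinds usually checks hasArray() first.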