[Netflix Hollow Series] A Source-Level Deep Dive into Hollow's Encoding Implementation


Preface

Continuing from the previous installment: Hollow's in-memory implementation and its encoding logic are closely intertwined. The original plan was to cover both in a single article, but length limits forced a split into two. The previous article, 【Netflix Hollow系列】深入Hollow的内存实现以及ByteBuffer的应用, covered Hollow's memory-pool implementation and its use of ByteBuffer in detail. This article digs into Hollow's data-encoding logic: encoding further optimizes memory usage, reducing the footprint and speeding up access, which is where much of Hollow's performance comes from. @空歌白石

BlobByteBuffer

As described in the previous article 【Netflix Hollow系列】深入Hollow的内存实现以及ByteBuffer的应用, the encoded types make heavy use of BlobByteBuffer, which is responsible for storing all in-memory data and is the lowest-level implementation of the SHARED_MEMORY_LAZY memory mode. BlobByteBuffer can be viewed as Hollow's wrapper around the JDK's ByteBuffer.

BlobByteBuffer bridges large BLOB-like files and MappedByteBuffer: a single MappedByteBuffer can only map a region whose size fits in an int, so larger blobs must be split across several of them.

Note: JDK 14 introduced an improved (incubating) API for accessing foreign memory, intended to eventually replace MappedByteBuffer.

BlobByteBuffer is not thread-safe, but it is safe to share the underlying byte buffers for parallel reads. The maximum supported blob size is on the order of 2^61 bytes (up to ~2^31 spine entries of 2^30 bytes each); other limits in Hollow, or practical limits, will likely be reached long before this one.

Constructor and fields

public static final int MAX_SINGLE_BUFFER_CAPACITY = 1 << 30;   // largest, positive power-of-two int

private final ByteBuffer[] spine;   // array of MappedByteBuffers
private final long capacity;        // in bytes
private final int shift;
private final int mask;

private long position;              // within index 0 to capacity-1 in the underlying ByteBuffer

private BlobByteBuffer(long capacity, int shift, int mask, ByteBuffer[] spine) {
    this(capacity, shift, mask, spine, 0);
}

private BlobByteBuffer(long capacity, int shift, int mask, ByteBuffer[] spine, long position) {

    if (!spine[0].order().equals(ByteOrder.BIG_ENDIAN)) {
        throw new UnsupportedOperationException("Little endian memory layout is not supported");
    }

    this.capacity = capacity;
    this.shift = shift;
    this.mask = mask;
    this.position = position;

    // The following assignment is purposefully placed *after* the population of all segments (this method is called
    // after mmap). The final assignment after the initialization of the array of MappedByteBuffers guarantees that
    // no thread will see any of the array elements before assignment.
    this.spine = spine;
}

ByteOrder.BIG_ENDIAN

Hollow's ByteBuffers support only ByteOrder.BIG_ENDIAN, which is also Java's default byte order.

The ByteOrder.nativeOrder() method returns the byte order of the hardware the local JVM runs on; using a byte order that matches the hardware may make a buffer more efficient.
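The two facts above are easy to check directly. A small sketch (class name is mine), showing that JDK ByteBuffers default to big-endian regardless of the platform's native order:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// ByteBuffers always start out big-endian; nativeOrder() reports the hardware order.
public class ByteOrderDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        System.out.println(buf.order());             // BIG_ENDIAN
        System.out.println(ByteOrder.nativeOrder()); // hardware order, often LITTLE_ENDIAN on x86
    }
}
```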

Storage structure

The figure below shows BlobByteBuffer's two-level storage structure, composed of the ByteBuffer[] spine and the underlying byte[] segments.

(Figure: Hollow-BlobByteBuffer.drawio.png — the two-level storage structure of BlobByteBuffer)
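The two-level lookup reduces to two bit operations: the high bits of a global byte index select the spine slot, the low bits select the offset inside that buffer. A minimal sketch (class and method names are mine), using the 2^30 single-buffer capacity from the constants above:

```java
// Address split used by BlobByteBuffer: spineIndex = index >>> shift,
// bufferIndex = index & mask, where 2^shift is the per-buffer capacity.
public class SpineAddressing {
    static final int SHIFT = 30;              // log2(MAX_SINGLE_BUFFER_CAPACITY)
    static final int MASK = (1 << SHIFT) - 1;

    static int spineIndex(long index)  { return (int) (index >>> SHIFT); }
    static int bufferIndex(long index) { return (int) (index & MASK); }

    public static void main(String[] args) {
        long index = (3L << SHIFT) + 17;       // 17 bytes into the 4th buffer
        System.out.println(spineIndex(index));  // 3
        System.out.println(bufferIndex(index)); // 17
    }
}
```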

mmapBlob

The mmapBlob method reads a File through MappedByteBuffers.

public static BlobByteBuffer mmapBlob(FileChannel channel, int singleBufferCapacity) throws IOException {
    long size = channel.size();
    if (size == 0) {
        throw new IllegalStateException("File to be mmap-ed has no data");
    }
    if ((singleBufferCapacity & (singleBufferCapacity - 1)) != 0) { // should be a power of 2
        throw new IllegalArgumentException("singleBufferCapacity must be a power of 2");
    }

    // split into bufferCount buffers, each with a power-of-two int capacity
    final int bufferCapacity = size > (long) singleBufferCapacity
            ? singleBufferCapacity
            // highestOneBit returns the largest power of two <= its argument
            : Integer.highestOneBit((int) size);
    long bufferCount = size % bufferCapacity == 0
            ? size / (long)bufferCapacity
            : (size / (long)bufferCapacity) + 1;
    if (bufferCount > Integer.MAX_VALUE)
        throw new IllegalArgumentException("file too large; size=" + size);

    // number of bits needed for an in-buffer offset
    int shift = 31 - Integer.numberOfLeadingZeros(bufferCapacity); // log2

    // mask = 2^shift - 1
    int mask = (1 << shift) - 1;

    // allocate the spine of bufferCount MappedByteBuffers
    ByteBuffer[] spine = new MappedByteBuffer[(int)bufferCount];
    for (int i = 0; i < bufferCount; i++) {
        long pos = (long)i * bufferCapacity;
        int cap = i == (bufferCount - 1)
                ? (int)(size - pos)
                : bufferCapacity;

        // map cap bytes starting at pos into a read-only ByteBuffer
        ByteBuffer buffer = channel.map(READ_ONLY, pos, cap);
        /*
        * if (!((MappedByteBuffer) buffer).isLoaded()) // TODO(timt): make pre-fetching configurable
        *    ((MappedByteBuffer) buffer).load();
        */
        // store each segment's buffer in its spine slot
        spine[i] = buffer;
    }

    return new BlobByteBuffer(size, shift, mask, spine);
}

numberOfLeadingZeros

numberOfLeadingZeros returns the number of zero bits preceding the highest-order one bit of i, counting from the sign bit. If i is negative the method returns 0, since the sign bit itself is 1. For example, 10 in binary is 0000 0000 0000 0000 0000 0000 0000 1010; a Java int is 32 bits, so the method returns 28.

highestOneBit

Integer.highestOneBit returns a value with only the highest one bit of its argument set, i.e., the largest power of two less than or equal to the argument.
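The two bit tricks, plus the shift/mask derivation that mmapBlob builds on, checked against the examples above (class name is mine):

```java
// Integer bit utilities used by mmapBlob to derive bufferCapacity, shift and mask.
public class BitTricks {
    public static void main(String[] args) {
        System.out.println(Integer.numberOfLeadingZeros(10)); // 28
        System.out.println(Integer.highestOneBit(10));        // 8

        int bufferCapacity = 1 << 30;
        int shift = 31 - Integer.numberOfLeadingZeros(bufferCapacity); // log2 -> 30
        int mask = (1 << shift) - 1;
        System.out.println(shift);                 // 30
        System.out.println(mask == (1 << 30) - 1); // true
    }
}
```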

get

BlobByteBuffer provides two get methods, getByte and getLong. They can be regarded as the entry points for reading from the ByteBuffers.

Given the two-level storage structure described above, getByte is easy to follow:

/**
* Reads the byte at the given index.
* @param index byte index (from offset 0 in the backing BlobByteBuffer) at which to read byte value
* @return byte at the given index
* @throws IndexOutOfBoundsException if index out of bounds of the backing buffer
*/
public byte getByte(long index) throws BufferUnderflowException {
    if (index < capacity) {
        int spineIndex = (int)(index >>> (shift));
        int bufferIndex = (int)(index & mask);
        return spine[spineIndex].get(bufferIndex);
    }
    else {
        assert(index < capacity + Long.BYTES);
        // this situation occurs when read for bits near the end of the buffer requires reading a long value that
        // extends past the buffer capacity by upto Long.BYTES bytes. To handle this case,
        // return 0 for (index >= capacity - Long.BYTES && index < capacity )
        // these zero bytes will be discarded anyway when the returned long value is shifted to get the queried bits
        return (byte) 0;
    }
}

getLong deserves a closer look. It returns the long value starting at the given byte index; since it depends only on the caller-supplied startByteIndex and is not affected by other threads, it is thread-safe.

Given big-endian byte order, bigEndian computes the position within the buffer that corresponds to a given byte index. Java NIO DirectByteBuffers default to ByteOrder.BIG_ENDIAN, and big-endianness is validated in BlobByteBuffer's constructor, as mentioned earlier.

/**
* Return the long value starting from given byte index. This method is thread safe.
* @param startByteIndex byte index (from offset 0 in the backing BlobByteBuffer) at which to start reading long value
* @return long value
*/
public long getLong(long startByteIndex) throws BufferUnderflowException {

    // misalignment of startByteIndex relative to the buffer position
    int alignmentOffset = (int)(startByteIndex - this.position()) % Long.BYTES;

    // next 8-byte aligned position after startByteIndex
    long nextAlignedPos = startByteIndex - alignmentOffset + Long.BYTES;

    // Long.BYTES is always 8
    byte[] bytes = new byte[Long.BYTES];

    for (int i = 0; i < Long.BYTES; i ++ ) {
        bytes[i] = getByte(bigEndian(startByteIndex + i, nextAlignedPos));
    }

    // assemble the 8 bytes into a single long via shifts
    return ((((long) (bytes[7]       )) << 56) |
            (((long) (bytes[6] & 0xff)) << 48) |
            (((long) (bytes[5] & 0xff)) << 40) |
            (((long) (bytes[4] & 0xff)) << 32) |
            (((long) (bytes[3] & 0xff)) << 24) |
            (((long) (bytes[2] & 0xff)) << 16) |
            (((long) (bytes[1] & 0xff)) <<  8) |
            (((long) (bytes[0] & 0xff))      ));
}

/**
* Given big-endian byte order, returns the position into the buffer for a given byte index. Java nio DirectByteBuffers
* are by default big-endian. Big-endianness is validated in the constructor.
* @param index byte index
* @param boundary index of the next 8-byte aligned byte
* @return position in buffer
*/
private long bigEndian(long index, long boundary) {
    long result;
    if (index < boundary) {
        result = (boundary - Long.BYTES) + (boundary - index) - 1;
    } else {
        result = boundary + (boundary + Long.BYTES - index) - 1;
    }
    return result;
}

VarInt

VarInt implements variable-length integer encoding and decoding. It addresses the space wasted by fixed-width storage when integers have small absolute values.

write

Variable-length writes come in three flavors: writing null, writing an int, and writing a long. Data can be written into a ByteDataArray's byte array, or directly into an OutputStream or a byte[].

How does VarInt write data? The core idea is to switch integers from fixed-width to variable-width storage: if a value fits in 1 byte, never spend 2 or more on it.

Let's look at a concrete example. The int 7 in binary is 00000000 00000000 00000000 00000111. An int is 32 bits, 4 bytes, and here the leading 3 bytes are pure waste. If encoding can shrink 7 from 4 bytes down to 1, memory usage drops by 75%, which is substantial.

The principle behind VarInt is not complicated: split the int into 7-bit groups, and use the high bit of each byte as a continuation flag marking whether the next byte still belongs to this value. 1 means more bytes follow; 0 means this is the last byte of the value.

Original encoding
     00000000  00000000  00000011  10111011
7-bit split
0000  0000000   0000000   0000111   0111011
VarInt encoded
                         10000111  00111011
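The 7-bit grouping above can be sketched in a few lines. This is a simplified encoder for non-negative ints (class name is mine; the real writeVInt, shown later, also handles negatives), checked against the example value 0b11_10111011 = 955:

```java
import java.io.ByteArrayOutputStream;

// Minimal big-endian varint encoder: high bit of each byte = "more bytes follow".
public class VarIntSketch {
    static byte[] encode(int value) { // non-negative values only, for brevity
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value > 0x0FFFFFFF) out.write(0x80 | (value >>> 28));
        if (value > 0x1FFFFF)   out.write(0x80 | ((value >>> 21) & 0x7F));
        if (value > 0x3FFF)     out.write(0x80 | ((value >>> 14) & 0x7F));
        if (value > 0x7F)       out.write(0x80 | ((value >>> 7) & 0x7F));
        out.write(value & 0x7F);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encode(955);                           // two 7-bit groups
        System.out.println(bytes.length);                     // 2
        System.out.println(Integer.toHexString(bytes[0] & 0xFF)); // 87
        System.out.println(Integer.toHexString(bytes[1] & 0xFF)); // 3b
    }
}
```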

The source below uses several hexadecimal constants; here is the mapping between hexadecimal, power of two, and decimal.

| Hexadecimal | Power of two | Decimal |
| --- | --- | --- |
| 0xFFFFFFFFFFFFFF | 2^56 - 1 | 72057594037927935 |
| 0x1FFFFFFFFFFFF | 2^49 - 1 | 562949953421311 |
| 0x3FFFFFFFFFF | 2^42 - 1 | 4398046511103 |
| 0x7FFFFFFFF | 2^35 - 1 | 34359738367 |
| 0x0FFFFFFF / 0xFFFFFFF | 2^28 - 1 | 268435455 |
| 0x1FFFFF | 2^21 - 1 | 2097151 |
| 0x3FFF | 2^14 - 1 | 16383 |
| 0x7F | 2^7 - 1 | 127 |
| 0x80 | 2^7 | 128 |
| 0x4000 | 2^14 | 16384 |
| 0x200000 | 2^21 | 2097152 |
| 0x10000000 | 2^28 | 268435456 |
| 0x800000000 | 2^35 | 34359738368 |
| 0x40000000000 | 2^42 | 4398046511104 |
| 0x2000000000000 | 2^49 | 562949953421312 |
| 0x100000000000000 | 2^56 | 72057594037927936 |

Online base-conversion tool: tool.oschina.net/hexconvert/

write variable null

Writes a variable-length-integer NULL into the provided ByteDataArray. Hollow uses 0x80 to represent null in the byte array, occupying a single byte; 0x80 is 128 in decimal, 1000 0000 in binary.

public static void writeVNull(ByteDataArray buf) {
    buf.write((byte)0x80);
    return;
}

write variable int

writeVInt has three overloads, which write an int value into a ByteDataArray, an OutputStream, or a byte[] at a given starting position. The byte[] overload returns the position immediately after the written data, so that subsequent writes can continue from there.


public static void writeVInt(ByteDataArray buf, int value) {
    if(value > 0x0FFFFFFF || value < 0) buf.write((byte)(0x80 | ((value >>> 28))));
    if(value > 0x1FFFFF || value < 0)   buf.write((byte)(0x80 | ((value >>> 21) & 0x7F)));
    if(value > 0x3FFF || value < 0)     buf.write((byte)(0x80 | ((value >>> 14) & 0x7F)));
    if(value > 0x7F || value < 0)       buf.write((byte)(0x80 | ((value >>>  7) & 0x7F)));

    buf.write((byte)(value & 0x7F));
}

public static void writeVInt(OutputStream out, int value) throws IOException {
    if(value > 0x0FFFFFFF || value < 0) out.write((byte)(0x80 | ((value >>> 28))));
    if(value > 0x1FFFFF || value < 0)   out.write((byte)(0x80 | ((value >>> 21) & 0x7F)));
    if(value > 0x3FFF || value < 0)     out.write((byte)(0x80 | ((value >>> 14) & 0x7F)));
    if(value > 0x7F || value < 0)       out.write((byte)(0x80 | ((value >>>  7) & 0x7F)));

    out.write((byte)(value & 0x7F));
}

public static int writeVInt(byte data[], int pos, int value) {
    if(value > 0x0FFFFFFF || value < 0) data[pos++] = ((byte)(0x80 | ((value >>> 28))));
    if(value > 0x1FFFFF || value < 0)   data[pos++] = ((byte)(0x80 | ((value >>> 21) & 0x7F)));
    if(value > 0x3FFF || value < 0)     data[pos++] = ((byte)(0x80 | ((value >>> 14) & 0x7F)));
    if(value > 0x7F || value < 0)       data[pos++] = ((byte)(0x80 | ((value >>>  7) & 0x7F)));
    
    data[pos++] = (byte)(value & 0x7F);
    
    return pos;
}
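The byte[] overload above round-trips cleanly with the matching decode loop (the decode logic mirrors readVInt from the Hollow source; the wrapper class name is mine). A self-contained sketch:

```java
// Write a varint into a byte[] at a position, then decode it back.
public class VIntRoundTrip {
    static int writeVInt(byte[] data, int pos, int value) {
        if (value > 0x0FFFFFFF || value < 0) data[pos++] = (byte) (0x80 | (value >>> 28));
        if (value > 0x1FFFFF || value < 0)   data[pos++] = (byte) (0x80 | ((value >>> 21) & 0x7F));
        if (value > 0x3FFF || value < 0)     data[pos++] = (byte) (0x80 | ((value >>> 14) & 0x7F));
        if (value > 0x7F || value < 0)       data[pos++] = (byte) (0x80 | ((value >>> 7) & 0x7F));
        data[pos++] = (byte) (value & 0x7F);
        return pos; // position after the encoded value
    }

    static int readVInt(byte[] data, int pos) {
        byte b = data[pos++];
        int value = b & 0x7F;
        while ((b & 0x80) != 0) {       // continuation bit set: more groups follow
            b = data[pos++];
            value = (value << 7) | (b & 0x7F);
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] data = new byte[16];
        System.out.println(writeVInt(data, 0, 1_000_000)); // 3 (bytes written)
        System.out.println(readVInt(data, 0));             // 1000000
        System.out.println(writeVInt(data, 0, -1));        // 5 (negatives take 5 bytes)
        System.out.println(readVInt(data, 0));             // -1
    }
}
```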

write variable long

writeVLong has two overloads, writing a long value into a ByteDataArray or an OutputStream; unlike writeVInt, there is no byte[] overload.

public static void writeVLong(ByteDataArray buf, long value) {
    if(value < 0)                                buf.write((byte)0x81);
    if(value > 0xFFFFFFFFFFFFFFL || value < 0)   buf.write((byte)(0x80 | ((value >>> 56) & 0x7FL)));
    if(value > 0x1FFFFFFFFFFFFL || value < 0)    buf.write((byte)(0x80 | ((value >>> 49) & 0x7FL)));
    if(value > 0x3FFFFFFFFFFL || value < 0)      buf.write((byte)(0x80 | ((value >>> 42) & 0x7FL)));
    if(value > 0x7FFFFFFFFL || value < 0)        buf.write((byte)(0x80 | ((value >>> 35) & 0x7FL)));
    if(value > 0xFFFFFFFL || value < 0)          buf.write((byte)(0x80 | ((value >>> 28) & 0x7FL)));
    if(value > 0x1FFFFFL || value < 0)           buf.write((byte)(0x80 | ((value >>> 21) & 0x7FL)));
    if(value > 0x3FFFL || value < 0)             buf.write((byte)(0x80 | ((value >>> 14) & 0x7FL)));
    if(value > 0x7FL || value < 0)               buf.write((byte)(0x80 | ((value >>>  7) & 0x7FL)));

    buf.write((byte)(value & 0x7FL));
}

public static void writeVLong(OutputStream out, long value) throws IOException {
    if(value < 0)                                out.write((byte)0x81);
    if(value > 0xFFFFFFFFFFFFFFL || value < 0)   out.write((byte)(0x80 | ((value >>> 56) & 0x7FL)));
    if(value > 0x1FFFFFFFFFFFFL || value < 0)    out.write((byte)(0x80 | ((value >>> 49) & 0x7FL)));
    if(value > 0x3FFFFFFFFFFL || value < 0)      out.write((byte)(0x80 | ((value >>> 42) & 0x7FL)));
    if(value > 0x7FFFFFFFFL || value < 0)        out.write((byte)(0x80 | ((value >>> 35) & 0x7FL)));
    if(value > 0xFFFFFFFL || value < 0)          out.write((byte)(0x80 | ((value >>> 28) & 0x7FL)));
    if(value > 0x1FFFFFL || value < 0)           out.write((byte)(0x80 | ((value >>> 21) & 0x7FL)));
    if(value > 0x3FFFL || value < 0)             out.write((byte)(0x80 | ((value >>> 14) & 0x7FL)));
    if(value > 0x7FL || value < 0)               out.write((byte)(0x80 | ((value >>>  7) & 0x7FL)));

    out.write((byte)(value & 0x7FL));
}

read

readwrite的逆运算。read不仅仅需要当前位存储的byte[]InputStream,还需要具体的position

read variable null

Reads the byte at the given position and checks whether it equals 0x80. The return value is a boolean indicating whether the value is null, not null itself.

public static boolean readVNull(ByteData arr, long position) {
    return arr.get(position) == (byte)0x80;
}

read variable int

readVInt has three overloads: one reads directly at the given position in a ByteData (byte[]); the other two read bytes from an InputStream or a HollowBlobInput via their read methods.

public static int readVInt(ByteData arr, long position) {
    byte b = arr.get(position++);

    // a leading 0x80 is the null marker; a null cannot be read as an int
    if(b == (byte) 0x80)
        throw new RuntimeException("Attempting to read null value as int");

    int value = b & 0x7F;
    while ((b & 0x80) != 0) {
        b = arr.get(position++);
        // shift the accumulated groups left by 7 bits and append the next group
        value <<= 7;
        value |= (b & 0x7F);
    }

    return value;
}

public static int readVInt(InputStream in) throws IOException {
    byte b = readByteSafely(in);

    if(b == (byte) 0x80)
        throw new RuntimeException("Attempting to read null value as int");

    int value = b & 0x7F;
    while ((b & 0x80) != 0) {
        b = readByteSafely(in);
        value <<= 7;
        value |= (b & 0x7F);
    }

    return value;
}

public static int readVInt(HollowBlobInput in) throws IOException {
    byte b = readByteSafely(in);

    if(b == (byte) 0x80)
        throw new RuntimeException("Attempting to read null value as int");

    int value = b & 0x7F;
    while ((b & 0x80) != 0) {
        b = readByteSafely(in);
        value <<= 7;
        value |= (b & 0x7F);
    }

    return value;
}

read variable long

readVLong has three overloads: one reads the long directly at the given position in a ByteData (byte[]); the other two read bytes from an InputStream or a HollowBlobInput via their read methods.

public static long readVLong(ByteData arr, long position) {
    byte b = arr.get(position++);

    if(b == (byte) 0x80)
        throw new RuntimeException("Attempting to read null value as long");

    long value = b & 0x7F;
    while ((b & 0x80) != 0) {
        b = arr.get(position++);
        value <<= 7;
        value |= (b & 0x7F);
    }

    return value;
}
public static long readVLong(InputStream in) throws IOException {
    byte b = readByteSafely(in);

    if(b == (byte) 0x80)
        throw new RuntimeException("Attempting to read null value as long");

    long value = b & 0x7F;
    while ((b & 0x80) != 0) {
        b = readByteSafely(in);
        value <<= 7;
        value |= (b & 0x7F);
    }

    return value;
}
public static long readVLong(HollowBlobInput in) throws IOException {
    byte b = readByteSafely(in);

    if (b == (byte) 0x80)
        throw new RuntimeException("Attempting to read null value as long");

    long value = b & 0x7F;
    while ((b & 0x80) != 0) {
        b = readByteSafely(in);
        value <<= 7;
        value |= (b & 0x7F);
    }

    return value;
}

readByteSafely

InputStreamread方法从输入流中读取数据的下一个字节。 值字节作为 int 返回,范围为 0255。如果由于到达流的末尾而没有可用的字节,则返回值 -1。 此方法会一直阻塞的,直到输入数据可用、检测到流结束或引发异常。

public static byte readByteSafely(InputStream is) throws IOException {
    int i = is.read();
    if (i == -1) {
        throw new EOFException("Unexpected end of VarInt record");
    }
    return (byte)i;
}

HollowBlobInputread内部基于输入流的类型分为RandomAccessFileDataInputStream两种读写方式。

  • DataInputStreamInputStream的一种具体实现,因此使用DataInputStream的read和InputStream并无差别。
  • RandomAccessFile是基于文件的数据流读写,从此文件中读取一个字节的数据。 该字节以 0255 (0x00-0x0ff) 范围内的整数形式返回。 如果还没有输入可用,则此方法会阻塞。尽管 RandomAccessFile 不是 InputStream 的子类,但此方法的行为方式与 InputStreamInputStream.read() 方法完全相同。
public static byte readByteSafely(HollowBlobInput in) throws IOException {
    int i = in.read();
    if (i == -1) {
        throw new EOFException("Unexpected end of VarInt record");
    }
    return (byte)i;
}

next

nextVLongSize determines the size, in bytes, of the variable-length long stored in the provided ByteData starting at the specified position.

public static int nextVLongSize(ByteData arr, long position) {
    byte b = arr.get(position++);

    // a null occupies exactly one byte
    if(b == (byte) 0x80)
        return 1;

    int length = 1;

    while((b & 0x80) != 0) {
        b = arr.get(position++);
        length++;
    }

    return length;
}

size & count

The two sizeOf overloads compute the encoded size, in bytes, of an int or long value when written as a variable-length integer.

public static int sizeOfVInt(int value) {
    if(value < 0)
        return 5;
    if(value < 0x80)
        return 1;
    if(value < 0x4000)
        return 2;
    if(value < 0x200000)
        return 3;
    if(value < 0x10000000)
        return 4;
    return 5;
}

public static int sizeOfVLong(long value) {
    if(value < 0L)
        return 10;
    if(value < 0x80L)
        return 1;
    if(value < 0x4000L)
        return 2;
    if(value < 0x200000L)
        return 3;
    if(value < 0x10000000L)
        return 4;
    if(value < 0x800000000L)
        return 5;
    if(value < 0x40000000000L)
        return 6;
    if(value < 0x2000000000000L)
        return 7;
    if(value < 0x100000000000000L)
        return 8;
    return 9;
}
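The boundaries in sizeOfVInt follow directly from the 7-bit grouping: each additional 7 bits of magnitude costs one more byte, and negative ints always take 5 bytes. A quick check (the method body is copied from the source above; the wrapper class name is mine):

```java
// Boundary behavior of sizeOfVInt: 1 byte per 7 bits of magnitude, 5 for negatives.
public class VIntSizes {
    static int sizeOfVInt(int value) {
        if (value < 0) return 5;
        if (value < 0x80) return 1;
        if (value < 0x4000) return 2;
        if (value < 0x200000) return 3;
        if (value < 0x10000000) return 4;
        return 5;
    }

    public static void main(String[] args) {
        System.out.println(sizeOfVInt(127));   // 1 (largest 1-byte value)
        System.out.println(sizeOfVInt(128));   // 2
        System.out.println(sizeOfVInt(16383)); // 2 (largest 2-byte value)
        System.out.println(sizeOfVInt(16384)); // 3
        System.out.println(sizeOfVInt(-1));    // 5
    }
}
```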

countVarIntsInRange counts the number of variable-length integers encoded in the provided ByteData within the specified range.

public static int countVarIntsInRange(ByteData byteData, long fieldPosition, int length) {
    int numInts = 0;

    boolean insideInt = false;

    for(int i=0;i<length;i++) {
        byte b = byteData.get(fieldPosition + i);

        if((b & 0x80) == 0) {
            numInts++;
            insideInt = false;
        } else if(!insideInt && b == (byte)0x80) {
            numInts++;
        } else {
            insideInt = true;
        }
    }

    return numInts;
}

sizecount方法一般会用于计算具体字段占用的byte位数,用于准确初始化或从回收站申请segments大小。

IOUtils

IOUtilsVarInt的一个典型的应用场景,我们以此进一步展开说明如何使用VarInt

package com.netflix.hollow.core.util;

import com.netflix.hollow.core.memory.encoding.VarInt;
import com.netflix.hollow.core.read.HollowBlobInput;
import java.io.DataOutputStream;
import java.io.IOException;

public class IOUtils {

    public static void copyBytes(HollowBlobInput in, DataOutputStream[] os, long numBytes) throws IOException {
        byte buf[] = new byte[4096];

        while(numBytes > 0) {
            int numBytesToRead = 4096;
            if(numBytes < 4096)
                numBytesToRead = (int)numBytes;
            int bytesRead = in.read(buf, 0, numBytesToRead);

            for(int i=0;i<os.length;i++) {
                os[i].write(buf, 0, bytesRead);
            }

            numBytes -= bytesRead;
        }
    }

    public static void copySegmentedLongArray(HollowBlobInput in, DataOutputStream[] os) throws IOException {
        long numLongsToWrite = VarInt.readVLong(in);
        for(int i=0;i<os.length;i++)
            VarInt.writeVLong(os[i], numLongsToWrite);

        copyBytes(in, os, numLongsToWrite * 8);
    }

    public static int copyVInt(HollowBlobInput in, DataOutputStream[] os) throws IOException {
        int value = VarInt.readVInt(in);
        for(int i=0;i<os.length;i++)
            VarInt.writeVInt(os[i], value);
        return value;
    }

    public static long copyVLong(HollowBlobInput in, DataOutputStream[] os) throws IOException {
        long value = VarInt.readVLong(in);
        for(int i=0;i<os.length;i++)
            VarInt.writeVLong(os[i], value);
        return value;
    }
}

VarInt summary

VarInt eliminates the waste of fixed-width storage for small absolute values, but the encoding has a drawback: storing large numbers costs more than plain binary. Because the high bit of every byte is claimed as a continuation flag, a value that fits in 4 bytes may need 5, and one that fits in 8 bytes may need 10. Part of this problem is addressed by the ZigZag encoding introduced below, but not all of it, so the encoding must be applied with care.

ZigZag

Zig-zag encoding solves the problem that negative numbers with small absolute values become expensive after varint encoding.

Sign-magnitude:    10000000  00000000  00000000  00001011
Ones' complement:  11111111  11111111  11111111  11110100
Two's complement:  11111111  11111111  11111111  11110101
VarInt encoding:   10001111  11111111  11111111  11111111  01110101

Clearly, for negative numbers of small magnitude, varint encoding produces too many leading 1s to compress, costing even more than plain binary encoding. How does ZigZag fix this? Its idea is to map negatives to positives, turning leading 1s into leading 0s that varint can compress.

The ZigZag algorithm can be summarized as:

  1. For all values: move the sign bit to the end, shifting the value bits forward
  2. For negative values: keep the sign bit, invert the value bits
public class ZigZag {

    public static long encodeLong(long l) {
        return (l << 1) ^ (l >> 63);
    }

    public static long decodeLong(long l) {
        return (l >>> 1) ^ ((l << 63) >> 63);
    }

    public static int encodeInt(int i) {
        return (i << 1) ^ (i >> 31);
    }

    public static int decodeInt(int i) {
        return (i >>> 1) ^ ((i << 31) >> 31);
    }
}

Let's work through a concrete example:

Negative (-11)
  Two's complement:                             11111111  11111111  11111111  11110101
  Sign bit moved last, value bits shifted:      11111111  11111111  11111111  11101011
  Sign bit kept, value bits inverted (21):      00000000  00000000  00000000  00010101

Positive (11)
  Two's complement:                             00000000  00000000  00000000  00010101
  Sign bit moved last, value bits shifted (22): 00000000  00000000  00000000  00101010
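The shift-and-xor one-liners implement exactly these steps, and can be verified on the worked values above (wrapper class name is mine; the method bodies match the ZigZag class shown earlier):

```java
// ZigZag transform: small-magnitude negatives map to small positives.
public class ZigZagDemo {
    static int encodeInt(int i) { return (i << 1) ^ (i >> 31); }
    static int decodeInt(int i) { return (i >>> 1) ^ ((i << 31) >> 31); }

    public static void main(String[] args) {
        System.out.println(encodeInt(-11)); // 21
        System.out.println(encodeInt(11));  // 22
        System.out.println(decodeInt(21));  // -11
        System.out.println(decodeInt(22));  // 11
    }
}
```

Note how encodeInt interleaves the two number lines: 0, -1, 1, -2, 2, ... map to 0, 1, 2, 3, 4, ..., which is where the "zig-zag" name comes from.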

In Hollow, zig-zag encoding is mainly used to encode fields of type com.netflix.hollow.core.schema.HollowObjectSchema.FieldType.INT and com.netflix.hollow.core.schema.HollowObjectSchema.FieldType.LONG, so that smaller absolute values can be encoded in fewer bits.

For a more detailed analysis, see my earlier article 一种编码方式:ZigZag.

ByteDataArray

ByteDataArray wraps part of SegmentedByteArray's operations: it writes data into a SegmentedByteArray while tracking the write position within it.

When encoding data with the VarInt algorithm, the bytes to encode are taken from a ByteDataArray. The class is not complicated, so only the constructors are shown here.

private final SegmentedByteArray buf;
private long position;

public ByteDataArray() {
    this(WastefulRecycler.DEFAULT_INSTANCE);
}

public ByteDataArray(ArraySegmentRecycler memoryRecycler) {
    buf = new SegmentedByteArray(memoryRecycler);
}

ByteArrayOrdinalMap

ByteArrayOrdinalMap manages and maintains the mapping between byte sequences and ordinals: it is the data structure that maps a byte sequence to its ordinal.

ByteArrayOrdinalMap can be viewed as a hash map: an AtomicLongArray named pointersAndOrdinals stores the keys, while a ByteDataArray stores the values. Each key has two components.

The high 29 bits of each AtomicLongArray entry hold the ordinal; the low 35 bits hold a pointer to the start of the byte sequence in the byte data. Each byte sequence is preceded by a variable-length integer (see VarInt) giving the sequence's length.
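Packing and unpacking such a key is plain bit arithmetic. A small sketch (class and method names are mine; the constants mirror the fields shown in the source that follows):

```java
// One long key = ordinal (high 29 bits) | pointer into byte data (low 35 bits).
public class KeyPacking {
    static final int BITS_PER_ORDINAL = 29;
    static final int BITS_PER_POINTER = Long.SIZE - BITS_PER_ORDINAL; // 35
    static final long POINTER_MASK = (1L << BITS_PER_POINTER) - 1;

    static long pack(int ordinal, long pointer) {
        return ((long) ordinal << BITS_PER_POINTER) | pointer;
    }

    public static void main(String[] args) {
        long key = pack(12345, 67890L);
        System.out.println(key >>> BITS_PER_POINTER); // ordinal: 12345
        System.out.println(key & POINTER_MASK);       // pointer: 67890
    }
}
```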

(Figure: Hollow-ByteArrayOrdinalMap.drawio.png — the structure of ByteArrayOrdinalMap)

Ordinal: literally a sequence number; in Hollow it can be understood as the identifier of a specific record.

Constructor and fields

private static final long EMPTY_BUCKET_VALUE = -1L;

private static final int BITS_PER_ORDINAL = 29;
private static final int BITS_PER_POINTER = Long.SIZE - BITS_PER_ORDINAL;
private static final long POINTER_MASK = (1L << BITS_PER_POINTER) - 1;
private static final long ORDINAL_MASK = (1L << BITS_PER_ORDINAL) - 1;
private static final long MAX_BYTE_DATA_LENGTH = 1L << BITS_PER_POINTER;

/// Thread safety:  We need volatile access semantics to the individual elements in the
/// pointersAndOrdinals array.
/// Ordinal is the high 29 bits.  Pointer to byte data is the low 35 bits.
/// In addition need volatile access to the reference when resize occurs
private volatile AtomicLongArray pointersAndOrdinals;
private final ByteDataArray byteData;
private final FreeOrdinalTracker freeOrdinalTracker;

// current number of entries in the map
private int size;
// load-factor threshold; the map grows once size exceeds this
private int sizeBeforeGrow;

private BitSet unusedPreviousOrdinals;

// maps ordinals to pointers so they can be looked up easily when writing blob streams
private long[] pointersByOrdinal;

/**
* Creates a map with an initial capacity of 256 entries and a 70% load factor.
*/
public ByteArrayOrdinalMap() {
    this(256);
}

/**
* Creates a byte-array ordinal map with an initial capacity of the given size rounded up to the nearest power of two, and a 70% load factor.
*/
public ByteArrayOrdinalMap(int size) {
    size = bucketSize(size);

    this.freeOrdinalTracker = new FreeOrdinalTracker();
    this.byteData = new ByteDataArray(WastefulRecycler.DEFAULT_INSTANCE);
    this.pointersAndOrdinals = emptyKeyArray(size);
    this.sizeBeforeGrow = (int) (((float) size) * 0.7); /// 70% load factor
    this.size = 0;
}

// round up to the nearest power of two
private static int bucketSize(int x) {
    // See Hackers Delight Fig. 3-3
    x = x - 1;
    x = x | (x >> 1);
    x = x | (x >> 2);
    x = x | (x >> 4);
    x = x | (x >> 8);
    x = x | (x >> 16);
    return (x < 256) ? 256 : (x >= 1 << 30) ? 1 << 30 : x + 1;
}

/**
* Creates an AtomicLongArray of the given size with every element set to EMPTY_BUCKET_VALUE, i.e., -1.
*/
private AtomicLongArray emptyKeyArray(int size) {
    AtomicLongArray arr = new AtomicLongArray(size);
    // Volatile store not required, could use plain store
    // See VarHandles for JDK >= 9
    for (int i = 0; i < arr.length(); i++) {
        arr.lazySet(i, EMPTY_BUCKET_VALUE);
    }
    return arr;
}
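The bucketSize helper is a classic bit-smearing round-up (see Hacker's Delight Fig. 3-3), clamped to the range [256, 2^30]. A quick check of its behavior (method body copied verbatim from the constructor helper above; wrapper class name is mine):

```java
// bucketSize rounds up to the next power of two, clamped to [256, 2^30].
public class BucketSize {
    static int bucketSize(int x) {
        // smear the highest set bit into all lower positions, then add 1
        x = x - 1;
        x = x | (x >> 1);
        x = x | (x >> 2);
        x = x | (x >> 4);
        x = x | (x >> 8);
        x = x | (x >> 16);
        return (x < 256) ? 256 : (x >= 1 << 30) ? 1 << 30 : x + 1;
    }

    public static void main(String[] args) {
        System.out.println(bucketSize(100)); // 256 (minimum capacity)
        System.out.println(bucketSize(300)); // 512
        System.out.println(bucketSize(512)); // 512 (already a power of two)
    }
}
```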

put

The put method pre-assigns an ordinal to an already-serialized byte sequence.

  • This method is not thread-safe.
  • It does not update the FreeOrdinalTracker's free ordinals.
public void put(ByteDataArray serializedRepresentation, int ordinal) {
    if (ordinal < 0 || ordinal > ORDINAL_MASK) {
        throw new IllegalArgumentException(String.format(
                "The given ordinal %s is out of bounds and not within the closed interval [0, %s]",
                ordinal, ORDINAL_MASK));
    }
    if (size > sizeBeforeGrow) {
        growKeyArray();
    }

    int hash = HashCodes.hashCode(serializedRepresentation);

    AtomicLongArray pao = pointersAndOrdinals;

    int modBitmask = pao.length() - 1;
    int bucket = hash & modBitmask;
    long key = pao.get(bucket);

    while (key != EMPTY_BUCKET_VALUE) {
        bucket = (bucket + 1) & modBitmask;
        key = pao.get(bucket);
    }

    long pointer = byteData.length();

    VarInt.writeVInt(byteData, (int) serializedRepresentation.length());
    serializedRepresentation.copyTo(byteData);
    if (byteData.length() > MAX_BYTE_DATA_LENGTH) {
        throw new IllegalStateException(String.format(
                "The number of bytes for the serialized representations, %s, is too large and is greater than the maximum of %s bytes",
                byteData.length(), MAX_BYTE_DATA_LENGTH));
    }

    key = ((long) ordinal << BITS_PER_POINTER) | pointer;

    size++;

    pao.set(bucket, key);
}

get ordinal

The get method returns the ordinal of a previously added byte sequence, or -1 if the sequence has not been added to the map. This is intended for the client-side heap-safe double-snapshot load.

/**
* Returns the ordinal for a previously added byte sequence.  If this byte sequence has not been added to the map, then -1 is returned.<p>
* <p>
* This is intended for use in the client-side heap-safe double snapshot load.
*
* @param serializedRepresentation the serialized representation
* @return The ordinal for this serialized representation, or -1.
*/
public int get(ByteDataArray serializedRepresentation) {
    return get(serializedRepresentation, HashCodes.hashCode(serializedRepresentation));
}

private int get(ByteDataArray serializedRepresentation, int hash) {
    AtomicLongArray pao = pointersAndOrdinals;

    int modBitmask = pao.length() - 1;
    
    // the hash ANDed with the mask selects the bucket
    int bucket = hash & modBitmask;

    long key = pao.get(bucket);

    // Linear probing to resolve collisions
    // Given the load factor it is guaranteed that the loop will terminate
    // as there will be at least one empty bucket
    // To ensure this is the case it is important that pointersAndOrdinals
    // is read into a local variable and thereafter used, otherwise a concurrent
    // size increase may break this invariant
    while (key != EMPTY_BUCKET_VALUE) {
        if (compare(serializedRepresentation, key)) {
            return (int) (key >>> BITS_PER_POINTER);
        }

        bucket = (bucket + 1) & modBitmask;
        key = pao.get(bucket);
    }

    return -1;
}

ByteDataArray to Ordinal

The getOrAssignOrdinal method converts a ByteDataArray into its corresponding ordinal.

public int getOrAssignOrdinal(ByteDataArray serializedRepresentation) {
    return getOrAssignOrdinal(serializedRepresentation, -1);
}

/**
* Adds a sequence of bytes to this map.  If the sequence of bytes has previously been added
* to this map then its assigned ordinal is returned.
* If the sequence of bytes has not been added to this map then a new ordinal is assigned
* and returned.
* <p>
* This operation is thread-safe.
*
* @param serializedRepresentation the sequence of bytes
* @param preferredOrdinal the preferred ordinal to assign, if not already assigned to
* another sequence of bytes and the given sequence of bytes has not previously been added
* @return the assigned ordinal
*/
public int getOrAssignOrdinal(ByteDataArray serializedRepresentation, int preferredOrdinal) {
    int hash = HashCodes.hashCode(serializedRepresentation);

    int ordinal = get(serializedRepresentation, hash);
    return ordinal != -1 ? ordinal : assignOrdinal(serializedRepresentation, hash, preferredOrdinal);
}

/// acquire the lock before writing.
private synchronized int assignOrdinal(ByteDataArray serializedRepresentation, int hash, int preferredOrdinal) {
    if (preferredOrdinal < -1 || preferredOrdinal > ORDINAL_MASK) {
        throw new IllegalArgumentException(String.format(
                "The given preferred ordinal %s is out of bounds and not within the closed interval [-1, %s]",
                preferredOrdinal, ORDINAL_MASK));
    }
    if (size > sizeBeforeGrow) {
        growKeyArray();
    }

    /// check to make sure that after acquiring the lock, the element still does not exist.
    /// this operation is akin to double-checked locking which is 'fixed' with the JSR 133 memory model in JVM >= 1.5.
    /// Note that this also requires pointersAndOrdinals be volatile so resizes are also visible
    AtomicLongArray pao = pointersAndOrdinals;

    int modBitmask = pao.length() - 1;
    int bucket = hash & modBitmask;
    long key = pao.get(bucket);

    while (key != EMPTY_BUCKET_VALUE) {
        if (compare(serializedRepresentation, key)) {
            return (int) (key >>> BITS_PER_POINTER);
        }

        bucket = (bucket + 1) & modBitmask;
        key = pao.get(bucket);
    }

    /// the ordinal for this object still does not exist in the list, even after the lock has been acquired.
    /// it is up to this thread to add it at the current bucket position.
    int ordinal = findFreeOrdinal(preferredOrdinal);
    if (ordinal > ORDINAL_MASK) {
        throw new IllegalStateException(String.format(
                "Ordinal cannot be assigned. The to be assigned ordinal, %s, is greater than the maximum supported ordinal value of %s",
                ordinal, ORDINAL_MASK));
    }

    long pointer = byteData.length();

    VarInt.writeVInt(byteData, (int) serializedRepresentation.length());
    /// Copying might cause a resize to the segmented array held by byteData
    /// A reading thread may observe a null value for a segment during the creation
    /// of a new segments array (see SegmentedByteArray.ensureCapacity).
    serializedRepresentation.copyTo(byteData);
    if (byteData.length() > MAX_BYTE_DATA_LENGTH) {
        throw new IllegalStateException(String.format(
                "The number of bytes for the serialized representations, %s, is too large and is greater than the maximum of %s bytes",
                byteData.length(), MAX_BYTE_DATA_LENGTH));
    }

    key = ((long) ordinal << BITS_PER_POINTER) | pointer;

    size++;

    /// this set on the AtomicLongArray has volatile semantics (i.e. behaves like a monitor release).
    /// Any other thread reading this element in the AtomicLongArray will have visibility to all memory writes this thread has made up to this point.
    /// This means the entire byte sequence is guaranteed to be visible to any thread which reads the pointer to that data.
    pao.set(bucket, key);

    return ordinal;
}

/**
* If the preferredOrdinal has not already been used, mark it and use it.  Otherwise,
* delegate to the FreeOrdinalTracker.
*/
private int findFreeOrdinal(int preferredOrdinal) {
    if (preferredOrdinal != -1 && unusedPreviousOrdinals.get(preferredOrdinal)) {
        unusedPreviousOrdinals.clear(preferredOrdinal);
        return preferredOrdinal;
    }

    return freeOrdinalTracker.getFreeOrdinal();
}

recalculateFreeOrdinals

The logic of recalculateFreeOrdinals is somewhat involved. Start with how it is used: the mapOrdinal method maps a HollowWriteRecord to an ordinal, and recalculateFreeOrdinals is then called to recompute the free ordinals.


protected final ByteArrayOrdinalMap ordinalMap;

public void mapOrdinal(HollowWriteRecord rec, int newOrdinal, boolean markPreviousCycle, boolean markCurrentCycle) {
    if(!ordinalMap.isReadyForAddingObjects())
        throw new RuntimeException("The HollowWriteStateEngine is not ready to add more Objects.  Did you remember to call stateEngine.prepareForNextCycle()?");

    ByteDataArray scratch = scratch();
    rec.writeDataTo(scratch);
    ordinalMap.put(scratch, newOrdinal);
    if(markPreviousCycle)
        previousCyclePopulated.set(newOrdinal);
    if(markCurrentCycle)
        currentCyclePopulated.set(newOrdinal);
    scratch.reset();
}

/**
* Correct the free ordinal list after using mapOrdinal()
*/
public void recalculateFreeOrdinals() {
    ordinalMap.recalculateFreeOrdinals();
}

With the help of FreeOrdinalTracker, the ordinal space is recomputed so that freed ordinals can be reassigned and memory usage stays compact.

public void recalculateFreeOrdinals() {
    BitSet populatedOrdinals = new BitSet();
    AtomicLongArray pao = pointersAndOrdinals;

    for (int i = 0; i < pao.length(); i++) {
        long key = pao.get(i);
        if (key != EMPTY_BUCKET_VALUE) {
            int ordinal = (int) (key >>> BITS_PER_POINTER);
            populatedOrdinals.set(ordinal);
        }
    }

    recalculateFreeOrdinals(populatedOrdinals);
}


private void recalculateFreeOrdinals(BitSet populatedOrdinals) {
    freeOrdinalTracker.reset();

    int length = populatedOrdinals.length();
    int ordinal = populatedOrdinals.nextClearBit(0);

    while (ordinal < length) {
        freeOrdinalTracker.returnOrdinalToPool(ordinal);
        ordinal = populatedOrdinals.nextClearBit(ordinal + 1);
    }

    freeOrdinalTracker.setNextEmptyOrdinal(length);
}
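
The hole-scanning step can be reproduced standalone with java.util.BitSet (a sketch, not Hollow's code): every clear bit below the highest populated ordinal is a hole to be returned to the pool.

```java
import java.util.ArrayDeque;
import java.util.BitSet;

// Sketch of the recalculateFreeOrdinals scan: collect every clear bit that
// lies below the highest populated ordinal (BitSet.length() is that bound).
class FreeOrdinalScan {
    public static int[] freeOrdinals(BitSet populated) {
        ArrayDeque<Integer> free = new ArrayDeque<>();
        int length = populated.length();            // index of highest set bit, plus one
        int ordinal = populated.nextClearBit(0);
        while (ordinal < length) {
            free.add(ordinal);                      // a hole inside the populated range
            ordinal = populated.nextClearBit(ordinal + 1);
        }
        return free.stream().mapToInt(Integer::intValue).toArray();
    }
}
```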

maxOrdinal

maxOrdinal computes the largest ordinal currently stored in pointersAndOrdinals.

public int maxOrdinal() {
    int maxOrdinal = -1;
    AtomicLongArray pao = pointersAndOrdinals;

    for (int i = 0; i < pao.length(); i++) {
        long key = pao.get(i);
        if (key != EMPTY_BUCKET_VALUE) {
            int ordinal = (int) (key >>> BITS_PER_POINTER);
            if (ordinal > maxOrdinal) {
                maxOrdinal = ordinal;
            }
        }
    }
    return maxOrdinal;
}

prepareForWrite

prepareForWrite creates an array mapping each ordinal to its pointer, so that records can be looked up easily when writing the blob stream.

public void prepareForWrite() {
    int maxOrdinal = 0;
    AtomicLongArray pao = pointersAndOrdinals;

    for (int i = 0; i < pao.length(); i++) {
        long key = pao.get(i);
        if (key != EMPTY_BUCKET_VALUE) {
            int ordinal = (int) (key >>> BITS_PER_POINTER);
            if (ordinal > maxOrdinal) {
                maxOrdinal = ordinal;
            }
        }
    }

    long[] pbo = new long[maxOrdinal + 1];
    // 空歌白石: fill the new array with -1, meaning "no pointer yet"
    Arrays.fill(pbo, -1);

    for (int i = 0; i < pao.length(); i++) {
        long key = pao.get(i);
        if (key != EMPTY_BUCKET_VALUE) {
            int ordinal = (int) (key >>> BITS_PER_POINTER);
            pbo[ordinal] = key & POINTER_MASK;
        }
    }

    pointersByOrdinal = pbo;
}

compact

compact, as the name suggests, makes full use of the memory pool by reclaiming freed space so memory stays tightly packed. It reclaims space in the byte array that was used in the previous cycle but is not referenced in this cycle. This is achieved by shifting all live byte sequences down in the byte array, then updating the key array to reflect the new pointers and exclude removed entries. This method also returns unused ordinals to the pool.

/**
* Reclaim space in the byte array used in the previous cycle, but not referenced in this cycle.<p>
* <p>
* This is achieved by shifting all used byte sequences down in the byte array, then updating
* the key array to reflect the new pointers and exclude the removed entries.  This is also where ordinals
* which are unused are returned to the pool.<p>
*
* @param usedOrdinals a bit set representing the ordinals which are currently referenced by any image.
*/
public void compact(ThreadSafeBitSet usedOrdinals, int numShards, boolean focusHoleFillInFewestShards) {
    long[] populatedReverseKeys = new long[size];

    int counter = 0;
    AtomicLongArray pao = pointersAndOrdinals;

    for (int i = 0; i < pao.length(); i++) {
        long key = pao.get(i);
        if (key != EMPTY_BUCKET_VALUE) {
            populatedReverseKeys[counter++] = key << BITS_PER_ORDINAL | key >>> BITS_PER_POINTER;
        }
    }

    Arrays.sort(populatedReverseKeys);

    SegmentedByteArray arr = byteData.getUnderlyingArray();
    long currentCopyPointer = 0;

    for (int i = 0; i < populatedReverseKeys.length; i++) {
        int ordinal = (int) (populatedReverseKeys[i] & ORDINAL_MASK);

        if (usedOrdinals.get(ordinal)) {
            long pointer = populatedReverseKeys[i] >>> BITS_PER_ORDINAL;
            int length = VarInt.readVInt(arr, pointer);
            length += VarInt.sizeOfVInt(length);

            if (currentCopyPointer != pointer) {
                arr.copy(arr, pointer, currentCopyPointer, length);
            }

            populatedReverseKeys[i] = populatedReverseKeys[i] << BITS_PER_POINTER | currentCopyPointer;

            currentCopyPointer += length;
        } else {
            freeOrdinalTracker.returnOrdinalToPool(ordinal);
            populatedReverseKeys[i] = EMPTY_BUCKET_VALUE;
        }
    }

    byteData.setPosition(currentCopyPointer);

    if(focusHoleFillInFewestShards && numShards > 1)
        freeOrdinalTracker.sort(numShards);
    else
        freeOrdinalTracker.sort();

    // Reset the array then fill with compacted values
    // Volatile store not required, could use plain store
    // See VarHandles for JDK >= 9
    for (int i = 0; i < pao.length(); i++) {
        pao.lazySet(i, EMPTY_BUCKET_VALUE);
    }
    populateNewHashArray(pao, populatedReverseKeys);
    size = usedOrdinals.cardinality();

    pointersByOrdinal = null;
    unusedPreviousOrdinals = null;
}
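
The key-swapping trick at the heart of compact can be illustrated in isolation. A sketch with assumed bit widths (BITS_PER_POINTER = 35 here; Hollow's actual constants may differ): a key stores `(ordinal << BITS_PER_POINTER) | pointer`, and swapping the two halves lets a plain long sort order entries by their byte-data pointer.

```java
import java.util.Arrays;

// Illustration of the compact() key layout: pack ordinal + pointer into one
// long, swap the halves so Arrays.sort orders entries by pointer.
class KeySwap {
    static final int BITS_PER_POINTER = 35;                  // assumed for this sketch
    static final int BITS_PER_ORDINAL = 64 - BITS_PER_POINTER;
    static final long ORDINAL_MASK = (1L << BITS_PER_ORDINAL) - 1;

    public static long key(int ordinal, long pointer) {
        return ((long) ordinal << BITS_PER_POINTER) | pointer;
    }

    // pointer moves to the high bits, ordinal to the low bits
    public static long reverse(long key) {
        return (key << BITS_PER_ORDINAL) | (key >>> BITS_PER_POINTER);
    }

    public static long[] sortByPointer(long[] keys) {
        long[] reversed = new long[keys.length];
        for (int i = 0; i < keys.length; i++) reversed[i] = reverse(keys[i]);
        Arrays.sort(reversed);                               // now ordered by pointer
        return reversed;
    }

    public static int ordinalOf(long reversedKey)  { return (int) (reversedKey & ORDINAL_MASK); }
    public static long pointerOf(long reversedKey) { return reversedKey >>> BITS_PER_ORDINAL; }
}
```

With entries sorted by pointer, the live byte sequences can be slid down in address order, exactly as the compaction loop above does.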

resize

resize increases the capacity of the ordinal map. If the current capacity is already sufficient for the given size, no action is taken.

Note in particular:

  1. resize is not thread-safe.
  2. The requested size is rounded up to the nearest power of two.

/**
* Resize the ordinal map by increasing its capacity.
* <p>
* No action is take if the current capacity is sufficient for the given size.
* <p>
* WARNING: THIS OPERATION IS NOT THREAD-SAFE.
*
* @param size the size to increase to, rounded up to the nearest power of two.
*/
public void resize(int size) {
    size = bucketSize(size);

    if (pointersAndOrdinals.length() < size) {
        growKeyArray(size);
    }
}

growKeyArray

growKeyArray doubles the length of the backing array; all values in the current array must be rehashed and added to the new array.

private void growKeyArray() {
    int newSize = pointersAndOrdinals.length() << 1;
    if (newSize < 0) {
        throw new IllegalStateException("New size computed to grow the underlying array for the map is negative. " +
                "This is most likely due to the total number of keys added to map has exceeded the max capacity of the keys map can hold. "
                +
                "Current array size :" + pointersAndOrdinals.length() + " and size to grow :" + newSize);
    }
    growKeyArray(newSize);
}

private void growKeyArray(int newSize) {
    AtomicLongArray pao = pointersAndOrdinals;
    assert (newSize & (newSize - 1)) == 0; // power of 2
    assert pao.length() < newSize;

    // 空歌白石: initialize a fresh empty array at the new size
    AtomicLongArray newKeys = emptyKeyArray(newSize);

    long[] valuesToAdd = new long[size];

    int counter = 0;

    /// do not iterate over these values in the same order in which they appear in the hashed array.
    /// if we do so, we cause large clusters of collisions to appear (because we resolve collisions with linear probing).
    for (int i = 0; i < pao.length(); i++) {
        long key = pao.get(i);
        if (key != EMPTY_BUCKET_VALUE) {
            valuesToAdd[counter++] = key;
        }
    }

    // 空歌白石: sort the values in ascending order
    Arrays.sort(valuesToAdd);

    // 空歌白石: recompute the hashes and repopulate
    populateNewHashArray(newKeys, valuesToAdd, counter);

    /// 70% load factor
    sizeBeforeGrow = (int) (((float) newSize) * 0.7);

    // 空歌白石: publish the new array to the field
    pointersAndOrdinals = newKeys;
}
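
The rehash step can be sketched on its own: because the table size is always a power of two, `hash & (size - 1)` replaces a modulo, and collisions are resolved by stepping to the next bucket (linear probing). This is a stand-in, not Hollow's code; the EMPTY sentinel and hash function are assumptions.

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.function.LongToIntFunction;

// Minimal sketch of rehash-on-grow: every non-empty key is rehashed and
// placed into the doubled table with linear probing.
class LinearProbeRehash {
    public static final long EMPTY = -1L;            // assumed empty-bucket sentinel

    public static AtomicLongArray grow(AtomicLongArray old, LongToIntFunction hashFn) {
        int newSize = old.length() << 1;             // stays a power of two
        AtomicLongArray fresh = new AtomicLongArray(newSize);
        for (int i = 0; i < newSize; i++) fresh.set(i, EMPTY);

        int mask = newSize - 1;                      // hash & mask == hash % newSize
        for (int i = 0; i < old.length(); i++) {
            long value = old.get(i);
            if (value == EMPTY) continue;
            int bucket = hashFn.applyAsInt(value) & mask;
            while (fresh.get(bucket) != EMPTY)       // linear probing on collision
                bucket = (bucket + 1) & mask;
            fresh.set(bucket, value);
        }
        return fresh;
    }
}
```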

populateNewHashArray

populateNewHashArray rehashes all existing entries into the new key array.

private void populateNewHashArray(AtomicLongArray newKeys, long[] valuesToAdd) {
    populateNewHashArray(newKeys, valuesToAdd, valuesToAdd.length);
}

private void populateNewHashArray(AtomicLongArray newKeys, long[] valuesToAdd, int length) {
    assert length <= valuesToAdd.length;

    int modBitmask = newKeys.length() - 1;

    for (int i = 0; i < length; i++) {
        long value = valuesToAdd[i];
        if (value != EMPTY_BUCKET_VALUE) {
            int hash = rehashPreviouslyAddedData(value);
            int bucket = hash & modBitmask;
            
            while (newKeys.get(bucket) != EMPTY_BUCKET_VALUE) {
                bucket = (bucket + 1) & modBitmask;
            }
            // Volatile store not required, could use plain store
            // See VarHandles for JDK >= 9
            newKeys.lazySet(bucket, value);
        }
    }
}

/**
* 空歌白石: compute the hash code of the byte sequence the given key points to.
*/
private int rehashPreviouslyAddedData(long key) {
    long position = key & POINTER_MASK;

    int sizeOfData = VarInt.readVInt(byteData.getUnderlyingArray(), position);
    position += VarInt.sizeOfVInt(sizeOfData);

    return HashCodes.hashCode(byteData.getUnderlyingArray(), position, sizeOfData);
}

FreeOrdinalTracker

FreeOrdinalTracker manages the pool of unused ordinals. ByteArrayOrdinalMap uses this data structure to track unused ordinals and assign them to new records. The goal of this class is to ensure that the "holes" produced by removing unused ordinals during server processing are reused in subsequent cycles, instead of letting the ordinal space grow without bound, which saves memory and reduces GC pressure.

Constructor and fields

// 空歌白石: freed ordinals; the array starts with room for 64
private int freeOrdinals[];
// 空歌白石: the number of freed ordinals currently held in freeOrdinals
private int size;
// 空歌白石: the next empty, never-assigned ordinal
private int nextEmptyOrdinal;

public FreeOrdinalTracker() {
    this(0);
}

private FreeOrdinalTracker(int nextEmptyOrdinal) {
    this.freeOrdinals = new int[64];
    this.nextEmptyOrdinal = nextEmptyOrdinal;
    this.size = 0;
}

getFreeOrdinal & returnOrdinalToPool

This section looks at how an ordinal is taken from the pool and how one is returned to it. getFreeOrdinal returns either a previously freed ordinal in the sequence 0..n, or the next empty, never-assigned ordinal.

Freed ordinals are popped from the back of the array, so the most recently freed one is reused first.

public int getFreeOrdinal() {
    if(size == 0)
        return nextEmptyOrdinal++;

    return freeOrdinals[--size];
}

When an ordinal is returned to the pool and the free list is full, the array grows by a factor of 1.5.

public void returnOrdinalToPool(int ordinal) {
    if(size == freeOrdinals.length) {
        freeOrdinals = Arrays.copyOf(freeOrdinals, freeOrdinals.length * 3 / 2);
    }

    freeOrdinals[size] = ordinal;
    size++;
}
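
The pop/push behavior of the two methods can be condensed into a simplified tracker (a stripped-down stand-in, not Hollow's class):

```java
import java.util.Arrays;

// Simplified FreeOrdinalTracker: freed ordinals are pushed onto a stack;
// getFreeOrdinal pops the most recently freed one, and only allocates a
// brand-new ordinal when the pool is empty.
class OrdinalPool {
    private int[] free = new int[4];
    private int size = 0;
    private int nextEmpty = 0;

    public int getFreeOrdinal() {
        if (size == 0) return nextEmpty++;           // nothing freed: extend the range
        return free[--size];                         // reuse a hole, taken from the back
    }

    public void returnOrdinalToPool(int ordinal) {
        if (size == free.length)
            free = Arrays.copyOf(free, free.length * 3 / 2);  // grow by 1.5x
        free[size++] = ordinal;
    }
}
```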

sort

FreeOrdinalTracker keeps the pooled ordinals in a known order, so after a compaction cycle the pool must be re-sorted. Re-sorting lets ordinals be handed back out in an order that minimizes memory fragmentation.

The first sort method is simple: it sorts freeOrdinals in ascending order with Arrays.sort, then reverses the pool.

public void sort() {
    Arrays.sort(freeOrdinals, 0, size);
    reverseFreeOrdinalPool();
}

Before analyzing the sharded sort method, look at the inner class Shard. Shard describes one shard: the number of freed ordinals it currently owns (freeOrdinalCount) and its current write position (currentPos).

private static class Shard {
    private int freeOrdinalCount;
    private int currentPos;
}

This sort overload orders freeOrdinals by shard. What is numShards? It can be understood as the number of shards across which a type's records are stored.

public void sort(int numShards) {
    int shardNumberMask = numShards - 1;
    Shard shards[] = new Shard[numShards];
    for(int i=0;i<shards.length;i++)
        shards[i] = new Shard();

    for(int i=0;i<size;i++)
        shards[freeOrdinals[i] & shardNumberMask].freeOrdinalCount++;

    Shard orderedShards[] = Arrays.copyOf(shards, shards.length);
    Arrays.sort(orderedShards, (s1, s2) -> s2.freeOrdinalCount - s1.freeOrdinalCount);

    for(int i=1;i<numShards;i++)
        orderedShards[i].currentPos = orderedShards[i-1].currentPos + orderedShards[i-1].freeOrdinalCount;

    /// each shard will receive the ordinals in ascending order.
    Arrays.sort(freeOrdinals, 0, size);

    int newFreeOrdinals[] = new int[freeOrdinals.length];
    for(int i=0;i<size;i++) {
        Shard shard = shards[freeOrdinals[i] & shardNumberMask];
        newFreeOrdinals[shard.currentPos] = freeOrdinals[i];
        shard.currentPos++;
    }

    freeOrdinals = newFreeOrdinals;

    reverseFreeOrdinalPool();
}

Both sort methods rely on reverseFreeOrdinalPool, which flips the already ascending ordinal array into descending order. What does this accomplish? As getFreeOrdinal() shows, values are popped from the back of the array, so after the reversal the lowest ordinals are handed out first, which effectively avoids hollowing out the memory.

private void reverseFreeOrdinalPool() {
    int midpoint = size / 2;
    for(int i=0;i<midpoint;i++) {
        int temp = freeOrdinals[i];
        freeOrdinals[i] = freeOrdinals[size-i-1];
        freeOrdinals[size-i-1] = temp;
    }
}

ThreadSafeBitSet

ThreadSafeBitSetBitSet 的无锁、线程安全版本的实现。此实现使用 AtomicLongArray 代替long[]来保存byte,然后在赋值时执行适当的比较和交换操作。

Constructor and fields

public static final int DEFAULT_LOG2_SEGMENT_SIZE_IN_BITS = 14;

private final int numLongsPerSegment;
private final int log2SegmentSize;
private final int segmentMask;
private final AtomicReference<ThreadSafeBitSetSegments> segments;

public ThreadSafeBitSet() {
    this(DEFAULT_LOG2_SEGMENT_SIZE_IN_BITS); /// 16384 bits, 2048 bytes, 256 longs per segment
}

public ThreadSafeBitSet(int log2SegmentSizeInBits) {
    this(log2SegmentSizeInBits, 0);
}

public ThreadSafeBitSet(int log2SegmentSizeInBits, int numBitsToPreallocate) {
    if(log2SegmentSizeInBits < 6)
        throw new IllegalArgumentException("Cannot specify fewer than 64 bits in each segment!");

    this.log2SegmentSize = log2SegmentSizeInBits;
    this.numLongsPerSegment = (1 << (log2SegmentSizeInBits - 6));
    this.segmentMask = numLongsPerSegment - 1;
    
    long numBitsPerSegment = numLongsPerSegment * 64;
    int numSegmentsToPreallocate = numBitsToPreallocate == 0 ? 1 : (int)(((numBitsToPreallocate - 1) / numBitsPerSegment) + 1);

    segments = new AtomicReference<ThreadSafeBitSetSegments>();
    segments.set(new ThreadSafeBitSetSegments(numSegmentsToPreallocate, numLongsPerSegment));
}
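
The constructor's derived fields determine how a global bit position splits into (segment, long-within-segment, bit-within-long). A small check using the default segment size of 2^14 bits (256 longs per segment), reproducing the same shift-and-mask arithmetic:

```java
// How a bit position decomposes for the default segment size (2^14 bits).
class BitPosition {
    static final int LOG2_SEGMENT_SIZE = 14;
    static final int NUM_LONGS_PER_SEGMENT = 1 << (LOG2_SEGMENT_SIZE - 6); // 256
    static final int SEGMENT_MASK = NUM_LONGS_PER_SEGMENT - 1;

    // which segment: divide by bits-per-segment
    public static int segment(int position)   { return position >>> LOG2_SEGMENT_SIZE; }
    // which long within the segment
    public static int longIndex(int position) { return (position >>> 6) & SEGMENT_MASK; }
    // which bit within the long (position % 64)
    public static int bitIndex(int position)  { return position & 0x3F; }
}
```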

set

public void set(int position) {
    int segmentPosition = position >>> log2SegmentSize; /// which segment -- div by num bits per segment
    int longPosition = (position >>> 6) & segmentMask; /// which long in the segment -- remainder of div by num bits per segment
    int bitPosition = position & 0x3F; /// which bit in the long -- remainder of div by num bits in long (64)

    AtomicLongArray segment = getSegment(segmentPosition);

    long mask = 1L << bitPosition;

    // Thread safety: we need to loop until we win the race to set the long value.
    while(true) {
        // determine what the new long value will be after we set the appropriate bit.
        long currentLongValue = segment.get(longPosition);
        long newLongValue = currentLongValue | mask;

        // if no other thread has modified the value since we read it, we won the race and we are done.
        if(segment.compareAndSet(longPosition, currentLongValue, newLongValue))
            break;
    }
}
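
The CAS retry loop can be isolated into a minimal sketch over a plain AtomicLongArray (a single "segment"): read the current word, OR in the mask, and retry until compareAndSet wins the race.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// The lock-free set-bit loop from ThreadSafeBitSet.set, in isolation.
class CasSetBit {
    public static void setBit(AtomicLongArray words, int bit) {
        int longPos = bit >>> 6;
        long mask = 1L << (bit & 0x3F);
        while (true) {
            long current = words.get(longPos);
            long updated = current | mask;
            // if no other thread modified the word since we read it, we are done
            if (words.compareAndSet(longPos, current, updated))
                break;
        }
    }
}
```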

get

public boolean get(int position) {
    int segmentPosition = position >>> log2SegmentSize; /// which segment -- div by num bits per segment
    int longPosition = (position >>> 6) & segmentMask; /// which long in the segment -- remainder of div by num bits per segment
    int bitPosition = position & 0x3F; /// which bit in the long -- remainder of div by num bits in long (64)

    AtomicLongArray segment = getSegment(segmentPosition);

    long mask = 1L << bitPosition;

    return ((segment.get(longPosition) & mask) != 0);
}

getSegment

/**
* Get the segment at <code>segmentIndex</code>.  If this segment does not yet exist, create it.
*
* @param segmentIndex the segment index
* @return the segment
*/
private AtomicLongArray getSegment(int segmentIndex) {
    ThreadSafeBitSetSegments visibleSegments = segments.get();

    while(visibleSegments.numSegments() <= segmentIndex) {
        /// Thread safety:  newVisibleSegments contains all of the segments from the currently visible segments, plus extra.
        /// all of the segments in the currently visible segments are canonical and will not change.
        ThreadSafeBitSetSegments newVisibleSegments = new ThreadSafeBitSetSegments(visibleSegments, segmentIndex + 1, numLongsPerSegment);

        /// because we are using a compareAndSet, if this thread "wins the race" and successfully sets this variable, then the segments
        /// which are newly defined in newVisibleSegments become canonical.
        if(segments.compareAndSet(visibleSegments, newVisibleSegments)) {
            visibleSegments = newVisibleSegments;
        } else {
            /// If we "lose the race" and are growing the ThreadSafeBitSet segments larger,
            /// then we will gather the new canonical sets from the update which we missed on the next iteration of this loop.
            /// Newly defined segments in newVisibleSegments will be discarded, they do not get to become canonical.
            visibleSegments = segments.get();
        }
    }

    return visibleSegments.getSegment(segmentIndex);
}

maxSetBit

maxSetBit returns the index of the highest set bit; it indicates how much data needs to be written to a snapshot or delta.

public long maxSetBit() {
    ThreadSafeBitSetSegments segments = this.segments.get();

    int segmentIdx = segments.numSegments() - 1;

    for (; segmentIdx >= 0; segmentIdx--) {
        AtomicLongArray segment = segments.getSegment(segmentIdx);
        for (int longIdx = segment.length() - 1; longIdx >= 0; longIdx--) {
            long l = segment.get(longIdx);
            if (l != 0)
                return (segmentIdx << log2SegmentSize) + (longIdx * 64) + (63 - Long.numberOfLeadingZeros(l));
        }
    }

    return -1;
}

nextSetBit

public int nextSetBit(int fromIndex) {
    if (fromIndex < 0)
        throw new IndexOutOfBoundsException("fromIndex < 0: " + fromIndex);

    int segmentPosition = fromIndex >>> log2SegmentSize; /// which segment -- div by num bits per segment

    ThreadSafeBitSetSegments segments = this.segments.get();

    if(segmentPosition >= segments.numSegments())
        return -1;

    int longPosition = (fromIndex >>> 6) & segmentMask; /// which long in the segment -- remainder of div by num bits per segment
    int bitPosition = fromIndex & 0x3F; /// which bit in the long -- remainder of div by num bits in long (64)
    AtomicLongArray segment = segments.getSegment(segmentPosition);

    long word = segment.get(longPosition) & (0xffffffffffffffffL << bitPosition);

    while (true) {
        if (word != 0)
            return (segmentPosition << (log2SegmentSize)) + (longPosition << 6) + Long.numberOfTrailingZeros(word);
        if (++longPosition > segmentMask) {
            segmentPosition++;
            if(segmentPosition >= segments.numSegments())
                return -1;
            segment = segments.getSegment(segmentPosition);
            longPosition = 0;
        }

        word = segment.get(longPosition);
    }
}
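
The word-masking logic can be sketched over a plain long[] (single segment, no atomics), which makes the two tricks easy to verify: `0xffffffffffffffffL << n` zeroes out bits below fromIndex in the first word, and Long.numberOfTrailingZeros locates the next set bit.

```java
// nextSetBit over a flat long[] word array (no segments, no atomics).
class NextSetBit {
    public static int nextSetBit(long[] words, int fromIndex) {
        int longPos = fromIndex >>> 6;
        if (longPos >= words.length) return -1;
        // keep only bits fromIndex..63 of the first word
        long word = words[longPos] & (0xffffffffffffffffL << (fromIndex & 0x3F));
        while (true) {
            if (word != 0)
                return (longPos << 6) + Long.numberOfTrailingZeros(word);
            if (++longPos >= words.length) return -1;
            word = words[longPos];
        }
    }
}
```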

cardinality

cardinality counts the number of set bits in the whole bit set.

/**
* @return the number of bits which are set in this bit set.
*/
public int cardinality() {
    ThreadSafeBitSetSegments segments = this.segments.get();

    int numSetBits = 0;

    for(int i=0;i<segments.numSegments();i++) {
        AtomicLongArray segment = segments.getSegment(i);
        for(int j=0;j<segment.length();j++) {
            numSetBits += Long.bitCount(segment.get(j));
        }
    }

    return numSetBits;
}

clear

clear resets the bit at position to 0, relying on AtomicLongArray.compareAndSet (the CAS algorithm) for thread safety.

public void clear(int position) {
    int segmentPosition = position >>> log2SegmentSize; /// which segment -- div by num bits per segment
    int longPosition = (position >>> 6) & segmentMask; /// which long in the segment -- remainder of div by num bits per segment
    int bitPosition = position & 0x3F; /// which bit in the long -- remainder of div by num bits in long (64)

    AtomicLongArray segment = getSegment(segmentPosition);

    long mask = ~(1L << bitPosition);

    // Thread safety: we need to loop until we win the race to set the long value.
    while(true) {
        // determine what the new long value will be after we set the appropriate bit.
        long currentLongValue = segment.get(longPosition);
        long newLongValue = currentLongValue & mask;

        // if no other thread has modified the value since we read it, we won the race and we are done.
        if(segment.compareAndSet(longPosition, currentLongValue, newLongValue))
            break;
    }
}

clearAll

clearAll sets every bit to 0 without releasing any memory.

/**
* Clear all bits to 0.
*/
public void clearAll() {
    ThreadSafeBitSetSegments segments = this.segments.get();

    for(int i=0;i<segments.numSegments();i++) {
        AtomicLongArray segment = segments.getSegment(i);

        for(int j=0;j<segment.length();j++) {
            segment.set(j, 0L);
        }
    }
}

andNot

andNot returns a new bit set containing every bit that is set in this bit set and NOT set in the other bit set; in other words, a bitwise AND with the bitwise NOT of the other set.

/**
* Return a new bit set which contains all bits which are contained in this bit set, and which are NOT contained in the <code>other</code> bit set.<p>
*
* In other words, return a new bit set, which is a bitwise and with the bitwise not of the other bit set.
*
* @param other the other bit set
* @return the resulting bit set
*/
public ThreadSafeBitSet andNot(ThreadSafeBitSet other) {
    if(other.log2SegmentSize != log2SegmentSize)
        throw new IllegalArgumentException("Segment sizes must be the same");

    ThreadSafeBitSetSegments thisSegments = this.segments.get();
    ThreadSafeBitSetSegments otherSegments = other.segments.get();
    ThreadSafeBitSetSegments newSegments = new ThreadSafeBitSetSegments(thisSegments.numSegments(), numLongsPerSegment);

    for(int i=0;i<thisSegments.numSegments();i++) {
        AtomicLongArray thisArray = thisSegments.getSegment(i);
        AtomicLongArray otherArray = (i < otherSegments.numSegments()) ? otherSegments.getSegment(i) : null;
        AtomicLongArray newArray = newSegments.getSegment(i);

        for(int j=0;j<thisArray.length();j++) {
            long thisLong = thisArray.get(j);
            long otherLong = (otherArray == null) ? 0 : otherArray.get(j);

            newArray.set(j, thisLong & ~otherLong);
        }
    }

    ThreadSafeBitSet andNot = new ThreadSafeBitSet(log2SegmentSize);
    andNot.segments.set(newSegments);
    return andNot;
}
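
At the word level the operation is simply `thisLong & ~otherLong`, with missing words of the shorter set treated as zero. A sketch over plain long[]:

```java
// Word-level andNot, with the shorter array padded with zero words.
class AndNotWords {
    public static long[] andNot(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            long other = (i < b.length) ? b[i] : 0L;   // missing word acts as all-zero
            out[i] = a[i] & ~other;
        }
        return out;
    }
}
```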

orAll

orAll returns a new bit set containing every bit that is set in any of the given bit sets.

/**
* Return a new bit set which contains all bits which are contained in *any* of the specified bit sets.
*
* @param bitSets the other bit sets
* @return the resulting bit set
*/
public static ThreadSafeBitSet orAll(ThreadSafeBitSet... bitSets) {
    if(bitSets.length == 0)
        return new ThreadSafeBitSet();

    int log2SegmentSize = bitSets[0].log2SegmentSize;
    int numLongsPerSegment = bitSets[0].numLongsPerSegment;

    ThreadSafeBitSetSegments segments[] = new ThreadSafeBitSetSegments[bitSets.length];
    int maxNumSegments = 0;

    for(int i=0;i<bitSets.length;i++) {
        if(bitSets[i].log2SegmentSize != log2SegmentSize)
            throw new IllegalArgumentException("Segment sizes must be the same");

        segments[i] = bitSets[i].segments.get();
        if(segments[i].numSegments() > maxNumSegments)
            maxNumSegments = segments[i].numSegments();
    }

    ThreadSafeBitSetSegments newSegments = new ThreadSafeBitSetSegments(maxNumSegments, numLongsPerSegment);

    AtomicLongArray segment[] = new AtomicLongArray[segments.length];

    for(int i=0;i<maxNumSegments;i++) {
        for(int j=0;j<segments.length;j++) {
            segment[j] = i < segments[j].numSegments() ? segments[j].getSegment(i) : null;
        }

        AtomicLongArray newSegment = newSegments.getSegment(i);

        for(int j=0;j<numLongsPerSegment;j++) {
            long value = 0;
            for(int k=0;k<segments.length;k++) {
                if(segment[k] != null)
                    value |= segment[k].get(j);
            }
            newSegment.set(j, value);
        }
    }

    ThreadSafeBitSet or = new ThreadSafeBitSet(log2SegmentSize);
    or.segments.set(newSegments);
    return or;
}

ThreadSafeBitSetSegments

private static class ThreadSafeBitSetSegments {

    private final AtomicLongArray segments[];

    private ThreadSafeBitSetSegments(int numSegments, int segmentLength) {
        AtomicLongArray segments[] = new AtomicLongArray[numSegments];

        for(int i=0;i<numSegments;i++) {
            segments[i] = new AtomicLongArray(segmentLength);
        }

        /// Thread safety: Because this.segments is final, the preceding operations in this constructor are guaranteed to be visible to any
        /// other thread which accesses this.segments.
        this.segments = segments;
    }

    private ThreadSafeBitSetSegments(ThreadSafeBitSetSegments copyFrom, int numSegments, int segmentLength) {
        AtomicLongArray segments[] = new AtomicLongArray[numSegments];

        for(int i=0;i<numSegments;i++) {
            segments[i] = i < copyFrom.numSegments() ? copyFrom.getSegment(i) : new AtomicLongArray(segmentLength);
        }

        /// see above re: thread-safety of this assignment
        this.segments = segments;
    }

    public int numSegments() {
        return segments.length;
    }

    public AtomicLongArray getSegment(int index) {
        return segments[index];
    }

}

serializeBitsTo

serializeBitsTo writes the total number of longs (segments × longs per segment) followed by every word, in order, to the output stream.

public void serializeBitsTo(DataOutputStream os) throws IOException {
    ThreadSafeBitSetSegments segments = this.segments.get();

    os.writeInt(segments.numSegments() * numLongsPerSegment);

    for(int i=0;i<segments.numSegments();i++) {
        AtomicLongArray arr = segments.getSegment(i);

        for(int j=0;j<arr.length();j++) {
            os.writeLong(arr.get(j));
        }
    }
}

equals & hashCode & toString

@Override
public boolean equals(Object obj) {
    if(!(obj instanceof ThreadSafeBitSet))
        return false;

    ThreadSafeBitSet other = (ThreadSafeBitSet)obj;

    if(other.log2SegmentSize != log2SegmentSize)
        throw new IllegalArgumentException("Segment sizes must be the same");

    ThreadSafeBitSetSegments thisSegments = this.segments.get();
    ThreadSafeBitSetSegments otherSegments = other.segments.get();

    for(int i=0;i<thisSegments.numSegments();i++) {
        AtomicLongArray thisArray = thisSegments.getSegment(i);
        AtomicLongArray otherArray = (i < otherSegments.numSegments()) ? otherSegments.getSegment(i) : null;

        for(int j=0;j<thisArray.length();j++) {
            long thisLong = thisArray.get(j);
            long otherLong = (otherArray == null) ? 0 : otherArray.get(j);

            if(thisLong != otherLong)
                return false;
        }
    }

    for(int i=thisSegments.numSegments();i<otherSegments.numSegments();i++) {
        AtomicLongArray otherArray = otherSegments.getSegment(i);

        for(int j=0;j<otherArray.length();j++) {
            long l = otherArray.get(j);

            if(l != 0)
                return false;
        }
    }

    return true;
}

@Override
public int hashCode() {
    int result = log2SegmentSize;
    result = 31 * result + Arrays.hashCode(segments.get().segments);
    return result;
}

/**
* @return a new BitSet with same bits set
*/
public BitSet toBitSet() {
    BitSet resultSet = new BitSet();
    int ordinal = this.nextSetBit(0);
    while(ordinal!=-1) {
        resultSet.set(ordinal);
        ordinal = this.nextSetBit(ordinal + 1);
    }
    return resultSet;
}

@Override
public String toString() {
    return toBitSet().toString();
}

HashCodes

HashCodes is mainly used by Hollow to compute hash values for collection types such as Set and Map.

hashCode

// 空歌白石: the fixed MurmurHash seed 0xeab524b9, mixed into every hash computation
private static final int MURMURHASH_SEED = 0xeab524b9;

public static int hashCode(ByteDataArray data) {
    return hashCode(data.getUnderlyingArray(), 0, (int) data.length());
}

public static int hashCode(final String data) {
    if(data == null)
        return -1;
    
    int arrayLen = calculateByteArrayLength(data);
    
    if(arrayLen == data.length()) {
        return hashCode(new ByteData() {
            @Override
            public byte get(long position) {
                return (byte)(data.charAt((int)position) & 0x7F);
            }
        }, 0, data.length());
    } else {
        byte[] array = createByteArrayFromString(data, arrayLen);

        return hashCode(array);
    }
}

public static int hashCode(byte[] data) {
    return hashCode(new ArrayByteData(data), 0, data.length);
}

private static int calculateByteArrayLength(String data) {
    int length = data.length();
    for(int i=0;i<data.length();i++) {
        if(data.charAt(i) > 0x7F)
            length += VarInt.sizeOfVInt(data.charAt(i)) - 1;
    }
    return length;
}

private static byte[] createByteArrayFromString(String data, int arrayLen) {
    byte array[] = new byte[arrayLen];

    int pos = 0;
    for(int i=0;i<data.length();i++) {
        pos = VarInt.writeVInt(array, pos, data.charAt(i));
    }
    return array;
}
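
calculateByteArrayLength relies on the VarInt size of each character: ASCII characters (<= 0x7F) take one byte, while larger code points take sizeOfVInt(c) bytes. A common 7-bits-per-byte size computation, assumed here to mirror Hollow's VarInt.sizeOfVInt for non-negative values:

```java
// Hypothetical sketch of a VarInt size computation (7 payload bits per byte);
// assumed to match VarInt.sizeOfVInt for non-negative inputs.
class VIntSize {
    public static int sizeOfVInt(int value) {
        int size = 1;
        while ((value >>>= 7) != 0)   // each additional 7-bit group costs one byte
            size++;
        return size;
    }
}
```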

/**
* MurmurHash3.  Adapted from:<p>
*
* https://github.com/yonik/java_util/blob/master/src/util/hash/MurmurHash3.java<p>
*
* On 11/19/2013 the license for this file read:<p>
*
*  The MurmurHash3 algorithm was created by Austin Appleby.  This java port was authored by
*  Yonik Seeley and is placed into the public domain.  The author hereby disclaims copyright
*  to this source code.
*  <p>
*  This produces exactly the same hash values as the final C++
*  version of MurmurHash3 and is thus suitable for producing the same hash values across
*  platforms.
*  <p>
*  The 32 bit x86 version of this hash should be the fastest variant for relatively short keys like ids.
*  <p>
*  Note - The x86 and x64 versions do _not_ produce the same results, as the
*  algorithms are optimized for their respective platforms.
*  <p>
*  See http://github.com/yonik/java_util for future updates to this file.
*
* @param data the data to hash
* @param offset the offset
* @param len the length
* @return the hash code
*/
public static int hashCode(ByteData data, long offset, int len) {

    final int c1 = 0xcc9e2d51;
    final int c2 = 0x1b873593;

    int h1 = MURMURHASH_SEED;
    long roundedEnd = offset + (len & 0xfffffffffffffffcL); // round down to
                                                            // 4 byte block

    for (long i = offset; i < roundedEnd; i += 4) {
        // little endian load order
        int k1 = (data.get(i) & 0xff) | ((data.get(i + 1) & 0xff) << 8) | ((data.get(i + 2) & 0xff) << 16) | (data.get(i + 3) << 24);
        k1 *= c1;
        k1 = (k1 << 15) | (k1 >>> 17); // ROTL32(k1,15);
        k1 *= c2;

        h1 ^= k1;
        h1 = (h1 << 13) | (h1 >>> 19); // ROTL32(h1,13);
        h1 = h1 * 5 + 0xe6546b64;
    }

    // tail
    int k1 = 0;

    switch (len & 0x03) {
    case 3:
        k1 = (data.get(roundedEnd + 2) & 0xff) << 16;
        // fallthrough
    case 2:
        k1 |= (data.get(roundedEnd + 1) & 0xff) << 8;
        // fallthrough
    case 1:
        k1 |= (data.get(roundedEnd) & 0xff);
        k1 *= c1;
        k1 = (k1 << 15) | (k1 >>> 17); // ROTL32(k1,15);
        k1 *= c2;
        h1 ^= k1;
    }

    // finalization
    h1 ^= len;

    // fmix(h1);
    h1 ^= h1 >>> 16;
    h1 *= 0x85ebca6b;
    h1 ^= h1 >>> 13;
    h1 *= 0xc2b2ae35;
    h1 ^= h1 >>> 16;

    return h1;
}
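
The hand-written rotations in this MurmurHash3 port are equivalent to Integer.rotateLeft; a quick check for the two rotation amounts used above:

```java
// The manual ROTL32 expressions from the hash loop, shown to equal
// Integer.rotateLeft for the rotation amounts 15 and 13.
class Rotl {
    public static int rotl15(int k) { return (k << 15) | (k >>> 17); }
    public static int rotl13(int h) { return (h << 13) | (h >>> 19); }
}
```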

hash

hashLong and hashInt are standalone bit-mixing functions: each scrambles its input so that the output bits depend on all of the input bits (an avalanche effect).

public static int hashLong(long key) {
    key = (~key) + (key << 18);
    key ^= (key >>> 31);
    key *= 21;
    key ^= (key >>> 11);
    key += (key << 6);
    key ^= (key >>> 22);
    return (int) key;
}

public static int hashInt(int key) {
    key = ~key + (key << 15);
    key = key ^ (key >>> 12);
    key = key + (key << 2);
    key = key ^ (key >>> 4);
    key = key * 2057;
    key = key ^ (key >>> 16);
    return key;
}

hashTableSize

/**
* Determine size of hash table capable of storing the specified number of elements with a load
* factor applied.
*
* @param numElements number of elements to be stored in the table
* @return size of hash table, always a power of 2
* @throws IllegalArgumentException when numElements is negative or exceeds
*                                  {@link com.netflix.hollow.core.HollowConstants#HASH_TABLE_MAX_SIZE}
*/
public static int hashTableSize(int numElements) throws IllegalArgumentException {
    if (numElements < 0) {
        throw new IllegalArgumentException("cannot be negative; numElements="+numElements);
    } else if (numElements > HASH_TABLE_MAX_SIZE) {
        throw new IllegalArgumentException("exceeds maximum number of buckets; numElements="+numElements);
    }

    if (numElements == 0)
        return 1;
    if (numElements < 3)
        return numElements * 2;

    // Apply load factor to number of elements and determine next
    // largest power of 2 that fits in an int
    int sizeAfterLoadFactor = (int)((long)numElements * 10 / 7);
    int bits = 32 - Integer.numberOfLeadingZeros(sizeAfterLoadFactor - 1);
    return 1 << bits;
}
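
The sizing arithmetic reproduced standalone: apply the 70% load factor (`x * 10 / 7`), then round up to the next power of two via Integer.numberOfLeadingZeros.

```java
// Same math as hashTableSize above, minus the bounds checks.
class TableSize {
    public static int hashTableSize(int numElements) {
        if (numElements == 0) return 1;
        if (numElements < 3) return numElements * 2;
        // apply load factor, then round up to the next power of two
        int sizeAfterLoadFactor = (int) ((long) numElements * 10 / 7);
        int bits = 32 - Integer.numberOfLeadingZeros(sizeAfterLoadFactor - 1);
        return 1 << bits;
    }
}
```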

结束语

This article walked through Hollow's byte-level compression work. The VarInt and ZigZag encodings are foundational here and appear in many compression schemes; ByteDataArray, ThreadSafeBitSet, and HashCodes all take full advantage of them.

References