BitMap's Internal Implementation in Java: BitSet

Note: this article assumes some familiarity with bitwise operations.

The BitMap Algorithm

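A Bit-map uses a single bit to mark the Value associated with an element, while the Key is the element itself. Because data is stored one bit per element, it saves a great deal of storage space.

For example, [1, 2, 5] is represented in binary by the Bit-map algorithm as 100110. Note how the bit positions, counted from right to left starting at 0, correspond to the numbers being represented. Position 0 corresponds to the number 0; since 0 is not in the set, the bit at position 0 is 0. Position 1 corresponds to the number 1, which is in the set, so the bit at position 1 is set to 1. Likewise, the bits at positions 2 and 5 are 1, and the bits at positions 3 and 4 are 0.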
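The mapping above can be sketched in a few lines of Java. The names `BitmapEncoding` and `encode` are made up for illustration, and the sketch assumes every value lies in [0, 63] so that a single long is enough.

```java
public class BitmapEncoding {
    // Hypothetical helper: packs values in [0, 63] into one long, one bit per value.
    static long encode(int[] values) {
        long bits = 0L;
        for (int v : values) {
            if (v < 0 || v > 63) {
                throw new IllegalArgumentException("value out of range: " + v);
            }
            bits |= 1L << v; // set the bit whose position equals the value
        }
        return bits;
    }

    public static void main(String[] args) {
        long bits = encode(new int[]{1, 2, 5});
        System.out.println(Long.toBinaryString(bits)); // prints 100110
    }
}
```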

How It Works

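The widest primitive integer type in Java is long, which has 64 bits. If the binary value from the example above is held in a single long, it can therefore only represent integers in the range [0, 63].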

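Internally, BitSet stores its data in a long[] so that it can hold more than 64 bits. For example, suppose the array a BitSet actually stores is [1, 2, 5, 9, 343, 234]. From the BitSet's point of view, each long element of the array is broken down into its bits, that is, [00..1, 00..10, 00..101, 00..1001, ...].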
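To see this view in practice, the standard library's `BitSet.valueOf(long[])` and `toLongArray()` convert between the word array and the set of bit indexes. The sketch below uses a shorter word array than the example above to keep the output readable; the class name `WordView` and the helper `fromWords` are hypothetical.

```java
import java.util.Arrays;
import java.util.BitSet;

public class WordView {
    // Builds a BitSet whose backing words are a copy of the given longs.
    static BitSet fromWords(long... words) {
        return BitSet.valueOf(words);
    }

    public static void main(String[] args) {
        // words[0] = 1 (binary 1), words[1] = 2 (binary 10), words[2] = 5 (binary 101)
        BitSet bits = fromWords(1L, 2L, 5L);
        // Word i covers bit indexes 64 * i through 64 * i + 63.
        System.out.println(bits);                                // {0, 65, 128, 130}
        System.out.println(Arrays.toString(bits.toLongArray())); // [1, 2, 5]
    }
}
```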

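Putting a number into a BitSet is similar to putting a key into a HashMap. First the bucket for the number is computed (that is, the index into the long[]), then the value in that bucket is read, and the bit corresponding to the number is updated to 1. Likewise, to check whether a number is stored in the BitSet, compute its bucket, read the value, and check whether the bit corresponding to the number is 1.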
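The bucket lookup described above can be sketched with a fixed-size `long[]`. This is a simplified illustration, not the JDK source: `TinyBitmap`, `add`, and `contains` are hypothetical names, the array never grows as BitSet's does, and callers are assumed to pass values in [0, capacity).

```java
public class TinyBitmap {
    private final long[] words;

    // capacity is the number of distinct values; valid values are [0, capacity).
    TinyBitmap(int capacity) {
        words = new long[(capacity + 63) / 64]; // round up to whole 64-bit buckets
    }

    void add(int value) {
        int bucket = value / 64;             // which long holds this value
        words[bucket] |= 1L << (value % 64); // turn on the bit inside that long
    }

    boolean contains(int value) {
        int bucket = value / 64;
        return (words[bucket] & (1L << (value % 64))) != 0;
    }

    public static void main(String[] args) {
        TinyBitmap bitmap = new TinyBitmap(512);
        for (int v : new int[]{1, 2, 5, 343}) {
            bitmap.add(v);
        }
        System.out.println(bitmap.contains(343)); // true: bucket 5, bit 23
        System.out.println(bitmap.contains(344)); // false
    }
}
```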

The set() Method

public void set(int bitIndex) {
    if (bitIndex < 0)
        throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);

    // Find the bucket (word) that holds bitIndex
    int wordIndex = wordIndex(bitIndex);
    expandTo(wordIndex);

    // The key line, worth understanding fully (examples follow below)
    // Set the bit corresponding to bitIndex to 1
    words[wordIndex] |= (1L << bitIndex); // Restores invariants

    checkInvariants();
}

/**
 * Given a bit index, return word index containing it.
 */
private static int wordIndex(int bitIndex) {
    // ADDRESS_BITS_PER_WORD = 6
    // bitIndex >> 6 = bitIndex / 2^6 = bitIndex / 64
    return bitIndex >> ADDRESS_BITS_PER_WORD;
}

/**
 * Ensures that the BitSet can accommodate a given wordIndex,
 * temporarily violating the invariants.  The caller must
 * restore the invariants before returning to the user,
 * possibly using recalculateWordsInUse().
 * @param wordIndex the index to be accommodated.
 */
private void expandTo(int wordIndex) {
    int wordsRequired = wordIndex+1;
    if (wordsInUse < wordsRequired) {
        // Make sure the BitSet can hold enough bits; grow the array if it cannot
        // New capacity = Math.max(2 * words.length, wordsRequired)
        ensureCapacity(wordsRequired);
        wordsInUse = wordsRequired;
    }
}
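The growth rule noted in the comment above can be tried in isolation. The helper below is a hypothetical stand-in that only mirrors the formula `Math.max(2 * words.length, wordsRequired)`; it is not the JDK's `ensureCapacity` itself, and `GrowthRule` and `newCapacity` are names invented for this sketch.

```java
public class GrowthRule {
    // Hypothetical helper: length of the long[] after making room for wordsRequired words.
    static int newCapacity(int currentLength, int wordsRequired) {
        if (currentLength >= wordsRequired) {
            return currentLength; // already large enough, no reallocation
        }
        return Math.max(2 * currentLength, wordsRequired);
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(1, 2));   // 2: doubling is just enough
        System.out.println(newCapacity(2, 16));  // 16: jump straight to the requirement
        System.out.println(newCapacity(16, 17)); // 32: doubling wins
    }
}
```

Doubling keeps repeated small extensions cheap, while taking the maximum lets one large index be accommodated in a single reallocation.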

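words[wordIndex] |= (1L << bitIndex) is the heart of BitSet: it sets the bit corresponding to bitIndex to 1. Because Java uses only the low six bits of the shift distance when the left operand is a long, the shift lands on position bitIndex % 64 within the word. The examples below make this clear.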

System.out.println(1 << 0);
System.out.println(1 << 1);
System.out.println(1 << 2);
System.out.println(1 << 3);
System.out.println(1 << 4);

Out:
1
2
4
8
16

That is not very easy to read, so here are the same values in binary form:

System.out.println(Integer.toBinaryString(1 << 0));
System.out.println(Integer.toBinaryString(1 << 1));
System.out.println(Integer.toBinaryString(1 << 2));
System.out.println(Integer.toBinaryString(1 << 3));
System.out.println(Integer.toBinaryString(1 << 4));

Out:
1
10
100
1000
10000

Now for the complete expression:

// Let bitSet stand in for words[wordIndex]
int bitSet = 0;
for (int element : new int[]{0, 1, 2, 3, 4}) {
    bitSet |= 1 << element;
    System.out.format("words[wordIndex] = %s%n", Integer.toBinaryString(bitSet));
}

Out:
words[wordIndex] = 1
words[wordIndex] = 11
words[wordIndex] = 111
words[wordIndex] = 1111
words[wordIndex] = 11111

Quite neat: after applying words[wordIndex] |= (1L << bitIndex) to each of [0, 1, 2, 3, 4], we end up with exactly the right binary representation, 11111.
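One detail the int-based examples above do not exercise is that bitIndex in set() can be larger than 63. For a long left operand, Java uses only the low six bits of the shift distance, so for non-negative indexes 1L << bitIndex behaves like 1L << (bitIndex % 64), which is exactly the position inside the bucket. The following is a minimal sketch of that behavior; the names `ShiftDistance` and `mask` are hypothetical.

```java
public class ShiftDistance {
    // Mask selecting bitIndex's position inside its 64-bit word.
    static long mask(int bitIndex) {
        return 1L << bitIndex; // for a long, the shift distance is taken modulo 64
    }

    public static void main(String[] args) {
        System.out.println(mask(70) == mask(6));           // true: 70 % 64 == 6
        System.out.println(70 >> 6);                       // 1: bit 70 lives in words[1]
        System.out.println(Long.toBinaryString(mask(70))); // 1000000
        // With an int left operand only the low five bits are used, so 1 << 32 wraps to 1.
        System.out.println(1 << 32);                       // 1
    }
}
```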

The get() Method

public boolean get(int bitIndex) {
    if (bitIndex < 0)
        throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);

    checkInvariants();

    int wordIndex = wordIndex(bitIndex);
    return (wordIndex < wordsInUse)
        && ((words[wordIndex] & (1L << bitIndex)) != 0);
}

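The important part of this method is (words[wordIndex] & (1L << bitIndex)). 1L << bitIndex produces a value in which, following the BitMap scheme, only the bit for bitIndex is 1 and every other bit is 0. That value is then ANDed with the value of the bucket that bitIndex maps to. Because 1L << bitIndex contains a single 1, the result is nonzero if words[wordIndex] also has a 1 at the same position, which means bitIndex is in the BitSet. Conversely, if words[wordIndex] has a 0 at that position, the result is 0, which means bitIndex is not in the BitSet.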
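A small check makes the point about the result concrete: the AND yields either the mask value itself or 0, not necessarily 1, which is why get() compares against 0. `MaskTest` and `isSet` are hypothetical names used only for this sketch.

```java
public class MaskTest {
    // Same test as in get(), applied to a single word.
    static boolean isSet(long word, int bitIndex) {
        return (word & (1L << bitIndex)) != 0;
    }

    public static void main(String[] args) {
        long word = 0b100110L; // bits 1, 2 and 5 are set
        System.out.println(word & (1L << 5)); // 32, nonzero: 5 is present
        System.out.println(word & (1L << 3)); // 0: 3 is absent
        System.out.println(isSet(word, 5));   // true
        System.out.println(isSet(word, 3));   // false
    }
}
```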
