Handling of the hash code
Why not use the return value of hashCode() directly?
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
Because hash codes sometimes agree across many of their low-order bits, which causes hash collisions.
public class Main {
    public static void main(String[] args) {
        int hash1 = Float.valueOf(0.5f).hashCode();
        int hash2 = Float.valueOf(0.25f).hashCode();
        int hash3 = Float.valueOf(0.125f).hashCode();
        System.out.println(Integer.toBinaryString(hash1));
        System.out.println(Integer.toBinaryString(hash2));
        System.out.println(Integer.toBinaryString(hash3));
        System.out.println(hash1 & 15);
        System.out.println(hash2 & 15);
        System.out.println(hash3 & 15);
    }
}
Output:
111111000000000000000000000000
111110100000000000000000000000
111110000000000000000000000000
0
0
0
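To see what the high-bit spread buys, here is a small sketch using hypothetical hash codes chosen to differ only above bit 16 (the spread step mirrors the expression in HashMap#hash above). Without the XOR, all three land in slot 0 of a 16-slot table; with it, they spread into distinct slots:

```java
public class SpreadDemo {
    // Mirrors the spread step of JDK 8's HashMap.hash()
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        // Hypothetical hash codes that differ only above bit 16
        int[] hashes = {0x10000, 0x20000, 0x30000};
        for (int h : hashes) {
            // Slot in a 16-slot table, before and after the spread
            System.out.println((h & 15) + " -> " + (spread(h) & 15));
        }
    }
}
```

This prints `0 -> 1`, `0 -> 2`, `0 -> 3`: the XOR folds the differing high bits down into the range the mask can see.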
Why do identical low-order bits cause hash collisions? Below is an excerpt from the HashMap#putVal method:
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    ...
}
Notice that HashMap does not compute the slot index with a modulo operation but with (n - 1) & hash. On one hand, the bitwise AND performs better than modulo; on the other hand, this expression only yields the correct slot because HashMap's capacity is always a power of two.
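As a quick sanity check (a sketch, not HashMap source): for a power-of-two n and a non-negative hash, (n - 1) & hash produces the same index as hash % n.

```java
public class MaskDemo {
    public static void main(String[] args) {
        int n = 16; // a power-of-two capacity
        for (int hash : new int[]{5, 21, 37, 100}) {
            int byMod  = hash % n;        // modulo
            int byMask = (n - 1) & hash;  // HashMap-style masking
            System.out.println(hash + ": " + byMod + " == " + byMask);
        }
    }
}
```

A side benefit: for a negative hash, % would give a negative result, while the mask always yields a valid index in [0, n).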
Power of two
Why keep the capacity a power of two? Making the slot computation easy is only one reason. It also guarantees that when the map is resized and slots are reassigned, elements that occupied different slots before the resize cannot interfere with each other.
// Before resizing: size 16
int hash1 = 4;   // 100
int hash2 = 20;  // 10100
int size = 16;   // size - 1 => 1111
Integer.toBinaryString(hash1 & (size - 1)); // 0100
Integer.toBinaryString(hash2 & (size - 1)); // 0100
// After resizing: size 32
size = 32; // size - 1 => 11111
Integer.toBinaryString(hash1 & (size - 1)); // 00100
Integer.toBinaryString(hash2 & (size - 1)); // 10100
If an element sits at slot m when the size is n, then after resizing to 2n it can only end up at slot m or slot m + n.
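This can be checked directly. The sketch below uses arbitrary hash values; whether an element stays or moves by oldCap is decided by the single new bit of the mask, hash & oldCap, which is also the test JDK 8's resize relies on:

```java
public class ResizeDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        for (int hash : new int[]{4, 20, 36, 52}) {
            int oldIdx = hash & (oldCap - 1);
            int newIdx = hash & (newCap - 1);
            // newIdx is either oldIdx or oldIdx + oldCap, depending
            // only on the newly unmasked bit (hash & oldCap)
            boolean moved = (hash & oldCap) != 0;
            System.out.println(hash + ": " + oldIdx + " -> " + newIdx
                    + (moved ? " (moved by " + oldCap + ")" : " (stays)"));
        }
    }
}
```

All four hashes share old slot 4; after the resize, 4 and 36 stay at 4 while 20 and 52 move to 4 + 16 = 20, so the two groups never mix.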
References
HashMap#hash Javadoc comment:
Computes key.hashCode() and spreads (XORs) higher bits of hash to lower. Because the table uses power-of-two masking, sets of hashes that vary only in bits above the current mask will always collide. (Among known examples are sets of Float keys holding consecutive whole numbers in small tables.) So we apply a transform that spreads the impact of higher bits downward. There is a tradeoff between speed, utility, and quality of bit-spreading. Because many common sets of hashes are already reasonably distributed (so don't benefit from spreading), and because we use trees to handle large sets of collisions in bins, we just XOR some shifted bits in the cheapest possible way to reduce systematic lossage, as well as to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds.
HashMap#resize Javadoc comment:
Initializes or doubles table size. If null, allocates in accord with initial capacity target held in field threshold. Otherwise, because we are using power-of-two expansion, the elements from each bin must either stay at same index, or move with a power of two offset in the new table.
#Java #hash