What is a HashMap?
A HashMap is a data structure that maps keys to values. Keys and values can be of any reference type; null values are allowed, and so is a single null key. Every key must be unique.
Data structure of HashMap (JDK 1.8 and above)
Internally, a HashMap is built from an array of buckets, each holding a linked list. When a list grows longer than 8 nodes and the array length is at least 64, the list is converted into a red-black tree; when a tree shrinks below 6 nodes, it is converted back into a linked list. Collisions are resolved by separate chaining.
Searching a tree bucket is O(log n), versus O(n) for a long list.
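The array-of-chains layout can be sketched with a minimal version of the JDK's internal entry class. The field names follow the OpenJDK source, but this is an illustration, not the full class (the real Node also implements Map.Entry, and TreeNode, the red-black tree form, extends it):

```java
public class NodeSketch {
    // Minimal sketch of HashMap's internal bucket entry.
    static class Node<K, V> {
        final int hash;    // cached hash of the key
        final K key;
        V value;
        Node<K, V> next;   // next entry in the same bucket (separate chaining)

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    public static void main(String[] args) {
        // Two entries colliding in one bucket form a chain: b -> a
        Node<String, Integer> a = new Node<>("a".hashCode(), "a", 1, null);
        Node<String, Integer> b = new Node<>("b".hashCode(), "b", 2, a);
        System.out.println(b.next.key); // prints "a"
    }
}
```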
Why is the bucket array capacity of a HashMap always a power of 2?
**(n - 1) & hash determines the index of the bucket in which a key is stored.**
1. Ideally, every key hashes to its own slot in the array, i.e. without collisions.
2. The hash value is a 32-bit integer, while in most cases the table is short (so the higher bits of the index mask would be 0s anyway); in these situations, the ideal approach is to compute the position from the lower bits alone.
3. The binary form of any power of 2 minus 1 is all 1s, e.g. 16 - 1 = 15 is 1111, and 32 - 1 = 31 is 11111. Therefore, we AND the hash value with (length - 1), which extracts exactly the lower bits of the hash, and those bits are the position in the array.
00100100 10100101 11000100 00100101 // Hash value
& 00000000 00000000 00000000 00001111 // 16 - 1 = 15
----------------------------------
00000000 00000000 00000000 00000101
// the higher bits all become 0s; only the last 4 bits remain
// because the table of a HashMap is usually short, the low bits are enough
Note: every resize doubles the capacity, and the initial capacity is 16; this is also related to how collisions are handled.
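The masking step above can be reproduced directly in code, using the same numbers as the worked example:

```java
public class BucketIndexDemo {
    public static void main(String[] args) {
        int n = 16;                 // table length, always a power of two
        int hash = 0b00100100_10100101_11000100_00100101; // hash value from the example
        int index = (n - 1) & hash; // keeps only the low 4 bits
        System.out.println(index);  // prints 5 (binary 0101)
    }
}
```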
How HashMap designs its hash method
1. Get the hashCode of the key, a 32-bit integer.
2. XOR the higher 16 bits with the lower 16 bits.
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
The rationale of the hash method
From the explanation above, we know that a hash function should avoid collisions to get the best performance.
But since the bucket index uses only the lower bits of the hash, two keys whose lower bits match will still collide even if their higher bits differ, i.e. even though the keys we pass in are different.
// Hash collision:
H1: 00000000 00000000 00000000 00000101 & 1111 = 0101
H2: 00000000 11111111 00000000 00000101 & 1111 = 0101
To reduce this kind of collision, HashMap perturbs the hash value: it shifts the hash right by 16 bits (unsigned shift, >>>) and XORs (^) the result with the original value, so that the higher bits also influence the bucket index.
00000000 00000000 00000000 00000101 // H1
00000000 00000000 00000000 00000000 // H1 >>> 16
00000000 00000000 00000000 00000101 // hash = H1 ^ (H1 >>> 16) = 5
00000000 11111111 00000000 00000101 // H2
00000000 00000000 00000000 11111111 // H2 >>> 16
00000000 00000000 00000000 11111010 // hash = H2 ^ (H2 >>> 16) = 250
finally:
// without hash collision
index1 = (n - 1) & H1 = (16 - 1) & 5 = 5
index2 = (n - 1) & H2 = (16 - 1) & 250 = 10
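The two worked examples can be checked in code; spread below mirrors the JDK hash method shown earlier:

```java
public class HashSpreadDemo {
    // Same perturbation as java.util.HashMap.hash(): XOR the high 16 bits
    // into the low 16 bits so they take part in the bucket index.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int h1 = 0b00000000_00000000_00000000_00000101; // H1 from the text
        int h2 = 0b00000000_11111111_00000000_00000101; // H2: same low bits, different high bits
        int n = 16;
        System.out.println((n - 1) & spread(h1)); // prints 5
        System.out.println((n - 1) & spread(h2)); // prints 10 -- the collision is gone
    }
}
```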
Put method
- If the array is empty, initialize it.
- If it is not empty, compute the hash of the key and derive the array index with (n - 1) & hash.
- Check whether table[index] holds data; if not, construct a Node and store it at table[index].
- If table[index] already holds data, a hash collision occurred (two keys hash to the same bucket). Check whether the existing key equals the new key; if so, replace the old value with the new one (when onlyIfAbsent is false).
- If the keys are not equal, check whether the current node is a tree node; if it is (which means the bucket is already a red-black tree), create a tree node and insert it into the red-black tree.
- If it is not a tree node, create an ordinary Node and append it to the linked list; then check whether the list length is greater than 8 and the array length greater than 64, and if so convert the list into a red-black tree.
- After insertion, check whether the current number of entries exceeds the threshold; if so, resize to twice the original array length.
- Why the default load factor is 0.75, and why changing it is not recommended: if loadFactor is too small, the table must resize constantly, and resizing is a time-consuming process; if it is too large, the table fills up without resizing, collisions pile up, the collision chains grow longer and longer, and efficiency drops. 0.75 is a compromise, a reasonably ideal value.
- When resizing is triggered, and what table.length and the threshold become afterwards: resizing happens when size >= threshold; afterwards, table.length = old table.length * 2 and threshold = old threshold * 2.
- When the table is initialized: normally on the first put, which calls the resize method to initialize the table (lazy initialization; the lazy-loading idea is applied in many frameworks!).
What the initial table.length and threshold are, and how many elements actually fit: by default, table.length = 16 (the case with a specified initialCapacity is analyzed in question 5); by default, threshold = 12 (likewise deferred to question 5); by default, 12 elements can be stored, and inserting the 13th triggers a resize.
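The replace-on-equal-key step described above is observable through the public API:

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>(); // default capacity 16, threshold 12
        map.put("k", 1);
        Integer old = map.put("k", 2);    // same key: value replaced, old value returned
        System.out.println(old);          // prints 1
        System.out.println(map.size());   // prints 1 -- still a single entry
        System.out.println(map.get("k")); // prints 2
    }
}
```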
The get method
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // the first node in the bucket matches
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // the bucket holds more than one node
        if ((e = first.next) != null) {
            // get from the red-black tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // get from the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
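Calling get through the public API shows the behavior above, including the null key, which hash(null) == 0 always routes to bucket 0:

```java
import java.util.HashMap;

public class GetDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put(null, 0);                       // a single null key is allowed
        System.out.println(map.get("a"));       // prints 1
        System.out.println(map.get(null));      // prints 0
        System.out.println(map.get("missing")); // prints null
    }
}
```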
How does HashMap resize?
1) Create a new, empty entry array whose length is twice that of the old array.
2) Traverse the old array, rehash every entry, and place it into the new array; because the size of the array changed, the bucket index must be recomputed.
Resizing involves a full redistribution of hashes and a traversal of every element in the hash table, which is very time-consuming. When writing programs, try to avoid triggering resize.
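Since resizing is expensive, one way to follow the advice above is to pre-size the map when the number of entries is known. The expected / loadFactor + 1 formula here is a common idiom, not something stated in the text:

```java
import java.util.HashMap;

public class PresizeDemo {
    public static void main(String[] args) {
        int expected = 100;
        // Size the map so that "expected" entries stay below the threshold
        // (capacity * 0.75), avoiding any intermediate resizes.
        int initialCapacity = (int) (expected / 0.75f) + 1;
        HashMap<String, Integer> map = new HashMap<>(initialCapacity);
        for (int i = 0; i < expected; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size()); // prints 100
    }
}
```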
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // past the maximum capacity, stop growing and let collisions happen
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // not past the maximum: double the capacity
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {
        // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // compute the new resize threshold
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // move every bucket into the new table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else {
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // stays at the original index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // moves to original index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // put the low list at the original index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // put the high list at original index + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
Resize in Java 8
The Java 8 rehash takes advantage of the following property:
HashMap grows by powers of two (the length doubles), so after a resize each element is either at its original index or at the original index plus the old capacity. What does that mean? An example:
Assume the old array length (capacity) is 16, and the new capacity after resizing is 32:
old capacity : 00010000
new capacity : 00100000
As described in the article "How does HashMap compute an Entry's bucket index?",
the index is computed as hash & (length - 1). Take two keys with original hash values key1 = 0001 1001 and key2 = 0000 1001.
Before resizing, hash & (length - 1):
key1 : 0001 1001 & 0000 1111 -> 0000 1001
key2 : 0000 1001 & 0000 1111 -> 0000 1001
After resizing, hash & (length - 1):
key1 : 0001 1001 & 0001 1111 -> 0001 1001
key2 : 0000 1001 & 0001 1111 -> 0000 1001
Therefore, when a HashMap resizes, there is no need to recompute the hash as in the JDK 1.7 implementation.
**We only need to look at the newly significant bit of the original hash value: if it is 0, the index is unchanged; if it is 1, the index becomes "original index + oldCap".
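The (e.hash & oldCap) test in the resize code implements exactly this rule; with the example hashes from the text:

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int key1 = 0b0001_1001; // hash 25: the newly significant bit is 1
        int key2 = 0b0000_1001; // hash 9:  the newly significant bit is 0

        // (hash & oldCap) picks out exactly the bit that becomes
        // significant in the larger mask (newCap - 1).
        System.out.println((key1 & oldCap) == 0); // prints false -> moves to index + oldCap
        System.out.println((key2 & oldCap) == 0); // prints true  -> index unchanged

        System.out.println(key1 & (newCap - 1));  // prints 25, i.e. 9 + 16
        System.out.println(key2 & (newCap - 1));  // prints 9
    }
}
```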
Author: 汪和呆喵
Link: www.jianshu.com/p/3797c6f83…
Source: 简书 (Jianshu)
Copyright belongs to the author. For commercial reuse, please contact the author for authorization; for non-commercial reuse, please credit the source.