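This note introduces MurmurHash2: its characteristics, strengths, weaknesses, typical uses, and a basic implementation. MurmurHash2 is a fast, non-cryptographic hash function.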
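Characteristics
- Non-cryptographic:
  - Not suitable for cryptographic use.
  - Not designed for strong collision resistance or for security against adversarial input.
- High performance:
  - Built from multiplications, shifts, and XORs in a small mixing function, so it is fast to compute.
- Uniform distribution:
  - Every input bit influences the result, so hash values are spread evenly.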
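Algorithm idea
- Block processing:
  - The input is consumed in fixed-size blocks: 4 bytes (32 bits) in the 32-bit version, 8 bytes (64 bits) in the 64-bit version.
- Mixing:
  - Each block is scrambled (multiply by a constant, XOR with a right-shifted copy of itself, multiply again) and then merged into the running hash.
- Tail and final mix:
  - The leftover bytes (fewer than 4 or 8) are mixed into the hash, and the hash is then mixed a few more times so that every input bit affects the result.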
Pseudocode:
Taking the 32-bit version of MurmurHash2 as an example, the outline is:
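uint murmur2(const void *key, int len) {
    uint seed = gen();      // pseudocode: obtain a seed
    const uint m;           // mixing constant (multiplier)
    const int r;            // mixing constant (shift amount)
    uint h = len ^ seed;    // initial state depends on length and seed
    const uchar *data = (const uchar *)key;
    while (len >= 4) {
        uint k = *(uint *)data;
        // mix k into h using m and r
        /*
        ....
        */
        data += 4;
        len -= 4;
    }
    // mix in the remaining 0 to 3 bytes
    /*
    switch (len)
    */
    // final mix of h using m
    /*
    ...
    */
    return h;
}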
Implementation in Redis:
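/* MurmurHash2, by Austin Appleby
 * Note - This code makes a few assumptions about how your machine behaves -
 * 1. We can read a 4-byte value from any address without crashing
 * 2. sizeof(int) == 4
 *
 * And it has a few limitations -
 *
 * 1. It will not work incrementally.
 *    liyazzi: the whole key must be available up front, so this is not
 *    suited to streaming data. The length is mixed in at the very start
 *    (h = seed ^ len), each mixing step depends on everything mixed in
 *    before it, and the final mix depends on the entire input.
 * 2. It will not produce the same results on little-endian and big-endian
 *    machines.
 *    liyazzi: this hurts cross-platform portability.
 */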
unsigned int dictGenHashFunction(const void *key, int len) {
    /* 'm' and 'r' are mixing constants generated offline.
     They're not really 'magic', they just happen to work well.  */
    uint32_t seed = dict_hash_function_seed;
    const uint32_t m = 0x5bd1e995;
    const int r = 24;

    /* Initialize the hash to a 'random' value */
    uint32_t h = seed ^ len;

    /* Mix 4 bytes at a time into the hash */
    const unsigned char *data = (const unsigned char *)key;

    while(len >= 4) {
        uint32_t k = *(uint32_t*)data;

        k *= m;
        k ^= k >> r;
        k *= m;

        h *= m;
        h ^= k;

        data += 4;
        len -= 4;
    }

    /* Handle the last few bytes of the input array  */
    switch(len) {
    case 3: h ^= data[2] << 16;
    case 2: h ^= data[1] << 8;
    case 1: h ^= data[0]; h *= m;
    };

    /* Do a few final mixes of the hash to ensure the last few
     * bytes are well-incorporated. */
    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;

    return (unsigned int)h;
}
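The two machine assumptions named in the comment (unaligned 4-byte reads and native byte order) can be avoided by assembling each block from individual bytes. The following is a minimal sketch, not Redis code: `murmur2_portable`, `load_le32`, and the explicit `seed` parameter are names introduced here for illustration. It fixes the block byte order to little-endian, so it is expected to match the listing above on little-endian machines and to return those same values on big-endian ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read 4 bytes as a little-endian 32-bit value, one byte at a time.
 * Works at any alignment and on any host byte order. */
static uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical portable variant of the 32-bit MurmurHash2 steps. */
static uint32_t murmur2_portable(const void *key, size_t len, uint32_t seed) {
    const uint32_t m = 0x5bd1e995u;   /* same constants as the listing */
    const int r = 24;
    const unsigned char *data = (const unsigned char *)key;
    uint32_t h = seed ^ (uint32_t)len;

    assert(key != NULL || len == 0);

    /* Mix 4 bytes at a time into the hash. */
    while (len >= 4) {
        uint32_t k = load_le32(data);
        k *= m;
        k ^= k >> r;
        k *= m;
        h *= m;
        h ^= k;
        data += 4;
        len -= 4;
    }

    /* Mix in the 0 to 3 trailing bytes. */
    switch (len) {
    case 3: h ^= (uint32_t)data[2] << 16; /* fall through */
    case 2: h ^= (uint32_t)data[1] << 8;  /* fall through */
    case 1: h ^= data[0];
            h *= m;
            break;
    default: break;
    }

    /* Final mixes so the last bytes are well incorporated. */
    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;
    return h;
}
```

With a zero seed and an empty key, `murmur2_portable("", 0, 0)` returns 0, because the state starts at 0 and every step maps 0 to 0.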
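Why most mixing steps in hash functions use XOR (^):
- Bit mixing: XOR combines its two operands bit by bit, and it is its own inverse (a ^ b ^ b == a), so mixing a value in never discards information.
- No bias: if either input bit is evenly distributed, so is the output bit. AND and OR skew the result toward 0 or 1, and addition or multiplication on their own only carry information from low bits to high bits; XOR with a right-shifted copy (h ^= h >> r) feeds the high bits back down.
- Cheap: XOR is a single CPU instruction, and bitwise operations are generally fast.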
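To make the invertibility point concrete, here is a small sketch showing that the shift-and-XOR step used in the hash can be undone exactly, so it reorders the 32-bit space without losing information. The helper names `xorshift_right` and `undo_xorshift_right` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* The mixing step used in the hash: x ^= x >> r. */
static uint32_t xorshift_right(uint32_t x, int r) {
    assert(r > 0 && r < 32);
    return x ^ (x >> r);
}

/* Inverse of xorshift_right. Since y = x ^ (x >> r), we get
 * x = y ^ (y >> r) ^ (y >> 2r) ^ ... until the shift reaches 32 bits. */
static uint32_t undo_xorshift_right(uint32_t y, int r) {
    assert(r > 0 && r < 32);
    uint32_t x = y;
    for (int s = r; s < 32; s += r) {
        x ^= y >> s;
    }
    return x;
}
```

For any 32-bit `x` and any shift from 1 to 31, `undo_xorshift_right(xorshift_right(x, r), r)` gives back `x`.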
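Use in Redis
In older versions of Redis (before dict.c switched to SipHash), MurmurHash2 was the default hash function for dictionaries. Typical uses:
- Computing the bucket index of a key in a hash table.
- Placing virtual nodes on a consistent-hashing ring, as done by some client-side sharding schemes (Redis Cluster itself maps keys to slots with CRC16).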
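As an illustration of the first use, a hash table whose size is a power of two can turn a hash value into a bucket index with a single AND. This is a sketch under that assumption; `bucket_index` is a hypothetical helper, not a Redis function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map a 32-bit hash to a bucket in a table whose size is a power of two.
 * hash & (size - 1) keeps the low bits, which equals hash % size here. */
static size_t bucket_index(uint32_t hash, size_t table_size) {
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    return (size_t)hash & (table_size - 1);
}
```

Because only the low bits select the bucket, the final mixing steps of the hash matter: they make those low bits depend on the whole key.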
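Reasons for choosing MurmurHash2:
- Fast: it fits Redis's performance requirements.
- Uniform: it keeps hash collisions low.
- Simple: the implementation is short and easy to embed.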
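Drawbacks
On some inputs MurmurHash2 can show a somewhat higher collision rate than expected, and it offers no protection against deliberately constructed collisions.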
Hash functions can be vulnerable to collision attacks, where a user can choose input data in such a way so as to intentionally cause hash collisions. Jean-Philippe Aumasson and Daniel J. Bernstein were able to show that even implementations of MurmurHash using a randomized seed are vulnerable to so-called HashDoS attacks.[51] With the use of differential cryptanalysis, they were able to generate inputs that would lead to a hash collision. The authors of the attack recommend using their own SipHash instead.
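That a random seed does not help can be shown with a short sketch. It relies on one differential that follows from the mixing steps of the 32-bit algorithm: flipping the top bit of the hash state passes through the multiplication by the odd constant m unchanged, so a top-bit flip introduced by one block can be cancelled by the next block. This is an illustration in the spirit of the attack described above, not a reproduction of the authors' code; `murmur2`, `block_mix`, `block_unmix`, `mul_inverse`, and `make_twin` are names invented here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MURMUR_M 0x5bd1e995u
#define MURMUR_R 24

/* Per-block scramble applied to k before it is XORed into h. */
static uint32_t block_mix(uint32_t k) {
    k *= MURMUR_M;
    k ^= k >> MURMUR_R;
    k *= MURMUR_M;
    return k;
}

/* Multiplicative inverse of an odd a modulo 2^32 (Newton iteration). */
static uint32_t mul_inverse(uint32_t a) {
    assert(a & 1u);
    uint32_t x = a;              /* a * a == 1 (mod 8): 3 correct bits */
    for (int i = 0; i < 5; i++) {
        x *= 2u - a * x;         /* each round doubles the correct bits */
    }
    return x;
}

/* Inverse of block_mix. y ^= y >> 24 undoes itself because 2 * 24 >= 32. */
static uint32_t block_unmix(uint32_t y) {
    const uint32_t minv = mul_inverse(MURMUR_M);
    y *= minv;
    y ^= y >> MURMUR_R;
    y *= minv;
    return y;
}

/* 32-bit MurmurHash2 steps with an explicit seed (blocks read via memcpy). */
static uint32_t murmur2(const void *key, int len, uint32_t seed) {
    const unsigned char *data = (const unsigned char *)key;
    uint32_t h = seed ^ (uint32_t)len;
    while (len >= 4) {
        uint32_t k;
        memcpy(&k, data, 4);
        h *= MURMUR_M;
        h ^= block_mix(k);
        data += 4;
        len -= 4;
    }
    switch (len) {
    case 3: h ^= (uint32_t)data[2] << 16; /* fall through */
    case 2: h ^= (uint32_t)data[1] << 8;  /* fall through */
    case 1: h ^= data[0];
            h *= MURMUR_M;
            break;
    default: break;
    }
    h ^= h >> 13;
    h *= MURMUR_M;
    h ^= h >> 15;
    return h;
}

/* Build a different 8-byte key with the same hash under every seed:
 * rewrite both blocks so their mixed values differ in the top bit only. */
static void make_twin(const unsigned char in[8], unsigned char out[8]) {
    for (int i = 0; i < 2; i++) {
        uint32_t k;
        memcpy(&k, in + 4 * i, 4);
        k = block_unmix(block_mix(k) ^ 0x80000000u);
        memcpy(out + 4 * i, &k, 4);
    }
}
```

Given an 8-byte key `a`, `make_twin(a, b)` produces a different key `b` such that `murmur2(a, 8, seed) == murmur2(b, 8, seed)` for every seed, which is why randomizing the seed alone does not stop HashDoS.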