Java中hashCode原理

44 阅读7分钟

前言

哈希算法又称摘要算法,核心在于有一个 hash 函数,将输入映射到一个固定位数的值,这个值就叫哈希值或哈希码(HashCode)。哈希的目的在于将原本复杂、不规则的数据转化为简洁的、固定长度的值,使得数据的存储和检索更加高效。广泛应用于数据存储(如哈希表)、数据完整性验证(如 MD5 、 SHA )、密码学(加密)以及区块链等领域。


下面我们就顺着源码,一步两步(一步一步似爪牙)探索是怎么实现的吧!

一、Java 中 hashCode 是什么?

Java 中的 hashCode ,是 Object 下的一个 native 方法

    /**
     * Returns a hash code value for the object. This method is
     * supported for the benefit of hash tables such as those provided by
     * {@link java.util.HashMap}.
     * <p>
     * The general contract of {@code hashCode} is:
     * <ul>
     * <li>Whenever it is invoked on the same object more than once during
     *     an execution of a Java application, the {@code hashCode} method
     *     must consistently return the same integer, provided no information
     *     used in {@code equals} comparisons on the object is modified.
     *     This integer need not remain consistent from one execution of an
     *     application to another execution of the same application.
     * <li>If two objects are equal according to the {@link
     *     equals(Object) equals} method, then calling the {@code
     *     hashCode} method on each of the two objects must produce the
     *     same integer result.
     * <li>It is <em>not</em> required that if two objects are unequal
     *     according to the {@link equals(Object) equals} method, then
     *     calling the {@code hashCode} method on each of the two objects
     *     must produce distinct integer results.  However, the programmer
     *     should be aware that producing distinct integer results for
     *     unequal objects may improve the performance of hash tables.
     * </ul>
     *
     * @implSpec
     * As far as is reasonably practical, the {@code hashCode} method defined
     * by class {@code Object} returns distinct integers for distinct objects.
     *
     * @return  a hash code value for this object.
     * @see     java.lang.Object#equals(java.lang.Object)
     * @see     java.lang.System#identityHashCode
     */
public native int hashCode();

根据注释可知

  1. 该方法主要用于支持哈希表,如 HashMap
  2. 对于同一个对象如果中用于比较的字段未被修改,那么多次调用 hashCode() ,应返回相同的哈希值。
  3. 如果两个对象根据 equals() 相等,那么哈希值也应该相等
  4. 如果两个对象根据 equals() 不相等,则不做要求

二、native 具体实现

1.1首先下载源码

jdk 8 源码 hg.openjdk.org/jdk8u/jdk8u… hotspot 8源码 hg.openjdk.org/jdk8u/jdk8u… jdk17及以上版本 github.com/openjdk/jdk

2.1、入口点

OpenJDK的 src/share/vm/prims/jvm.h
/*************************************************************************
 PART 1: Functions for Native Libraries
 ************************************************************************/
/*
 * java.lang.Object
 */
JNIEXPORT jint JNICALL
JVM_IHashCode(JNIEnv *env, jobject obj);

2.2、JVM_ENTRY

hotspot 中的 \src\share\vm\prims\jvm.cpp
 
// java.lang.Object ///////////////////////////////////////////////
JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))
  JVMWrapper("JVM_IHashCode");
  // as implemented in the classic virtual machine; return 0 if object is NULL
  return handle == NULL ? 0 : ObjectSynchronizer::FastHashCode (THREAD, JNIHandles::resolve_non_null(handle)) ;
JVM_END

2.3、FastHashCode

hotspot 中的 src\share\vm\runtime\synchronizer.cpp FastHashCode 方法有点长核心在于下面这行,如果对象头里没有就尝试生成

hash = get_next_hash(Self, obj);

源码如下

intptr_t ObjectSynchronizer::FastHashCode (Thread * Self, oop obj) {
  if (UseBiasedLocking) {
    // NOTE: many places throughout the JVM do not expect a safepoint
    // to be taken here, in particular most operations on perm gen
    // objects. However, we only ever bias Java instances and all of
    // the call sites of identity_hash that might revoke biases have
    // been checked to make sure they can handle a safepoint. The
    // added check of the bias pattern is to avoid useless calls to
    // thread-local storage.
    if (obj->mark()->has_bias_pattern()) {
      // Box and unbox the raw reference just in case we cause a STW safepoint.
      Handle hobj (Self, obj) ;
      // Relaxing assertion for bug 6320749.
      assert (Universe::verify_in_progress() ||
              !SafepointSynchronize::is_at_safepoint(),
             "biases should not be seen by VM thread here");
      BiasedLocking::revoke_and_rebias(hobj, false, JavaThread::current());
      obj = hobj() ;
      assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");
    }
  }

  // hashCode() is a heap mutator ...
  // Relaxing assertion for bug 6320749.
  assert (Universe::verify_in_progress() ||
          !SafepointSynchronize::is_at_safepoint(), "invariant") ;
  assert (Universe::verify_in_progress() ||
          Self->is_Java_thread() , "invariant") ;
  assert (Universe::verify_in_progress() ||
         ((JavaThread *)Self)->thread_state() != _thread_blocked, "invariant") ;

  ObjectMonitor* monitor = NULL;
  markOop temp, test;
  intptr_t hash;
  markOop mark = ReadStableMark (obj);

  // object should remain ineligible for biased locking
  assert (!mark->has_bias_pattern(), "invariant") ;

  if (mark->is_neutral()) {
    hash = mark->hash();              // this is a normal header
    if (hash) {                       // if it has hash, just return it
      return hash;
    }
    hash = get_next_hash(Self, obj);  // allocate a new hash code
    temp = mark->copy_set_hash(hash); // merge the hash code into header
    // use (machine word version) atomic operation to install the hash
    test = (markOop) Atomic::cmpxchg_ptr(temp, obj->mark_addr(), mark);
    if (test == mark) {
      return hash;
    }
    // If atomic operation failed, we must inflate the header
    // into heavy weight monitor. We could add more code here
    // for fast path, but it does not worth the complexity.
  } else if (mark->has_monitor()) {
    monitor = mark->monitor();
    temp = monitor->header();
    assert (temp->is_neutral(), "invariant") ;
    hash = temp->hash();
    if (hash) {
      return hash;
    }
    // Skip to the following code to reduce code size
  } else if (Self->is_lock_owned((address)mark->locker())) {
    temp = mark->displaced_mark_helper(); // this is a lightweight monitor owned
    assert (temp->is_neutral(), "invariant") ;
    hash = temp->hash();              // by current thread, check if the displaced
    if (hash) {                       // header contains hash code
      return hash;
    }
    // WARNING:
    //   The displaced header is strictly immutable.
    // It can NOT be changed in ANY cases. So we have
    // to inflate the header into heavyweight monitor
    // even the current thread owns the lock. The reason
    // is the BasicLock (stack slot) will be asynchronously
    // read by other threads during the inflate() function.
    // Any change to stack may not propagate to other threads
    // correctly.
  }

  // Inflate the monitor to set hash code
  monitor = ObjectSynchronizer::inflate(Self, obj, inflate_cause_hash_code);
  // Load displaced header and check it has hash code
  mark = monitor->header();
  assert (mark->is_neutral(), "invariant") ;
  hash = mark->hash();
  if (hash == 0) {
    hash = get_next_hash(Self, obj);
    temp = mark->copy_set_hash(hash); // merge hash code into header
    assert (temp->is_neutral(), "invariant") ;
    test = (markOop) Atomic::cmpxchg_ptr(temp, monitor, mark);
    if (test != mark) {
      // The only update to the header in the monitor (outside GC)
      // is install the hash code. If someone add new usage of
      // displaced header, please update this code
      hash = test->hash();
      assert (test->is_neutral(), "invariant") ;
      assert (hash != 0, "Trivial unexpected object/monitor header usage.");
    }
  }
  // We finally get the hash
  return hash;
}

2.4、get_next_hash

hotspot 中的 src\share\vm\runtime\synchronizer.cpp 恭喜看官,这里就是真正的hash逻辑啦,支持六种模式 0. 随机数 1. 内存地址做移位再和一个随机数做异或 2. 固定值1 3. 自增序列的当前值 4. 内存地址 5. 当前线程有关的一个随机数+三个确定值,运用xorshift随机数算法得到的一个随机数

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.
     intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // for sensitivity testing
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     value = cast_from_oop<intptr_t>(obj) ;
  } else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}

线程状态的定义 hotspot \src\share\vm\runtime\thread.cpp

  // thread-specific hashCode stream generator state - Marsaglia shift-xor form
  _hashStateX = os::random() ;
  _hashStateY = 842502087 ;
  _hashStateZ = 0x8767 ;    // (int)(3579807591LL & 0xffff) ;
  _hashStateW = 273326509 ;

2.5、那么,默认是哪种模式呢

hotspot src\share\vm\runtime\globals.hpp

  product(intx, hashCode, 5,                                                
          "(Unstable) select hashCode generation algorithm")  

可以看到默认是5,第六种策略即xorshift基于异或和位移的随机数生成算法,当然也可以通过在启动参数中添加-XX:hashCode=4,改变默认hashCode计算策略 ps:源码都是基于jdk1.8的,jdk-17,以及最新的jdk-24 默认也是此策略

三、String hashcode() 实现

    public static int hashCode(byte[] value) {
        int h = 0;
        for (byte v : value) {
            h = 31 * h + (v & 0xff);
        }
        return h;
    }

31 * h 进行散列 v & 0xff ,将v转换为一个正值 (然而调用 String hashCode() 会返回负值么?评论区见) 最后追加 31 * h + (v & 0xff) 特点就是快速高效,内容相同的字符串,产生的哈希值相同

四、总结

1、Java 中默认 hashcode() 实现是基于 Xorshift 的伪随机数生成算法,受到当前线程有关的一个随机数+三个确定值影响。内容相同的对象,每次new对象,哈希值都不同 2、String 类重写了 hashcode(),对于内容相同字符串,会产出同样的哈希值 3、如果对象需要存到哈希表中,如 HashMap、HashSet,一定要注意重写 equals()和 hashCode() 方法

五、参考

  1. How does the default hashCode() work?
  2. java默认的hashcode方法到底得到的是什么?
  3. 开发中常见的一些Hash函数(一)
  4. blog.csdn.net/little_boy_…