JVM Source Code Analysis: The HashCode You Didn't Know

2023-12-13 04:50:31

Version information:

JDK version: jdk8u40

Foreword:

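I have read many articles about hashCode and found that most of them simply copy "interview handbook" material. That does no good to programmers who like to study how things work underneath, and it is what prompted me to write this article about hashCode.

hashCode is defined in the Object class as a native method, and its Javadoc describes the method's contract: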

Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by java.util.HashMap.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
Returns:
a hash code value for this object.
See Also:
equals(Object), System.identityHashCode

Here is a condensed paraphrase:

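1. The method returns an integer hash code; it exists for the benefit of hash tables such as HashMap.
2. Invoked multiple times on the same object during one execution, it returns the same hash code, as long as nothing used by equals has changed.
3. If two objects are equal according to equals, their hash codes must be the same. The converse does not hold: equal hash codes do not mean equals returns true.
4. As far as is reasonably practical, Object's default hashCode returns distinct integers for distinct objects. The Javadoc says this is "typically" done by converting the object's memory address, but the language does not require it, and HotSpot's default algorithm does not use the address at all (even though many blogs claim it does).
5. A class may override this method and return a hash code based on its own logic, but it must still honor point 3: if equals returns true, the hash codes must be the same. (The hashCode Javadoc quoted above does not state this explicitly.)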

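Next, let's use the hashCode source code to check whether the claims in its Javadoc hold, and point out a few "interesting" details in the source along the way.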

Source code walkthrough:

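Because the default hashCode in Object is a native method, many Java programmers stop right there, so we need to look at its native implementation.

The file src/share/native/java/lang/Object.c defines the mapping for all of Object's native methods (normally, the native source file is named after the Java class):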

static JNINativeMethod methods[] = {
    {"hashCode",    "()I",                    (void *)&JVM_IHashCode},
    {"wait",        "(J)V",                   (void *)&JVM_MonitorWait},
    {"notify",      "()V",                    (void *)&JVM_MonitorNotify},
    {"notifyAll",   "()V",                    (void *)&JVM_MonitorNotifyAll},
    {"clone",       "()Ljava/lang/Object;",   (void *)&JVM_Clone},
};

Next, look at the JVM_IHashCode function in src/share/vm/prims/jvm.cpp.

JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))
  JVMWrapper("JVM_IHashCode");
  // If the handle is null, return 0;
  // otherwise call FastHashCode to get the object's hash code.
  return handle == NULL ? 0 : ObjectSynchronizer::FastHashCode (THREAD, JNIHandles::resolve_non_null(handle)) ;
JVM_END

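This part is very simple, and it tells us one thing: for a null reference the native entry point returns 0. From Java this is visible through System.identityHashCode(null), which returns 0; calling hashCode() on a null reference throws a NullPointerException before any native code runs.

As we know, the identity hash code is stored in the object header (the mark word). The header is also used for biased locking, lightweight locking, heavyweight locking and GC state, so contention is unavoidable. How does HotSpot define all of these states, and how does it handle the contention? Let's look at the definitions in src/share/vm/oops/markOop.hpp.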

//    [JavaThread* | epoch | age | 1 | 01]       lock is biased toward given thread
//    [0           | epoch | age | 1 | 01]       lock is anonymously biased
//
//  - the two lock bits are used to describe three states: locked/unlocked and monitor.
//
//    [ptr             | 00]  locked             ptr points to real header on stack
//    [header      | 0 | 01]  unlocked           regular object header
//    [ptr             | 10]  monitor            inflated lock (header is swapped out)
//    [ptr             | 11]  marked             used by markSweep to mark an object
//                                               not valid at any other time
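To make these bit patterns concrete, here is a minimal C++ sketch that classifies a mark word by its low bits and extracts the hash field. It assumes the JDK 8 64-bit layout from markOop.hpp (`unused:25 hash:31 | unused:1 age:4 biased_lock:1 lock:2`), which is not part of the excerpt above; the names `MarkState`, `decode_mark` and `mark_hash` are hypothetical, and this is not HotSpot code.

```cpp
#include <cstdint>

// Hypothetical decoder for a JDK 8 mark word on a 64-bit VM (not HotSpot code).
// Layout assumed from markOop.hpp:
//   unused:25 hash:31 | unused:1 age:4 biased_lock:1 lock:2   (normal object)
enum class MarkState { Biased, StackLocked, Neutral, Inflated, GcMarked };

inline MarkState decode_mark(std::uint64_t mark) {
  // biased_lock=1 with lock=01 (binary 101) must be tested first,
  // because lock=01 alone also matches an unlocked object.
  if ((mark & 0x7) == 0x5) return MarkState::Biased;
  switch (mark & 0x3) {  // the two lock bits
    case 0x0: return MarkState::StackLocked;  // 00: pointer to a lock record on the stack
    case 0x1: return MarkState::Neutral;      // 01: unlocked, may carry a hash
    case 0x2: return MarkState::Inflated;     // 10: pointer to an ObjectMonitor
    default:  return MarkState::GcMarked;     // 11: used by mark-sweep GC
  }
}

// Hash field of a neutral mark word: 31 bits starting at bit 8
// (lock:2 + biased_lock:1 + age:4 + unused:1 sit below it).
inline std::uint32_t mark_hash(std::uint64_t mark) {
  return static_cast<std::uint32_t>((mark >> 8) & 0x7FFFFFFFu);
}
```

For instance, the header value (0x1234567 << 8) | 0x1 decodes as Neutral with hash 0x1234567. Testing the 101 pattern before the two lock bits mirrors the distinction FastHashCode draws between has_bias_pattern() and is_neutral().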

Next, look at the FastHashCode function in src/share/vm/runtime/synchronizer.cpp.

intptr_t ObjectSynchronizer::FastHashCode (Thread * Self, oop obj) {
  // Biased locking is enabled by default in HotSpot 8
  if (UseBiasedLocking) {
    // Check whether the mark word has the biased pattern: the object may be
    // biased toward a thread, or only anonymously biased.
    if (obj->mark()->has_bias_pattern()) {
      Handle hobj (Self, obj) ;  
      // Revoke the bias (attempt_rebias = false) so the header can hold the hash
      BiasedLocking::revoke_and_rebias(hobj, false, JavaThread::current());
      obj = hobj() ;
    }
  }

  ObjectMonitor* monitor = NULL;
  markOop temp, test;
  intptr_t hash;
  markOop mark = ReadStableMark (obj);

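  // is_neutral(): the low three bits are 001, i.e. neither locked nor biased.
  // The bias was handled above, so this branch is safe for now, but other
  // threads may race with us, so the hash is installed with CAS.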
  if (mark->is_neutral()) {
    hash = mark->hash();              
    // This may be the first call, so check whether a hash already exists
    // (in C/C++ any non-zero value is true).
    if (hash) {                       
      return hash;
    }
    // First call: generate a new hash code
    hash = get_next_hash(Self, obj);  
    // Build a copy of the mark word with the hash set
    temp = mark->copy_set_hash(hash); 
    // CAS guards against concurrent updates of the header
    test = (markOop) Atomic::cmpxchg_ptr(temp, obj->mark_addr(), mark);
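    // test == mark means the CAS succeeded, so return the hash.
    // Otherwise another thread changed the header first (it locked the object,
    // or installed a hash of its own); in every case fall through and inflate.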
    if (test == mark) {
      return hash;
    }
  } else if (mark->has_monitor()) {     // Inflated: a heavyweight monitor exists
    // Read the hash from the displaced header stored in the ObjectMonitor
    monitor = mark->monitor();
    temp = monitor->header();
    hash = temp->hash();
    if (hash) {
      return hash;
    }
  } else if (Self->is_lock_owned((address)mark->locker())) {  // Stack-locked (lightweight) by this thread
    // Read the hash from the displaced header in this thread's lock record
    temp = mark->displaced_mark_helper(); 
    hash = temp->hash();              
    if (hash) {                       
      return hash;
    }
  }

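  // Reaching here means the CAS above failed, the object is stack-locked by
  // another thread, or no hash has been generated yet. Inflate to a heavyweight
  // monitor; inflate() also waits if another thread is already inflating it.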
  monitor = ObjectSynchronizer::inflate(Self, obj);
  mark = monitor->header();
  hash = mark->hash();
  // hash == 0: no hash was generated before, e.g. the lock was inflated before hashCode() was first called
  if (hash == 0) {
    // Generate one
    hash = get_next_hash(Self, obj);
    // Build a copy of the header with the hash set
    temp = mark->copy_set_hash(hash); 
    // Install the hash into the monitor's saved header with CAS
    test = (markOop) Atomic::cmpxchg_ptr(temp, monitor, mark);
    if (test != mark) {
      // CAS failed: another thread installed a hash first, so use that one
      hash = test->hash();
    }
  }
  return hash;
}
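The neutral branch above boils down to a "read the hash, generate it if missing, publish it with CAS" pattern. The C++ sketch below models only that pattern with `std::atomic`; it ignores the biased, stack-locked and inflated states entirely, and `ModelHeader` and `model_hash_code` are hypothetical names. Unlike FastHashCode, which falls through to inflation when its CAS fails, this simplified model just re-reads the header and returns whichever hash won.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical model of FastHashCode's neutral branch, not HotSpot code.
// Assumes the 64-bit mark word layout: 31 hash bits starting at bit 8.
constexpr int kHashShift = 8;
constexpr std::uint64_t kHashMask = (std::uint64_t{1} << 31) - 1;

struct ModelHeader {
  std::atomic<std::uint64_t> word{0x1};  // 001: unlocked, not biased, no hash yet
};

inline std::uint64_t hash_of(std::uint64_t word) {
  return (word >> kHashShift) & kHashMask;
}

// Returns the header's hash, installing `candidate` if none exists yet.
// If another thread installs a hash first, that hash is returned instead,
// so all callers agree on one value.
inline std::uint64_t model_hash_code(ModelHeader& h, std::uint64_t candidate) {
  std::uint64_t mark = h.word.load();
  for (;;) {
    std::uint64_t existing = hash_of(mark);
    if (existing != 0) return existing;  // already installed
    std::uint64_t hash = candidate & kHashMask;
    if (hash == 0) hash = 0xBAD;         // 0 means "no hash", as in get_next_hash
    std::uint64_t with_hash =
        (mark & ~(kHashMask << kHashShift)) | (hash << kHashShift);
    // On failure, compare_exchange_strong stores the current word into `mark`,
    // so the next iteration sees the winner's hash.
    if (h.word.compare_exchange_strong(mark, with_hash)) return hash;
  }
}
```

Because only one CAS can move the header out of the "no hash" state, every caller, whatever candidate it generated, ends up returning the same value; later calls never overwrite it.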

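FastHashCode is long. Although my comments in it are detailed, I won't walk through it here, nor through the object header layout above, because most of that logic exists to support locking: the synchronized keyword needs an object to act as the lock, and it also keeps data in that object's header. HotSpot's implementation of synchronized is extremely complex, so it is out of scope for this article.

Instead, we only need to focus on how an object's hash code is created, i.e. the get_next_hash function, which holds the core hash-generation algorithm.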

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;

  // hashCode defaults to 5, so the final else branch is taken by default
  if (hashCode == 0) {  // global random number from os::random()
     value = os::random() ;
  } else
  if (hashCode == 1) {  // object address, mixed with a stop-the-world random value
     intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {     // always 1 (for testing only)
     value = 1 ;            
  } else
  if (hashCode == 3) {     // global incrementing sequence
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {     // raw object address
     value = cast_from_oop<intptr_t>(obj) ;
  } else {                 // Marsaglia's xor-shift with per-thread state
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ; // shift the state for the next call
     Self->_hashStateY = Self->_hashStateZ ; // shift the state for the next call
     Self->_hashStateZ = Self->_hashStateW ; // shift the state for the next call
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;  // mix the old W with t
     Self->_hashStateW = v ;                 // new W, kept for the next call
     value = v ;
  }
  // Keep only the mark word's hash bits:
  // 25 bits on 32-bit VMs, 31 bits on 64-bit VMs
  value &= markOopDesc::hash_mask;
  // 0 means "no hash yet" in the mark word, so substitute a fixed non-zero value
  if (value == 0) value = 0xBAD ;
  return value;
}
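The non-default branches are simple enough to model directly. The hypothetical C++ function below reproduces modes 1 to 4 of get_next_hash; `addr`, `stw_random` and `sequence` stand in for the object address, `GVars.stwRandom` and `GVars.hcSequence`. It assumes a 64-bit VM (31 hash bits) and deliberately leaves out mode 0 and the default xor-shift branch.

```cpp
#include <cstdint>

// Hypothetical model of get_next_hash modes 1-4 (not HotSpot code).
// Assumes a 64-bit VM, where markOopDesc::hash_mask keeps 31 bits.
constexpr std::intptr_t kHashMask31 = 0x7FFFFFFF;

inline std::intptr_t model_next_hash(int mode, std::uintptr_t addr,
                                     std::intptr_t stw_random,
                                     std::intptr_t& sequence) {
  std::intptr_t value = 0;
  switch (mode) {
    case 1: {  // address >> 3, self-mixed and xor-ed with a stop-the-world random
      std::intptr_t addr_bits = static_cast<std::intptr_t>(addr) >> 3;
      value = addr_bits ^ (addr_bits >> 5) ^ stw_random;
      break;
    }
    case 2:  // constant: every object gets the same hash
      value = 1;
      break;
    case 3:  // global counter shared by all threads
      value = ++sequence;
      break;
    case 4:  // raw object address
      value = static_cast<std::intptr_t>(addr);
      break;
    default:  // modes 0 and 5 are not modeled; -1 signals that
      return -1;
  }
  value &= kHashMask31;           // keep only the hash bits of the mark word
  if (value == 0) value = 0xBAD;  // 0 is reserved for "no hash yet"
  return value;
}
```

Mode 2 gives every object the hash 1, so all of them collide in a hash table. Mode 4 keeps only the low 31 bits of the address, and since objects are 8-byte aligned by default its low three bits are always zero, which is presumably why mode 1 shifts the address right by 3 before mixing.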

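We can clearly see a long if / else-if chain here.
The condition tests the variable hashCode, a VM flag defined in globals.hpp with a default value of 5, so developers can select the algorithm with the JVM option -XX:hashCode. For example, with -XX:hashCode=2 every hash code is 1, and with -XX:hashCode=4 the object's memory address is used directly (still masked to the hash width), and so on. (This is another "interesting" detail you only discover by reading the source.)
The default value 5 selects a per-thread pseudo-random generator with xor-shift mixing. I won't go into the algorithm's details; the whole point is to produce hash codes that are as random as possible, so that Java's hash-based structures see fewer collisions and perform better.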
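The default branch is Marsaglia's xor-shift generator (the variant with four 32-bit state words). A minimal C++ sketch of that branch follows; `XorShiftState` stands in for the thread's `_hashStateX` to `_hashStateW` fields, the seeds a caller passes in are hypothetical, and a 64-bit VM (31 hash bits) is assumed.

```cpp
#include <cstdint>

// Sketch of the default (-XX:hashCode=5) branch of get_next_hash, not HotSpot code.
struct XorShiftState {
  std::uint32_t x, y, z, w;  // stand-ins for Thread::_hashStateX .. _hashStateW
};

constexpr std::uint32_t kHashMask31 = 0x7FFFFFFFu;  // 31 hash bits on a 64-bit VM

// Tail of get_next_hash: mask to the hash width and never return 0,
// because 0 in the mark word means "no hash assigned yet".
inline std::uint32_t finish_hash(std::uint32_t value) {
  value &= kHashMask31;
  return value == 0 ? 0xBADu : value;
}

inline std::uint32_t next_identity_hash(XorShiftState& s) {
  std::uint32_t t = s.x;
  t ^= (t << 11);
  s.x = s.y;  // rotate the state words, as in the HotSpot code above
  s.y = s.z;
  s.z = s.w;
  std::uint32_t v = s.w;
  v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
  s.w = v;
  return finish_hash(v);
}
```

Starting from the hypothetical seeds {1, 2, 3, 4}, the first two calls return 2061 and 6175. Because each thread keeps its own state, generating a hash needs no shared counter, unlike mode 3.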

Summary:

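Because Java keeps the hash code in the object header, and the locking and GC subsystems use the header as well, HotSpot's developers have to handle the races between them, which makes the code that fetches the hash code fairly complex. For our purposes, though, it is enough to focus on how the hash code is generated.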

Source: https://blog.csdn.net/qq_43799161/article/details/134942841