ART虚拟机 | 锁

4,383 阅读5分钟

本文基于Android 11(R)

Java中对临界区的锁定通常用synchronize代码块完成,因此标题中的“锁”实际上是对synchronize关键字的剖析。Synchronize代码块使用时必须传入一个对象,这个对象可以是this对象,可以是类对象(e.g. Foo.class),也可以是任何其他对象。因此我们可以说,锁的状态和对象关联。亦或者,每个对象天生都是一把锁。

Synchronize生成的字节码会对应两条指令,分别是monitor-entermonitor-exit。下面我们针对monitor_enter,分别从解释执行和机器码执行两个方向去寻找这个指令的最终实现。

解释执行

[art/runtime/interpreter/interpreter_switch_impl-inl.h]

HANDLER_ATTRIBUTES bool MONITOR_ENTER() {
  ...
  ObjPtr<mirror::Object> obj = GetVRegReference(A());
  if (UNLIKELY(obj == nullptr)) {
    ...
  } else {
    DoMonitorEnter<do_assignability_check>(self, &shadow_frame, obj);  <===调用
    ...
  }
}

[art/runtime/interpreter/interpreter_common.h]

static inline void DoMonitorEnter(Thread* self, ShadowFrame* frame, ObjPtr<mirror::Object> ref)
    NO_THREAD_SAFETY_ANALYSIS
    REQUIRES(!Roles::uninterruptible_) {
  ...
  StackHandleScope<1> hs(self);
  Handle<mirror::Object> h_ref(hs.NewHandle(ref));  <===调用
  h_ref->MonitorEnter(self);
  ...
}

[art/runtime/mirror/object-inl.h]

inline ObjPtr<mirror::Object> Object::MonitorEnter(Thread* self) {
  return Monitor::MonitorEnter(self, this, /*trylock=*/false);  <===调用
}

解释执行会使用switch-case方式分别解析每一条指令,由上述代码可知,monitor-enter指令最终会调用Monitor::MonitorEnter静态函数。

机器码执行

[art/runtime/arch/arm64/quick_entrypoints_arm64.S]

ENTRY art_quick_lock_object_no_inline
    // This is also the slow path for art_quick_lock_object.
    SETUP_SAVE_REFS_ONLY_FRAME        // save callee saves in case we block
    mov    x1, xSELF                  // pass Thread::Current
    bl     artLockObjectFromCode      // (Object* obj, Thread*)  <===调用
    ...
END art_quick_lock_object_no_inline

[art/runtime/entrypoints/quick/quick_lock_entrypoints.cc]

extern "C" int artLockObjectFromCode(mirror::Object* obj, Thread* self){
  ...
  if (UNLIKELY(obj == nullptr)) {
    ...
  } else {
    ObjPtr<mirror::Object> object = obj->MonitorEnter(self);  // May block <===调用
    ...
  }
}

[art/runtime/mirror/object-inl.h]

inline ObjPtr<mirror::Object> Object::MonitorEnter(Thread* self) {
  return Monitor::MonitorEnter(self, this, /*trylock=*/false);  <===调用
}

殊途同归,机器码执行时最终也会调用Monitor::MonitorEnter

锁的两种形态

虚拟机中将锁实现为两种形态,一种称为Thin Lock,另一种称为Fat Lock。

Thin Lock用于竞争较弱的场景。在竞争发生时,采用自旋(spin)和让渡CPU(yield)的方式等待锁,而不是进行系统调用和上下文切换。当持有锁的线程很快完成操作时,短暂的自旋会比上下文切换开销更小。

可是如果自旋一段时间发现还无法获取到锁时,Thin Lock就会膨胀为Fat Lock,一方面增加数据结构存储与锁相关的具体信息,另一方面通过系统调用挂起线程。

总结一下,Fat Lock功能健全,但开销较大。而Thin Lock开销虽小,但无法用于长时间等待的情况。所以实际的做法是先使用Thin Lock,当功能无法满足时再膨胀为Fat Lock。

文章开头提到,每个对象天生都是一把锁。那么这个锁的信息到底存在对象的什么位置呢?

答案是存在art::mirror::Object的对象头中(详见ART虚拟机 | Java对象和类的内存结构)。对象头中有一个4字节长的字段monitor_,其中便存储了锁相关的信息。

monitor字段.png

4字节共32bits,高位的两个bits用于标记状态。不同的状态,存储的信息含义也不同。两个bits共4种状态,分别为ThinOrUnlock(Thin/Unlock共用一个状态),Fat,Hash和ForwardingAddress。ThinOrUnlock和Fat表示锁的状态,Hash是为对象生成HashMap中所用的哈希值,ForwardingAddress是GC时使用的状态。

上图中的m表示mark bit state,r表示read barrier state,都是配合GC使用的标志,在讨论锁的时候可以不关心。

当我们对一个空闲对象进行monitor-enter操作时,锁的状态由Unlock切换到Thin。代码如下。

switch (lock_word.GetState()) {
  case LockWord::kUnlocked: {
    // No ordering required for preceding lockword read, since we retest.
    LockWord thin_locked(LockWord::FromThinLockId(thread_id, 0, lock_word.GCState()));
    if (h_obj->CasLockWord(lock_word, thin_locked, CASMode::kWeak, std::memory_order_acquire)) {
      ...
      return h_obj.Get();  // Success!
    }
    continue;  // Go again.
  }

LockWord对象的大小就是4字节,所以可以将它等同于art::mirror::Objectmonitor_字段,只不过它内部实现了很多方法可以灵活操作4字节中的信息。锁状态切换时,将当前线程的thread id(thread id并非tid,对每个进程而言它都从1开始)存入monitor_字段,与GC相关的mr标志保持不变。

当对象被线程锁定后,假设我们在同线程内对该它再次进行monitor-enter操作,那么就会发生Thin Lock的重入。如果在不同线程对该对象进行monitor-enter操作,那么就会发生Thin Lock的竞争。代码和流程图如下。

case LockWord::kThinLocked: {
  uint32_t owner_thread_id = lock_word.ThinLockOwner();
  if (owner_thread_id == thread_id) {
    uint32_t new_count = lock_word.ThinLockCount() + 1;
    if (LIKELY(new_count <= LockWord::kThinLockMaxCount)) {
      LockWord thin_locked(LockWord::FromThinLockId(thread_id,
                                                    new_count,
                                                    lock_word.GCState()));
      if (h_obj->CasLockWord(lock_word,
                             thin_locked,
                             CASMode::kWeak,
                             std::memory_order_relaxed)) {
        AtraceMonitorLock(self, h_obj.Get(), /* is_wait= */ false);
        return h_obj.Get();  // Success!
      }
      continue;  // Go again.
    } else {
      // We'd overflow the recursion count, so inflate the monitor.
      InflateThinLocked(self, h_obj, lock_word, 0);
    }
  } else {
    // Contention.
    contention_count++;
    Runtime* runtime = Runtime::Current();
    if (contention_count <= runtime->GetMaxSpinsBeforeThinLockInflation()) {
      sched_yield();
    } else {
      contention_count = 0;
      // No ordering required for initial lockword read. Install rereads it anyway.
      InflateThinLocked(self, h_obj, lock_word, 0);
    }
  }
  continue;  // Start from the beginning.
}

ThinLock.png

在ThinLock膨胀为FatLock前,需要执行50次sched_yieldsched_yield会将当前线程放到CPU调度队列的末尾,这样既不用挂起线程,也不用一直占着CPU。不过android master分支已经将这个流程再度优化了,在50次sched_yield之前,再执行100次自旋操作。和sched_yield相比,自旋不会释放CPU。由于单次sched_yield耗时也有微秒,对于锁持有时间极短的情况,用自旋更省时间。

接下来介绍锁的膨胀过程。

void Monitor::InflateThinLocked(Thread* self, Handle<mirror::Object> obj, LockWord lock_word,
                                uint32_t hash_code) {
  DCHECK_EQ(lock_word.GetState(), LockWord::kThinLocked);
  uint32_t owner_thread_id = lock_word.ThinLockOwner();
  if (owner_thread_id == self->GetThreadId()) {
    // We own the monitor, we can easily inflate it.
    Inflate(self, self, obj.Get(), hash_code);
  } else {
    ThreadList* thread_list = Runtime::Current()->GetThreadList();
    // Suspend the owner, inflate. First change to blocked and give up mutator_lock_.
    self->SetMonitorEnterObject(obj.Get());
    bool timed_out;
    Thread* owner;
    {
      ScopedThreadSuspension sts(self, kWaitingForLockInflation);
      owner = thread_list->SuspendThreadByThreadId(owner_thread_id,
                                                   SuspendReason::kInternal,
                                                   &timed_out);
    }
    if (owner != nullptr) {
      // We succeeded in suspending the thread, check the lock's status didn't change.
      lock_word = obj->GetLockWord(true);
      if (lock_word.GetState() == LockWord::kThinLocked &&
          lock_word.ThinLockOwner() == owner_thread_id) {
        // Go ahead and inflate the lock.
        Inflate(self, owner, obj.Get(), hash_code);
      }
      bool resumed = thread_list->Resume(owner, SuspendReason::kInternal);
      DCHECK(resumed);
    }
    self->SetMonitorEnterObject(nullptr);
  }
}
void Monitor::Inflate(Thread* self, Thread* owner, ObjPtr<mirror::Object> obj, int32_t hash_code) {
  DCHECK(self != nullptr);
  DCHECK(obj != nullptr);
  // Allocate and acquire a new monitor.
  Monitor* m = MonitorPool::CreateMonitor(self, owner, obj, hash_code);
  DCHECK(m != nullptr);
  if (m->Install(self)) {
    if (owner != nullptr) {
      VLOG(monitor) << "monitor: thread" << owner->GetThreadId()
          << " created monitor " << m << " for object " << obj;
    } else {
      VLOG(monitor) << "monitor: Inflate with hashcode " << hash_code
          << " created monitor " << m << " for object " << obj;
    }
    Runtime::Current()->GetMonitorList()->Add(m);
    CHECK_EQ(obj->GetLockWord(true).GetState(), LockWord::kFatLocked);
  } else {
    MonitorPool::ReleaseMonitor(self, m);
  }
}

膨胀(Inflate)的具体操作比较简单,简言之就是创建一个Monitor对象,存储更多的信息,然后将Monitor Id放入原先的monitor_字段中。

关键的地方在于膨胀的充分条件。如果Thin Lock本来就由本线程持有,那么膨胀不需要经过任何人同意,可以直接进行。但如果该Thin Lock由其他线程持有,那么膨胀之前必须先暂停(这里的暂停并不是指将线程从CPU上调度出去,而是不允许它进入Java世界改变锁状态)持有线程,防止膨胀过程中对锁信息的更新存在竞争。膨胀之后,持有线程恢复运行,此时它看到的Lock已经变成了Fat Lock。

当锁膨胀为Fat Lock后,由于持有锁的动作并未完成,所以该线程会再次尝试。只不过这次走的是Fat Lock分支,执行如下代码。

case LockWord::kFatLocked: {
  // We should have done an acquire read of the lockword initially, to ensure
  // visibility of the monitor data structure. Use an explicit fence instead.
  std::atomic_thread_fence(std::memory_order_acquire);
  Monitor* mon = lock_word.FatLockMonitor();
  if (trylock) {
    return mon->TryLock(self) ? h_obj.Get() : nullptr;
  } else {
    mon->Lock(self);
    DCHECK(mon->monitor_lock_.IsExclusiveHeld(self));
    return h_obj.Get();  // Success!
  }
}
{
  ScopedThreadSuspension tsc(self, kBlocked);  // Change to blocked and give up mutator_lock_.

  // Acquire monitor_lock_ without mutator_lock_, expecting to block this time.
  // We already tried spinning above. The shutdown procedure currently assumes we stop
  // touching monitors shortly after we suspend, so don't spin again here.
  monitor_lock_.ExclusiveLock(self);
}

上述代码的ScopedThreadSuspension对象用于完成线程状态的切换,之所以叫scoped,是因为它是通过构造和析构函数完成状态切换和恢复的。在作用域内的局部变量会随着作用域的结束而自动析构,因此花括号结束,线程状态也就由Blocked切换回Runnable了。

最终调用monitor_lock_(Mutex对象)的ExclusiveLock方法。

void Mutex::ExclusiveLock(Thread* self) {
  if (!recursive_ || !IsExclusiveHeld(self)) {
#if ART_USE_FUTEXES
    bool done = false;
    do {
      int32_t cur_state = state_and_contenders_.load(std::memory_order_relaxed);
      if (LIKELY((cur_state & kHeldMask) == 0) /* lock not held */) {
        done = state_and_contenders_.CompareAndSetWeakAcquire(cur_state, cur_state | kHeldMask);
      } else {
        ...
        if (!WaitBrieflyFor(&state_and_contenders_, self,
                            [](int32_t v) { return (v & kHeldMask) == 0; })) {
          // Increment contender count. We can't create enough threads for this to overflow.
          increment_contenders();
          // Make cur_state again reflect the expected value of state_and_contenders.
          cur_state += kContenderIncrement;
          if (UNLIKELY(should_respond_to_empty_checkpoint_request_)) {
            self->CheckEmptyCheckpointFromMutex();
          }
          do {
            if (futex(state_and_contenders_.Address(), FUTEX_WAIT_PRIVATE, cur_state,
                      nullptr, nullptr, 0) != 0) {
            ...
            cur_state = state_and_contenders_.load(std::memory_order_relaxed);
          } while ((cur_state & kHeldMask) != 0);
          decrement_contenders();
        }
      }
    } while (!done);
    ...
    exclusive_owner_.store(SafeGetTid(self), std::memory_order_relaxed);
    RegisterAsLocked(self);
  }
  recursion_count_++;
  ...
}

Mutex::ExclusiveLock最终通过futex系统调用陷入内核态,在内核态中将当前线程从CPU中调度出去,实现挂起。值得注意的是,FatLock中依然有spin和yield的操作(WaitBrieflyFor函数),这是因为Thin Lock一旦膨胀为Fat Lock就很难deflate回去,而后续对Fat Lock的使用依然会碰到短时持有锁的情况,这也意味先前的优化此处依然可用。

上面这一块代码算是锁的核心实现,被调用的次数也非常多,因此任何一点微小的优化都很重要。我之前写过一篇文章调试经验 | C++ memory order和一个相关的稳定性问题详细分析了一个由memory order使用错误导致的线程卡死的问题,其中还介绍了C++的memory order,它也正是Java volatile关键字的(ART)底层实现。

此外我还给谷歌提过ExclusiveLock的bug,这个bug既会消耗battery,也会在某些情况下导致系统整体卡死。下面是谷歌的具体回复,感兴趣的可以查看修复

Hans Reply.png