[Notes] Getting to Know V8

🏊‍ deep dive into V8 Engine source code

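V8 is Google's open-source, high-performance JavaScript and WebAssembly engine, written in C++.

V8 can run standalone, or it can be embedded into a C++ application.

V8 compiles and executes JavaScript code, allocates memory for objects, and garbage-collects objects it no longer needs. V8's stop-the-world, generational, accurate garbage collector is one of the keys to V8's performance.

V8 provides all the data types, operators, objects, and functions specified in the ECMA standard.

V8 lets any C++ application expose its own objects and functions to JavaScript code.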

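The V8 Torque language

V8 Torque is a domain-specific language with a TypeScript-like syntax, which makes V8 code easier to write and to understand. It can also interoperate with CodeStubAssembler builtins and with macros written in C++.
CodeStubAssembler (CSA for short) is a V8 component that lets us write low-level TurboFan IR directly in C++; the TurboFan backend then translates it into machine code for the target architecture.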

How sorting is implemented in V8

Originally, Array.prototype.sort in V8 was implemented in JavaScript and combined quicksort with insertion sort:

  • In-place QuickSort algorithm.
  • For short (length <= 22) arrays, insertion sort is used for efficiency.
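The combination described in those two bullets can be sketched in C++ as follows. This is only an illustration of the technique, not V8's old JavaScript code: the function names, the median-of-three pivot, and the Hoare-style partition are assumptions made for the example, and the cutoff of 22 is taken from the comment quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Cutoff taken from the quoted comment; everything else here is illustrative.
constexpr std::ptrdiff_t kInsertionSortCutoff = 22;

// Sorts v[from, to) by straight insertion.
void InsertionSort(std::vector<int>& v, std::ptrdiff_t from, std::ptrdiff_t to) {
  for (std::ptrdiff_t i = from + 1; i < to; ++i) {
    int element = v[i];
    std::ptrdiff_t j = i - 1;
    while (j >= from && v[j] > element) {
      v[j + 1] = v[j];  // shift larger elements one slot to the right
      --j;
    }
    v[j + 1] = element;
  }
}

// Sorts v[from, to) in place: quicksort partitioning while the range is long,
// insertion sort once it has shrunk to the cutoff or below.
void HybridQuickSort(std::vector<int>& v, std::ptrdiff_t from, std::ptrdiff_t to) {
  assert(0 <= from && from <= to &&
         to <= static_cast<std::ptrdiff_t>(v.size()));
  while (to - from > kInsertionSortCutoff) {
    // Median-of-three: order v[from], v[mid], v[to - 1], then use the middle.
    std::ptrdiff_t mid = from + (to - from) / 2;
    if (v[mid] < v[from]) std::swap(v[mid], v[from]);
    if (v[to - 1] < v[from]) std::swap(v[to - 1], v[from]);
    if (v[to - 1] < v[mid]) std::swap(v[to - 1], v[mid]);
    const int pivot = v[mid];

    // Hoare-style partition around the pivot value.
    std::ptrdiff_t i = from;
    std::ptrdiff_t j = to - 1;
    while (i <= j) {
      while (v[i] < pivot) ++i;
      while (v[j] > pivot) --j;
      if (i <= j) {
        std::swap(v[i], v[j]);
        ++i;
        --j;
      }
    }

    // Recurse into the smaller part and loop on the larger one,
    // which keeps the recursion depth logarithmic.
    if (j + 1 - from < to - i) {
      HybridQuickSort(v, from, j + 1);
      from = i;
    } else {
      HybridQuickSort(v, i, to);
      to = j + 1;
    }
  }
  InsertionSort(v, from, to);
}
```

Calling HybridQuickSort(v, 0, v.size()) sorts the whole vector. Note that quicksort is not stable, which is one difference from the TimSort implementation discussed next.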

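In recent V8 source, however, sorting is implemented in Torque (the TypeScript-like language above) as an algorithm called TimSort, which is essentially a hybrid of merge sort and binary insertion sort.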

BinaryInsertionSort is the best method for sorting small arrays:
it does few compares, but can do data movement quadratic in the number of elements.
This is an advantage since comparisons are more expensive due to calling into JS.
[low, high) is a contiguous range of an array, and is sorted via binary insertion. This sort is stable. On entry, must have low <= start <= high, and that [low, start) is already sorted. Pass start == low if you do not know!
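To make that contract concrete, here is a minimal C++ sketch of binary insertion sort over [low, high) with a sorted prefix [low, start). It is not the Torque code from array-sort.tq; the function signature and the std::vector container are assumptions chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts v[low, high), given that v[low, start) is already sorted.
// `less(a, b)` returns true when a must come before b.
template <typename T, typename Less>
void BinaryInsertionSort(std::vector<T>& v, std::size_t low, std::size_t start,
                         std::size_t high, Less less) {
  assert(low <= start && start <= high && high <= v.size());
  if (start == low) ++start;  // a single element is trivially sorted

  for (; start < high; ++start) {
    T pivot = v[start];
    std::size_t left = low;
    std::size_t right = start;
    // Binary search for the insertion point. Invariant:
    //   every element of v[low, left) is <= pivot,
    //   every element of v[right, start) is > pivot.
    while (left < right) {
      std::size_t mid = left + (right - left) / 2;
      if (less(pivot, v[mid])) {
        right = mid;
      } else {
        left = mid + 1;  // equal elements stay to the left: keeps the sort stable
      }
    }
    // Shift v[left, start) one slot to the right and drop the pivot in.
    for (std::size_t p = start; p > left; --p) v[p] = v[p - 1];
    v[left] = pivot;
  }
}
```

Each insertion costs about log2(start - low) comparisons but up to start - low element moves, which is the trade-off the quoted comment describes: few calls into the (expensive) comparison function, at the price of quadratic data movement.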

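Note: in the original sort source, insertion sort was used when length <= 22. In the new implementation the minimum run length equals n whenever n < 64 (see ComputeMinRunLength below), so arrays with fewer than 64 elements are sorted entirely by binary insertion sort.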

// Compute a good value for the minimum run length; natural runs shorter
// than this are boosted artificially via binary insertion sort.
//
// If n < 64, return n (it's too small to bother with fancy stuff).
// Else if n is an exact power of 2, return 32.
// Else return an int k, 32 <= k <= 64, such that n/k is close to, but
// strictly less than, an exact power of 2.
//
// See listsort.txt for more info.
macro ComputeMinRunLength(nArg: Smi): Smi {
  let n: Smi = nArg;
  let r: Smi = 0;  // Becomes 1 if any 1 bits are shifted off.

  assert(n >= 0);
  while (n >= 64) {
    r = r | (n & 1);
    n = n >> 1;
  }

  const minRunLength: Smi = n + r;
  assert(nArg < 64 || (32 <= minRunLength && minRunLength <= 64));
  return minRunLength;
}
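To see what the macro computes for concrete inputs, here is a direct C++ port. It assumes a plain int in place of Smi; the logic is otherwise copied line for line from the Torque above.

```cpp
#include <cassert>

// C++ port of the Torque macro above, with int standing in for Smi.
int ComputeMinRunLength(int n) {
  assert(n >= 0);
  int r = 0;  // becomes 1 if any 1 bits are shifted off
  while (n >= 64) {
    r |= n & 1;
    n >>= 1;
  }
  return n + r;
}
```

Working through a few values by hand: 63 is returned unchanged, 64 and 128 (exact powers of two) both give 32, 65 gives 33, and 1000 gives 63. In the last case 1000 / 63 is about 15.9, just below 16, which is the "close to, but strictly less than, an exact power of 2" property mentioned in the comment.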

Path of the sort implementation in the source tree:
third_party/v8/builtins/array-sort.tq

indexOf source

Object* String::IndexOf(Isolate* isolate, Handle<Object> receiver,
                        Handle<Object> search, Handle<Object> position) {
  if (receiver->IsNullOrUndefined(isolate)) {
    THROW_NEW_ERROR_RETURN_FAILURE(
        isolate, NewTypeError(MessageTemplate::kCalledOnNullOrUndefined,
                              isolate->factory()->NewStringFromAsciiChecked(
                                  "String.prototype.indexOf")));
  }
  Handle<String> receiver_string;
  ASSIGN_RETURN_FAILURE_ON_EXCEPTION(isolate, receiver_string,
                                     Object::ToString(isolate, receiver));

  Handle<String> search_string;
  ASSIGN_RETURN_FAILURE_ON_EXCEPTION(isolate, search_string,
                                     Object::ToString(isolate, search));

  ASSIGN_RETURN_FAILURE_ON_EXCEPTION(isolate, position,
                                     Object::ToInteger(isolate, position));

  uint32_t index = receiver_string->ToValidIndex(*position);
  return Smi::FromInt(
      String::IndexOf(isolate, receiver_string, search_string, index));
}

Choosing the search routine that matches the character encoding

int String::IndexOf(Isolate* isolate, Handle<String> receiver,
                    Handle<String> search, int start_index) {
  DCHECK_LE(0, start_index);
  DCHECK(start_index <= receiver->length());

  uint32_t search_length = search->length();
  if (search_length == 0) return start_index;

  uint32_t receiver_length = receiver->length();
  if (start_index + search_length > receiver_length) return -1;

  receiver = String::Flatten(isolate, receiver);
  search = String::Flatten(isolate, search);

  // Heap allocation (and therefore GC) is disallowed in this scope.
  DisallowHeapAllocation no_gc;  // ensure vectors stay valid
  // Extract flattened substrings of cons strings before getting encoding.
  // (presumably: get the flat contents of both strings)
  String::FlatContent receiver_content = receiver->GetFlatContent();
  String::FlatContent search_content = search->GetFlatContent();

  // dispatch on type of strings
  // i.e. choose the routine by the encoding of the search string
  if (search_content.IsOneByte()) {
    Vector<const uint8_t> pat_vector = search_content.ToOneByteVector();
    return SearchString<const uint8_t>(isolate, receiver_content, pat_vector,
                                       start_index);
  }
  Vector<const uc16> pat_vector = search_content.ToUC16Vector();
  return SearchString<const uc16>(isolate, receiver_content, pat_vector,
                                  start_index);
}
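The control flow of the two IndexOf functions above (clamp the position, handle the empty pattern, bail out early when the pattern cannot fit, then search) can be mirrored on std::string. This is a hedged sketch: IndexOfSketch is a hypothetical name, the clamping follows what String.prototype.indexOf specifies (min(max(position, 0), length)) and is assumed to be what ToValidIndex does, and std::string::find stands in for SearchString.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical stand-in for the two-step IndexOf above, on std::string.
int IndexOfSketch(const std::string& receiver, const std::string& search,
                  long long position) {
  const long long length = static_cast<long long>(receiver.size());

  // Clamp the requested position into [0, length].
  const long long start = std::min(std::max(position, 0LL), length);
  assert(0 <= start && start <= length);  // mirrors the DCHECKs above

  // An empty pattern matches immediately at the start index.
  if (search.empty()) return static_cast<int>(start);

  // The pattern cannot fit into what is left of the receiver.
  if (start + static_cast<long long>(search.size()) > length) return -1;

  const std::string::size_type found =
      receiver.find(search, static_cast<std::string::size_type>(start));
  return found == std::string::npos ? -1 : static_cast<int>(found);
}
```

For example, IndexOfSketch("hello world", "o", 5) returns 7, and IndexOfSketch("abc", "", 10) returns 3 because the position is clamped to the string length before the empty-pattern check.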

Next, we step into src/string-search.h:

template <typename SubjectChar, typename PatternChar>
int SearchString(Isolate* isolate,
                 Vector<const SubjectChar> subject,
                 Vector<const PatternChar> pattern,
                 int start_index) {
  StringSearch<PatternChar, SubjectChar> search(isolate, pattern);
  return search.Search(subject, start_index);
}

It defines several search algorithms:

LinearSearch
BoyerMooreSearch
BoyerMooreHorspoolSearch
InitialSearch
SingleCharSearch

Which one is used is decided when the StringSearch object is constructed:

StringSearch(Isolate* isolate, Vector<const PatternChar> pattern)
      : isolate_(isolate),
        pattern_(pattern),
        start_(Max(0, pattern.length() - kBMMaxShift)) {
    if (sizeof(PatternChar) > sizeof(SubjectChar)) {
      if (!IsOneByteString(pattern_)) {
        strategy_ = &FailSearch;
        return;
      }
    }
    int pattern_length = pattern_.length();
    if (pattern_length < kBMMinPatternLength) {
      if (pattern_length == 1) {
        strategy_ = &SingleCharSearch;
        return;
      }
      strategy_ = &LinearSearch;
      return;
    }
    strategy_ = &InitialSearch;
  }

  int Search(Vector<const SubjectChar> subject, int index) {
    return strategy_(this, subject, index);
  }
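The constructor above picks a strategy once and stores it as a function pointer, so Search is a single indirect call. Below is a simplified C++ sketch of that dispatch pattern on std::string. Several things are assumptions for the sake of the example: the class name StringSearchSketch, the length threshold of 7 (standing in for kBMMinPatternLength, whose value is not shown above), and the use of a plain Boyer-Moore-Horspool search in place of InitialSearch. The FailSearch branch for mixed encodings is left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

class StringSearchSketch {
 public:
  explicit StringSearchSketch(std::string pattern)
      : pattern_(std::move(pattern)) {
    // Choose the strategy once, based on the pattern length.
    if (pattern_.size() < kMinHorspoolPatternLength) {
      strategy_ = pattern_.size() == 1 ? &SingleCharSearch : &LinearSearch;
    } else {
      strategy_ = &HorspoolSearch;
    }
  }

  // Returns the first match at or after `index`, or -1.
  int Search(const std::string& subject, std::size_t index) const {
    assert(index <= subject.size());
    return strategy_(*this, subject, index);
  }

 private:
  using Strategy = int (*)(const StringSearchSketch&, const std::string&,
                           std::size_t);
  // Assumed threshold for this sketch only.
  static constexpr std::size_t kMinHorspoolPatternLength = 7;

  static int SingleCharSearch(const StringSearchSketch& s,
                              const std::string& subject, std::size_t index) {
    const std::string::size_type found = subject.find(s.pattern_[0], index);
    return found == std::string::npos ? -1 : static_cast<int>(found);
  }

  // Tries every alignment in turn.
  static int LinearSearch(const StringSearchSketch& s,
                          const std::string& subject, std::size_t index) {
    const std::string& p = s.pattern_;
    for (; index + p.size() <= subject.size(); ++index) {
      if (subject.compare(index, p.size(), p) == 0) return static_cast<int>(index);
    }
    return -1;
  }

  // Boyer-Moore-Horspool: skip ahead using the subject character that is
  // aligned with the last pattern position.
  static int HorspoolSearch(const StringSearchSketch& s,
                            const std::string& subject, std::size_t index) {
    const std::string& p = s.pattern_;
    const std::size_t m = p.size();
    std::array<std::size_t, 256> shift;
    shift.fill(m);  // characters absent from the pattern allow a full skip
    for (std::size_t i = 0; i + 1 < m; ++i) {
      shift[static_cast<unsigned char>(p[i])] = m - 1 - i;
    }
    while (index + m <= subject.size()) {
      if (subject.compare(index, m, p) == 0) return static_cast<int>(index);
      index += shift[static_cast<unsigned char>(subject[index + m - 1])];
    }
    return -1;
  }

  std::string pattern_;
  Strategy strategy_;
};
```

With this sketch, StringSearchSketch("o").Search("hello world", 5) takes the single-character path and returns 7, while StringSearchSketch("lo world").Search("hello world", 0) takes the Horspool path and returns 3. The appeal of the design is that the cost of choosing an algorithm is paid once per pattern, not once per search.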

split source

How split is implemented in V8