InnoDB索引同层非叶子节点间,也是双向链表吗?

440 阅读4分钟

我们在看索引原理时,可以了解到InnoDB的索引实现是通过B+树实现的。
针对主键索引,非叶子结点存储主键字段,叶子节点存储真正的行数据。
针对非主键索引,如联合索引,非叶子结点存储索引字段,叶子节点存储主键ID,如果需要查询索引列及主键列之外的字段,还需要根据主键ID回表在主键索引树中找到叶子节点。 我们还知道,为了便于范围遍历,索引的叶子节点间用双向链表形式连接,叶子节点内部用单向链表连接。

看到这里,我萌生了一个疑问,

索引同层的非叶子结点间,是否也用双线链表维系呢?

在很多讲解配图中,同层的非叶子结点间,没有建立直接联系 image.png

猜测

首先自己提出猜测,作为一个树形结构,每个节点的结构体或属性应该是一样的,我尝试用伪代码描述一下我猜测的结构体形式。

class TreeNode {
    int index; //索引列实际值
    List<TreeNode> subNodeList; // 子节点
    TreeNode pre, next; // 前驱,后续节点
    Row row; // 行数据,主键索引为行数据,非主键索引为主键ID
}

下载MySQL源码
访问dev.mysql.com/downloads/m… ,选择下载源码 image.png

在storage/innobase/btr这个索引目录中,找到btr0btr.cc,找到创建索引的过程,略读代码发现,前驱、后置指针的建立,没有叶子节点、非叶子结点的判断,因此我们的猜测成立,InnoDB B+树索引的每一层节点间,都通过前后指针组成双向链表相互连接

ulint btr_create(ulint type, space_id_t space, space_index_t index_id,
                 dict_index_t *index, mtr_t *mtr) {
  page_no_t page_no;
  buf_block_t *block;
  buf_frame_t *frame;
  page_t *page;
  page_zip_des_t *page_zip;

  ut_ad(index_id != BTR_FREED_INDEX_ID);

  /* Create the two new segments (one, in the case of an ibuf tree) for
  the index tree; the segment headers are put on the allocated root page
  (for an ibuf tree, not in the root, but on a separate ibuf header
  page) */

  if (type & DICT_IBUF) {
    /* Allocate first the ibuf header page */
    buf_block_t *ibuf_hdr_block =
        fseg_create(space, 0, IBUF_HEADER + IBUF_TREE_SEG_HEADER, mtr);

    if (ibuf_hdr_block == nullptr) {
      return (FIL_NULL);
    }

    buf_block_dbg_add_level(ibuf_hdr_block, SYNC_IBUF_TREE_NODE_NEW);

    ut_ad(ibuf_hdr_block->page.id.page_no() == IBUF_HEADER_PAGE_NO);
    /* Allocate then the next page to the segment: it will be the
    tree root page */

    block = fseg_alloc_free_page(buf_block_get_frame(ibuf_hdr_block) +
                                     IBUF_HEADER + IBUF_TREE_SEG_HEADER,
                                 IBUF_TREE_ROOT_PAGE_NO, FSP_UP, mtr);
    ut_ad(block->page.id.page_no() == IBUF_TREE_ROOT_PAGE_NO);
  } else {
    block = fseg_create(space, 0, PAGE_HEADER + PAGE_BTR_SEG_TOP, mtr);
  }

  if (block == nullptr) {
    return (FIL_NULL);
  }

  page_no = block->page.id.page_no();
  frame = buf_block_get_frame(block);

  if (type & DICT_IBUF) {
    /* It is an insert buffer tree: initialize the free list */
    buf_block_dbg_add_level(block, SYNC_IBUF_TREE_NODE_NEW);

    ut_ad(page_no == IBUF_TREE_ROOT_PAGE_NO);

    flst_init(frame + PAGE_HEADER + PAGE_BTR_IBUF_FREE_LIST, mtr);
  } else {
    /* It is a non-ibuf tree: create a file segment for leaf
    pages */
    buf_block_dbg_add_level(block, SYNC_TREE_NODE_NEW);

    if (!fseg_create(space, page_no, PAGE_HEADER + PAGE_BTR_SEG_LEAF, mtr)) {
      /* Not enough space for new segment, free root
      segment before return. */
      btr_free_root(block, mtr);
      if (!index->table->is_temporary()) {
        btr_free_root_invalidate(block, mtr);
      }

      return (FIL_NULL);
    }

    /* The fseg create acquires a second latch on the page,
    therefore we must declare it: */
    buf_block_dbg_add_level(block, SYNC_TREE_NODE_NEW);
  }

  uint16_t page_create_type;
  if (dict_index_is_spatial(index)) {
    page_create_type = FIL_PAGE_RTREE;
  } else if (dict_index_is_sdi(index)) {
    page_create_type = FIL_PAGE_SDI;
  } else {
    page_create_type = FIL_PAGE_INDEX;
  }

  /* Create a new index page on the allocated segment page */
  page_zip = buf_block_get_page_zip(block);

  if (page_zip) {
    page = page_create_zip(block, index, 0, 0, mtr, page_create_type);
  } else {
    page = page_create(block, mtr, dict_table_is_comp(index->table),
                       page_create_type);
    /* Set the level of the new index page */
    btr_page_set_level(page, nullptr, 0, mtr);
  }

  /* Set the index id of the page */
  btr_page_set_index_id(page, page_zip, index_id, mtr);

  /* Set the next node and previous node fields */
  /* 我们要找的在这 */
  btr_page_set_next(page, page_zip, FIL_NULL, mtr);
  btr_page_set_prev(page, page_zip, FIL_NULL, mtr);

  /* We reset the free bits for the page to allow creation of several
  trees in the same mtr, otherwise the latch on a bitmap page would
  prevent it because of the latching order.

  Note: Insert Buffering is disabled for temporary tables given that
  most temporary tables are smaller in size and short-lived. */
  if (!(type & DICT_CLUSTERED) && !index->table->is_temporary()) {
    ibuf_reset_free_bits(block);
  }

  /* In the following assertion we test that two records of maximum
  allowed size fit on the root page: this fact is needed to ensure
  correctness of split algorithms */

  ut_ad(page_get_max_insert_size(page, 2) > 2 * BTR_PAGE_MAX_REC_SIZE);

  buf_stat_per_index->inc(index_id_t(space, index_id));

  return (page_no);
}

这里又来了另一个问题

非叶子结点各层的双向链表,连起来起了什么作用呢?

MySQL的执行过程

客户端连接MySQL,执行一条SQL语句,大致过程为:

  • 连接管理
  • 解析与优化
    • 查询缓存(表结构或表数据一旦变化,需要重新构建缓存)
    • 语法解析,分析对那个表做什么操作
    • 查询优化,可通过EXPLAIN分析执行计划
  • 存储引擎
    • InnoDB、MyIASM等

image.png

这里还是做一个推测,因为EXPLAIN分析执行计划,有可能没有用到你原本设定的索引,反而走了全表扫描,因为MySQL认定走索引的效率和走全表扫描接近或全表扫描效率更高。
MySQL会结合IO成本、CPU成本,判断最终是走索引还是走全表扫描
MySQL如何快速做这种判断呢,推测可能是通过非叶子结点的双向链表,做初步估计,大致会扫描多少字段。

个人能力有限,MySQL源码完全啃不动,看完知乎大佬的解答,恍然大悟 image.png

参考:

小册《MySQL是怎样运行的:从根儿上理解MySQL》 www.zhihu.com/question/47…