我们在看索引原理时,可以了解到InnoDB的索引实现是通过B+树实现的。
针对主键索引,非叶子结点存储主键字段,叶子节点存储真正的行数据。
针对非主键索引,如联合索引,非叶子结点存储索引字段,叶子节点存储主键ID,如果需要查询索引列及主键列之外的字段,还需要根据主键ID回表在主键索引树中找到叶子节点。
我们还知道,为了便于范围遍历,索引的叶子节点间用双向链表形式连接,叶子节点内部用单向链表连接。
看到这里,我萌生了一个疑问,
索引同层的非叶子结点间,是否也用双线链表维系呢?
在很多讲解配图中,同层的非叶子结点间,没有建立直接联系
猜测
首先自己提出猜测,作为一个树形结构,每个节点的结构体或属性应该是一样的,我尝试用伪代码描述一下我猜测的结构体形式。
class TreeNode {
int index; //索引列实际值
List<TreeNode> subNodeList; // 子节点
TreeNode pre, next; // 前驱,后续节点
Row row; // 行数据,主键索引为行数据,非主键索引为主键ID
}
下载MySQL源码
访问dev.mysql.com/downloads/m… ,选择下载源码
在storage/innobase/btr这个索引目录中,找到btr0btr.cc,找到创建索引的过程,略读代码发现,前驱、后置指针的建立,没有叶子节点、非叶子结点的判断,因此我们的猜测成立,InnoDB B+树索引的每一层节点间,都通过前后指针组成双向链表相互连接。
ulint btr_create(ulint type, space_id_t space, space_index_t index_id,
dict_index_t *index, mtr_t *mtr) {
page_no_t page_no;
buf_block_t *block;
buf_frame_t *frame;
page_t *page;
page_zip_des_t *page_zip;
ut_ad(index_id != BTR_FREED_INDEX_ID);
/* Create the two new segments (one, in the case of an ibuf tree) for
the index tree; the segment headers are put on the allocated root page
(for an ibuf tree, not in the root, but on a separate ibuf header
page) */
if (type & DICT_IBUF) {
/* Allocate first the ibuf header page */
buf_block_t *ibuf_hdr_block =
fseg_create(space, 0, IBUF_HEADER + IBUF_TREE_SEG_HEADER, mtr);
if (ibuf_hdr_block == nullptr) {
return (FIL_NULL);
}
buf_block_dbg_add_level(ibuf_hdr_block, SYNC_IBUF_TREE_NODE_NEW);
ut_ad(ibuf_hdr_block->page.id.page_no() == IBUF_HEADER_PAGE_NO);
/* Allocate then the next page to the segment: it will be the
tree root page */
block = fseg_alloc_free_page(buf_block_get_frame(ibuf_hdr_block) +
IBUF_HEADER + IBUF_TREE_SEG_HEADER,
IBUF_TREE_ROOT_PAGE_NO, FSP_UP, mtr);
ut_ad(block->page.id.page_no() == IBUF_TREE_ROOT_PAGE_NO);
} else {
block = fseg_create(space, 0, PAGE_HEADER + PAGE_BTR_SEG_TOP, mtr);
}
if (block == nullptr) {
return (FIL_NULL);
}
page_no = block->page.id.page_no();
frame = buf_block_get_frame(block);
if (type & DICT_IBUF) {
/* It is an insert buffer tree: initialize the free list */
buf_block_dbg_add_level(block, SYNC_IBUF_TREE_NODE_NEW);
ut_ad(page_no == IBUF_TREE_ROOT_PAGE_NO);
flst_init(frame + PAGE_HEADER + PAGE_BTR_IBUF_FREE_LIST, mtr);
} else {
/* It is a non-ibuf tree: create a file segment for leaf
pages */
buf_block_dbg_add_level(block, SYNC_TREE_NODE_NEW);
if (!fseg_create(space, page_no, PAGE_HEADER + PAGE_BTR_SEG_LEAF, mtr)) {
/* Not enough space for new segment, free root
segment before return. */
btr_free_root(block, mtr);
if (!index->table->is_temporary()) {
btr_free_root_invalidate(block, mtr);
}
return (FIL_NULL);
}
/* The fseg create acquires a second latch on the page,
therefore we must declare it: */
buf_block_dbg_add_level(block, SYNC_TREE_NODE_NEW);
}
uint16_t page_create_type;
if (dict_index_is_spatial(index)) {
page_create_type = FIL_PAGE_RTREE;
} else if (dict_index_is_sdi(index)) {
page_create_type = FIL_PAGE_SDI;
} else {
page_create_type = FIL_PAGE_INDEX;
}
/* Create a new index page on the allocated segment page */
page_zip = buf_block_get_page_zip(block);
if (page_zip) {
page = page_create_zip(block, index, 0, 0, mtr, page_create_type);
} else {
page = page_create(block, mtr, dict_table_is_comp(index->table),
page_create_type);
/* Set the level of the new index page */
btr_page_set_level(page, nullptr, 0, mtr);
}
/* Set the index id of the page */
btr_page_set_index_id(page, page_zip, index_id, mtr);
/* Set the next node and previous node fields */
/* 我们要找的在这 */
btr_page_set_next(page, page_zip, FIL_NULL, mtr);
btr_page_set_prev(page, page_zip, FIL_NULL, mtr);
/* We reset the free bits for the page to allow creation of several
trees in the same mtr, otherwise the latch on a bitmap page would
prevent it because of the latching order.
Note: Insert Buffering is disabled for temporary tables given that
most temporary tables are smaller in size and short-lived. */
if (!(type & DICT_CLUSTERED) && !index->table->is_temporary()) {
ibuf_reset_free_bits(block);
}
/* In the following assertion we test that two records of maximum
allowed size fit on the root page: this fact is needed to ensure
correctness of split algorithms */
ut_ad(page_get_max_insert_size(page, 2) > 2 * BTR_PAGE_MAX_REC_SIZE);
buf_stat_per_index->inc(index_id_t(space, index_id));
return (page_no);
}
这里又来了另一个问题
非叶子结点各层的双向链表,连起来起了什么作用呢?
MySQL的执行过程
客户端连接MySQL,执行一条SQL语句,大致过程为:
- 连接管理
- 解析与优化
- 查询缓存(表结构或表数据一旦变化,需要重新构建缓存)
- 语法解析,分析对那个表做什么操作
- 查询优化,可通过EXPLAIN分析执行计划
- 存储引擎
- InnoDB、MyIASM等
这里还是做一个推测,因为EXPLAIN分析执行计划,有可能没有用到你原本设定的索引,反而走了全表扫描,因为MySQL认定走索引的效率和走全表扫描接近或全表扫描效率更高。
MySQL会结合IO成本、CPU成本,判断最终是走索引还是走全表扫描
MySQL如何快速做这种判断呢,推测可能是通过非叶子结点的双向链表,做初步估计,大致会扫描多少字段。
个人能力有限,MySQL源码完全啃不动,看完知乎大佬的解答,恍然大悟
参考:
小册《MySQL是怎样运行的:从根儿上理解MySQL》 www.zhihu.com/question/47…