有这样两条深分页的语句
mysql> select * from t where c1='a' order by id limit 10000,10;
+--------+------+----+-------+-------+
| id | c1 | c2 | c3 | c4 |
+--------+------+----+-------+-------+
| 261261 | a | b | 36f2a | f0781 |
| 261262 | a | b | 36f2a | 6dd61 |
| 261263 | a | b | 36f2a | 5f112 |
| 261264 | a | b | 36f2a | e3f79 |
| 261265 | a | b | 36f2a | 4b09b |
| 261266 | a | b | 36f2a | d70bc |
| 261267 | a | b | 36f2a | 21698 |
| 261268 | a | b | 36f2a | 61687 |
| 261269 | a | b | 37e56 | ea453 |
| 261270 | a | b | 37e56 | 81790 |
+--------+------+----+-------+-------+
10 rows in set (0.36 sec)
mysql> select * from t where c1='a' and c3='a' order by id limit 10000,10;
+--------+------+----+----+-------+
| id | c1 | c2 | c3 | c4 |
+--------+------+----+----+-------+
| 607993 | a | b | a | bfc97 |
| 607994 | a | b | a | 6ddf3 |
| 607995 | a | b | a | ec69e |
| 607996 | a | b | a | 23187 |
| 607997 | a | b | a | 473aa |
| 607998 | a | b | a | f30bb |
| 607999 | a | b | a | 134a0 |
| 608000 | a | b | a | a80a5 |
| 608001 | a | b | a | b6bb0 |
| 608002 | a | b | a | 6490b |
+--------+------+----+----+-------+
10 rows in set (1.16 sec)
表结构
mysql> show create table t\G
*************************** 1. row ***************************
Table: t
Create Table: CREATE TABLE `t` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`c1` char(64) DEFAULT '00' COMMENT 'testsss',
`c2` char(64) NOT NULL DEFAULT '00000',
`c3` char(64) NOT NULL,
`c4` char(64) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_c1` (`c1`),
KEY `idx_c2` (`c2`)
) ENGINE=InnoDB AUTO_INCREMENT=3469244 DEFAULT CHARSET=utf8mb4 COMMENT='test'
1 row in set (0.00 sec)
LIMIT执行方式
以第一条为例,来看下LIMIT的执行方式
先来看一下SQL的执行计划:
mysql> explain select * from t where c1='a' and c3='a' order by id limit 10000,10;
+----+-------------+-------+------------+------+---------------+--------+---------+-------+---------+----------+------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+--------+---------+-------+---------+----------+------------------------------------+
| 1 | SIMPLE | t | NULL | ref | idx_c1 | idx_c1 | 257 | const | 1503140 | 10.00 | Using index condition; Using where |
+----+-------------+-------+------------+------+---------------+--------+---------+-------+---------+----------+------------------------------------+
1 row in set, 1 warning (0.00 sec)
其执行过程大致为:
- Server收到客户端请求后,对SQL语句进行解析,并最终生成执行计划,决定走idx_c1索引
- 引擎层通过idx_c1快速定位到满足c1='a'的第一条记录,然后回表获取到完整的数据,返回给Server
- Server收到记录后,如果有其他的条件则进行判断(本例中没有其他条件,进行b)
- 不满足则直接丢弃
- 满足的话,由于有limit 10000,10的存在,如果是前10000行,直接丢弃;如果是10000行之后,则返回给客户端
- 引擎层通过idx_c1定位到满足c1='a'的下一条记录,回表获取到完整的数据,返回给Server层
- 重复3-4步骤,直到返回给客户端10条记录,则停止
可以看到即使客户端只需要10000行后的记录,MySQL也需要到引擎层将前10000行记录读取到Server层,然后丢弃。
因此,会发现在offset值很小的时候,SQL执行很快,随着offset值变大,SQL扫描量的行数越来越多,执行越来越慢
了解了其慢的消耗点之后,再看看下如何进行优化
优化方案
方案一:使用覆盖索引+子查询优化
有两种写法
写法一:
- 执行时间由原来的360ms变为20ms
SELECT *
FROM t
JOIN
(SELECT id
FROM t
WHERE c1='a'
ORDER BY id limit 10000,10) a
ON t.id=a.id;
+-------+------+----+-------+-------+-------+
| id | c1 | c2 | c3 | c4 | id |
+-------+------+----+-------+-------+-------+
| 20225 | a | c | 66d78 | 83f51 | 20225 |
| 20226 | a | c | 66d78 | fe957 | 20226 |
| 20227 | a | c | 66d78 | a0c1f | 20227 |
| 20228 | a | c | 66d78 | dedc1 | 20228 |
| 20229 | a | c | 66d78 | 164f7 | 20229 |
| 20230 | a | c | 66d78 | 29a58 | 20230 |
| 20231 | a | c | 66d78 | fae0a | 20231 |
| 20232 | a | c | 66d78 | 8e51d | 20232 |
| 20233 | a | c | 66d78 | 00e64 | 20233 |
| 20234 | a | c | 66d78 | ff6ad | 20234 |
+-------+------+----+-------+-------+-------+
10 rows in set (0.02 sec)
其执行计划如下:
- 和原SQL不同的是,改写后的SQL先执行子查询并存储到物化表(临时表),这个过程使用idx_c1索引获取id字段,由于普通索引本身上存储主键值,不需要再回表(Extra:Using index)
- 外层表和临时表进行关联,关联后的记录再进行回表,这里临时表只有10条记录,则回表的记录数为10
mysql> explain SELECT * FROM t JOIN (SELECT id FROM t WHERE c1='a' ORDER BY id limit 10000,10) a ON t.id=a.id;
+----+-------------+------------+------------+--------+---------------+---------+---------+-------+---------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+---------------+---------+---------+-------+---------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 10010 | 100.00 | NULL |
| 1 | PRIMARY | t | NULL | eq_ref | PRIMARY | PRIMARY | 4 | a.id | 1 | 100.00 | NULL |
| 2 | DERIVED | t | NULL | ref | idx_c1 | idx_c1 | 257 | const | 1503140 | 100.00 | Using where; Using index |
+----+-------------+------------+------------+--------+---------------+---------+---------+-------+---------+----------+--------------------------+
3 rows in set, 1 warning (0.00 sec)
需要注意的是:
- 这种改写方式相对于原SQL减少了回表的成本,但在offset值很大的时候,扫描仍然行数很多,仍然可能效率差
- 改写后的子查询能使用覆盖索引的情况才能这么改,比如,对于SQL2,由于还有c3的条件,则仍然需要回表,并不会起到优化的效果,如下,执行时间相差不大
mysql> select * from t join (select id from t where c1='a' and c3='a' order by id limit 10000,10) a on t.id=a.id;
+--------+------+----+----+-------+--------+
| id | c1 | c2 | c3 | c4 | id |
+--------+------+----+----+-------+--------+
| 607993 | a | b | a | bfc97 | 607993 |
| 607994 | a | b | a | 6ddf3 | 607994 |
| 607995 | a | b | a | ec69e | 607995 |
| 607996 | a | b | a | 23187 | 607996 |
| 607997 | a | b | a | 473aa | 607997 |
| 607998 | a | b | a | f30bb | 607998 |
| 607999 | a | b | a | 134a0 | 607999 |
| 608000 | a | b | a | a80a5 | 608000 |
| 608001 | a | b | a | b6bb0 | 608001 |
| 608002 | a | b | a | 6490b | 608002 |
+--------+------+----+----+-------+--------+
10 rows in set (1.11 sec)
-- 执行计划
mysql> explain select * from t join (select id from t where c1='a' and c3='a' order by id limit 10000,10) a on t.id=a.id;
+----+-------------+------------+------------+--------+---------------+---------+---------+-------+---------+----------+------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+---------------+---------+---------+-------+---------+----------+------------------------------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 10010 | 100.00 | NULL |
| 1 | PRIMARY | t | NULL | eq_ref | PRIMARY | PRIMARY | 4 | a.id | 1 | 100.00 | NULL |
| 2 | DERIVED | t | NULL | ref | idx_c1 | idx_c1 | 257 | const | 1503140 | 10.00 | Using index condition; Using where |
+----+-------------+------------+------------+--------+---------------+---------+---------+-------+---------+----------+------------------------------------+
3 rows in set, 1 warning (0.00 sec)
写法二:
- 耗时和写法1相同
SELECT *
FROM t
WHERE id>=
(SELECT id
FROM t
WHERE c1='a'
ORDER BY id limit 10000,1)
+-------+------+----+-------+-------+
| id | c1 | c2 | c3 | c4 |
+-------+------+----+-------+-------+
| 20225 | a | c | 66d78 | 83f51 |
| 20226 | a | c | 66d78 | fe957 |
| 20227 | a | c | 66d78 | a0c1f |
| 20228 | a | c | 66d78 | dedc1 |
| 20229 | a | c | 66d78 | 164f7 |
| 20230 | a | c | 66d78 | 29a58 |
| 20231 | a | c | 66d78 | fae0a |
| 20232 | a | c | 66d78 | 8e51d |
| 20233 | a | c | 66d78 | 00e64 |
| 20234 | a | c | 66d78 | ff6ad |
+-------+------+----+-------+-------+
10 rows in set (0.02 sec)
-- 执行计划
mysql> explain select * from t where id>=(select id from t where c1='a' order by id limit 10000,1) limit 10;
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
| 1 | PRIMARY | t | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 1503140 | 100.00 | Using where |
| 2 | SUBQUERY | t | NULL | ref | idx_c1 | idx_c1 | 257 | const | 1503140 | 100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
2 rows in set, 1 warning (0.02 sec)
方案二:起始位置重定义
-- 首次执行,记录最大ID
select * from t where c1='a' order by id limit 10;
-- 后续执行
select * from t where c1='a' and id>20224 order by id limit 10;
+-------+------+----+-------+-------+
| id | c1 | c2 | c3 | c4 |
+-------+------+----+-------+-------+
| 20225 | a | c | 66d78 | 83f51 |
| 20226 | a | c | 66d78 | fe957 |
| 20227 | a | c | 66d78 | a0c1f |
| 20228 | a | c | 66d78 | dedc1 |
| 20229 | a | c | 66d78 | 164f7 |
| 20230 | a | c | 66d78 | 29a58 |
| 20231 | a | c | 66d78 | fae0a |
| 20232 | a | c | 66d78 | 8e51d |
| 20233 | a | c | 66d78 | 00e64 |
| 20234 | a | c | 66d78 | ff6ad |
+-------+------+----+-------+-------+
10 rows in set (0.01 sec)
在适用场景上:
- 由于该种写法依赖于上一次查到的id值,在跳页的情况下的情况下,会有问题。比如刚刚查完第25页,马上调到35页,数据就会不对。
- 因此比较适合不会跳页的页面加载场景或JOB种分批拉取数据的场景。
在执行效率上:
- 该种方式的好处是在走二级索引idx_c1的情况下,当c1='a'时,主键是有序的,可以快速定位到id>的边界,顺序扫描即可,执行效率不会受到页偏移的影响
需要注意的是:
- 对于还有非索引上的条件,例如第二条SQL,需要回表进行判断记录是否满足条件,执行的时间还可能受到c3的分布的影响,在找到满足条件的记录则停止查找,否则将一直向后扫描
另外,该种写法,在某些情况下会可能会走到索引合并或主键索引,执行方式有所变化,详细可参见:
方案三:业务限制
前面两种方式都有其一定的使用限制,在不得不深分页的情况下,还可以在业务逻辑上进行限制,如:
- 限制业务允许查看的最大页数
- 或limit的偏移量超过某个值时返回空数据
这种方式认为当查询的偏移量超过某个值,就不是在分页了,而是在刷数据,如果是要找数据,应该输入合适的条件来缩小范围