有这样两条深分页的语句

mysql> select * from t where c1='a'  order by id limit 10000,10;
+--------+------+----+-------+-------+
| id     | c1   | c2 | c3    | c4    |
+--------+------+----+-------+-------+
| 261261 | a    | b  | 36f2a | f0781 |
| 261262 | a    | b  | 36f2a | 6dd61 |
| 261263 | a    | b  | 36f2a | 5f112 |
| 261264 | a    | b  | 36f2a | e3f79 |
| 261265 | a    | b  | 36f2a | 4b09b |
| 261266 | a    | b  | 36f2a | d70bc |
| 261267 | a    | b  | 36f2a | 21698 |
| 261268 | a    | b  | 36f2a | 61687 |
| 261269 | a    | b  | 37e56 | ea453 |
| 261270 | a    | b  | 37e56 | 81790 |
+--------+------+----+-------+-------+
10 rows in set (0.36 sec)

mysql> select * from t where c1='a' and c3='a' order by id limit 10000,10;
+--------+------+----+----+-------+
| id     | c1   | c2 | c3 | c4    |
+--------+------+----+----+-------+
| 607993 | a    | b  | a  | bfc97 |
| 607994 | a    | b  | a  | 6ddf3 |
| 607995 | a    | b  | a  | ec69e |
| 607996 | a    | b  | a  | 23187 |
| 607997 | a    | b  | a  | 473aa |
| 607998 | a    | b  | a  | f30bb |
| 607999 | a    | b  | a  | 134a0 |
| 608000 | a    | b  | a  | a80a5 |
| 608001 | a    | b  | a  | b6bb0 |
| 608002 | a    | b  | a  | 6490b |
+--------+------+----+----+-------+
10 rows in set (1.16 sec)

表结构


mysql> show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `c1` char(64) DEFAULT '00' COMMENT 'testsss',
  `c2` char(64) NOT NULL DEFAULT '00000',
  `c3` char(64) NOT NULL,
  `c4` char(64) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_c1` (`c1`),
  KEY `idx_c2` (`c2`)
) ENGINE=InnoDB AUTO_INCREMENT=3469244 DEFAULT CHARSET=utf8mb4 COMMENT='test'
1 row in set (0.00 sec)

LIMIT执行方式

以第一条为例，来看下LIMIT的执行方式

先来看一下SQL的执行计划：


mysql> explain select * from t where c1='a' and c3='a' order by id limit 10000,10;
+----+-------------+-------+------------+------+---------------+--------+---------+-------+---------+----------+------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key    | key_len | ref   | rows    | filtered | Extra                              |
+----+-------------+-------+------------+------+---------------+--------+---------+-------+---------+----------+------------------------------------+
|  1 | SIMPLE      | t     | NULL       | ref  | idx_c1        | idx_c1 | 257     | const | 1503140 |    10.00 | Using index condition; Using where |
+----+-------------+-------+------------+------+---------------+--------+---------+-------+---------+----------+------------------------------------+
1 row in set, 1 warning (0.00 sec)

其执行过程大致为：

Server收到客户端请求后，对SQL语句进行解析，并最终生成执行计划，决定走idx_c1索引
引擎层通过idx_c1快速定位到满足c1='a'的第一条记录，然后回表获取到完整的数据，返回给Server
Server收到记录后，如果有其他的条件则进行判断（本例中没有其他条件，进行b）

不满足则直接丢弃
满足的话，由于有limit 10000,10的存在，如果是前10000行，直接丢弃；如果是10000行之后，则返回给客户端

引擎层通过idx_c1定位到满足c1='a'的下一条记录，回表获取到完整的数据，返回给Server层
重复3-4步骤，直到返回给客户端10条记录，则停止

可以看到即使客户端只需要10000行后的记录，MySQL也需要到引擎层将前10000行记录读取到Server层，然后丢弃。

因此，会发现在offset值很小的时候，SQL执行很快，随着offset值变大，SQL扫描量的行数越来越多，执行越来越慢

了解了其慢的消耗点之后，再看看下如何进行优化

优化方案

方案一：使用覆盖索引+子查询优化

有两种写法

写法一：

执行时间由原来的360ms变为20ms

SELECT *
FROM t
JOIN 
    (SELECT id
    FROM t
    WHERE c1='a'
    ORDER BY  id limit 10000,10) a
    ON t.id=a.id;
+-------+------+----+-------+-------+-------+
| id    | c1   | c2 | c3    | c4    | id    |
+-------+------+----+-------+-------+-------+
| 20225 | a    | c  | 66d78 | 83f51 | 20225 |
| 20226 | a    | c  | 66d78 | fe957 | 20226 |
| 20227 | a    | c  | 66d78 | a0c1f | 20227 |
| 20228 | a    | c  | 66d78 | dedc1 | 20228 |
| 20229 | a    | c  | 66d78 | 164f7 | 20229 |
| 20230 | a    | c  | 66d78 | 29a58 | 20230 |
| 20231 | a    | c  | 66d78 | fae0a | 20231 |
| 20232 | a    | c  | 66d78 | 8e51d | 20232 |
| 20233 | a    | c  | 66d78 | 00e64 | 20233 |
| 20234 | a    | c  | 66d78 | ff6ad | 20234 |
+-------+------+----+-------+-------+-------+
10 rows in set (0.02 sec)

其执行计划如下：

和原SQL不同的是，改写后的SQL先执行子查询并存储到物化表（临时表），这个过程使用idx_c1索引获取id字段，由于普通索引本身上存储主键值，不需要再回表（Extra:Using index）
外层表和临时表进行关联，关联后的记录再进行回表，这里临时表只有10条记录，则回表的记录数为10


mysql> explain SELECT * FROM t JOIN      (SELECT id     FROM t     WHERE c1='a'     ORDER BY  id limit 10000,10) a     ON t.id=a.id;
+----+-------------+------------+------------+--------+---------------+---------+---------+-------+---------+----------+--------------------------+
| id | select_type | table      | partitions | type   | possible_keys | key     | key_len | ref   | rows    | filtered | Extra                    |
+----+-------------+------------+------------+--------+---------------+---------+---------+-------+---------+----------+--------------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL          | NULL    | NULL    | NULL  |   10010 |   100.00 | NULL                     |
|  1 | PRIMARY     | t          | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | a.id  |       1 |   100.00 | NULL                     |
|  2 | DERIVED     | t          | NULL       | ref    | idx_c1        | idx_c1  | 257     | const | 1503140 |   100.00 | Using where; Using index |
+----+-------------+------------+------------+--------+---------------+---------+---------+-------+---------+----------+--------------------------+
3 rows in set, 1 warning (0.00 sec)

需要注意的是：

这种改写方式相对于原SQL减少了回表的成本，但在offset值很大的时候，扫描仍然行数很多，仍然可能效率差
改写后的子查询能使用覆盖索引的情况才能这么改，比如，对于SQL2，由于还有c3的条件，则仍然需要回表，并不会起到优化的效果，如下，执行时间相差不大


mysql> select * from t  join (select id from t where c1='a' and c3='a' order by id limit 10000,10) a on t.id=a.id;
+--------+------+----+----+-------+--------+
| id     | c1   | c2 | c3 | c4    | id     |
+--------+------+----+----+-------+--------+
| 607993 | a    | b  | a  | bfc97 | 607993 |
| 607994 | a    | b  | a  | 6ddf3 | 607994 |
| 607995 | a    | b  | a  | ec69e | 607995 |
| 607996 | a    | b  | a  | 23187 | 607996 |
| 607997 | a    | b  | a  | 473aa | 607997 |
| 607998 | a    | b  | a  | f30bb | 607998 |
| 607999 | a    | b  | a  | 134a0 | 607999 |
| 608000 | a    | b  | a  | a80a5 | 608000 |
| 608001 | a    | b  | a  | b6bb0 | 608001 |
| 608002 | a    | b  | a  | 6490b | 608002 |
+--------+------+----+----+-------+--------+
10 rows in set (1.11 sec)

-- 执行计划
mysql> explain select * from t  join (select id from t where c1='a' and c3='a' order by id limit 10000,10) a on t.id=a.id;
+----+-------------+------------+------------+--------+---------------+---------+---------+-------+---------+----------+------------------------------------+
| id | select_type | table      | partitions | type   | possible_keys | key     | key_len | ref   | rows    | filtered | Extra                              |
+----+-------------+------------+------------+--------+---------------+---------+---------+-------+---------+----------+------------------------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL          | NULL    | NULL    | NULL  |   10010 |   100.00 | NULL                               |
|  1 | PRIMARY     | t          | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | a.id  |       1 |   100.00 | NULL                               |
|  2 | DERIVED     | t          | NULL       | ref    | idx_c1        | idx_c1  | 257     | const | 1503140 |    10.00 | Using index condition; Using where |
+----+-------------+------------+------------+--------+---------------+---------+---------+-------+---------+----------+------------------------------------+
3 rows in set, 1 warning (0.00 sec)

写法二：

耗时和写法1相同

SELECT *
FROM t
WHERE id>=
    (SELECT id
    FROM t
    WHERE c1='a'
    ORDER BY  id limit 10000,1)
+-------+------+----+-------+-------+
| id    | c1   | c2 | c3    | c4    |
+-------+------+----+-------+-------+
| 20225 | a    | c  | 66d78 | 83f51 |
| 20226 | a    | c  | 66d78 | fe957 |
| 20227 | a    | c  | 66d78 | a0c1f |
| 20228 | a    | c  | 66d78 | dedc1 |
| 20229 | a    | c  | 66d78 | 164f7 |
| 20230 | a    | c  | 66d78 | 29a58 |
| 20231 | a    | c  | 66d78 | fae0a |
| 20232 | a    | c  | 66d78 | 8e51d |
| 20233 | a    | c  | 66d78 | 00e64 |
| 20234 | a    | c  | 66d78 | ff6ad |
+-------+------+----+-------+-------+
10 rows in set (0.02 sec)


-- 执行计划
mysql> explain select * from t  where id>=(select id from t where c1='a' order by id limit 10000,1) limit 10;
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows    | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
|  1 | PRIMARY     | t     | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL  | 1503140 |   100.00 | Using where              |
|  2 | SUBQUERY    | t     | NULL       | ref   | idx_c1        | idx_c1  | 257     | const | 1503140 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
2 rows in set, 1 warning (0.02 sec)

方案二：起始位置重定义

-- 首次执行,记录最大ID
select * from t where c1='a' order by id limit 10;


-- 后续执行
select * from t where c1='a' and id>20224 order by id limit 10;
+-------+------+----+-------+-------+
| id    | c1   | c2 | c3    | c4    |
+-------+------+----+-------+-------+
| 20225 | a    | c  | 66d78 | 83f51 |
| 20226 | a    | c  | 66d78 | fe957 |
| 20227 | a    | c  | 66d78 | a0c1f |
| 20228 | a    | c  | 66d78 | dedc1 |
| 20229 | a    | c  | 66d78 | 164f7 |
| 20230 | a    | c  | 66d78 | 29a58 |
| 20231 | a    | c  | 66d78 | fae0a |
| 20232 | a    | c  | 66d78 | 8e51d |
| 20233 | a    | c  | 66d78 | 00e64 |
| 20234 | a    | c  | 66d78 | ff6ad |
+-------+------+----+-------+-------+
10 rows in set (0.01 sec)

在适用场景上：

由于该种写法依赖于上一次查到的id值，在跳页的情况下的情况下，会有问题。比如刚刚查完第25页，马上调到35页，数据就会不对。
因此比较适合不会跳页的页面加载场景或JOB种分批拉取数据的场景。

在执行效率上：

该种方式的好处是在走二级索引idx_c1的情况下，当c1='a'时，主键是有序的，可以快速定位到id>的边界，顺序扫描即可，执行效率不会受到页偏移的影响

需要注意的是：

对于还有非索引上的条件，例如第二条SQL，需要回表进行判断记录是否满足条件，执行的时间还可能受到c3的分布的影响，在找到满足条件的记录则停止查找，否则将一直向后扫描

另外，该种写法，在某些情况下会可能会走到索引合并或主键索引，执行方式有所变化，详细可参见：

索引合并原理

方案三：业务限制

前面两种方式都有其一定的使用限制，在不得不深分页的情况下，还可以在业务逻辑上进行限制，如：

限制业务允许查看的最大页数
或limit的偏移量超过某个值时返回空数据

这种方式认为当查询的偏移量超过某个值，就不是在分页了，而是在刷数据，如果是要找数据，应该输入合适的条件来缩小范围

参考链接

juejin.cn/post/701817…

LIMIT深分页优化