Usage and purpose of LIMIT
In MySQL, the LIMIT clause is mainly used for pagination, for example to display rows 100 through 120 of a result set. The syntax is LIMIT M, N, where M is the offset (which row to start from) and N is how many rows to return.
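For example, both statements below skip the first 100 rows and return the next 20 (LIMIT N OFFSET M is an equivalent spelling that MySQL also accepts; the table t and its id column are placeholders for illustration):

-- Skip 100 rows, then return the next 20.
select * from t order by id limit 100, 20;
-- The same query written with the OFFSET keyword.
select * from t order by id limit 20 offset 100;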
Problems with LIMIT
When the table is small or medium-sized, LIMIT handles pagination conveniently. But once the data volume is large, and especially once the offset M is large, the performance of the LIMIT clause degrades noticeably. Consider the following example:
CREATE TABLE `member` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(10) NOT NULL COMMENT 'name',
  `gender` tinyint(3) unsigned NOT NULL COMMENT 'gender',
  PRIMARY KEY (`id`),
  KEY `gender` (`gender`)
);
mh01@3306>desc member;
+--------+---------------------+------+-----+---------+----------------+
| Field  | Type                | Null | Key | Default | Extra          |
+--------+---------------------+------+-----+---------+----------------+
| id     | int(10) unsigned    | NO   | PRI | NULL    | auto_increment |
| name   | varchar(10)         | NO   |     | NULL    |                |
| gender | tinyint(3) unsigned | NO   | MUL | NULL    |                |
+--------+---------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
mh01@3306>select count(*) from member;
+----------+
| count(*) |
+----------+
|  2000000 |
+----------+
1 row in set (0.27 sec)
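(How the 2,000,000 rows were loaded is not shown here; below is a minimal sketch of one way to generate comparable test data, assuming names are random 10-character hex strings and gender is 0 or 1. The fill_member procedure is a hypothetical helper, not part of the original setup.)

DELIMITER //
CREATE PROCEDURE fill_member(IN total INT)
BEGIN
    DECLARE i INT DEFAULT 0;
    START TRANSACTION;                     -- one big transaction keeps the load fast
    WHILE i < total DO
        INSERT INTO member (name, gender)
        VALUES (LEFT(MD5(RAND()), 10),     -- random 10-char hex name
                FLOOR(RAND() * 2));        -- gender 0 or 1
        SET i = i + 1;
    END WHILE;
    COMMIT;
END //
DELIMITER ;

CALL fill_member(2000000);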
The table above holds 2,000,000 rows. Let's now run the same query with different LIMIT offsets to see how the offset affects performance.
mh01@3306>select * from member where gender=1 limit 10,2;
+----+------------+--------+
| id | name       | gender |
+----+------------+--------+
| 21 | d2fe4476b8 |      1 |
| 24 | afd9592b48 |      1 |
+----+------------+--------+
2 rows in set (0.00 sec)
mh01@3306>select * from member where gender=1 limit 200000,2;
+--------+------------+--------+
| id     | name       | gender |
+--------+------------+--------+
| 399604 | 673869a827 |      1 |
| 399605 | fe46b99c08 |      1 |
+--------+------------+--------+
2 rows in set (0.18 sec)
mh01@3306>select * from member where gender=1 limit 800000,2;
+---------+------------+--------+
| id      | name       | gender |
+---------+------------+--------+
| 1599363 | e1bde81632 |      1 |
| 1599364 | c6f694dc1e |      1 |
+---------+------------+--------+
2 rows in set (1.14 sec)
The results show that as the offset grows, performance gets steadily worse even though we always fetch just 2 rows. Why? The answer lies in how LIMIT actually executes:
- First, walk the gender index to find entries with gender=1, which yields the primary key values.
- For each primary key found in step 1, go back to the clustered index to fetch the remaining columns, because the gender secondary index stores only the gender value and the primary key.
- Discard everything before the offset. In other words, although we return only 2 rows, with 800000,2 the server has to process 800002 rows, and that processing generates a large amount of random reads (the EXPLAIN sketch below shows the shape of this plan).
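EXPLAIN makes this plan visible; a sketch of what to look for (exact output varies with version and statistics):

-- The plan reads via the gender index (key: gender, type: ref), and the rows
-- estimate reflects all matching entries, not the 2 we keep: each of the
-- 800002 entries walked costs an extra lookup on the clustered index.
EXPLAIN select * from member where gender=1 limit 800000,2;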
Verification
This can be verified from two angles: the information_schema.INNODB_BUFFER_PAGE view shows which pages have been pulled into the buffer pool, and sys.io_global_by_file_by_bytes shows how many reads have hit the table's .ibd file.
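As a side note, restarting clears both the buffer pool and the performance_schema counters. If only the I/O counters need resetting between runs, the sys schema ships a helper for that (the buffer pool itself can only be emptied by a restart):

-- Truncate all performance_schema summary tables, which zeroes the counters
-- behind sys.io_global_by_file_by_bytes; FALSE = don't print each table name.
CALL sys.ps_truncate_all_tables(FALSE);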
This is the baseline right after restarting the database:
mh01@3306>select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('primary','gender') and TABLE_NAME like '%member%' group by index_name;
+------------+----------+
| index_name | count(*) |
+------------+----------+
| PRIMARY    |      399 |
+------------+----------+
mh01@3306>select * from test.member where gender=1 limit 800000,2;
+---------+------------+--------+
| id      | name       | gender |
+---------+------------+--------+
| 1599363 | e1bde81632 |      1 |
| 1599364 | c6f694dc1e |      1 |
+---------+------------+--------+
2 rows in set (2.02 sec)
After running the 800000,2 query, the number of PRIMARY pages in the buffer pool clearly jumps:
mh01@3306>select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('primary','gender') and TABLE_NAME like '%member%' group by index_name;
+------------+----------+
| index_name | count(*) |
+------------+----------+
| gender     |      577 |
| PRIMARY    |     3687 |
+------------+----------+
2 rows in set (0.23 sec)
sys.io_global_by_file_by_bytes tells the same story: the read count against member.ibd, measured before and after running the query, shows the cost is far from trivial.
mh01@3306>select count_read from sys.io_global_by_file_by_bytes where file like '%member.ibd%';
+------------+
| count_read |
+------------+
|       1351 |
+------------+
1 row in set (0.19 sec)
mh01@3306>select count_read from sys.io_global_by_file_by_bytes where file like '%member.ibd%';
+------------+
| count_read |
+------------+
|       4266 |
+------------+
1 row in set (0.20 sec)
The solution
Now that we know the cause of the problem, the fix follows naturally. We can use a covering index to locate the target primary keys first. This amounts to telling the database directly: no need to walk and discard rows one by one to count off the offset; here are the primary key ids at offset 800000.
Using a join
mh01@3306>select a.* from member as a inner join (select id from member where gender=1 limit 800000,2) as b on a.id=b.id;
+---------+------------+--------+
| id      | name       | gender |
+---------+------------+--------+
| 1599363 | e1bde81632 |      1 |
| 1599364 | c6f694dc1e |      1 |
+---------+------------+--------+
2 rows in set (0.15 sec)
The plain query, for comparison
mh01@3306>select * from test.member where gender=1 limit 800000,2;
+---------+------------+--------+
| id      | name       | gender |
+---------+------------+--------+
| 1599363 | e1bde81632 |      1 |
| 1599364 | c6f694dc1e |      1 |
+---------+------------+--------+
2 rows in set (0.93 sec)
The join version cuts the cost dramatically. The key is that select id from member where gender=1 limit 800000,2 fetches the needed primary keys from the gender index alone: the secondary index already stores the id for each entry, so no back-to-table lookups (and therefore no flood of random I/O) are required, and even walking past 800000 entries stays cheap. The subquery yields just 2 ids, and only those 2 rows are then looked up in the clustered index, which is a huge performance win (the EXPLAIN sketch below shows why).
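EXPLAIN confirms the covering-index scan; a sketch of what to look for (exact output varies by version and statistics):

-- For the derived table b, the scan of member should show key: gender with
-- "Using index" in Extra, meaning the subquery is satisfied entirely from the
-- secondary index; the outer table a then costs only two lookups on PRIMARY.
EXPLAIN select a.* from member as a inner join (select id from member where gender=1 limit 800000,2) as b on a.id=b.id;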
To compare the cost, here is the buffer pool state before running the join query:
mh01@3306>select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('primary','gender') and TABLE_NAME like '%member%' group by index_name;
+------------+----------+
| index_name | count(*) |
+------------+----------+
| gender     |      246 |
| PRIMARY    |      208 |
+------------+----------+
2 rows in set (0.40 sec)
mh01@3306>select a.* from member as a inner join (select id from member where gender=1 limit 800000,2) as b on a.id=b.id;
+---------+------------+--------+
| id      | name       | gender |
+---------+------------+--------+
| 1599363 | e1bde81632 |      1 |
| 1599364 | c6f694dc1e |      1 |
+---------+------------+--------+
2 rows in set (0.22 sec)
mh01@3306>select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('primary','gender') and TABLE_NAME like '%member%' group by index_name;
+------------+----------+
| index_name | count(*) |
+------------+----------+
| gender     |      577 |
| PRIMARY    |      212 |
+------------+----------+
2 rows in set (0.23 sec)
The numbers bear this out: with the join-based version, the work lands mostly on the gender index (246 to 577 pages) while PRIMARY barely grows (208 to 212 pages, i.e. just the two final row lookups), compared with the thousands of PRIMARY pages the plain query dragged into the buffer pool.
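One caveat the test above glosses over: without an explicit ORDER BY, the rows returned for a given offset are not guaranteed to be stable across executions. A production version of this rewrite would normally pin the order, which here should cost nothing extra because the gender index entries are already stored in (gender, id) order:

select a.*
from member as a
inner join (
    select id from member
    where gender = 1
    order by id          -- keeps pages stable; served from the index order,
                         -- so no filesort is expected
    limit 800000, 2
) as b on a.id = b.id;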