前言

关于 mysql 的 sql 语句优化在我的关于mysql索引分析和优化(一) 中讲解了很多方法, 由于篇幅过长, 已经5000多字, 有点卡, 所以新开篇幅再讲

数据量

select count(*) from emp; # 500000
select count(*) from dept; # 10000

sql 优化方案

大批量插入数据

当使用 load 命令导入数据的时候，适当的设置可以提高导入的效率

但是我们需要注意下数据源的数据各种

发现以 , 做数据字段分割, 以 \n 做行分割, 这种规律的数据才可以, 并且 id 还是有序的, 可以提交效率

现在我们开始:

load data local infile 'sql1.log' into table `tb_user` fields terminated by ',' lines terminated by '\n';

发现报错了,

ERROR 3948 (42000): Loading local data is disabled; this must be enabled on both the client and server sides

show global variables like 'local_infile';
set global local_infile = 'ON';
show global variables like 'local_infile';

设置完成, 但还是出现了问题

ERROR 1148 (42000): The used command is not allowed with this MySQL version

需要使用下面这种方式完成:

mysql --local-infile=1 -u root -p123123

完成之后

load data local infile 'sql1.log' into table `tb_user_1` fields terminated by ',' lines terminated by '\n';

成功提醒了:

Query OK, 1000000 rows affected, 65535 warnings (16.13 sec)
Records: 1000000  Deleted: 0  Skipped: 0  Warnings: 4000000

关闭掉这个

set global local_infile = 'OFF';

你会发现这样导入效率太慢了, 可以适当的关闭掉一些功能再导入

所以第一个提高大量数据导入的方法有

主键顺序插入

有序:

无序:

关闭唯一性校验

SET UNIQUE_CHECKS=0

导入结束之后

SET UNIQUE_CHECKS=1

手动提交事务

先手动关闭事务

SET AUTOCOMMIT=0

然后开启事务

SET AUTOCOMMIT=1

现在我们试试 load 无序的数据看看

load data local infile 'sql2.log' into table `tb_user_2^Cfields terminated by ',' lines terminated by '\n';

优化insert语句

insert 存在优化的可能性, 比如

insert into tablex into values(xxxx)
insert into tablex into values(xxxx)
insert into tablex into values(xxxx)
insert into tablex into values(xxxx)

insert into tablex values(xxx),(yyy)，(zzz)

同时还可以在事务中 insert 这样速度会非常快

start transaction;
insert into tb_test values(1,'Tom');
insert into tb_test values(2,'Cat');
insert into tb_test values(3,'Jerry');
commit;

数据有序插入

insert into tb_test values(4,'Tim');
insert into tb_test values(1,'Tom');
insert into tb_test values(3,'Jerry');
insert into tb_test values(5,'Rose');
insert into tb_test values(2,'Cat');

优化之后按照顺序

insert into tb_test values(1,'Tom');
insert into tb_test values(2,'Cat');
insert into tb_test values(3,'Jerry');
insert into tb_test values(4,'Tim');
insert into tb_test values(5,'Rose');

优化order by语句

单字段排序

explain select sql_no_cache *
from emp e order by e.age;

在控制台 extra 里会显示 using filesort

出现这种情况需要着重处理, 一般加索引就行, 但我们对主表的索引表现的不是很明显, 但实际上效率却有提升

多字段排序, Filesort 的优化

对于 Filesort , mysql 有两种排序方式:

两次扫描法: mysql 4.1 之前, 使用该方法排序
1. 根据条件拿出所有排序字段和行指针信息
2. 然后在排序区间 sort buffer 中排序, 如果 sort buffer 不够, 则在临时表中储存排序结果
3. 排序完毕, 在根据行指针回表读取记录, 该操作可能产生大量 IO 操作
一次扫描算法：一次性取出满足条件的所有字段，然后在排序区 sort buffer 中排序后直接输出结果集。排序时内存开销较大，但是排序效率比两次扫描算法要高

mysql 通过判断 max_length_for_sort_data 系统变量的大小和查询语句字段的总大小判断, 使用哪种排序方法

如果 max_length_for_sort_data 那么使用第二种方法, 否则使用第一种方法

可以适当提高 sort_buffer_size 和 max_length_for_sort_data 系统变量，来增大排序区的大小，提高排序的效率

在多字段排序中

create index idx_age_empno on emp (age, empno);
explain select *
from emp e order by e.age, e.empno;

这样虽然你还是无可避免的还是会遇到 extra 的 Using filesort 的问题, 但加索引前后的效率提升非常的大

那么现在需要探索下.

如果 order 前后字段的位置不同索引会不会生效?

答: 不会, 之所以不会生效的原因其实很简单, 调换 order 字段的位置会变成两个 sql 语句, 他们的结果本身就不一样

explain select *
from emp e order by e.empno, e.age;

其实我还挺好奇的, 它到底是哪个字段使用了索引, 所以又创建了另一个索引

create index idx_age_name on emp(age, name);
# 如果是 order by e.age, e.name 这样的索引, 大小是: 3*20+2+1+4+1 = 68
explain select id, age, name
from emp e order by e.name, e.age;

而上面这段 sql 执行完毕后的执行结果还是 68 但在 extra 中会多出 filesort 文件系统排序字符, 说明两个索引同时使用上了, 但多了需要排序的功能

如果对 age 的排序和 empno 的排序方向不同怎么样?

explain select id, age, name
from emp e order by e.age desc, e.name;

执行索引还是 68 但仍然需要排序, 所以结果很明朗了

优化 group by 语句

group by 和 order by 的区别在于, 他在排序之后需要对数据进行分组, 但我们可以选择使用在 sql 语句之后添加 order by null 消除排序工作

但我在 mysql8.0 中发现 group by 分组并没有出现 filesort 字段, 所以我怀疑他已经做了优化, 所以在我的测试中, 效果变化不大, 反而更慢了

explain
select sql_no_cache age, count(*)
from emp e
group by e.age;

+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra           |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-----------------+
|  1 | SIMPLE      | e     | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 456960 |   100.00 | Using temporary |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-----------------+
1 row in set, 2 warnings (0.00 sec)

而去掉 explain 之后的执行时间是

41 rows in set, 1 warning (0.35 sec)

现在加上 order by null 进行优化之后

explain
select sql_no_cache age, count(*)
from emp e
group by e.age
order by null;

+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra           |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-----------------+
|  1 | SIMPLE      | e     | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 456960 |   100.00 | Using temporary |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-----------------+
1 row in set, 2 warnings (0.00 sec)

取消 explain 之后

41 rows in set, 1 warning (0.34 sec)

差别看起来不大

现在创建索引

create index idx_age on emp (age);

explain
select sql_no_cache age, count(*)
from emp e
group by e.age;

+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | e     | NULL       | index | idx_age       | idx_age | 5       | NULL | 456960 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
1 row in set, 2 warnings (0.00 sec)

or 的优化

or 的出现, 在 or 左右两边临近的两个字段(不是左边两个, 右边两个)是无法使用上复合索引的, 在 sql 中如果出现 or, 那么可以当做两个 sql 来看, 连索引都是以两个的形式显示出来的

+----+-------------+-------+------------+-------------+----------------------------------------------+------------------------+---------+------+------+----------+--------------------------------------------------+
| id | select_type | table | partitions | type        | possible_keys                                | key                    | key_len | ref  | rows | filtered | Extra                                            |
+----+-------------+-------+------------+-------------+----------------------------------------------+------------------------+---------+------+------+----------+--------------------------------------------------+
|  1 | SIMPLE      | e     | NULL       | index_merge | idx_age,idx_empno,idx_age_empno,idx_age_dept | idx_age_dept,idx_empno | 10,4    | NULL |    2 |   100.00 | Using union(idx_age_dept,idx_empno); Using where |
+----+-------------+-------+------------+-------------+----------------------------------------------+------------------------+---------+------+------+----------+--------------------------------------------------+
1 row in set, 2 warnings (0.00 sec)

而且从 extra 中显示的来看, 他在内部可能改造成 union

在这里需要看下 type 类型的优先级 system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL

而图上的 type 是 index_merge 类型, 所以效果还不错吧

他们说使用这种方法还不容我们主动修改成 union 方式的 sql 进行查询呢? 那试试就试试

create index idx_age on emp (age);
create index idx_empno on emp (empno);
create index idx_age_empno on emp (age, empno);
create index idx_age_dept on emp (age, deptId);

explain select sql_no_cache *
from emp e where (e.age = 20 and e.deptId = 32) or e.empno = 100717;

explain select sql_no_cache *
from emp e
where e.age = 20
  and e.deptId = 32
union
select *
from emp e2
where e2.empno = 100717;

打印:

+----+--------------+------------+------------+------+------------------------------------+--------------+---------+-------------+------+----------+-----------------+
| id | select_type  | table      | partitions | type | possible_keys                      | key          | key_len | ref         | rows | filtered | Extra           |
+----+--------------+------------+------------+------+------------------------------------+--------------+---------+-------------+------+----------+-----------------+
|  1 | PRIMARY      | e          | NULL       | ref  | idx_age,idx_age_empno,idx_age_dept | idx_age_dept | 10      | const,const |    1 |   100.00 | NULL            |
|  2 | UNION        | e2         | NULL       | ref  | idx_empno                          | idx_empno    | 4       | const       |    1 |   100.00 | NULL            |
| NULL | UNION RESULT | <union1,2> | NULL       | ALL  | NULL                               | NULL         | NULL    | NULL        | NULL |     NULL | Using temporary |
+----+--------------+------------+------------+------+------------------------------------+--------------+---------+-------------+------+----------+-----------------+
3 rows in set, 2 warnings (0.00 sec)

在不使用 union 之前为 index_merge 使用了之后 ref ref all 不知道速度提高多少幅度

这里只提供方法, 但不给你具体的效率

优化分页查询

分页查询其实是很耗费速度的, 比如 limit 10000000000000, 10 , 此时会先排序 10000000000010 然后在只返回 10000000000000 ~ 10000000000010 的 10 条数据

这样的效率是非常慢的, 所以一般需要添加索引

explain select *
from emp e limit 490000, 10;

优化思路一

我们可以充分利用覆盖索引方案, 现在索引树上将需要的数据找出, 然后在拿着这里返回的 ids 跟主表的 id 进行一一比较, 试试看效率提高多少

explain select *
from emp e
         inner join (select id from emp e2 limit 490000, 10) as a on a.id = e.id;

读者可以看下效率提高如何?

+----+-------------+------------+------------+--------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type | table      | partitions | type   | possible_keys | key     | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+------------+------------+--------+---------------+---------+---------+------+--------+----------+-------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL          | NULL    | NULL    | NULL | 456960 |   100.00 | NULL        |
|  1 | PRIMARY     | e          | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | a.id |      1 |   100.00 | NULL        |
|  2 | DERIVED     | e2         | NULL       | index  | NULL          | PRIMARY | 4       | NULL | 456960 |   100.00 | Using index |
+----+-------------+------------+------------+--------+---------------+---------+---------+------+--------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)

发现效率提高了很多

优化思路二

针对主键 ID 自动自增方案

explain
select *
from emp e
where e.id > 990001
limit 10;

发现了么? 效率提高极大

使用SQL提示

SQL提示，是优化数据库的一个重要手段，简单来说，就是在SQL语句中加入一些人为的提示来达到优化操作的目的

USE INDEX

use index 主要的功能就是推荐一些索引的使用方案给 mysql 做参考

explain select *
from emp e use index (idx_name)
where e.name = 'LCBdLv';

explain
select *
from emp e use index (idx_name)
where e.name = 'LCBdLv';

IGNORE INDEX

使用这种方式 ignore index 它的功能就是忽略一个或者多个索引的自动使用

explain select *
from emp e ignore index (idx_name)
where e.name = 'LCBdLv';

FORCE INDEX

为 mysql 强制使用特定的索引

explain select *
from emp e force index (idx_name)
where e.name = 'LCBdLv';

sql优化大杀器 `soar`

开源:soar github

怎么用? github 上有, 去下载个版本, 直接打开控制台运行就行, 不懂用可以带上参数 --help 里面一堆参数提示

配置文件soar.yaml:

# 线上环境配置
online-dsn:
  addr: 127.0.0.1:3306
  schema: sell
  user: root
  password: root
  disable: true
# 测试环境配置
test-dsn:
  addr: localhost:3306
  schema: test_db
  user: root
  password: 123456
  disable: false
# 是否允许测试环境与线上环境配置相同
allow-online-as-test: true
# 是否清理测试时产生的临时文件
drop-test-temporary: true
# 语法检查小工具
only-syntax-check: false
sampling-statistic-target: 100
sampling: false
# 日志级别，[0:Emergency, 1:Alert, 2:Critical, 3:Error, 4:Warning, 5:Notice, 6:Informational, 7:Debug]
log-level: 7
log-output: soar.log
# 优化建议输出格式
#report-type: markdown
report-type: html
ignore-rules:
- ""
# 黑名单中的 SQL 将不会给评审意见。一行一条 SQL，可以是正则也可以是指纹，填写指纹时注意问号需要加反斜线转义。
blacklist: ${your_config_dir}/soar.blacklist
# 启发式算法相关配置
max-join-table-count: 5
max-group-by-cols-count: 5
max-distinct-count: 5
max-index-cols-count: 5
max-total-rows: 9999999
spaghetti-query-length: 2048
allow-drop-index: false
# EXPLAIN相关配置
explain-sql-report-type: pretty
explain-type: extended
explain-format: traditional
explain-warn-select-type:
- ""
explain-warn-access-type:
- ALL
explain-max-keys: 3
explain-min-keys: 0
explain-max-rows: 10000
explain-warn-extra:
- ""
explain-max-filtered: 100
explain-warn-scalability:
- O(n)
query: ""
list-heuristic-rules: false
list-test-sqls: false
verbose: true

局限: 部分 sql 优化提示没有, 比如 order by x.age , 这里的 x.age 加索引效率会变高, 但soar 没有提示, 也可能考虑的是收益问题, 加上索引速度是快了一点, 但没快很多, 暂时发现这个问题

关于mysql索引分析和优化(二)

前言

数据量