先抛出问题
区分度低的索引到底会带来哪方面的影响?
DDL
#数据量约700w
CREATE TABLE `order` (
`OrderID` varchar(32) NOT NULL COMMENT '订单编号',
`type` char(1) DEFAULT '0' COMMENT '订单类型:0 流量订单,1 话费订单',
`Caller` varchar(16) NOT NULL COMMENT '手机号码',
`Status` smallint(6) NOT NULL DEFAULT '0' COMMENT '-1下单失败、0初始、1待支付、2支付超时、3支付成功、4充值中、5充值成功、6充值失败 7充值超时 8退款中 9退款失败 10退款成功',
`OrderTime` datetime DEFAULT CURRENT_TIMESTAMP COMMENT '订单下单的时间',
`appid` varchar(32) DEFAULT NULL COMMENT '应用唯一标识',
`bizType` varchar(6) DEFAULT NULL COMMENT '应用下业务标识',
PRIMARY KEY (`OrderID`),
KEY `Index_OrderID` (`OrderID`),
KEY `Index_Status` (`Status`),
KEY `Index_Caller` (`Caller`),
KEY `Index_OrderTime` (`OrderTime`),
KEY `index_Type` (`type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='订单表';
慢查询sql
#耗时约25s
SELECT
count( * ) AS col_0_0_
FROM
`order` orderentit0_
WHERE
1 = 1
AND orderentit0_.type = '0'
AND STATUS = '5'
AND OrderTime >= '2017-08-01 16:17:00'
AND OrderTime <= '2017-08-30 16:17:00';
分析
从上面的DDL我们可以看到能利用上的索引字段有:
- type
- status
- orderTime
使用Explain命令来进行分析
可以看到利用了status,type两个字段的索引,使用了索引合并(同时走两个索引,取交集)
这边有几个疑问?
问题一: 为何走了索引还这么慢
show index from order
重点关注Cardinality字段,翻译为基数,可以理解为该索引字段不同结果的总数,用于粗略判断索引区分度(数值越大越好)
可以判断type,status两个字段的索引的区分度十分的低
所以通过这两个索引字段会取出一大堆数据,再一一对orderTime字段进行过滤,效率自然就低了
问题二: 为何走了type,status的索引合并,却不走orderTime的索引
我们使用FORCE INDEX ( Index_OrderTime ) 来强制走orderTime索引
EXPLAIN
SELECT
count( * ) AS col_0_0_
FROM
`order` orderentit0_ FORCE INDEX ( Index_OrderTime )
WHERE
1 = 1
AND orderentit0_.type = '0'
AND STATUS = '5'
AND OrderTime >= '2017-08-01 16:17:00'
AND OrderTime <= '2017-08-30 16:17:00'
;
来看explain分析结果
通过强制走orderTime索引方式,耗时约4s
结合之前的explain结果做个对比
索引 | type | key | rows |
---|---|---|---|
type,status | index_merge | index_type,index_status | 58W |
ordertime | range | index_orderTime | 96W |
可以看到因为sql优化器判断通过走合并index_status和index_type比走index_orderTime索引需要检索的行数少,所以选择了走索引合并
# 实际结果约47w,执行时间0.4s
SELECT
count( * ) AS col_0_0_
FROM
`order` orderentit0_ FORCE INDEX ( Index_OrderTime )
WHERE
1 = 1
AND OrderTime >= '2017-08-01 16:17:00'
AND OrderTime <= '2017-08-30 16:17:00'
# 实际结果约23w,执行时间8s
SELECT
count( * ) AS col_0_0_
FROM
`order` orderentit0_
WHERE
1 = 1
AND orderentit0_.type = '0'
AND STATUS = '5'
至此两个问题解释完毕,鼓掌鼓掌👏
但是又带来第三个问题....
问题三: 为何实际执行时间,与explain分析结果相反
1.索引合并,覆盖
#23w,约3.5s
SELECT count( * ) AS col_0_0_
FROM `order` orderentit0_
WHERE 1 = 1
AND orderentit0_.type = '0'
AND STATUS = '5'
explain:
index_merge Index_Status,index_Type NULL 587752
2.orderTime索引,覆盖
#47w,约0.8s
SELECT
count( * ) AS col_0_0_
FROM
`order` orderentit0_
WHERE
1 = 1
AND OrderTime >= '2017-08-01 16:17:00'
AND OrderTime <= '2017-08-30 16:17:00'
explain:
range Index_OrderTime NULL 963112
3.索引合并,非覆盖
#23w,约22s
SELECT count(appid) AS col_0_0_
FROM `order` orderentit0_
WHERE 1=1
AND orderentit0_.type = '0'
AND STATUS = '5'
explain:
同1
4.orderTime索引,非覆盖
#47w,约4s
SELECT
count( appid ) AS col_0_0_
FROM
`order` orderentit0_
FORCE INDEX ( Index_OrderTime )
WHERE
1 = 1
AND OrderTime >= '2017-08-01 16:17:00'
AND OrderTime <= '2017-08-30 16:17:00'
explain:
同2
5.status索引,覆盖
#37w,约0.3s
SELECT count( * ) AS col_0_0_
FROM `order` orderentit0_
WHERE 1 = 1
AND STATUS = '5'
6.type索引,覆盖
#400w,约1.7s
SELECT count( * ) AS col_0_0_
FROM `order` orderentit0_
WHERE 1 = 1
AND orderentit0_.type = '0'
猜测?:
- 由于type字段索引区分度太低,导致多余的磁盘io时间
- 对orderTime索引的查询类型为range,优先级较低
本人水平有限,在此抛出问题,请各位大神赐教
结论
- 删除type,status,orderTime索引(前二区分度太低),
- 添加orderTime,status,type的联合索引(基于业务需求)
=============2018-04-09==================
对于问题三的找到理论依据了