GaussDB-案例:改写SQL消除in-clause

35 阅读2分钟

GaussDB-案例:改写SQL消除in-clause

现象描述

in-clause/any-clause是常见的SQL语句约束条件,有时in或any后面的clause都是常量,类似于:

| ``` select count(1) from calc_empfyc_c1_result_tmp_t1 where ls_pid_cusr1 in (‘20120405’, ‘20130405’);

| ------------------------------------------------------------------------------------------------------------- |

或者:

| ```
select  count(1)  from calc_empfyc_c1_result_tmp_t1  where ls_pid_cusr1 in any(‘20120405’, ‘20130405’); 
``` |
| ---------------------------------------------------------------------------------------------------------------- |

但是也有一些如下的特殊用法:

| ```
SELECT  ls_pid_cusr1,COALESCE(max(round((current_date-bthdate)/365)),0) FROM calc_empfyc_c1_result_tmp_t1 t1,p10_md_tmp_t2 t2 WHERE t1.ls_pid_cusr1 = any(values(id),(id15)) GROUP BY ls_pid_cusr1; 
``` |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |

其中:id,id15为p10_md_tmp_t2中的两列,“t1.ls_pid_cusr1 = any(values(id),(id15))”等价于“t1.ls_pid_cusr1 = id or t1.ls_pid_cusr1 = id15” *。*

因此join-condition实质上是一个不等式,这种非等值的join操作必须使用nestloop连接,对应执行计划如下:

![](https://p9-xtjj-sign.byteimg.com/tos-cn-i-73owjymdk6/e3cbbb32f4c7409f8c0a41bea2af3895~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgamVycnl3YW5nMTk4Mw==:q75.awebp?rk3s=f64ab15b&x-expires=1772005446&x-signature=021kOHmgRaL8WsZnQEuMMrg3KnE%3D)

#### 优化说明

测试发现由于两表结果集过大,导致nestloop耗时过长,超过一小时未返回结果,因此性能优化的关键是消除nestloop,让join使用更高效的hashjoin来连接。从语义等价的角度消除any-clause,SQL改写如下:

| ```
select ls_pid_cusr1,COALESCE(max(round(ym/365)),0) from (          (                    SELECT                              ls_pid_cusr1,(current_date-bthdate) as ym                    FROM calc_empfyc_c1_result_tmp_t1 t1,p10_md_tmp_t2 t2                    WHERE t1.ls_pid_cusr1 = t2.id and t1.ls_pid_cusr1 != t2.id15          )          union all          (                    SELECT                              ls_pid_cusr1,(current_date-bthdate) as ym                    FROM calc_empfyc_c1_result_tmp_t1 t1,p10_md_tmp_t2 t2                    WHERE t1.ls_pid_cusr1 = id15          ) ) GROUP BY ls_pid_cusr1; 
``` |
| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |

优化后的SQL查询由两个等值join的子查询构成,而每个子查询都可以使用更适合此场景的hashjoin。优化后的执行计划如下:

![](https://p9-xtjj-sign.byteimg.com/tos-cn-i-73owjymdk6/338d2e1e104e4dfba2a9108ee15da137~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgamVycnl3YW5nMTk4Mw==:q75.awebp?rk3s=f64ab15b&x-expires=1772005446&x-signature=EGr%2BDAn8dLXN4B0NQxTSDSYlpHo%3D)

优化后,从超过1个小时未返回结果优化到7s返回结果。

更多详情请参考GaussDB 文档中心:<https://doc.hcs.huawei.com/db/zh-cn/gaussdbqlh/24.1.30/productdesc/qlh_03_0001.html>