GaussDB-算子级调优GaussDB-算子级调优算子级调优介绍一个查询语句要经过多个算子步骤才会输出最终的结果。

GaussDB-算子级调优

算子级调优介绍

一个查询语句要经过多个算子步骤才会输出最终的结果。由于个别算子耗时过长导致整体查询性能下降的情况比较常见。这些算子是整个查询的瓶颈算子。通用的优化手段是EXPLAIN ANALYZE/PERFORMANCE命令查看执行过程的瓶颈算子，然后进行针对性优化。

如下面的执行过程信息中，Hashagg算子的执行时间占总时间的：(51016-13535)/ 56476 ≈66%，此处Hashagg算子就是这个查询的瓶颈算子，在进行性能优化时应当优先考虑此算子的优化。

算子级调优示例

示例1： 基表扫描时，对于点查询或者范围扫描等过滤大量数据的查询，如果使用SeqScan全表扫描会比较耗时，可以在条件列上建立索引选择IndexScan进行索引扫描提升扫描效率。

| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |

| ```
gaussdb=#  create index idx on t1(c2); CREATE INDEX gaussdb=#  explain (analyze on,costs off) select * from t1 where c2=10004;  id |             operation             |    A-time     | A-rows | Peak Memory | A-width  ----+-----------------------------------+---------------+--------+-------------+---------   1 | ->  Streaming (type: GATHER)      | 3.206         |      5 | 85KB        |    2 |    ->  Index Scan using idx on t1 | [0.122,0.146] |      5 | [73KB,73KB] |  (2 rows)    Predicate Information (identified by plan id)  -----------------------------------------------    2 --Index Scan using idx on t1          Index Cond: (c2 = 10004) (2 rows) 
``` |
| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |

上述例子中，全表扫描返回5条数据，过滤掉大量数据，在c2列上建立索引后，使用IndexScan扫描效率显著提高，从20毫秒降低到3毫秒。

**示例2：** 如果从执行计划中看，两表join选择了NestLoop，而实际行数比较大时，NestLoop Join可能执行比较慢。如下的例子中NestLoop耗时5秒，如果设置参数enable_mergejoin=off关掉Merge Join，同时设置参数enable_nestloop=off关掉NestLoop，让优化器选择HashJoin，则Join耗时降低至86毫秒。

gaussdb=# explain analyze select count(*) from t2,t1 where t1.c1=t2.c2; id | operation | A-time | A-rows | E-rows | Peak Memory | A-width | E-width | E-costs ----+--------------------------------------------------+---------------------+----------+--------+-------------+---------+---------+--------- 1 | -> Aggregate | 5070.296 | 1 | 1 | 14KB | | 8 | 2148.49 2 | -> Streaming (type: GATHER) | 5070.219 | 2 | 2 | 81KB | | 8 | 2148.49 3 | -> Aggregate | [4828.705,5062.289] | 2 | 2 | [11KB,11KB] | | 8 | 2148.40 4 | -> Nested Loop (5,6) | [4828.565,5062.142] | 996 | 40 | [4KB,4KB] | | 0 | 2148.34 5 | -> Seq Scan on t1 | [13.574,14.508] | 90007 | 20000 | [15KB,15KB] | | 4 | 184.00 6 | -> Materialize | [1508.956,1579.488] | 22413670 | 20 | [35KB,36KB] | | 4 | 14.37 7 | -> Streaming(type: REDISTRIBUTE) | [55.825,56.842] | 498 | 20 | [44KB,44KB] | | 4 | 14.31 8 | -> Seq Scan on t2 | [0.105,0.132] | 498 | 20 | [13KB,13KB] | | 4 | 13.13 (8 rows)

Predicate Information (identified by plan id)

4 --Nested Loop (5,6) Join Filter: (t2.c2 = t1.c1) Rows Removed by Join Filter: 22412672 (3 rows)


设置参数后：

gaussdb=# set enable_mergejoin=off; SET gaussdb=# set enable_nestloop=off; SET gaussdb=# explain analyze select count(*) from t2,t1 where t1.c1=t2.c2; id | operation | A-time | A-rows | E-rows | Peak Memory | A-width | E-width | E-costs ----+--------------------------------------------------+-----------------+--------+--------+---------------+---------+---------+--------- 1 | -> Aggregate | 92.911 | 1 | 1 | 14KB | | 8 | 224.45 2 | -> Streaming (type: GATHER) | 92.855 | 2 | 2 | 81KB | | 8 | 224.45 3 | -> Aggregate | [84.295,87.102] | 2 | 2 | [11KB,11KB] | | 8 | 224.36 4 | -> Hash Join (5,6) | [84.171,86.966] | 996 | 40 | [6KB,6KB] | | 0 | 224.30 5 | -> Seq Scan on t1 | [11.885,13.103] | 90007 | 20000 | [15KB,15KB] | | 4 | 184.00 6 | -> Hash | [55.895,56.072] | 498 | 21 | [292KB,292KB] | [20,20] | 4 | 14.31 7 | -> Streaming(type: REDISTRIBUTE) | [55.601,55.771] | 498 | 20 | [44KB,44KB] | | 4 | 14.31 8 | -> Seq Scan on t2 | [0.118,0.143] | 498 | 20 | [13KB,13KB] | | 4 | 13.13 (8 rows)

Predicate Information (identified by plan id)

4 --Hash Join (5,6) Hash Cond: (t1.c1 = t2.c2) (2 rows)


**示例3**：通常情况下Agg选择HashAgg性能较好，如果大结果集选择了Sort+GroupAgg，则需要设置enable_sort=off，HashAgg耗时优于Sort+GroupAgg。

gaussdb=# explain analyze select count(*) from t1 group by c2; id | operation | A-time | A-rows | E-rows | Peak Memory | E-memory | A-width | E-width | E-costs ----+------------------------------------+-----------------+--------+--------+-------------+----------+-----------------+---------+--------- 1 | -> GroupAggregate | 244.817 | 40000 | 5000 | 15KB | | | 12 | 2131.52 2 | -> Sort | 156.344 | 40000 | 10000 | 5603KB | | | 12 | 2131.52 3 | -> Streaming (type: GATHER) | 91.595 | 40000 | 10000 | 82KB | | | 12 | 1442.14 4 | -> GroupAggregate | [90.317,96.852] | 40000 | 10000 | [12KB,12KB] | 16MB | | 12 | 973.39 5 | -> Sort | [59.775,64.724] | 90007 | 20000 | [5MB,5MB] | 16MB | [896220,903920] | 4 | 873.39 6 | -> Seq Scan on t1 | [18.092,21.033] | 90007 | 20000 | [12KB,12KB] | 1MB | | 4 | 184.00 (6 rows)


设置参数后：

gaussdb=# set enable_sort=off; SET gaussdb=# explain analyze select count(*) from t1 group by c2; id | operation | A-time | A-rows | E-rows | Peak Memory | E-memory | A-width | E-width | E-costs ----+---------------------------------+-----------------+--------+--------+-------------+----------+---------+---------+--------- 1 | -> HashAggregate | 228.260 | 40000 | 5000 | 6663KB | | | 12 | 752.75 2 | -> Streaming (type: GATHER) | 95.506 | 40000 | 10000 | 82KB | | | 12 | 752.75 3 | -> HashAggregate | [63.974,71.290] | 40000 | 10000 | [3MB,3MB] | 16MB | [20,20] | 12 | 284.00 4 | -> Seq Scan on t1 | [17.578,21.204] | 90007 | 20000 | [12KB,12KB] | 1MB | | 4 | 184.00 (4 rows)


更多详情请参考GaussDB 文档中心：<https://doc.hcs.huawei.com/db/zh-cn/gaussdbqlh/24.1.30/productdesc/qlh_03_0001.html>