Original post: www.gbase.cn/community/p…
More content is available in the 南大通用 GBase technical community; 南大通用 is committed to becoming the database product vendor that users trust most.
This case demonstrates how to cut resource consumption and improve SQL performance by changing a table's distribution key.
Example SQL:
create table test(col int,id int,name text)
distribute by hash (col);
create table test_1(col int,id int,name varchar(64))
distribute by hash(name);
insert into test select 1,generate_series(1,100000), md5(random()::text);
insert into test select 64,generate_series(1,100000), md5(random()::text);
insert into test_1 select generate_series(1,100000),generate_series(1,100000), md5(random()::text);
The execution plan is as follows:
explain analyze select * from test a join test_1 b on a.col=b.id ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Streaming(type: GATHER)  (cost=13.29..29.25 rows=10 width=194) (actual time=106.033..1529.207 rows=200000 loops=1)
   Spawn on: All datanodes
   ->  Hash Join  (cost=13.29..28.64 rows=20 width=194) (Actual time: never executed)
         Hash Cond: (a.col = b.id)
         ->  Streaming(type: BROADCAST)  (cost=0.00..15.18 rows=40 width=40) (Actual time: never executed)
               Spawn on: All datanodes
               ->  Seq Scan on test a  (cost=0.00..13.13 rows=20 width=40) (Actual time: never executed)
         ->  Hash  (cost=13.13..13.13 rows=21 width=154) (Actual time: never executed)
               Buckets: 0  Batches: 0  Memory Usage: 0kB
               ->  Seq Scan on test_1 b  (cost=0.00..13.13 rows=20 width=154) (Actual time: never executed)
 Total runtime: 1562.160 ms
Table test is distributed on the col column and test_1 on the name column, while the join condition is a.col = b.id. The planner therefore has to use stream operators: table test is broadcast to all datanodes (every DN node), so each DN node holds a full copy of test and can run the a.col = b.id join against its local slice of test_1. Once every DN node has finished its join, the results are sent up to the CN through Streaming (GATHER). In other words, during this execution the DN nodes exchange and pull data from one another, which adds network overhead, and that is the point we can optimize.
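As a side check, you can look at how the rows of each table actually land on the DN nodes. The query below is only a sketch: it assumes the Postgres-XC-style hidden system column xc_node_id is exposed by your GBase version; if it is not, use whatever per-node row-count view your release provides.
-- Sketch (assumes the xc_node_id system column is available):
-- count rows per DN node for both tables. Since test is distributed by
-- hash(col) and col only takes the values 1 and 64, its rows sit on at
-- most two DN nodes, while test_1 (hash(name) over md5 values) spreads evenly.
select xc_node_id, count(*) from test group by xc_node_id;
select xc_node_id, count(*) from test_1 group by xc_node_id;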
Optimization: remove the network overhead of DN nodes pulling data from each other. Based on the join condition, set the distribution key of test_1 to the id column.
create table test(col int,id int,name text)
distribute by hash (col);
create table test_1(col int,id int,name varchar(64))
distribute by hash(id);
insert into test select 1,generate_series(1,100000), md5(random()::text);
insert into test select 64,generate_series(1,100000), md5(random()::text);
insert into test_1 select generate_series(1,100000),generate_series(1,100000), md5(random()::text);
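To confirm which column each table is now distributed on, the cluster catalog can be queried. This is a sketch that assumes a Postgres-XC-style pgxc_class catalog (pclocatortype 'H' means hash distribution); the exact catalog and column names may differ between GBase versions.
-- Sketch (assumes the pgxc_class catalog exists):
-- show the locator type and distribution-column attribute number for both tables.
select pcrelid::regclass as table_name, pclocatortype, pcattnum
from pgxc_class
where pcrelid in ('test'::regclass, 'test_1'::regclass);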
The execution plan is as follows:
postgres=# explain analyze select * from test a join test_1 b on a.col=b.id;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Data Node Scan  (cost=0.00..0.00 rows=1000 width=194) (actual time=36.912..894.167 rows=200000 loops=1)
   Node/s: All datanodes
   Remote query: SELECT a.col, a.id, a.name, b.col, b.id, b.name FROM public.test a JOIN public.test_1 b ON a.col = b.id
   ->  Hash Join  (cost=805.59..238675.59 rows=20775000 width=195)
         Hash Cond: (a.col = b.id)
         ->  Seq Scan on test a  (cost=0.00..3870.00 rows=200000 width=41)
         ->  Hash  (cost=675.75..675.75 rows=20775 width=154)
               ->  Seq Scan on test_1 b  (cost=0.00..675.75 rows=20775 width=154)
 Total runtime: 955.638 ms
(9 rows)
The execution time drops from 1562.160 ms to 955.638 ms. The plan also shows that the network overhead of DN nodes pulling data from each other is gone: each DN node runs the join purely on its own local data and then returns the result to the upper-level CN.
Summary: in this case the distribution key was adjusted to match the join condition, which removes the network overhead of DN nodes pulling data from each other and allows the query to use an FQS (Fast Query Shipping) plan, in which the whole query is pushed down to the DN nodes.
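In a real system, where test_1 already holds data, changing the distribution key normally means rebuilding the table. The statements below are a sketch of that approach using only standard create/insert/rename steps; the table name test_1_new is made up for this example, and whether your GBase version also supports altering the distribution key in place is not assumed here.
-- Sketch: re-distribute an existing table by rebuilding it on the new key
-- (assumes a maintenance window in which test_1 can be dropped and replaced).
begin;
create table test_1_new(col int,id int,name varchar(64))
distribute by hash(id);
insert into test_1_new select col,id,name from test_1;
drop table test_1;
alter table test_1_new rename to test_1;
commit;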