ClickHouse的GLOBAL RIGHT JOIN大坑分析案例分享

180 阅读8分钟

ClickHouse的GLOBAL RIGHT JOIN大坑分析案例分享

转载自微信公众号:mp.weixin.qq.com/s/ro5qaM8sl…

1.背景

最近收到反馈在我们数据平台--洞察的统计/看板的一些表中偶尔会出现内存不足的报错,如下:

image2024-11-14_10-57-59

类似的,在分层管理/任务管理中也有部分任务会因为内存不足而失败.

2.分析

首先查看该表/任务的sql, 经排查发现是由于主sql中,有一段子查询的内存占用特别高,在内存紧张的情况下往往会失败,该子查询如下:

with voided as (
    select if(t.region_t is not null, t.region_t , 'all') as region_tmp,
           if( t.countrycode_t is not null,  t.countrycode_t , 'all') as countrycode_tmp,
           if( t.app_id is not null , toString( t.app_id) , 'all') as app_id,
           count(distinct user_id) as voided_user_cnt,
           sum(amount) as voided_amount,
           if(t.dt is not null,toString(t.dt),'all') as dt_tmp,
           if(t.pay_method is not null,toString(t.pay_method),'all') as pay_method_tmp
    from(select toNullable(if(mapping.region is not null, mapping.region, '其它')) as region_t,
                toNullable(if(account.CountryCode is not null, account.CountryCode, '其它')) as countrycode_t,
                toNullable(case when goods.device_type = '2' then 10018
                                when goods.device_type = '3' then 10019
                                when goods.device_type = '4' then 0
                                else 2
                    end) as app_id,
                voided.user_id as user_id,
                voided.amount as amount
                 ,toNullable(voided.dt) as dt
                 ,toNullable(voided.paymethod) as pay_method
         from (select user_id,order_id,amount,toDate(create_at) as dt,paymethod
               from sg_chikii.sv_usr_voidedOrder_distributed
               WHERE create_at BETWEEN toDateTime(toStartOfMonth(toDateTime('$begin_time$'))) AND date_sub(date_add(toDateTime(toStartOfMonth(addMonths(toDateTime('$begin_time$'), 1))-1), interval 1 day ), interval 1 second )
                  ) voided
                  global left join sg_chikii.sv_usr_account_distributed account on voided.user_id = toString(account.Id)
             global  left join sg_chikii.sv_usr_dimCountryMappingTbl_distributed mapping on account.CountryCode = mapping.code
             global   left join sg_chikii.sv_usr_goodsOrder_distributed goods on voided.order_id = goods.trade_no
            ) t
    group by t.region_t, t.countrycode_t, t.app_id,t.dt,t.pay_method
    with cube
    having dt_tmp != 'all'
)

内存占用如下: image2024-11-14_11-8-41

一共占用了10G,我们当前的ck一共也就16G,截止到当前的ck配置上的内存限制为14G,怪不得会执行失败

3.解决方案

现在知道出问题的sql,那么就有两个解决方案了:

第一个很简单,优化不行就生配置呗

第二个就是优化sql了。接下来就重点讲第二个方案。

既然是内存占用过高,那么就肯定是在内存中保存的数据太大了,看一下出问题的sql设计到的数据量:

表名/条数

sv_usr_voidedOrder_distributed : 288 sv_usr_dimCountryMappingTbl_distributed : 265 sv_usr_account_distributed : 53912447 sv_usr_goodsOrder_distributed : 61966255

可以看到account,goodsOrder表都是大表,难道ck把几个表都全量加载进内存了吗?先来看一下经过sql优化器优化过后的sql(使用ck的explain syntax):

SELECT
    if(isNotNull(region_t), region_t, 'all') AS region_tmp,
    if(isNotNull(countrycode_t), countrycode_t, 'all') AS countrycode_tmp,
    if(isNotNull(app_id), toString(app_id), 'all') AS app_id,
    uniqExact(user_id) AS voided_user_cnt,
    sum(amount) AS voided_amount,
    if(isNotNull(pay_method), toString(pay_method), 'all') AS pay_method_tmp
FROM
(
    SELECT
        toNullable(if(isNotNull(region), region, '其它')) AS region_t,
        toNullable(if(isNotNull(CountryCode), CountryCode, '其它')) AS countrycode_t,
        toNullable(multiIf(device_type = '2', 10018, device_type = '3', 10019, device_type = '4', 0, 2)) AS app_id,
        `--voided.user_id` AS user_id,
        `--voided.amount` AS amount,
        toNullable(`--voided.paymethod`) AS pay_method
    FROM
    (
        SELECT
            `--voided.order_id`,
            `--voided.paymethod`,
            `--voided.dt`,
            `--voided.amount`,
            `--voided.user_id`,
            Id,
            CountryCode,
            code,
            region
        FROM
        (
            SELECT
                order_id AS `--voided.order_id`,
                paymethod AS `--voided.paymethod`,
                dt AS `--voided.dt`,
                amount AS `--voided.amount`,
                user_id AS `--voided.user_id`,
                Id,
                CountryCode
            FROM
            (
                SELECT
                    user_id,
                    order_id,
                    amount,
                    toDate(create_at) AS dt,
                    paymethod
                FROM sg_chikii.sv_usr_voidedOrder_distributed
                WHERE (create_at >= toDateTime('2024-11-04')) AND (create_at <= toDateTime('2024-11-04'))
            ) AS voided
            GLOBAL ALL LEFT JOIN sg_chikii.sv_usr_account_distributed AS account ON `--voided.user_id` = toString(Id)
        ) AS `--.s`
        GLOBAL ALL LEFT JOIN sg_chikii.sv_usr_dimCountryMappingTbl_distributed AS mapping ON CountryCode = code
    ) AS `--.s`
    GLOBAL ALL LEFT JOIN sg_chikii.sv_usr_goodsOrder_distributed AS goods ON `--voided.order_id` = trade_no
) AS t
GROUP BY
    region_t,
    countrycode_t,
    app_id,
    pay_method
    WITH CUBE
HAVING dt_tmp != 'all'

明显的,我们的源sql被优化成了多个主查询-子查询的嵌套,同时还做了3个关键操作 global left join。

先来分析一下global关键字,在此贴上gpt的解释: image2024-11-14_11-37-14

再来看一下上面的子查询有哪些, 对于第一个主查询-子查询的嵌套:

第一个:sg_chikii.sv_usr_goodsOrder_distributed 第二个:sg_chikii.sv_usr_dimCountryMappingTbl_distributed 第三个:sg_chikii.sv_usr_account_distributed

是的,对于上面被sql优化器优化过后的sql来说,子查询就是这3个表的全量数据,这情况内存不高才有鬼了。

既然知道原因了,那么我们调换一下主查询和子查询不就行了?直接把A global left join B 变成 B global right join A。来试一下(由于sv_usr_dimCountryMappingTbl_distributed数据量比较小就不转换这个了):

select if(region_t1 is not NULL,toString(region_t1), 'all') as region,
       if(countrycode_t1 is not null,toString(countrycode_t1), 'all') as countrycode,
       if(app_id_t1 is not NULL, toString(app_id_t1),'all') as app_id,
       count(distinct user_id_t1) as voided_user_cnt,
       sum(amount_t1) as voided_amount,
       if(t.pay_method_t1 is not null,toString(t.pay_method_t1),'all') as pay_method_tmp
from(
​
        select toNullable(if(mapping.region is not null, mapping.region, '其它')) as region_t1,
               toNullable(tmp_1.country_code_t) as countrycode_t1,
               toNullable(case when device_type_t = '2' then 10018
                               when device_type_t = '3' then 10019
                               when device_type_t = '4' then 0
                               else 2
                   end) as  app_id_t1,
               user_id_t as user_id_t1,
               amount_t as amount_t1,
               toNullable(pay_method_t) as pay_method_t1
        from (
                 select DISTINCT
                     toNullable(voided_t.country_code) as country_code_t,
                     toNullable(voided_t.paymethod)  as pay_method_t,
                     goods.device_type                           as device_type_t,
                     if(voided_t.paymethod = 9, voided_t.amount / 120 * 100, toFloat64(voided_t.amount))                  as amount_t,
                     toNullable(voided_t.dt) as dt_t,
                     goods.trade_no as trade_no_t,
                     voided_t.order_id as order_id_t,
                     voided_t.user_id as user_id_t
                 from sg_chikii.sv_usr_goodsOrder_distributed goods
                          global right join
                      (
                          select
                              voided.user_id as user_id,
                              voided.order_id as order_id,
                              voided.amount as amount,
                              voided.paymethod as paymethod,
                              if(account.CountryCode = '' or account.CountryCode is null, 'other', account.CountryCode) as country_code,
                              voided.dt as dt
                          from sg_chikii.sv_usr_account_distributed account
                                   global right join (
                              select user_id,order_id,amount,paymethod,toDate(create_at) as dt
                              from sg_chikii.sv_usr_voidedOrder_distributed
                              WHERE create_at between '2024-11-04' and '2024-11-04'
                              ) voided on voided.user_id = toString(account.Id)
                          ) voided_t on voided_t.order_id = goods.trade_no
                 )tmp_1
                 GLOBAL LEFT JOIN sg_chikii.sv_usr_dimCountryMappingTbl_distributed mapping on tmp_1.country_code_t = mapping.code
        ) t
group by t.region_t1, t.countrycode_t1, t.app_id_t1,t.pay_method_t1
with cube

内存占用:

image2024-11-14_11-48-56

33453606 / 1024 / 1024 / 1024 = 0.031G,直接给他干到了0.031G

接下来只要数据正确就完美了: 原数据:

新sql的数据: image2024-11-14_14-6-56

可以看到数据对不上了,后续查明是 global 加 right join 的问题,这个等下再讲,先贴一下在不能使用global right join 的情况下的解决方案:

t1 left join t2的语义返回左表(t1)中的所有记录以及右表(t2)中满足查询条件的记录,如果右表中没有满足条件的记录,则置null

所以

select t1.id as id1
        ,t2.id as id2
from t1 left join t2 on t1.id = t2.id

可以改写成:

select id1, id2
from (
         select t1.id as id1
              ,t2.id as id2
         from t1 inner join t2 on t1.id = t2.id
    union all
        select t1.id as id1
                ,null as id2
        from t1
        where t1.id  not in (
            select id
            from t2
            where t2.id in (
                select t1.id
                from t1
                )
        )
     )

最终编写出的sql如下(数据已验证正确):

with voided as (
    select user_id,order_id,amount,paymethod,toDate(create_at) as dt
    from sg_chikii.sv_usr_voidedOrder_distributed
    WHERE create_at between  '$begin_time$' and '$end_time$'
),voided_tmp as (
    select
        v1.user_id as user_id,
        v1.order_id as order_id,
        v1.amount as amount,
        v1.paymethod as paymethod,
        if(account.CountryCode = '' or account.CountryCode is null, 'other', account.CountryCode) as country_code
    from sg_chikii.sv_usr_account_distributed account
             global inner join voided v1 on v1.user_id = toString(account.Id)
​
    union all
    select
        v1.user_id as user_id,
        v1.order_id as order_id,
        v1.amount as amount,
        v1.paymethod as paymethod,
        null
    from  voided v1
    where v1.user_id global not in (
        select toString(account.Id) as id
        from sg_chikii.sv_usr_account_distributed account
        where toString(account.Id) global in (
            select user_id from voided
        )
    )
​
),tot_t as (
    select toNullable(if(mapping.region is not null, mapping.region, '其它')) as region_t1,
           toNullable(tt.country_code_t) as countrycode_t1,
           toNullable(case when device_type_t = '2' then 10018
                           when device_type_t = '3' then 10019
                           when device_type_t = '4' then 0
                           else 2
               end) as  app_id_t1,
           user_id_t as user_id_t1,
           amount_t as amount_t1,
           toNullable(pay_method_t) as pay_method_t1
    from (
             select DISTINCT
                 toNullable(voided_t.country_code) as country_code_t,
                 toNullable(voided_t.paymethod)  as pay_method_t,
                 goods.device_type                           as device_type_t,
                 if(voided_t.paymethod = 9, voided_t.amount / 120 * 100, toFloat64(voided_t.amount))                  as amount_t,
                 goods.trade_no as trade_no_t,
                 voided_t.order_id as order_id_t,
                 voided_t.user_id as user_id_t
             from sg_chikii.sv_usr_goodsOrder_distributed goods
                      global inner join voided_tmp voided_t  on voided_t.order_id = goods.trade_no
​
             union all
             select
                 toNullable(voided_t.country_code) as country_code_t,
                 toNullable(voided_t.paymethod)  as pay_method_t,
                 null                           as device_type_t,
                 if(voided_t.paymethod = 9, voided_t.amount / 120 * 100, toFloat64(voided_t.amount))                  as amount_t,
                 null as trade_no_t,
                 voided_t.order_id as order_id_t,
                 voided_t.user_id as user_id_t
             from voided_tmp voided_t
             where voided_t.order_id global not in (
                 select trade_no
                 from sg_chikii.sv_usr_goodsOrder_distributed
                 where trade_no global in (
                     select order_id from voided_tmp
                 )
             )
             ) tt
             GLOBAL LEFT JOIN sg_chikii.sv_usr_dimCountryMappingTbl_distributed mapping on tt.country_code_t = mapping.code
)
​
select if(region_t1 is not NULL,toString(region_t1), 'all') as region,
       if(countrycode_t1 is not null,toString(countrycode_t1), 'all') as countrycode,
       if(app_id_t1 is not NULL, toString(app_id_t1),'all') as app_id,
       count(distinct user_id_t1) as voided_user_cnt,
       sum(amount_t1) as voided_amount,
       if(pay_method_t1 is not null,toString(pay_method_t1),'all') as pay_method_tmp
from tot_t t
group by t.region_t1, t.countrycode_t1, t.app_id_t1,t.pay_method_t1
with cube
order by region, countrycode, app_id, pay_method_tmp

占用内存: image2024-11-14_14-25-30

512429025 / 1024 / 1024 /1024 = 0.477G,大概0.5g,完全可以接受了

4. t1 global left join t2 != t2 global right join t1

现在再来说说为什么不能使用global right join。(其实如果ck集群只有一个节点的话是完全没有问题的)

前面说到GLOBAL关键字会让ck执行以下步骤: 解析查询->执行子查询->广播结果→结果聚合

问题就出在“执行子查询”和“结果聚合”这两步上。

首先我们都知道在使用分布式方案上的ck是可以分成两种表的:本地表,分布表。本地表只有当前节点的数据,分布表有全部节点的数据。

使用global关键字之后的“执行子查询”这一步的主查询是使用本地表的,现在假设有A,B两个节点,那么当在A节点执行字查询的时候,t2中与t1匹配的记录是在B节点上的,这时候就会返回 t2.null,t1.id这种记录,但是在整体上看真正的数据应该是t2.id,t1.id这样的。

下面是一个例子:

A节点有t1表的{id:1}记录,有t2表的{id=2}记录

B节点有t1表的{id:2}记录,有t2表的{id=1}记录

现在执行

select t1.id as id1
,t2.id as id2
from t2 global right join t1 on t1.id = t2.id

如果不是分布式数据的话,正确应该得到以下数据

{id1= 1,id2 = 1},

{id1=2,id2=2}

那么当ck是分布式的情况呢?

A节点: 由于A节点的主查询既t2表只有{id=2}的记录,所以会得到

{id1=1,id2 = null},

{id1=2,id2 = 2}

B节点:同理,得到:

{id1=1,id2 = 1},

{id1=2,id2 = null}

现在再执行到“结果聚合”的步骤,就会得到4条记录

{id1=1,id2 = null},

{id1=2,id2 = 2},

{id1=1,id2 = 1},

{id1=2,id2 = null}

所以,经过上述例子的分析就明白为什么 t1 global left join t2 != t2 global right join t1 了。