ClickHouse AggregatingMergeTree

1,747

1.介绍 

该表引擎继承自MergeTree,可以使用 AggregatingMergeTree 表来做增量数据统计聚合。如果要按一组规则来合并减少行数,则使用 AggregatingMergeTree 是合适的。AggregatingMergeTree是通过预先定义的聚合函数计算数据并通过二进制的格式存入表内。 是SummingMergeTree的加强版,SummingMergeTree能做的是对非主键列进行sum聚合,而AggregatingMergeTree则可以指定各种聚合函数。 

 2.建表语句

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE = AggregatingMergeTree()
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[TTL expr]
[SETTINGS name=value, ...]

3.示例 

背景: 我们需要通过uid来获取用户的昵称或者等级等属性,而宽表中虽然保存有这些属性,不过在宽表更新时,就不可用了。所以通过抽取宽表的用户属性,新建一张AggregatingMergeTree表,这样的好处就是可以将更新的用户属性以insert的方式达到update的效果。

 建表:

CREATE TABLE user_group_test.aggregating_table_test1 on cluster ck_cluster
(
    `uid` Int64 COMMENT 'uid',
    `nickname` AggregateFunction(argMaxIf, String, DateTime, UInt8) COMMENT '昵称',
    `phone` AggregateFunction(argMaxIf, String, DateTime, UInt8) COMMENT '手机号码',
    `experience_level` AggregateFunction(argMaxIf, String, DateTime, UInt8) COMMENT '经验等级'
)
ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/user_group_test/tables/{shard}/aggregating_table_test1', '{replica}')
ORDER BY uid
SETTINGS index_granularity = 8192;

CREATE TABLE user_group_test.aggregating_table_test1_all on cluster ck_cluster
as user_group_test.aggregating_table_test1
ENGINE = Distributed('ck_cluster', 'user_group_test', 'aggregating_table_test1', intHash64(uid));

使用聚合函数插入数据:

SELECT
    uid,
    argMaxIfState(nickName,regTime,nickName!='') nickName,
    argMaxIfState(phone,regTime,phone!='') phone,
    argMaxIfState(level,regTime,level!='') level
FROM default.dwd_user_wide_all where totalDate=yesterday()
group by uid limit 20;

查询:

select uid,argMaxIfMerge(nickname),
       argMaxIfMerge(experience_level)
from user_group_test.aggregating_table_test1_all
where uid=123456 group by uid;