1.介绍
该表引擎继承自MergeTree,可以使用 AggregatingMergeTree 表来做增量数据统计聚合。如果要按一组规则来合并减少行数,则使用 AggregatingMergeTree 是合适的。AggregatingMergeTree是通过预先定义的聚合函数计算数据并通过二进制的格式存入表内。 是SummingMergeTree的加强版,SummingMergeTree能做的是对非主键列进行sum聚合,而AggregatingMergeTree则可以指定各种聚合函数。
2.建表语句
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
) ENGINE = AggregatingMergeTree()
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[TTL expr]
[SETTINGS name=value, ...]
3.示例
背景: 我们需要通过uid来获取用户的昵称或者等级等属性,而宽表中虽然保存有这些属性,不过在宽表更新时,就不可用了。所以通过抽取宽表的用户属性,新建一张AggregatingMergeTree表,这样的好处就是可以将更新的用户属性以insert的方式达到update的效果。
建表:
CREATE TABLE user_group_test.aggregating_table_test1 on cluster ck_cluster
(
`uid` Int64 COMMENT 'uid',
`nickname` AggregateFunction(argMaxIf, String, DateTime, UInt8) COMMENT '昵称',
`phone` AggregateFunction(argMaxIf, String, DateTime, UInt8) COMMENT '手机号码',
`experience_level` AggregateFunction(argMaxIf, String, DateTime, UInt8) COMMENT '经验等级'
)
ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/user_group_test/tables/{shard}/aggregating_table_test1', '{replica}')
ORDER BY uid
SETTINGS index_granularity = 8192;
CREATE TABLE user_group_test.aggregating_table_test1_all on cluster ck_cluster
as user_group_test.aggregating_table_test1
ENGINE = Distributed('ck_cluster', 'user_group_test', 'aggregating_table_test1', intHash64(uid));
使用聚合函数插入数据:
SELECT
uid,
argMaxIfState(nickName,regTime,nickName!='') nickName,
argMaxIfState(phone,regTime,phone!='') phone,
argMaxIfState(level,regTime,level!='') level
FROM default.dwd_user_wide_all where totalDate=yesterday()
group by uid limit 20;
查询:
select uid,argMaxIfMerge(nickname),
argMaxIfMerge(experience_level)
from user_group_test.aggregating_table_test1_all
where uid=123456 group by uid;