一、ByConity 介绍
ByConity 是字节跳动开源的新一代云原生数据仓库,基于Clickhouse 进行优化和扩展,采用了存算分离架构,
成熟的OLAP技术,资源弹性调整与隔离、数据查询写入、数据一致性、查询调度 方面均大幅提升。
在交互式分析、实时数据看板、实时数仓的等查询场景表现优异。
更是推出了BSP模式来应对大数据量,复杂业务场景的数据计算需求。
技术架构:
ByConity 大体上可以分为 3 层:服务接入层,计算层 和 存储层。
服务接入层:将用户的查询解析,生成高效执行计划,选取合适的计算节点下发计算任务,支持水平扩展。
计算层:计算层由一个或者多个计算组构成,计算组之间物理隔离,计算组由多个worker 可以动态扩缩容。
存储层:存储层由元数据和数据构成,元数据存在分布式kv中,数据存储在分布式文件系统HDFS 或 S3中,
与计算节点分离,可以降低计算节点负载时的对存储的影响。ByConity 采用列式存储格式,减少不必要的数据 IO 提高查询性能,并对数据进行高效压缩,降低存储成本。除此之外,对于连续存储的列式数据,ByConity 通过向量化执行技术,进一步提升查询性能。
功能特性:
-
弹性扩缩容与读写分离:ByConity 采用存储计算分离的架构,可以按业务场景进行动态扩缩容,成本可以控。
-
多租户隔离与资源共享:用户可以为查询 SQL 指定计算组,可以对不同业务进行针对性配置资源。提升了计算组的资源利用率。
-
查询优化器 与查询调度:具有 基于成本、基于规则与基于数据依赖的优化能力。基于策励与资源利用率的调度。
-
数据湖与ELT场景支持: 支持通过外部形式访问Hive 与 Hudi。对常规ELT业务场景做了支持。
-
与Clickhouse 的 SQL 兼容: 可以不修改业务SQL就可以迁移过来。
部署可以参考:
了解ByConity 的MPP模式与BSP模式:
我们今天主要对ByConity 的ELT场景进行一个简单的测试。
ByConity 在ELT场景推出了BSP模式,可以参考:ByConity 技术详解之 ELT
简单了解一下MPP与BSP
MPP 架构:
-
是一种分布式的计算架构,它将任务分配为多个子任务,并行地分配到多个独立的节点上进行并行处理,各节点独立处理完成后,汇总到一个或多节点上。适用于处理大规模的数据密集型任务,如海量数据的分析、复杂的 SQL 查询等
-
MPP优点:
- 水平扩张能力强,并发处理能力强。
-
MPP缺点:
-
单节点硬件资源的上限会影响查询数据量。
-
单节点执行慢会影响整体任务的执行。
-
出现问题,要从头开始执行。
-
BSP 架构:
-
是一个基于整体同步并行计算模型,它将任务拆分为DAG图任务,采用多阶段执行查询,并且进行阶段调度,只有上一个阶段全部执行完,才能执行下一个阶段。
-
BSP优点:
- 可以用比较小的资源处理更大数据量的计算(处理大于机器内存的数据),
- 具有task 级别的容错,可以重用中间落盘的计算数据,减少试错成本。
- 执行更加稳定,支持长时间任务。
-
缺点:
- 相比MPP具有通信开销,任务等待同步开销。
- 小任务中间开销大。
如何开启BSP模式:
执行SQL时候,在SQL尾部添加
- bsp_mode = 1,来打开BSP模式,
- distributed_max_parallel_size 可以设置算子并行度,并行度设置越大,单个算子需要的内存资源就越小,大家可以自行根据自己的需求合理设置。
SETTINGS bsp_mode = 1,
distributed_max_parallel_size = 20
ByConity 使用 ELT 相关参数:
在 query settings 中通过以下参数使用 ELT 能力:
二、常规业务场景下 MPP模式与BSP模式的测试
测试集群配置:
| 服务名 | 资源 | 服务名 | 资源 |
|---|---|---|---|
| Worker: | 4 * 8core 64G | Server: | 1 * 16core 64G |
| Daemon Manager: | 1 * 4core 16G | Resource Manager: | 1 * 4core 16G |
| TSO: | 1 * 4core 4G | FoundationDB | 3 * 4core 16G |
| CPU 型号: | Intel(R) Xeon(R) Platinum 8457C | ||
| ByConity版本: | v1.0.1 hotfi |
测试数据集:
| 表名 | 表描述 | 数据条数 |
|---|---|---|
| test_base_fast | 事件表详情表 | 145亿条 |
| common_base_tag_info | 用户画像/标签表 | 1289万条 |
连接到测试集群:
1、连接到ByConity
clickhouse client --port 9000 --host=xxx --user=xxx --password=xxx;
测试SQL类型:
测试SQL比较长,大家可以直接跳到后面看测试结果。
1、留存查询
SQL语句【大数据量下的查询】:
WITH RankedSales AS
(
SELECT
pd_device_id,
gameid,
SUM(playtime) AS total_playtime,
pd_is_new_user,
ck_date,
pd_kind
FROM test.test_base_fast
WHERE pd_kind NOT IN ('pandora_http_biz_code', 'mwc_performance', 'pandora_http_request_error', 'ts_action_click', 'app_detail')
GROUP BY
pd_device_id,
gameid,
pd_is_new_user,
ck_date,
pd_kind
)
SELECT
sum(retention_result[12]) AS retention_result_12,
sum(retention_result[11]) AS retention_result_11,
sum(retention_result[10]) AS retention_result_10,
sum(retention_result[9]) AS retention_result_9,
sum(retention_result[8]) AS retention_result_8,
sum(retention_result[7]) AS retention_result_7,
sum(retention_result[6]) AS retention_result_6,
sum(retention_result[5]) AS retention_result_5,
sum(retention_result[4]) AS retention_result_4,
sum(retention_result[3]) AS retention_result_3,
sum(retention_result[2]) AS retention_result_2,
sum(retention_result[1]) AS retention_result_1,
pd_is_new_user
FROM
(
SELECT
pd_device_id,
pd_is_new_user,
retention((pd_kind = 'play_game') AND (ck_date = '2024-12-06'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-07'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-08'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-09'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-10'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-11'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-12'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-13'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-14'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-15'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-16'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-17')) AS retention_result
FROM
(
SELECT DISTINCT
pd_kind,
pd_device_id,
ck_date,
pd_is_new_user
FROM RankedSales AS pd_tmp_table
)
GROUP BY
pd_device_id,
pd_is_new_user
)
WHERE (retention_result[1]) = 1
GROUP BY pd_is_new_user
SETTINGS bsp_mode = 1, distributed_max_parallel_size = 20
SQL语句【小数据量下的查询】:
WITH RankedSales AS
(
SELECT
pd_device_id,
gameid,
SUM(playtime) AS total_playtime,
pd_is_new_user,
ck_date,
pd_kind
FROM test.test_base_fast
WHERE pd_kind NOT IN ('pandora_http_biz_code', 'mwc_performance', 'pandora_http_request_error', 'ts_action_click', 'app_detail')
GROUP BY
pd_device_id,
gameid,
pd_is_new_user,
ck_date,
pd_kind
)
SELECT
sum(retention_result[12]) AS retention_result_12,
sum(retention_result[11]) AS retention_result_11,
sum(retention_result[10]) AS retention_result_10,
sum(retention_result[9]) AS retention_result_9,
sum(retention_result[8]) AS retention_result_8,
sum(retention_result[7]) AS retention_result_7,
sum(retention_result[6]) AS retention_result_6,
sum(retention_result[5]) AS retention_result_5,
sum(retention_result[4]) AS retention_result_4,
sum(retention_result[3]) AS retention_result_3,
sum(retention_result[2]) AS retention_result_2,
sum(retention_result[1]) AS retention_result_1,
pd_is_new_user
FROM
(
SELECT
pd_device_id,
pd_is_new_user,
retention((pd_kind = 'play_game') AND (ck_date = '2024-12-06'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-07'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-08'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-09'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-10'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-11'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-12'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-13'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-14'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-15'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-16'), (pd_kind = 'mwc_performance') AND (ck_date = '2024-12-17')) AS retention_result
FROM
(
SELECT DISTINCT
pd_kind,
pd_device_id,
ck_date,
pd_is_new_user
FROM RankedSales AS pd_tmp_table
)
-- 这里我们进行过滤,将 需要用的 play_game 与 mwc_performance 过滤出来,再进行计算
WHERE pd_kind in ('play_game', 'mwc_performance')
GROUP BY
pd_device_id,
pd_is_new_user
)
WHERE (retention_result[1]) = 1
GROUP BY pd_is_new_user
SETTINGS bsp_mode = 1, distributed_max_parallel_size = 4
2、转化查询
SQL语句【大数据量下的查询】:
SELECT
sum(multiIf(ck_pd_level >= 1, 1, 0)) AS ck_pd_level_number_1,
sum(multiIf(ck_pd_level >= 2, 1, 0)) AS ck_pd_level_number_2,
sum(multiIf(ck_pd_level >= 3, 1, 0)) AS ck_pd_level_number_3,
sum(multiIf(ck_pd_level >= 4, 1, 0)) AS ck_pd_level_number_4,
sum(multiIf(ck_pd_level >= 5, 1, 0)) AS ck_pd_level_number_5
FROM
(
SELECT
pd_device_id,
windowFunnel(3600)(toTimeZone(ck_datetime_timestamp, 'Asia/Shanghai'),
kind = 'app_time', kind = 'app_detail', kind = 'play_game',
kind = 'mwc_performance', kind = 'mwc_performance') AS ck_pd_level
FROM test.test_base_fast
WHERE pd_device_id IS NOT NULL
GROUP BY pd_device_id
)
SETTINGS bsp_mode = 1, distributed_max_parallel_size = 8
SQL语句【小数据量下的查询】:
SELECT
sum(multiIf(ck_pd_level >= 1, 1, 0)) AS ck_pd_level_number_1,
sum(multiIf(ck_pd_level >= 2, 1, 0)) AS ck_pd_level_number_2,
sum(multiIf(ck_pd_level >= 3, 1, 0)) AS ck_pd_level_number_3
FROM
(
SELECT
pd_device_id,
windowFunnel(3600)(toTimeZone(ck_datetime_timestamp, 'Asia/Shanghai'),
kind = 'app_time', kind = 'app_detail', kind = 'play_game') AS ck_pd_level
FROM test.test_base_fast
WHERE (pd_device_id IS NOT NULL) AND (kind IN ('app_time', 'app_detail', 'play_game')) AND (ck_date IN ('2024-12-14'))
GROUP BY pd_device_id
)
SETTINGS bsp_mode = 1,
distributed_max_parallel_size = 20;
3、计算圈选用户的指标
SQL语句【大数据量下的查询】:
WITH
tab_info AS
(
SELECT
tag_str_val,
groupArray(tag_id) AS tag_array
FROM test.common_base_tag_info
WHERE tag_str_val != ''
GROUP BY tag_str_val
),
kind_table AS
(
SELECT
pd_uid,
playtime,
kind
FROM test.test_base_fast
LEFT JOIN tab_info ON tab_info.tag_str_val = test.test_base_fast.pd_uid
),
palytime_table AS
(
SELECT
pd_uid,
sum(playtime) AS play_time_sum,
countDistinct(pd_uid),
play_time_sum / count(1) AS avg_time
FROM kind_table
WHERE kind IN ('app_time', 'play_game', 'mwc_performance')
GROUP BY pd_uid
HAVING play_time_sum > 200000
)
SELECT
sum(play_time_sum)/1000 AS plays_time_sum,
countDistinct(pd_uid) AS pd_uid_count,
plays_time_sum / countDistinct(pd_uid)/1000 AS avg_play_time
FROM palytime_table
SETTINGS bsp_mode = 1, distributed_max_parallel_size = 20
SQL语句【小数据量下的查询】:
WITH
tab_info AS
(
SELECT
tag_str_val,
groupArray(tag_id) AS tag_array
FROM test.common_base_tag_info
WHERE tag_str_val != ''
GROUP BY tag_str_val
),
kind_table AS
(
SELECT
pd_uid,
playtime,
kind
FROM test.test_base_fast
LEFT JOIN tab_info ON tab_info.tag_str_val = test.test_base_fast.pd_uid
),
palytime_table AS
(
SELECT
pd_uid,
sum(playtime) AS play_time_sum,
countDistinct(pd_uid),
play_time_sum / count(1) AS avg_time
FROM kind_table
WHERE kind IN ('app_time', 'play_game', 'mwc_performance')
and ck_date IN ('2024-12-14')
GROUP BY pd_uid
HAVING play_time_sum > 200000
)
SELECT
sum(play_time_sum)/1000 AS plays_time_sum,
countDistinct(pd_uid) AS pd_uid_count,
plays_time_sum / countDistinct(pd_uid)/1000 AS avg_play_time
FROM palytime_table
SETTINGS bsp_mode = 1, distributed_max_parallel_size = 20
4、大量数据聚合并插入新表中
SQL语句【大数据量下的查询】:
(5亿条数据19个列做分组聚合, 聚合结果为2.3亿条)
INSERT INTO test.test_common_base_realtime_flight_data (ck_date, pd_kind, pd_device_id, pd_app_version_code, pd_is_new_user, pd_biz_fresh_install, show_categoryid, channelid, launchtype, isfirstplay, crashtype, pagename, gameid, ck_pandora_new_ab_group_array, ck_pandora_ab_group_array, ck_pandora_switch_ab_group_array, playtime, show_param1, pv, ssp_unit_id, icon_type, reqid, result, rechargequota, voucherquota, price, ssp_price, ssp_unit_id_ecpm, ssp_233_income, realtime_ecpm_ratio, duration, gpackagename)
SELECT
ck_date,
pd_kind,
pd_device_id,
pd_app_version_code,
pd_is_new_user,
pd_biz_fresh_install,
show_categoryid,
channelid,
launchtype,
isfirstplay,
crashtype,
pagename,
arrayDistinct(groupArray(ifNull(gameid, 0))) AS gameid,
ck_pandora_new_ab_group_array,
ck_pandora_ab_group_array,
ck_pandora_switch_ab_group_array,
sum(if(playtime IS NOT NULL, if((playtime > 2000) AND (playtime < 21600000), playtime, 0), NULL)) AS playtime,
sum(show_param1) AS show_param1,
count(*) AS pv,
ssp_unit_id,
icon_type,
reqid,
result,
sum(rechargequota) AS rechargequota,
sum(voucherquota) AS voucherquota,
sum(price) AS price,
sum(if(ssp_price_float IS NOT NULL, if((ssp_price_float >= 0) AND (ssp_price_float <= 50000), ssp_price_float, 0), 0)) AS ssp_price,
sum(if(ssp_unit_id_ecpm IS NOT NULL, if((ssp_unit_id_ecpm > 0) AND (ssp_unit_id_ecpm <= 500), ssp_unit_id_ecpm, 0), 0)) AS ssp_unit_id_ecpm,
sum(if(ssp_233_income IS NOT NULL, if((ssp_233_income > 0) AND (ssp_233_income <= 0.5), ssp_233_income, 0), 0)) AS ssp_233_income,
sum(if(realtime_ecpm_ratio IS NOT NULL, if((realtime_ecpm_ratio > 0) AND (realtime_ecpm_ratio < 500), realtime_ecpm_ratio, 0), 0)) AS realtime_ecpm_ratio,
sum(if(duration IS NOT NULL, if((duration > 2000) AND (duration < 21600000), duration, 0), 0)) AS duration,
arrayDistinct(groupArray(ifNull(gpackagename, ''))) AS gpackagename
FROM test.test_base_fast
WHERE (ck_date IN ('2024-12-14')) AND (pd_kind IN ('app_time', 'item_web_click', 'play_game', 'event_start_launch_game', 'crash', 'item_click', 'feed_item_click', 'event_community_article_use_time', 'event_community_main_topic_show', 'event_launch_game_success', 'event_new_ad_show_success', 'event_the_third_pay_success', 'event_ad_free_privilege_pay_success', 'event_le_coin_recharge_succeeded', 'event_intermodal_sdk_le_coin_recharge_result'))
GROUP BY
ck_date,
pd_kind,
pd_device_id,
pd_app_version_code,
pd_is_new_user,
pd_biz_fresh_install,
show_categoryid,
channelid,
launchtype,
isfirstplay,
crashtype,
pagename,
ck_pandora_new_ab_group_array,
ck_pandora_ab_group_array,
ck_pandora_switch_ab_group_array,
ssp_unit_id,
icon_type,
reqid,
result
SETTINGS bsp_mode = 1, distributed_max_parallel_size = 16
测试结果:
1、留存查询
- 大数据量下的查询测试结果:
| 模式 | 执行时间s | 数据处理总条数/总大小 | 每秒处理行/数据大小 |
|---|---|---|---|
| MPP模式 | 内存不足 | ||
| BSP模式20 并发 | 315.338 sec | 7.10 billion rows / 535.00 GB | 22.51 million rows/s / 1.70 GB/s |
BSP模式 资源消耗监控信息
- 小型数据量的查询测试结果:
| 模式 | 执行时间 | 数据处理总行/总大小 | 每秒处理行/数据大小 |
|---|---|---|---|
| MPP模式 | 16.926 sec | 386.15 million rows / 25.20 GB | 22.81 million rows/s / 1.49 GB/s |
| BSP模式20 并发 | 24.508 sec | 514.64 million rows / 33.58 GB | 21.00 million rows/s / 1.37 GB/s |
| BSP模式4 并发 | 17.193 sec | 514.82 million rows / 33.59 GB | 29.94 million rows/s / 1.95 GB/s |
MPP模式 与 BSP模式的资源消耗 (蓝色框 MPP模式,红色框 BSP模式)
- 总结:
1、MPP模式出现内存不足的报错时,可以尝试使用BSP模式。
2、BSP模式可以通过拆分任务,降低对资源的消耗,从而达成小资源跑大任务的目的。
3、BSP模式合理调整任务并行数,也可以解决MPP的计算时间。
4、BSP消耗的内存要小于MPP
2、 转化分析
- 大数据量下的查询测试结果:
| 模式 | 执行时间 | 数据总处理行/大小 | 每秒处理行/数据大小 |
|---|---|---|---|
| MPP模式 | 106.687 sec | 10.96 billion rows / 632.60 GB | 102.70 million rows/s / 5.93 GB/s |
| BSP模式20 并发 | 106.23 sec | 14.62 billion rows / 844.25 GB | 137.65 million rows/s / 7.95 GB/s. |
| BSP模式4 并发 | 104.148 sec. | 14.65 billion rows / 846.02 GB | 140.70 million rows/s / 8.12 GB/s |
MPP模式 与 BSP模式的资源消耗 (蓝色框 MPP模式,红色框 BSP模式)
- 小数据量下的查询测试结果:
| 模式 | 执行时间 | 数据总处理行/ 总大小 | 每秒处理行/数据大小 |
|---|---|---|---|
| MPP模式 | 5.269 se | 1.67 billion rows / 57.42 GB | 316.11 million rows/s / 10.90 GB/s |
| BSP模式20 并发 | 6.448 sec | 2.22 billion rows / 76.57 GB | 344.51 million rows/s / 11.88 GB/s |
| BSP模式4 并发 | 5.349 sec | 2.22 billion rows / 76.57 GB | 415.28 million rows/s / 14.32 GB/s |
MPP模式 与 BSP模式的资源消耗 (蓝色框 MPP模式,红色框 BSP模式)
- 总结:
1、转换分析场景 MPP与BSP的查询速度接近
3、计算圈选用户的指标
- 大数据量下的查询测试结果:
| 模式 | 执行时间 | 数据总处理行/ 总大小 | 每秒处理行/数据大小 |
|---|---|---|---|
| MPP模式 | 64.789 sec | 15.41 billion rows / 620.37 GB | 237.83 million rows/s / 9.58 GB/s |
| BSP模式20 并发 | 117.395 sec. | 15.41 billion rows / 620.63 GB | 131.30 million rows/s / 5.29 GB/s |
| BSP 模式 4 并发 | 64.407 sec | 15.42 billion rows / 620.77 GB | 239.40 million rows/s / 9.64 GB/s |
MPP模式 与 BSP模式的资源消耗 (蓝色框 MPP模式,红色框 BSP模式)
- 总结:
1、并不是任务并发越高越好,合适的并发可以让BSP的执行时间接近MPP
4、大量数据聚合并插入新表中
(5亿条数据19个列做分组聚合, 聚合结果为2.3亿条,并插入另一个表中)
- 大数据量下的查询测试结果:
| 模式 | 执行时间 | 数据总处理行/ 总大小 | 每秒处理行/数据大小 |
|---|---|---|---|
| MPP模式 | 内存不足 | ||
| BSP模式20 并发 | 1699.801 sec | 488.49 million rows / 565.08 GB | 287.38 thousand rows/s / 332.44 MB/s |
| BSP模式40 并发 | 1746.204 | 488.49 million rows / 565.08 GB | 279.74 thousand rows/s / 323.61 MB/s |
BSP模式 20并行任务
- 总结:
1、对于大型复杂的数仓聚合场景,BSP可以完成任务,MPP则无法应付这种场景
三、应用总结:
1、BSP模式更适合大数据量的计算,可以通过分解并行任务,降低单个任务的内存需求,能够有效保障任务执行的稳定性,更适合ELT,数仓等长任务场景。
2、BSP的任务并行度,并不是设置的越高越好,设置的太高,内存占用低,执行时间长。设置的太低,内存不足,无法完成计算,需要多次尝试获取最佳值。
3、MPP 模式相对BSP模式,可以读取更少的数据完成查询,在短时任务的运行上更具优势。通过排序键、分区、索引等手段可以进一步减少需要操作的数据量,达到更快的查询速度。