背景
在基于文章一的基础上,多余的话我就不多赘述。直接上clickhouse部署过程中的一些问题。
clickhouse 创建kafka引擎表
CREATE TABLE default.kafka_clickhouse_inner_log ON CLUSTER clickhouse_cluster (
log_uuid String ,
date_partition UInt32 ,
event_name String ,
activity_name String ,
activity_type String ,
activity_id UInt16
) ENGINE = Kafka SETTINGS
kafka_broker_list = 'kafka1:9092,kafka2:9092,kafka3:9092',
kafka_topic_list = 'data_clickhouse',
kafka_group_name = 'clickhouse_xxx',
kafka_format = 'JSONEachRow',
kafka_row_delimiter = '\n',
kafka_num_consumers = 1;
问题1:clikhouse客户端无法查询 kafka引擎表:
Direct select is not allowed. To enable use setting stream_like_engine_allow_direct_select
. (QUERY_NOT_ALLOWED) (version 22.5.2.53 (official build))
解决方案:
需要在clickhouse client 创建加上 --stream_like_engine_allow_direct_select 1
clickhouse-client --stream_like_engine_allow_direct_select 1 --password xxxxx
clickhouse 创建本地节点表
问题2: 无法开启本地表 macro
Code: 62. DB::Exception: There was an error on [10.74.244.57:9000]: Code: 62. DB::Exception: No macro 'shard' in config while processing substitutions in '/clickhouse/tables/default/bi_inner_log_local/{shard}' at '50' or macro is not supported here. (SYNTAX_ERROR) (version 22.5.2.53 (official build)). (SYNTAX_ERROR) (version 22.5.2.53 (official build))
创建本地表(使用复制去重表引擎)
create table default.bi_inner_log_local ON CLUSTER clickhouse_cluster (
log_uuid String ,
date_partition UInt32 ,
event_name String ,
activity_name String ,
credits_bring Int16 ,
activity_type String ,
activity_id UInt16
) ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/default/bi_inner_log_local/{shard}','{replica}')
PARTITION BY date_partition
ORDER BY (event_name,date_partition,log_uuid)
SETTINGS index_granularity = 8192;
解决方案:
在不同的clickhouse 节点上配置不同的shard,每一个节点的shard名称不能一致。
<macros>
<shard>01</shard>
<replica>example01-01-1</replica>
</macros>
问题3: clickhouse 中节点数据已经存在
Code: 253. DB::Exception: There was an error on : Code: 253. DB::Exception: Replica /clickhouse/tables/default/bi_inner_log_local/01/replicas/example01-01-1 already exists. (REPLICA_IS_ALREADY_EXIST) (version 22.5.2.53 (official build)). (REPLICA_IS_ALREADY_EXIST) (version 22.5.2.53 (official build))
解决方案:
进入zookeeper 客户端删除相关节点,然后再重新创建ReplicatedReplacingMergeTree 表。这样可以保障每一个clickhouse节点都会去消费kafka partition的数据。
clickhouse 创建集群表
创建分布式表(根据log_uuid对数据进行分发,相同的log_uuid会发送到同一个shard分片上,用于后续合并时的数据去重)
CREATE TABLE default.bi_inner_log_all ON CLUSTER clickhouse_cluster AS default.bi_inner_log_local
ENGINE = Distributed(clickhouse_cluster, default, bi_inner_log_local, xxHash32(log_uuid));
问题4: 分布式集群表无法查询
Code: 516. DB::Exception: Received from 10.74.244.57:9000. DB::Exception: default: Authentication failed: password is incorrect or there is no user with such name. (AUTHENTICATION_FAILED) (version 22.5.2.53 (official build))
解决方案:
<!--分布式表配置-->
<remote_servers>
<clickhouse_cluster> <!--集群名称, 可以自定义, 后面在新建库、表的时候需要用到集群名称-->
<shard>
<!--内部复制(默认false), 开启后, 在分布式表引擎下, 数据写入时-->
<!--每个分片只会去寻找一个节点写, 并不是每个都写-->
<internal_replication>true</internal_replication>
<replica>
<host>ip1</host>
<port>9000</port>
<user>default</user>
<password>xxxx</password>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>ip2</host>
<port>9000</port>
<user>default</user>
<password>xxxx</password>
</replica>
</shard>
</clickhouse_cluster>
</remote_servers>
clickhouse 创建物化视图
创建物化视图,把Kafka消费表消费的数据同步到ClickHouse 分布式表.
CREATE MATERIALIZED VIEW default.view_bi_inner_log ON CLUSTER clickhouse_cluster TO default.bi_inner_log_all AS
SELECT
log_uuid ,
date_partition ,
event_name ,
activity_name ,
credits_bring ,
activity_type ,
activity_id
FROM default.kafka_clickhouse_inner_log;
总结
功夫不负有心人,解决完以上所有的问题。数据流转通了!本文所有组件都是比较新的版本,所以过程中问题的解决基本都是官方文档或操作手册一步一步的解决。总结一句话:遇到问题去官方文档或--help 去尝试解决。慢慢的你就会升华。