This post shares hands-on experience using Kafka to replicate both existing (historical) and incremental data between two clusters, along with the problems encountered on the way; corrections are welcome.
The article focuses on the practical steps; for the underlying theory, see this excellent write-up: cloud.tencent.com/developer/a…
1. Background
Due to a business split, a new Kafka cluster had to be built and the workload moved onto it. To preserve users' data, the existing historical data had to be migrated as well, and the migration needed to be fast, with minimal impact on the business.
2. Choosing the approach
Kafka's MirrorMaker 2 adds many features over its predecessor, such as ACL sync, consumer-group offset sync, dynamic discovery of topics and groups, and high replication throughput, so MirrorMaker 2 was chosen for both the historical and the incremental sync.
During evaluation we found that the Kafka 2.7 version in use cannot keep topic names exactly the same after migration; MirrorMaker 2 in Kafka 3.0 added support for one-way replication without the source-alias prefix (IdentityReplicationPolicy).
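To make the naming difference concrete, here is a minimal sketch (my own illustration, assuming the connect-mirror-client artifact is on the classpath; the topic name "orders" is just an example):
import org.apache.kafka.connect.mirror.DefaultReplicationPolicy;
import org.apache.kafka.connect.mirror.IdentityReplicationPolicy;

public class PolicyDemo {
    public static void main(String[] args) {
        // default policy: remote topics get the source-cluster alias as a prefix
        System.out.println(new DefaultReplicationPolicy().formatRemoteTopic("A", "orders")); // A.orders
        // identity policy (Kafka 3.0+): names stay unchanged, so bidirectional flows must be off
        System.out.println(new IdentityReplicationPolicy().formatRemoteTopic("A", "orders")); // orders
    }
}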
3. Configuration
config/connect-mirror-maker.properties
# see org.apache.kafka.clients.consumer.ConsumerConfig for more details
# Sample MirrorMaker 2.0 top-level configuration file
# Run with ./bin/connect-mirror-maker.sh connect-mirror-maker.properties
# specify any number of cluster aliases
clusters = A,B
#replication.policy.separator=""
#source.cluster.alias=""
#target.cluster.alias=""
# connection information for each cluster
# This is a comma separated host:port pairs for each cluster
# for e.g. "A_host1:9092, A_host2:9092, A_host3:9092"
A.bootstrap.servers = xxxx:9092
B.bootstrap.servers = yyyy:9092
A.security.protocol=SASL_PLAINTEXT
A.sasl.mechanism=SCRAM-SHA-512
A.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
username="xxx" password="xxx";
B.security.protocol=SASL_PLAINTEXT
B.sasl.mechanism=SCRAM-SHA-512
B.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
username="xxx" password="xxx";
# enable and configure individual replication flows
# set the direction of the replication flow
A->B.enabled = true
#A.producer.enable.idempotence = true
#B.producer.enable.idempotence = true
# regex which defines which topics gets replicated. For eg "foo-.*"
#A->B.topics = hadoopLogCollection,t_biz_act_mmetric
# topics to replicate; regex supported
A->B.topics = xxxx,xxxx
# topics to exclude; regex supported
A->B.topics.exclude= xxxx
#B->A.enabled = true
#B->A.topics = .*
# Setting replication factor of newly created remote topics
replication.factor=3
############################# Internal Topic Settings #############################
# The replication factor for mm2 internal topics "heartbeats", "B.checkpoints.internal" and
# "mm2-offset-syncs.B.internal"
# For anything other than development testing, a value greater than 1 (such as 3) is recommended to ensure availability.
sync.topic.configs.enabled=true
# how often topic-level configs are synced to the target
sync.topic.configs.interval.seconds=60
checkpoints.topic.replication.factor=2
heartbeats.topic.replication.factor=2
offset-syncs.topic.replication.factor=2
#offset-syncs.topic.location = target
# number of replication tasks to start, i.e. how many threads do the copying
tasks.max = 5
# The replication factor for connect internal topics "mm2-configs.B.internal", "mm2-offsets.B.internal" and
# "mm2-status.B.internal"
# For anything other than development testing, a value greater than 1 (such as 3) is recommended to ensure availability.
offset.storage.replication.factor=2
status.storage.replication.factor=2
config.storage.replication.factor=2
# customize as needed
# replication.policy.separator = _
sync.topic.acls.enabled = true
emit.heartbeats.interval.seconds = 5
# enable dynamic discovery of new topics and consumer groups, plus the refresh interval
refresh.topics.enabled = true
refresh.topics.interval.seconds = 60
refresh.groups.enabled = true
refresh.groups.interval.seconds = 60
# enable consumer-group offset sync and set its interval; note: only offsets of idle consumer groups (no active members) are synced
sync.group.offsets.enabled = true
sync.group.offsets.interval.seconds = 5
# naming policy for replicated topics; Kafka 3.0 ships two policies: the default adds the source-alias prefix, while IdentityReplicationPolicy keeps names unchanged (in which case bidirectional replication must not be enabled)
replication.policy.class = org.apache.kafka.connect.mirror.IdentityReplicationPolicy
4. Startup command (the --clusters B flag restricts this node to flows targeting cluster B):
bin/connect-mirror-maker.sh config/connect-mirror-maker.properties --clusters B
5. Monitoring the run
There was a lot of historical data, and to avoid putting too much pressure on the clusters the replication user was throttled to 250 MB/s, so the sync took quite a while and overall progress needed monitoring. I used jmx_exporter to expose the metrics, with Prometheus and Grafana for monitoring:
export JMX_PORT="xxxx"
export KAFKA_OPTS="-javaagent:jmx/jmx_prometheus_javaagent-0.12.0.jar=xxxx:jmx/kafka-mm2.yml"
kafka-mm2.yml
lowercaseOutputName: true
rules:
  - pattern: 'kafka.connect.mirror<type=MirrorCheckpointConnector, source=(.+), target=(.+), group=(.+), topic=(.+), partition=([0-9]+)><>([a-z-]+)'
    name: kafka_mirror_checkpointConnector_$6
    type: GAUGE
    labels:
      group: "$3"
      topic: "$4"
      partition: "$5"
  - pattern: 'kafka.connect.mirror<type=MirrorSourceConnector, target=(.+), topic=(.+), partition=([0-9]+)><>([a-z-]+)'
    name: kafka_mirror_sourceConnector_$4
    type: GAUGE
    labels:
      topic: "$2"
      partition: "$3"
The individual metrics are described in the official documentation: kafka.apache.org/documentati…
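For a quick spot check without going through Prometheus, the same MBeans can also be read over remote JMX. A hedged sketch (my own addition; the host, the port from JMX_PORT, and the topic/partition values are placeholders):
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class Mm2JmxProbe {
    public static void main(String[] args) throws Exception {
        // connect to the MM2 process's JMX port (the JMX_PORT exported above)
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://mm2-host:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // the same MBean that the jmx_exporter rule above scrapes
            ObjectName name = new ObjectName(
                    "kafka.connect.mirror:type=MirrorSourceConnector,target=B,topic=xxxx,partition=0");
            System.out.println("record-age-ms = " + conn.getAttribute(name, "record-age-ms"));
        }
    }
}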
6. User and ACL synchronization:
MirrorMaker 2 does not migrate users to the new cluster; this part has to be handled separately, e.g. by turning the user data kept in your database into a script and running it against the target, as sketched below.
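A minimal sketch of re-creating SCRAM users on the target cluster via the AdminClient (KIP-554, available since Kafka 2.7); this is my own illustration, and the user name and password are placeholders for whatever your user store returns:
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.ScramCredentialInfo;
import org.apache.kafka.clients.admin.ScramMechanism;
import org.apache.kafka.clients.admin.UserScramCredentialUpsertion;

public class UserMigration {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "yyyy:9092"); // target cluster B
        try (Admin admin = Admin.create(props)) {
            ScramCredentialInfo info = new ScramCredentialInfo(ScramMechanism.SCRAM_SHA_512, 4096);
            // one upsertion per user pulled from your own user database
            admin.alterUserScramCredentials(Collections.singletonList(
                    new UserScramCredentialUpsertion("someUser", info, "somePassword")))
                 .all()
                 .get();
        }
    }
}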
Also note that sync.topic.acls.enabled = true syncs neither WRITE ACLs nor consumer-group ACLs. For WRITE ACLs the official docs state this is deliberate, to prevent anomalies from users and MM2 both writing to the same remote topic (i.e. the worry is mirrored data getting clobbered). This part you have to implement yourself.
My implementation is as follows:
/**
 * Batch-sync the WRITE ACLs of the given topics.
 *
 * @param topics     topic list
 * @param sourceProp source AdminClient config
 * @param targetProp target AdminClient config
 * @param dryRun     dry run flag; defaults to true, i.e. ACLs are not actually written
 */
private void syncAcl(List<String> topics, HashMap<String, Object> sourceProp, HashMap<String, Object> targetProp, boolean dryRun) {
    logger.info("============================= start syncing user WRITE ACLs of topics {} ===============================", topics);
    // try-with-resources so both AdminClients are closed even when the sync fails
    try (AdminClient sourceAdmin = AdminClient.create(sourceProp);
         AdminClient targetAdmin = AdminClient.create(targetProp)) {
        Collection<AclBinding> aclBindings = sourceAdmin
                .describeAcls(new AclBindingFilter(
                        new ResourcePatternFilter(ResourceType.TOPIC, null, PatternType.ANY),
                        AccessControlEntryFilter.ANY))
                .values()
                .get();
        aclBindings.forEach(x -> logger.debug("ACL before filtering: {}", x));
        // certainUser(...) is the author's own user/topic filter (not shown here)
        List<AclBinding> bindings = aclBindings.stream()
                .filter(x -> x.pattern().resourceType() == ResourceType.TOPIC)
                .filter(x -> x.pattern().patternType() == PatternType.LITERAL)
                .filter(this::shouldReplicateAcl)
                .filter(x -> certainUser(x.pattern().name(), topics))
                .collect(Collectors.toList());
        bindings.forEach(x -> logger.info("ACL after filtering: {}", x));
        if (!dryRun) {
            updateTopicAcls(targetAdmin, bindings);
        }
    } catch (Exception e) {
        logger.error("ACL sync failed", e);
    }
    logger.info("============================= finished syncing user WRITE ACLs of topics {} ===============================", topics);
}

private void updateTopicAcls(AdminClient targetAdmin, List<AclBinding> bindings) {
    logger.trace("Syncing {} topic ACL bindings.", bindings.size());
    targetAdmin.createAcls(bindings)
            .values()
            .forEach((k, v) -> v.whenComplete((x, e) -> {
                if (e != null) {
                    logger.warn("Could not sync ACL of topic {}.", k.pattern().name(), e);
                }
            }));
}

// only ALLOW WRITE bindings are replicated here, since these are exactly what MM2 skips
boolean shouldReplicateAcl(AclBinding aclBinding) {
    return aclBinding.entry().permissionType() == AclPermissionType.ALLOW
            && aclBinding.entry().operation() == AclOperation.WRITE;
}
/**
 * Sync consumer-group ACLs.
 *
 * @param sourceProp source AdminClient config
 * @param targetProp target AdminClient config
 * @param dryRun     dry run flag; defaults to true, i.e. ACLs are not actually written
 */
private void syncGroupAcl(HashMap<String, Object> sourceProp, HashMap<String, Object> targetProp, boolean dryRun) {
    logger.info("============================= start syncing non-topic (consumer-group) ACLs ===============================");
    try (AdminClient sourceAdmin = AdminClient.create(sourceProp);
         AdminClient targetAdmin = AdminClient.create(targetProp)) {
        Collection<AclBinding> aclBindings = sourceAdmin
                .describeAcls(new AclBindingFilter(
                        new ResourcePatternFilter(ResourceType.GROUP, null, PatternType.ANY),
                        AccessControlEntryFilter.ANY))
                .values()
                .get();
        List<AclBinding> bindings = aclBindings.stream()
                .filter(x -> x.pattern().resourceType() == ResourceType.GROUP)
                .filter(x -> x.pattern().patternType() == PatternType.LITERAL)
                .filter(this::certainUser)
                .collect(Collectors.toList());
        bindings.forEach(x -> logger.info("ACL after filtering: {}", x));
        if (!dryRun) {
            // reuses updateTopicAcls above, which simply calls createAcls on the bindings
            updateTopicAcls(targetAdmin, bindings);
        }
        logger.info("============================= finished syncing {} non-topic ACL bindings ===============================", bindings.size());
    } catch (Exception e) {
        logger.error("ACL sync failed", e);
    }
}
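A hypothetical driver for the two methods above (not part of the original code); dryRun=true only logs what would be created, so it is safe to run first. The property values reuse the placeholders from the MM2 config:
private void runAclSync() {
    HashMap<String, Object> sourceProp = new HashMap<>();
    sourceProp.put("bootstrap.servers", "xxxx:9092"); // source cluster A
    sourceProp.put("security.protocol", "SASL_PLAINTEXT");
    sourceProp.put("sasl.mechanism", "SCRAM-SHA-512");
    sourceProp.put("sasl.jaas.config",
            "org.apache.kafka.common.security.scram.ScramLoginModule required username=\"xxx\" password=\"xxx\";");

    HashMap<String, Object> targetProp = new HashMap<>(sourceProp);
    targetProp.put("bootstrap.servers", "yyyy:9092"); // target cluster B

    // dry run first, inspect the logged bindings, then flip dryRun to false
    syncAcl(Arrays.asList("xxxx"), sourceProp, targetProp, true);
    syncGroupAcl(sourceProp, targetProp, true);
}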
7. Confirming the sync is complete:
How do you confirm that replication has caught up?
How can producers be sure that, by the time they switch over, MM2 has finished replicating all of cluster A's data?
Unlike MirrorMaker 1, MM2 uses the low-level consumer API, so its progress cannot be read directly from consumer lag.
In theory the offset-syncs topic stores the matching upstream and downstream offsets; in my tests, however, it contained no data, and the project schedule was too tight to dig into why. If you know the reason, please share it.
So another route was needed.
Grafana was used to display the replication latency and the age of the replicated records:
record-age-ms
This metric tells you roughly how long after being written a record gets replicated. Paired with a small verification endpoint that users can call to confirm the sync has caught up, users can switch over on their own with confidence:
/**
 * Check whether the last record of each partition matches between the source and target cluster.
 *
 * @param oldConsumer consumer for the source cluster; created without a group
 * @param newConsumer consumer for the target cluster; created without a group
 * @param testTopic   topic to verify
 * @return whether the check passed
 * @throws InterruptedException if the retry sleep is interrupted
 */
private boolean checkLastData(KafkaConsumer<String, String> oldConsumer, KafkaConsumer<String, String> newConsumer, String testTopic) throws InterruptedException {
    logger.info("=================== start comparing the last record of {} on both sides ====================", testTopic);
    int count = 0;
    do {
        try {
            count++;
            // fetch the topic's partition list
            List<TopicPartition> topicPartitions = oldConsumer.partitionsFor(testTopic)
                    .stream()
                    .map(partitionInfo -> new TopicPartition(partitionInfo.topic(), partitionInfo.partition()))
                    .collect(Collectors.toList());
            oldConsumer.assign(topicPartitions);
            newConsumer.assign(topicPartitions);
            Map<TopicPartition, Long> oldTopicOffsets = oldConsumer.endOffsets(topicPartitions);
            Map<TopicPartition, Long> newTopicOffsets = newConsumer.endOffsets(topicPartitions);
            for (Map.Entry<TopicPartition, Long> entry : oldTopicOffsets.entrySet()) {
                TopicPartition topicPartition = entry.getKey();
                // seek to the last record (end offset - 1), clamped at 0 for empty partitions
                long oldOffset = Math.max(entry.getValue() - 1, 0);
                long newOffset = Math.max(newTopicOffsets.get(topicPartition) - 1, 0);
                oldConsumer.seek(topicPartition, oldOffset);
                newConsumer.seek(topicPartition, newOffset);
                ConsumerRecords<String, String> oldRecords = oldConsumer.poll(Duration.ofSeconds(2L));
                ConsumerRecords<String, String> newRecords = newConsumer.poll(Duration.ofSeconds(2L));
                List<ConsumerRecord<String, String>> oldRecord = oldRecords.records(topicPartition);
                List<ConsumerRecord<String, String>> newRecord = newRecords.records(topicPartition);
                if (oldRecord.size() == 0 && newRecord.size() == 0) {
                    logger.info("{} has no data on either cluster, moving on to the next partition", topicPartition);
                    continue;
                }
                if (!(oldRecord.size() > 0 && newRecord.size() > 0)) {
                    throw new RuntimeException(topicPartition + " mismatch: one cluster returned a record and the other did not");
                }
                // RecordPojo is the author's own wrapper used to compare records (not shown here)
                RecordPojo oldLastRecord = new RecordPojo(oldRecord.get(oldRecord.size() - 1));
                RecordPojo newLastRecord = new RecordPojo(newRecord.get(newRecord.size() - 1));
                logger.info("{} last record on the old and the new cluster:\n{}\n{}", topicPartition, oldLastRecord, newLastRecord);
                if (!oldLastRecord.equals(newLastRecord)) {
                    throw new RuntimeException(topicPartition + " last records do not match");
                }
            }
            logger.info("topic {} sync check passed", testTopic);
            return true;
        } catch (Exception e) {
            logger.error("data comparison failed, will retry", e);
            Thread.sleep(Long.parseLong(configUtil.getValue("sleep.time", "2000")));
        }
    } while (count < 6);
    return false;
}
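For completeness, a hypothetical caller (not part of the original code); both consumers are deliberately created without a group.id, so the assign/seek/poll above never touches a real consumer group's committed offsets. SASL settings are omitted; the real clusters would need the same security properties as the MM2 config:
private boolean verifyTopic(String topic) throws InterruptedException {
    Properties base = new Properties();
    base.put("key.deserializer", StringDeserializer.class.getName());
    base.put("value.deserializer", StringDeserializer.class.getName());

    Properties oldProps = new Properties();
    oldProps.putAll(base);
    oldProps.put("bootstrap.servers", "xxxx:9092"); // source cluster A

    Properties newProps = new Properties();
    newProps.putAll(base);
    newProps.put("bootstrap.servers", "yyyy:9092"); // target cluster B

    try (KafkaConsumer<String, String> oldConsumer = new KafkaConsumer<>(oldProps);
         KafkaConsumer<String, String> newConsumer = new KafkaConsumer<>(newProps)) {
        return checkLastData(oldConsumer, newConsumer, topic);
    }
}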
Later I found there is also a topic, mm2-offsets.A.internal, that stores the Connect source offsets:
# mm2-offset-syncs.B.internal: lives on the source cluster by default (its location can be moved via offset-syncs.topic.location, default source, optionally target); stores the matching upstream/downstream offsets produced during replication and needs decoding to read
# mm2-offsets.A.internal (OFFSET_STORAGE_TOPIC_CONFIG): lives on the target cluster by default; stores the connector's source-side consumption offsets
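Since Connect stores these offsets as JSON, a plain string consumer is enough to eyeball how far the MirrorSourceConnector has read on the source cluster. A hedged sketch (my own addition; bootstrap address and topic alias reuse the placeholders above, and the JSON layout in the comments is only approximate):
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class Mm2OffsetStorageReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "yyyy:9092");    // target cluster B
        props.put("group.id", "mm2-offsets-inspector"); // throwaway inspection group
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("mm2-offsets.A.internal"));
            // key is roughly ["MirrorSourceConnector", {"cluster":"A","topic":...,"partition":...}]
            // value is roughly {"offset": <last source offset read by the connector>}
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(5))) {
                System.out.println(record.key() + " -> " + record.value());
            }
        }
    }
}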