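1 Introduction
When KafkaConsumer polls for messages to consume, the overall flow is roughly as follows:
In this post we analyze the first step of updateAssignmentMetadataIfNeeded: calling ConsumerCoordinator's poll method to join the consumer group.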
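2 Source Code Analysis
ConsumerCoordinator's poll method ensures that the group's coordinator is known and that this consumer has joined the group; it is also where periodic offset commits are triggered.
2.1 Walking through the flow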
public boolean poll(Timer timer) {
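// Invoke the callbacks of offset commit requests that have already completed
invokeCompletedOffsetCommitCallbacks();
// If partitions are assigned automatically (Kafka rebalances them based on the number of consumers and partitions),
// i.e. the consumer subscribed via subscribe() (see SubscriptionState for the other modes)
if (subscriptions.partitionsAutoAssigned()) {
// Always update the heartbeat last poll time so that the heartbeat thread does not leave the
// group proactively due to application inactivity even if (say) the coordinator cannot be found.
// Check that the heartbeat thread is healthy: throw if it has failed, otherwise update the last poll time
pollHeartbeat(timer.currentTimeMs());
if (coordinatorUnknown() && !ensureCoordinatorReady(timer)) { // coordinator missing or disconnected and could not be made ready: return false and end this poll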
return false;
}
// Check whether the consumer needs to (re)join the group, e.g. because the subscription or the assigned partitions changed
if (rejoinNeededOrPending()) {
if (subscriptions.hasPatternSubscription()) { // Auto-Pattern (regex) subscription?
// Time until the next cluster metadata update is allowed; 0 means an update is allowed now
if (this.metadata.timeToAllowUpdate(time.milliseconds()) == 0) {
// set needUpdate to true
this.metadata.requestUpdate();
}
// Refresh the metadata if needed, i.e. when
// 1. needUpdate is true, or 2. the wait until the next cluster metadata update is 0.
// The refresh ultimately goes through NetworkClient.poll()
if (!client.ensureFreshMetadata(timer)) {
return false;
}
}
// Send requests to the GroupCoordinator
if (!ensureActiveGroup(timer)) { // make sure the group is active: join it and get the subscribed partitions assigned
return false;
}
}
} else {
if (metadata.updateRequested() && !client.hasReadyNodes(timer.currentTimeMs())) {
client.awaitMetadataUpdate(timer);
}
}
maybeAutoCommitOffsetsAsync(timer.currentTimeMs());
return true;
}
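The poll method breaks down into the following steps:
1. If topics were subscribed via subscribe() and the group coordinator is not yet known, ensureCoordinatorReady() locates it: it sends a FindCoordinator (GroupCoordinator) request and establishes a connection to the coordinator.
2. rejoinNeededOrPending() decides whether the consumer must (re)join the group. If so, ensureActiveGroup() sends the join-group and sync-group requests, which join the group and obtain the assigned TopicPartition list. (For an Auto-Pattern subscription, the metadata is also force-refreshed first.)
3. If partitions were assigned via assign(), none of the coordinator steps are needed: the consumer only waits for a metadata update when one has been requested and no node connection is ready yet.
4. In either mode, if auto-commit is enabled and the auto-commit interval has elapsed, offsets are committed asynchronously (maybeAutoCommitOffsetsAsync()).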
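Step 4 is essentially a deadline check. The sketch below is a simplified, hypothetical model (the class and field names are illustrative, not Kafka's API) of roughly what maybeAutoCommitOffsetsAsync() decides: a commit fires only once the current time reaches the next deadline, and the deadline then moves to now plus the auto-commit interval (auto.commit.interval.ms). No offsets are actually sent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the auto-commit check at the end of poll().
// It only records *when* a commit would be triggered; nothing is sent to a broker.
class AutoCommitSketch {
    private final boolean autoCommitEnabled;
    private final long autoCommitIntervalMs;
    private long nextAutoCommitDeadline;
    final List<Long> commitTimes = new ArrayList<>();

    AutoCommitSketch(boolean enabled, long intervalMs, long startMs) {
        this.autoCommitEnabled = enabled;
        this.autoCommitIntervalMs = intervalMs;
        this.nextAutoCommitDeadline = startMs + intervalMs;
    }

    // Called once per poll() with the current time.
    void maybeAutoCommitOffsetsAsync(long nowMs) {
        if (autoCommitEnabled && nowMs >= nextAutoCommitDeadline) {
            // The next deadline is computed from the time of this poll, then we "commit".
            nextAutoCommitDeadline = nowMs + autoCommitIntervalMs;
            commitTimes.add(nowMs);
        }
    }

    public static void main(String[] args) {
        AutoCommitSketch c = new AutoCommitSketch(true, 5000, 0);
        for (long t : new long[] {1000, 4999, 5000, 7000, 10500, 11000}) {
            c.maybeAutoCommitOffsetsAsync(t);
        }
        System.out.println(c.commitTimes); // [5000, 10500]
    }
}
```

In this model commits only happen inside poll(), and a late poll (10500 above) shifts every later deadline.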
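2.2 ensureCoordinatorReady(): locating the GroupCoordinator
This method picks the broker with the fewest in-flight requests, sends it a FindCoordinator (GroupCoordinator) request, and then establishes a TCP connection to the coordinator it returns.
- Call chain: ensureCoordinatorReady() -> lookupCoordinator() -> sendFindCoordinatorRequest()
- Once the client receives the server's response, it establishes a connection to the GroupCoordinator.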
ensureCoordinatorReady():
// Make sure the coordinator is ready; returns true once it is
protected synchronized boolean ensureCoordinatorReady(final Timer timer) {
if (!coordinatorUnknown())
return true;
do {
final RequestFuture<Void> future = lookupCoordinator(); // look up the GroupCoordinator and connect to it
client.poll(future, timer);
if (!future.isDone()) { // lookup not finished (e.g. out of time): break out of the loop and return false
// ran out of time
break;
}
if (future.failed()) { // lookup failed
if (future.isRetriable()) {
log.debug("Coordinator discovery failed, refreshing metadata");
client.awaitMetadataUpdate(timer);
} else
throw future.exception();
} else if (coordinator != null && client.isUnavailable(coordinator)) {
// we found the coordinator, but the connection has failed, so mark
// it dead and backoff before retrying discovery
markCoordinatorUnknown();
timer.sleep(retryBackoffMs);
}
} while (coordinatorUnknown() && timer.notExpired()); // keep trying while the coordinator is unknown and the timer has not expired
return !coordinatorUnknown();
}
lookupCoordinator(): picks the node with the fewest in-flight requests and sends it a FindCoordinator request
protected synchronized RequestFuture<Void> lookupCoordinator() {
if (findCoordinatorFuture == null) {
// find a node to ask about the coordinator
Node node = this.client.leastLoadedNode(); // node with the fewest in-flight requests
if (node == null) {
log.debug("No broker available to send FindCoordinator request");
return RequestFuture.noBrokersAvailable();
} else
// send the request and handle the response
findCoordinatorFuture = sendFindCoordinatorRequest(node);
}
return findCoordinatorFuture;
}
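lookupCoordinator() relies on leastLoadedNode() to decide where to send FindCoordinator. Below is a deliberately simplified, hypothetical version of that choice using broker ids and a map of in-flight request counts. The real NetworkClient.leastLoadedNode() also takes connection state and reconnect backoff into account and starts scanning from a random offset, all of which this sketch ignores. Returning null corresponds to the RequestFuture.noBrokersAvailable() branch above.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified "least loaded node" choice: among brokers not marked
// unavailable, pick the one with the fewest in-flight requests (ties: first in list).
class LeastLoadedNodeSketch {
    static Integer leastLoadedNode(List<Integer> brokerIds,
                                   Map<Integer, Integer> inFlightRequests,
                                   List<Integer> unavailable) {
        Integer best = null;
        int fewest = Integer.MAX_VALUE;
        for (Integer id : brokerIds) {
            if (unavailable.contains(id)) continue;       // cannot send to this broker now
            int inFlight = inFlightRequests.getOrDefault(id, 0);
            if (inFlight < fewest) {                       // strictly fewer: keep the first on ties
                fewest = inFlight;
                best = id;
            }
        }
        return best;                                       // null ~ "no broker available"
    }

    public static void main(String[] args) {
        Map<Integer, Integer> inFlight = Map.of(1, 3, 2, 0, 3, 1);
        System.out.println(leastLoadedNode(List.of(1, 2, 3), inFlight, List.of()));   // 2
        System.out.println(leastLoadedNode(List.of(1, 2, 3), inFlight, List.of(2)));  // 3
        System.out.println(leastLoadedNode(List.of(1), inFlight, List.of(1)));        // null
    }
}
```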
sendFindCoordinatorRequest(): sends the request; FindCoordinatorResponseHandler processes the GroupCoordinator response in a callback
// Send the FindCoordinator request and handle the response
private RequestFuture<Void> sendFindCoordinatorRequest(Node node) {
// initiate the group metadata request
log.debug("Sending FindCoordinator request to broker {}", node);
FindCoordinatorRequest.Builder requestBuilder =
new FindCoordinatorRequest.Builder(FindCoordinatorRequest.CoordinatorType.GROUP, this.groupId);
return client.send(node, requestBuilder)
.compose(new FindCoordinatorResponseHandler()); // compose() adapts the RequestFuture<ClientResponse> into a RequestFuture<Void>; FindCoordinatorResponseHandler supplies the onSuccess()/onFailure() logic of the returned future
}
// Callback that handles the FindCoordinator response
private class FindCoordinatorResponseHandler extends RequestFutureAdapter<ClientResponse, Void> {
@Override
public void onSuccess(ClientResponse resp, RequestFuture<Void> future) {
log.debug("Received FindCoordinator response {}", resp);
clearFindCoordinatorFuture();
FindCoordinatorResponse findCoordinatorResponse = (FindCoordinatorResponse) resp.responseBody();
Errors error = findCoordinatorResponse.error();
if (error == Errors.NONE) {
// Coordinator found: connect to it and reset the session timeout timer
synchronized (AbstractCoordinator.this) {
// use MAX_VALUE - node.id as the coordinator id to allow separate connections
// for the coordinator in the underlying network client layer
int coordinatorConnectionId = Integer.MAX_VALUE - findCoordinatorResponse.node().id();
AbstractCoordinator.this.coordinator = new Node(
coordinatorConnectionId,
findCoordinatorResponse.node().host(),
findCoordinatorResponse.node().port());
log.info("Discovered group coordinator {}", coordinator);
client.tryConnect(coordinator); // initiate the TCP connection
heartbeat.resetSessionTimeout(); // reset the session timeout timer
}
future.complete(null);
} else if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
future.raise(new GroupAuthorizationException(groupId));
} else {
log.debug("Group coordinator lookup failed: {}", error.message());
future.raise(error);
}
}
@Override
public void onFailure(RuntimeException e, RequestFuture<Void> future) {
clearFindCoordinatorFuture();
super.onFailure(e, future);
}
}
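The compose() call in sendFindCoordinatorRequest() is an adapter pattern: the future of the raw ClientResponse is turned into a RequestFuture<Void> whose outcome is decided by FindCoordinatorResponseHandler. The following is a minimal, single-threaded re-creation of that idea with hypothetical MiniFuture/MiniAdapter types; it is not Kafka's RequestFuture, and the "response" is reduced to an error-code string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plays the role of RequestFutureAdapter<T, S>: decides how the adapted future completes.
interface MiniAdapter<T, S> {
    void onSuccess(T value, MiniFuture<S> future);

    // Default: propagate the failure unchanged.
    default void onFailure(RuntimeException e, MiniFuture<S> future) {
        future.raise(e);
    }
}

// Minimal single-threaded stand-in for RequestFuture<T>.
class MiniFuture<T> {
    private boolean done;
    private T value;
    private RuntimeException exception;
    private final List<Consumer<MiniFuture<T>>> listeners = new ArrayList<>();

    boolean isDone() { return done; }
    boolean succeeded() { return done && exception == null; }
    T value() { return value; }
    RuntimeException exception() { return exception; }

    void complete(T v) { value = v; done = true; fireListeners(); }
    void raise(RuntimeException e) { exception = e; done = true; fireListeners(); }

    void addListener(Consumer<MiniFuture<T>> listener) {
        if (done) listener.accept(this); else listeners.add(listener);
    }

    private void fireListeners() {
        for (Consumer<MiniFuture<T>> l : listeners) l.accept(this);
    }

    // compose(): return a new future of type S that the adapter completes
    // once this future completes.
    <S> MiniFuture<S> compose(MiniAdapter<T, S> adapter) {
        MiniFuture<S> adapted = new MiniFuture<>();
        addListener(f -> {
            if (f.succeeded()) adapter.onSuccess(f.value(), adapted);
            else adapter.onFailure(f.exception(), adapted);
        });
        return adapted;
    }
}

class ComposeDemo {
    // Analogue of FindCoordinatorResponseHandler: turn an error code into success or failure.
    static MiniFuture<Void> toVoid(MiniFuture<String> response) {
        return response.compose((String errorCode, MiniFuture<Void> f) -> {
            if (errorCode.equals("NONE")) f.complete(null);
            else f.raise(new IllegalStateException(errorCode));
        });
    }

    public static void main(String[] args) {
        MiniFuture<String> response = new MiniFuture<>();
        MiniFuture<Void> result = toVoid(response);
        System.out.println(result.isDone());    // false: no response yet
        response.complete("NONE");
        System.out.println(result.succeeded()); // true
    }
}
```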
2.3 ensureActiveGroup(): sending join-group and sync-group requests to the GroupCoordinator
- Call chain of ensureActiveGroup(): ensureActiveGroup() -> ensureCoordinatorReady() -> startHeartbeatThreadIfNeeded() -> joinGroupIfNeeded().
The key method inside joinGroupIfNeeded() is initiateJoinGroup(), whose call chain is: initiateJoinGroup() -> sendJoinGroupRequest() -> JoinGroupResponseHandler.handle() (on success) -> onJoinLeader()/onJoinFollower() -> sendSyncGroupRequest() -> SyncGroupResponseHandler
The ensureActiveGroup method:
boolean ensureActiveGroup(final Timer timer) {
// always ensure that the coordinator is ready because we may have been disconnected
// when sending heartbeats and does not necessarily require us to rejoin the group.
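if (!ensureCoordinatorReady(timer)) { // make sure the connection to the GroupCoordinator is established
return false;
}
startHeartbeatThreadIfNeeded(); // start the heartbeat thread (it sends heartbeats only once the conditions are met, not necessarily right away)
return joinGroupIfNeeded(timer); // send the JoinGroup request and process the response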
}
The join-group request is sent from joinGroupIfNeeded():
boolean joinGroupIfNeeded(final Timer timer) {
while (rejoinNeededOrPending()) {
if (!ensureCoordinatorReady(timer)) {
return false;
}
// Trigger onJoinPrepare, which covers the offset commit and the rebalance listener
if (needsJoinPrepare) {
onJoinPrepare(generation.generationId, generation.memberId);
needsJoinPrepare = false;
}
// Create the JoinGroup request and send it
final RequestFuture<ByteBuffer> future = initiateJoinGroup();
client.poll(future, timer); // poll the client until the async request completes (or the timer expires)
if (!future.isDone()) {
// we ran out of time
return false;
}
if (future.succeeded()) { // request completed successfully: run the completion callback
// Duplicate the buffer in case `onJoinComplete` does not complete and needs to be retried.
ByteBuffer memberAssignment = future.value().duplicate();
onJoinComplete(generation.generationId, generation.memberId, generation.protocol, memberAssignment);
// We reset the join group future only after the completion callback returns. This ensures
// that if the callback is woken up, we will retry it on the next joinGroupIfNeeded.
resetJoinGroupFuture();
needsJoinPrepare = true;
} else {
resetJoinGroupFuture();
final RuntimeException exception = future.exception();
if (exception instanceof UnknownMemberIdException ||
exception instanceof RebalanceInProgressException ||
exception instanceof IllegalGenerationException ||
exception instanceof MemberIdRequiredException)
continue;
else if (!future.isRetriable())
throw exception;
timer.sleep(retryBackoffMs);
}
}
return true;
}
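The failure branch of joinGroupIfNeeded() amounts to a three-way decision. The sketch below restates it as a hypothetical helper; exceptions are identified by their simple class names so that it has no dependency on kafka-clients, and the retriable flag stands in for future.isRetriable().

```java
import java.util.Set;

// Hypothetical restatement of how joinGroupIfNeeded() reacts to a failed join future.
class JoinRetryPolicySketch {
    enum Action { RETRY_IMMEDIATELY, BACKOFF_THEN_RETRY, THROW }

    // Exceptions for which the loop simply `continue`s without backing off.
    private static final Set<String> IMMEDIATE_RETRY_TYPES = Set.of(
            "UnknownMemberIdException",
            "RebalanceInProgressException",
            "IllegalGenerationException",
            "MemberIdRequiredException");

    static Action onJoinFailure(String exceptionType, boolean retriable) {
        if (IMMEDIATE_RETRY_TYPES.contains(exceptionType)) return Action.RETRY_IMMEDIATELY; // continue
        if (!retriable) return Action.THROW;                                                  // throw exception
        return Action.BACKOFF_THEN_RETRY;                                                     // timer.sleep(retryBackoffMs)
    }

    public static void main(String[] args) {
        System.out.println(onJoinFailure("MemberIdRequiredException", false));       // RETRY_IMMEDIATELY
        System.out.println(onJoinFailure("GroupAuthorizationException", false));     // THROW
        System.out.println(onJoinFailure("CoordinatorNotAvailableException", true)); // BACKOFF_THEN_RETRY
    }
}
```

The order matters: the four rebalance-related exceptions are retried before retriability is even checked, so they loop straight back without a backoff.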
sendJoinGroupRequest() is called from initiateJoinGroup():
// Send the JoinGroup request and attach a listener
private synchronized RequestFuture<ByteBuffer> initiateJoinGroup() {
if (joinFuture == null) {
// fence off the heartbeat thread explicitly so that it cannot interfere with the join group.
// Note that this must come after the call to onJoinPrepare since we must be able to continue
// sending heartbeats if that callback takes some time.
// stop the heartbeat thread during the rebalance
disableHeartbeatThread();
// mark the member state as REBALANCING
state = MemberState.REBALANCING;
// send the JoinGroup request
joinFuture = sendJoinGroupRequest();
joinFuture.addListener(new RequestFutureListener<ByteBuffer>() {
@Override
public void onSuccess(ByteBuffer value) {
// handle join completion in the callback so that the callback will be invoked
// even if the consumer is woken up before finishing the rebalance
synchronized (AbstractCoordinator.this) {
log.info("Successfully joined group with generation {}", generation.generationId);
state = MemberState.STABLE; // mark the consumer as STABLE
rejoinNeeded = false;
if (heartbeatThread != null)
heartbeatThread.enable();
}
}
@Override
public void onFailure(RuntimeException e) {
// we handle failures below after the request finishes. if the join completes
// after having been woken up, the exception is ignored and we will rejoin
synchronized (AbstractCoordinator.this) {
state = MemberState.UNJOINED; // mark the consumer as UNJOINED
}
}
});
}
return joinFuture;
}
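initiateJoinGroup() and its listener drive a small member state machine. The following hypothetical model (not Kafka's AbstractCoordinator) tracks only the state, a heartbeat flag and rejoinNeeded, following the code above: heartbeats are disabled when the join starts, re-enabled only on success, and a failure just drops back to UNJOINED.

```java
// Hypothetical model of the member state changes made by initiateJoinGroup() and its listener.
class MemberStateSketch {
    enum MemberState { UNJOINED, REBALANCING, STABLE }

    MemberState state = MemberState.UNJOINED;
    boolean heartbeatEnabled = false;
    boolean rejoinNeeded = true;

    // initiateJoinGroup(): fence off heartbeats and enter REBALANCING before sending JoinGroup
    void startJoin() {
        heartbeatEnabled = false;
        state = MemberState.REBALANCING;
    }

    // listener.onSuccess(): the join/sync round has finished
    void joinSucceeded() {
        state = MemberState.STABLE;
        rejoinNeeded = false;
        heartbeatEnabled = true;
    }

    // listener.onFailure(): back to UNJOINED; heartbeats are not re-enabled here
    void joinFailed() {
        state = MemberState.UNJOINED;
    }

    public static void main(String[] args) {
        MemberStateSketch m = new MemberStateSketch();
        m.startJoin();
        m.joinFailed();
        System.out.println(m.state + " heartbeat=" + m.heartbeatEnabled); // UNJOINED heartbeat=false
        m.startJoin();
        m.joinSucceeded();
        System.out.println(m.state + " heartbeat=" + m.heartbeatEnabled); // STABLE heartbeat=true
    }
}
```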
sendJoinGroupRequest() and its response handling:
// Send the JoinGroup request; the returned future eventually yields this member's partition assignment
RequestFuture<ByteBuffer> sendJoinGroupRequest() {
if (coordinatorUnknown())
return RequestFuture.coordinatorNotAvailable();
// send a join group request to the coordinator
log.info("(Re-)joining group");
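// The consumer builds the JoinGroup request, carrying its own metadata as the request payload.
// That metadata is what the assignor (partition assignment strategy) later uses in its assign() method:
// subscriptions holds each consumer's subscription; since every consumer sends its own, the coordinator learns all subscribed topics;
// metadata is the cluster metadata, which records per-topic information such as the partition count, so the partitions of each topic can be assigned to the consumers subscribed to it
JoinGroupRequest.Builder requestBuilder = new JoinGroupRequest.Builder(
groupId, // consumer group id
this.sessionTimeoutMs, // session timeout
this.generation.memberId, // member id of this consumer
protocolType(), // protocol type
metadata()) // protocol metadata (subscription)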
.setRebalanceTimeout(this.rebalanceTimeoutMs);
log.debug("Sending JoinGroup ({}) to coordinator {}", requestBuilder, this.coordinator);
// Note that we override the request timeout using the rebalance timeout since that is the
// maximum time that it may block on the coordinator. We add an extra 5 seconds for small delays.
int joinGroupTimeoutMs = Math.max(rebalanceTimeoutMs, rebalanceTimeoutMs + 5000);
// Send the JoinGroup request; compose() returns a new async future with the response handler attached
return client.send(coordinator, requestBuilder, joinGroupTimeoutMs)
.compose(new JoinGroupResponseHandler());
}
// Handles the JoinGroup response and moves on to the group sync
private class JoinGroupResponseHandler extends CoordinatorResponseHandler<JoinGroupResponse, ByteBuffer> {
@Override
public void handle(JoinGroupResponse joinResponse, RequestFuture<ByteBuffer> future) {
Errors error = joinResponse.error();
if (error == Errors.NONE) {
log.debug("Received successful JoinGroup response: {}", joinResponse);
sensors.joinLatency.record(response.requestLatencyMs());
synchronized (AbstractCoordinator.this) {
if (state != MemberState.REBALANCING) {
// if the consumer was woken up before a rebalance completes, we may have already left
// the group. In this case, we do not want to continue with the sync group.
future.raise(new UnjoinedGroupException());
} else {
AbstractCoordinator.this.generation = new Generation(joinResponse.generationId(),
joinResponse.memberId(), joinResponse.groupProtocol());
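// After a successful JoinGroup, a sync-group round is needed to obtain the assigned TopicPartition list.
// Having collected all members and their subscriptions, the coordinator does not run the assignment itself; it delegates it to one member, the leader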
if (joinResponse.isLeader()) {
onJoinLeader(joinResponse).chain(future);
} else {
onJoinFollower().chain(future);
}
}
}
} else if (error == Errors.COORDINATOR_LOAD_IN_PROGRESS) {
log.debug("Attempt to join group rejected since coordinator {} is loading the group.", coordinator());
// backoff and retry
future.raise(error);
} else if (error == Errors.UNKNOWN_MEMBER_ID) {
// reset the member id and retry immediately
resetGeneration();
log.debug("Attempt to join group failed due to unknown member id.");
future.raise(Errors.UNKNOWN_MEMBER_ID);
} else if (error == Errors.COORDINATOR_NOT_AVAILABLE
|| error == Errors.NOT_COORDINATOR) {
// re-discover the coordinator and retry with backoff
markCoordinatorUnknown();
log.debug("Attempt to join group failed due to obsolete coordinator information: {}", error.message());
future.raise(error);
} else if (error == Errors.INCONSISTENT_GROUP_PROTOCOL
|| error == Errors.INVALID_SESSION_TIMEOUT
|| error == Errors.INVALID_GROUP_ID
|| error == Errors.GROUP_AUTHORIZATION_FAILED
|| error == Errors.GROUP_MAX_SIZE_REACHED) {
log.error("Attempt to join group failed due to fatal error: {}", error.message());
if (error == Errors.GROUP_MAX_SIZE_REACHED) {
future.raise(new GroupMaxSizeReachedException(groupId));
} else if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
future.raise(new GroupAuthorizationException(groupId));
} else {
future.raise(error);
}
} else if (error == Errors.MEMBER_ID_REQUIRED) {
// Broker requires a concrete member id to be allowed to join the group. Update member id
// and send another join group request in next cycle.
synchronized (AbstractCoordinator.this) {
AbstractCoordinator.this.generation = new Generation(OffsetCommitRequest.DEFAULT_GENERATION_ID,
joinResponse.memberId(), null);
AbstractCoordinator.this.rejoinNeeded = true;
AbstractCoordinator.this.state = MemberState.UNJOINED;
}
future.raise(Errors.MEMBER_ID_REQUIRED);
} else {
// unexpected error, throw the exception
log.error("Attempt to join group failed due to unexpected error: {}", error.message());
future.raise(new KafkaException("Unexpected error in join group response: " + error.message()));
}
}
}
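sendJoinGroupRequest() computes joinGroupTimeoutMs as Math.max(rebalanceTimeoutMs, rebalanceTimeoutMs + 5000) rather than a plain addition. A likely reason, demonstrated below, is int overflow: for normal values the result is simply rebalanceTimeoutMs + 5000, but if rebalanceTimeoutMs is close to Integer.MAX_VALUE the sum wraps to a negative number and Math.max falls back to rebalanceTimeoutMs. The class name is hypothetical, and 300000 ms is only an example value (the consumer derives its rebalance timeout from max.poll.interval.ms).

```java
// Hypothetical helper reproducing the timeout expression from sendJoinGroupRequest().
class JoinTimeoutSketch {
    static int joinGroupTimeoutMs(int rebalanceTimeoutMs) {
        // Normally rebalanceTimeoutMs + 5000; if the addition overflows (wraps to a
        // negative int), Math.max keeps rebalanceTimeoutMs instead.
        return Math.max(rebalanceTimeoutMs, rebalanceTimeoutMs + 5000);
    }

    public static void main(String[] args) {
        System.out.println(joinGroupTimeoutMs(300_000));           // 305000
        System.out.println(Integer.MAX_VALUE + 5000 < 0);           // true: the plain sum overflows
        System.out.println(joinGroupTimeoutMs(Integer.MAX_VALUE));  // 2147483647
    }
}
```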
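sendJoinGroupRequest(): sends the join-group request to the GroupCoordinator
On the broker side, the request is handled by GroupCoordinator's handleJoinGroup method:
- If the group.id is new, a GroupMetadata instance is created and the group starts in the Empty state;
- When the GroupCoordinator receives a consumer's join-group request while the group's member list is still empty (the group was just created; each consumer instance is called a member of the group), the first member to join is chosen as the leader. In other words, for a new consumer group, the first consumer instance to join becomes the leader;
- When the GroupCoordinator receives the leader's join-group request, it triggers a rebalance and the group moves to PreparingRebalance;
- The GroupCoordinator then waits for a period of time. Consumers whose join-group requests arrive within that window are considered alive; the group then moves to AwaitSync, and the GroupCoordinator sends its response to every member of the group;
- After receiving the GroupCoordinator's response, a consumer that is the group leader computes the partition assignment for the whole group (range by default; roundrobin is also available) and sends the result to the GroupCoordinator with a sendSyncGroupRequest() request, while follower consumers send an empty assignment;
- Once the GroupCoordinator receives the leader's request, it returns the assignment to every consumer that has sent a sync-group request, and the group moves to Stable. A sync-group request that arrives later is answered with its assignment directly, since the group is already Stable.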
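The broker-side states listed above form a small state machine. The enum below is a simplified, hypothetical transition table for it, not Kafka's GroupMetadata code. It uses CompletingRebalance, the name newer Kafka versions use for what this article calls AwaitSync, and adds a Dead state that is treated as terminal here.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical transition table for the broker-side group states.
enum GroupStateSketch {
    EMPTY, PREPARING_REBALANCE, COMPLETING_REBALANCE, STABLE, DEAD;

    private static final Map<GroupStateSketch, Set<GroupStateSketch>> NEXT = Map.of(
            EMPTY, EnumSet.of(PREPARING_REBALANCE, DEAD),                        // first JoinGroup arrives
            PREPARING_REBALANCE, EnumSet.of(COMPLETING_REBALANCE, EMPTY, DEAD),  // join window closes
            COMPLETING_REBALANCE, EnumSet.of(STABLE, PREPARING_REBALANCE, DEAD), // leader's SyncGroup arrives
            STABLE, EnumSet.of(PREPARING_REBALANCE, DEAD),                       // membership changes again
            DEAD, EnumSet.noneOf(GroupStateSketch.class));                       // terminal in this sketch

    boolean canTransitionTo(GroupStateSketch next) {
        return NEXT.get(this).contains(next);
    }

    public static void main(String[] args) {
        GroupStateSketch state = EMPTY;
        for (GroupStateSketch next : new GroupStateSketch[] {PREPARING_REBALANCE, COMPLETING_REBALANCE, STABLE}) {
            System.out.println(state + " -> " + next + ": " + state.canTransitionTo(next));
            state = next;
        }
        System.out.println(EMPTY.canTransitionTo(STABLE)); // false: a rebalance must happen first
    }
}
```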
Sending the sync-group request:
// When the consumer is a follower, it fetches its assignment from the GroupCoordinator;
// the last argument of new SyncGroupRequest.Builder is an empty map
private RequestFuture<ByteBuffer> onJoinFollower() {
// send follower's sync group with an empty assignment
SyncGroupRequest.Builder requestBuilder =
new SyncGroupRequest.Builder(groupId, generation.generationId, generation.memberId,
Collections.<String, ByteBuffer>emptyMap());
log.debug("Sending follower SyncGroup to coordinator {}: {}", this.coordinator, requestBuilder);
return sendSyncGroupRequest(requestBuilder);
}
// When the consumer is the leader, compute the assignment for every member of the group and send it to the GroupCoordinator
private RequestFuture<ByteBuffer> onJoinLeader(JoinGroupResponse joinResponse) {
try {
// perform the leader synchronization and send back the assignment for the group
Map<String, ByteBuffer> groupAssignment = performAssignment(joinResponse.leaderId(), joinResponse.groupProtocol(),
joinResponse.members());
SyncGroupRequest.Builder requestBuilder =
new SyncGroupRequest.Builder(groupId, generation.generationId, generation.memberId, groupAssignment);
log.debug("Sending leader SyncGroup to coordinator {}: {}", this.coordinator, requestBuilder);
// send the sync-group request
return sendSyncGroupRequest(requestBuilder);
} catch (RuntimeException e) {
return RequestFuture.failure(e);
}
}
private RequestFuture<ByteBuffer> sendSyncGroupRequest(SyncGroupRequest.Builder requestBuilder) {
if (coordinatorUnknown())
return RequestFuture.coordinatorNotAvailable();
return client.send(coordinator, requestBuilder)
.compose(new SyncGroupResponseHandler());
}
private class SyncGroupResponseHandler extends CoordinatorResponseHandler<SyncGroupResponse, ByteBuffer> {
@Override
public void handle(SyncGroupResponse syncResponse,
RequestFuture<ByteBuffer> future) {
Errors error = syncResponse.error();
if (error == Errors.NONE) { // sync succeeded
sensors.syncLatency.record(response.requestLatencyMs());
future.complete(syncResponse.memberAssignment());
} else {
requestRejoin();
if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
future.raise(new GroupAuthorizationException(groupId));
} else if (error == Errors.REBALANCE_IN_PROGRESS) {
log.debug("SyncGroup failed because the group began another rebalance");
future.raise(error);
} else if (error == Errors.UNKNOWN_MEMBER_ID
|| error == Errors.ILLEGAL_GENERATION) {
log.debug("SyncGroup failed: {}", error.message());
resetGeneration();
future.raise(error);
} else if (error == Errors.COORDINATOR_NOT_AVAILABLE
|| error == Errors.NOT_COORDINATOR) {
log.debug("SyncGroup failed: {}", error.message());
markCoordinatorUnknown();
future.raise(error);
} else {
future.raise(new KafkaException("Unexpected error from SyncGroup: " + error.message()));
}
}
}
}
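Note: if the coordinator performed the partition assignment itself, a consumer could read its partitions directly from the JoinGroup response after sending the JoinGroup request. In practice the coordinator does not assign partitions, so its JoinGroup response carries no assignment.
To the leader, the coordinator returns the list of all members and their subscriptions; ordinary (follower) consumers receive none of this.
Because the JoinGroup response a consumer receives is not its partition assignment, the asynchronous join request cannot complete at that point. The consumer must also send a sync-group request, which happens in onJoinFollower and onJoinLeader.
onJoinLeader: unlike onJoinFollower, which sends the sync-group request immediately after receiving the JoinGroup response, onJoinLeader first gathers the data needed for the assignment and calls performAssignment() to compute it.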
onJoinComplete()
//todo
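3 Summary of the join-group and sync-group flow
In a straightforward design, joining a consumer group would work like this:
- Consumers send their subscriptions to the coordinator
- The coordinator collects all consumers and their subscriptions
- The coordinator runs the assignment algorithm, i.e. decides which partitions go to which consumer
- Once the assignment is decided, the coordinator returns the partitions to the consumers, which start consuming
However, the coordinator does not compute the assignment itself. The actual steps are:
- Consumers send their subscriptions to the coordinator
- The coordinator collects all consumers and their subscriptions
- The coordinator sends the full member list and their subscriptions to the leader consumer
- The leader consumer runs the partition assignment algorithm
- The leader consumer sends the assignment result back to the coordinator
- The coordinator, having received the leader's result, returns each consumer its partitions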
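The refined steps can be traced with a toy, single-threaded simulation. Everything below is hypothetical: the leader is simply the first member to join, only the leader receives the member list, the leader's assignment is a trivial round-robin chosen for brevity (not one of Kafka's assignors), and the coordinator merely relays each member its own entry from the leader's map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy simulation of one join-group/sync-group round (all names hypothetical).
class JoinSyncSimulation {
    private final Map<String, List<String>> members = new LinkedHashMap<>(); // memberId -> topics, join order
    private String leaderId;

    // JoinGroup: the coordinator records the subscription; the first member becomes leader.
    void join(String memberId, List<String> topics) {
        if (leaderId == null) leaderId = memberId;
        members.put(memberId, topics);
    }

    // Like JoinGroupResponse: memberId == leaderId identifies the leader.
    boolean isLeader(String memberId) { return memberId.equals(leaderId); }

    // Only the leader's JoinGroup response carries the member list.
    Map<String, List<String>> joinResponseMembers(String memberId) {
        if (isLeader(memberId)) return new LinkedHashMap<>(members);
        return Map.of();
    }

    // Leader side: trivial round-robin over the members subscribed to each topic.
    static Map<String, List<String>> leaderAssign(Map<String, List<String>> subscriptions,
                                                  Map<String, Integer> partitionsPerTopic) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String memberId : subscriptions.keySet()) result.put(memberId, new ArrayList<>());
        for (Map.Entry<String, Integer> topic : partitionsPerTopic.entrySet()) {
            List<String> subscribers = new ArrayList<>();
            for (Map.Entry<String, List<String>> sub : subscriptions.entrySet()) {
                if (sub.getValue().contains(topic.getKey())) subscribers.add(sub.getKey());
            }
            if (subscribers.isEmpty()) continue;
            for (int p = 0; p < topic.getValue(); p++) {
                result.get(subscribers.get(p % subscribers.size())).add(topic.getKey() + "-" + p);
            }
        }
        return result;
    }

    // SyncGroup: the coordinator hands every member its own entry of the leader's map.
    static List<String> syncResponse(Map<String, List<String>> leaderAssignment, String memberId) {
        return leaderAssignment.getOrDefault(memberId, List.of());
    }

    public static void main(String[] args) {
        JoinSyncSimulation coordinator = new JoinSyncSimulation();
        coordinator.join("consumer-a", List.of("orders"));
        coordinator.join("consumer-b", List.of("orders"));
        Map<String, List<String>> assignment =
                leaderAssign(coordinator.joinResponseMembers("consumer-a"), Map.of("orders", 3));
        System.out.println(coordinator.isLeader("consumer-a"));            // true
        System.out.println(coordinator.joinResponseMembers("consumer-b")); // {}
        System.out.println(syncResponse(assignment, "consumer-a"));        // [orders-0, orders-2]
        System.out.println(syncResponse(assignment, "consumer-b"));        // [orders-1]
    }
}
```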
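4 The leader consumer performs the assignment
JoinGroupRequest (the "join group" request) fields:
private final String groupId; // consumer group id
private final int sessionTimeout; // session timeout
private final int rebalanceTimeout; // rebalance timeout
private final String memberId; // member id of the consumer
private final String protocolType; // protocol type
private final List<ProtocolMetadata> groupProtocols; // supported assignment protocols and their metadata
JoinGroupResponse (the "join group" response) fields:
private final int throttleTimeMs;
private final Errors error;
private final int generationId; // generation id
private final String groupProtocol; // protocol (assignor) selected for the whole group
private final String memberId; // member id of this consumer
private final String leaderId; // member id of the leader; the member whose memberId equals leaderId is the leader
private final Map<String, ByteBuffer> members; // all members (member id -> subscription metadata)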
The performAssignment method:
// Runs on the leader consumer (in ConsumerCoordinator): computes the partition assignment and returns each member's result
@Override
protected Map<String, ByteBuffer> performAssignment(String leaderId,
String assignmentStrategy,
Map<String, ByteBuffer> allSubscriptions) {
// Look up the single assignor matching the protocol selected by the coordinator
PartitionAssignor assignor = lookupAssignor(assignmentStrategy);
if (assignor == null)
throw new IllegalStateException("Coordinator selected invalid assignment protocol: " + assignmentStrategy);
Set<String> allSubscribedTopics = new HashSet<>();
// subscriptions is parsed from the subscription metadata of all members
Map<String, Subscription> subscriptions = new HashMap<>();
for (Map.Entry<String, ByteBuffer> subscriptionEntry : allSubscriptions.entrySet()) {
// deserialize the member's subscription
Subscription subscription = ConsumerProtocol.deserializeSubscription(subscriptionEntry.getValue());
// key: member id, value: that member's subscription (its topics)
subscriptions.put(subscriptionEntry.getKey(), subscription);
// collect every topic any member subscribes to; metadata for all partitions of these topics is fetched below
allSubscribedTopics.addAll(subscription.topics());
}
// the leader will begin watching for changes to any of the topics the group is interested in,
// which ensures that all metadata changes will eventually be seen
this.subscriptions.groupSubscribe(allSubscribedTopics);
metadata.setTopics(this.subscriptions.groupSubscription());
// update metadata (if needed) and keep track of the metadata used for assignment so that
// we can check after rebalance completion whether anything has changed
if (!client.ensureFreshMetadata(time.timer(Long.MAX_VALUE))) throw new TimeoutException();
isLeader = true;
log.debug("Performing assignment using strategy {} with subscriptions {}", assignor.name(), subscriptions);
// Assign partitions to all members according to the strategy; the result maps each member to its assignment
Map<String, Assignment> assignment = assignor.assign(metadata.fetch(), subscriptions);
// user-customized assignor may have created some topics that are not in the subscription list
// and assign their partitions to the members; in this case we would like to update the leader's
// own metadata with the newly added topics so that it will not trigger a subsequent rebalance
// when these topics gets updated from metadata refresh.
//
// TODO: this is a hack and not something we want to support long-term unless we push regex into the protocol
// we may need to modify the PartitionAssignor API to better support this case.
Set<String> assignedTopics = new HashSet<>();
for (Assignment assigned : assignment.values()) {
for (TopicPartition tp : assigned.partitions())
assignedTopics.add(tp.topic());
}
if (!assignedTopics.containsAll(allSubscribedTopics)) {
Set<String> notAssignedTopics = new HashSet<>(allSubscribedTopics);
notAssignedTopics.removeAll(assignedTopics);
log.warn("The following subscribed topics are not assigned to any members: {} ", notAssignedTopics);
}
if (!allSubscribedTopics.containsAll(assignedTopics)) {
Set<String> newlyAddedTopics = new HashSet<>(assignedTopics);
newlyAddedTopics.removeAll(allSubscribedTopics);
log.info("The following not-subscribed topics are assigned, and their metadata will be " +
"fetched from the brokers: {}", newlyAddedTopics);
allSubscribedTopics.addAll(assignedTopics);
this.subscriptions.groupSubscribe(allSubscribedTopics);
metadata.setTopics(this.subscriptions.groupSubscription());
if (!client.ensureFreshMetadata(time.timer(Long.MAX_VALUE))) throw new TimeoutException();
}
assignmentSnapshot = metadataSnapshot;
log.debug("Finished assignment for group: {}", assignment);
Map<String, ByteBuffer> groupAssignment = new HashMap<>();
for (Map.Entry<String, Assignment> assignmentEntry : assignment.entrySet()) {
ByteBuffer buffer = ConsumerProtocol.serializeAssignment(assignmentEntry.getValue());
groupAssignment.put(assignmentEntry.getKey(), buffer);
}
return groupAssignment;
}
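The two checks at the end of performAssignment() are plain set differences. The sketch below isolates them with made-up topic names: subscribed topics that nobody received are only logged as a warning, while topics a custom assignor handed out without a subscription are added to the leader's group subscription so that their metadata gets fetched.

```java
import java.util.Set;
import java.util.TreeSet;

// The two consistency checks from the end of performAssignment(), as plain set operations.
class AssignedTopicsCheck {
    // Subscribed topics that no member received (the leader logs a warning).
    static Set<String> notAssigned(Set<String> subscribed, Set<String> assigned) {
        Set<String> result = new TreeSet<>(subscribed);
        result.removeAll(assigned);
        return result;
    }

    // Topics assigned although nobody subscribed to them (added to the group subscription).
    static Set<String> newlyAdded(Set<String> subscribed, Set<String> assigned) {
        Set<String> result = new TreeSet<>(assigned);
        result.removeAll(subscribed);
        return result;
    }

    public static void main(String[] args) {
        Set<String> subscribed = Set.of("orders", "payments");
        Set<String> assigned = Set.of("orders", "audit");
        System.out.println(notAssigned(subscribed, assigned)); // [payments]
        System.out.println(newlyAdded(subscribed, assigned));  // [audit]
    }
}
```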
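The process of obtaining the partition list:
5 Partition assignor implementations
AbstractPartitionAssignor implements PartitionAssignor's assign() method, but also declares an abstract assign() overload with different parameter types: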
public abstract class AbstractPartitionAssignor implements PartitionAssignor {
public abstract Map<String, List<TopicPartition>> assign(
// number of partitions of each topic
Map<String, Integer> partitionsPerTopic,
// topics subscribed by each consumer
Map<String, Subscription> subscriptions);
@Override
public Map<String, Assignment> assign(
// cluster metadata
Cluster metadata,
// subscriptions of all consumers
Map<String, Subscription> subscriptions) {
Set<String> allSubscribedTopics = new HashSet<>();
for (Map.Entry<String, Subscription> subscriptionEntry : subscriptions.entrySet())
allSubscribedTopics.addAll(subscriptionEntry.getValue().topics());
Map<String, Integer> partitionsPerTopic = new HashMap<>();
for (String topic : allSubscribedTopics) {
Integer numPartitions = metadata.partitionCountForTopic(topic);
if (numPartitions != null && numPartitions > 0)
partitionsPerTopic.put(topic, numPartitions);
else
log.debug("Skipping assignment for topic {} since no metadata is available", topic);
}
// delegate to the abstract assign() declared above
Map<String, List<TopicPartition>> rawAssignments = assign(partitionsPerTopic, subscriptions);
// this class maintains no user data, so just wrap the results
Map<String, Assignment> assignments = new HashMap<>();
for (Map.Entry<String, List<TopicPartition>> assignmentEntry : rawAssignments.entrySet())
assignments.put(assignmentEntry.getKey(), new Assignment(assignmentEntry.getValue()));
return assignments;
}
// (other members of the class omitted)
}
The three assignor implementations:
- RangeAssignor
- RoundRobinAssignor
- StickyAssignor
They are analyzed in detail in a separate article.
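To make the abstract assign(partitionsPerTopic, subscriptions) contract concrete, here is a simplified range-style assignment in the spirit of RangeAssignor: for each topic, the subscribed consumers are sorted, each gets numPartitions / numConsumers consecutive partitions, and the first numPartitions % numConsumers consumers get one extra. It uses plain strings and lists instead of Subscription and TopicPartition and is a sketch, not Kafka's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified range-style assignment (in the spirit of RangeAssignor, not its code).
// partitionsPerTopic: topic -> partition count; subscriptions: memberId -> topics.
// Result: memberId -> "topic-partition" strings.
class RangeAssignSketch {
    static Map<String, List<String>> assign(Map<String, Integer> partitionsPerTopic,
                                            Map<String, List<String>> subscriptions) {
        Map<String, List<String>> assignment = new HashMap<>();
        for (String memberId : subscriptions.keySet()) assignment.put(memberId, new ArrayList<>());

        for (Map.Entry<String, Integer> topicEntry : partitionsPerTopic.entrySet()) {
            String topic = topicEntry.getKey();
            int numPartitions = topicEntry.getValue();

            // Consumers subscribed to this topic, sorted so the split is deterministic.
            List<String> consumers = new ArrayList<>();
            for (Map.Entry<String, List<String>> sub : subscriptions.entrySet()) {
                if (sub.getValue().contains(topic)) consumers.add(sub.getKey());
            }
            if (consumers.isEmpty()) continue;
            Collections.sort(consumers);

            int perConsumer = numPartitions / consumers.size();
            int withExtra = numPartitions % consumers.size(); // the first `withExtra` consumers get one more
            for (int i = 0; i < consumers.size(); i++) {
                int start = perConsumer * i + Math.min(i, withExtra);
                int length = perConsumer + (i < withExtra ? 1 : 0);
                for (int p = start; p < start + length; p++) {
                    assignment.get(consumers.get(i)).add(topic + "-" + p);
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, List<String>> subscriptions = Map.of(
                "c1", List.of("t"), "c2", List.of("t"), "c3", List.of("t"));
        Map<String, List<String>> result = assign(Map.of("t", 7), subscriptions);
        System.out.println(result.get("c1")); // [t-0, t-1, t-2]
        System.out.println(result.get("c2")); // [t-3, t-4]
        System.out.println(result.get("c3")); // [t-5, t-6]
    }
}
```

Because the split is done per topic with the same sorted order, when consumers subscribe to the same topics the lowest-sorted consumers receive the extra partition of every topic, which is the well-known imbalance of the range strategy.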