Kafka 2.4 Source Code Reading: Consumer Client (PartitionAssignor)


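Preface

The previous article walked through the producer client. This one starts on the consumer client source code, beginning with PartitionAssignor, to see how partitions get assigned to consumers.

Main text

A Kafka "topic" can have multiple "partitions". Within a consumer group, a partition can be consumed by only one "consumer", and one or more consumers make up a "consumer group". When several consumers subscribe to a topic, its partitions need to be spread across them as evenly as possible. For example, with 9 partitions and 3 consumers, each consumer gets 3 partitions.

The assignment is decided by the consumer client, not by the Kafka server. When consuming messages we can pick the partitions ourselves, or let a built-in strategy do the assignment. The sections below analyze the built-in strategies.

Abstract class: AbstractPartitionAssignor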

@Override
    public GroupAssignment assign(Cluster metadata, GroupSubscription groupSubscription) {
        Map<String, Subscription> subscriptions = groupSubscription.groupSubscription();
        Set<String> allSubscribedTopics = new HashSet<>();
        // collect every topic subscribed by any member
        for (Map.Entry<String, Subscription> subscriptionEntry : subscriptions.entrySet())
            allSubscribedTopics.addAll(subscriptionEntry.getValue().topics());

        // look up the partition count of each topic
        Map<String, Integer> partitionsPerTopic = new HashMap<>();
        for (String topic : allSubscribedTopics) {
            Integer numPartitions = metadata.partitionCountForTopic(topic);
            if (numPartitions != null && numPartitions > 0)
                partitionsPerTopic.put(topic, numPartitions);
            else
                log.debug("Skipping assignment for topic {} since no metadata is available", topic);
        }

        // delegate to the subclass: the partitions assigned to each consumer in the group
        Map<String, List<TopicPartition>> rawAssignments = assign(partitionsPerTopic, subscriptions);

        // this class maintains no user data, so just wrap the results
        Map<String, Assignment> assignments = new HashMap<>();
        for (Map.Entry<String, List<TopicPartition>> assignmentEntry : rawAssignments.entrySet())
            assignments.put(assignmentEntry.getKey(), new Assignment(assignmentEntry.getValue()));
        return new GroupAssignment(assignments);
    }

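This method of the abstract class turns the cluster metadata and the subscription data into two maps: "topic -> partition count" and "consumer -> subscription". From them a strategy can tell which topics a consumer subscribes to and how many partitions each topic has. The actual assignment is then left to the subclass, through the two-map assign overload.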

RangeAssignor

[Figure range.png: RangeAssignor distributing 7 partitions across 3 consumers]

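This strategy works topic by topic: [partition count / consumer count] gives the base share, and the leftover partitions are handed out one each, following the consumer order.

Take the figure above: 7 partitions / 3 consumers = 2, and the 1 leftover partition goes to "consumer 1". If there were 2 leftover partitions, the second one would go to "consumer 2".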

 @Override
    public Map<String, List<TopicPartition>> assign(Map<String, Integer> partitionsPerTopic,
                                                    Map<String, Subscription> subscriptions) {
        Map<String, List<MemberInfo>> consumersPerTopic = consumersPerTopic(subscriptions);

        Map<String, List<TopicPartition>> assignment = new HashMap<>();
        // initialize the map with an empty list per member
        for (String memberId : subscriptions.keySet())
            assignment.put(memberId, new ArrayList<>());

        for (Map.Entry<String, List<MemberInfo>> topicEntry : consumersPerTopic.entrySet()) {
            String topic = topicEntry.getKey();
            List<MemberInfo> consumersForTopic = topicEntry.getValue();

            Integer numPartitionsForTopic = partitionsPerTopic.get(topic);
            if (numPartitionsForTopic == null)
                continue;

            // sort the consumers so the result is deterministic
            Collections.sort(consumersForTopic);

            // base number of partitions each consumer gets
            int numPartitionsPerConsumer = numPartitionsForTopic / consumersForTopic.size();
            // number of leftover partitions, i.e. how many consumers get one extra
            int consumersWithExtraPartition = numPartitionsForTopic % consumersForTopic.size();

            List<TopicPartition> partitions = AbstractPartitionAssignor.partitions(topic, numPartitionsForTopic);
            // walk the consumers and give each one its contiguous range
            for (int i = 0, n = consumersForTopic.size(); i < n; i++) {
                int start = numPartitionsPerConsumer * i + Math.min(i, consumersWithExtraPartition);
                int length = numPartitionsPerConsumer + (i + 1 > consumersWithExtraPartition ? 0 : 1);
                assignment.get(consumersForTopic.get(i).memberId).addAll(partitions.subList(start, start + length));
            }
        }
        return assignment;
    }

The source is simple: sort the consumers, then use integer division and the remainder to compute each consumer's range.
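The start/length arithmetic is easier to follow on plain integers. Below is a minimal sketch, not the Kafka class itself: RangeSketch and its assign method are names made up for illustration, and consumers are represented only by their index in sorted order. Partition numbers are zero-based here, while the figure counts from 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mirrors the start/length arithmetic of RangeAssignor for one topic.
class RangeSketch {
    /** Returns, per consumer index (consumers assumed already sorted), the partition numbers it gets. */
    static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        int perConsumer = numPartitions / numConsumers; // base share
        int withExtra = numPartitions % numConsumers;   // the first `withExtra` consumers get one more
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < numConsumers; i++) {
            int start = perConsumer * i + Math.min(i, withExtra);
            int length = perConsumer + (i + 1 > withExtra ? 0 : 1);
            List<Integer> range = new ArrayList<>();
            for (int p = start; p < start + length; p++)
                range.add(p);
            result.add(range);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assign(7, 3)); // [[0, 1, 2], [3, 4], [5, 6]]
        System.out.println(assign(8, 3)); // [[0, 1, 2], [3, 4, 5], [6, 7]]
    }
}
```

Because the computation restarts for every topic, the first consumers in sorted order collect the extra partition of each topic, which is why Range can become lopsided when a group subscribes to many topics.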

RoundRobinAssignor

This strategy hands out partitions in round-robin fashion, and the idea is just as simple.

@Override
    public Map<String, List<TopicPartition>> assign(Map<String, Integer> partitionsPerTopic,
                                                    Map<String, Subscription> subscriptions) {
        Map<String, List<TopicPartition>> assignment = new HashMap<>();
        List<MemberInfo> memberInfoList = new ArrayList<>();
        for (Map.Entry<String, Subscription> memberSubscription : subscriptions.entrySet()) {
            assignment.put(memberSubscription.getKey(), new ArrayList<>());
            memberInfoList.add(new MemberInfo(memberSubscription.getKey(),
                                              memberSubscription.getValue().groupInstanceId()));
        }
        // a circular iterator keeps cycling through the sorted consumers
        CircularIterator<MemberInfo> assigner = new CircularIterator<>(Utils.sorted(memberInfoList));

        for (TopicPartition partition : allPartitionsSorted(partitionsPerTopic, subscriptions)) {
            final String topic = partition.topic();
            while (!subscriptions.get(assigner.peek().memberId).topics().contains(topic))
                assigner.next();
            assignment.get(assigner.next().memberId).add(partition);
        }
        return assignment;
    }
    private List<TopicPartition> allPartitionsSorted(Map<String, Integer> partitionsPerTopic,
                                                     Map<String, Subscription> subscriptions) {
        // collect the subscribed topics in sorted order
        SortedSet<String> topics = new TreeSet<>();
        for (Subscription subscription : subscriptions.values())
            topics.addAll(subscription.topics());

        // create the TopicPartition objects topic by topic
        List<TopicPartition> allPartitions = new ArrayList<>();
        for (String topic : topics) {
            Integer numPartitionsForTopic = partitionsPerTopic.get(topic);
            if (numPartitionsForTopic != null)
                allPartitions.addAll(AbstractPartitionAssignor.partitions(topic, numPartitionsForTopic));
        }
        return allPartitions;
    }

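The core is a circular iterator over the consumers: each partition goes to the next consumer that subscribes to its topic, until every partition has been assigned.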
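The circular walk can be sketched without any Kafka types. In the sketch below, RoundRobinSketch, its methods, and the "topic-partition" string format are assumptions for illustration, and an integer cursor stands in for CircularIterator. The example uses three consumers with different subscriptions to show how members that do not subscribe to a topic are skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, simplified model of the round-robin walk (partitions are "topic-index" strings).
class RoundRobinSketch {
    static Map<String, List<String>> assign(Map<String, Integer> partitionsPerTopic,
                                            Map<String, Set<String>> subscriptions) {
        List<String> members = new ArrayList<>(new TreeSet<>(subscriptions.keySet()));
        Map<String, List<String>> assignment = new TreeMap<>();
        Set<String> topics = new TreeSet<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
            topics.addAll(subscriptions.get(member));
        }

        int cursor = 0; // plays the role of the circular iterator
        for (String topic : topics) {
            Integer numPartitions = partitionsPerTopic.get(topic);
            if (numPartitions == null)
                continue; // no metadata for this topic
            for (int p = 0; p < numPartitions; p++) {
                // skip members that do not subscribe to this topic
                while (!subscriptions.get(members.get(cursor)).contains(topic))
                    cursor = (cursor + 1) % members.size();
                assignment.get(members.get(cursor)).add(topic + "-" + p);
                cursor = (cursor + 1) % members.size();
            }
        }
        return assignment;
    }

    static Map<String, List<String>> example() {
        Map<String, Integer> partitionsPerTopic = new TreeMap<>();
        partitionsPerTopic.put("t0", 1);
        partitionsPerTopic.put("t1", 2);
        partitionsPerTopic.put("t2", 3);
        Map<String, Set<String>> subscriptions = new TreeMap<>();
        subscriptions.put("C0", new HashSet<>(Arrays.asList("t0")));
        subscriptions.put("C1", new HashSet<>(Arrays.asList("t0", "t1")));
        subscriptions.put("C2", new HashSet<>(Arrays.asList("t0", "t1", "t2")));
        return assign(partitionsPerTopic, subscriptions);
    }

    public static void main(String[] args) {
        System.out.println(example()); // {C0=[t0-0], C1=[t1-0], C2=[t1-1, t2-0, t2-1, t2-2]}
    }
}
```

The example also shows the weak spot of the strategy: when subscriptions differ, the skipping can pile partitions onto one consumer (C2 here).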

StickyAssignor

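StickyAssignor is the "sticky" strategy. RangeAssignor and RoundRobinAssignor only address the "fairness" of an assignment: each run is as fair as possible, but the previous assignment is ignored. The sticky strategy carries the previous assignment along and rebalances fairly on top of it.

Take the assignment shown in the figure again: consumer 1: 1, 2, 3; consumer 2: 4, 5; consumer 3: 6, 7

If consumer 1 now goes offline, RangeAssignor produces:

consumer 2: 1, 2, 3, 4; consumer 3: 5, 6, 7

StickyAssignor takes the previous assignment into account and produces something like:

consumer 2: 4, 5, 1, 2; consumer 3: 6, 7, 3

It rebalances fairly while keeping as much of the original assignment as possible. Note that with Range, partition 5 moves from consumer 2 to consumer 3 even though consumer 2 is still alive.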
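To put a number on "stickiness", one can count the partitions that changed hands although their previous owner is still in the group. This is only an illustrative sketch: StickinessSketch and movedPartitions are made-up names, array index i stands for partition i + 1, and a consumer counts as alive if it owns anything in the new assignment.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper comparing two assignments given as "owner per partition" arrays.
class StickinessSketch {
    static final String[] BEFORE       = {"c1", "c1", "c1", "c2", "c2", "c3", "c3"};
    static final String[] RANGE_AFTER  = {"c2", "c2", "c2", "c2", "c3", "c3", "c3"};
    static final String[] STICKY_AFTER = {"c2", "c2", "c3", "c2", "c2", "c3", "c3"};

    /** Counts partitions whose previous owner is still present but no longer owns them. */
    static int movedPartitions(String[] before, String[] after) {
        Set<String> alive = new HashSet<>(Arrays.asList(after));
        int moved = 0;
        for (int p = 0; p < before.length; p++) {
            if (alive.contains(before[p]) && !before[p].equals(after[p]))
                moved++;
        }
        return moved;
    }

    public static void main(String[] args) {
        System.out.println(movedPartitions(BEFORE, RANGE_AFTER));  // 1 (partition 5 moves c2 -> c3)
        System.out.println(movedPartitions(BEFORE, STICKY_AFTER)); // 0
    }
}
```

Every such move means one consumer has to give up a partition and another has to pick it up, so a lower count means a cheaper rebalance.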

@Override
    public Map<String, List<TopicPartition>> assign(Map<String, Integer> partitionsPerTopic,
                                                    Map<String, Subscription> subscriptions) {
        partitionMovements = new PartitionMovements();
        Map<String, List<TopicPartition>> consumerToOwnedPartitions = new HashMap<>();
        // case 1: every consumer in the group subscribes to the same set of topics
        if (allSubscriptionsEqual(partitionsPerTopic.keySet(), subscriptions, consumerToOwnedPartitions)) {
            log.debug("Detected that all consumers were subscribed to same set of topics, invoking the "
                          + "optimized assignment algorithm");
            partitionsTransferringOwnership = new HashMap<>();
            return constrainedAssign(partitionsPerTopic, consumerToOwnedPartitions);
        } else {
            log.debug("Detected that all not consumers were subscribed to same set of topics, falling back to the "
                          + "general case assignment algorithm");
            partitionsTransferringOwnership = null;
            return generalAssign(partitionsPerTopic, subscriptions);
        }
    }

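Many operations trigger a reassignment: subscribing to or unsubscribing from topics, a consumer going offline, a change in a topic's partition count. StickyAssignor therefore splits reassignment into two cases:

  • All consumers in the group subscribe to the same list of topics
  • The subscriptions differ between consumers

Case 1: all consumers subscribe to the same topics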

/**
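     * Returns true if all consumers have the same subscription. Also fills the passed-in consumerToOwnedPartitions with each consumer's previously owned and still-subscribed partitions.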
     */
    private boolean allSubscriptionsEqual(Set<String> allTopics,
                                          Map<String, Subscription> subscriptions,
                                          Map<String, List<TopicPartition>> consumerToOwnedPartitions) {
        Set<String> membersWithOldGeneration = new HashSet<>();
        Set<String> membersOfCurrentHighestGeneration = new HashSet<>();
        int maxGeneration = DEFAULT_GENERATION;

        Set<String> subscribedTopics = new HashSet<>();

        for (Map.Entry<String, Subscription> subscriptionEntry : subscriptions.entrySet()) {
            String consumer = subscriptionEntry.getKey();
            Subscription subscription = subscriptionEntry.getValue();
            
            if (subscribedTopics.isEmpty()) {
                subscribedTopics.addAll(subscription.topics());
                // check whether this consumer subscribes to exactly the same topics as the first one
            } else if (!(subscription.topics().size() == subscribedTopics.size()
                && subscribedTopics.containsAll(subscription.topics()))) {
                return false;
            }

            // read the previous assignment (previous_assignment) of this member
            MemberData memberData = memberData(subscription);

            List<TopicPartition> ownedPartitions = new ArrayList<>();
            consumerToOwnedPartitions.put(consumer, ownedPartitions);

            if (memberData.generation.isPresent() && memberData.generation.get() >= maxGeneration
                || !memberData.generation.isPresent() && maxGeneration == DEFAULT_GENERATION) {

                // if this member has a higher generation, all previously recorded owned partitions are invalid
                if (memberData.generation.isPresent() && memberData.generation.get() > maxGeneration) {
                    membersWithOldGeneration.addAll(membersOfCurrentHighestGeneration);
                    membersOfCurrentHighestGeneration.clear();
                    maxGeneration = memberData.generation.get();
                }

                membersOfCurrentHighestGeneration.add(consumer);
                for (final TopicPartition tp : memberData.partitions) {
                    // drop partitions of topics that are no longer valid
                    if (allTopics.contains(tp.topic())) {
                        ownedPartitions.add(tp);
                    }
                }
            }
        }

        for (String consumer : membersWithOldGeneration) {
            consumerToOwnedPartitions.get(consumer).clear();
        }
        return true;
    }

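The method iterates over the consumers' subscriptions and checks whether they all subscribe to the same topics. If they do, it takes each consumer's previous assignment, compares it with the currently subscribed topics, and drops the partitions of topics that are no longer valid. The result is a valid assignment that may not be fair, so constrainedAssign is called next to make it fair.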

// build the topic-partition objects, sorted by topic and then by partition number
        SortedSet<TopicPartition> unassignedPartitions = getTopicPartitions(partitionsPerTopic);

        Set<TopicPartition> allRevokedPartitions = new HashSet<>();

        // members that are still below the minimum
        List<String> unfilledMembers = new LinkedList<>();
        // members holding the maximum number of partitions
        Queue<String> maxCapacityMembers = new LinkedList<>();
        // members holding the minimum number of partitions
        Queue<String> minCapacityMembers = new LinkedList<>();

        int numberOfConsumers = consumerToOwnedPartitions.size();
        // minQuota and maxQuota differ by at most 1
        int minQuota = (int) Math.floor(((double) unassignedPartitions.size()) / numberOfConsumers);
        int maxQuota = (int) Math.ceil(((double) unassignedPartitions.size()) / numberOfConsumers);
// initialize each consumer's list with capacity minQuota
        Map<String, List<TopicPartition>> assignment = new HashMap<>(
            consumerToOwnedPartitions.keySet().stream().collect(Collectors.toMap(c -> c, c -> new ArrayList<>(minQuota))));

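Step 1: turn the "topic -> partition count" map into TopicPartition objects. Since all consumers subscribe to the same topics, the minimum each consumer can get is [partition count / consumer count] rounded down, and the maximum is that minimum plus 1 (or equal to the minimum when the division is exact).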
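A quick way to check the two quotas is to run the same floor/ceil expressions on a few numbers. QuotaSketch below is a made-up name used only for this illustration.

```java
// Hypothetical helper reproducing the minQuota / maxQuota expressions on plain integers.
class QuotaSketch {
    static int minQuota(int partitions, int consumers) {
        return (int) Math.floor((double) partitions / consumers);
    }

    static int maxQuota(int partitions, int consumers) {
        return (int) Math.ceil((double) partitions / consumers);
    }

    public static void main(String[] args) {
        System.out.println(minQuota(7, 3) + " " + maxQuota(7, 3)); // 2 3
        System.out.println(minQuota(9, 3) + " " + maxQuota(9, 3)); // 3 3
        System.out.println(minQuota(2, 3) + " " + maxQuota(2, 3)); // 0 1
    }
}
```

The last line shows the case with more consumers than partitions: the minimum is 0, so no consumer can ever count as "unfilled".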

for (Map.Entry<String, List<TopicPartition>> consumerEntry : consumerToOwnedPartitions.entrySet()) {
            String consumer = consumerEntry.getKey();
            List<TopicPartition> ownedPartitions = consumerEntry.getValue();

            List<TopicPartition> consumerAssignment = assignment.get(consumer);
            int i = 0;
            // keep at most maxQuota partitions; anything beyond that goes into the "revoked" set
            for (TopicPartition tp : ownedPartitions) {
                if (i < maxQuota) {
                    consumerAssignment.add(tp);
                    unassignedPartitions.remove(tp);
                } else {
                    allRevokedPartitions.add(tp);
                }
                ++i;
            }

            // this consumer is below the minimum quota
            if (ownedPartitions.size() < minQuota) {
                unfilledMembers.add(consumer);
            } else {
                // put it into the matching queue(s)
                if (consumerAssignment.size() == minQuota)
                    minCapacityMembers.add(consumer);
                if (consumerAssignment.size() == maxQuota)
                    maxCapacityMembers.add(consumer);
            }
        }

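Step 2: tally the previous assignment. The main point is to check whether a consumer's previous assignment exceeds the maximum allowed this time; the surplus is dropped, and each consumer is then put into the matching collection.

After this step we have four collections:

  • unfilledMembers: consumers below the minimum. These are the ones to focus on: their partition counts must be brought up to minQuota
  • minCapacityMembers: consumers exactly at minQuota. They are satisfied for now and only come back into play if partitions are left over at the very end
  • maxCapacityMembers: consumers at the allowed maximum. A partition can be taken from one of them if needed
  • unassignedPartitions: partitions not assigned yet. They mainly go to the consumers in unfilledMembers

Collections.sort(unfilledMembers);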
        Iterator<TopicPartition> unassignedPartitionsIter = unassignedPartitions.iterator();

        // hand unassigned partitions to the consumers that are below the minimum
        while (!unfilledMembers.isEmpty() && !unassignedPartitions.isEmpty()) {
            Iterator<String> unfilledConsumerIter = unfilledMembers.iterator();

            while (unfilledConsumerIter.hasNext()) {
                String consumer = unfilledConsumerIter.next();
                List<TopicPartition> consumerAssignment = assignment.get(consumer);

                if (unassignedPartitionsIter.hasNext()) {
                    TopicPartition tp = unassignedPartitionsIter.next();
                    consumerAssignment.add(tp);
                    unassignedPartitionsIter.remove();
                    // if the partition is in the revoked set, it is being transferred from another consumer
                    if (allRevokedPartitions.contains(tp))
                        partitionsTransferringOwnership.put(tp, consumer);
                } else {
                    break;
                }

                if (consumerAssignment.size() == minQuota) {
                    minCapacityMembers.add(consumer);
                    unfilledConsumerIter.remove();
                }
            }
        }

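Step 3: take partitions from the "unassigned partitions" set and give them to the consumers that have not reached the minimum, handing out as many unassigned partitions as those consumers need.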

// the unassigned partitions are used up; consumers still below the minimum take partitions from members at maxQuota
        for (String consumer : unfilledMembers) {
            List<TopicPartition> consumerAssignment = assignment.get(consumer);
            int remainingCapacity = minQuota - consumerAssignment.size();
            while (remainingCapacity > 0) {
                String overloadedConsumer = maxCapacityMembers.poll();
                // no member at maxQuota is left to donate, so this consumer cannot reach minQuota
                if (overloadedConsumer == null) {
                    throw new IllegalStateException("Some consumers are under capacity but all partitions have been assigned");
                }
                TopicPartition swappedPartition = assignment.get(overloadedConsumer).remove(0);
                consumerAssignment.add(swappedPartition);
                --remainingCapacity;
                partitionsTransferringOwnership.put(swappedPartition, consumer);
            }
            minCapacityMembers.add(consumer);
        }

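Step 4: after step 3, some consumers may still be below the minimum. For each missing partition, one partition is taken from a consumer in maxCapacityMembers and transferred.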

// if some partitions are still unassigned, hand them out in order to the members at minimum capacity
        for (TopicPartition unassignedPartition : unassignedPartitions) {
            String underCapacityConsumer = minCapacityMembers.poll();
            if (underCapacityConsumer == null) {
                throw new IllegalStateException("Some partitions are unassigned but all consumers are at maximum capacity");
            }
            // We can skip the bookkeeping of unassignedPartitions and maxCapacityMembers here since we are at the end
            assignment.get(underCapacityConsumer).add(unassignedPartition);

            if (allRevokedPartitions.contains(unassignedPartition))
                partitionsTransferringOwnership.put(unassignedPartition, underCapacityConsumer);
        }

return assignment;

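Step 5: after the steps above every consumer has reached the minimum, but some partitions may still be unassigned. They are handed out in order to the consumers sitting at the minimum.

(At first glance step 5 looks hard to reach, but it runs whenever partitions remain after step 3. For example, a fresh assignment of 7 partitions to 3 consumers fills everyone to minQuota = 2 in step 3, and the seventh partition is assigned here.)

Summary: because all consumers subscribe to the same topics, the assignment can be done mechanically. The basic logic is to adjust the previous assignment.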
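The steps can be put together in a compact, simplified sketch. It is not the Kafka implementation: ConstrainedAssignSketch is a made-up name, a single topic is modeled with integer partition numbers, the revoked set and partitionsTransferringOwnership bookkeeping are left out, and the group is assumed to be non-empty.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, simplified model of the "all subscriptions equal" path.
class ConstrainedAssignSketch {
    static Map<String, List<Integer>> assign(int numPartitions, Map<String, List<Integer>> owned) {
        // Step 1: all partitions start out unassigned; compute the quotas
        SortedSet<Integer> unassigned = new TreeSet<>();
        for (int p = 0; p < numPartitions; p++) unassigned.add(p);
        int minQuota = numPartitions / owned.size();
        int maxQuota = (numPartitions + owned.size() - 1) / owned.size();

        Map<String, List<Integer>> assignment = new TreeMap<>();
        List<String> unfilled = new ArrayList<>();
        Deque<String> atMin = new ArrayDeque<>();
        Deque<String> atMax = new ArrayDeque<>();

        // Step 2: each consumer keeps at most maxQuota of its previously owned partitions
        Map<String, List<Integer>> sortedOwned = new TreeMap<>(owned);
        for (Map.Entry<String, List<Integer>> e : sortedOwned.entrySet()) {
            List<Integer> kept = new ArrayList<>();
            for (Integer p : e.getValue())
                if (kept.size() < maxQuota && unassigned.remove(p)) kept.add(p);
            assignment.put(e.getKey(), kept);
            if (kept.size() < minQuota) {
                unfilled.add(e.getKey());
            } else {
                if (kept.size() == minQuota) atMin.add(e.getKey());
                if (kept.size() == maxQuota) atMax.add(e.getKey());
            }
        }

        // Step 3: round-robin unassigned partitions to consumers below minQuota
        Iterator<Integer> free = unassigned.iterator();
        while (!unfilled.isEmpty() && free.hasNext()) {
            Iterator<String> it = unfilled.iterator();
            while (it.hasNext() && free.hasNext()) {
                String consumer = it.next();
                assignment.get(consumer).add(free.next());
                free.remove();
                if (assignment.get(consumer).size() == minQuota) {
                    atMin.add(consumer);
                    it.remove();
                }
            }
        }

        // Step 4: consumers still below minQuota take a partition from a member at maxQuota
        for (String consumer : unfilled) {
            while (assignment.get(consumer).size() < minQuota) {
                String donor = atMax.poll();
                if (donor == null) throw new IllegalStateException("under capacity, nothing left to move");
                assignment.get(consumer).add(assignment.get(donor).remove(0));
            }
            atMin.add(consumer);
        }

        // Step 5: leftover partitions go, in order, to members sitting at minQuota
        for (Integer p : unassigned) {
            String consumer = atMin.poll();
            if (consumer == null) throw new IllegalStateException("all consumers at maximum capacity");
            assignment.get(consumer).add(p);
        }
        return assignment;
    }

    static Map<String, List<Integer>> example() {
        // 7 partitions (0..6); c1 has left, c2 and c3 report what they owned before
        Map<String, List<Integer>> owned = new TreeMap<>();
        owned.put("c2", Arrays.asList(3, 4));
        owned.put("c3", Arrays.asList(5, 6));
        return assign(7, owned);
    }

    public static void main(String[] args) {
        System.out.println(example()); // {c2=[3, 4, 0, 2], c3=[5, 6, 1]}
    }
}
```

In the example both survivors keep what they had, step 3 gives each of them one of the freed partitions, and the last freed partition is placed by step 5.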

Case 2: the subscriptions differ

private Map<String, List<TopicPartition>> generalAssign(Map<String, Integer> partitionsPerTopic,
                                                            Map<String, Subscription> subscriptions) {
        // each consumer's assignment from before this round (named currentAssignment, but it describes the state before this run)
        Map<String, List<TopicPartition>> currentAssignment = new HashMap<>();
        // which consumer a given topic-partition belonged to in an earlier generation
        Map<TopicPartition, ConsumerGenerationPair> prevAssignment = new HashMap<>();

        prepopulateCurrentAssignments(subscriptions, currentAssignment, prevAssignment);

This method also starts by loading the previous assignment. prepopulateCurrentAssignments is skipped here since it is much the same as before.

// subscriptions differ, so record which consumers each partition may be assigned to
        final Map<TopicPartition, List<String>> partition2AllPotentialConsumers = new HashMap<>();
        // record which partitions each consumer may be assigned
        final Map<String, List<TopicPartition>> consumer2AllPotentialPartitions = new HashMap<>();

        // initialize partition2AllPotentialConsumers with an empty list per partition
        for (Entry<String, Integer> entry: partitionsPerTopic.entrySet()) {
            for (int i = 0; i < entry.getValue(); ++i)
                partition2AllPotentialConsumers.put(new TopicPartition(entry.getKey(), i), new ArrayList<>());
        }

        for (Entry<String, Subscription> entry: subscriptions.entrySet()) {
            String consumerId = entry.getKey();
            consumer2AllPotentialPartitions.put(consumerId, new ArrayList<>());
            entry.getValue().topics().stream().filter(topic -> partitionsPerTopic.get(topic) != null).forEach(topic -> {
                for (int i = 0; i < partitionsPerTopic.get(topic); ++i) {
                    TopicPartition topicPartition = new TopicPartition(topic, i);
                    consumer2AllPotentialPartitions.get(consumerId).add(topicPartition);
                    partition2AllPotentialConsumers.get(topicPartition).add(consumerId);
                }
            });

            // a newly joined consumer has no entry yet, so add one
            if (!currentAssignment.containsKey(consumerId))
                currentAssignment.put(consumerId, new ArrayList<>());
        }

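Because each consumer subscribes to different topics, two maps are derived from the current subscription and partition information:

  • which consumers a topic-partition may be assigned to: partition2AllPotentialConsumers
  • which topic-partitions a consumer may be assigned: consumer2AllPotentialPartitions

Newly joined consumers are also added to currentAssignment. currentAssignment holds the previous assignment, and this round modifies it in place.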
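The two lookup maps are easy to picture on a small input. The sketch below uses made-up names (PotentialMapsSketch, build) and represents partitions as "topic-index" strings instead of TopicPartition objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper building the partition -> consumers and consumer -> partitions maps.
class PotentialMapsSketch {
    static void build(Map<String, Integer> partitionsPerTopic,
                      Map<String, List<String>> subscriptions,
                      Map<String, List<String>> partitionToConsumers,
                      Map<String, List<String>> consumerToPartitions) {
        // every known partition starts with an empty list of potential consumers
        for (Map.Entry<String, Integer> e : partitionsPerTopic.entrySet())
            for (int i = 0; i < e.getValue(); i++)
                partitionToConsumers.put(e.getKey() + "-" + i, new ArrayList<>());

        for (Map.Entry<String, List<String>> e : subscriptions.entrySet()) {
            String consumer = e.getKey();
            consumerToPartitions.put(consumer, new ArrayList<>());
            for (String topic : e.getValue()) {
                Integer numPartitions = partitionsPerTopic.get(topic);
                if (numPartitions == null)
                    continue; // subscribed topic without metadata
                for (int i = 0; i < numPartitions; i++) {
                    String tp = topic + "-" + i;
                    consumerToPartitions.get(consumer).add(tp);
                    partitionToConsumers.get(tp).add(consumer);
                }
            }
        }
    }

    static String example() {
        Map<String, Integer> partitionsPerTopic = new TreeMap<>();
        partitionsPerTopic.put("t0", 2);
        partitionsPerTopic.put("t1", 1);
        Map<String, List<String>> subscriptions = new TreeMap<>();
        subscriptions.put("c1", Arrays.asList("t0"));
        subscriptions.put("c2", Arrays.asList("t0", "t1"));
        Map<String, List<String>> partitionToConsumers = new TreeMap<>();
        Map<String, List<String>> consumerToPartitions = new TreeMap<>();
        build(partitionsPerTopic, subscriptions, partitionToConsumers, consumerToPartitions);
        return partitionToConsumers + " " + consumerToPartitions;
    }

    public static void main(String[] args) {
        // {t0-0=[c1, c2], t0-1=[c1, c2], t1-0=[c2]} {c1=[t0-0, t0-1], c2=[t0-0, t0-1, t1-0]}
        System.out.println(example());
    }
}
```

In this example t1-0 has a single potential consumer, so it can only ever live on c2, which matters later when the algorithm decides which partitions may be moved.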

// build a "partition -> consumer" map from the previous assignment
        Map<TopicPartition, String> currentPartitionConsumer = new HashMap<>();
        for (Map.Entry<String, List<TopicPartition>> entry: currentAssignment.entrySet())
            for (TopicPartition topicPartition: entry.getValue())
                currentPartitionConsumer.put(topicPartition, entry.getKey());

        List<TopicPartition> sortedPartitions = sortPartitions(partition2AllPotentialConsumers);

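The previous assignment is inverted into a partition -> consumer map, and the topic-partitions that can currently be assigned are collected and sorted.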

// start from the sorted partitions; all of them count as unassigned at first
        List<TopicPartition> unassignedPartitions = new ArrayList<>(sortedPartitions);
        boolean revocationRequired = false;
        for (Iterator<Entry<String, List<TopicPartition>>> it = currentAssignment.entrySet().iterator(); it.hasNext();) {
            Map.Entry<String, List<TopicPartition>> entry = it.next();
            // the consumer has left the group: remove it
            if (!subscriptions.containsKey(entry.getKey())) {
                for (TopicPartition topicPartition: entry.getValue())
                    currentPartitionConsumer.remove(topicPartition);
                it.remove();
            } else {
                for (Iterator<TopicPartition> partitionIter = entry.getValue().iterator(); partitionIter.hasNext();) {
                    TopicPartition partition = partitionIter.next();
                    // the partition no longer exists (topic changed or deleted): remove it
                    if (!partition2AllPotentialConsumers.containsKey(partition)) {
                        partitionIter.remove();
                        currentPartitionConsumer.remove(partition);
                        // the consumer no longer subscribes to this topic: remove it
                    } else if (!subscriptions.get(entry.getKey()).topics().contains(partition.topic())) {
                        partitionIter.remove();
                        revocationRequired = true;
                    } else
                        unassignedPartitions.remove(partition);
                }
            }
        }

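Next the previous assignment is trimmed: it is kept as the baseline, minus the consumers, topics, and partitions that are no longer valid.

That completes the pre-assignment built on the previous one. The steps resemble case 1; the difference is that with differing subscriptions more lookup information has to be derived.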

// sort the consumers by the number of partitions already assigned
        TreeSet<String> sortedCurrentSubscriptions = new TreeSet<>(new SubscriptionComparator(currentAssignment));
        sortedCurrentSubscriptions.addAll(currentAssignment.keySet());

        balance(currentAssignment, prevAssignment, sortedPartitions, unassignedPartitions, sortedCurrentSubscriptions,
            consumer2AllPotentialPartitions, partition2AllPotentialConsumers, currentPartitionConsumer, revocationRequired);
        return currentAssignment;

The consumers are sorted by their number of assigned partitions, and the balance method then does the balancing.

private void balance(Map<String, List<TopicPartition>> currentAssignment,
                         Map<TopicPartition, ConsumerGenerationPair> prevAssignment,
                         List<TopicPartition> sortedPartitions,
                         List<TopicPartition> unassignedPartitions,
                         TreeSet<String> sortedCurrentSubscriptions,
                         Map<String, List<TopicPartition>> consumer2AllPotentialPartitions,
                         Map<TopicPartition, List<String>> partition2AllPotentialConsumers,
                         Map<TopicPartition, String> currentPartitionConsumer,
                         boolean revocationRequired) {
        boolean initializing = currentAssignment.get(sortedCurrentSubscriptions.last()).isEmpty();
        boolean reassignmentPerformed = false;

        // walk the partitions that are still unassigned
        for (TopicPartition partition: unassignedPartitions) {
            if (partition2AllPotentialConsumers.get(partition).isEmpty())
                continue;
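            // consumers are sorted by assigned count, so walking this set tries the least-loaded consumer first
            // all that remains is to check whether that consumer may own this partition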
            assignPartition(partition, sortedCurrentSubscriptions, currentAssignment,
                consumer2AllPotentialPartitions, currentPartitionConsumer);
        }


private void assignPartition(TopicPartition partition,
                                 TreeSet<String> sortedCurrentSubscriptions,
                                 Map<String, List<TopicPartition>> currentAssignment,
                                 Map<String, List<TopicPartition>> consumer2AllPotentialPartitions,
                                 Map<TopicPartition, String> currentPartitionConsumer) {
        for (String consumer: sortedCurrentSubscriptions) {
            if (consumer2AllPotentialPartitions.get(consumer).contains(partition)) {
                sortedCurrentSubscriptions.remove(consumer);
                currentAssignment.get(consumer).add(partition);
                currentPartitionConsumer.put(partition, consumer);
                sortedCurrentSubscriptions.add(consumer);
                break;
            }
        }
    }

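initializing looks at the consumer that sorts last by "number of assigned partitions", that is, the one holding the most. If even its list is empty, nothing is assigned at the moment (a fresh assignment, or the previous assignment has nothing in common with the current topics and consumers), and in that case the rollback at the end is skipped.

reassignmentPerformed records whether any partition was moved during balancing.

First, every unassigned partition is handed out, whether or not the result is balanced.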

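// every partition is assigned by now, but the result may be unbalanced
        // work out which partitions have several potential consumers; only those can be reassigned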
        Set<TopicPartition> fixedPartitions = new HashSet<>();
        for (TopicPartition partition: partition2AllPotentialConsumers.keySet())
            // set aside partitions with a single potential consumer; they cannot be reassigned
            if (!canParticipateInReassignment(partition, partition2AllPotentialConsumers))
                fixedPartitions.add(partition);
        sortedPartitions.removeAll(fixedPartitions);
        unassignedPartitions.removeAll(fixedPartitions);

// a partition can be reassigned only if at least 2 consumers could own it
private boolean canParticipateInReassignment(TopicPartition partition,
                                                 Map<TopicPartition, List<String>> partition2AllPotentialConsumers) {
        return partition2AllPotentialConsumers.get(partition).size() >= 2;
    }

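Next the set of partitions eligible for rebalancing is narrowed down. Partitions that can go to only one consumer are filtered out, for example when a single consumer subscribes to the topic; such partitions cannot be reassigned.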

Map<String, List<TopicPartition>> fixedAssignments = new HashMap<>();
        for (String consumer: consumer2AllPotentialPartitions.keySet())
            if (!canParticipateInReassignment(consumer, currentAssignment,
                consumer2AllPotentialPartitions, partition2AllPotentialConsumers)) {
                sortedCurrentSubscriptions.remove(consumer);
                fixedAssignments.put(consumer, currentAssignment.remove(consumer));
            }


private boolean canParticipateInReassignment(String consumer,
                                                 Map<String, List<TopicPartition>> currentAssignment,
                                                 Map<String, List<TopicPartition>> consumer2AllPotentialPartitions,
                                                 Map<TopicPartition, List<String>> partition2AllPotentialConsumers) {
        List<TopicPartition> currentPartitions = currentAssignment.get(consumer);
        int currentAssignmentSize = currentPartitions.size();
        int maxAssignmentSize = consumer2AllPotentialPartitions.get(consumer).size();
        if (currentAssignmentSize > maxAssignmentSize)
            log.error("The consumer {} is assigned more partitions than the maximum possible.", consumer);

        // the consumer can still receive more partitions, so it may take part in reassignment
        if (currentAssignmentSize < maxAssignmentSize)
            return true;

        // the consumer already holds every partition it could ever own
        // it can still take part if at least one of those partitions has another potential consumer
        for (TopicPartition partition: currentPartitions)
            if (canParticipateInReassignment(partition, partition2AllPotentialConsumers))
                return true;

        return false;
    }

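This step selects the consumers that can take part in reassignment. The condition is: the consumer's current assignment is smaller than the maximum it could hold. If the two are equal, the previous check decides: at least one of its partitions must be assignable to more than one consumer.

It is convoluted, but the core idea is to take out of play whatever is tied to a topic that only one consumer subscribes to.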

Map<String, List<TopicPartition>> preBalanceAssignment = deepCopy(currentAssignment);
        Map<TopicPartition, String> preBalancePartitionConsumers = new HashMap<>(currentPartitionConsumer);

        // if nothing had to be revoked because of an unsubscribed topic, first try to balance by moving only the newly assigned partitions
        if (!revocationRequired) {
            performReassignments(unassignedPartitions, currentAssignment, prevAssignment, sortedCurrentSubscriptions,
                consumer2AllPotentialPartitions, partition2AllPotentialConsumers, currentPartitionConsumer);
        }

        // then run the reassignment over all reassignable partitions
        reassignmentPerformed = performReassignments(sortedPartitions, currentAssignment, prevAssignment, sortedCurrentSubscriptions,
                   consumer2AllPotentialPartitions, partition2AllPotentialConsumers, currentPartitionConsumer);

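The original assignment is copied, and performReassignments is called to redistribute. There is a check here on whether any subscription was cancelled; the only difference is the first argument passed to performReassignments. When nothing was revoked, the first call only tries to move the partitions that were just handed out, which leaves existing ownership untouched. The method's source makes this clearer.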

private boolean performReassignments(List<TopicPartition> reassignablePartitions,
                                         Map<String, List<TopicPartition>> currentAssignment,
                                         Map<TopicPartition, ConsumerGenerationPair> prevAssignment,
                                         TreeSet<String> sortedCurrentSubscriptions,
                                         Map<String, List<TopicPartition>> consumer2AllPotentialPartitions,
                                         Map<TopicPartition, List<String>> partition2AllPotentialConsumers,
                                         Map<TopicPartition, String> currentPartitionConsumer) {
        boolean reassignmentPerformed = false;
        boolean modified;
        
        do {
            modified = false;
            Iterator<TopicPartition> partitionIterator = reassignablePartitions.iterator();
            while (partitionIterator.hasNext() && !isBalanced(currentAssignment, sortedCurrentSubscriptions, consumer2AllPotentialPartitions)) {
                TopicPartition partition = partitionIterator.next();

                // a reassignable partition must have at least two potential consumers
                if (partition2AllPotentialConsumers.get(partition).size() <= 1)
                    log.error("Expected more than one potential consumer for partition '{}'", partition);

                // the partition must currently have an owner
                String consumer = currentPartitionConsumer.get(partition);
                if (consumer == null)
                    log.error("Expected partition '{}' to be assigned to a consumer", partition);

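                // the partition had an owner in an earlier generation, and its current owner holds at least 2 more partitions than that earlier owner: move it back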
                if (prevAssignment.containsKey(partition) &&
                    currentAssignment.get(consumer).size() > currentAssignment.get(prevAssignment.get(partition).consumer).size() + 1) {
                    reassignPartition(partition, currentAssignment, sortedCurrentSubscriptions, currentPartitionConsumer, prevAssignment.get(partition).consumer);
                    reassignmentPerformed = true;
                    modified = true;
                    continue;
                }

                // otherwise, if any other potential consumer holds at least 2 fewer partitions than the current owner, move the partition
                for (String otherConsumer: partition2AllPotentialConsumers.get(partition)) {
                    if (currentAssignment.get(consumer).size() > currentAssignment.get(otherConsumer).size() + 1) {
                        reassignPartition(partition, currentAssignment, sortedCurrentSubscriptions, currentPartitionConsumer, consumer2AllPotentialPartitions);
                        reassignmentPerformed = true;
                        modified = true;
                        break;
                    }
                }
            }
        } while (modified);

        return reassignmentPerformed;
    }

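The method walks the topic-partitions and keeps checking whether balance has been reached. So when does a partition need to move?

Say "partition 1" is currently assigned to "consumer 1", and it could go to either "consumer 1" or "consumer 2". If the number of partitions held by "consumer 1" is greater than the number held by "consumer 2" plus 1, the partition can be moved to "consumer 2".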

private boolean isBalanced(Map<String, List<TopicPartition>> currentAssignment,
                               TreeSet<String> sortedCurrentSubscriptions,
                               Map<String, List<TopicPartition>> allSubscriptions) {
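        // per the sorted set: if the largest and smallest assignments differ by at most 1, all consumers are balanced
        // this is the ideal case and only a quick check; with many topics and differing subscriptions it often does not hold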
        int min = currentAssignment.get(sortedCurrentSubscriptions.first()).size();
        int max = currentAssignment.get(sortedCurrentSubscriptions.last()).size();
        if (min >= max - 1)
            return true;

        // build a partition -> owner map from the current assignment
        final Map<TopicPartition, String> allPartitions = new HashMap<>();
        Set<Entry<String, List<TopicPartition>>> assignments = currentAssignment.entrySet();
        for (Map.Entry<String, List<TopicPartition>> entry: assignments) {
            List<TopicPartition> topicPartitions = entry.getValue();
            for (TopicPartition topicPartition: topicPartitions) {
                if (allPartitions.containsKey(topicPartition))
                    log.error("{} is assigned to more than one consumer.", topicPartition);
                allPartitions.put(topicPartition, entry.getKey());
            }
        }
        
        for (String consumer: sortedCurrentSubscriptions) {
            List<TopicPartition> consumerPartitions = currentAssignment.get(consumer);
            int consumerPartitionCount = consumerPartitions.size();

            // skip a consumer that already holds every partition it could own
            if (consumerPartitionCount == allSubscriptions.get(consumer).size())
                continue;

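            // walk every topic-partition this consumer could own and check whether it owns it
            // if not, another consumer owns it; if this consumer holds fewer partitions than that owner,
            // the partition could be taken over from it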
            List<TopicPartition> potentialTopicPartitions = allSubscriptions.get(consumer);
            for (TopicPartition topicPartition: potentialTopicPartitions) {
                if (!currentAssignment.get(consumer).contains(topicPartition)) {
                    String otherConsumer = allPartitions.get(topicPartition);
                    int otherConsumerPartitionCount = currentAssignment.get(otherConsumer).size();
                    if (consumerPartitionCount < otherConsumerPartitionCount) {
                        log.debug("{} can be moved from consumer {} to consumer {} for a more balanced assignment.",
                            topicPartition, otherConsumer, consumer);
                        return false;
                    }
                }
            }
        }
        return true;
    }

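That is the balance check. It goes through each consumer, takes its current "number of assigned partitions", then walks "all partitions this consumer could own". If one of those partitions sits with another consumer that holds more partitions, it could be taken over, so the assignment is not balanced.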
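The check can be replayed on two tiny inputs to see why the quick min/max comparison alone is not enough. The sketch is a simplified stand-in, not the Kafka method: BalanceCheckSketch and its helpers are made-up names, partitions are plain strings, and the sorted set of consumers is replaced by a direct scan.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified version of the balance check on string partitions.
class BalanceCheckSketch {
    static boolean isBalanced(Map<String, List<String>> current, Map<String, List<String>> potential) {
        int min = Integer.MAX_VALUE;
        int max = 0;
        Map<String, String> owner = new HashMap<>();
        for (Map.Entry<String, List<String>> e : current.entrySet()) {
            min = Math.min(min, e.getValue().size());
            max = Math.max(max, e.getValue().size());
            for (String tp : e.getValue()) owner.put(tp, e.getKey());
        }
        if (min >= max - 1)
            return true; // quick check: sizes differ by at most 1

        for (String consumer : current.keySet()) {
            int count = current.get(consumer).size();
            if (count == potential.get(consumer).size())
                continue; // already holds everything it could own
            for (String tp : potential.get(consumer)) {
                String other = owner.get(tp);
                // a partition this consumer could own sits with a more loaded consumer
                if (other != null && !other.equals(consumer) && count < current.get(other).size())
                    return false;
            }
        }
        return true;
    }

    static Map<String, List<String>> two(List<String> c1, List<String> c2) {
        Map<String, List<String>> m = new TreeMap<>();
        m.put("c1", c1);
        m.put("c2", c2);
        return m;
    }

    /** Both consumers subscribe to t0, but c2 holds three of its four partitions. */
    static boolean skewedExample() {
        List<String> all = Arrays.asList("t0-0", "t0-1", "t0-2", "t0-3");
        return isBalanced(two(Arrays.asList("t0-0"), Arrays.asList("t0-1", "t0-2", "t0-3")), two(all, all));
    }

    /** c1 only subscribes to t1 and c2 only to t0, so nothing can move. */
    static boolean disjointExample() {
        Map<String, List<String>> state = two(Arrays.asList("t1-0"), Arrays.asList("t0-0", "t0-1", "t0-2"));
        return isBalanced(state, state);
    }

    public static void main(String[] args) {
        System.out.println(skewedExample());   // false
        System.out.println(disjointExample()); // true
    }
}
```

In the second example the sizes are 1 and 3, which fails the quick check, yet the assignment counts as balanced because no partition could legally move.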

// a reassignment was performed but the result is no better balanced than before: roll back
        if (!initializing && reassignmentPerformed && getBalanceScore(currentAssignment) >= getBalanceScore(preBalanceAssignment)) {
            deepCopy(preBalanceAssignment, currentAssignment);
            currentPartitionConsumer.clear();
            currentPartitionConsumer.putAll(preBalancePartitionConsumers);
        }

        // put back the assignments that could not change
        for (Entry<String, List<TopicPartition>> entry: fixedAssignments.entrySet()) {
            String consumer = entry.getKey();
            currentAssignment.put(consumer, entry.getValue());
            sortedCurrentSubscriptions.add(consumer);
        }

        fixedAssignments.clear();

If, after all that shuffling, the result turns out no better than the original assignment, it is rolled back.
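The rollback compares two balance scores, and a lower score is better. The sketch below assumes the score is the sum, over all pairs of consumers, of the absolute difference in their partition counts, which is how I read getBalanceScore; treat that definition as an assumption rather than a quote of the source. BalanceScoreSketch is a made-up name.

```java
// Hypothetical model of the balance score: sum of pairwise differences in assignment sizes.
class BalanceScoreSketch {
    static int score(int... sizes) {
        int score = 0;
        for (int i = 0; i < sizes.length; i++)
            for (int j = i + 1; j < sizes.length; j++)
                score += Math.abs(sizes[i] - sizes[j]);
        return score;
    }

    public static void main(String[] args) {
        System.out.println(score(2, 2, 2)); // 0, perfectly balanced
        System.out.println(score(3, 2, 2)); // 2
        System.out.println(score(4, 2, 1)); // 6
    }
}
```

With this definition the condition in the code reads naturally: if the score after reassignment is greater than or equal to the score before it, the moves bought nothing and the earlier assignment is restored.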

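Summary

The code is very long, but most of it just converts between mappings. The core idea is:

  • Take the previous assignment and prune it against the current subscriptions, dropping anything outdated. What remains is the valid assignment.
  • Balance that valid assignment: a partition that several consumers could own should sit with the least-loaded of them, and if it does not, it is moved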

CooperativeStickyAssignor

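Newer Kafka versions have two rebalance protocols, COOPERATIVE and EAGER. The three strategies described above use EAGER, while CooperativeStickyAssignor uses COOPERATIVE.

Picture a large deployment with hundreds or thousands of consumers, where members come and go and subscriptions change all the time. EAGER redistributes on a large scale every time, and even with the sticky strategy this is still slow. The cooperative protocol replaces one global rebalance with a series of small ones that converge step by step to a balanced state.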

@Override
    protected MemberData memberData(Subscription subscription) {
        return new MemberData(subscription.ownedPartitions(), Optional.empty());
    }

    @Override
    public Map<String, List<TopicPartition>> assign(Map<String, Integer> partitionsPerTopic,
                                                    Map<String, Subscription> subscriptions) {
        Map<String, List<TopicPartition>> assignments = super.assign(partitionsPerTopic, subscriptions);

        Map<TopicPartition, String> partitionsTransferringOwnership = super.partitionsTransferringOwnership == null ?
            computePartitionsTransferringOwnership(subscriptions, assignments) :
            super.partitionsTransferringOwnership;

        adjustAssignment(assignments, partitionsTransferringOwnership);
        return assignments;
    }

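When a rebalance happens, every consumer sends along its own assignment, which is stored as ownedPartitions. The memberData method here simply reads ownedPartitions.

(One odd thing: why does StickyAssignor obtain the previous assignment by parsing binary data, while CooperativeStickyAssignor reads the partitions the consumers send directly?)

Judging by the assign code, it still runs the complete AbstractStickyAssignor logic, exactly like StickyAssignor, and the operation that follows then removes the reassigned results!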

 private void adjustAssignment(Map<String, List<TopicPartition>> assignments,
                                  Map<TopicPartition, String> partitionsTransferringOwnership) {
        for (Map.Entry<TopicPartition, String> partitionEntry : partitionsTransferringOwnership.entrySet()) {
            assignments.get(partitionEntry.getValue()).remove(partitionEntry.getKey());
        }
    }

It removes the partitions that were just assigned...
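A small example makes the effect of adjustAssignment visible. The sketch uses made-up names (CooperativeAdjustSketch, adjust) and string partitions. My understanding of the cooperative protocol, stated here as an assumption, is that a partition changing owner is withheld from its new owner in this round, so that the old owner can revoke it first and a follow-up rebalance can hand it over.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the adjustment step on string partitions.
class CooperativeAdjustSketch {
    /** Removes every partition that is changing owner from its new owner's list. */
    static void adjust(Map<String, List<String>> assignments, Map<String, String> transferring) {
        for (Map.Entry<String, String> e : transferring.entrySet())
            assignments.get(e.getValue()).remove(e.getKey());
    }

    static Map<String, List<String>> example() {
        // sticky result: c1 used to own all three partitions, t0-2 is meant to move to the new member c2
        Map<String, List<String>> assignments = new TreeMap<>();
        assignments.put("c1", new ArrayList<>(Arrays.asList("t0-0", "t0-1")));
        assignments.put("c2", new ArrayList<>(Arrays.asList("t0-2")));
        Map<String, String> transferring = new HashMap<>();
        transferring.put("t0-2", "c2"); // partition -> new owner
        adjust(assignments, transferring);
        return assignments;
    }

    public static void main(String[] args) {
        System.out.println(example()); // {c1=[t0-0, t0-1], c2=[]}
    }
}
```

After the adjustment t0-2 belongs to nobody: c1 no longer sees it in its assignment, and c2 does not receive it yet.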

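Summary

Those are the four partition assignment flows. The last one, CooperativeStickyAssignor, only makes full sense together with the server side, so that gap will be filled in a later article.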