Kafka Source Code Analysis: Metadata Updates


Preface

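The Kafka client uses the Cluster class to encapsulate information about the whole Kafka cluster. The Metadata class can be seen as a wrapper one layer above Cluster, and its main job is to manage updates of that cluster information. This article covers three topics: what Metadata is, how a Metadata update proceeds, and which scenarios trigger an update.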

What Metadata is

Introducing the Metadata class

As the UML diagram shows, the class has many fields that track refresh state, and all of them exist to serve Cluster. Let's look at the Cluster class:

The Cluster class holds essentially all the information about the Kafka cluster:

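  • Node: information about a broker node:
    • host, port and related details, used to open the socket connection
  • PartitionInfo: information about a partition
    • topic name and partition number
    • leader: the current leader of the partition, the node that clients communicate with for this partition
    • inSyncReplicas: the in-sync replica (ISR) nodes, that is, the replicas that are caught up with the leader (the leader itself is part of the ISR)
  • TopicPartition: a topic name plus a partition number, which together form the unique identifier of a partition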

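From these mappings we can see that Kafka uses the partition as its smallest unit of management, and each partition has one Leader that handles interaction with clients. Cluster is an entity class that represents the whole Kafka cluster, and Metadata acts as the client-side maintainer of Cluster.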
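To make these relationships concrete, here is a minimal sketch of the data model. The Sketch* classes are hypothetical, heavily simplified stand-ins written for this article, not Kafka's real Node, PartitionInfo, TopicPartition or Cluster classes; only the method name partitionCountForTopic is borrowed from the code shown later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a broker node: just enough to open a socket.
final class SketchNode {
    final int id;
    final String host;
    final int port;

    SketchNode(int id, String host, int port) {
        this.id = id;
        this.host = host;
        this.port = port;
    }
}

// Hypothetical stand-in for the (topic, partition) key that identifies one partition.
final class SketchTopicPartition {
    final String topic;
    final int partition;

    SketchTopicPartition(String topic, int partition) {
        this.topic = topic;
        this.partition = partition;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SketchTopicPartition)) return false;
        SketchTopicPartition other = (SketchTopicPartition) o;
        return partition == other.partition && topic.equals(other.topic);
    }

    @Override
    public int hashCode() {
        return Objects.hash(topic, partition);
    }
}

// Hypothetical stand-in for per-partition information: leader plus in-sync replicas.
final class SketchPartitionInfo {
    final SketchTopicPartition tp;
    final SketchNode leader;
    final List<SketchNode> inSyncReplicas;

    SketchPartitionInfo(SketchTopicPartition tp, SketchNode leader, List<SketchNode> inSyncReplicas) {
        this.tp = tp;
        this.leader = leader;
        this.inSyncReplicas = inSyncReplicas;
    }
}

// Hypothetical stand-in for the cluster view: lookups are indexed by partition and by topic.
final class SketchCluster {
    private final Map<SketchTopicPartition, SketchPartitionInfo> byPartition = new HashMap<>();
    private final Map<String, List<SketchPartitionInfo>> byTopic = new HashMap<>();

    SketchCluster(List<SketchPartitionInfo> partitions) {
        for (SketchPartitionInfo p : partitions) {
            byPartition.put(p.tp, p);
            byTopic.computeIfAbsent(p.tp.topic, t -> new ArrayList<>()).add(p);
        }
    }

    // Returns the leader for a partition, or null if the partition is unknown.
    SketchNode leaderFor(SketchTopicPartition tp) {
        SketchPartitionInfo info = byPartition.get(tp);
        return info == null ? null : info.leader;
    }

    // Returns the partition count for a topic, or null if the topic is unknown.
    Integer partitionCountForTopic(String topic) {
        List<SketchPartitionInfo> infos = byTopic.get(topic);
        return infos == null ? null : infos.size();
    }
}
```

Returning null for an unknown topic is the detail that matters later: it is how the producer notices that it has no metadata for a topic yet.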

How Metadata is updated

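As described in the earlier discussion of the Producer, every Producer creates a new Metadata instance when it is constructed. doSend() then calls a function that, on the first send for a topic, blocks until the metadata has been updated: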

private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
    ...
    // first make sure the metadata for the topic is available
    ClusterAndWaitTime clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), maxBlockTimeMs);
    long remainingWaitMs = Math.max(0, maxBlockTimeMs - clusterAndWaitTime.waitedOnMetadataMs);
    Cluster cluster = clusterAndWaitTime.cluster;
    ...
}

Let's look at the waitOnMetadata method:

// Wait for the metadata to be updated
private ClusterAndWaitTime waitOnMetadata(String topic, Integer partition, long maxWaitMs) throws InterruptedException {
        // If metadata does not yet track this topic, add() forces a refresh by setting needUpdate to true
        metadata.add(topic);
        Cluster cluster = metadata.fetch();
        Integer partitionsCount = cluster.partitionCountForTopic(topic);
        // Return cached metadata if we have it, and if the record's partition is either undefined
        // or within the known partition range
        if (partitionsCount != null && (partition == null || partition < partitionsCount))
            return new ClusterAndWaitTime(cluster, 0);

        long begin = time.milliseconds();
        long remainingWaitMs = maxWaitMs;
        long elapsed;
        // Issue metadata requests until we have metadata for the topic or maxWaitTimeMs is exceeded.
        // In case we already have cached metadata for the topic, but the requested partition is greater
        // than expected, issue an update request only once. This is necessary in case the metadata
        // is stale and the number of partitions for this topic has increased in the meantime.
        do {
            log.trace("Requesting metadata update for topic {}.", topic);
            metadata.add(topic);
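            // Remember the current version; a higher version later means the update succeeded
            int version = metadata.requestUpdate();
            // Wake up the sender thread so that it issues the update right away
            sender.wakeup();
            try {
                // Block until the update succeeds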
                metadata.awaitUpdate(version, remainingWaitMs);
            } catch (TimeoutException ex) {
                throw new TimeoutException("Failed to update metadata after " + maxWaitMs + " ms.");
            }
            // At this point the cluster should have been updated
            cluster = metadata.fetch();
            elapsed = time.milliseconds() - begin;
            if (elapsed >= maxWaitMs)
                throw new TimeoutException("Failed to update metadata after " + maxWaitMs + " ms.");
            if (cluster.unauthorizedTopics().contains(topic))
                throw new TopicAuthorizationException(topic);
            remainingWaitMs = maxWaitMs - elapsed;
            partitionsCount = cluster.partitionCountForTopic(topic);
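        // Keep looping until the topic's metadata is available, or a timeout exception is thrown
        } while (partitionsCount == null);
        // The partition requested for the record is out of range for this topic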
        if (partition != null && partition >= partitionsCount) {
            throw new KafkaException(
                    String.format("Invalid partition given with record: %d is not in the range [0...%d).", partition, partitionsCount));
        }

        return new ClusterAndWaitTime(cluster, elapsed);
    }
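The version number is what lets the producer thread tell "an update has happened since I asked" apart from "nothing has changed yet". The following is a minimal sketch of that handshake using Java's built-in monitor methods. VersionedMetadata and VersionWaitDemo are hypothetical names, and unlike the real awaitUpdate, which throws a TimeoutException, this simplified version just returns false on timeout.

```java
// Hypothetical, simplified model of the version handshake between a waiting
// thread (the producer) and an updating thread (the sender).
final class VersionedMetadata {
    private int version = 0;
    private boolean needUpdate = false;

    // Waiting side: flag that a refresh is wanted and return the version seen now.
    synchronized int requestUpdate() {
        needUpdate = true;
        return version;
    }

    // Updating side: record the new data, bump the version, wake every waiter.
    synchronized void update() {
        needUpdate = false;
        version += 1;
        notifyAll();
    }

    synchronized boolean updateRequested() {
        return needUpdate;
    }

    // Block until the version moves past lastVersion.
    // Returns true if an update arrived, false on timeout or interruption.
    synchronized boolean awaitUpdate(int lastVersion, long maxWaitMs) {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (version <= lastVersion) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0)
                return false;
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}

final class VersionWaitDemo {
    // Requests an update, lets a background thread deliver it, and waits for it.
    static boolean run() {
        VersionedMetadata metadata = new VersionedMetadata();
        int version = metadata.requestUpdate();
        // Stand-in for the sender thread: deliver an update shortly afterwards.
        Thread sender = new Thread(() -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            metadata.update();
        });
        sender.start();
        return metadata.awaitUpdate(version, 2000);
    }
}
```

The loop around wait() matters: a waiter can wake up for reasons other than the update it is interested in, so the condition on the version is rechecked every time.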

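The code above shows that the update itself is carried out by the Sender class. As its name suggests, Sender keeps driving message sends, while the actual network I/O is done by NetworkClient. NetworkClient is responsible for sending the metadata request and handling its response. Let's look at the NetworkClient.poll(pollTimeout, now) method: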

@Override
    public List<ClientResponse> poll(long timeout, long now) {
        ...
        long metadataTimeout = metadataUpdater.maybeUpdate(now);
        try {
            this.selector.poll(Utils.min(timeout, metadataTimeout, requestTimeoutMs));
        } catch (IOException e) {
            log.error("Unexpected error during I/O", e);
        }

        // process completed actions
        long updatedNow = this.time.milliseconds();
        List<ClientResponse> responses = new ArrayList<>();
        handleCompletedSends(responses, updatedNow);
        handleCompletedReceives(responses, updatedNow);
        handleDisconnections(responses, updatedNow);
        handleConnections();
        handleInitiateApiVersionRequests(updatedNow);
        handleTimedOutRequests(responses, updatedNow);
        completeResponses(responses);
        return responses;
    }

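  • metadataUpdater.maybeUpdate(now): decides whether the metadata needs to be updated and, if so, sends the update request. It returns the maximum time the following poll may wait.
  • handleCompletedReceives(responses, updatedNow): handles the response to the update request.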

Let's look at maybeUpdate:

 @Override
public long maybeUpdate(long now) {
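    // How long until the next refresh is due under the periodic refresh policy
    long timeToNextMetadataUpdate = metadata.timeToNextUpdate(now);
    // If a fetch is already in flight, wait up to the request timeout for its result
    long waitForMetadataFetch = this.metadataFetchInProgress ? requestTimeoutMs : 0;
    // Take the larger of the two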
    long metadataTimeout = Math.max(timeToNextMetadataUpdate, waitForMetadataFetch);
    if (metadataTimeout > 0) {
        return metadataTimeout;
    }
    // Reaching here means an update is due now; pick the least loaded node
    Node node = leastLoadedNode(now);
    if (node == null) {
        log.debug("Give up sending metadata request since no node is available");
        return reconnectBackoffMs;
    }
    // Perform the update
    return maybeUpdate(now, node);
}

 private long maybeUpdate(long now, Node node) {
    String nodeConnectionId = node.idString();
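    // Check whether this node can accept a request
    if (canSendRequest(nodeConnectionId)) {
        // Mark the fetch as in progress to prevent concurrent update requests
        this.metadataFetchInProgress = true;
        // Build the request: all topics, or only the topics being tracked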
        MetadataRequest.Builder metadataRequest;
        if (metadata.needMetadataForAllTopics())
            metadataRequest = MetadataRequest.Builder.allTopics();
        else
            metadataRequest = new MetadataRequest.Builder(new ArrayList<>(metadata.topics()),
                    metadata.allowAutoTopicCreation());


        log.debug("Sending metadata request {} to node {}", metadataRequest, node);
        // Send the metadata request
        sendInternalMetadataRequest(metadataRequest, nodeConnectionId, now);
        // Return the request timeout
        return requestTimeoutMs;
    }

    // Check whether any node is currently connecting
    if (isAnyNodeConnecting()) {
        // Wait for the reconnect backoff
        return reconnectBackoffMs;
    }

    // If we are allowed to connect to the node, try to initiate the connection
    if (connectionStates.canConnect(nodeConnectionId, now)) {
        initiateConnect(node, now);
        return reconnectBackoffMs;
    }
    // Reaching here means no node is usable right now; block until one becomes available
    return Long.MAX_VALUE;
}

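The logic above covers every case:

  • If the node can accept a request, the metadata request is sent right away;
  • If a connection to some node is still being established, the reconnect backoff is returned and the update waits for the connection;
  • If the node has no connection yet, a connection to the broker is initiated.

Meanwhile, the KafkaProducer thread stays blocked in two while loops (the do-while in waitOnMetadata and the wait loop inside metadata.awaitUpdate) until the metadata is updated.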

As the code shows, the whole process keeps retrying until the Cluster data has been updated successfully:

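  • On its first call to poll(), the sender thread initiates the connection to the node;
  • On its second call to poll(), the sender thread sends the Metadata request;
  • The sender then blocks for a while; when a response arrives, it takes the metadataResponse and updates the metadata.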
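The sequence of polls above can be sketched as a tiny state machine for a single node. MetadataPollSketch and ConnState are hypothetical names, the two timeout constants are illustrative values rather than values read from the source, and the real NetworkClient tracks many nodes and much more state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical connection states for a single broker connection.
enum ConnState { DISCONNECTED, CONNECTING, READY }

// Hypothetical single-node model of the branches taken by maybeUpdate.
final class MetadataPollSketch {
    static final long REQUEST_TIMEOUT_MS = 30_000L;   // illustrative value
    static final long RECONNECT_BACKOFF_MS = 50L;     // illustrative value

    ConnState state = ConnState.DISCONNECTED;
    boolean fetchInProgress = false;
    final List<String> actions = new ArrayList<>();

    // Returns how long the caller may block in the next poll.
    long maybeUpdate() {
        if (fetchInProgress)
            return REQUEST_TIMEOUT_MS;            // a request is already in flight
        if (state == ConnState.READY) {
            fetchInProgress = true;               // guard against concurrent requests
            actions.add("send metadata request");
            return REQUEST_TIMEOUT_MS;
        }
        if (state == ConnState.CONNECTING)
            return RECONNECT_BACKOFF_MS;          // wait for the pending connection
        state = ConnState.CONNECTING;             // no connection yet: start one
        actions.add("initiate connect");
        return RECONNECT_BACKOFF_MS;
    }

    // Stand-in for the selector reporting that the connection is established.
    void connectionEstablished() {
        state = ConnState.READY;
    }

    // Stand-in for handleCompletedMetadataResponse.
    void responseReceived() {
        fetchInProgress = false;
        actions.add("update metadata");
    }
}
```

Calling maybeUpdate(), connectionEstablished(), maybeUpdate() and responseReceived() in that order records the actions "initiate connect", "send metadata request" and "update metadata", which matches the three steps listed above.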

Once the cluster has been updated, the producer is no longer blocked and can work normally. When NetworkClient receives the server's response to the Metadata request, it updates the Metadata:

private void handleCompletedReceives(List<ClientResponse> responses, long now) {
        for (NetworkReceive receive : this.selector.completedReceives()) {
            ...
            // If the response is a MetadataResponse, hand it to metadataUpdater
            if (req.isInternalRequest && body instanceof MetadataResponse)
                metadataUpdater.handleCompletedMetadataResponse(req.header, now, (MetadataResponse) body);
            ...
        }
    }

Next, let's look at metadataUpdater:

 @Override
public void handleCompletedMetadataResponse(RequestHeader requestHeader, long now, MetadataResponse response) {
    this.metadataFetchInProgress = false;
    // Build the cluster information from the response
    Cluster cluster = response.cluster();
    Map<String, Errors> errors = response.errors();
    if (cluster.nodes().size() > 0) {
        // Then apply it to metadata
        this.metadata.update(cluster, response.unavailableTopics(), now);
    } else {
        log.trace("Ignoring empty metadata response with correlation id {}.", requestHeader.correlationId());
        this.metadata.failedUpdate(now, null);
    }
}
 

How the response is parsed is left for interested readers to explore. Next, let's look at how metadata handles a successful update and a failed one:

public synchronized void update(Cluster cluster, Set<String> unavailableTopics, long now) {
        Objects.requireNonNull(cluster, "cluster should not be null");
        // Clear the update flag and record the time of this successful refresh
        this.needUpdate = false;
        this.lastRefreshMs = now;
        this.lastSuccessfulRefreshMs = now;
        // Bump the version number
        this.version += 1;
        // If topic expiry is enabled, refresh the expiry times and drop expired topics
        if (topicExpiryEnabled) {
            for (Iterator<Map.Entry<String, Long>> it = topics.entrySet().iterator(); it.hasNext(); ) {
                Map.Entry<String, Long> entry = it.next();
                long expireMs = entry.getValue();
                if (expireMs == TOPIC_EXPIRY_NEEDS_UPDATE)
                    entry.setValue(now + TOPIC_EXPIRY_MS);
                else if (expireMs <= now) {
                    it.remove();
                    log.debug("Removing unused topic {} from the metadata list, expiryMs {} now {}", entry.getKey(), expireMs, now);
                }
            }
        }
        // Notify the listeners
        for (Listener listener: listeners)
            listener.onMetadataUpdate(cluster, unavailableTopics);
    
        String previousClusterId = cluster.clusterResource().clusterId();
        // If all topics were fetched, keep only the topics currently tracked; otherwise store the cluster as-is
        if (this.needMetadataForAllTopics) {
            this.needUpdate = false;
            this.cluster = getClusterForCurrentTopics(cluster);
        } else {
            this.cluster = cluster;
        }
        
        if (!cluster.isBootstrapConfigured()) {
            String clusterId = cluster.clusterResource().clusterId();
            if (clusterId == null ? previousClusterId != null : !clusterId.equals(previousClusterId))
                log.info("Cluster ID: {}", cluster.clusterResource().clusterId());
            clusterResourceListeners.onUpdate(cluster.clusterResource());
        }
        // Wake up all waiting threads
        notifyAll();
        log.debug("Updated cluster metadata version {} to {}", this.version, this.cluster);
    }
 
 
    public synchronized void failedUpdate(long now, AuthenticationException authenticationException) {
        // Only record the time of this refresh attempt
        this.lastRefreshMs = now;
        this.authenticationException = authenticationException;
        if (authenticationException != null)
            this.notifyAll();
    }

At this point, the update is complete.
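The topic expiry loop inside update() is easy to skim past, so here is a minimal sketch of it in isolation. TopicExpirySketch is a hypothetical class; the sentinel value -1 and the 5 minute expiry are assumptions chosen for illustration, and only the constant names TOPIC_EXPIRY_NEEDS_UPDATE and TOPIC_EXPIRY_MS come from the excerpt above.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical model of how tracked topics age out when nobody uses them.
final class TopicExpirySketch {
    static final long NEEDS_UPDATE = -1L;            // assumed sentinel: "just added or just used"
    static final long EXPIRY_MS = 5 * 60 * 1000L;    // assumed expiry period

    final Map<String, Long> topics = new HashMap<>();

    // Called whenever a topic is used: mark it so the next update renews its expiry.
    void add(String topic) {
        topics.put(topic, NEEDS_UPDATE);
    }

    // Called on each successful metadata update.
    void onUpdate(long now) {
        for (Iterator<Map.Entry<String, Long>> it = topics.entrySet().iterator(); it.hasNext(); ) {
            Map.Entry<String, Long> entry = it.next();
            long expireMs = entry.getValue();
            if (expireMs == NEEDS_UPDATE)
                entry.setValue(now + EXPIRY_MS);     // recently used: renew
            else if (expireMs <= now)
                it.remove();                         // unused for a full period: drop
        }
    }
}
```

A topic that keeps being sent to is re-marked by add() before every update and therefore never expires, while a topic that is no longer used disappears from the tracked set after one expiry period, which keeps later metadata requests small.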

Scenarios that trigger a Metadata update

By default, Metadata is refreshed periodically. Let's look at the implementation of metadata.timeToNextUpdate(now):

public synchronized long timeToNextUpdate(long nowMs) {
    long timeToExpire = needUpdate ? 0 : Math.max(this.lastSuccessfulRefreshMs + this.metadataExpireMs - nowMs, 0);
    long timeToAllowUpdate = this.lastRefreshMs + this.refreshBackoffMs - nowMs;
    return Math.max(timeToExpire, timeToAllowUpdate);
}
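
  1. If needUpdate is true, a refresh is due immediately (still subject to the backoff in item 3).
  2. metadataExpireMs is how long the metadata stays valid. It is set by the metadata.max.age.ms configuration and defaults to 5 minutes.
  3. refreshBackoffMs is the minimum interval between two refresh attempts (the retry.backoff.ms configuration). It prevents pointless rapid retries in failure scenarios.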
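A few concrete numbers make the interplay of these three values easier to see. The sketch below restates the same arithmetic as a pure function; RefreshTimingSketch is a hypothetical helper, and the inputs used in the note after it (a 300000 ms expiry and a 100 ms backoff) are example values.

```java
// Hypothetical pure-function restatement of the timeToNextUpdate arithmetic.
final class RefreshTimingSketch {
    static long timeToNextUpdate(boolean needUpdate,
                                 long lastSuccessfulRefreshMs,
                                 long lastRefreshMs,
                                 long metadataExpireMs,
                                 long refreshBackoffMs,
                                 long nowMs) {
        // Time until the cached metadata expires; zero if a refresh was requested.
        long timeToExpire = needUpdate
                ? 0
                : Math.max(lastSuccessfulRefreshMs + metadataExpireMs - nowMs, 0);
        // Time until the backoff since the last attempt (successful or not) has passed.
        long timeToAllowUpdate = lastRefreshMs + refreshBackoffMs - nowMs;
        return Math.max(timeToExpire, timeToAllowUpdate);
    }
}
```

With the last successful refresh at time 0 and the current time at 60000 ms, the function returns 240000 when no update is requested, because the metadata is still valid for four more minutes. With needUpdate set it returns 0, so the refresh happens at once. If the last attempt failed at 59950 ms, it returns 50 even with needUpdate set, because the backoff has not yet elapsed.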

Now let's look at the scenarios that force a refresh:

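  • When NetworkClient.poll finds timed-out requests: handleTimedOutRequests
  • When NetworkClient.poll handles disconnections: handleDisconnections
  • When initializing a connection to a Node (initiateConnect), if the attempt fails
  • When sending a message, if the leader of the partition cannot be found
  • When handling a produce response, if it carries an exception indicating stale metadata (InvalidMetadataException), which includes a missing topic or partition (UnknownTopicOrPartitionException)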
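All of these triggers boil down to the same action: set the needUpdate flag so that the next poll sends a metadata request. The sketch below illustrates that idea for two of the triggers. RefreshTriggerSketch and its nested exception classes are hypothetical stand-ins, not Kafka's real exception types, although the subclass relationship mirrors the one between UnknownTopicOrPartitionException and InvalidMetadataException.

```java
// Hypothetical model: several different events all funnel into requestUpdate().
final class RefreshTriggerSketch {
    // Stand-ins for Kafka's metadata-related exception hierarchy.
    static class InvalidMetadataError extends RuntimeException { }
    static class UnknownTopicOrPartitionError extends InvalidMetadataError { }

    boolean needUpdate = false;

    void requestUpdate() {
        needUpdate = true;
    }

    // A failed send forces a refresh only if the error points at stale metadata.
    void onSendFailure(RuntimeException error) {
        if (error instanceof InvalidMetadataError)
            requestUpdate();
    }

    // Any disconnection may mean the cluster layout changed, so refresh.
    void onDisconnections(int disconnectedCount) {
        if (disconnectedCount > 0)
            requestUpdate();
    }
}
```

Because the flag is only a request, several triggers firing close together still result in a single metadata fetch, and the backoff in timeToNextUpdate keeps a persistent failure from turning into a tight retry loop.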