Kafka Source Code Analysis 4: When Is Metadata Updated?


Personal GitHub: github.com/AmbitionTn/… . I hope the articles below are helpful; if you find them interesting, feel free to follow — the series is continuously updated as I keep learning.

The source code analyzed is based on Kafka 3.3.

Previous article: Kafka Source Code Analysis 3: What Does the Sender Thread Do?

Overview

The previous article covered what the KafkaProducer's Sender thread actually does: how it asynchronously pulls buffered messages from the accumulator and how it sends them through NetworkClient and the Selector. Appending to the accumulator, however, first requires the cluster's metadata. So when is that metadata obtained? That is what this article explains.

doSend

Looking at doSend, all we can see is the following:

private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
    // Append callback takes care of the following:
    //  - call interceptors and user callback on completion
    //  - remember partition that is calculated in RecordAccumulator.append
    AppendCallbacks<K, V> appendCallbacks = new AppendCallbacks<K, V>(callback, this.interceptors, record);

    try {
        /**
         * Step 1: validate the Sender state.
         * Checks the Sender thread's running flag to determine whether the
         * send thread has been closed; if it has, no more records can be
         * sent and an exception is thrown.
         */
        throwIfProducerClosed();
        // first make sure the metadata for the topic is available
        long nowMs = time.milliseconds();
        ClusterAndWaitTime clusterAndWaitTime;
        try {
            /**
             * Step 2: obtain the cluster metadata.
             * Blocks waiting for the cluster metadata to become available,
             * for at most maxBlockTimeMs.
             */
            clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), nowMs, maxBlockTimeMs);
        } catch (KafkaException e) {
            if (metadata.isClosed())
                throw new KafkaException("Producer closed while send in progress", e);
            throw e;
        }
        // remaining code omitted
        ...
    }
}
  • As doSend shows, clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), nowMs, maxBlockTimeMs); is called, i.e. metadata for the topic must be obtained before the record can be sent.
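As an aside, maxBlockTimeMs is driven by the producer's max.block.ms configuration. The caller-side effect can be seen with a minimal sketch (the broker address and topic name below are placeholders, not taken from this article):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class MetadataWaitDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Upper bound on how long send() may block in waitOnMetadata
        props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "10000");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // First send to this topic: its metadata is not cached yet, so this
            // call blocks until the Sender thread fetches it, or throws a
            // TimeoutException once max.block.ms elapses.
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
        }
    }
}

The implementation of waitOnMetadata: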
private ClusterAndWaitTime waitOnMetadata(String topic, Integer partition, long nowMs, long maxWaitMs) throws InterruptedException {
    // add topic to metadata topic list if it is not there already and reset expiry
    Cluster cluster = metadata.fetch();

    if (cluster.invalidTopics().contains(topic))
        throw new InvalidTopicException(topic);

    metadata.add(topic, nowMs);

    Integer partitionsCount = cluster.partitionCountForTopic(topic);
    // Return cached metadata if we have it, and if the record's partition is either undefined
    // or within the known partition range
    if (partitionsCount != null && (partition == null || partition < partitionsCount))
        return new ClusterAndWaitTime(cluster, 0);

    long remainingWaitMs = maxWaitMs;
    long elapsed = 0;
    // Issue metadata requests until we have metadata for the topic and the requested partition,
    // or until maxWaitTimeMs is exceeded. This is necessary in case the metadata
    // is stale and the number of partitions for this topic has increased in the meantime.
    long nowNanos = time.nanoseconds();
    do {
        if (partition != null) {
            log.trace("Requesting metadata update for partition {} of topic {}.", partition, topic);
        } else {
            log.trace("Requesting metadata update for topic {}.", topic);
        }
        metadata.add(topic, nowMs + elapsed);
        int version = metadata.requestUpdateForTopic(topic);
        sender.wakeup();
        try {
            metadata.awaitUpdate(version, remainingWaitMs);
        } catch (TimeoutException ex) {
            // Rethrow with original maxWaitMs to prevent logging exception with remainingWaitMs
            throw new TimeoutException(
                    String.format("Topic %s not present in metadata after %d ms.",
                            topic, maxWaitMs));
        }
        cluster = metadata.fetch();
        elapsed = time.milliseconds() - nowMs;
        if (elapsed >= maxWaitMs) {
            throw new TimeoutException(partitionsCount == null ?
                    String.format("Topic %s not present in metadata after %d ms.",
                            topic, maxWaitMs) :
                    String.format("Partition %d of topic %s with partition count %d is not present in metadata after %d ms.",
                            partition, topic, partitionsCount, maxWaitMs));
        }
        metadata.maybeThrowExceptionForTopic(topic);
        remainingWaitMs = maxWaitMs - elapsed;
        partitionsCount = cluster.partitionCountForTopic(topic);
    } while (partitionsCount == null || (partition != null && partition >= partitionsCount));

    producerMetrics.recordMetadataWait(time.nanoseconds() - nowNanos);

    return new ClusterAndWaitTime(cluster, elapsed);
}
  • waitOnMetadata works cache-first: if the cache has no metadata for the topic, it marks the metadata as needing an update, wakes the Sender thread, and blocks until the Sender's poll brings back the cluster metadata.
    • metadata.fetch(): reads the metadata from the local cache
    • metadata.awaitUpdate(version, remainingWaitMs): blocks until the update version exceeds the given version, or the timeout expires
    • metadata.requestUpdateForTopic(topic): marks the metadata as needing an update
/**
 * Wait for metadata update until the current version is larger than the last version we know of
 */
public synchronized void awaitUpdate(final int lastVersion, final long timeoutMs) throws InterruptedException {
    long currentTimeMs = time.milliseconds();
    long deadlineMs = currentTimeMs + timeoutMs < 0 ? Long.MAX_VALUE : currentTimeMs + timeoutMs;
    time.waitObject(this, () -> {
        // Throw fatal exceptions, if there are any. Recoverable topic errors will be handled by the caller.
        maybeThrowFatalException();
        return updateVersion() > lastVersion || isClosed();
    }, deadlineMs);

    if (isClosed())
        throw new KafkaException("Requested metadata update after close");
}
public synchronized int requestUpdateForTopic(String topic) {
    if (newTopics.contains(topic)) {
        return requestUpdateForNewTopics();
    } else {
        return requestUpdate();
    }
}

public synchronized int requestUpdateForNewTopics() {
    // Override the timestamp of last refresh to let immediate update.
    this.lastRefreshMs = 0;
    this.needPartialUpdate = true;
    this.requestVersion++;
    return this.updateVersion;
}
  • So at doSend() time the producer only reads metadata from the cache; if the topic's metadata is missing, it marks the metadata as needing an update, blocks, and waits for the Sender thread to fetch it asynchronously and bump the update version.
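The interplay between requestUpdateForTopic, awaitUpdate, and the Sender-side update is essentially a version-stamped monitor. A simplified, self-contained sketch of the pattern follows; all class and method names here are illustrative, not Kafka's own:

import java.util.concurrent.TimeoutException;

// Simplified sketch of the version-stamped wait that Metadata implements.
public class VersionedCache {
    private int updateVersion = 0;
    private boolean needUpdate = false;

    // Producer side (doSend): mark the cache stale and remember the version we saw.
    public synchronized int requestUpdate() {
        needUpdate = true;
        return updateVersion;
    }

    // Producer side: block until a newer version is installed or the deadline passes.
    public synchronized void awaitUpdate(int lastVersion, long timeoutMs)
            throws InterruptedException, TimeoutException {
        long deadlineMs = System.currentTimeMillis() + timeoutMs;
        while (updateVersion <= lastVersion) {
            long remainingMs = deadlineMs - System.currentTimeMillis();
            if (remainingMs <= 0)
                throw new TimeoutException("timed out waiting for an update");
            wait(remainingMs);
        }
    }

    // Sender side (after a MetadataResponse arrives): install the fresh data,
    // bump the version, and wake up every blocked producer thread.
    public synchronized void update() {
        needUpdate = false;
        updateVersion += 1;
        notifyAll();
    }

    // Sender side: checked on each loop pass to decide whether to send a request.
    public synchronized boolean needsUpdate() {
        return needUpdate;
    }
}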

What the Sender Thread Does

runOnce: the Sender's runOnce method

void runOnce() {
    /**
     * Check whether this producer is transactional
     */
    if (transactionManager != null) {
        try {
            transactionManager.maybeResolveSequences();

            // do not continue sending if the transaction manager is in a failed state
            // If the transaction manager is in an error state, stop; do not continue sending
            if (transactionManager.hasFatalError()) {
                RuntimeException lastError = transactionManager.lastError();
                if (lastError != null)
                    maybeAbortBatches(lastError);
                client.poll(retryBackoffMs, time.milliseconds());
                return;
            }

            // Check whether we need a new producerId. If so, we will enqueue an InitProducerId
            // request which will be sent below
            // Check whether the producer id needs to be regenerated; if so, a new one is requested
            transactionManager.bumpIdempotentEpochAndResetIdIfNeeded();

            if (maybeSendAndPollTransactionalRequest()) {
                return;
            }
        } catch (AuthenticationException e) {
            // This is already logged as error, but propagated here to perform any clean ups.
            log.trace("Authentication exception while processing transactional request", e);
            transactionManager.authenticationFailed(e);
        }
    }

    long currentTimeMs = time.milliseconds();
    // send the accumulated producer data
    long pollTimeout = sendProducerData(currentTimeMs);
    // perform the actual network I/O
    client.poll(pollTimeout, currentTimeMs);
}
  • client.poll(): performs the actual network requests, including sending messages and fetching metadata.

poll: performs the real network requests, talking to the broker

public List<ClientResponse> poll(long timeout, long now) {
    ensureActive();

    if (!abortedSends.isEmpty()) {
        // If there are aborted sends because of unsupported version exceptions or disconnects,
        // handle them immediately without waiting for Selector#poll.
        List<ClientResponse> responses = new ArrayList<>();
        handleAbortedSends(responses);
        completeResponses(responses);
        return responses;
    }

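    // maybeUpdate() checks whether a metadata update is due (an update was
    // requested, or the refresh interval expired) and, if so, sends a
    // MetadataRequest to the least-loaded node; the return value is how long
    // until the next update becomes due.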
    long metadataTimeout = metadataUpdater.maybeUpdate(now);
    try {
        this.selector.poll(Utils.min(timeout, metadataTimeout, defaultRequestTimeoutMs));
    } catch (IOException e) {
        log.error("Unexpected error during I/O", e);
    }

    // process completed actions
    long updatedNow = this.time.milliseconds();
    List<ClientResponse> responses = new ArrayList<>();
    handleCompletedSends(responses, updatedNow);
    handleCompletedReceives(responses, updatedNow);
    handleDisconnections(responses, updatedNow);
    handleConnections();
    handleInitiateApiVersionRequests(updatedNow);
    handleTimedOutConnections(responses, updatedNow);
    handleTimedOutRequests(responses, updatedNow);
    completeResponses(responses);

    return responses;
}

handleCompletedReceives: processes completed responses

private void handleCompletedReceives(List<ClientResponse> responses, long now) {
    for (NetworkReceive receive : this.selector.completedReceives()) {
        String source = receive.source();
        InFlightRequest req = inFlightRequests.completeNext(source);

        AbstractResponse response = parseResponse(receive.payload(), req.header);
        if (throttleTimeSensor != null)
            throttleTimeSensor.record(response.throttleTimeMs(), now);

        if (log.isDebugEnabled()) {
            log.debug("Received {} response from node {} for request with header {}: {}",
                req.header.apiKey(), req.destination, req.header, response);
        }

        // If the received response includes a throttle delay, throttle the connection.
        maybeThrottle(response, req.header.apiVersion(), req.destination, now);
        // If this is an internal request and the response is a MetadataResponse
        if (req.isInternalRequest && response instanceof MetadataResponse)
            // update the metadata
            metadataUpdater.handleSuccessfulResponse(req.header, now, (MetadataResponse) response);
        else if (req.isInternalRequest && response instanceof ApiVersionsResponse)
            handleApiVersionsResponse(responses, req, now, (ApiVersionsResponse) response);
        else
            responses.add(req.completed(response, now));
    }
}
  • metadataUpdater.handleSuccessfulResponse: updates the metadata
@Override
public void handleSuccessfulResponse(RequestHeader requestHeader, long now, MetadataResponse response) {
    // If any partition has leader with missing listeners, log up to ten of these partitions
    // for diagnosing broker configuration issues.
    // This could be a transient issue if listeners were added dynamically to brokers.
    List<TopicPartition> missingListenerPartitions = response.topicMetadata().stream().flatMap(topicMetadata ->
        topicMetadata.partitionMetadata().stream()
            .filter(partitionMetadata -> partitionMetadata.error == Errors.LISTENER_NOT_FOUND)
            .map(partitionMetadata -> new TopicPartition(topicMetadata.topic(), partitionMetadata.partition())))
        .collect(Collectors.toList());
    if (!missingListenerPartitions.isEmpty()) {
        int count = missingListenerPartitions.size();
        log.warn("{} partitions have leader brokers without a matching listener, including {}",
                count, missingListenerPartitions.subList(0, Math.min(10, count)));
    }

    // Check if any topic's metadata failed to get updated
    Map<String, Errors> errors = response.errors();
    if (!errors.isEmpty())
        log.warn("Error while fetching metadata with correlation id {} : {}", requestHeader.correlationId(), errors);

    // When talking to the startup phase of a broker, it is possible to receive an empty metadata set, which
    // we should retry later.
    if (response.brokers().isEmpty()) {
        log.trace("Ignoring empty metadata response with correlation id {}.", requestHeader.correlationId());
        this.metadata.failedUpdate(now);
    } else {
        // update the metadata with the response
        this.metadata.update(inProgress.requestVersion, response, inProgress.isPartialUpdate, now);
    }

    inProgress = null;
}

update: refreshes the cached metadata

public synchronized void update(int requestVersion, MetadataResponse response, boolean isPartialUpdate, long nowMs) {
    Objects.requireNonNull(response, "Metadata response cannot be null");
    if (isClosed())
        throw new IllegalStateException("Update requested after metadata close");

    // If the request version is behind the current one, a further partial update is still needed
    this.needPartialUpdate = requestVersion < this.requestVersion;
    // record the time of this refresh
    this.lastRefreshMs = nowMs;
    // bump the update version by 1
    this.updateVersion += 1;
    if (!isPartialUpdate) {
        this.needFullUpdate = false;
        this.lastSuccessfulRefreshMs = nowMs;
    }

    String previousClusterId = cache.clusterResource().clusterId();
    // turn the response into a new metadata cache object and assign it
    this.cache = handleMetadataResponse(response, isPartialUpdate, nowMs);

    Cluster cluster = cache.cluster();
    maybeSetMetadataError(cluster);

    this.lastSeenLeaderEpochs.keySet().removeIf(tp -> !retainTopic(tp.topic(), false, nowMs));

    String newClusterId = cache.clusterResource().clusterId();
    if (!Objects.equals(previousClusterId, newClusterId)) {
        log.info("Cluster ID: {}", newClusterId);
    }
    // notify listeners of the updated cluster resource info
    clusterResourceListeners.onUpdate(cache.clusterResource());

    log.debug("Updated cluster metadata updateVersion {} to {}", this.updateVersion, this.cache);
}
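One visible effect of the final clusterResourceListeners.onUpdate() call: any interceptor, serializer, or metrics reporter that implements ClusterResourceListener is notified with the new cluster id once a MetadataResponse has been applied. A small example of hooking into this (the class name is mine):

import org.apache.kafka.common.ClusterResource;
import org.apache.kafka.common.ClusterResourceListener;
import org.apache.kafka.common.serialization.Serializer;
import org.apache.kafka.common.serialization.StringSerializer;

// A serializer that also observes metadata updates: after Metadata.update()
// applies a MetadataResponse, onUpdate() is invoked with the cluster resource.
public class ClusterAwareSerializer implements Serializer<String>, ClusterResourceListener {
    private final StringSerializer inner = new StringSerializer();

    @Override
    public byte[] serialize(String topic, String data) {
        return inner.serialize(topic, data);
    }

    @Override
    public void onUpdate(ClusterResource clusterResource) {
        // Fired from clusterResourceListeners.onUpdate(cache.clusterResource())
        System.out.println("Cluster metadata updated, cluster id: " + clusterResource.clusterId());
    }
}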

Summary

This article explained how doSend obtains the cluster metadata when appending a record to the accumulator, how it waits for that metadata to arrive, and how the KafkaProducer fetches metadata asynchronously through the Sender thread.

  1. Lazy loading is a pattern worth considering when designing a system. Kafka metadata is loaded lazily: nothing is fetched when the producer first starts; the code merely sets a flag that forces an update, and the real load only happens later when client.poll() runs, as the sketch below shows.
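A toy distillation of the pattern (illustrative names, not Kafka code):

import java.util.function.Supplier;

// The caller never performs the expensive load itself; it only flips a flag.
// A background loop (the Sender, in Kafka's case) notices the flag on its
// next pass and does the real work.
public class LazyLoader<T> {
    private volatile boolean needUpdate = true; // start unloaded: first use forces a load
    private volatile T cached;
    private final Supplier<T> expensiveLoad;

    public LazyLoader(Supplier<T> expensiveLoad) {
        this.expensiveLoad = expensiveLoad;
    }

    // Caller side: cheap cache read; may return null or stale data.
    public T fetch() {
        return cached;
    }

    // Caller side: mark the cache as needing a refresh.
    public void requestUpdate() {
        needUpdate = true;
    }

    // Background-loop side: analogous to the Sender thread calling client.poll().
    public void pollOnce() {
        if (needUpdate) {
            cached = expensiveLoad.get();
            needUpdate = false;
        }
    }
}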