Dynamic Kafka Partition Discovery When Consuming Kafka Data in Flink


The conclusion first: to enable dynamic partition discovery when consuming Kafka data in Flink, you need to set the parameter


flink.partition-discovery.interval-millis

which controls how often the partition list is refreshed, in milliseconds.
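For example, the interval can be set through the consumer Properties when the source is created. A minimal sketch (the broker address, group id, topic pattern, and the 30-second interval are all placeholder values, not anything from the original article):

import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class PartitionDiscoveryExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.setProperty("group.id", "demo-group");              // placeholder group id
        // Probe Kafka for new partitions (and, with a pattern, new topics) every 30s.
        props.setProperty(FlinkKafkaConsumer.KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS, "30000");

        // Subscribing with a Pattern also picks up newly created matching topics.
        FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<>(
                Pattern.compile("demo-topic.*"), new SimpleStringSchema(), props);

        env.addSource(consumer).print();
        env.execute("partition-discovery-demo");
    }
}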

Tracing through the code:

The relevant constant and constructors in the FlinkKafkaConsumer class:


/** Configuration key to define the consumer's partition discovery interval, in milliseconds. */
public static final String KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS =
        "flink.partition-discovery.interval-millis";

public FlinkKafkaConsumer(
        Pattern subscriptionPattern,
        KafkaDeserializationSchema<T> deserializer,
        Properties props) {
    this(null, subscriptionPattern, deserializer, props);
}

private FlinkKafkaConsumer(
        List<String> topics,
        Pattern subscriptionPattern,
        KafkaDeserializationSchema<T> deserializer,
        Properties props) {
    super(
            topics,
            subscriptionPattern,
            deserializer,
            getLong(
                    checkNotNull(props, "props"),
                    KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS,
                    PARTITION_DISCOVERY_DISABLED),
            !getBoolean(props, KEY_DISABLE_METRICS, false));
}
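Note the getLong call: it reads the property and falls back to the default, PARTITION_DISCOVERY_DISABLED, when the key is absent. In other words, discovery stays off unless the user sets the key explicitly. The lookup is roughly equivalent to this simplified sketch (error handling omitted; the real helper lives in Flink's PropertiesUtil):

static long getLong(Properties props, String key, long defaultValue) {
    // Return the parsed property value, or the default when the key is missing.
    String value = props.getProperty(key);
    return value == null ? defaultValue : Long.parseLong(value);
}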

We can see that this value is then handled in the constructor of the parent class, FlinkKafkaConsumerBase:


public FlinkKafkaConsumerBase(
        List<String> topics,
        Pattern topicPattern,
        KafkaDeserializationSchema<T> deserializer,
        long discoveryIntervalMillis,
        boolean useMetrics) {
    this.topicsDescriptor = new KafkaTopicsDescriptor(topics, topicPattern);
    this.deserializer = checkNotNull(deserializer, "valueDeserializer");

    checkArgument(
            discoveryIntervalMillis == PARTITION_DISCOVERY_DISABLED
                    || discoveryIntervalMillis >= 0,
            "Cannot define a negative value for the topic / partition discovery interval.");
    this.discoveryIntervalMillis = discoveryIntervalMillis;

    this.useMetrics = useMetrics;
}
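The checkArgument guard means any negative interval other than the internal PARTITION_DISCOVERY_DISABLED sentinel is rejected at construction time. A quick illustration (hypothetical topic and broker values):

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker
// -1 is negative but is not the PARTITION_DISCOVERY_DISABLED sentinel, so the
// FlinkKafkaConsumerBase constructor throws IllegalArgumentException:
// "Cannot define a negative value for the topic / partition discovery interval."
props.setProperty(FlinkKafkaConsumer.KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS, "-1");
FlinkKafkaConsumer<String> consumer =
        new FlinkKafkaConsumer<>("demo-topic", new SimpleStringSchema(), props);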

So where does this parameter actually take effect? Let's keep reading.

The FlinkKafkaConsumerBase class has a method:


public void run(SourceContext<T> sourceContext) throws Exception

This is the method that actually reads data from Kafka, and inside it there is the following code:


// depending on whether we were restored with the current state version (1.3),
// remaining logic branches off into 2 paths:
//   1) New state - partition discovery loop executed as separate thread, with this
//                  thread running the main fetcher loop
//   2) Old state - partition discovery is disabled and only the main fetcher loop is executed
if (discoveryIntervalMillis == PARTITION_DISCOVERY_DISABLED) {
    kafkaFetcher.runFetchLoop();
} else {
    runWithPartitionDiscovery();
}

As you can see, once we have configured flink.partition-discovery.interval-millis (so discoveryIntervalMillis no longer equals the PARTITION_DISCOVERY_DISABLED sentinel), execution takes the else branch, runWithPartitionDiscovery():


private void runWithPartitionDiscovery() throws Exception {
    final AtomicReference<Exception> discoveryLoopErrorRef = new AtomicReference<>();
    createAndStartDiscoveryLoop(discoveryLoopErrorRef);

    kafkaFetcher.runFetchLoop();

    // make sure that the partition discoverer is waked up so that
    // the discoveryLoopThread exits
    partitionDiscoverer.wakeup();
    joinDiscoveryLoopThread();

    // rethrow any fetcher errors
    final Exception discoveryLoopError = discoveryLoopErrorRef.get();
    if (discoveryLoopError != null) {
        throw new RuntimeException(discoveryLoopError);
    }
}

In this method, a dedicated partition-discovery thread runs alongside the main fetch loop to pick up new Kafka partitions; once the fetch loop exits, the discoverer is woken up so the discovery thread can terminate as well.
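createAndStartDiscoveryLoop is what launches that thread. Its loop body works roughly like the following (a simplified sketch of the logic, not the verbatim Flink source):

// Runs in the dedicated discovery thread until the consumer stops.
while (running) {
    // ask the brokers for partitions (and pattern-matched topics) we have not seen yet
    final List<KafkaTopicPartition> discoveredPartitions =
            partitionDiscoverer.discoverPartitions();
    if (running && !discoveredPartitions.isEmpty()) {
        // hand any new partitions over to the already-running fetcher
        kafkaFetcher.addDiscoveredPartitions(discoveredPartitions);
    }
    if (running && discoveryIntervalMillis != 0) {
        // wait one interval before probing again
        Thread.sleep(discoveryIntervalMillis);
    }
}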

For the concrete discovery logic, look at the subclasses of the abstract class AbstractPartitionDiscoverer.

[Figure: PartitionDiscoverer.png — the AbstractPartitionDiscoverer class hierarchy]

Pick the implementation that matches your Kafka connector version to see the concrete details.
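For the universal connector, that subclass is KafkaPartitionDiscoverer, which wraps a KafkaConsumer and queries the broker's metadata. Its partition listing boils down to something like this (a simplified sketch, assuming topics holds the subscribed topic names):

// List all partitions for the subscribed topics via the Kafka client's metadata API.
List<KafkaTopicPartition> partitions = new LinkedList<>();
for (String topic : topics) {
    for (PartitionInfo partitionInfo : kafkaConsumer.partitionsFor(topic)) {
        // wrap each (topic, partition) pair in Flink's KafkaTopicPartition
        partitions.add(new KafkaTopicPartition(partitionInfo.topic(), partitionInfo.partition()));
    }
}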