TL;DR: when consuming Kafka data in Flink, enabling dynamic partition discovery requires setting the parameter
flink.partition-discovery.interval-millis
which controls how often the partition list is refreshed.
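The key is passed through the ordinary Kafka consumer Properties. A minimal sketch of wiring it up (the broker address and group id below are hypothetical placeholders, not values from this walkthrough):

```java
import java.util.Properties;

public class PartitionDiscoveryConfig {

    public static Properties kafkaProperties() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // hypothetical broker
        props.setProperty("group.id", "demo-group");              // hypothetical group id
        // Refresh the partition list every 30 seconds. If this key is absent,
        // partition discovery stays disabled.
        props.setProperty("flink.partition-discovery.interval-millis", "30000");
        return props;
    }
}
```

These Properties would then be handed to the FlinkKafkaConsumer constructor shown below.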
Code walkthrough: start with the relevant field and constructors in the FlinkKafkaConsumer class.
/** Configuration key to define the consumer's partition discovery interval, in milliseconds. */
public static final String KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS =
        "flink.partition-discovery.interval-millis";

public FlinkKafkaConsumer(
        Pattern subscriptionPattern,
        KafkaDeserializationSchema<T> deserializer,
        Properties props) {
    this(null, subscriptionPattern, deserializer, props);
}

private FlinkKafkaConsumer(
        List<String> topics,
        Pattern subscriptionPattern,
        KafkaDeserializationSchema<T> deserializer,
        Properties props) {
    super(
            topics,
            subscriptionPattern,
            deserializer,
            getLong(
                    checkNotNull(props, "props"),
                    KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS,
                    PARTITION_DISCOVERY_DISABLED),
            !getBoolean(props, KEY_DISABLE_METRICS, false));
}
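The getLong helper here comes from Flink's PropertiesUtil: if the key is present it parses the value, otherwise it returns the supplied default, which in this case is the "disabled" sentinel. A simplified sketch of that behavior (assuming, as in FlinkKafkaConsumerBase, that the sentinel is Long.MIN_VALUE):

```java
import java.util.Properties;

public class PropertiesUtilSketch {

    // Sentinel used when no interval is configured (assumed to be Long.MIN_VALUE,
    // matching FlinkKafkaConsumerBase.PARTITION_DISCOVERY_DISABLED).
    public static final long PARTITION_DISCOVERY_DISABLED = Long.MIN_VALUE;

    // Simplified stand-in for PropertiesUtil.getLong: parse the property if
    // present, otherwise fall back to the default value.
    public static long getLong(Properties props, String key, long defaultValue) {
        String value = props.getProperty(key);
        return value == null ? defaultValue : Long.parseLong(value);
    }
}
```

So a job that never sets flink.partition-discovery.interval-millis ends up with the sentinel, and discovery is silently off.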
This configuration is then handled in the constructor of its parent class, FlinkKafkaConsumerBase:
public FlinkKafkaConsumerBase(
        List<String> topics,
        Pattern topicPattern,
        KafkaDeserializationSchema<T> deserializer,
        long discoveryIntervalMillis,
        boolean useMetrics) {
    this.topicsDescriptor = new KafkaTopicsDescriptor(topics, topicPattern);
    this.deserializer = checkNotNull(deserializer, "valueDeserializer");

    checkArgument(
            discoveryIntervalMillis == PARTITION_DISCOVERY_DISABLED
                    || discoveryIntervalMillis >= 0,
            "Cannot define a negative value for the topic / partition discovery interval.");
    this.discoveryIntervalMillis = discoveryIntervalMillis;

    this.useMetrics = useMetrics;
}
So where does this parameter actually take effect? Let's keep going. The FlinkKafkaConsumerBase class has the method
public void run(SourceContext<T> sourceContext) throws Exception
which is where Kafka data is actually read. Inside it is the following snippet:
// depending on whether we were restored with the current state version (1.3),
// remaining logic branches off into 2 paths:
// 1) New state - partition discovery loop executed as separate thread, with this
//    thread running the main fetcher loop
// 2) Old state - partition discovery is disabled and only the main fetcher loop is
//    executed
if (discoveryIntervalMillis == PARTITION_DISCOVERY_DISABLED) {
    kafkaFetcher.runFetchLoop();
} else {
    runWithPartitionDiscovery();
}
As you can see, if we configured a discovery interval (i.e., discoveryIntervalMillis is not the PARTITION_DISCOVERY_DISABLED sentinel), execution takes the runWithPartitionDiscovery() branch:
private void runWithPartitionDiscovery() throws Exception {
    final AtomicReference<Exception> discoveryLoopErrorRef = new AtomicReference<>();
    createAndStartDiscoveryLoop(discoveryLoopErrorRef);

    kafkaFetcher.runFetchLoop();

    // make sure that the partition discoverer is waked up so that
    // the discoveryLoopThread exits
    partitionDiscoverer.wakeup();
    joinDiscoveryLoopThread();

    // rethrow any fetcher errors
    final Exception discoveryLoopError = discoveryLoopErrorRef.get();
    if (discoveryLoopError != null) {
        throw new RuntimeException(discoveryLoopError);
    }
}
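The thread-coordination pattern here is worth spelling out: a background discovery thread runs periodically, errors are captured in an AtomicReference, and wakeup() lets the main thread break the discovery thread out of its sleep so it can be joined. A minimal self-contained sketch of this pattern (not Flink's actual classes; the "discovered partition" is just a counter standing in for a metadata query):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class DiscoveryLoopSketch {

    final AtomicBoolean running = new AtomicBoolean(true);
    final AtomicReference<Exception> discoveryLoopErrorRef = new AtomicReference<>();
    final List<Integer> discoveredPartitions = new CopyOnWriteArrayList<>();
    private Thread discoveryLoopThread;

    // Start a background thread that "discovers" a partition every intervalMillis,
    // loosely mirroring createAndStartDiscoveryLoop in FlinkKafkaConsumerBase.
    public void createAndStartDiscoveryLoop(long intervalMillis) {
        discoveryLoopThread = new Thread(() -> {
            try {
                int nextPartition = 0;
                while (running.get()) {
                    discoveredPartitions.add(nextPartition++); // stand-in for a metadata query
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException ignored) {
                // wakeup() interrupts the sleep so the loop can exit cleanly
            } catch (Exception e) {
                discoveryLoopErrorRef.set(e); // surfaced to the main thread later
            }
        }, "partition-discovery");
        discoveryLoopThread.start();
    }

    // Analogue of partitionDiscoverer.wakeup(): stop the loop and break the sleep.
    public void wakeup() {
        running.set(false);
        discoveryLoopThread.interrupt();
    }

    public void joinDiscoveryLoopThread() throws InterruptedException {
        discoveryLoopThread.join();
    }
}
```

The main thread would call createAndStartDiscoveryLoop, run its fetch loop, then wakeup() and joinDiscoveryLoopThread(), finally checking discoveryLoopErrorRef, which is exactly the shape of runWithPartitionDiscovery above.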
This method starts a separate partition-discovery thread alongside the main fetch loop; once the fetch loop returns, it wakes the partition discoverer up so that the discovery thread can exit, then rethrows any error the discovery loop recorded.
For the concrete discovery logic, look at the subclasses of the abstract class AbstractPartitionDiscoverer and pick the implementation that matches your Kafka version.
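Whatever the Kafka version, the core idea of a discoverer is the same: fetch the current partition list from the cluster and report only the partitions not seen before, so the fetcher can pick up newly added ones. A simplified sketch of that bookkeeping (partitions reduced to plain integers; not Flink's actual API):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PartitionDiscovererSketch {

    private final Set<Integer> discoveredPartitions = new HashSet<>();

    // Simplified take on what AbstractPartitionDiscoverer.discoverPartitions does:
    // given the full partition list from cluster metadata, return only the
    // partitions that were not seen on a previous discovery pass.
    public List<Integer> discoverPartitions(List<Integer> allPartitionsFromMetadata) {
        List<Integer> newPartitions = new ArrayList<>();
        for (Integer partition : allPartitionsFromMetadata) {
            if (discoveredPartitions.add(partition)) { // add() returns false for known partitions
                newPartitions.add(partition);
            }
        }
        return newPartitions;
    }
}
```

On the first pass everything is "new"; on later passes only genuinely added partitions come back, which is what keeps the running job from re-subscribing to partitions it already reads.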