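Today a friend raised a very interesting question: are windows created in advance? Several friends said yes, which almost made me doubt myself, so I think it is worth writing down.
Here is the discussion: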
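A:
First, windows are laid out by the system according to the configured window size, independent of the data. If the window is a 3s tumbling window, the system divides time into the following window ranges:
[00:00:00,00:00:03)
[00:00:03,00:00:06)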
[00:00:06,00:00:09)
[00:00:09,00:00:12)
[00:00:12,00:00:15)
[00:00:15,00:00:18)
[00:00:18,00:00:21)
[00:00:21,00:00:24)
[00:00:24,00:00:27)
[00:00:27,00:00:30)
[00:00:30,00:00:33)
[00:00:33,00:00:36)
[00:00:36,00:00:39)
[00:00:39,00:00:42)
[00:00:42,00:00:45)
[00:00:45,00:00:48)
[00:00:48,00:00:51)
[00:00:51,00:00:54)
[00:00:54,00:00:57)
[00:00:57,00:01:00)
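Suppose the first record has event time 2018-10-01 10:11:22. It falls into the window 2018-10-01 [10:11:21,10:11:24). According to the trigger condition, a record with event time 2018-10-01 10:11:34 has to arrive before the [10:11:21,10:11:24) window is triggered, and the computation does not include that 10:11:34 record.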
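Me: So here is the question: are windows created in advance, or only when data arrives?
A:
In advance. Once you specify the window size, the system lays the windows out for you,
dividing the range from 1970-01-01 00:00:00 GMT up to your system time.
Me:
Say the current event time is 12 noon. Are the windows before 12:00 created in advance?
A: Windows have nothing to do with the data.
Me: If windows are created in advance, up to what point in time? Indefinitely, one day's worth, or one by one as time advances?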
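As a side check on A's trigger example, the arithmetic can be sketched in plain Java without any Flink dependency. This is only a sketch under assumptions: the discussion never states the watermark delay, so the 10-second value below is my guess at what makes the 10:11:34 figure work, and the class and method names are made up for illustration. It relies on the event-time rule that a window fires once the watermark reaches the window's last timestamp (end minus 1 ms), with the watermark taken as the largest event time seen minus the delay.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the event-time firing arithmetic for a 3-second tumbling window.
class TriggerArithmeticSketch {

    static final long WINDOW_SIZE_MS = Duration.ofSeconds(3).toMillis();
    // Assumption: 10 seconds of allowed out-of-orderness; not stated in the discussion.
    static final long ASSUMED_DELAY_MS = Duration.ofSeconds(10).toMillis();

    // Start of the tumbling window containing the timestamp (offset 0, non-negative timestamps).
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_SIZE_MS);
    }

    // Last timestamp that still belongs to the window [start, start + size).
    static long maxTimestamp(long windowStartMs) {
        return windowStartMs + WINDOW_SIZE_MS - 1;
    }

    // Watermark derived from the largest event time seen so far.
    static long watermark(long maxEventTimeMs) {
        return maxEventTimeMs - ASSUMED_DELAY_MS;
    }

    // The window fires once the watermark reaches its last timestamp.
    static boolean fires(long windowStartMs, long maxEventTimeMs) {
        return watermark(maxEventTimeMs) >= maxTimestamp(windowStartMs);
    }

    public static void main(String[] args) {
        long first = Instant.parse("2018-10-01T10:11:22Z").toEpochMilli();
        long later = Instant.parse("2018-10-01T10:11:34Z").toEpochMilli();
        long start = windowStart(first);
        System.out.println("window start: " + Instant.ofEpochMilli(start));
        System.out.println("fires after 10:11:34 event: " + fires(start, later));
        System.out.println("10:11:34 event in same window: " + (windowStart(later) == start));
    }
}
```

Under these assumptions the 10:11:22 record lands in [10:11:21,10:11:24), the 10:11:34 record is the one that triggers it, and that record itself belongs to a later window.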
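I then set out to verify this.
First, there is a blog post, which I believe is by Yunxie (云邪) of Alibaba, that mentions that a new window may be created, as follows:
I had never verified this myself, though, so this time I went straight to the source code:
When the program starts, the following methods are called in order: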
1.
    // ------------------------------------------------------------------------
    //  Windowing
    // ------------------------------------------------------------------------

    /**
     * Windows this {@code KeyedStream} into tumbling time windows.
     *
     * <p>This is a shortcut for either {@code .window(TumblingEventTimeWindows.of(size))} or
     * {@code .window(TumblingProcessingTimeWindows.of(size))} depending on the time characteristic
     * set using
     * {@link org.apache.flink.streaming.api.environment.StreamExecutionEnvironment#setStreamTimeCharacteristic(org.apache.flink.streaming.api.TimeCharacteristic)}
     *
     * @param size The size of the window.
     */
    public WindowedStream<T, KEY, TimeWindow> timeWindow(Time size) {
        if (environment.getStreamTimeCharacteristic() == TimeCharacteristic.ProcessingTime) {
            return window(TumblingProcessingTimeWindows.of(size));
        } else {
            return window(TumblingEventTimeWindows.of(size));
        }
    }
2.
    /**
     * Windows this data stream to a {@code WindowedStream}, which evaluates windows
     * over a key grouped stream. Elements are put into windows by a {@link WindowAssigner}. The
     * grouping of elements is done both by key and by window.
     *
     * <p>A {@link org.apache.flink.streaming.api.windowing.triggers.Trigger} can be defined to
     * specify when windows are evaluated. However, {@code WindowAssigners} have a default
     * {@code Trigger} that is used if a {@code Trigger} is not specified.
     *
     * @param assigner The {@code WindowAssigner} that assigns elements to windows.
     * @return The trigger windows data stream.
     */
    @PublicEvolving
    public <W extends Window> WindowedStream<T, KEY, W> window(WindowAssigner<? super T, W> assigner) {
        // Note: seeing "new WindowedStream" here does not mean a window has been created.
        // This only creates the stream object, not a window.
        return new WindowedStream<>(this, assigner);
    }
3.
    @PublicEvolving
    public WindowedStream(KeyedStream<T, K> input,
            WindowAssigner<? super T, W> windowAssigner) {
        this.input = input;
        this.windowAssigner = windowAssigner;
        this.trigger = windowAssigner.getDefaultTrigger(input.getExecutionEnvironment());
    }
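From the above: when the program starts, a WindowedStream object is created, but nothing here creates a window.
So where are windows created? Debugging shows the following.
When I send in one record:
the debugger shows:
From the screenshot above:
The creation of a window depends on the incoming record and the timestamp it carries; the system does not create a batch of windows in advance.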
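To make that conclusion concrete, here is a toy model in plain Java. It is not Flink code: the class name, the map used as per-window state, and the method names are all invented for illustration. The only idea borrowed from Flink is that a tumbling window's start is computed from the record's timestamp (as far as I can tell, this is the job of `TumblingEventTimeWindows.assignWindows` together with `TimeWindow.getWindowStartWithOffset`), so nothing has to exist before a record arrives.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model: window state is created only when a record is assigned to it.
class LazyWindowSketch {

    private final long sizeMs;
    // Window start -> record count; a hypothetical stand-in for per-window state.
    private final Map<Long, Integer> windows = new TreeMap<>();

    LazyWindowSketch(long sizeMs) {
        this.sizeMs = sizeMs;
    }

    // Window start for a non-negative timestamp with offset 0.
    long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    // Called once per record: computes the window and creates its entry on first use.
    long add(long timestampMs) {
        long start = windowStart(timestampMs);
        windows.merge(start, 1, Integer::sum);
        return start;
    }

    // Number of windows that currently have state.
    int windowCount() {
        return windows.size();
    }

    public static void main(String[] args) {
        LazyWindowSketch sketch = new LazyWindowSketch(3_000L);
        System.out.println("windows before any record: " + sketch.windowCount());
        sketch.add(1_000L); // falls into [0, 3000)
        sketch.add(2_000L); // same window, no new entry
        sketch.add(7_000L); // falls into [6000, 9000)
        System.out.println("windows after three records: " + sketch.windowCount());
    }
}
```

Before any record arrives the map is empty, and the window [3000, 6000) never gets an entry because no record falls into it. That matches what the debugger showed: the window boundaries are fixed by arithmetic, but a window only comes into being for timestamps that actually show up.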
If anything here is wrong, please point it out.