A Detailed Look at Flink Windows


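Today a friend raised a very interesting question: are windows created in advance? Several friends said yes, which nearly made me doubt myself, so I figured it was worth writing down.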

Here is the discussion:

A:
First, windows are laid out by the system according to the configured window size, independently of the data. With a 3-second tumbling window, the system divides time into the intervals
[00:00:00,00:00:03) [00:00:03,00:00:06) [00:00:06,00:00:09) [00:00:09,00:00:12)
[00:00:12,00:00:15) [00:00:15,00:00:18) [00:00:18,00:00:21) [00:00:21,00:00:24)
[00:00:24,00:00:27) [00:00:27,00:00:30) [00:00:30,00:00:33) [00:00:33,00:00:36)
[00:00:36,00:00:39) [00:00:39,00:00:42) [00:00:42,00:00:45) [00:00:45,00:00:48)
[00:00:48,00:00:51) [00:00:51,00:00:54) [00:00:54,00:00:57) [00:00:57,00:01:00)
Now suppose the first record has event time 2018-10-01 10:11:22. It falls into the window 2018-10-01 [10:11:21,10:11:24). According to the trigger condition, a record with event time 2018-10-01 10:11:34 has to arrive before the [10:11:21,10:11:24) window is evaluated, and that evaluation does not include the 2018-10-01 10:11:34 record itself.
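The arithmetic in A's example can be checked with a small sketch. It is plain Java, not Flink code: `windowStart` follows the same alignment arithmetic as Flink's `TimeWindow.getWindowStartWithOffset`, and `fires` models the default event-time trigger condition (the watermark has reached the window end minus 1 ms). The 10-second watermark delay is my own assumption; the discussion never states it, but it is one setting under which the 10:11:34 figure works out. The class and method names are made up for illustration.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class WindowStartSketch {

    // Start of the tumbling window that contains timestampMs.
    public static long windowStart(long timestampMs, long offsetMs, long sizeMs) {
        return timestampMs - Math.floorMod(timestampMs - offsetMs, sizeMs);
    }

    // A window [start, end) is evaluated once the watermark reaches end - 1 ms.
    public static boolean fires(long watermarkMs, long windowEndMs) {
        return watermarkMs >= windowEndMs - 1;
    }

    private static long millis(int h, int m, int s) {
        return LocalDateTime.of(2018, 10, 1, h, m, s)
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
    }

    public static void main(String[] args) {
        long sizeMs = 3_000L;
        long delayMs = 10_000L; // ASSUMPTION: watermark = max event time - 10 s

        long start = windowStart(millis(10, 11, 22), 0L, sizeMs);
        long end = start + sizeMs;
        System.out.println("window start: "
                + LocalDateTime.ofEpochSecond(start / 1000, 0, ZoneOffset.UTC));
        System.out.println("window end:   "
                + LocalDateTime.ofEpochSecond(end / 1000, 0, ZoneOffset.UTC));

        // A record at 10:11:33 is not enough, one at 10:11:34 is.
        System.out.println("fires at 10:11:33? " + fires(millis(10, 11, 33) - delayMs, end));
        System.out.println("fires at 10:11:34? " + fires(millis(10, 11, 34) - delayMs, end));
    }
}
```

Note that nothing in this calculation needs a pre-built list of windows: the interval is derived from the record's timestamp alone.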

Me: Here's the question: are windows created in advance, or only once data arrives?

A:
They are created in advance. As soon as you specify the window size, the system carves up the intervals for you, from 1970-01-01 00:00:00 GMT up to your system time.

Me:
Say the current event time is 12 noon. Will the windows before 12:00 be created in advance?

A: Windows have nothing to do with the data.

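Me: If windows are created in advance, up to what point in time are they created? Indefinitely? Is one day's worth enough? Or are they created one by one based on the current time?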

So I went and checked.

First, there is a blog post, which I believe is by Yunxie (云邪) of Alibaba, that mentions a new window may be created.

But I had never verified this myself, so this time I went straight to the source code:

When the program starts, the following are called in order:

1.
	// ------------------------------------------------------------------------
	//  Windowing
	// ------------------------------------------------------------------------

	/**
	 * Windows this {@code KeyedStream} into tumbling time windows.
	 *
	 * <p>This is a shortcut for either {@code .window(TumblingEventTimeWindows.of(size))} or
	 * {@code .window(TumblingProcessingTimeWindows.of(size))} depending on the time characteristic
	 * set using
	 * {@link org.apache.flink.streaming.api.environment.StreamExecutionEnvironment#setStreamTimeCharacteristic(org.apache.flink.streaming.api.TimeCharacteristic)}
	 *
	 * @param size The size of the window.
	 */
	public WindowedStream<T, KEY, TimeWindow> timeWindow(Time size) {
		if (environment.getStreamTimeCharacteristic() == TimeCharacteristic.ProcessingTime) {
			return window(TumblingProcessingTimeWindows.of(size));
		} else {
			return window(TumblingEventTimeWindows.of(size));
		}
	}
	
2.
	/**
	 * Windows this data stream to a {@code WindowedStream}, which evaluates windows
	 * over a key grouped stream. Elements are put into windows by a {@link WindowAssigner}. The
	 * grouping of elements is done both by key and by window.
	 *
	 * <p>A {@link org.apache.flink.streaming.api.windowing.triggers.Trigger} can be defined to
	 * specify when windows are evaluated. However, {@code WindowAssigners} have a default
	 * {@code Trigger} that is used if a {@code Trigger} is not specified.
	 *
	 * @param assigner The {@code WindowAssigner} that assigns elements to windows.
	 * @return The trigger windows data stream.
	 */
	@PublicEvolving
	public <W extends Window> WindowedStream<T, KEY, W> window(WindowAssigner<? super T, W> assigner) {
		// Note: do not take new WindowedStream to mean a window has been created; this only creates the stream object, not a window
		return new WindowedStream<>(this, assigner);
	}

3.
	@PublicEvolving
	public WindowedStream(KeyedStream<T, K> input,
			WindowAssigner<? super T, W> windowAssigner) {
		this.input = input;
		this.windowAssigner = windowAssigner;
		this.trigger = windowAssigner.getDefaultTrigger(input.getExecutionEnvironment());
	}

From the above: when the program starts, it creates the WindowedStream object, but nothing here creates a window.

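So where are windows created? I debugged it.

I sent a single record into the job and stepped through the program in the debugger. The debug session shows: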

A window is created based on the incoming record and the timestamp it carries; the system does not create a large number of windows in advance.
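To illustrate the idea of per-record window creation, here is a toy model in plain Java. It is not Flink's actual `WindowOperator`; the class `LazyWindowSketch` and its methods are hypothetical names of my own, and the only thing it borrows from Flink is the notion that the window is computed from the record's timestamp at the moment the record arrives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LazyWindowSketch {

    private final long sizeMs;

    // window start -> buffered records; an entry exists only after a record landed in it
    private final Map<Long, List<String>> windows = new TreeMap<>();

    public LazyWindowSketch(long sizeMs) {
        this.sizeMs = sizeMs;
    }

    // Derives the window from the record's timestamp and creates it on first use.
    public long add(String record, long timestampMs) {
        long start = timestampMs - Math.floorMod(timestampMs, sizeMs);
        windows.computeIfAbsent(start, k -> new ArrayList<>()).add(record);
        return start;
    }

    public int windowCount() {
        return windows.size();
    }

    public static void main(String[] args) {
        LazyWindowSketch sketch = new LazyWindowSketch(3_000L);
        System.out.println("windows before any data: " + sketch.windowCount());

        sketch.add("a", 22_000L); // falls into [21000, 24000)
        sketch.add("b", 23_500L); // same window, nothing new is created
        sketch.add("c", 24_000L); // falls into [24000, 27000)
        System.out.println("windows after three records: " + sketch.windowCount());
    }
}
```

Before any record arrives the map is empty, and it only ever holds the windows that some record actually fell into.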

If anything here is wrong, please point it out.