Learning Flink: process and apply


process

Processing directly after keyBy
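map.keyBy(0)
.process(new KeyedProcessFunction<Tuple, Tuple2<String, String>, Object>() {
	@Override
	public void processElement(Tuple2<String, String> value, Context ctx, Collector<Object> out) throws Exception {
		// Called once per element. Register an event-time timer 30 ms after this element's
		// timestamp (ctx.timestamp() is null unless timestamps were assigned upstream).
		ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 30);
		System.out.println("processElement: " + value);
	}

	@Override
	public void onTimer(long timestamp, OnTimerContext ctx, Collector<Object> out) throws Exception {
		// Called when the watermark reaches a registered timer, for the key that registered it
		System.out.println("onTimer: key=" + ctx.getCurrentKey() + ", timestamp=" + timestamp);
	}
});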

As shown, after keyBy we can not only process every element but also register a timer for each element.

PS: keyBy returns a KeyedStream. KeyedStream provides a process method for handling each element, but it has no apply method.
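To make the per-element and timer behaviour concrete, here is a plain-Java model with no Flink dependency; it is a sketch of the semantics, not Flink's implementation. processElement runs once per element in arrival order, and an event-time timer fires once the watermark reaches its timestamp. The 30 ms delay and the class and method names are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java model (no Flink dependency) of what a KeyedProcessFunction sees:
// processElement runs once per element, and event-time timers fire once the
// watermark reaches their timestamp.
public class KeyedTimerModel {
    // timer timestamp -> keys that registered it; the set deduplicates (key, timestamp) pairs
    private final TreeMap<Long, TreeSet<String>> timers = new TreeMap<>();
    private final List<String> log = new ArrayList<>();

    // Mirrors processElement: handle the element, then register a timer
    // 30 ms (event time) after its timestamp (hypothetical delay).
    public void processElement(String key, long timestamp) {
        log.add("element " + key + "@" + timestamp);
        timers.computeIfAbsent(timestamp + 30, t -> new TreeSet<>()).add(key);
    }

    // Mirrors onTimer: a new watermark fires every pending timer whose
    // timestamp is <= the watermark, earliest first.
    public void advanceWatermark(long watermark) {
        while (!timers.isEmpty() && timers.firstKey() <= watermark) {
            Map.Entry<Long, TreeSet<String>> due = timers.pollFirstEntry();
            for (String key : due.getValue()) {
                log.add("timer " + key + "@" + due.getKey());
            }
        }
    }

    public List<String> log() {
        return log;
    }

    public static void main(String[] args) {
        KeyedTimerModel model = new KeyedTimerModel();
        model.processElement("hello", 1000);
        model.processElement("world", 1010);
        model.advanceWatermark(1035); // fires hello@1030 only
        model.advanceWatermark(1040); // fires world@1040
        model.log().forEach(System.out::println);
    }
}
```

Keeping all timers in one map ordered by timestamp mirrors the fact that event-time timers fire in timestamp order as the watermark advances, and registering the same (key, timestamp) twice yields a single timer.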

Calling window after keyBy
DataStream<Tuple2<String, String>> map = env.socketTextStream("192.168.206.219", 9000)
		.assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks<String>() {

			long currentTimeStamp = 0;   // largest event time seen so far
			long maxDelayAllowed = 0;    // allowed out-of-orderness: 0 ms
			long currentWaterMark;

			@Override
			public long extractTimestamp(String element, long previousElementTimestamp) {
				// Input lines look like "key,eventTimeMillis"
				String[] split = element.split(",");
				long l = Long.parseLong(split[1]);
				currentTimeStamp = Math.max(l, currentTimeStamp);
				System.out.println("Key:" + split[0] + ",EventTime:" + l + ",Watermark:" + currentWaterMark);
				// Return the element's own event time; the running maximum is only used for the watermark
				return l;
			}

			@Nullable
			@Override
			public Watermark getCurrentWatermark() {
				// Called periodically: watermark = max event time seen - allowed delay
				currentWaterMark = currentTimeStamp - maxDelayAllowed;
				return new Watermark(currentWaterMark);
			}
		})
		.map(new MapFunction<String, Tuple2<String, String>>() {
			@Override
			public Tuple2<String, String> map(String s) throws Exception {
				String[] split = s.split(",");
				return Tuple2.of(split[0], split[1]);
			}
		});
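map.keyBy(0)
.timeWindow(Time.seconds(5))
.process(new ProcessWindowFunction<Tuple2<String, String>, Object, Tuple, TimeWindow>() {
	@Override
	public void process(Tuple tuple, Context context, Iterable<Tuple2<String, String>> elements, Collector<Object> out) throws Exception {
		// Called once per key and window after the window fires; elements holds that key's batch
		int count = 0;
		for (Tuple2<String, String> e : elements) {
			count++;
		}
		System.out.println("process: key=" + tuple + ", window=" + context.window() + ", count=" + count);
	}
});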
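The watermark assigner above follows a common pattern: remember the largest event time seen and emit watermark = max - maxDelayAllowed. The plain-Java sketch below (no Flink dependency; class name hypothetical) isolates that logic. It assumes each element should keep its own event time, with only the watermark derived from the running maximum; Flink also ships a ready-made BoundedOutOfOrdernessTimestampExtractor for this pattern.

```java
// Plain-Java sketch of the periodic watermark logic (no Flink dependency).
// Each element keeps its OWN event time; only the watermark uses the running maximum.
public class BoundedDelayWatermarks {
    private final long maxDelayAllowed;              // allowed out-of-orderness in ms
    private long currentMaxTimestamp = Long.MIN_VALUE;

    public BoundedDelayWatermarks(long maxDelayAllowed) {
        this.maxDelayAllowed = maxDelayAllowed;
    }

    // Mirrors extractTimestamp: track the max, but return the element's own timestamp.
    public long extractTimestamp(long elementTimestamp) {
        currentMaxTimestamp = Math.max(currentMaxTimestamp, elementTimestamp);
        return elementTimestamp;
    }

    // Mirrors getCurrentWatermark: max timestamp seen so far minus the allowed delay.
    public long currentWatermark() {
        if (currentMaxTimestamp == Long.MIN_VALUE) {
            return Long.MIN_VALUE; // nothing seen yet
        }
        return currentMaxTimestamp - maxDelayAllowed;
    }

    public static void main(String[] args) {
        BoundedDelayWatermarks w = new BoundedDelayWatermarks(0);
        System.out.println(w.extractTimestamp(1553503187000L)); // 1553503187000
        System.out.println(w.extractTimestamp(1553503186000L)); // out of order, keeps 1553503186000
        System.out.println(w.currentWatermark());               // 1553503187000
    }
}
```

Returning the running maximum from extractTimestamp instead would silently move a late element into the newest window; returning its own timestamp lets the window assigner place it correctly (or treat it as late).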

Input: five records whose keys are the strings hello and world.

With this code, the window computation was triggered when the record (hello,1553503190000) arrived.
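The trigger point can be checked by hand. The sketch below re-implements, in plain Java, the tumbling-window start formula (it gives the same result as Flink's TimeWindow.getWindowStartWithOffset for non-negative timestamps) and the default event-time trigger condition. It assumes maxDelayAllowed = 0 as in the code above and that a periodic watermark is emitted after the record arrives; the class name is hypothetical.

```java
// Plain-Java re-derivation of tumbling event-time window bounds (no Flink dependency).
public class TumblingWindowMath {
    // Start of the aligned window containing timestamp (offset 0 for timeWindow).
    public static long windowStart(long timestamp, long offset, long size) {
        return timestamp - (timestamp - offset + size) % size;
    }

    // A window [start, end) fires under the default event-time trigger once
    // watermark >= end - 1 (the window's maxTimestamp).
    public static boolean fires(long windowEnd, long watermark) {
        return watermark >= windowEnd - 1;
    }

    public static void main(String[] args) {
        long size = 5_000L;                // timeWindow(Time.seconds(5))
        long ts = 1553503190000L;          // the (hello,1553503190000) record
        long start = windowStart(ts, 0, size);
        System.out.println("record window: [" + start + ", " + (start + size) + ")");
        // The previous window ends exactly where this record's window starts.
        long prevEnd = start;
        // With maxDelayAllowed = 0, the watermark becomes ts once it is emitted.
        System.out.println("previous window fires: " + fires(prevEnd, ts));
    }
}
```

Since 1553503190000 is a multiple of 5000, this record opens the window [1553503190000, 1553503195000) and pushes the watermark to the end of [1553503185000, 1553503190000), which is why that earlier window fires at this point.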

Debugging gives the following result:

PS: because keyBy partitions the stream, every batch that a window produces contains elements with the same key.

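As the screenshot shows, the first time process is invoked, its elements argument is an Iterable holding 3 records, all with the key hello.

The second time process is invoked, its elements argument is an Iterable holding 2 records, all with the key world.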
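The per-key batching can be modeled without Flink. The sketch below buckets (key, timestamp) records by key and by 5-second window start, the way keyBy followed by timeWindow partitions them; each bucket corresponds to one process (or apply) call. The timestamps are hypothetical, chosen only to reproduce the 3 hello / 2 world split described above, and the LinkedHashMap order is insertion order, not necessarily the order in which Flink fires the windows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model (no Flink dependency) of keyBy + tumbling window: elements are
// bucketed by (key, window start); each bucket is what one process/apply call receives.
public class KeyedWindowModel {
    public static Map<String, List<Long>> batches(String[] keys, long[] timestamps, long size) {
        Map<String, List<Long>> out = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            long start = timestamps[i] - (timestamps[i] % size); // aligned window start, offset 0
            String bucket = keys[i] + "@" + start;               // one key in one window
            out.computeIfAbsent(bucket, b -> new ArrayList<>()).add(timestamps[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input: 3 "hello" and 2 "world" records in the same 5 s window.
        String[] keys = {"hello", "world", "hello", "world", "hello"};
        long[] ts = {1553503185000L, 1553503186000L, 1553503187000L, 1553503188000L, 1553503189000L};
        batches(keys, ts, 5_000L).forEach((bucket, elems) ->
                System.out.println(bucket + " -> " + elems.size() + " elements"));
    }
}
```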

PS: once a keyed stream has been windowed, its process method takes a ProcessWindowFunction; a KeyedProcessFunction cannot be used there.

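Conclusions:

  • After keyBy you can call process with a KeyedProcessFunction directly. Its processElement method is invoked once per element, in the order the elements arrive, and can register a timer for each element. apply cannot be called here.
  • Calling window after keyBy turns the stream into a WindowedStream, whose methods are separate from those of KeyedStream (the data is still partitioned by key). A KeyedProcessFunction cannot be used on it.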

apply

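apply is called on a windowed stream: a WindowedStream here (an AllWindowedStream for non-keyed windows). KeyedStream itself has no apply method.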

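map.keyBy(0)
.timeWindow(Time.seconds(5))
.apply(new WindowFunction<Tuple2<String, String>, Object, Tuple, TimeWindow>() {
	@Override
	public void apply(Tuple tuple, TimeWindow window, Iterable<Tuple2<String, String>> input, Collector<Object> out) throws Exception {
		// Same batch as in process(): one key, one window; the window is passed directly instead of via a Context
		int count = 0;
		for (Tuple2<String, String> e : input) {
			count++;
		}
		System.out.println("apply: key=" + tuple + ", window=" + window + ", count=" + count);
	}
});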

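Summary:

  • After keyBy you can transform and emit elements one by one (e.g. with map or process), or call timeWindow to process the data in batches per time range. apply cannot be called directly on a KeyedStream.
  • timeWindow returns a WindowedStream. Its process and apply methods compute over a whole window and are invoked only once the watermark reaches the end of the window (for event-time windows with the default trigger: watermark >= window end - 1).
  • process and apply compute in the same way: each call receives one batch of elements that share the same key. The main difference is that ProcessWindowFunction receives a Context (window, current watermark, processing time, per-window and global state), while WindowFunction receives only the TimeWindow.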