process
Processing records directly after keyBy
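import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// `map` is the DataStream<Tuple2<String, String>> with event-time timestamps built in the next section
map.keyBy(0)
        .process(new KeyedProcessFunction<Tuple, Tuple2<String, String>, Object>() {
            @Override
            public void processElement(Tuple2<String, String> value, Context ctx, Collector<Object> out) throws Exception {
                // Timers take an absolute event time: fire 30 ms after this record's timestamp
                ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 30);
                System.out.println("processElement: " + value);
            }

            @Override
            public void onTimer(long timestamp, OnTimerContext ctx, Collector<Object> out) throws Exception {
                // Runs once the watermark reaches `timestamp`, for the key that registered the timer
                System.out.println("onTimer: " + timestamp);
            }
        });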
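As you can see, after keyBy we can not only handle every record, but also register an event-time timer from processElement. Timers are scoped to the current key, and a key holds at most one timer per timestamp, so registering the same timestamp for several records of one key results in a single onTimer call.
PS: keyBy returns a KeyedStream. A KeyedStream has a process method (taking a KeyedProcessFunction) that handles each record, but it has no apply method.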
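To make the timer behaviour concrete, here is a plain-Java model of it. It is not the Flink API: the class `KeyedTimerModel` and its methods are hypothetical names chosen for illustration. It assumes the semantics described in the Flink ProcessFunction documentation: timers are kept per key, duplicate (key, timestamp) registrations collapse into one timer, and an event-time timer fires once the watermark is greater than or equal to its timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical plain-Java model (NOT Flink code) of per-key event-time timers.
class KeyedTimerModel {
    // key -> registered timer timestamps; a set, so duplicates collapse into one timer
    private final Map<String, TreeSet<Long>> timers = new TreeMap<>();

    void registerEventTimeTimer(String key, long timestamp) {
        timers.computeIfAbsent(key, k -> new TreeSet<>()).add(timestamp);
    }

    int pendingTimers(String key) {
        TreeSet<Long> set = timers.get(key);
        return set == null ? 0 : set.size();
    }

    // Advances the watermark and returns "key@timestamp" for every timer whose
    // timestamp is <= the new watermark (keys in sorted order), removing them.
    List<String> advanceWatermark(long watermark) {
        List<String> fired = new ArrayList<>();
        for (Map.Entry<String, TreeSet<Long>> e : timers.entrySet()) {
            TreeSet<Long> set = e.getValue();
            while (!set.isEmpty() && set.first() <= watermark) {
                fired.add(e.getKey() + "@" + set.pollFirst());
            }
        }
        return fired;
    }
}
```

Registering a fixed timestamp such as 30 for every record therefore leaves one timer per key, and all of them fire as soon as the first real watermark (for example 1553503190000) arrives.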
Calling window after keyBy
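import javax.annotation.Nullable;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

// Flink < 1.12: event time must be enabled first, e.g. env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
DataStream<Tuple2<String, String>> map = env.socketTextStream("192.168.206.219", 9000)
        .assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks<String>() {
            long currentTimeStamp = 0;   // largest event time seen so far
            long maxDelayAllowed = 0;    // allowed out-of-orderness: 0 ms
            long currentWaterMark;

            @Override
            public long extractTimestamp(String element, long previousElementTimestamp) {
                // Input format: "key,eventTimeMillis", e.g. "hello,1553503190000"
                String[] split = element.split(",");
                long l = Long.parseLong(split[1]);
                currentTimeStamp = Math.max(l, currentTimeStamp);
                System.out.println("Key:" + split[0] + ",EventTime:" + l + ",Watermark:" + currentWaterMark);
                // Return the record's own timestamp; returning the running maximum
                // would push out-of-order records into later windows
                return l;
            }

            @Nullable
            @Override
            public Watermark getCurrentWatermark() {
                // Called periodically: the watermark trails the max event time by maxDelayAllowed
                currentWaterMark = currentTimeStamp - maxDelayAllowed;
                return new Watermark(currentWaterMark);
            }
        })
        .map(new MapFunction<String, Tuple2<String, String>>() {
            @Override
            public Tuple2<String, String> map(String s) throws Exception {
                String[] fields = s.split(",");
                return new Tuple2<>(fields[0], fields[1]);
            }
        });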
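import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

map.keyBy(0)
        .timeWindow(Time.seconds(5))
        .process(new ProcessWindowFunction<Tuple2<String, String>, Object, Tuple, TimeWindow>() {
            @Override
            public void process(Tuple tuple, Context context, Iterable<Tuple2<String, String>> elements, Collector<Object> out) throws Exception {
                // Called once per key and window; `elements` holds all of this key's records in the window
                System.out.println("process: key=" + tuple + ", window=" + context.window());
            }
        });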
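The watermark logic of the assigner above can be modelled in plain Java. This is a sketch, not Flink code: `BoundedDelayWatermark` is a hypothetical class name, and it assumes the same "key,timestamp" input format. Each record keeps its own event time, while the watermark trails the largest timestamp seen so far by `maxDelayAllowed` milliseconds.

```java
// Hypothetical plain-Java model (NOT Flink code) of a bounded-delay periodic watermark.
class BoundedDelayWatermark {
    private final long maxDelayAllowed;
    private long currentMaxTimestamp = Long.MIN_VALUE; // no record seen yet

    BoundedDelayWatermark(long maxDelayAllowed) {
        this.maxDelayAllowed = maxDelayAllowed;
    }

    // Parses "key,timestamp", tracks the maximum, and returns the record's own timestamp.
    long extractTimestamp(String element) {
        long ts = Long.parseLong(element.split(",")[1].trim());
        currentMaxTimestamp = Math.max(currentMaxTimestamp, ts);
        return ts;
    }

    // Watermark = max timestamp seen - allowed delay (Long.MIN_VALUE before any record).
    long currentWatermark() {
        return currentMaxTimestamp == Long.MIN_VALUE
                ? Long.MIN_VALUE
                : currentMaxTimestamp - maxDelayAllowed;
    }
}
```

With `maxDelayAllowed = 0`, as in the example, the watermark equals the largest event time received, so a window closes as soon as any record at or past its end arrives.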
Input:
The 5 input records contain the keys hello and world.

Based on the code, the window computation is triggered by the record (hello,1553503190000).
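Why that record triggers the window can be checked with a little arithmetic. The sketch below is plain Java, not Flink code; `TumblingWindowModel` is a hypothetical name. `windowStart` mirrors the formula in Flink's `TimeWindow.getWindowStartWithOffset` with offset 0, and `fires` mirrors the event-time trigger condition `watermark >= window.maxTimestamp()`, where `maxTimestamp() = end - 1`. Assuming the earlier records fall in the window [1553503185000, 1553503190000), the watermark of 1553503190000 produced by `maxDelayAllowed = 0` reaches 1553503189999 and the window fires; the triggering record itself belongs to the next window.

```java
// Hypothetical plain-Java model (NOT Flink code) of a tumbling event-time window.
class TumblingWindowModel {
    // Start of the window [start, start + size) containing `timestamp` (offset 0).
    static long windowStart(long timestamp, long size) {
        return timestamp - (timestamp + size) % size;
    }

    static long windowEnd(long timestamp, long size) {
        return windowStart(timestamp, size) + size;
    }

    // A window fires once the watermark reaches its last timestamp, end - 1.
    static boolean fires(long windowEnd, long watermark) {
        return watermark >= windowEnd - 1;
    }
}
```

For example, `windowStart(1553503187000L, 5000L)` is 1553503185000 and `windowEnd` is 1553503190000, while `windowStart(1553503190000L, 5000L)` is 1553503190000.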
The debug output is:

PS: after keyBy partitioning followed by window, every batch a window hands over contains records with the same key.
As the figure above shows:
The first call to process receives an Iterable with 3 records, all with the key hello.

The second call to process receives an Iterable with 2 records, all with the key world.
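The grouping that produces one Iterable per key can be sketched in plain Java. This is not Flink code, and the records in the usage example are hypothetical (the article's actual input is only shown in its screenshot). `KeyedWindowGrouping` buckets "key,timestamp" records by (key, window start) using the same tumbling-window formula, keeping arrival order inside each bucket; each bucket corresponds to one process/apply call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java model (NOT Flink code) of keyed tumbling-window contents.
class KeyedWindowGrouping {
    // Returns "key|windowStart" -> records of that key in that window, in arrival order.
    static Map<String, List<String>> group(List<String> records, long size) {
        Map<String, List<String>> buckets = new TreeMap<>();
        for (String r : records) {
            String[] parts = r.split(",");
            long ts = Long.parseLong(parts[1]);
            long start = ts - (ts + size) % size; // same formula as TimeWindow, offset 0
            buckets.computeIfAbsent(parts[0] + "|" + start, k -> new ArrayList<>()).add(r);
        }
        return buckets;
    }
}
```

With hypothetical records hello@…185000, world@…186000, hello@…187000, world@…188000, hello@…189000 and hello@…190000, the window starting at 1553503185000 yields a hello bucket of 3 records and a world bucket of 2, matching the shape of the two calls described above, while hello@…190000 opens a new window.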
PS: after keyBy and window, process takes a ProcessWindowFunction; a KeyedProcessFunction cannot be used there.
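Conclusions:
- After keyBy you can process records directly with processElement (KeyedProcessFunction) and register timers for them; records are handled one at a time in the order they arrive. apply cannot be called on a KeyedStream.
- Calling window after keyBy turns the stream into a WindowedStream, which exposes window functions instead of the KeyedStream methods, so a KeyedProcessFunction cannot be used there.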
apply
apply can only be called on a WindowedStream.
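import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

map.keyBy(0)
        .timeWindow(Time.seconds(5))
        .apply(new WindowFunction<Tuple2<String, String>, Object, Tuple, TimeWindow>() {
            @Override
            public void apply(Tuple tuple, TimeWindow window, Iterable<Tuple2<String, String>> input, Collector<Object> out) throws Exception {
                // Fires under the same condition as process(); gets the TimeWindow directly instead of a Context
                System.out.println("apply: key=" + tuple + ", window=" + window);
            }
        });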
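Summary:
After keyBy you can either handle records one by one (for example with map or process) and emit them directly, or use timeWindow to process a time range of records as a batch. A KeyedStream has no apply method.
timeWindow returns a WindowedStream. Its process and apply functions are only invoked once the watermark reaches the end of the window (watermark ≥ window end − 1 ms).
process and apply fire at the same moment and both receive all records of one key in the window as an Iterable; process additionally gets a Context (window metadata and state), while apply receives the TimeWindow directly.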
