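A window join connects multiple streams. Flink provides two variants: join (the window join) and interval join. Interval join only supports event time.
When several streams are joined, the join operator's watermark is the minimum of its input watermarks.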
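The minimum-watermark rule can be illustrated without Flink. The following is a minimal plain-Java sketch, not Flink's internal implementation; the class and method names (MinWatermarkSketch, combinedWatermark) are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;

// Plain-Java illustration (no Flink dependency); all names are hypothetical.
final class MinWatermarkSketch {
    // A multi-input operator can only advance its event-time clock to the
    // smallest watermark currently reported by any of its inputs.
    static long combinedWatermark(long... inputWatermarks) {
        return Arrays.stream(inputWatermarks).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        long left = 1_000L;  // latest watermark from the left input
        long right = 700L;   // latest watermark from the right input
        System.out.println(combinedWatermark(left, right)); // prints 700
    }
}
```

A practical consequence is that one slow or idle input holds back the join: windows cannot fire until the slower stream's watermark passes the window end.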
Join
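join is equivalent to an inner join: a result is emitted only for pairs of elements that share the same key and fall into the same window.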
leftStream.join(rightStream)
    .where((event) -> event.getKey())    // key selector for leftStream
    .equalTo((event) -> event.getKey())  // key selector for rightStream
    .window(TumblingEventTimeWindows.of(Time.milliseconds(2)))
    // The third type parameter must match the return type of join().
    .apply(new JoinFunction<MyEvent, MyEvent, String>() {
        @Override
        public String join(MyEvent first, MyEvent second) throws Exception {
            // Called once for every pair with the same key in the same window.
            return first.getValue() + "" + second.getValue();
        }
    });
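The inner-join behaviour per window can be sketched in plain Java. This is only a simulation of the semantics under simplifying assumptions (bounded input lists, non-negative timestamps), not how Flink executes the join; Event, windowStart and join are hypothetical names introduced for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of tumbling-window inner-join semantics.
// Event and the helper methods are hypothetical, not Flink classes.
final class WindowJoinSketch {
    static final class Event {
        final String key;
        final long timestamp;
        final String value;

        Event(String key, long timestamp, String value) {
            this.key = key;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Start of the tumbling window that contains the timestamp
    // (assumes non-negative timestamps and no window offset).
    static long windowStart(long timestamp, long size) {
        return timestamp - (timestamp % size);
    }

    // Emits one result per (left, right) pair with equal key and equal window.
    static List<String> join(List<Event> left, List<Event> right, long windowSize) {
        List<String> out = new ArrayList<>();
        for (Event l : left) {
            for (Event r : right) {
                boolean sameKey = l.key.equals(r.key);
                boolean sameWindow = windowStart(l.timestamp, windowSize)
                        == windowStart(r.timestamp, windowSize);
                if (sameKey && sameWindow) {
                    out.add(l.value + "," + r.value);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> left = Arrays.asList(
                new Event("k", 0, "L0"), new Event("k", 1, "L1"), new Event("k", 2, "L2"));
        List<Event> right = Arrays.asList(
                new Event("k", 0, "R0"), new Event("k", 3, "R3"), new Event("k", 5, "R5"));
        // Windows of size 2: [0,2), [2,4), [4,6)
        System.out.println(join(left, right, 2)); // prints [L0,R0, L1,R0, L2,R3]
    }
}
```

R5 sits alone in window [4,6), so it produces no output, which is exactly the inner-join behaviour: elements without a partner in their window are dropped.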
Interval join
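Given two keyed streams A and B, each element of B defines a time range relative to its own event time through a lowerBound and an upperBound. Every element of A with the same key whose event time lies within [B element's event time + lowerBound, B element's event time + upperBound] is joined with that B element. Both bounds are inclusive by default. In the snippet below, leftStream (the stream that calls intervalJoin) plays the role of B, and rightStream plays the role of A.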
leftStream
    .keyBy((event) -> event.getKey())                            // leftStream
    .intervalJoin(rightStream.keyBy((event) -> event.getKey()))  // rightStream
    // right.timestamp must lie in [left.timestamp - 2 ms, left.timestamp + 1 ms]
    .between(Time.milliseconds(-2), Time.milliseconds(1))
    .process(new ProcessJoinFunction<MyEvent, MyEvent, String>() {
        @Override
        public void processElement(MyEvent left, MyEvent right, Context ctx, Collector<String> out) {
            // Called once for every (left, right) pair that satisfies the interval.
            out.collect(left.getValue() + "," + right.getValue());
        }
    });
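The interval condition itself is simple arithmetic and can be checked in plain Java. The sketch below assumes all elements share one key and that both bounds are inclusive; IntervalJoinSketch, inInterval and join are hypothetical names for this illustration and are not part of Flink.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of the interval-join condition; names are hypothetical.
final class IntervalJoinSketch {
    // True when rightTs lies in [leftTs + lowerBound, leftTs + upperBound],
    // both ends inclusive.
    static boolean inInterval(long leftTs, long rightTs, long lowerBound, long upperBound) {
        return leftTs + lowerBound <= rightTs && rightTs <= leftTs + upperBound;
    }

    // Pairs every left timestamp with every right timestamp inside its interval.
    // All elements are assumed to share the same key.
    static List<String> join(long[] leftTs, long[] rightTs, long lowerBound, long upperBound) {
        List<String> out = new ArrayList<>();
        for (long l : leftTs) {
            for (long r : rightTs) {
                if (inInterval(l, r, lowerBound, upperBound)) {
                    out.add("(" + l + "," + r + ")");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] left = {2, 5};
        long[] right = {0, 1, 4, 6, 7};
        // Same bounds as between(Time.milliseconds(-2), Time.milliseconds(1)):
        // left 2 covers [0, 3], left 5 covers [3, 6].
        System.out.println(join(left, right, -2, 1)); // prints [(2,0), (2,1), (5,4), (5,6)]
    }
}
```

The right element at timestamp 7 falls outside both intervals and is never emitted, so the interval join is also an inner join.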