Flink DataStream Join


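Joins combine two streams on a common key. Flink's DataStream API provides two of them: the window join (join) and the interval join (intervalJoin). The interval join supports event time only.
When a join has several input streams, its watermark is the minimum of the inputs' watermarks, so the slowest input holds back event-time progress.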
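To make the minimum rule concrete, here is a small plain-Java model. It is not Flink's implementation: the class CombinedWatermark and its methods are hypothetical names chosen for illustration. It remembers the latest watermark of each input and reports their minimum as the operator's watermark.

```java
import java.util.Arrays;

// Hypothetical helper, not a Flink API: models how a multi-input operator
// derives its own event-time watermark from the latest watermark of each input.
final class CombinedWatermark {
    private final long[] inputWatermarks;

    CombinedWatermark(int numInputs) {
        inputWatermarks = new long[numInputs];
        // Before any watermark arrives, an input has made no event-time progress.
        Arrays.fill(inputWatermarks, Long.MIN_VALUE);
    }

    /** Records a new watermark for one input and returns the operator's watermark. */
    long onWatermark(int input, long watermark) {
        // Watermarks are monotonic, so an input never moves backwards.
        inputWatermarks[input] = Math.max(inputWatermarks[input], watermark);
        return current();
    }

    /** The operator can only advance as far as its slowest input. */
    long current() {
        long min = Long.MAX_VALUE;
        for (long w : inputWatermarks) {
            min = Math.min(min, w);
        }
        return min;
    }

    public static void main(String[] args) {
        CombinedWatermark wm = new CombinedWatermark(2);
        System.out.println(wm.onWatermark(0, 100)); // input 1 has no watermark yet
        System.out.println(wm.onWatermark(1, 40));  // input 1 lags behind
        System.out.println(wm.onWatermark(1, 150)); // now input 0 is the slowest
    }
}
```

Running main prints Long.MIN_VALUE, then 40, then 100: the operator's watermark advances only once every input has advanced.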

Join

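A window join behaves like an inner join: within each window, every element of one stream is paired with every element of the other stream that has the same key, and elements with no match in the same window produce no output.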

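import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// leftStream and rightStream are DataStream<MyEvent> with timestamps and watermarks assigned
DataStream<String> joined = leftStream.join(rightStream)
        .where((event) -> event.getKey())  //leftStream
        .equalTo((event) -> event.getKey())   //rightStream
        .window(TumblingEventTimeWindows.of(Time.milliseconds(2)))
        .apply(new JoinFunction<MyEvent, MyEvent, String>() {
            @Override
            public String join(MyEvent first, MyEvent second) throws Exception {
                // called once for every matching (left, right) pair in the window
                return first.getValue() + "," + second.getValue();
            }
        });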
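The inner-join behaviour can be illustrated without a Flink cluster. The sketch below is a plain-Java model, not Flink's implementation: it assumes tumbling windows with offset 0 (a timestamp ts belongs to the window starting at ts - (ts mod size)), and the Event record (Java 16+) is a hypothetical stand-in for MyEvent.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model (not Flink code) of a tumbling event-time window join.
// Event is a hypothetical stand-in for MyEvent: key, value and event-time timestamp.
final class WindowJoinModel {
    record Event(String key, String value, long timestamp) {}

    /** Start of the tumbling window of the given size that contains ts (offset 0). */
    static long windowStart(long ts, long size) {
        return ts - Math.floorMod(ts, size);
    }

    /** Inner join: one result per (left, right) pair with equal key and equal window. */
    static List<String> join(List<Event> left, List<Event> right, long size) {
        List<String> out = new ArrayList<>();
        for (Event l : left) {
            for (Event r : right) {
                boolean sameKey = l.key().equals(r.key());
                boolean sameWindow =
                        windowStart(l.timestamp(), size) == windowStart(r.timestamp(), size);
                if (sameKey && sameWindow) {
                    out.add(l.value() + "," + r.value());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> left = List.of(
                new Event("a", "L1", 0), new Event("a", "L2", 1), new Event("b", "L3", 2));
        List<Event> right = List.of(
                new Event("a", "R1", 1), new Event("a", "R2", 2), new Event("b", "R3", 3));
        // window size 2 ms, as in TumblingEventTimeWindows.of(Time.milliseconds(2))
        System.out.println(join(left, right, 2));
    }
}
```

With a 2 ms window, main prints [L1,R1, L2,R1, L3,R3]. R2 (timestamp 2) has key a like L1 and L2, but it falls into the window [2, 4) while they fall into [0, 2), so it finds no partner and is dropped, as in an inner join. Flink itself does not guarantee this output order.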

Interval join

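Given a keyed left stream A and a keyed right stream B, each element a of A defines an interval relative to its own event time, bounded by lowerBound and upperBound. Every element b of B with the same key whose event time satisfies a.timestamp + lowerBound <= b.timestamp <= a.timestamp + upperBound is joined with a (both bounds are inclusive by default). In the code below, A is leftStream and B is rightStream, so between(Time.milliseconds(-2), Time.milliseconds(1)) joins a left element at time 10 with right elements at times 8 through 11.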

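import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

DataStream<String> joined = leftStream
    .keyBy((event) -> event.getKey()) //leftStream
    .intervalJoin(rightStream.keyBy((event) -> event.getKey()))   //rightStream
    // right timestamp must lie in [left timestamp - 2 ms, left timestamp + 1 ms]
    .between(Time.milliseconds(-2), Time.milliseconds(1))
    .process(new ProcessJoinFunction<MyEvent, MyEvent, String>() {
        @Override
        public void processElement(MyEvent left, MyEvent right, Context ctx, Collector<String> out) {
            // called once for each matching (left, right) pair
            out.collect(left.getValue() + "," + right.getValue());
        }
    });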
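The matching rule can be checked in isolation with the plain-Java model below. It is a sketch, not Flink's implementation (Flink buffers elements in per-key state and uses watermarks to expire them); IntervalJoinModel and its Event record (Java 16+) are hypothetical names, and both bounds are treated as inclusive, which is Flink's default.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model (not Flink code) of the interval-join predicate:
// left a and right b with the same key match when
// a.ts + lowerBound <= b.ts <= a.ts + upperBound (both bounds inclusive).
final class IntervalJoinModel {
    record Event(String key, String value, long timestamp) {}

    static boolean matches(Event a, Event b, long lowerBound, long upperBound) {
        return a.key().equals(b.key())
                && b.timestamp() >= a.timestamp() + lowerBound
                && b.timestamp() <= a.timestamp() + upperBound;
    }

    /** Emits one result per matching (left, right) pair. */
    static List<String> join(List<Event> left, List<Event> right, long lowerBound, long upperBound) {
        List<String> out = new ArrayList<>();
        for (Event a : left) {
            for (Event b : right) {
                if (matches(a, b, lowerBound, upperBound)) {
                    out.add(a.value() + "," + b.value());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> left = List.of(new Event("k", "L10", 10));
        List<Event> right = List.of(
                new Event("k", "R7", 7), new Event("k", "R8", 8),
                new Event("k", "R11", 11), new Event("k", "R12", 12),
                new Event("other", "X9", 9));
        // between(-2 ms, +1 ms): L10 matches same-key right elements at times 8..11
        System.out.println(join(left, right, -2, 1)); // [L10,R8, L10,R11]
    }
}
```

R7 and R12 fall just outside [8, 11], and X9 is inside the interval but has a different key, so only R8 and R11 are joined with L10.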