Flink: Implementing Time- and Count-Based Window Computation with a Custom Trigger

In Flink, a window operation has to be paired with the logic that processes the data in the window, namely the window function. The Trigger decides when that window function is executed.

API overview

// Called for every element that is added to a window
public abstract TriggerResult onElement(T element, long timestamp, W window, Trigger.TriggerContext ctx) throws Exception;
// Called when a processing-time timer fires
public abstract TriggerResult onProcessingTime(long time, W window, Trigger.TriggerContext ctx) throws Exception;
// Called when an event-time timer fires
public abstract TriggerResult onEventTime(long time, W window, Trigger.TriggerContext ctx) throws Exception;
// For stateful triggers: merges the state of two triggers when their windows are merged
public void onMerge(W window, Trigger.OnMergeContext ctx) throws Exception {
    throw new UnsupportedOperationException("This trigger does not support merging.");
}
// Called when the window is removed; clears any state the trigger holds for it
public abstract void clear(W window, Trigger.TriggerContext ctx) throws Exception;

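1). The first three methods decide how to react to an incoming element or timer by returning a TriggerResult:

CONTINUE: do nothing.

FIRE: trigger the computation.

PURGE: clear the elements in the window.

FIRE_AND_PURGE: trigger the computation, then clear the elements in the window.

Purging removes only the window's contents; any meta-information about the window and any trigger state are kept.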

2). Any of the first three methods can register processing-time or event-time timers for future actions.

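3). Flink ships with several built-in triggers:

  • EventTimeTrigger: fires based on event time and the watermark mechanism.
  • ProcessingTimeTrigger: fires based on processing time.
  • CountTrigger: fires once the number of elements in the window reaches the given limit.
  • PurgingTrigger: takes another trigger as its argument and turns it into a purging trigger.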
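
To make the four results and the purging wrapper more concrete, here is a minimal plain-Java sketch. It does not use Flink: the `Result` enum, `purging` and `apply` are hypothetical stand-ins written for illustration, modeled on the fire/purge meaning described above.

```java
import java.util.ArrayList;
import java.util.List;

class TriggerResultSketch {
    // Simplified stand-in for Flink's TriggerResult (an assumption, not the real class).
    enum Result {
        CONTINUE(false, false),
        FIRE(true, false),
        PURGE(false, true),
        FIRE_AND_PURGE(true, true);

        private final boolean fire;
        private final boolean purge;

        Result(boolean fire, boolean purge) {
            this.fire = fire;
            this.purge = purge;
        }

        boolean isFire() { return fire; }
        boolean isPurge() { return purge; }
    }

    // What a purging wrapper does to the result of the trigger it wraps:
    // a firing result becomes FIRE_AND_PURGE, everything else passes through.
    static Result purging(Result inner) {
        return inner.isFire() ? Result.FIRE_AND_PURGE : inner;
    }

    // Applies a result to a window buffer and returns the emitted elements
    // (an empty list when nothing fired). Purging empties the buffer.
    static List<Integer> apply(Result result, List<Integer> buffer) {
        List<Integer> emitted = result.isFire() ? new ArrayList<>(buffer) : new ArrayList<>();
        if (result.isPurge()) {
            buffer.clear();
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Integer> buffer = new ArrayList<>(List.of(1, 2, 3));
        // FIRE emits the contents and keeps them for a later firing.
        System.out.println(apply(Result.FIRE, buffer) + " kept=" + buffer);
        // The purging wrapper emits the contents and then drops them.
        System.out.println(apply(purging(Result.FIRE), buffer) + " kept=" + buffer);
    }
}
```

In a real job the corresponding combination would be written as `PurgingTrigger.of(CountTrigger.of(3))`; the sketch only mimics what that wrapper does to the result.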

Custom trigger

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeutils.base.LongSerializer;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// A custom trigger must extend the Trigger class
public static class CustomCountTriggerWithEventTime<T> extends Trigger<T, TimeWindow> {

    private static final long serialVersionUID = 6021946857731563476L;
    private static final Logger LOG = LoggerFactory.getLogger(CustomCountTriggerWithEventTime.class);

    private final long maxCount;

    private final ReducingStateDescriptor<Long> countStateDescriptor;

    // Constructor: sets the maximum count and creates the descriptor for the counter state
    public CustomCountTriggerWithEventTime(long maxCount) {
        this.maxCount = maxCount;
        countStateDescriptor = new ReducingStateDescriptor<>("countState", new ReduceSum(), LongSerializer.INSTANCE);
    }

    // Helper: reset the trigger state (counter and event-time timer), then fire and purge
    private TriggerResult fireAndPurge(long timestamp, TimeWindow window, TriggerContext ctx) throws Exception {
        clear(window, ctx);

        return TriggerResult.FIRE_AND_PURGE;
    }

    // Called for every element that enters the window
    @Override
    public TriggerResult onElement(T element, long timestamp, TimeWindow window, TriggerContext ctx) throws Exception {
        // Register the end-of-window timer if the watermark has not passed the window yet
        if (window.maxTimestamp() > ctx.getCurrentWatermark()) {
            ctx.registerEventTimeTimer(window.maxTimestamp());
        }

        ReducingState<Long> countState = ctx.getPartitionedState(countStateDescriptor);

        // A new element arrived, so increase the count by 1
        countState.add(1L);
        // Decide whether to trigger the window computation
        if (countState.get() >= maxCount) {
            //LOG.info("Count Trigger triggered on count exceed. count {}", countState.get());
            return fireAndPurge(timestamp, window, ctx);
        }

        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) throws Exception {
        // Fire at the end of the window. This trigger registers no processing-time
        // timers, so this method is not expected to be called.
        if (time >= window.maxTimestamp()) {
            return fireAndPurge(time, window, ctx);
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) throws Exception {
        //LOG.info("Count Trigger triggered on time reached time {} window end {} window max {}",  time, window.getEnd(), window.maxTimestamp());
        // The end-of-window timer fired: emit whatever is left in the window
        if (time == window.maxTimestamp()) {
            return fireAndPurge(time, window, ctx);
        } else {
            return TriggerResult.CONTINUE;
        }
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) throws Exception {
        ctx.deleteEventTimeTimer(window.maxTimestamp());
        ReducingState<Long> countState = ctx.getPartitionedState(countStateDescriptor);
        countState.clear();
    }

    @Override
    public boolean canMerge() {
        return true;
    }

    @Override
    public void onMerge(TimeWindow window,
                        OnMergeContext ctx) throws Exception {
        // Carry the element counts of the merged windows over to the new window
        ctx.mergePartitionedState(countStateDescriptor);
        // only register a timer if the watermark is not yet past the end of the merged window
        // this is in line with the logic in onElement(). If the watermark is past the end of
        // the window onElement() will fire and setting a timer here would fire the window twice.
        long windowMaxTimestamp = window.maxTimestamp();
        if (windowMaxTimestamp > ctx.getCurrentWatermark()) {
            ctx.registerEventTimeTimer(windowMaxTimestamp);
        }
    }

    /**
     * Reduce function used by the counter state: sums the values.
     * Static so that it does not hold a reference to the enclosing trigger.
     */
    private static class ReduceSum implements ReduceFunction<Long> {
        @Override
        public Long reduce(Long value1, Long value2) throws Exception {
            return value1 + value2;
        }
    }

}
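
The counter in the trigger relies on how a reducing state behaves: every `add` folds the new value into the stored one with the reduce function, and `clear` empties it again. The following plain-Java sketch imitates that behavior without Flink. `SketchReducingState` and `firingPositions` are hypothetical names introduced here, and the in-memory state is an assumption that ignores keys, windows and checkpointing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

class ReducingCounterSketch {
    // Hypothetical in-memory stand-in for a reducing state holding a single Long.
    static final class SketchReducingState {
        private final BinaryOperator<Long> reduce;
        private Long value; // null means the state is empty

        SketchReducingState(BinaryOperator<Long> reduce) {
            this.reduce = reduce;
        }

        // Folds the new value into the stored one using the reduce function.
        void add(Long v) {
            value = (value == null) ? v : reduce.apply(value, v);
        }

        Long get() { return value; }

        void clear() { value = null; }
    }

    // Returns the 1-based positions of the elements at which the count reaches
    // maxCount. The counter is cleared after each firing, as in the trigger above.
    static List<Integer> firingPositions(int elements, long maxCount) {
        SketchReducingState count = new SketchReducingState(Long::sum);
        List<Integer> fired = new ArrayList<>();
        for (int i = 1; i <= elements; i++) {
            count.add(1L);
            if (count.get() >= maxCount) {
                fired.add(i);
                count.clear();
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        // With maxCount = 3, seven elements fire at the 3rd and the 6th element.
        System.out.println(firingPositions(7, 3)); // [3, 6]
    }
}
```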

Runtime environment

Context inside the main method:

        // 0. environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);
        env.setParallelism(1);

        // 1. source: lines of the form "sensorId,count,datetime" read from a socket
        DataStream<String> lines = env.socketTextStream("127.0.0.1", 9999);
        lines.print();

        // 2. transformation: parse each line and assign event-time timestamps and watermarks
        SingleOutputStreamOperator<CartInfo> sourceDS = lines.map(new MapFunction<String, CartInfo>() {
            @Override
            public CartInfo map(String value) throws Exception {
                String[] arr = value.split(",");
                return new CartInfo(arr[0], Integer.parseInt(arr[1]), Long.parseLong(arr[2]));
            }
        }).assignTimestampsAndWatermarks(
                // Allow 1 second of out-of-orderness; datetime is in seconds, Flink expects milliseconds
                WatermarkStrategy.<CartInfo>forBoundedOutOfOrderness(Duration.ofSeconds(1))
                        .withTimestampAssigner((cartInfo, l) -> cartInfo.getDatetime() * 1000));

        KeyedStream<CartInfo, String> keyedDS = sourceDS.keyBy(CartInfo::getSensorId);
        // Requirement: every 5 seconds, sum the count per sensorId over the last 5 seconds
        // (a time-based tumbling window), firing early once 3 elements have arrived
        SingleOutputStreamOperator<CartInfo> result = keyedDS
                .window(TumblingEventTimeWindows.of(Time.seconds(5)))
                .trigger(new CustomCountTriggerWithEventTime<>(3))
                //.sum("count")
                .reduce(new ReduceFunction<CartInfo>() {
                    @Override
                    public CartInfo reduce(CartInfo cartInfo, CartInfo t1) throws Exception {
                        // Sum the counts and keep the datetime of the most recent element
                        return new CartInfo(cartInfo.getSensorId(), cartInfo.getCount() + t1.getCount(), t1.getDatetime());
                    }
                });

        // 3. sink
        result.print("test");

        // 4. execute
        env.execute();

Entity class:

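// Lombok annotations generate the getters, setters and constructors used by the job
@Data
@AllArgsConstructor
@NoArgsConstructor
public static class CartInfo {
    private String sensorId;
    private Integer count;
    // Event time in epoch seconds (the job multiplies it by 1000 to get milliseconds)
    private Long datetime;
}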
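
To see which window an element falls into and when that window's event-time timer can fire, it helps to work through the arithmetic. The sketch below is plain Java with hypothetical helper names. It assumes tumbling windows with no offset, non-negative timestamps, and a bounded-out-of-orderness watermark computed as the largest timestamp seen so far minus the allowed delay minus 1 ms.

```java
class WindowMathSketch {
    // Start (ms) of the tumbling window of the given size that contains the timestamp.
    // Assumes a zero offset and a non-negative timestamp.
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    // Largest timestamp (ms) that still belongs to that window: end - 1.
    static long maxTimestamp(long timestampMs, long sizeMs) {
        return windowStart(timestampMs, sizeMs) + sizeMs - 1;
    }

    // Watermark for bounded out-of-orderness, given the largest timestamp seen so far.
    static long watermark(long maxSeenMs, long outOfOrdernessMs) {
        return maxSeenMs - outOfOrdernessMs - 1;
    }

    public static void main(String[] args) {
        long size = 5_000L;   // 5-second tumbling windows
        long delay = 1_000L;  // 1 second of allowed out-of-orderness

        long ts = 1634007204L * 1000; // datetime 1634007204 in milliseconds
        System.out.println(windowStart(ts, size));   // 1634007200000
        System.out.println(maxTimestamp(ts, size));  // 1634007204999

        // An element at 1634007205 is not enough to close that window ...
        System.out.println(watermark(1634007205L * 1000, delay)); // 1634007203999
        // ... but one at 1634007206 moves the watermark onto the window's max timestamp.
        System.out.println(watermark(1634007206L * 1000, delay)); // 1634007204999
    }
}
```

With a 1-second delay, the window covering 1634007200 to 1634007204 can therefore only fire on time once an element with datetime 1634007206 or later has been seen.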

Test data and result analysis

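The results show that:

  1. Within a window, once the number of elements reaches the configured maximum (3 here), a count-based computation is triggered.
  2. When the watermark reaches the end of the window, the window is triggered again for the elements that remain, and the window is closed.
  3. Elements that took part in a count-triggered computation are purged and do not take part in later computations.

Input lines (sensorId,count,datetime) and the resulting output (lines prefixed with test>):
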
1,1,1634007200
1,1,1634007202
1,1,1634007204
test> WindowDemo.CartInfo(sensorId=1, count=3, datetime=1634007204)
1,1,1634007204
1,1,1634007205
1,1,1634007206
test> WindowDemo.CartInfo(sensorId=1, count=1, datetime=1634007204)
1,1,1634007211
test> WindowDemo.CartInfo(sensorId=1, count=2, datetime=1634007206)
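
The three output lines can be reproduced with a small plain-Java simulation of the count-or-time trigger. This is a sketch under simplifying assumptions, not Flink itself: a single key, the watermark advancing immediately after every element (Flink emits watermarks periodically), late elements dropped, and hypothetical names such as `CountOrTimeTriggerSimulation` and `WindowState`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CountOrTimeTriggerSimulation {
    static final long WINDOW_MS = 5_000L;           // tumbling window size
    static final long OUT_OF_ORDERNESS_MS = 1_000L; // allowed out-of-orderness
    static final long MAX_COUNT = 3L;               // count threshold of the trigger

    // Per-window state: reduced sum, datetime of the latest element, element counter.
    static final class WindowState {
        long sum;
        long lastDatetime;
        long elements;
    }

    static String format(WindowState w) {
        return "count=" + w.sum + ", datetime=" + w.lastDatetime;
    }

    // Each input row is {count, datetime in epoch seconds}; returns the emitted results.
    static List<String> run(long[][] input) {
        List<String> out = new ArrayList<>();
        TreeMap<Long, WindowState> windows = new TreeMap<>(); // keyed by window start (ms)
        long maxSeenMs = Long.MIN_VALUE;
        long watermark = Long.MIN_VALUE;

        for (long[] row : input) {
            long count = row[0];
            long datetime = row[1];
            long tsMs = datetime * 1000L;
            long start = tsMs - (tsMs % WINDOW_MS);

            // Elements whose window the watermark has already passed are dropped (assumption).
            if (start + WINDOW_MS - 1 > watermark) {
                WindowState w = windows.computeIfAbsent(start, k -> new WindowState());
                w.sum += count;
                w.lastDatetime = datetime;
                w.elements++;
                // Count condition: fire and purge the window contents and the counter.
                if (w.elements >= MAX_COUNT) {
                    out.add(format(w));
                    windows.remove(start);
                }
            }

            // Advance the watermark, then fire every window whose max timestamp it has reached.
            maxSeenMs = Math.max(maxSeenMs, tsMs);
            watermark = maxSeenMs - OUT_OF_ORDERNESS_MS - 1;
            Iterator<Map.Entry<Long, WindowState>> it = windows.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<Long, WindowState> entry = it.next();
                if (entry.getKey() + WINDOW_MS - 1 > watermark) {
                    break; // later windows are not complete yet
                }
                out.add(format(entry.getValue()));
                it.remove();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[][] input = {
            {1, 1634007200L}, {1, 1634007202L}, {1, 1634007204L},
            {1, 1634007204L}, {1, 1634007205L}, {1, 1634007206L},
            {1, 1634007211L}
        };
        run(input).forEach(System.out::println);
    }
}
```

For the seven input lines above, the simulation prints `count=3, datetime=1634007204`, `count=1, datetime=1634007204` and `count=2, datetime=1634007206`, in that order, which matches the three `test>` lines. The first result comes from the count condition; the other two come from the watermark reaching the end of the first and second window.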