Flink State
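Flink is a stateful stream-processing engine, so it keeps intermediate results (state) while it computes. By default that state lives in the TaskManager's heap memory. If a task dies, all state belonging to that task is wiped: the data is lost and the result can no longer be guaranteed to be correct. Getting a correct result back would mean recomputing all the data from scratch, which is very inefficient. To guarantee at-least-once or exactly-once semantics, state has to be persisted to a safer storage medium. Flink offers on-heap memory, off-heap memory, HDFS, RocksDB, and other storage options.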
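To make the failure scenario concrete, here is a minimal plain-Java sketch. It is not Flink's checkpointing mechanism: `CheckpointSketch` and all of its methods are hypothetical names made up for illustration, and the `snapshot` map only stands in for a durable store such as HDFS.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of "persist state, then recover"; not Flink API.
class CheckpointSketch {
    private Map<String, Long> state = new HashMap<>();    // in-memory (heap) state
    private Map<String, Long> snapshot = new HashMap<>(); // stand-in for durable storage

    // Add a value to the running sum kept for this key.
    void process(String key, long value) {
        state.merge(key, value, Long::sum);
    }

    // Copy the current state to the "durable" store.
    void checkpoint() {
        snapshot = new HashMap<>(state);
    }

    // A task failure wipes everything held in memory.
    void crash() {
        state = new HashMap<>();
    }

    // Recovery starts from the last snapshot instead of from nothing.
    void restore() {
        state = new HashMap<>(snapshot);
    }

    long get(String key) {
        return state.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        CheckpointSketch task = new CheckpointSketch();
        task.process("1", 100);
        task.process("1", 101);
        task.checkpoint();
        task.process("1", 100);
        task.crash();
        System.out.println(task.get("1")); // 0: state is gone
        task.restore();
        System.out.println(task.get("1")); // 201: back to the snapshot
        task.process("1", 100);            // replay only the record after the snapshot
        System.out.println(task.get("1")); // 301
    }
}
```

With the snapshot in place, only the records that arrived after it need to be replayed, not the whole stream.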
Let's first look at which kinds of state Flink provides.
Flink state comes in two types:
- Keyed State
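State on a KeyedStream. It is bound to a specific key: every key on the KeyedStream has its own State. An operator can run several threads, but records with the same key are always handled by the same thread, so a given keyed state lives in exactly one thread, and one thread holds many keyed states.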
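- Non-Keyed State (Operator State)
Operator State has nothing to do with keys. It is bound to the operator, and each operator instance has a single State. For example, Flink's Kafka Connector uses Operator State: every connector instance stores the (partition, offset) mapping for all partitions of the topic that the instance consumes.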
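The difference in scope can be sketched in plain Java. This is only a model, not Flink API: `StateScopeSketch` and its methods are hypothetical, one object plays the role of one operator instance, and the two maps stand in for keyed state and operator state respectively.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the two state scopes; not Flink API.
class StateScopeSketch {
    // Keyed state: one independent value per key (here: max speed per carId).
    private final Map<String, Long> maxSpeedPerCar = new HashMap<>();

    // Operator state: one structure for the whole operator instance, regardless of key,
    // similar in spirit to the (partition, offset) map of a Kafka source instance.
    private final Map<Integer, Long> offsetPerPartition = new HashMap<>();

    // Touches only the state slot that belongs to carId and returns its current maximum.
    long updateMaxSpeed(String carId, long speed) {
        return maxSpeedPerCar.merge(carId, speed, Long::max);
    }

    // Overwrites the last seen offset of a partition.
    void recordOffset(int partition, long offset) {
        offsetPerPartition.put(partition, offset);
    }

    Map<Integer, Long> offsets() {
        return offsetPerPartition;
    }

    public static void main(String[] args) {
        StateScopeSketch operator = new StateScopeSketch();
        System.out.println(operator.updateMaxSpeed("1", 200)); // 200
        System.out.println(operator.updateMaxSpeed("2", 101)); // 101, car 1 is unaffected
        System.out.println(operator.updateMaxSpeed("1", 102)); // 200, still the max for car 1
        operator.recordOffset(0, 42L);
        operator.recordOffset(1, 7L);
        System.out.println(operator.offsets().size());          // 2
    }
}
```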
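- For Keyed State, Flink provides the following data structures for holding state:
  - ValueState<T>: a single value of type T bound to the current key. This is the simplest state: update it with update() and read it with value().
  - ListState<T>: the state for a key is a list. Append values with add(); get() returns an **Iterable** for traversing them.
  - ReducingState<T>: every call to add() invokes the user-supplied ReduceFunction, so everything is merged into a single state value.
  - MapState<UK, UV>: the state is a map. Add entries with put() or putAll(), look up a value with get(key), and iterate with entries(), keys(), and values().
  - AggregatingState<IN, OUT>: keeps a single value that represents the aggregate of everything added to the state. Unlike ReducingState, the aggregate type may differ from the type of the elements added. Elements added with add(IN) are aggregated by the user-specified AggregateFunction.
  - FoldingState<T, ACC>: deprecated; use AggregatingState instead. It keeps a single value that represents the aggregate of everything added to the state. Unlike ReducingState, the aggregate type may differ from the type of the elements added. Elements added with add(T) are folded into the aggregate by the user-specified FoldFunction.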
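MapState is the one structure in the list above that has no demo later in this post, so here is a rough plain-Java model of its per-key scope. It is not Flink API: `MapStateSketch`, `countSpeed`, and `entries` are hypothetical names. The outer map plays the role of the stream key, and the inner map plays the role of one key's MapState<Long, Integer>.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of a map-valued state that is scoped to each key; not Flink API.
class MapStateSketch {
    // outer key = carId (the stream key), inner map = speed -> how often it was seen
    private final Map<String, Map<Long, Integer>> state = new HashMap<>();

    // Increments the counter for this speed inside the map that belongs to carId.
    int countSpeed(String carId, long speed) {
        Map<Long, Integer> perCar = state.computeIfAbsent(carId, k -> new HashMap<>());
        return perCar.merge(speed, 1, Integer::sum);
    }

    // Returns the whole map for one car, or an empty map if the car is unknown.
    Map<Long, Integer> entries(String carId) {
        return state.getOrDefault(carId, new HashMap<>());
    }

    public static void main(String[] args) {
        MapStateSketch sketch = new MapStateSketch();
        System.out.println(sketch.countSpeed("2", 201)); // 1
        System.out.println(sketch.countSpeed("2", 201)); // 2
        System.out.println(sketch.countSpeed("3", 201)); // 1, car 3 has its own map
        System.out.println(sketch.entries("2"));         // {201=2}
    }
}
```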
-
ValueState
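ValueState<T>: a single value of type T bound to the current key. This is the simplest state: update it with update() and read it with value().

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

// CarInfo is not shown in this post; it is assumed to be a Lombok @Data/@Builder
// POJO with a String carId and a Long speed.
public class ValueStateDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<String> stream = environment.socketTextStream("192.168.88.180", 8888);
        stream.flatMap(new RichFlatMapFunction<String, CarInfo>() {
            @Override
            public void flatMap(String s, Collector<CarInfo> collector) throws Exception {
                try {
                    // Each input line has the form "carId,speed"
                    String[] sArr = s.split(",");
                    collector.collect(CarInfo.builder().carId(sArr[0]).speed(Long.parseLong(sArr[1])).build());
                } catch (Exception ex) {
                    // Malformed lines are logged and skipped
                    System.out.println(ex);
                }
            }
        }).keyBy(CarInfo::getCarId).map(new RichMapFunction<CarInfo, CarInfo>() {
            // Highest speed seen so far for the current key (carId)
            private ValueState<Long> valueState;

            @Override
            public void open(Configuration parameters) {
                ValueStateDescriptor<Long> state =
                        new ValueStateDescriptor<Long>("state", BasicTypeInfo.LONG_TYPE_INFO);
                valueState = getRuntimeContext().getState(state);
            }

            @Override
            public CarInfo map(CarInfo carInfo) throws Exception {
                // value() is null until the first update for this key
                if (valueState.value() == null || carInfo.getSpeed() > valueState.value()) {
                    valueState.update(carInfo.getSpeed());
                }
                carInfo.setSpeed(valueState.value());
                return carInfo;
            }
        }).print(">>>>>>>>");
        environment.execute("execute");
    }
}
```

```
### Console on 192.168.88.180
nc -lk 8888
### Input
1,200
2,101
3,103
1,102
2,201
3,303
#### Output
>>>>>>>>:2> CarInfo(carId=1, speed=200)
>>>>>>>>:1> CarInfo(carId=2, speed=201)
>>>>>>>>:2> CarInfo(carId=3, speed=303)
>>>>>>>>:2> CarInfo(carId=1, speed=200)
>>>>>>>>:1> CarInfo(carId=2, speed=201)
>>>>>>>>:2> CarInfo(carId=3, speed=303)
## Each car reports only its maximum speed
```
-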
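ReducingState
Every call to add() invokes the user-supplied ReduceFunction, so everything is merged into a single state value. It is an aggregation that works much like the reduce function.

```java
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

// CarInfo is not shown in this post; it is assumed to be a Lombok @Data/@Builder
// POJO with a String carId and a Long speed.
public class ReducingStateDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<String> stream = environment.socketTextStream("192.168.88.180", 8888);
        stream.flatMap(new RichFlatMapFunction<String, CarInfo>() {
            @Override
            public void flatMap(String s, Collector<CarInfo> collector) throws Exception {
                try {
                    // Each input line has the form "carId,speed"
                    String[] sArr = s.split(",");
                    collector.collect(CarInfo.builder().carId(sArr[0]).speed(Long.parseLong(sArr[1])).build());
                } catch (Exception ex) {
                    // Malformed lines are logged and skipped
                    System.out.println(ex);
                }
            }
        }).keyBy(CarInfo::getCarId).map(new RichMapFunction<CarInfo, CarInfo>() {
            // Running sum of speeds for the current key (carId)
            private ReducingState<Long> reducingState;

            @Override
            public void open(Configuration parameters) {
                ReducingStateDescriptor<Long> state = new ReducingStateDescriptor<Long>(
                        "state",
                        new ReduceFunction<Long>() {
                            @Override
                            public Long reduce(Long t1, Long t2) {
                                return t1 + t2;
                            }
                        },
                        BasicTypeInfo.LONG_TYPE_INFO);
                reducingState = getRuntimeContext().getReducingState(state);
            }

            @Override
            public CarInfo map(CarInfo carInfo) throws Exception {
                // add() runs the ReduceFunction against the stored value
                reducingState.add(carInfo.getSpeed());
                carInfo.setSpeed(reducingState.get());
                return carInfo;
            }
        }).print(">>>>>>>>");
        environment.execute("execute");
    }
}
```

```
### Console on 192.168.88.180
nc -lk 8888
### Input
1,100
2,101
3,103
1,101
2,201
3,303
1,100
2,101
3,103
1,101
2,201
3,303
#### Output
>>>>>>>>:2> CarInfo(carId=1, speed=100)
>>>>>>>>:1> CarInfo(carId=2, speed=201)
>>>>>>>>:2> CarInfo(carId=3, speed=103)
>>>>>>>>:2> CarInfo(carId=1, speed=201)
>>>>>>>>:1> CarInfo(carId=2, speed=302)
>>>>>>>>:2> CarInfo(carId=3, speed=406)
>>>>>>>>:1> CarInfo(carId=2, speed=503)
>>>>>>>>:2> CarInfo(carId=3, speed=509)
>>>>>>>>:2> CarInfo(carId=1, speed=301)
>>>>>>>>:2> CarInfo(carId=3, speed=812)
>>>>>>>>:2> CarInfo(carId=1, speed=402)
>>>>>>>>:1> CarInfo(carId=2, speed=604)
## An aggregation similar to reduce
```
-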
ListState
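The state for a key is a list. Append values with add(); get() returns an **Iterable** for traversing them.

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

// CarInfo is not shown in this post; it is assumed to be a Lombok @Data/@Builder
// POJO with a String carId and a Long speed.
public class ListStateDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<String> stream = environment.socketTextStream("192.168.88.180", 8888);
        stream.flatMap(new RichFlatMapFunction<String, CarInfo>() {
            @Override
            public void flatMap(String s, Collector<CarInfo> collector) throws Exception {
                try {
                    // Each input line has the form "carId,speed"
                    String[] sArr = s.split(",");
                    collector.collect(CarInfo.builder().carId(sArr[0]).speed(Long.parseLong(sArr[1])).build());
                } catch (Exception ex) {
                    // Malformed lines are logged and skipped
                    System.out.println(ex);
                }
            }
        }).keyBy(CarInfo::getCarId).map(new RichMapFunction<CarInfo, String>() {
            // Every speed seen so far for the current key (carId)
            private ListState<Long> listState;

            @Override
            public void open(Configuration parameters) {
                ListStateDescriptor<Long> state =
                        new ListStateDescriptor<Long>("state", BasicTypeInfo.LONG_TYPE_INFO);
                listState = getRuntimeContext().getListState(state);
            }

            @Override
            public String map(CarInfo carInfo) throws Exception {
                listState.add(carInfo.getSpeed());
                return String.format("car id: %s, speed history: %s", carInfo.getCarId(), listState.get().toString());
            }
        }).print(">>>>>>>>");
        environment.execute("execute");
    }
}
```

```
### Console on 192.168.88.180
nc -lk 8888
### Input
1,100
2,201
3,301
1,102
2,201
3,303
1,101
2,201
3,301
1,102
2,201
3,302
#### Output
>>>>>>>>:2> car id: 1, speed history: [100]
>>>>>>>>:2> car id: 3, speed history: [301]
>>>>>>>>:2> car id: 1, speed history: [100, 101]
>>>>>>>>:1> car id: 2, speed history: [201]
>>>>>>>>:1> car id: 2, speed history: [201, 201]
>>>>>>>>:2> car id: 3, speed history: [301, 303]
>>>>>>>>:2> car id: 1, speed history: [100, 101, 102]
>>>>>>>>:1> car id: 2, speed history: [201, 201, 201]
>>>>>>>>:2> car id: 3, speed history: [301, 303, 301]
>>>>>>>>:1> car id: 2, speed history: [201, 201, 201, 201]
>>>>>>>>:2> car id: 1, speed history: [100, 101, 102, 102]
>>>>>>>>:2> car id: 3, speed history: [301, 303, 301, 302]
## Every speed reading is recorded
```
-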
AggregatingState
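Keeps a single value that represents the aggregate of everything added to the state. Unlike ReducingState, the aggregate type may differ from the type of the elements added. Elements added with add(IN) are aggregated by the user-specified AggregateFunction.

```java
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.AggregatingState;
import org.apache.flink.api.common.state.AggregatingStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

// CarInfo is not shown in this post; it is assumed to be a Lombok @Data/@Builder
// POJO with a String carId and a Long speed.
public class AggregatingStateDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<String> stream = environment.socketTextStream("192.168.88.180", 8888);
        stream.flatMap(new RichFlatMapFunction<String, CarInfo>() {
            @Override
            public void flatMap(String s, Collector<CarInfo> collector) throws Exception {
                try {
                    // Each input line has the form "carId,speed"
                    String[] sArr = s.split(",");
                    collector.collect(CarInfo.builder().carId(sArr[0]).speed(Long.parseLong(sArr[1])).build());
                } catch (Exception ex) {
                    // Malformed lines are logged and skipped
                    System.out.println(ex);
                }
            }
        }).keyBy(CarInfo::getCarId).map(new RichMapFunction<CarInfo, CarInfo>() {
            // Running sum of speeds for the current key (carId)
            private AggregatingState<Long, Long> aggregatingState;

            @Override
            public void open(Configuration parameters) {
                AggregatingStateDescriptor<Long, Long, Long> state = new AggregatingStateDescriptor<Long, Long, Long>(
                        "state",
                        new AggregateFunction<Long, Long, Long>() {
                            // Initial accumulator value
                            @Override
                            public Long createAccumulator() {
                                return 0L;
                            }

                            // Add one value to the accumulator
                            @Override
                            public Long add(Long v, Long acc) {
                                return v + acc;
                            }

                            // Return the final result
                            @Override
                            public Long getResult(Long result) {
                                return result;
                            }

                            // Merge two accumulators
                            @Override
                            public Long merge(Long acc1, Long acc2) {
                                return acc1 + acc2;
                            }
                        },
                        BasicTypeInfo.LONG_TYPE_INFO);
                aggregatingState = getRuntimeContext().getAggregatingState(state);
            }

            @Override
            public CarInfo map(CarInfo carInfo) throws Exception {
                aggregatingState.add(carInfo.getSpeed());
                carInfo.setSpeed(aggregatingState.get());
                return carInfo;
            }
        }).print(">>>>>>>>");
        environment.execute("execute");
    }
}
```

```
### Console on 192.168.88.180
nc -lk 8888
### Input
1,100
2,201
3,301
1,102
2,201
3,303
1,101
2,201
3,301
1,102
2,201
3,302
#### Output
>>>>>>>>:2> CarInfo(carId=1, speed=100)
>>>>>>>>:2> CarInfo(carId=1, speed=202)
>>>>>>>>:1> CarInfo(carId=2, speed=201)
>>>>>>>>:2> CarInfo(carId=3, speed=302)
>>>>>>>>:1> CarInfo(carId=2, speed=402)
>>>>>>>>:2> CarInfo(carId=3, speed=603)
>>>>>>>>:1> CarInfo(carId=2, speed=603)
>>>>>>>>:1> CarInfo(carId=2, speed=804)
>>>>>>>>:2> CarInfo(carId=3, speed=906)
>>>>>>>>:2> CarInfo(carId=1, speed=304)
>>>>>>>>:2> CarInfo(carId=3, speed=1207)
>>>>>>>>:2> CarInfo(carId=1, speed=405)
## id 1 -> 100+102+101+102=405
## id 2 -> 201+201+201+201=804
## id 3 -> 301+303+301+302=1207
```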
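The demo above uses Long for the input, the accumulator, and the output, so it does not show the point that the aggregate type may differ from the element type. Here is a small plain-Java sketch of that case: an average, where the input is a Long, the accumulator is a (sum, count) pair, and the output is a Double. The `Agg` interface is a local stand-in that only mirrors the four method names used in the demo; it is not Flink's class, and `AverageAggregateSketch` and `averageOf` are hypothetical names.

```java
// Plain-Java model of an aggregate whose IN, ACC and OUT types all differ; not Flink API.
class AverageAggregateSketch {
    // Local stand-in mirroring the four methods implemented in the demo above.
    interface Agg<IN, ACC, OUT> {
        ACC createAccumulator();
        ACC add(IN value, ACC acc);
        OUT getResult(ACC acc);
        ACC merge(ACC a, ACC b);
    }

    // IN = Long (one speed), ACC = long[]{sum, count}, OUT = Double (average speed)
    static final Agg<Long, long[], Double> AVERAGE = new Agg<Long, long[], Double>() {
        @Override
        public long[] createAccumulator() {
            return new long[] {0L, 0L};
        }

        @Override
        public long[] add(Long value, long[] acc) {
            return new long[] {acc[0] + value, acc[1] + 1};
        }

        @Override
        public Double getResult(long[] acc) {
            // An empty accumulator has no meaningful average; report 0.0 here
            return acc[1] == 0 ? 0.0 : (double) acc[0] / acc[1];
        }

        @Override
        public long[] merge(long[] a, long[] b) {
            return new long[] {a[0] + b[0], a[1] + b[1]};
        }
    };

    // Feeds every speed through add() and returns the final result.
    static double averageOf(long... speeds) {
        long[] acc = AVERAGE.createAccumulator();
        for (long speed : speeds) {
            acc = AVERAGE.add(speed, acc);
        }
        return AVERAGE.getResult(acc);
    }

    public static void main(String[] args) {
        // The four speeds entered for car 1 in the demo input
        System.out.println(averageOf(100, 102, 101, 102)); // 101.25
    }
}
```

Keeping the sum and the count in the accumulator is what lets two partial results be merged correctly; averaging two averages directly would not.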