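Stream-splitting scenarios
In production we often need to split an input source according to some criterion, for example splitting user access logs by the visitor's geographic location. How should such a requirement be handled?
Generally, depending on the scenario, there are three ways to split a stream:
- Splitting with Filter
- Splitting with Split
- Splitting with SideOutput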
1. Splitting with Filter
Scala example
import org.apache.flink.streaming.api.scala._

/**
 * Ways to split a stream in Flink:
 * 1. Filter (the source stream is filtered once per branch, which costs performance)
 * 2. Split (a split stream cannot be split a second time)
 * 3. SideOutput (officially recommended)
 */
object filterStreamExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)
    // Sample input rows; the first field is the numeric ID used for splitting:
    // 1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
    // 2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
    val inputStream: DataStream[String] = env.readTextFile("/home/rjxy/zlp/Code/CodePro/GuoSai/Task01/src/main/resources/day.csv")
    // Each branch applies its own filter to the full source stream.
    val littleStream = inputStream.filter(_.split(",")(0).toInt < 500)
    val bigStream = inputStream.filter(_.split(",")(0).toInt >= 500)
    // Print both branches, each with its own prefix.
    littleStream.print("little------")
    bigStream.print("big------")
    env.execute()
  }
}
Output:
little------> 496,2012-05-10,2,1,5,0,4,1,1,0.505833,0.491783,0.552083,0.314063,1026,5546,6572
little------> 498,2012-05-12,2,1,5,0,6,0,1,0.564167,0.544817,0.480417,0.123133,2622,4807,7429
little------> 499,2012-05-13,2,1,5,0,0,0,1,0.6125,0.585238,0.57625,0.225117,2172,3946,6118
big------> 500,2012-05-14,2,1,5,0,1,1,2,0.573333,0.5499,0.789583,0.212692,342,2501,2843
big------> 501,2012-05-15,2,1,5,0,2,1,2,0.611667,0.576404,0.794583,0.147392,625,4490,5115
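Drawback of Filter:
To obtain each stream we need, the source stream has to be traversed once per branch, so every record is evaluated by every filter. This quietly wastes cluster resources.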
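To make that cost concrete, here is a minimal sketch that uses plain Scala collections rather than Flink. The object `FilterCostSketch`, its methods, and the generated rows are all hypothetical and only for illustration. It counts how often a row is parsed when two filters are applied, compared with routing each row in a single pass.

```scala
object FilterCostSketch {
  // Hypothetical stand-in for day.csv: 1000 rows whose first field is the ID.
  val rows: Seq[String] = (1 to 1000).map(id => s"$id,2011-01-01")

  // Two filters: returns (little count, big count, number of parse calls).
  def twoFilters(input: Seq[String]): (Int, Int, Int) = {
    var parsed = 0
    def idOf(row: String): Int = { parsed += 1; row.split(",")(0).toInt }
    val little = input.filter(row => idOf(row) < 500)
    val big = input.filter(row => idOf(row) >= 500)
    (little.size, big.size, parsed)
  }

  // One pass: each row is parsed once and routed to exactly one branch.
  def onePass(input: Seq[String]): (Int, Int, Int) = {
    var parsed = 0
    def idOf(row: String): Int = { parsed += 1; row.split(",")(0).toInt }
    val (little, big) = input.partition(row => idOf(row) < 500)
    (little.size, big.size, parsed)
  }

  def main(args: Array[String]): Unit = {
    println(twoFilters(rows)) // prints (499,501,2000)
    println(onePass(rows))    // prints (499,501,1000)
  }
}
```

Both variants produce the same two branches, but the filter-based one inspects every row twice. With more branches the work grows with the number of filters.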
2. Splitting with Split
Scala example
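import java.{lang, util}

import org.apache.flink.streaming.api.collector.selector.OutputSelector
import org.apache.flink.streaming.api.scala._

// Note: DataStream.split is deprecated and is only available in Flink versions before 1.12.
object splitStreamExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // env.setParallelism(1)
    // Sample input rows; the first field is the numeric ID used for splitting:
    // 1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
    // 2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
    val inputStream: DataStream[String] = env.readTextFile("/home/rjxy/zlp/Code/CodePro/GuoSai/Task01/src/main/resources/day.csv")
    // Tag each record with the name of the branch it belongs to.
    val splitStream: SplitStream[String] = inputStream.split(new OutputSelector[String] {
      override def select(out: String): lang.Iterable[String] = {
        val tags = new util.ArrayList[String]()
        // Every record is either below 500 or not, so a plain else is enough.
        if (out.split(",")(0).toInt < 500) {
          tags.add("littleStream")
        } else {
          tags.add("bigStream")
        }
        tags
      }
    })
    // Select each branch by its tag name and print it with its own prefix.
    splitStream.select("littleStream").print("little------")
    splitStream.select("bigStream").print("big------")
    env.execute()
  }
}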
Output:
little------:13> 36,2011-02-05,1,0,2,0,6,0,2,0.233333,0.243058,0.929167,0.161079,100,905,1005
little------:15> 137,2011-05-17,2,0,5,0,2,1,2,0.561667,0.538529,0.837917,0.277354,678,3445,4123
little------:13> 37,2011-02-06,1,0,2,0,0,0,1,0.285833,0.291671,0.568333,0.1418,354,1269,1623
big------:9> 592,2012-08-14,3,1,8,0,2,1,1,0.726667,0.676779,0.686667,0.169158,1128,5656,6784
little------:13> 38,2011-02-07,1,0,2,0,1,1,1,0.271667,0.303658,0.738333,0.0454083,120,1592,1712
big------:9> 593,2012-08-15,3,1,8,0,3,1,1,0.706667,0.654037,0.619583,0.169771,1198,6149,7347
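For comparison, the third approach named at the start, SideOutput, might look roughly like the following. This is a minimal sketch and not the author's implementation: the object name `sideOutputStreamExample`, the tag name, and the two inline records are assumptions made for illustration, and it presumes a Flink version that ships the Scala DataStream API.

```scala
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

object sideOutputStreamExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Hypothetical inline records standing in for day.csv.
    val inputStream: DataStream[String] =
      env.fromElements("36,2011-02-05", "592,2012-08-14")

    // Tag identifying the side output that receives records with ID >= 500.
    val bigTag = OutputTag[String]("bigStream")

    // Each record is inspected once: small IDs go to the main output,
    // all others are emitted to the side output.
    val littleStream: DataStream[String] = inputStream.process(
      new ProcessFunction[String, String] {
        override def processElement(
            value: String,
            ctx: ProcessFunction[String, String]#Context,
            out: Collector[String]): Unit = {
          if (value.split(",")(0).toInt < 500) out.collect(value)
          else ctx.output(bigTag, value)
        }
      })

    littleStream.print("little------")
    littleStream.getSideOutput(bigTag).print("big------")
    env.execute()
  }
}
```

Unlike Filter, every record passes through a single operator, and a stream obtained from `getSideOutput` is an ordinary `DataStream` that can be processed further.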
Continued in the next post.