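Functional Data Processing: An Introduction to Stream
What Is a Stream
- A simple analogy: a river
- Stream: a Java API (introduced in Java 8) for processing collections of data declaratively
- Location: package java.util.stream
- A Stream describes a computation; it is not a data structure and stores no elements
- It behaves like a higher-level Iterator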
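To make the "declarative" and "higher-level Iterator" points concrete, here is a minimal sketch that computes the same result twice: once with an explicit loop and once with a stream. The class name, method names, and sample data are illustrative assumptions, not part of the notes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class DeclarativeVsIterator {
    // Imperative version: the caller drives the iteration with an explicit loop.
    static List<String> shortNamesWithLoop(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.length() <= 5) {
                result.add(name.toUpperCase());
            }
        }
        return result;
    }

    // Declarative version: the stream states what to compute; iteration is internal.
    static List<String> shortNamesWithStream(List<String> names) {
        return names.stream()
                .filter(name -> name.length() <= 5)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> fruits = Arrays.asList("apple", "banana", "cherry", "coco");
        System.out.println(shortNamesWithLoop(fruits));   // [APPLE, COCO]
        System.out.println(shortNamesWithStream(fruits)); // [APPLE, COCO]
    }
}
```

Both methods return the same list; the stream version simply leaves the looping to the library.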
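The Three Parts of a Stream Pipeline
- Building a pipeline takes three things
- Source: a data source (such as a collection)
- Intermediate operations: a chain of intermediate operations that forms the pipeline
- Terminal operation: executes the pipeline and produces a result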
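The three parts can be labeled in a short sketch. It also illustrates a consequence of "the terminal operation executes the pipeline": intermediate operations are lazy, so nothing is examined until the terminal operation runs. The class name and the `visited` list are hypothetical helpers added only to observe this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PipelineParts {
    // Records which elements the filter has actually examined (observation aid only).
    static final List<String> visited = new ArrayList<>();

    static Stream<String> buildPipeline(List<String> source) {
        return source.stream()                 // 1. source
                .filter(s -> {                 // 2. intermediate operation (lazy)
                    visited.add(s);
                    return s.length() > 5;
                })
                .map(String::toUpperCase);     // 2. another intermediate operation
    }

    public static void main(String[] args) {
        List<String> fruits = Arrays.asList("apple", "banana", "cherry");
        Stream<String> pipeline = buildPipeline(fruits);
        System.out.println("Visited before terminal operation: " + visited.size()); // 0

        List<String> result = pipeline.collect(Collectors.toList()); // 3. terminal operation
        System.out.println("Visited after terminal operation: " + visited.size());  // 3
        System.out.println(result); // [BANANA, CHERRY]
    }
}
```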
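Why Stream Is Needed
- Readable: combined with lambda expressions, the code states what to compute rather than how to loop, which tends to make it shorter and easier to read
- Composable pipelines: operations chain into a pipeline that reads much like a database query
- Parallelizable: a parallel stream can take advantage of multi-core processors, splitting the work into tasks with the fork/join approach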
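As a rough illustration of the parallel point, the sketch below runs the same computation sequentially and in parallel. The class and method names are assumptions for this example. Whether the parallel version is actually faster depends on data size, the cost per element, and the machine, so it should be measured rather than assumed.

```java
import java.util.stream.LongStream;

final class ParallelSum {
    // Sums the squares of 1..n on the calling thread.
    static long sumOfSquaresSequential(long n) {
        return LongStream.rangeClosed(1, n).map(x -> x * x).sum();
    }

    // Same computation; parallel() lets the runtime split the range across worker threads.
    static long sumOfSquaresParallel(long n) {
        return LongStream.rangeClosed(1, n).parallel().map(x -> x * x).sum();
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        System.out.println(sumOfSquaresSequential(n));
        System.out.println(sumOfSquaresParallel(n)); // same value as the sequential result
    }
}
```

The operation here is stateless and associative, which is what makes it safe to run in parallel.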
Creating a Stream
- From a collection (the most common way)
List<String> fruits = Arrays.asList("apple", "banana", "cherry", "orange");
Stream<String> stream = fruits.stream();
Set<Integer> integers = new HashSet<>();
integers.add(100);
Stream<Integer> integerStream = integers.stream();
Collection<String> strings = new ArrayList<>();
strings.add("Hello");
Stream<String> stringStream = strings.stream();
- From an array
Integer[] nums = {1, 2, 3, 4, 5};
Stream<Integer> numberStream = Arrays.stream(nums);
String[] fruitsString = {"apple", "banana", "cherry", "orange", "coco"};
// Index 1 (inclusive) to 5 (exclusive): banana, cherry, orange, coco
Stream<String> fruitsStream = Arrays.stream(fruitsString, 1, 5);
fruitsStream.forEach(System.out::println);
- With Stream.of -> build a stream directly from values
Stream<Integer> numberStream2 = Stream.of(1, 2, 3, 4, 5);
Stream<String> stringStream2 = Stream.of("apple", "banana", "cherry", "orange", "coco");
numberStream2.forEach(System.out::println);
stringStream2.forEach(System.out::println);
- From a function: generate/iterate (so-called infinite streams)
// Stream of even numbers starting from 2
Stream<Integer> evenStream = Stream.iterate(2, i -> i + 2);
// evenStream.forEach(System.out::println); // left commented out: without limit(n) this never terminates
// Fibonacci sequence: 1, 1, 2, 3, 5, 8, 13, ... f(n) = f(n-1) + f(n-2), n >= 2
Stream<Integer> fibStream = Stream.iterate(new int[]{1, 1}, t -> new int[]{t[1], t[0] + t[1]})
        .limit(10)
        .map(array -> array[0]);
fibStream.forEach(System.out::println);
Stream<String> genStream = Stream.generate(() -> "abc").limit(10);
genStream.forEach(System.out::println);
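- From a file
// Needs java.io.IOException, java.nio.file.Files, java.nio.file.Paths
String filePath = "file.csv";
// try-with-resources closes the underlying file when the stream is done
try (Stream<String> lines = Files.lines(Paths.get(filePath))) {
    lines.forEach(System.out::println);
} catch (IOException ex) {
    ex.printStackTrace();
}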
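The file-based source can be tried without an existing file.csv. The sketch below writes a small temporary file first and then streams its lines. The class name, helper methods, and file contents are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

final class FileLinesDemo {
    // Counts the non-blank lines; try-with-resources closes the file handle.
    static long countNonBlankLines(Path path) {
        try (Stream<String> lines = Files.lines(path)) {
            return lines.filter(line -> !line.trim().isEmpty()).count();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Writes a small temporary CSV so the example does not depend on an existing file.
    static Path writeSampleFile() {
        try {
            Path path = Files.createTempFile("stream-demo", ".csv");
            Files.write(path, Arrays.asList("id,name", "1,apple", "", "2,banana"));
            path.toFile().deleteOnExit();
            return path;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(countNonBlankLines(writeSampleFile())); // 3
    }
}
```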
Intermediate Operations
A stream cannot be consumed again after its terminal operation has run
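The single-use rule can be observed directly. In this sketch, which assumes an OpenJDK-based runtime and uses an illustrative class name, a second terminal operation on the same stream is rejected with an IllegalStateException.

```java
import java.util.stream.Stream;

final class StreamReuseDemo {
    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondUseFails() {
        Stream<String> stream = Stream.of("a", "b", "c");
        stream.forEach(s -> { });      // first terminal operation consumes the stream
        try {
            stream.forEach(s -> { });  // second use of the same stream object
            return false;
        } catch (IllegalStateException ex) {
            return true;               // the stream was already operated upon
        }
    }

    public static void main(String[] args) {
        System.out.println("Second use rejected: " + secondUseFails());
    }
}
```

To process the same data twice, create a fresh stream from the source each time.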
- filter -> filtering
// filter: keeps only the elements that satisfy the given condition
Stream<String> stream = Stream.of("que", "fsldkjf", "abc", "jfoe", "fsjdjfo", "ab");
Stream<String> filteredStream = stream.filter(s -> s.length() == 3);
filteredStream.forEach(System.out::println);
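- map -> mapping (transformation)
// map: applies a T -> R mapping to every element of the stream
// Sample data, assumed here because the notes never define stringList
List<String> stringList = Arrays.asList("hello world", "welcome to streams", "wait and see");
Stream<Integer> lengthStream = stringList.stream().map(String::toUpperCase)
        .filter(s -> s.startsWith("W"))
        .map(String::length);
lengthStream.forEach(System.out::println);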
- flatMap -> map + flatten
// flatMap: maps each element to a stream, then concatenates all of those streams into one
// map yields exactly one output per input: here, the length of each string
stringList.stream()
        .map(String::length)
        .forEach(length -> System.out.println("The str length: " + length));
// flatMap may yield several outputs per input: here, the words of each string
stringList.stream()
        .flatMap(s -> Arrays.stream(s.split(" ")))
        .forEach(s -> System.out.println("Str after flat map: " + s));
List<List<String>> nestedList = Arrays.asList(
        Arrays.asList("dfjei", "sjdfoew"),
        Arrays.asList("ccccccc", "aaaaaa", "bbbbbb"),
        Arrays.asList("asdf", "asdfg", "asdfgh", "asdfghjk"));
// Flatten the nested lists, keeping only strings longer than 4 characters
nestedList.stream()
        .flatMap(strings -> strings.stream().filter(s -> s.length() > 4))
        .forEach(System.out::println);
// Flatten a stream of streams while mapping each string to its length
Stream<Stream<String>> streamStream = Stream.of(
        Stream.of("a", "b"),
        Stream.of("c", "d"),
        Stream.of("e", "f", "g"));
streamStream.flatMap(s -> s.map(String::length)).forEach(System.out::println);
- distinct -> deduplication
// distinct: removes duplicate elements from the stream (compared with equals)
List<Integer> numbers = Arrays.asList(1, 1, 3, 4, 2, 5, 5, 6, 6);
Stream<Integer> distinct = numbers.stream().distinct();
distinct.forEach(System.out::println);
- sorted -> sorting
// sorted: sorts the elements in natural order
stringList.stream()
        .sorted()
        .forEach(System.out::println);
numbers.stream().distinct().sorted().forEach(System.out::println);
- peek -> observing (debugging)
// peek: runs an action on each element as it passes through; mainly useful for debugging
List<String> collect = stringList.stream()
        .filter(s -> s.length() > 5)
        .peek(s -> System.out.println("Filter string: " + s))
        .map(String::toUpperCase)
        .peek(System.out::println)
        .collect(Collectors.toList());
- limit -> limiting
// limit: truncates the stream, returning a new stream no longer than the given length
numbers.stream().limit(3).forEach(s -> System.out.println("The number after limit: " + s));
- skip -> skipping
// skip: skips the first n elements of the stream
numbers.stream().skip(3).forEach(s -> System.out.println("The number after skip: " + s));
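Several of the intermediate operations above combine naturally. As one possible illustration, the sketch below chains distinct, sorted, skip, and limit to return one "page" of values. The class name, the `page` method, and its parameters are hypothetical names introduced for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class PagingDemo {
    // Returns one page of distinct, sorted values; pageIndex starts at 0.
    static List<Integer> page(List<Integer> values, int pageIndex, int pageSize) {
        return values.stream()
                .distinct()                         // drop duplicates
                .sorted()                           // natural order
                .skip((long) pageIndex * pageSize)  // jump over the earlier pages
                .limit(pageSize)                    // keep at most one page
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 1, 3, 4, 2, 5, 5, 6, 6);
        System.out.println(page(numbers, 0, 4)); // [1, 2, 3, 4]
        System.out.println(page(numbers, 1, 4)); // [5, 6]
    }
}
```

The order of operations matters: skip and limit are applied after sorting, so each page is a slice of the sorted, deduplicated sequence.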
Specialized Streams for Primitive Types
- Specialized streams: avoid the performance cost of boxing
- IntStream / LongStream / DoubleStream
// range
IntStream intStream = IntStream.range(1, 6); // 1 2 3 4 5
intStream.forEach(System.out::println);
IntStream intStream2 = IntStream.rangeClosed(1, 6); // 1 2 3 4 5 6
intStream2.forEach(num -> System.out.println("Number in range close: " + num));
LongStream longStream = LongStream.range(1000000000L, 1000000006L);
longStream.forEach(num -> System.out.println("Long number: " + num));
LongStream longStream2 = LongStream.rangeClosed(1000000000L, 1000000006L);
longStream2.forEach(num -> System.out.println("Long number close: " + num));
// of
IntStream.of(1, 3, 5, 12).forEach(System.out::println);
LongStream.of(1001230000000L, 1000000000456L, 7891000000000L).forEach(System.out::println);
DoubleStream doubleStream = DoubleStream.of(0.123, 1.3242, 3.1415926);
doubleStream.filter(num -> num < 1.5).forEach(System.out::println);
// generate / iterate: available on all three specialized streams
Random random = new Random(); // create the generator once instead of on every call
IntStream.generate(() -> random.nextInt(1000))
        .limit(10)
        .forEach(System.out::println);
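Besides avoiding boxing, the specialized streams offer numeric terminal operations such as sum() and summaryStatistics(). The sketch below contrasts a boxed sum with a primitive one. The class and method names are assumptions, and no performance figures are claimed here.

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class PrimitiveStreamStats {
    // Boxed version: each element is an Integer object, reduced with Integer::sum.
    static int sumBoxed(int n) {
        return Stream.iterate(1, i -> i + 1).limit(n).reduce(0, Integer::sum);
    }

    // Specialized version: IntStream works on int values and provides sum() directly.
    static int sumPrimitive(int n) {
        return IntStream.rangeClosed(1, n).sum();
    }

    // Collects count, min, max, sum, and average in a single pass.
    static IntSummaryStatistics stats(int... values) {
        return IntStream.of(values).summaryStatistics();
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(100));     // 5050
        System.out.println(sumPrimitive(100)); // 5050
        IntSummaryStatistics s = stats(1, 3, 5, 12);
        System.out.println(s.getMin() + " " + s.getMax() + " " + s.getAverage()); // 1 12 5.25
    }
}
```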
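- Converting between object streams and specialized streams, and between specialized streams
- Mind the value range: widening has direct methods (asLongStream, asDoubleStream), but narrowing does not (LongStream has no asIntStream); narrowing requires an explicit mapping such as mapToInt
// Object stream -> specialized stream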
List<String> numList = Arrays.asList("1", "23", "456", "7890");
IntStream stream = numList.stream().mapToInt(Integer::parseInt);
stream.forEach(num -> System.out.println("Num in mapping to int: " + num));
// Specialized stream -> object stream
DoubleStream doubleStream2 = DoubleStream.of(0.123, 1.3242, 3.1415926);
Stream<Double> boxedStream = doubleStream2.boxed();
boxedStream.forEach(System.out::println);
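// Between specialized streams: IntStream widens with asDoubleStream(); LongStream has no asIntStream()
IntStream intStream1 = IntStream.rangeClosed(200, 213);
intStream1.asDoubleStream()
        .filter(num -> num % 2 == 1) // keep the odd values
        .forEach(System.out::println);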
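The conversion directions can be summarized in a short sketch. Widening uses the built-in asLongStream(). For narrowing there is no asIntStream(), so this example maps explicitly with Math.toIntExact, which throws rather than silently overflowing. That choice, along with the class and method names, is an assumption of this sketch and not something the notes prescribe.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.LongStream;

final class StreamConversions {
    // Widening: IntStream -> LongStream has a direct method.
    static long widenAndSum(IntStream ints) {
        return ints.asLongStream().sum();
    }

    // Narrowing: map each long explicitly.
    // Math.toIntExact throws ArithmeticException if a value does not fit in an int.
    static int narrowAndSum(LongStream longs) {
        return longs.mapToInt(Math::toIntExact).sum();
    }

    // Specialized stream -> object stream of another type via mapToObj.
    static List<String> toLabels(IntStream ints) {
        return ints.mapToObj(i -> "#" + i).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(widenAndSum(IntStream.rangeClosed(1, 4))); // 10
        System.out.println(narrowAndSum(LongStream.of(1L, 2L, 3L)));  // 6
        System.out.println(toLabels(IntStream.range(1, 4)));          // [#1, #2, #3]
    }
}
```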