An Introduction to Functional-Style Data Processing

Functional-Style Data Processing: An Introduction to Streams

What Is a Stream?

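  • A simple analogy: a river, with data flowing from a source through successive stages
  • A stream lets you process a collection of data declaratively; it is part of the Java standard API (since Java 8)
  • Location: package java.util.stream
  • A stream describes a computation; it is not a data structure and does not store elements
  • It behaves like a more advanced Iterator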
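
To make "declarative" concrete, here is a minimal sketch that writes the same task twice: once as an explicit loop, once as a stream. The class name, method names, and sample data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class DeclarativeVsImperative {

    // Imperative: we spell out how to walk the list and build the result.
    static List<String> shortNamesLoop(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.length() <= 5) {
                result.add(name.toUpperCase());
            }
        }
        return result;
    }

    // Declarative: we state what we want; the stream handles the iteration.
    static List<String> shortNamesStream(List<String> names) {
        return names.stream()
                    .filter(name -> name.length() <= 5)
                    .map(String::toUpperCase)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> fruits = Arrays.asList("apple", "banana", "cherry", "kiwi");
        System.out.println(shortNamesLoop(fruits));   // [APPLE, KIWI]
        System.out.println(shortNamesStream(fruits)); // [APPLE, KIWI]
    }
}
```

Both methods return the same list; the stream version simply leaves the iteration to the library.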

The Three Parts of a Stream Pipeline

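  • Building a pipeline takes three things
    • Source: a data source, such as a collection
    • Intermediate operations: a chain of operations that forms the pipeline
    • Terminal operation: runs the pipeline and produces a result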
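
The three parts can be sketched as follows. The example also shows that intermediate operations are lazy: nothing is processed until the terminal operation runs. The `log` list and the class and method names are illustrative assumptions, used only to observe when elements are visited.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PipelineParts {

    // Builds a pipeline but does not run it; every visited element is recorded in log.
    static Stream<String> buildPipeline(List<String> source, List<String> log) {
        return source.stream()                        // 1. source
                     .peek(log::add)                  // 2. intermediate operation (lazy)
                     .filter(s -> s.startsWith("a")); // 2. intermediate operation (lazy)
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Stream<String> pipeline =
                buildPipeline(Arrays.asList("apple", "banana", "avocado"), log);
        System.out.println(log.isEmpty()); // true: no element has been touched yet

        // 3. terminal operation: this is what actually pulls elements through
        List<String> result = pipeline.collect(Collectors.toList());
        System.out.println(result);        // [apple, avocado]
        System.out.println(log.size());    // 3
    }
}
```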

Why Streams?

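  • Readable: combined with lambda expressions, a pipeline states what is being computed, which tends to make code shorter and easier to read
  • Composable: operations chain into a pipeline, giving database-query-like operations over in-memory data
  • Parallelizable: a stream can take advantage of multiple cores, splitting the work with the fork/join framework to speed up processing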
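
As a rough sketch of the parallel point, the same pipeline can be run sequentially or in parallel by adding a single call. By default a parallel stream runs on the common fork/join pool. Whether it is actually faster depends on data size and workload, so treat this as a demonstration of the API rather than a benchmark. The class and method names are made up for this example.

```java
import java.util.stream.LongStream;

final class ParallelSum {

    // Sequential sum of squares 1^2 + 2^2 + ... + n^2.
    static long sumOfSquaresSequential(long n) {
        return LongStream.rangeClosed(1, n)
                         .map(x -> x * x)
                         .sum();
    }

    // Same pipeline; parallel() lets the runtime split the range across threads.
    static long sumOfSquaresParallel(long n) {
        return LongStream.rangeClosed(1, n)
                         .parallel()
                         .map(x -> x * x)
                         .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquaresSequential(10)); // 385
        System.out.println(sumOfSquaresParallel(10));   // 385
    }
}
```

The result is the same either way because summation is associative and the lambda has no side effects. Pipelines that mutate shared state are not safe to parallelize like this.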

Creating Streams

  • From a collection (the most common case)
List<String> fruits = Arrays.asList("apple", "banana", "cherry", "orange");
Stream<String> stream = fruits.stream();
Set<Integer> integers = new HashSet<>();
integers.add(100);
Stream<Integer> integerStream = integers.stream();

Collection<String> strings = new ArrayList<>();
strings.add("Hello");
Stream<String> stringStream = strings.stream();
  • From an array
Integer[] nums = {1, 2, 3, 4, 5};
Stream<Integer> numberStream = Arrays.stream(nums);
String[] fruitsString = {"apple", "banana", "cherry", "orange", "coco"};
// Range overload: startInclusive = 1, endExclusive = 5, so "apple" is skipped
Stream<String> fruitsStream = Arrays.stream(fruitsString, 1, 5);
fruitsStream.forEach(System.out::println);
  • With Stream.of: create a stream directly from values
Stream<Integer> numberStream2 = Stream.of(1, 2, 3, 4, 5);
Stream<String> stringStream2 = Stream.of("apple", "banana", "cherry", "orange", "coco");
numberStream2.forEach(System.out::println);
stringStream2.forEach(System.out::println);
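  • From a function: generate / iterate (so-called infinite streams)
// An infinite stream of even numbers starting at 2
Stream<Integer> evenStream = Stream.iterate(2, i -> i + 2);
// Left commented out: without limit(), this forEach would never terminate
// evenStream.forEach(System.out::println);

// Fibonacci sequence: 1, 1, 2, 3, 5, 8, 13, ... where f(n) = f(n-1) + f(n-2) for n >= 2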
Stream<Integer> fibStream = Stream.iterate(new int[]{1, 1}, t -> new int[]{t[1], t[0] + t[1]})
                                    .limit(10)
                                    .map(array -> array[0]);
fibStream.forEach(System.out::println);

Stream<String> genStream = Stream.generate(() -> "abc").limit(10);
genStream.forEach(System.out::println);
  • From a file
String filePath = "file.csv";
try(Stream<String> lines = Files.lines(Paths.get(filePath))){
    lines.forEach(System.out::println);
} catch (IOException ex) {
    ex.printStackTrace();
}

Intermediate Operations

A stream cannot be consumed again once a terminal operation has run on it.
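
A small sketch of what this means in practice. In OpenJDK, a second terminal operation on the same stream is rejected with an IllegalStateException; the Javadoc only says an implementation may throw it, so the first method reports what it observes rather than assuming. Keeping a Supplier that builds a fresh stream each time is one common workaround. All names here are illustrative.

```java
import java.util.Arrays;
import java.util.function.Supplier;
import java.util.stream.Stream;

final class StreamReuse {

    // Returns true if using the same stream a second time is rejected.
    static boolean secondUseFails(Stream<String> stream) {
        stream.forEach(s -> { });     // first terminal operation consumes the stream
        try {
            stream.forEach(s -> { }); // second terminal operation on the same object
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Workaround: ask the supplier for a new stream for every traversal.
    static long countTwice(Supplier<Stream<String>> source) {
        return source.get().count() + source.get().count();
    }

    public static void main(String[] args) {
        System.out.println(secondUseFails(Stream.of("a", "b")));
        System.out.println(countTwice(() -> Arrays.asList("a", "b", "c").stream()));
    }
}
```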

  • filter -> filtering
// filter: keeps only the elements that satisfy the given predicate
Stream<String> stream = Stream.of("que", "fsldkjf", "abc", "jfoe", "fsjdjfo", "ab");
Stream<String> filteredStream = stream.filter(s -> s.length() == 3);
filteredStream.forEach(System.out::println);
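  • map -> mapping (transformation)
// map: applies a T -> R mapping to every element of the stream
// stringList: a List<String> assumed to be defined elsewhere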
Stream<Integer> stringStream = stringList.stream().map(String::toUpperCase)
                                         .filter(s -> s.startsWith("W"))
                                         .map(String::length);
stringStream.forEach(System.out::println);
  • flatMap -> mapping + flattening
// flatMap: turns each element into a stream of its own, then concatenates all of those streams into one
stringList.stream()
          .map(String::length)
          .forEach(length -> System.out.println("The str length: " + length));

stringList.stream()
          .flatMap(s -> Arrays.stream(s.split(" ")))
          .forEach(s -> System.out.println("Str after flat map: " + s));

List<List<String>> nestedList = Arrays.asList(Arrays.asList("dfjei", "sjdfoew"),
                                              Arrays.asList("ccccccc", "aaaaaa", "bbbbbb"),
                                              Arrays.asList("asdf", "asdfg", "asdfgh", "asdfghjk"));
nestedList.stream()
          .flatMap(strings -> strings.stream().filter(s -> s.length() > 4))
          .forEach(System.out::println);

Stream<Stream<String>> streamStream = Stream.of(
        Stream.of("a", "b"),
        Stream.of("c", "d"),
        Stream.of("e", "f", "g"));
streamStream.flatMap(s -> s.map(String::length)).forEach(System.out::println);
  • distinct -> deduplication
// distinct: removes duplicate elements (compared with equals)
List<Integer> numbers = Arrays.asList(1, 1, 3, 4, 2, 5, 5, 6, 6);
Stream<Integer> distinct = numbers.stream().distinct();
distinct.forEach(System.out::println);
  • sorted -> sorting
// sorted: sorts the elements in natural order (an overload accepts a Comparator)
stringList.stream()
          .sorted()
          .forEach(System.out::println);

numbers.stream().distinct().sorted().forEach(System.out::println);
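  • peek -> observing (debugging)
// peek: looks at each element as it passes through, without changing it; mainly useful for debugging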
List<String> collect = stringList.stream()
                                 .filter(s -> s.length() > 5)
                                 .peek(s -> System.out.println("Filter string: " + s))
                                 .map(String::toUpperCase)
                                 .peek(System.out::println)
                                 .collect(Collectors.toList());
  • limit -> truncation
// limit: returns a new stream containing at most the given number of elements
numbers.stream().limit(3).forEach(s -> System.out.println("The number after limit: " + s));
  • skip -> skipping
// skip: discards the first n elements of the stream
numbers.stream().skip(3).forEach(s -> System.out.println("The number after skip: " + s));
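
skip and limit are often used together. A minimal sketch of in-memory paging, where `page`, `pageIndex` (zero-based) and `pageSize` are hypothetical names chosen for this example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class Paging {

    // Returns the elements of one page; pageIndex is zero-based.
    static <T> List<T> page(List<T> items, int pageIndex, int pageSize) {
        return items.stream()
                    .skip((long) pageIndex * pageSize) // drop all earlier pages
                    .limit(pageSize)                   // keep at most one page
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 1, 3, 4, 2, 5, 5, 6, 6);
        System.out.println(page(numbers, 0, 3)); // [1, 1, 3]
        System.out.println(page(numbers, 1, 3)); // [4, 2, 5]
        System.out.println(page(numbers, 3, 3)); // [] (past the end)
    }
}
```

The order matters: skip followed by limit selects a window, whereas limit followed by skip would first truncate and then drop elements from what is left.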

Specialized Streams for Primitive Types

  • Specialized streams avoid the performance cost of boxing
  • IntStream / LongStream / DoubleStream
// range (end exclusive) and rangeClosed (end inclusive)
IntStream intStream = IntStream.range(1, 6);    // 1 2 3 4 5
intStream.forEach(System.out::println);
IntStream intStream2 = IntStream.rangeClosed(1, 6); // 1 2 3 4 5 6
intStream2.forEach(num -> System.out.println("Number in range close: " + num));

LongStream longStream = LongStream.range(1000000000L, 1000000006L);
longStream.forEach(num -> System.out.println("Long number: " + num));
LongStream longStream2 = LongStream.rangeClosed(1000000000L, 1000000006L);
longStream2.forEach(num -> System.out.println("Long number close: " + num));

// of
IntStream.of(1, 3 ,5, 12).forEach(System.out::println);
LongStream.of(1001230000000L, 1000000000456L, 7891000000000L).forEach(System.out::println);

DoubleStream doubleStream = DoubleStream.of(0.123, 1.3242, 3.1415926);
doubleStream.filter(num -> num < 1.5).forEach(System.out::println);

// generate / iterate: available on all three specialized streams
Random random = new Random();   // created once, not once per generated element
IntStream.generate(() -> random.nextInt(1000))
         .limit(10)
         .forEach(System.out::println);
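
Besides avoiding boxing, the specialized streams offer numeric terminal operations that Stream<T> does not have, such as sum, average and summaryStatistics. A short sketch on IntStream; the class name, method names and sample words are illustrative:

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;

final class PrimitiveAggregates {

    // Total number of characters across all words, computed on an IntStream.
    static int totalLength(List<String> words) {
        return words.stream()
                    .mapToInt(String::length)
                    .sum();
    }

    // Count, min, max, sum and average in a single pass.
    static IntSummaryStatistics lengthStats(List<String> words) {
        return words.stream()
                    .mapToInt(String::length)
                    .summaryStatistics();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "banana", "cherry", "orange");
        System.out.println(totalLength(words)); // 23
        IntSummaryStatistics stats = lengthStats(words);
        System.out.println(stats.getMin() + " " + stats.getMax() + " " + stats.getAverage());
        // 5 6 5.75
    }
}
```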
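  • Object streams and specialized streams can be converted into each other, and so can the specialized streams
  • Mind the value range: only widening conversions have built-in as* methods (asLongStream, asDoubleStream). LongStream has no asIntStream; narrowing requires an explicit mapToInt and can overflow
// Object stream -> specialized stream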
List<String> numList = Arrays.asList("1", "23", "456", "7890");
IntStream stream = numList.stream().mapToInt(Integer::parseInt);
stream.forEach(num -> System.out.println("Num in mapping to int: " + num));

// Specialized stream -> object stream
DoubleStream doubleStream2 = DoubleStream.of(0.123, 1.3242, 3.1415926);
Stream<Double> boxedStream = doubleStream2.boxed();
boxedStream.forEach(System.out::println);

// Between specialized streams: IntStream widens to DoubleStream (LongStream has no built-in asIntStream counterpart)
IntStream intStream1 = IntStream.rangeClosed(200, 213);
intStream1.asDoubleStream()
          .filter(num -> num % 2 == 1)
          .forEach(System.out::println);
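
To illustrate the range caveat: widening has a dedicated method, while going from long to int has to be spelled out with mapToInt. The sketch below uses Math.toIntExact so that an out-of-range value fails with an ArithmeticException instead of silently wrapping; a plain cast `(int) x` would also compile but would truncate. The class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.stream.IntStream;
import java.util.stream.LongStream;

final class NarrowingConversion {

    // Widening int -> long: built in, always safe.
    static long[] widen(IntStream ints) {
        return ints.asLongStream().toArray();
    }

    // Narrowing long -> int: explicit, and checked for overflow here.
    static int[] narrow(LongStream longs) {
        return longs.mapToInt(Math::toIntExact).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(widen(IntStream.rangeClosed(1, 3))));  // [1, 2, 3]
        System.out.println(Arrays.toString(narrow(LongStream.of(10L, 20L))));     // [10, 20]
        try {
            narrow(LongStream.of(1L << 40)); // does not fit into an int
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```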