RPC Optimization: An Auto-Paging / Auto-Partitioning Utility Class

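Current problems

Our RPC calls currently suffer from the following two problems:

When a scheduled job refreshes data, it needs to pull the full data set from other services. At present it does so by calling the list endpoint directly.
- The data is pulled from the read replica, so the primary database is not put under pressure. But when the data volume is large, the target service node has to serialize a large amount of data in one go. It works, but it is not elegant.
- As the data grows, and if more operations like this are added, there is a risk of OOM. Not only would the scheduled job fail, the system itself would become unstable.
The internal gateway currently enforces a fairly short URL length limit, and occasionally a GET request carries too many parameters. Investigation showed that a long List was being passed, which made the whole URL too long and caused the RPC request to fail.
- Because this only happens occasionally and there is a heavy backlog of business requirements, the fix so far has been to bluntly switch the GET request to a POST and move the parameters into the body. That is not elegant either; once the requirement load eases a little, this case needs a proper solution.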

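Thinking through a solution

For the first problem, the natural answer is paged queries. As long as the list data has a stable order (for example, sorted by id), it can be paged easily, and the solution almost writes itself:
- Build a paging utility class that encapsulates the paged-query code.
- Plain paging alone is not enough; concurrent querying is needed too, otherwise the query will be slow. One request becomes many, so on top of the extra network round trips, the number of SQL statements executed also grows to N times the original.
For the second problem:
- A stopgap
  
  Looking at the URL, a List is currently passed as url?param=1&param=2, so the parameter name is repeated many times. This is Feign's default format, and it offers no way to change it globally; the only option is to annotate the interface with @CollectionFormat(feign.CollectionFormat.CSV), which changes the format to url?param=1,2,3 and shortens the URL.
- A more complete solution
  
  Before making the RPC query, split the List into segments, then send one query request per segment.

In summary, the final solution is a utility class that provides paging and partitioning; business code simply calls the utility class to run its queries.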
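
To make the difference between the two parameter formats concrete, the following sketch builds both query strings by hand. It does not use Feign at all; the class and method names (QueryStringLengthSketch, exploded, csv) are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

final class QueryStringLengthSketch {

    // Exploded style: the parameter name is repeated for every element.
    static String exploded(String name, List<Integer> values) {
        return values.stream()
                .map(v -> name + "=" + v)
                .collect(Collectors.joining("&"));
    }

    // CSV style: the parameter name appears once, values are comma-separated.
    static String csv(String name, List<Integer> values) {
        return name + "=" + values.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3);
        System.out.println(exploded("param", ids)); // param=1&param=2&param=3
        System.out.println(csv("param", ids));      // param=1,2,3
    }
}
```

The CSV form saves roughly the length of the parameter name per element, so it delays the problem rather than removing it: a long enough List still exceeds the gateway limit.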
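
The partitioning idea can be sketched with nothing but the JDK. This is a simplified illustration, not the utility class itself: PartitionSketch, partition and fetchInSegments are hypothetical names, and the fetcher returns a plain List rather than a CommonResponse.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class PartitionSketch {

    // Split the list into consecutive segments of at most `size` elements.
    static <T> List<List<T>> partition(List<T> list, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> segments = new ArrayList<>();
        for (int from = 0; from < list.size(); from += size) {
            int to = Math.min(from + size, list.size());
            segments.add(new ArrayList<>(list.subList(from, to)));
        }
        return segments;
    }

    // Call the fetcher once per segment and merge the results in order.
    static <T, R> List<R> fetchInSegments(List<T> ids, int size,
                                          Function<List<T>, List<R>> fetcher) {
        List<R> merged = new ArrayList<>();
        for (List<T> segment : partition(ids, size)) {
            merged.addAll(fetcher.apply(segment));
        }
        return merged;
    }
}
```

With five ids and a segment size of two, the fetcher is called three times, with [1, 2], [3, 4] and [5].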

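Introducing the utility class

Use cases

The utility class mainly targets the following three scenarios:

The parameter is a paged-query DTO; pages are fetched automatically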

RpcUtils.listWithAutoPage(new UserQueryDTO(), feignClient::listUser)

The parameter is a List; the List is split into segments automatically

RpcUtils.listByIdsWithAutoPartition(userIdList, feignClient::listUserById)

The parameter is a Query object containing a very long List; that List is split into segments automatically

If the List is the only parameter, you can write:

RpcUtils.listByIdsWithAutoPartition(
  userIdList, 
  UserQueryDTO::createWithIdList, 
  feignClient::listUser);

If there are more parameters, you can write:

RpcUtils.listByIdsWithAutoPartition(
  userIdList, 
  // Note: the DTO must not be created (new) outside this lambda.
  // Every call to this Function must return a fresh object, otherwise there will be thread-safety problems.
  (idList) -> UserQueryDTO.builder()
      .type(1L)
      .idList(idList)
      .build(), 
  feignClient::listUser);
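
Why a fresh object per call matters can be shown without any threads. In the sketch below, Query is a hypothetical stand-in for UserQueryDTO, and wrapAll simply keeps every object the wrapper produces; in the concurrent path several wrapper results can be alive at once in the same way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class WrapperAliasingSketch {

    // Hypothetical stand-in for UserQueryDTO.
    static final class Query {
        List<Long> idList;
    }

    // Apply the wrapper to every segment and keep each produced query.
    static List<Query> wrapAll(List<List<Long>> segments,
                               Function<List<Long>, Query> wrapper) {
        List<Query> queries = new ArrayList<>();
        for (List<Long> segment : segments) {
            queries.add(wrapper.apply(segment));
        }
        return queries;
    }

    public static void main(String[] args) {
        List<List<Long>> segments = List.of(List.of(1L, 2L), List.of(3L, 4L));

        // Created outside the wrapper: every call mutates the same object.
        Query shared = new Query();
        List<Query> aliased = wrapAll(segments, ids -> {
            shared.idList = ids;
            return shared;
        });
        System.out.println(aliased.get(0).idList); // [3, 4]: the first segment was overwritten

        // Created inside the wrapper: each segment gets its own object.
        List<Query> fresh = wrapAll(segments, ids -> {
            Query q = new Query();
            q.idList = ids;
            return q;
        });
        System.out.println(fresh.get(0).idList); // [1, 2]
    }
}
```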

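Page/partition size and the multithreading option

The examples above show only the simplest usage. Besides the required parameters, two options can be set:

Page size pageSize / partition size partitionSize. Constants are already defined for both, such as PAGE_SIZE_128 and PARTITION_SIZE_256; prefer using the constants directly.
Whether to use multiple threads: isConcurrent

In the code below, the default is single-threaded; multithreading can be turned on when the business scenario has response-time requirements.

This is out of concern for database performance. The utility class should at least be safe to use after being copy-pasted into a different company or project, and if the database there is weak, defaulting to multithreading could overwhelm it.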
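
If multithreading is enabled and the database is a concern, one possible way to cap the load is to run the requests on a small fixed thread pool. The sketch below is an assumption about how that could look, not part of the utility class: BoundedConcurrencySketch, fetchAll and maxParallel are hypothetical names, and a real service would more likely share one long-lived pool than create one per call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

final class BoundedConcurrencySketch {

    // Run one fetch per request, with at most maxParallel fetches in flight.
    static <T, R> List<R> fetchAll(List<T> requests, int maxParallel,
                                   Function<T, List<R>> fetcher) {
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        try {
            List<CompletableFuture<List<R>>> futures = new ArrayList<>();
            for (T request : requests) {
                // Passing the pool explicitly keeps the work off the common pool.
                futures.add(CompletableFuture.supplyAsync(() -> fetcher.apply(request), pool));
            }
            List<R> merged = new ArrayList<>();
            for (CompletableFuture<List<R>> future : futures) {
                // Joining in submission order keeps the merged result ordered.
                merged.addAll(future.join());
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the pool has a fixed size, the remote service never sees more than maxParallel simultaneous requests from a single call, however many segments or pages there are.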

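Two special cases

Case 1

You may run into a case where the parameter is a PageQuery but it also has to carry a particularly long parameter. No API handles this directly yet, but there is a workaround:

Set the page size to Integer.MAX_VALUE. The time complexity is O(n), but PageQuery loses its purpose, and the call ends up resembling the third scenario above.
Case 2

If there are two or more very long Lists, the tool cannot handle it for now. The parameters are usually combined with AND, and a in [1,2] and b in [3,4] cannot simply be rewritten as (a in [1] and b in [3]) + (a in [2] and b in [4]).

It could be solved by taking the Cartesian product of the segments, but this case is rare so far, so it will be added when the need arises.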
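
A rough sketch of the Cartesian-product idea follows. It is not implemented in the utility class; TwoListPartitionSketch and fetchByTwoLists are hypothetical names, and the fetcher returns a plain List. Every segment of the first list is paired with every segment of the second, so the union of the results equals the original "a in A and b in B" query, provided neither list contains duplicates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

final class TwoListPartitionSketch {

    // Split the list into consecutive segments of at most `size` elements.
    static <T> List<List<T>> partition(List<T> list, int size) {
        List<List<T>> segments = new ArrayList<>();
        for (int from = 0; from < list.size(); from += size) {
            segments.add(new ArrayList<>(list.subList(from, Math.min(from + size, list.size()))));
        }
        return segments;
    }

    // Query every (segment of as, segment of bs) pair and merge the results.
    static <A, B, R> List<R> fetchByTwoLists(List<A> as, List<B> bs, int size,
                                             BiFunction<List<A>, List<B>, List<R>> fetcher) {
        List<R> merged = new ArrayList<>();
        for (List<A> segmentA : partition(as, size)) {
            for (List<B> segmentB : partition(bs, size)) {
                merged.addAll(fetcher.apply(segmentA, segmentB));
            }
        }
        return merged;
    }
}
```

The price is the request count: with m segments in one list and n in the other, m × n requests are sent, which is probably why this is only worth adding once the case actually shows up.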

The code

The imports below omit some JDK packages, as well as the imports for PageQuery, CommonResponse and similar classes; substitute your own when copy-pasting.

import cn.hutool.core.collection.CollectionUtil;
import cn.hutool.core.collection.ListUtil;
import com.alibaba.fastjson.JSON;

import com.google.common.collect.Lists;
import org.springframework.core.ResolvableType;

public class RpcUtils {

    /**
     * Page size - 64
     */
    public static final int PAGE_SIZE_64 = 64;

    /**
     * Page size - 128
     */
    public static final int PAGE_SIZE_128 = 128;

    /**
     * Page size - 256
     */
    public static final int PAGE_SIZE_256 = 256;

    /**
     * Page size - 512
     */
    public static final int PAGE_SIZE_512 = 512;

    /**
     * Page size - 1024
     */
    public static final int PAGE_SIZE_1024 = 1024;

    /**
     * Page size - 2048
     */
    public static final int PAGE_SIZE_2048 = 2048;

    /**
     * Page size - 4096
     */
    public static final int PAGE_SIZE_4096 = 4096;

    /**
     * Page size - 8192
     */
    public static final int PAGE_SIZE_8192 = 8192;

    /**
     * Page size - 16384
     */
    public static final int PAGE_SIZE_16384 = 16384;

    /**
     * Default page size
     */
    public static final int DEFAULT_PAGE_SIZE = PAGE_SIZE_128;

    /**
     * Partition size - 64
     */
    public static final int PARTITION_SIZE_64 = 64;

    /**
     * Partition size - 128
     */
    public static final int PARTITION_SIZE_128 = 128;

    /**
     * Partition size - 256
     */
    public static final int PARTITION_SIZE_256 = 256;

    /**
     * Partition size - 512
     */
    public static final int PARTITION_SIZE_512 = 512;

    /**
     * Partition size - 1024
     */
    public static final int PARTITION_SIZE_1024 = 1024;

    /**
     * Partition size - 2048
     */
    public static final int PARTITION_SIZE_2048 = 2048;

    /**
     * Partition size - 4096
     */
    public static final int PARTITION_SIZE_4096 = 4096;

    /**
     * Partition size - 8192
     */
    public static final int PARTITION_SIZE_8192 = 8192;

    /**
     * Partition size - 16384
     */
    public static final int PARTITION_SIZE_16384 = 16384;

    /**
     * Default partition size (the internal gateway's request-parameter limit is very small: 256 can already fail, 128 has not failed so far)
     */
    public static final int DEFAULT_PARTITION_SIZE = PARTITION_SIZE_128;

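    /**
     * RPC with automatic paging
     *
     * @param pageQuery RPC query conditions
     * @param fetcher   RPC method
     * @param <T>       type of the RPC query conditions
     * @param <R>       element type returned by the RPC method
     * @return the rows of all pages, merged into one list
     */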
    public static <T extends PageQuery, R> List<R> listWithAutoPage(T pageQuery, Function<T, CommonResponse<CommonPage<R>>> fetcher) {
        return listWithAutoPage(pageQuery, DEFAULT_PAGE_SIZE, false, fetcher);
    }

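    /**
     * RPC with automatic paging
     *
     * @param pageQuery RPC query conditions
     * @param pageSize  page size
     * @param fetcher   RPC method
     * @param <T>       type of the RPC query conditions
     * @param <R>       element type returned by the RPC method
     * @return the rows of all pages, merged into one list
     */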
    public static <T extends PageQuery, R> List<R> listWithAutoPage(T pageQuery, int pageSize, Function<T, CommonResponse<CommonPage<R>>> fetcher) {
        return listWithAutoPage(pageQuery, pageSize, false, fetcher);
    }

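    /**
     * RPC with automatic paging
     *
     * @param pageQuery    RPC query conditions
     * @param isConcurrent whether to use multiple threads
     * @param fetcher      RPC method
     * @param <T>          type of the RPC query conditions
     * @param <R>          element type returned by the RPC method
     * @return the rows of all pages, merged into one list
     */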
    public static <T extends PageQuery, R> List<R> listWithAutoPage(T pageQuery, boolean isConcurrent, Function<T, CommonResponse<CommonPage<R>>> fetcher) {
        return listWithAutoPage(pageQuery, DEFAULT_PAGE_SIZE, isConcurrent, fetcher);
    }

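    /**
     * RPC with automatic paging
     *
     * @param pageQuery    RPC query conditions
     * @param pageSize     page size
     * @param isConcurrent whether to use multiple threads
     * @param fetcher      RPC method
     * @param <T>          type of the RPC query conditions
     * @param <R>          element type returned by the RPC method
     * @return the rows of all pages, merged into one list
     */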
    public static <T extends PageQuery, R> List<R> listWithAutoPage(T pageQuery, int pageSize, boolean isConcurrent, Function<T, CommonResponse<CommonPage<R>>> fetcher) {
        if (isConcurrent) {
            return listWithAutoPageConcurrently(pageQuery, pageSize, fetcher);
        }
        return listWithAutoPageNotConcurrently(pageQuery, pageSize, fetcher);
    }

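    /**
     * RPC with automatic paging (single-threaded)
     *
     * @param pageQuery RPC query conditions
     * @param pageSize  RPC page size
     * @param fetcher   RPC method
     * @param <T>       type of the RPC query conditions
     * @param <R>       element type returned by the RPC method
     * @return the rows of all pages, merged into one list
     */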
    public static <T extends PageQuery, R> List<R> listWithAutoPageNotConcurrently(T pageQuery, int pageSize, Function<T, CommonResponse<CommonPage<R>>> fetcher) {
        var pageIndex = 1;
        pageQuery.setPageSize(pageSize);

        var result = Lists.<R>newArrayList();
        while (true) {
            pageQuery.setPageIndex(pageIndex++);
            var list = collectToList(fetcher.apply(pageQuery));
            if (CollectionUtil.isEmpty(list)) {
                break;
            }
            result.addAll(list);
        }
        return result;
    }

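    /**
     * RPC with automatic paging (multi-threaded)
     * Has a bug; temporarily disabled
     *
     * @param pageQuery RPC query conditions
     * @param pageSize  RPC page size
     * @param fetcher   RPC method
     * @param <T>       type of the RPC query conditions
     * @param <R>       element type returned by the RPC method
     * @return the rows of all pages, merged into one list
     */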
    public static <T extends PageQuery, R> List<R> listWithAutoPageConcurrently(T pageQuery, int pageSize, Function<T, CommonResponse<CommonPage<R>>> fetcher) {
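        // Compute the number of pages
        int totalCount = queryTotalCount(pageQuery, fetcher);
        int totalPageCount = computePageCount(totalCount, pageSize);

        // Start querying
        pageQuery.setPageSize(pageSize);
        var pageQueryClass = (Class<T>) ResolvableType.forInstance(pageQuery).resolve();
        // One future per page, so the list is sized by the page count rather than the row count
        var futureList = Lists.<CompletableFuture<List<R>>>newArrayListWithCapacity(totalPageCount);

        for (int i = 1; i <= totalPageCount; i++) {
            // If the same pageQuery object were shared, threads would race on pageIndex, so some pages would be requested more than once and others skipped. Locking would fix that, but then it would be no better than single-threaded, with context-switch overhead on top
            // A shallow clone such as BeanUtils.copy() would also work: as long as each Feign call uses a different pageQuery object, the race on pageIndex is avoided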
            final T clonedPageQuery = JSON.parseObject(JSON.toJSONString(pageQuery), pageQueryClass);
            clonedPageQuery.setPageIndex(i);

            CompletableFuture<List<R>> result = CompletableFuture.supplyAsync(() -> collectToList(fetcher.apply(clonedPageQuery)));
            futureList.add(result);
        }

        return flatCompletableFutureList(futureList, totalCount);
    }

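    /**
     * Gets the total number of results matching the query conditions
     *
     * @param pageQuery query conditions; its page index and page size will be modified
     * @param fetcher   RPC method
     * @param <T>       type of the RPC query conditions
     * @param <R>       element type returned by the RPC method
     * @return total number of matching rows
     */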
    public static <T extends PageQuery, R> int queryTotalCount(T pageQuery, Function<T, CommonResponse<CommonPage<R>>> fetcher) {
        pageQuery.setPageIndex(0);
        pageQuery.setPageSize(0);
        var resp = fetcher.apply(pageQuery).data();

        return Long.valueOf(resp.getTotalCount()).intValue();
    }

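    /**
     * Computes the number of pages, rounding up so that a partial last page is counted
     *
     * @param totalCount total number of rows
     * @param pageSize   page size, must be positive
     * @return number of pages needed to cover totalCount rows
     */
    public static int computePageCount(int totalCount, int pageSize) {
        return totalCount / pageSize + (totalCount % pageSize == 0 ? 0 : 1);
    }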

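    /**
     * RPC that automatically splits idList into segments
     * For the case where the RPC parameter is the idList itself
     *
     * @param idList        ids to query
     * @param fetcher       RPC method
     * @param <T>           type of the ids
     * @param <R>           element type returned by the RPC method
     * @return the results of all segments, merged into one list
     */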
    public static <T,R> List<R> listByIdsWithAutoPartition(List<T> idList, Function<List<T>, CommonResponse<List<R>>> fetcher) {
        return listByIdsWithAutoPartition(idList, DEFAULT_PARTITION_SIZE, false, fetcher);
    }

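    /**
     * RPC that automatically splits idList into segments
     * For the case where the RPC parameter is the idList itself
     *
     * @param idList       ids to query
     * @param isConcurrent whether to use multiple threads
     * @param fetcher      RPC method
     * @param <T>          type of the ids
     * @param <R>          element type returned by the RPC method
     * @return the results of all segments, merged into one list
     */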
    public static <T,R> List<R> listByIdsWithAutoPartition(List<T> idList, boolean isConcurrent, Function<List<T>, CommonResponse<List<R>>> fetcher) {
        return listByIdsWithAutoPartition(idList, DEFAULT_PARTITION_SIZE, isConcurrent, fetcher);
    }

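    /**
     * RPC that automatically splits idList into segments
     * For the case where the RPC parameter is the idList itself
     *
     * @param idList        ids to query
     * @param partitionSize segment size
     * @param fetcher       RPC method
     * @param <T>           type of the ids
     * @param <R>           element type returned by the RPC method
     * @return the results of all segments, merged into one list
     */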
    public static <T,R> List<R> listByIdsWithAutoPartition(List<T> idList, int partitionSize, Function<List<T>, CommonResponse<List<R>>> fetcher) {
        return listByIdsWithAutoPartition(idList, partitionSize, false, fetcher);
    }

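    /**
     * RPC that automatically splits idList into segments
     * For the case where the RPC parameter is the idList itself
     *
     * @param idList        ids to query
     * @param partitionSize segment size
     * @param isConcurrent  whether to use multiple threads
     * @param fetcher       RPC method
     * @param <T>           type of the ids
     * @param <R>           element type returned by the RPC method
     * @return the results of all segments, merged into one list
     */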
    public static <T,R> List<R> listByIdsWithAutoPartition(List<T> idList, int partitionSize, boolean isConcurrent, Function<List<T>, CommonResponse<List<R>>> fetcher) {
        return listByIdsWithAutoPartition(idList, partitionSize, Function.identity(), isConcurrent, fetcher);
    }


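    /**
     * RPC that automatically splits idList into segments
     * For the case where the RPC parameter is not the idList itself and must be wrapped by rpcParamWrapper
     *
     * @param idList          ids to query
     * @param rpcParamWrapper wrapper that turns a segment of idList into the RPC method's parameter
     * @param fetcher         RPC method
     * @param <T>             type of the ids
     * @param <W>             type of the RPC query conditions
     * @param <R>             element type returned by the RPC method
     * @return the results of all segments, merged into one list
     */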
    public static <T, W, R> List<R> listByIdsWithAutoPartition(List<T> idList, Function<List<T>, W> rpcParamWrapper, Function<W, CommonResponse<List<R>>> fetcher) {
        return listByIdsWithAutoPartition(idList, DEFAULT_PARTITION_SIZE, rpcParamWrapper, false, fetcher);
    }

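    /**
     * RPC that automatically splits idList into segments
     * For the case where the RPC parameter is not the idList itself and must be wrapped by rpcParamWrapper
     *
     * @param idList          ids to query
     * @param rpcParamWrapper wrapper that turns a segment of idList into the RPC method's parameter
     * @param isConcurrent    whether to use multiple threads
     * @param fetcher         RPC method
     * @param <T>             type of the ids
     * @param <W>             type of the RPC query conditions
     * @param <R>             element type returned by the RPC method
     * @return the results of all segments, merged into one list
     */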
    public static <T, W, R> List<R> listByIdsWithAutoPartition(List<T> idList, Function<List<T>, W> rpcParamWrapper,
                                                               boolean isConcurrent, Function<W, CommonResponse<List<R>>> fetcher) {
        return listByIdsWithAutoPartition(idList, DEFAULT_PARTITION_SIZE, rpcParamWrapper, isConcurrent, fetcher);
    }


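    /**
     * RPC that automatically splits idList into segments
     * For the case where the RPC parameter is not the idList itself and must be wrapped by rpcParamWrapper
     *
     * @param idList          ids to query
     * @param partitionSize   segment size
     * @param rpcParamWrapper wrapper that turns a segment of idList into the RPC method's parameter
     * @param fetcher         RPC method
     * @param <T>             type of the ids
     * @param <W>             type of the RPC query conditions
     * @param <R>             element type returned by the RPC method
     * @return the results of all segments, merged into one list
     */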
    public static <T, W, R> List<R> listByIdsWithAutoPartition(List<T> idList, int partitionSize,
                                                               Function<List<T>, W> rpcParamWrapper,
                                                               Function<W, CommonResponse<List<R>>> fetcher) {
        return listByIdsWithAutoPartition(idList, partitionSize, rpcParamWrapper, false, fetcher);
    }

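    /**
     * RPC that automatically splits idList into segments
     * For the case where the RPC parameter is not the idList itself and must be wrapped by rpcParamWrapper
     *
     * @param idList          ids to query
     * @param partitionSize   segment size
     * @param rpcParamWrapper wrapper that turns a segment of idList into the RPC method's parameter
     * @param isConcurrent    whether to use multiple threads
     * @param fetcher         RPC method
     * @param <T>             type of the ids
     * @param <W>             type of the RPC query conditions
     * @param <R>             element type returned by the RPC method
     * @return the results of all segments, merged into one list
     */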
    public static <T, W, R> List<R> listByIdsWithAutoPartition(List<T> idList, int partitionSize,
                                                               Function<List<T>, W> rpcParamWrapper, boolean isConcurrent,
                                                               Function<W, CommonResponse<List<R>>> fetcher) {
        var isMoreThanOnePartition = CollectionUtil.size(idList) > partitionSize;
        if (isConcurrent && isMoreThanOnePartition) {
            return listByIdsWithAutoPartitionConcurrently(idList, partitionSize, rpcParamWrapper, fetcher);
        }
        return listByIdsWithAutoPartitionNotConcurrently(idList, partitionSize, rpcParamWrapper, fetcher);
    }

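    /**
     * RPC that automatically splits idList into segments (single-threaded)
     *
     * @param idList          ids to query
     * @param partitionSize   segment size
     * @param rpcParamWrapper wrapper that turns a segment of idList into the RPC method's parameter
     * @param fetcher         RPC method
     * @param <T>             type of the ids
     * @param <W>             type of the RPC query conditions
     * @param <R>             element type returned by the RPC method
     * @return the results of all segments, merged into one list
     */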
    public static <T, W, R> List<R> listByIdsWithAutoPartitionNotConcurrently(List<T> idList, int partitionSize,
                                                                              Function<List<T>, W> rpcParamWrapper,
                                                                              Function<W, CommonResponse<List<R>>> fetcher) {
        var partitionIdList = ListUtil.partition(idList, partitionSize);

        return partitionIdList.stream()
                .flatMap(ids -> {
                    var result = collectList(fetcher, rpcParamWrapper.apply(ids));
                    return CollectionUtil.isEmpty(result) ? Stream.empty() : result.stream();
                })
                .collect(Collectors.toList());
    }

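    /**
     * RPC that automatically splits idList into segments (multi-threaded)
     *
     * @param idList          ids to query
     * @param partitionSize   segment size
     * @param rpcParamWrapper wrapper that turns a segment of idList into the RPC method's parameter
     * @param fetcher         RPC method
     * @param <T>             type of the ids
     * @param <W>             type of the RPC query conditions
     * @param <R>             element type returned by the RPC method
     * @return the results of all segments, merged into one list
     */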
    public static <T, W, R> List<R> listByIdsWithAutoPartitionConcurrently(List<T> idList, int partitionSize, Function<List<T>, W> rpcParamWrapper, Function<W, CommonResponse<List<R>>> fetcher) {
        var estimateResultSize = CollectionUtil.size(idList);
        var partitionedIdList = ListUtil.partition(idList, partitionSize);

        var futureList = partitionedIdList.stream()
                .map(ids -> CompletableFuture.supplyAsync(() -> collectList(fetcher, rpcParamWrapper.apply(ids))))
                .collect(Collectors.toList());

        return flatCompletableFutureList(futureList, estimateResultSize);
    }

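    /**
     * Aggregates the results of the CompletableFutures
     * Using the stream API directly reports unhandled exceptions, so a for loop is needed; hence this method
     *
     * @param futureList         list of asynchronous tasks
     * @param estimateResultSize estimated size of the returned List
     * @param <R>                type of the returned elements
     * @return all values returned by the asynchronous tasks
     */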
    public static <R> List<R> flatCompletableFutureList(List<CompletableFuture<List<R>>> futureList, Integer estimateResultSize) {
        var resultList = Objects.nonNull(estimateResultSize) && estimateResultSize >= 0 ?  Lists.<R>newArrayListWithCapacity(estimateResultSize) : Lists.<R>newArrayList();

        try {
            CompletableFuture.allOf(futureList.toArray(new CompletableFuture[0])).get();

            for (var future : futureList) {
                resultList.addAll(future.get());
            }

        } catch (InterruptedException | ExecutionException e) {
            e.printStackTrace();
            throw new BizException("Failed to get CompletableFuture results: " + e.getMessage());
        }

        return resultList;
    }

    public static <R> List<R> collectToList(CommonResponse<CommonPage<R>> response) {
        return Lists.newArrayList(response.data().getList());
    }

    public static <T, R> List<R> collectList(Function<T, CommonResponse<List<R>>> func, T arg) {
        var data = func.apply(arg).data();
        return CollectionUtil.isEmpty(data) ? Collections.emptyList() : data;
    }

}
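
For readers who want to try the single-threaded paging loop without PageQuery, CommonResponse or a Feign client, here is a stripped-down sketch against an in-memory list. All names in it (AutoPageSketch, pageCount, listAll, slice) are hypothetical, and the fetcher takes a 1-based page index and a page size instead of a query DTO.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

final class AutoPageSketch {

    // Ceiling division: a partial last page still counts as a page.
    static int pageCount(int totalCount, int pageSize) {
        return totalCount / pageSize + (totalCount % pageSize == 0 ? 0 : 1);
    }

    // Fetch page 1, 2, 3, ... until an empty page comes back.
    static <R> List<R> listAll(int pageSize, BiFunction<Integer, Integer, List<R>> fetcher) {
        List<R> result = new ArrayList<>();
        for (int pageIndex = 1; ; pageIndex++) {
            List<R> page = fetcher.apply(pageIndex, pageSize);
            if (page == null || page.isEmpty()) {
                break;
            }
            result.addAll(page);
        }
        return result;
    }

    // In-memory stand-in for a remote list endpoint.
    static <R> List<R> slice(List<R> all, int pageIndex, int pageSize) {
        int from = Math.min((pageIndex - 1) * pageSize, all.size());
        int to = Math.min(from + pageSize, all.size());
        return all.subList(from, to);
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4, 5);
        System.out.println(listAll(2, (index, size) -> slice(data, index, size))); // [1, 2, 3, 4, 5]
        System.out.println(pageCount(5, 2)); // 3
    }
}
```

The loop always issues one extra request to see the empty page. That is the cost of not asking for the total count first, and it only gives a consistent result when the remote side returns rows in a stable order.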