EasyExcel 百万数据导出实战篇大家好，我是雪荷。今天给大家分享一个较硬核的干货——利用多线程技术实现百万数据的导

前言

大家好，我是雪荷。今天给大家分享一个较硬核的干货——利用多线程技术实现百万数据的导出，学的好的话甚至可以写在简历上，话不说开始实战。

EasyExcel 如何实现数据写入 Excel

在我以前的文章介绍了如何利用 EasyExcel 实现多种方式 Excel 的写入，不了解的小伙伴可以👀这篇文章📓：blog.csdn.net/xyendjsj/ar…

前置准备

创建一个数据库表，模拟真实需要导出的场景。

create database if not exists lingxibi;

create table lingxibi.user
(
    id              bigint auto_increment comment 'id'
        primary key,
    userAccount     varchar(256)                                                                                                           not null comment '账号',
    userPassword    varchar(512)                                                                                                           not null comment '密码',
    userName        varchar(256)                                                                                                           null comment '用户昵称',
    userAvatar      varchar(1024) default 'https://th.bing.com/th/id/R.54a295a86f04aaf12f1285d4e00fd6be?rik=QAdEADu3LNh9Hg&pid=ImgRaw&r=0' null comment '用户头像',
    userRole        varchar(256)  default 'user'                                                                                           not null comment '用户角色：user/admin/ban',
    score           int           default 0                                                                                                not null comment '积分',
    generatingCount int           default 0                                                                                                not null comment '正在生成的图表数量',
    createTime      datetime      default CURRENT_TIMESTAMP                                                                                not null comment '创建时间',
    updateTime      datetime      default CURRENT_TIMESTAMP                                                                                not null on update CURRENT_TIMESTAMP comment '更新时间',
    isDelete        tinyint       default 0                                                                                                not null comment '是否删除'
)
    comment '用户' collate = utf8mb4_unicode_ci;

create index idx_userAccount
    on lingxibi.user (userAccount);

定义一个存储过程，往 user 表插入 100w 数据。说句题外话，插入大量数据建议用 Navicat ，别用 IDEA 自带的数据库工具，容易卡死。

use lingxibi;

DROP PROCEDURE IF EXISTS insert_users;

DELIMITER $$

CREATE PROCEDURE insert_users()
BEGIN
    -- 变量声明
    DECLARE i INT DEFAULT 1;

    -- 异常处理
    DECLARE CONTINUE HANDLER FOR SQLEXCEPTION
    BEGIN
        -- 出现异常时回滚事务
        ROLLBACK;
        SELECT 'Error occurred, transaction rolled back.' AS message;
    END;

    -- 开启事务
    START TRANSACTION;

    -- 循环插入数据
    WHILE i <= 1000000 DO

        -- 插入数据
        INSERT INTO `lingxibi`.`user` (userAccount, userPassword, userName, userAvatar, userRole, score, generatingCount, createTime, updateTime, isDelete)
        VALUES ('hanbao', 'b0dd3697a192885d7c055db46155b26a', '汉堡', NULL, 'user', FLOOR(RAND() * 1000), 0, NOW(), NOW(), 0);

        SET i = i + 1;
    END WHILE;

    -- 提交事务
    COMMIT;

    SELECT 'Data inserted successfully.' AS message;
END$$

DELIMITER ;

CALL insert_users();

调用结果如下：

随后创建一个 Spring Boot 工程，记得勾选 lombok 依赖，spring web 想要就要吧，其他就不需要了。

引入 EasyExcel 和 mysql 驱动、mybatis-plus 依赖。

        <!-- easyexcel -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>easyexcel</artifactId>
            <version>2.1.7</version>
        </dependency>
        <dependency>
            <groupId>com.mysql</groupId>
            <artifactId>mysql-connector-j</artifactId>
        </dependency>
        <dependency>
            <groupId>com.baomidou</groupId>
            <artifactId>mybatis-plus-boot-starter</artifactId>
            <version>3.5.2</version> <!-- 根据实际需求调整版本 -->
        </dependency>

用 MyBatisX 插件生成 Mapper 和 Service 层代码，将生成代码移入指定的包，在 resources 目录新建一个 user.xlsx 文件，具体结构如下：

技术难点

我们思考一下海量数据数据导出的技术难点是啥，我觉得主要有以下几点：

OOM：一次性导出数据会导致 OOM，不然也不会有这篇文章了
海量数据的查询：如何按照顺序依次导出数据，有些数据要求导出数据不能变
耗时问题：如果你足够有钱能升级内存，可能不会导致 OOM，但是导出时间肯定很长，导致服务崩溃。可能是垃圾回收耗时，可能是数据查询耗时。

如何解决

针对第一个和第三个问题，可以使用多线程分块写入数据，减少线程持有数据的时间，因为 GC 回收垃圾是在线程执行完毕后进行的
按照多线程分批的数据量和批次进行分页查询，但这样会导致深分页问题使得查询性能下降。因此我们可以按照主键 id 进行范围查询，比如 0 < id < 10000，10000 < id < 20000 以此类推，前提是主键 id 是顺序递增的。如果不行的话可以添加一个冗余字段，用代码维护字段的递增，或者如果表中没有用到递增列就设为递增列并设置二级索引，但这样插入性能低，而且数据量大时占用空间可不小呢。

代码实现

话不多说直接看代码吧，我创建了一个固定线程池，线程数为 5，然后每批数据量为 10 w，总共分 10 批。下面代码我就不解释了，相信看到这里的帅老们都能看懂。

@Component
public class ExcelUtil {

    @Resource
    private UserService userService;

    private ExecutorService executorService = Executors.newFixedThreadPool(5);

    public void exportData() {
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        // 获取 JVM 当前内存状态
        Runtime runtime = Runtime.getRuntime();
        long beforeUsedMemory = runtime.totalMemory() - runtime.freeMemory();
        // 获取数据总量和每页大小
        int pageSize = 100000;
        long totalSize = userService.count();
        // 计算总页数
        int totalPage = (int) ((totalSize + pageSize - 1) / pageSize);
        List<CompletableFuture<Void>> futureList = new ArrayList<>();
        final long[] minId = {0};
        for (int i = 0; i < totalPage; i++) {
            long currentMinId = minId[0];
            CompletableFuture<Void> future = CompletableFuture.runAsync(() -> {
                List<User> users = userService.queryByIdRange(currentMinId, pageSize);
                if (!users.isEmpty()) {
                    minId[0] = users.get(users.size() - 1).getId();
                }
                writeDataToExcel(users);
            }, executorService);
            futureList.add(future);
        }
        CompletableFuture.allOf(futureList.toArray(new CompletableFuture[0])).join();
        stopWatch.stop();
        long afterUsedMemory = runtime.totalMemory() - runtime.freeMemory();
        long memoryUsed = afterUsedMemory - beforeUsedMemory;
        System.out.println("导出数据耗时：" + stopWatch.getTotalTimeMillis() + "ms");
        System.out.println("导出数据内存使用：" + memoryUsed / (1024 * 1024) + " MB");
    }

    public void writeDataToExcel(List<User> users) {
        String fileName = "/Users/hejiajun/Documents/study/projects-demo/excel-demo/src/main/resources/user.xlsx";
        EasyExcel.write(fileName).sheet().doWrite(users);
    }
}

创建一个 Test 类，看看耗时时间和用了多少内存。

@SpringBootTest
public class ExcelTest {

    @Resource
    private ExcelUtil excelUtil;

    @Test
    public void test() {
        excelUtil.exportData();
    }
}

UserSerive.queryByIdRange 实现：

    public List<User> queryByIdRange(long minId, int limit) {
        return this.baseMapper.selectList(
                new LambdaQueryWrapper<User>()
                        .gt(User::getId, minId)   // 查询 ID 大于 minId 的记录
                        .orderByAsc(User::getId) // 按 ID 升序排序
                        .last("LIMIT " + limit  )  // 限制查询条数
        );
    }

结果如下，我们看到花了 14s 左右，内存为 611 MB

那么尝试将 pageSize 参数改为 25w 再导出试试。没想到吧，时间还长了，内存也变大了。

那我改下线程数试试呢？我改为 9 试试，pageSize依旧是 10w。好吗时间和内存都变大了。

我再把pageSize改为 25w，线程数还为 9。时间几乎差不多，但是内存变多了。

综上所述，你问我该如何调整每批的数据量还线程数我也不知道，唯有实践出真知。下面给一些线程池参数的建议：

CPU 密集型（数学计算）：CPU 核心数 + 1
IO 密集型（文件读取，数据库读取）：CPU 核心数 * 2 或更多些

值得注意的是我用的是固定线程池，即任务数量变多都线程数并不会变多和减少，适用于任务数量相对稳定，不会出现突发的大量任务的场景。

常见线程池的创建方式

1）FixedThreadPool：创建一个固定数量的线程池

线程池中的线程数是固定的，空闲的线程会被复用。如果所有线程都在忙，则新任务会放入队列中等待。适合负载稳定的场景，任务数量确定且不需要动态调整线程数。

2）CachedThreadPool：一个可以根据需要创建新线程的线程池

线程池的线程数量没有上限，空闲线程会在 60 秒后被回收，如果有新任务且没有可用线程，会创建新线程。

适合短期大量并发任务的场景，任务执行时间短且线程数需求变化较大。

3）SignleThreadExecutor：创建一个只有单线程的线程池

只有一个线程处理任务，任务会按照提交顺序执行。

适用于按顺序执行的场景，或不需要并发处理任务的场景

4）ScheduledThreadPool：支持定时任务和周期性任务的线程池

以定时或固定频次执行任务，如定时任务调度器。

使用周期性执行任务的场景，如定时任务调度器。

5）WorkStealingPool：基于任务窃取算法的线程池

线程池中的每个线程维护一个双端队列(deque)，线程可以从自己的队列中取任务执行。如果线程的任务队列为空，它可以从其他线程的队列中“窃取“任务来执行，达到负载均衡的效果。

适合大量小任务并行执行，特别是递归算法或大任务分解成小任务的场景。

注意

EasyExcel的 write 方法是线程不安全的，因此在多线程环境下需要进行加锁，以防出现多个线程写入文件导致损坏文件，但是不知道我没加锁也没损坏文件，有知道的大佬可以指导一下。另外有任何讲的不对的地方欢迎各位指正。

EasyExcel.write 方法加锁实例：

@Component
public class ExcelUtil {

    @Resource
    private UserService userService;

    private ExecutorService executorService = Executors.newFixedThreadPool(5);
    
    private final ReentrantLock lock = new ReentrantLock();

    public void exportData() {
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        // 获取 JVM 当前内存状态
        Runtime runtime = Runtime.getRuntime();
        long beforeUsedMemory = runtime.totalMemory() - runtime.freeMemory();
        // 获取数据总量和每页大小
        int pageSize = 100000;
        long totalSize = userService.count();
        // 计算总页数
        int totalPage = (int) ((totalSize + pageSize - 1) / pageSize);
        List<CompletableFuture<Void>> futureList = new ArrayList<>();
        final long[] minId = {0};
        for (int i = 0; i < totalPage; i++) {
            long currentMinId = minId[0];
            CompletableFuture<Void> future = CompletableFuture.runAsync(() -> {
                List<User> users = userService.queryByIdRange(currentMinId, pageSize);
                if (!users.isEmpty()) {
                    minId[0] = users.get(users.size() - 1).getId();
                }
                writeDataToExcel(users);
            }, executorService);
            futureList.add(future);
        }
        CompletableFuture.allOf(futureList.toArray(new CompletableFuture[0])).join();
        stopWatch.stop();
        long afterUsedMemory = runtime.totalMemory() - runtime.freeMemory();
        long memoryUsed = afterUsedMemory - beforeUsedMemory;
        System.out.println("导出数据耗时：" + stopWatch.getTotalTimeMillis() + "ms");
        System.out.println("导出数据内存使用：" + memoryUsed / (1024 * 1024) + " MB");
    }

    public void writeDataToExcel(List<User> users) {
        String fileName = "/Users/hejiajun/Documents/study/projects-demo/excel-demo/src/main/resources/user.xlsx";
        lock.lock();
        try {
            EasyExcel.write(fileName).sheet().doWrite(users);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            lock.unlock();
        }
    }
}

开源项目

厚米匹配

网址：hm.hejiajun.icu/

前端仓库：github.com/dnwwdwd/hom…

后端仓库：github.com/dnwwdwd/hom…

灵犀 BI

网址：bi.hejiajun.icu/

前端仓库：github.com/dnwwdwd/Lin…

后端仓库：github.com/dnwwdwd/Lin…