Optimization Notes: Exporting Millions of Rows with EasyExcel


Background

A recent project at my company needs to export user-related data from the system to Excel. As the user base has kept growing, there are now more than two million rows. The original export approach caused the system to run out of memory repeatedly, so a rework became unavoidable.

Optimization Process

The Original Code

    String fileName = URLEncoder.encode("志愿者数据" + DateUtil.today(), "UTF-8");
    response.setContentType("application/vnd.ms-excel");
    response.setHeader("Content-Disposition", "attachment;filename=" + fileName + ".xlsx");
    ServletOutputStream outputStream = response.getOutputStream();
    // Query ALL rows matching the filters in a single call; a system administrator can
    // effectively pull every row in the table this way.
    List<VolunteerUserVo> result = volunteerUserService.searchAll(sex, null, null, null, volunteerName, labelName, phoneNumber, areaCode, StatusEnum.AUDIT.getValue() + "",
            null, null, null, getSessionUser(token)).getResult();
    EasyExcel.write(outputStream).sheet("sheet").doWrite(result);

Refactor, Version 1

    /**
     * Threshold for the number of rows returned by a single query during Excel export: 500,000.
     * If the total number of rows to export exceeds this threshold, the data is fetched in batches.
     * Tune this value to the actual capacity of your own machine.
     */
    private static final Integer EXCEL_EXPORT_THRESHOLD = 500000;

    // Count the volunteers matching the filters
    Long count = volunteerUserService.searchCount(sex, areaCode, volunteerName, phoneNumber, labelName, getSessionUser(token));
    long page = 1;
    if (count > EXCEL_EXPORT_THRESHOLD) {
        if (count % EXCEL_EXPORT_THRESHOLD == 0) {
            page = count / EXCEL_EXPORT_THRESHOLD;
        } else {
            page = count / EXCEL_EXPORT_THRESHOLD + 1;
        }
    }
    String fileName = URLEncoder.encode("志愿者数据" + DateUtil.today(), "UTF-8");
    response.setContentType("application/vnd.ms-excel");
    response.setHeader("Content-Disposition", "attachment;filename=" + fileName + ".xlsx");
    ServletOutputStream outputStream = response.getOutputStream();
    ExcelWriter excelWriter = EasyExcel.write(outputStream).build();
    WriteSheet writeSheet;
    for (int i = 1; i <= page; i++) {
        // TODO searchPage still needs rework; it is not well suited to Excel export.
        // Idea: use the ID of the last row of this batch as the offset for the next query,
        // so that later pages do not scan the whole table.
        List<VolunteerUserVo> result = volunteerUserService.searchPage(sex, null, null, null, volunteerName, labelName, phoneNumber, areaCode, StatusEnum.AUDIT.getValue() + "",
                i, EXCEL_EXPORT_THRESHOLD, null, getSessionUser(token)).getResult();

        writeSheet = EasyExcel.writerSheet(i, "sheet" + i).head(VolunteerUserVo.class).build();
        excelWriter.write(result, writeSheet);
    }
    excelWriter.finish();
    outputStream.close();

This version applies a divide-and-conquer approach, but it also sends many more queries to the database. With an ordinary limit n,m, the deeper the page, the slower the query gets, so this still needs work. The idea: take the ID of the last row returned by each page query and use it as the ID offset for the next query, avoiding unnecessary table scans as far as possible.
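To make that idea concrete, here is a minimal sketch of what such a keyset-paginated query could look like as a MyBatis mapper. The interface name, the column list and the trimmed-down parameters are assumptions made purely for illustration; the real searchPageForExcel used in the next version carries the full set of filter conditions.

    // Hypothetical mapper sketch: keyset pagination instead of LIMIT offset, size.
    // Interface name, columns and the reduced parameter list are illustrative only.
    import java.util.List;
    import org.apache.ibatis.annotations.Mapper;
    import org.apache.ibatis.annotations.Param;
    import org.apache.ibatis.annotations.Select;

    @Mapper
    public interface VolunteerUserExportMapper {

        /**
         * Fetches the next batch of rows whose id is greater than the last id already
         * exported. The primary-key index lets the database seek straight to the starting
         * point instead of scanning and discarding the skipped rows, so every batch costs
         * roughly the same no matter how deep into the data set it is.
         */
        @Select("SELECT id, volunteer_name, phone_number, area_code "
                + "FROM volunteer_user "
                + "WHERE id > #{offsetId} AND status = #{status} "
                + "ORDER BY id ASC "
                + "LIMIT #{batchSize}")
        List<VolunteerUserVo> selectNextBatch(@Param("offsetId") long offsetId,
                                              @Param("status") String status,
                                              @Param("batchSize") int batchSize);
    }

The caller then feeds the last id of each returned batch back in as offsetId, which is exactly what the second version below does.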


Refactor, Version 2

    Long count = volunteerUserService.searchCount(sex, areaCode, volunteerName, phoneNumber, labelName, getSessionUser(token));
    long page = 1;
    if (count > EXCEL_EXPORT_THRESHOLD) {
        if (count % EXCEL_EXPORT_THRESHOLD == 0) {
            page = count / EXCEL_EXPORT_THRESHOLD;
        } else {
            page = count / EXCEL_EXPORT_THRESHOLD + 1;
        }
    }
    String fileName = URLEncoder.encode("志愿者数据" + DateUtil.today(), "UTF-8");
    response.setContentType("application/vnd.ms-excel");
    response.setHeader("Content-Disposition", "attachment;filename=" + fileName + ".xlsx");
    ServletOutputStream outputStream = response.getOutputStream();
    ExcelWriter excelWriter = EasyExcel.write(outputStream).build();
    WriteSheet writeSheet;
    int offsetId = 0;
    for (int i = 1; i <= page; i++) {
        // Keyset pagination: each batch starts right after the last ID of the previous
        // batch, so the database never scans and discards the rows already exported.
        List<VolunteerUserVo> result = volunteerUserService.searchPageForExcel(offsetId, EXCEL_EXPORT_THRESHOLD, sex,
                areaCode, volunteerName, phoneNumber, labelName, getSessionUser(token));
        writeSheet = EasyExcel.writerSheet(i, "sheet" + i).head(VolunteerUserVo.class).build();
        excelWriter.write(result, writeSheet);
        if (CollectionUtils.isNotEmpty(result)) {
            offsetId = result.get(result.size() - 1).getId();
        }
    }
    excelWriter.finish();
    outputStream.close();
This version still has one problem: when several system administrators run the volunteer export at the same time, an OOM can still occur. Do we need a safeguard? I believe we do: we need to limit the number of users who can run the export at the same time.
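The bluntest way to enforce such a limit would be to cap the number of export requests running at once, for example with a Semaphore. The sketch below is only an illustration under assumed names (exportVolunteers, MAX_CONCURRENT_EXPORTS, exportPermits); it is not the approach taken in the third version, which instead caps the total number of rows in flight so that one huge export and several small ones are weighted differently.

    import java.util.concurrent.Semaphore;

    // Illustrative alternative, not the approach used below: allow at most
    // MAX_CONCURRENT_EXPORTS export requests to run at the same time.
    private static final int MAX_CONCURRENT_EXPORTS = 2;
    private final Semaphore exportPermits = new Semaphore(MAX_CONCURRENT_EXPORTS);

    public void exportVolunteers(HttpServletResponse response) throws IOException {
        if (!exportPermits.tryAcquire()) {
            // Fail fast instead of queueing requests and risking an OOM.
            throw new ApiResponseException(ApiResponseEnum.ERROR, "系统繁忙,请稍后再试");
        }
        try {
            // ... run the batched, keyset-paginated export from the previous version ...
        } finally {
            exportPermits.release();
        }
    }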

Refactor, Version 3

    /**
     * Threshold for the total number of Excel rows that may be held in memory across
     * all exports currently in progress.
     * Tune this to your hardware and to the width of the rows being exported.
     */
    private static final Integer CURRENT_EXCEL_EXPORT_ROWS_THRESHOLD = 2300000;

    /** Rows currently reserved by in-flight exports; guarded by synchronized (this). */
    private long currentExcelExportRows = 0;

    Long count = volunteerUserService.searchCount(sex, areaCode, volunteerName, phoneNumber, labelName, getSessionUser(token));
    synchronized (this) {
        log.info("Before--currentExcelExportRows:{}", currentExcelExportRows);
        if ((currentExcelExportRows + count) > CURRENT_EXCEL_EXPORT_ROWS_THRESHOLD) {
            throw new ApiResponseException(ApiResponseEnum.ERROR, "系统繁忙,请稍后再试");
        }
        // Reserve the rows inside the same synchronized block as the check; otherwise two
        // requests could both pass the check and overshoot the threshold together.
        currentExcelExportRows += count;
    }
    log.info("After--currentExcelExportRows:{}", currentExcelExportRows);
    long page = 1;
    if (count > EXCEL_EXPORT_THRESHOLD) {
        if (count % EXCEL_EXPORT_THRESHOLD == 0) {
            page = count / EXCEL_EXPORT_THRESHOLD;
        } else {
            page = count / EXCEL_EXPORT_THRESHOLD + 1;
        }
    }
    String fileName = URLEncoder.encode("志愿者数据" + DateUtil.today(), "UTF-8");
    response.setContentType("application/vnd.ms-excel");
    response.setHeader("Content-Disposition", "attachment;filename=" + fileName + ".xlsx");
    ServletOutputStream outputStream = response.getOutputStream();
    ExcelWriter excelWriter = EasyExcel.write(outputStream).build();
    WriteSheet writeSheet;
    int offsetId = 0;
    for (int i = 1; i <= page; i++) {
        List<VolunteerUserVo> result = volunteerUserService.searchPageForExcel(offsetId, EXCEL_EXPORT_THRESHOLD, sex,
                areaCode, volunteerName, phoneNumber, labelName, getSessionUser(token));
        writeSheet = EasyExcel.writerSheet(i, "sheet" + i).head(VolunteerUserVo.class).build();
        excelWriter.write(result, writeSheet);
        if (CollectionUtils.isNotEmpty(result)) {
            offsetId = result.get(result.size() - 1).getId();
        }
        // Release this batch's share of the reservation once it has been written out.
        synchronized (this) {
            currentExcelExportRows -= result.size();
        }
    }
    excelWriter.finish();
    outputStream.close();

Summary

That wraps up the Excel export optimization. This exercise drove home how powerful it is to split one big problem into several smaller ones and then optimize each of them in turn; with that mindset, things that originally looked infeasible become achievable.