背景
最近公司一个项目,需要将系统里面的用户相关的数据导出到Excel,随着用户数据不断增长,现在已经有200多万的数据了。原来的导出方式,导致系统频繁出现内存溢出,迫不得已必须进行改造。
调优过程
以前的代码
String fileName = URLEncoder.encode("志愿者数据" + DateUtil.today(), "UTF-8");
response.setContentType("application/vnd.ms-excel");
response.setHeader("Content-Disposition", "attachment;filename=" + fileName + ".xlsx");
ServletOutputStream outputStream = response.getOutputStream();
//根据条件查询符合条件的全部数据,作为系统管理员的时候,可以导出数据库中的全部数据。
List<VolunteerUserVo> result = volunteerUserService.searchAll(x, null, null, null, volunteerName, labelName, phoneNumber, areaCode, StatusEnum.AUDIT.getValue() + "",
i, EXCEL_EXPORT_THRESHOLD, null, getSessionUser(token)).getResult();
EasyExcel.write(outputStream).sheet("sheet").doWrite(result);
重构第一版
/**
* 导出Excel时,单次查询返回数量的阈值50W
* 如果需导出的数量超过该阈值,分批次查询数据
* 这里这个阈值的设定,根据各位机器的实际情况而定
*/
private static final Integer EXCEL_EXPORT_THRESHOLD = 500000;
//查询符合条件的志愿者数量
Long count = volunteerUserService.searchCount(sex, areaCode, volunteerName, phoneNumber, labelName, getSessionUser(token));
long page = 1;
if (count > EXCEL_EXPORT_THRESHOLD) {
if (count % EXCEL_EXPORT_THRESHOLD == 0) {
page = count / EXCEL_EXPORT_THRESHOLD;
} else {
page = count / EXCEL_EXPORT_THRESHOLD + 1;
}
}
String fileName = URLEncoder.encode("志愿者数据" + DateUtil.today(), "UTF-8");
response.setContentType("application/vnd.ms-excel");
response.setHeader("Content-Disposition", "attachment;filename=" + fileName + ".xlsx");
ServletOutputStream outputStream = response.getOutputStream();
ExcelWriter excelWriter = EasyExcel.write(outputStream).build();
WriteSheet writeSheet;
for (int i = 1; i <= page; i++) {
//TODO 这个SearchPage方法还需要重构,不适用于Excel导出;思路:这次查询的数据的最后一条数据的ID,作为下次查询时候的ID偏移量,避免扫表
List<VolunteerUserVo> result = volunteerUserService.searchPage(sex, null, null, null, volunteerName, labelName, phoneNumber, areaCode, StatusEnum.AUDIT.getValue() + "",
i, EXCEL_EXPORT_THRESHOLD, null, getSessionUser(token)).getResult();
writeSheet = EasyExcel.writerSheet(i, "sheet" + i).head(VolunteerUserVo.class).build();
excelWriter.write(result, writeSheet);
}
excelWriter.finish();
outputStream.close();
}
这一版体现了分治的思想,但是对于数据库来说会多出很多次的查询,如果使用普通的 limit n,m ,当越往后面分页的时候效率越慢,这里需要优化;思路:每次分页查询的最后一条数据的ID,作为分页查询的Id偏移量,尽量避免不必要的扫表操作
这一版,还是会出现一个问题。那就是当多个系统管理员同时操作导出志愿者数据的时候,依旧可能会出现OOM异常。 是否需要有个保护措施?我认为是需要的,需要限制同时可以操作导出功能的用户量。
重构第二版
Long count = volunteerUserService.searchCount(sex, areaCode, volunteerName, phoneNumber, labelName, getSessionUser(token));
long page = 1;
if (count > EXCEL_EXPORT_THRESHOLD) {
if (count % EXCEL_EXPORT_THRESHOLD == 0) {
page = count / EXCEL_EXPORT_THRESHOLD;
} else {
page = count / EXCEL_EXPORT_THRESHOLD + 1;
}
}
String fileName = URLEncoder.encode("志愿者数据" + DateUtil.today(), "UTF-8");
response.setContentType("application/vnd.ms-excel");
response.setHeader("Content-Disposition", "attachment;filename=" + fileName + ".xlsx");
ServletOutputStream outputStream = response.getOutputStream();
ExcelWriter excelWriter = EasyExcel.write(outputStream).build();
WriteSheet writeSheet;
int offsetId = 0;
for (int i = 1; i <= page; i++) {
List<VolunteerUserVo> result = volunteerUserService.searchPageForExcel(offsetId, EXCEL_EXPORT_THRESHOLD, sex,
areaCode, volunteerName, phoneNumber, labelName, getSessionUser(token));
writeSheet = EasyExcel.writerSheet(i, "sheet" + i).head(VolunteerUserVo.class).build();
excelWriter.write(result, writeSheet);
if (CollectionUtils.isNotEmpty(result)){
offsetId = result.get(result.size()-1).getId();
}
}
excelWriter.finish();
outputStream.close();
这一版,还是会出现一个问题。那就是当多个系统管理员同时操作导出志愿者数据的时候,依旧可能会出现OOM异常。 是否需要有个保护措施?我认为是需要的,需要限制同时可以操作导出功能的用户量。
重构第三版
/**
* 当前正在内存中准备导出的Excle的行数的阈值
* 根据各位的机器硬件和需要导出的列的数据长度而定
**/
private static final Integer CURRENT_EXCEL_EXPORT_ROWS_THRESHOLD = 2300000;
Long count = volunteerUserService.searchCount(sex, areaCode, volunteerName, phoneNumber, labelName, getSessionUser(token));
synchronized (this){
log.info("Before--当前的currentExcelExportRows:{}",currentExcelExportRows);
if ((currentExcelExportRows+count) > CURRENT_EXCEL_EXPORT_ROWS_THRESHOLD){
throw new ApiResponseException(ApiResponseEnum.ERROR,"系统繁忙,请稍后再试");
}
}
currentExcelExportRows+=count;
log.info("After--当前的currentExcelExportRows:{}",currentExcelExportRows);
long page = 1;
if (count > EXCEL_EXPORT_THRESHOLD) {
if (count % EXCEL_EXPORT_THRESHOLD == 0) {
page = count / EXCEL_EXPORT_THRESHOLD;
} else {
page = count / EXCEL_EXPORT_THRESHOLD + 1;
}
}
String fileName = URLEncoder.encode("志愿者数据" + DateUtil.today(), "UTF-8");
response.setContentType("application/vnd.ms-excel");
response.setHeader("Content-Disposition", "attachment;filename=" + fileName + ".xlsx");
ServletOutputStream outputStream = response.getOutputStream();
ExcelWriter excelWriter = EasyExcel.write(outputStream).build();
WriteSheet writeSheet;
int offsetId = 0;
for (int i = 1; i <= page; i++) {
List<VolunteerUserVo> result = volunteerUserService.searchPageForExcel(offsetId, EXCEL_EXPORT_THRESHOLD, sex,
areaCode, volunteerName, phoneNumber, labelName, getSessionUser(token));
writeSheet = EasyExcel.writerSheet(i, "sheet" + i).head(VolunteerUserVo.class).build();
excelWriter.write(result, writeSheet);
if (CollectionUtils.isNotEmpty(result)){
offsetId = result.get(result.size()-1).getId();
}
synchronized (this){
currentExcelExportRows-=result.size();
}
}
excelWriter.finish();
outputStream.close();
总结
至此Excel导出优化完成。通过这次的优化,深刻的认识了将一个大问题划分为多个小问题,再优化各个小问题,这种思想可以做到原来看起来不可实现的事情。