前置知识
- 多线程&线程池
- JDBC手动事务提交
- 线程锁&线程同步工具
核心代码
@Service
@Slf4j
public class UserServiceImpl extends ServiceImpl<UserMapper, User> implements UserService {
private final ThreadPoolExecutor threadPoolExecutor = new ThreadPoolExecutor(8, 8, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
@Autowired
private TransactionTemplate transactionTemplate;
@Override
public void addAll() {
long startTime = System.currentTimeMillis();
int total = 1000000;
int size = 1000;
int batch = total / size;
log.info("总条数:{},批次:{},每批:{}条", total, batch, size);
CountDownLatch latch = new CountDownLatch(batch);
for (int i = 0; i < batch; i++) {
int finalI = i;
threadPoolExecutor.execute(() -> {
CopyOnWriteArrayList<User> arrayList = new CopyOnWriteArrayList<>();
for (int y = 0; y < size; y++) {
User user = new User();
user.setAge(finalI);
user.setSex(0);
user.setName(String.valueOf(finalI + y));
user.setMark(new BigDecimal(y));
arrayList.add(user);
}
transactionTemplate.execute(new TransactionCallbackWithoutResult() {
@Override
protected void doInTransactionWithoutResult(TransactionStatus transactionStatus) {
try {
boolean b = saveBatch(arrayList);
log.info("批量插入数据结果:{}", b);
transactionStatus.flush();
} catch (Exception e) {
e.printStackTrace();
log.error("批量插入数据错误:{}", e.getMessage());
transactionStatus.setRollbackOnly();
} finally {
latch.countDown();
}
}
});
});
}
try {
latch.await();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
long endTime = System.currentTimeMillis();
long elapsedTime = endTime - startTime;
System.out.println("数据插入执行时间: " + elapsedTime + "毫秒");
}
}
思路讲解
- 百万级数据插入&新增,肯定需要考虑分批分次,优先使用自定义线程池 ThreadPoolExecutor
- 定义及计算出分出的批次和每次提交的数据数量(需考虑数据宽度)
- 使用多线程同步类控制流程执行确保执行
- 在每个批次中启用手动提交jdbc事务,减少数据库io
参考项目
tip&思考
- 实际使用中数据数量很少会被定义的每批次条数整除,考虑结果向上取整 int batch = Math.ceil(total / size);
- 虽然分批插入数据可以减少执行时间,但是还会导致cpu飙升,实际需要根据服务器的硬件配置定义线程数量,根据数据库的接收宽度定义每批条数,在这种情况下有没有更好的方案?