1. Background
We're on ShardingSphere-JDBC 4.1.x. (I'm still fairly junior, so go easy on me.)
The data volume is large, but management wouldn't approve TiDB or a similar time-series-friendly database, so I went with daily table sharding. The MySQL instance was bought on Alibaba Cloud and comes with built-in read/write splitting, so read/write splitting is out of scope here.
2. Sharding Strategy
Precise sharding (by day):
```java
...
@Override
public String doSharding(Collection<String> availableTargetNames, PreciseShardingValue<Date> shardingValue) {
    // TABLE_NAME_SPILT is the date pattern for the physical table suffix, e.g. "_yyyyMd"
    return shardingValue.getLogicTableName() + DateUtil.format(shardingValue.getValue(), TABLE_NAME_SPILT);
}
...
```
Range sharding (by day):
```java
...
@Override
public Collection<String> doSharding(Collection<String> availableTargetNames, RangeShardingValue<Date> rangeShardingValue) {
    Range<Date> valueRange = rangeShardingValue.getValueRange();
    Date lowerDate = valueRange.lowerEndpoint();
    Date upperDate = valueRange.upperEndpoint();
    List<String> tableNameList = new ArrayList<>();
    // Walk every day in the range and map each one to a physical table name
    for (DateTime dateTime : DateUtil.rangeToList(DateUtil.beginOfDay(lowerDate), DateUtil.endOfDay(upperDate), DateField.DAY_OF_YEAR)) {
        String resultTableName = rangeShardingValue.getLogicTableName() + DateUtil.format(dateTime, "_yyyyMd");
        tableNameList.add(resultTableName);
    }
    return tableNameList;
}
...
```
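As a sanity check on the range algorithm, the day walk can be sketched with plain `java.time` instead of Hutool. The class and method names below are mine, not the project's; the suffix is built by string concatenation to reproduce the unpadded `"_yyyyMd"` format:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// A java.time sketch of the range algorithm: every day between the two
// endpoints (inclusive) maps to one physical table, suffixed "_yyyyMd"
// with no zero padding on month or day.
public class DayRangeTables {
    static List<String> tables(String logicTable, LocalDate lower, LocalDate upper) {
        List<String> result = new ArrayList<>();
        for (LocalDate d = lower; !d.isAfter(upper); d = d.plusDays(1)) {
            // e.g. 2022-12-01 -> "records_2022121"
            result.add(logicTable + "_" + d.getYear() + d.getMonthValue() + d.getDayOfMonth());
        }
        return result;
    }
}
```

Note the suffix is ambiguous for single-digit months and days ("records_2022111" could be Jan 11 or Nov 1), which is worth keeping in mind if tables ever need to be parsed back into dates.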
3. Pitfalls
3.1 Refreshing the expression
When sharding by day, the sharding-jdbc actual-data-nodes expression has to be refreshed automatically at 24:00 on the first day of each month.
For example, the expression for November 2022:
ds0.records_$->{2022..2022}$->{10..11}$->{1..31}
And the expression for December 2022:
ds0.records_$->{2022..2022}$->{11..12}$->{1..31}
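For context, in ShardingSphere 4.1.x this expression and the two algorithms above are typically wired together through Spring Boot properties along these lines. The logic table name `records`, the sharding column, and the algorithm class names are assumptions for illustration:

```properties
# actual-data-nodes for November 2022: year range, two-month window, days 1..31
spring.shardingsphere.sharding.tables.records.actual-data-nodes=ds0.records_$->{2022..2022}$->{10..11}$->{1..31}
# standard strategy: precise algorithm handles "=", range algorithm handles BETWEEN / < / >
spring.shardingsphere.sharding.tables.records.table-strategy.standard.sharding-column=modified_date
spring.shardingsphere.sharding.tables.records.table-strategy.standard.precise-algorithm-class-name=com.example.sharding.DayPreciseShardingAlgorithm
spring.shardingsphere.sharding.tables.records.table-strategy.standard.range-algorithm-class-name=com.example.sharding.DayRangeShardingAlgorithm
```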
```java
private final static String TABLE_NAMES_PREFIX = "cloudhub.sharding.jdbc.sharding-table-names";

@Override
@SneakyThrows
public void initialize(ConfigurableApplicationContext applicationContext) {
    ConfigurableEnvironment environment = applicationContext.getEnvironment();
    String shardingTableNames = environment.getProperty(TABLE_NAMES_PREFIX);
    if (StringUtils.isBlank(shardingTableNames)) {
        // Always blank before the Apollo/Nacos config center has loaded
        log.warn("sharding-jdbc plugin initialization failed, please check spring configs!!!");
        return;
    }
    super.init(environment, shardingTableNames);
}
```
```java
private final static String SHARDING_DATA_SOURCES_PREFIX = "spring.shardingsphere.datasource.names";

public void init(ConfigurableEnvironment environment, String shardingTableNames) {
    String dataSourceStr = environment.getProperty(SHARDING_DATA_SOURCES_PREFIX);
    if (StringUtils.isBlank(dataSourceStr)) {
        return;
    }
    Set<String> dataSources = Arrays.stream(dataSourceStr.split(",")).collect(Collectors.toSet());
    Map<String, String> needUpdateProperties = this.handleShardingDataNodes(shardingTableNames, dataSources);
    this.reloadProperties(needUpdateProperties, environment);
}
```
```java
/**
 * Builds the sharding-jdbc data-node expressions and returns the properties that need updating
 * @param tables comma-separated logic table names
 * @param dataSources data source names
 * @return properties to update, keyed by the actual-data-nodes property name
 */
private Map<String, String> handleShardingDataNodes(String tables, Set<String> dataSources) {
    // Note: Hutool's DateUtil.thisMonth() is ZERO-based. We keep two months of
    // data so the tables at the start of each month are always complete.
    int startMonth = DateUtil.thisMonth();
    int endMonth = DateUtil.thisMonth() + 1;
    // A start month of 0 means January; clamp it so we never create month-0 tables
    // (this also means December of the previous year is dropped across the year boundary)
    if (startMonth == 0) {
        startMonth = 1;
    }
    int currentYear = DateUtil.thisYear();
    String prefix1 = "spring.shardingsphere.sharding.tables.";
    String prefix2 = ".actual-data-nodes";
    // Generated value looks like: ds0.records_$->{2022..2022}$->{10..11}$->{1..31}
    // (days are hardcoded to 1..31 regardless of the actual month length)
    Map<String, String> needUpdateProperties = new HashMap<>();
    final int finalStartMonth = startMonth;
    for (String dataSource : dataSources) {
        Arrays.stream(tables.split(",")).distinct().forEach(table -> {
            String needUpdateKey = prefix1 + table + prefix2;
            String needUpdateValue = dataSource + "." + table +
                    "_$->{" + currentYear + ".." + currentYear + "}" +
                    "$->{" + finalStartMonth + ".." + endMonth + "}" +
                    "$->{1..31}";
            needUpdateProperties.put(needUpdateKey, needUpdateValue);
        });
    }
    return needUpdateProperties;
}
```
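The month-window logic can be checked with a Hutool-free sketch (class and method names here are mine): Hutool's zero-based `thisMonth()` is exactly `monthValue - 1`, so the window is {previous month .. current month}, clamped at January.

```java
import java.time.YearMonth;

// Rebuilds the same actual-data-nodes value with java.time instead of Hutool.
public class DataNodeExpr {
    static String build(String dataSource, String table, YearMonth current) {
        int endMonth = current.getMonthValue();        // 1-based current month
        int startMonth = Math.max(1, endMonth - 1);    // previous month; January has no month 0
        return dataSource + "." + table
                + "_$->{" + current.getYear() + ".." + current.getYear() + "}"
                + "$->{" + startMonth + ".." + endMonth + "}"
                + "$->{1..31}";
    }
}
```

The January case makes the year-boundary gap visible: the window collapses to `$->{1..1}`, so December of the previous year is no longer addressable.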
```java
/**
 * Applies the updated properties by registering PropertySources with the highest precedence
 * @param needUpdateProperties properties to override
 * @param environment the Spring environment
 */
@SuppressWarnings("unchecked")
public void reloadProperties(Map<String, String> needUpdateProperties, ConfigurableEnvironment environment) {
    MutablePropertySources sources = environment.getPropertySources();
    needUpdateProperties.forEach((k, v) -> {
        // One MapPropertySource per property key, added first so it wins over the original value
        PropertySource<?> source = sources.get(k);
        if (source == null) {
            source = new MapPropertySource(k, new HashMap<>());
            sources.addFirst(source);
        }
        ((Map<String, Object>) source.getSource()).put(k, v);
    });
}
```
However, after reading the sharding-jdbc source, it turns out that refreshing only the expression does not update the internal table metadata, and I couldn't find any entry point for refreshing that metadata. So instead of creating tables from the metadata, tables now have to be created from the expression rules, and the same applies to dropping expired tables. Problem solved.
3.2 JDBC batch insert
First, JDBC batch inserts require the JDBC URL parameter rewriteBatchedStatements=true. In my tests, batch inserts through sharding-jdbc 4.1.x were still very slow, so I switched to first splitting the records into per-day lists:
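For reference, the parameter goes on the MySQL JDBC URL; host and schema below are placeholders:

```properties
# rewriteBatchedStatements=true lets Connector/J rewrite a batch of single-row
# INSERTs into one multi-row INSERT, which is where the speedup comes from
jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true&useUnicode=true&characterEncoding=utf8
```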
```java
/**
 * Splits the records into per-day buckets keyed by the table-name suffix
 * @param list all records to be stored
 * @return records grouped by the suffix of the table they belong to
 */
private Map<String, List<RecordBatchSaveBO>> spiltListByDate(List<RecordBatchSaveBO> list) {
    Map<String, List<RecordBatchSaveBO>> map = new LinkedHashMap<>();
    for (RecordBatchSaveBO recordBatchSaveBO : list) {
        LocalDateTime createDate = recordBatchSaveBO.getModifiedDate();
        String sqlSpilt = "_" + dateTimeFormatter.format(createDate);
        map.computeIfAbsent(sqlSpilt, key -> new ArrayList<>()).add(recordBatchSaveBO);
    }
    return map;
}
```
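The same bucketing can be written with `Collectors.groupingBy`; a minimal sketch where `Record` and the suffix pattern stand in for the project's `RecordBatchSaveBO` and `dateTimeFormatter`:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DayBuckets {
    // Matches the unpadded "_yyyyMd" style suffix of the physical table names
    static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("yyyyMd");

    static class Record {
        final LocalDateTime modifiedDate;
        Record(LocalDateTime modifiedDate) { this.modifiedDate = modifiedDate; }
    }

    static Map<String, List<Record>> splitByDay(List<Record> records) {
        // LinkedHashMap keeps buckets in first-seen order, like the original loop
        return records.stream().collect(Collectors.groupingBy(
                r -> "_" + SUFFIX.format(r.modifiedDate),
                LinkedHashMap::new,
                Collectors.toList()));
    }
}
```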
Then do the batch insert with plain JDBC:
```java
@SneakyThrows
private void batchSaveByJDBC(String sqlSpilt, List<XXX> XXXList, Connection conn) {
    StopWatch stopWatch = new StopWatch();
    stopWatch.start();
    PreparedStatement preparedStatement = null;
    // auto-commit is disabled on this connection, so we have to commit manually
    try {
        String sql = config.getInsertSqlTable() + sqlSpilt + config.getInsertSqlValue();
        preparedStatement = conn.prepareStatement(sql);
        LocalDateTime now = LocalDateTime.now();
        for (XXX xxx : XXXList) {
            ...
            preparedStatement.setLong(1, snowFlake.nextId());
            ...
            // don't forget addBatch(), otherwise executeBatch() has nothing to run
            preparedStatement.addBatch();
        }
        preparedStatement.executeBatch();
        conn.commit();
        stopWatch.stop();
        if (log.isDebugEnabled()) {
            log.debug("the current mysql batch processing insert {} records use {} ms", XXXList.size(), stopWatch.getTotalTimeMillis());
        }
    } finally {
        if (preparedStatement != null) {
            preparedStatement.clearBatch();
            preparedStatement.close();
        }
    }
}
```
With that, the problem is solved: roughly 8,500 rows per second in batches (on an 8-core / 32 GB MySQL 8 instance):
DEBUG org.xxx.MysqlRecordsBatchProcessor - the current mysql batch processing insert 13514 records use 1588 ms
4. Summary
This was built under time pressure, so it's not great; there are probably better solutions, which I'll write up another time.