sharding-jdbc 4.1.x: Usage Notes and Pitfalls


1. Background

The version used is 4.1.x. I'm still fairly green technically, so go easy on me.

Our data volume is large, but the leadership would not approve a distributed database such as TiDB, so we went with daily table sharding instead. The MySQL instance was bought on Alibaba Cloud and comes with built-in read/write splitting, so read/write splitting is out of scope here.

2. Sharding strategy

Precise sharding (by day):

...
    @Override
    public String doSharding(Collection<String> availableTargetNames, PreciseShardingValue<Date> shardingValue) {
        // route to <logicTable> + formatted date suffix, e.g. records_2022111
        return shardingValue.getLogicTableName() + DateUtil.format(shardingValue.getValue(), TABLE_NAME_SPILT);
    }
...

Range sharding (by day):

...
    @Override
    public Collection<String> doSharding(Collection<String> availableTargetNames, RangeShardingValue<Date> rangeShardingValue) {
        Range<Date> valueRange = rangeShardingValue.getValueRange();
        Date lowerDate = valueRange.lowerEndpoint();
        Date upperDate = valueRange.upperEndpoint();
        List<String> tableNameList = new ArrayList<>();
        for (DateTime dateTime : DateUtil.rangeToList(DateUtil.beginOfDay(lowerDate), DateUtil.endOfDay(upperDate), DateField.DAY_OF_YEAR)) {
            String resultTableName = rangeShardingValue.getLogicTableName() + DateUtil.format(dateTime, "_yyyyMd");
            tableNameList.add(resultTableName);
        }
        return tableNameList;
    }
...    
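For reference, in ShardingSphere 4.1.x these two algorithm classes are typically wired to a logic table through Spring Boot properties like the following. The table name `records`, the sharding column `create_date`, and the fully qualified class names here are assumptions for illustration:

```properties
spring.shardingsphere.datasource.names=ds0
spring.shardingsphere.sharding.tables.records.actual-data-nodes=ds0.records_$->{2022..2022}$->{10..11}$->{1..31}
spring.shardingsphere.sharding.tables.records.table-strategy.standard.sharding-column=create_date
spring.shardingsphere.sharding.tables.records.table-strategy.standard.precise-algorithm-class-name=com.example.sharding.DatePreciseShardingAlgorithm
spring.shardingsphere.sharding.tables.records.table-strategy.standard.range-algorithm-class-name=com.example.sharding.DateRangeShardingAlgorithm
```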

3. Pitfalls encountered

3.1 Refreshing the expression

When sharding tables by day, the sharding-jdbc actual-data-nodes expression needs to be refreshed automatically at midnight on the first day of each month:

For example, the expression for November 2022:

ds0.records_$->{2022..2022}$->{10..11}$->{1..31}

and the expression for December 2022:

ds0.records_$->{2022..2022}$->{11..12}$->{1..31}

private final static String TABLE_NAMES_PREFIX = "cloudhub.sharding.jdbc.sharding-table-names";

    @Override
    @SneakyThrows
    public void initialize(ConfigurableApplicationContext applicationContext) {
        ConfigurableEnvironment environment = applicationContext.getEnvironment();
        String shardingTableNames = environment.getProperty(TABLE_NAMES_PREFIX);
        if(StringUtils.isBlank(shardingTableNames)){
            //always blank before the Apollo or Nacos config center has finished loading
            log.warn("sharding-jdbc plugin initialization failed, please check spring configs!!!");
            return;
        }
        super.init(environment, shardingTableNames);
    }
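For the initializer above to run before the ShardingSphere data source is built, it has to be registered as a Spring `ApplicationContextInitializer`. One standard way is `META-INF/spring.factories`; the class name here is an assumption:

```properties
# META-INF/spring.factories
org.springframework.context.ApplicationContextInitializer=com.example.sharding.ShardingTableInitializer
```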
private final static String SHARDING_DATA_SOURCES_PREFIX = "spring.shardingsphere.datasource.names";

    public void init(ConfigurableEnvironment environment, String shardingTableNames){
        String dataSourceStr = environment.getProperty(SHARDING_DATA_SOURCES_PREFIX);
        if(StringUtils.isBlank(dataSourceStr)){
            return;
        }
        Set<String> dataSources = Arrays.stream(dataSourceStr.split(",")).collect(Collectors.toSet());
        Map<String, String> needUpdateProperties = this.handleShardingDataNodes(shardingTableNames, dataSources);
        this.reloadProperties(needUpdateProperties, environment);
    }

    /**
     * Build the sharding-jdbc expressions and return the map of properties to update
     * @param tables      comma-separated logic table names
     * @param dataSources data source names
     */
    private Map<String, String> handleShardingDataNodes(String tables, Set<String> dataSources){
        //current month (note: Hutool's thisMonth() is 0-based); we keep two months of data so month-start data stays complete
        int startMonth = DateUtil.thisMonth();
        int endMonth = DateUtil.thisMonth() + 1;
        //if startMonth is 0 it is effectively January; avoid creating tables whose month prefix is 0
        if(startMonth == 0){
            startMonth = 1;
        }
        //current year
        int currentYear = DateUtil.thisYear();
        Integer lastDay = LocalDateTimeUtil.getMonthLastDay();
        String prefix1 = "spring.shardingsphere.sharding.tables.";
        String prefix2 = ".actual-data-nodes";
        //generated value looks like ds0.records_$->{2022..2022}$->{10..11}$->{1..31}
        Map<String, String> needUpdateProperties = new HashMap<>();
        final int finalStartMonth = startMonth;
        for(String dataSource : dataSources){
            Arrays.stream(tables.split(",")).distinct().forEach(table -> {
                String needUpdateKey = prefix1 + table + prefix2;
                String needUpdateValue = dataSource + "." + table +
                        "_$->{" + currentYear + ".." + currentYear + "}" +
                        "$->{" + finalStartMonth + ".." + endMonth + "}" +
                        "$->{1..31}";
                needUpdateProperties.put(needUpdateKey, needUpdateValue);
            });
        }
        return needUpdateProperties;
    }
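To make the string assembly concrete, here is a minimal, dependency-free sketch of the value that `handleShardingDataNodes` produces for one data source and one table (month numbers are passed in directly here, sidestepping the Hutool 0-based subtlety):

```java
public class DataNodesDemo {

    // Assemble an actual-data-nodes value such as
    // ds0.records_$->{2022..2022}$->{10..11}$->{1..31}
    static String buildDataNodes(String dataSource, String table,
                                 int year, int startMonth, int endMonth) {
        return dataSource + "." + table
                + "_$->{" + year + ".." + year + "}"
                + "$->{" + startMonth + ".." + endMonth + "}"
                + "$->{1..31}";
    }

    public static void main(String[] args) {
        // In November 2022 the two-month window is {10..11}
        System.out.println(buildDataNodes("ds0", "records", 2022, 10, 11));
        // prints ds0.records_$->{2022..2022}$->{10..11}$->{1..31}
    }
}
```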

    /**
     * Apply the updated properties by writing them into a PropertySource
     * @param needUpdateProperties
     * @param environment
     */
    public void reloadProperties(Map<String, String> needUpdateProperties, ConfigurableEnvironment environment){
        MutablePropertySources sources = environment.getPropertySources();
        needUpdateProperties.forEach((k, v)->{
            PropertySource<?> source = sources.get(k);
            if (source == null) {
                source = new MapPropertySource(k, new HashMap<>());
                sources.addFirst(source);
            }
            ((Map<String, Object>) source.getSource()).put(k, v);
        });
    }

However, after reading the sharding-jdbc source code, it turns out that refreshing the expression alone does not change the internal metadata, and I could not find an entry point for refreshing that metadata either. So instead of creating tables automatically from the metadata, we now create them from the expression rules; dropping expired tables works the same way. Problem solved.

3.2 JDBC batch insert

First of all, JDBC batch inserts need the JDBC URL parameter rewriteBatchedStatements=true. In our tests, batch inserts through the current 4.1.x shardingjdbc were still very slow, so we switched to first splitting the records into per-day lists:
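A minimal example of such a JDBC URL (host, port and database name are placeholders):

```
jdbc:mysql://localhost:3306/yourdb?rewriteBatchedStatements=true
```

Without this flag, MySQL Connector/J sends batched INSERTs one statement at a time; with it, they are rewritten into multi-row INSERTs.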

    /**
     * Group records by their year/month/day table suffix and return the batches to insert
     * @param list the full list of records to store
     */
    private Map<String, List<RecordBatchSaveBO>> spiltListByDate(List<RecordBatchSaveBO> list){
        Map<String, List<RecordBatchSaveBO>> map = new LinkedHashMap<>();
        for(RecordBatchSaveBO recordBatchSaveBO : list){
            LocalDateTime createDate = recordBatchSaveBO.getModifiedDate();
            String sqlSpilt = "_" + dateTimeFormatter.format(createDate).intern();
            List<RecordBatchSaveBO> mapList = map.get(sqlSpilt);
            if(CollectionUtil.isEmpty(mapList)){
                mapList = new ArrayList<>();
            }
            mapList.add(recordBatchSaveBO);
            map.put(sqlSpilt, mapList);
        }
        return map;
    }
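The grouping above can be written a little more compactly with `Map.computeIfAbsent`. A self-contained sketch, using `LocalDateTime` values directly in place of the real `RecordBatchSaveBO`:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitDemo {

    // same suffix pattern as the range-sharding algorithm above
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMd");

    // group values by their per-day table suffix, preserving insertion order
    static Map<String, List<LocalDateTime>> splitByDate(List<LocalDateTime> dates) {
        Map<String, List<LocalDateTime>> map = new LinkedHashMap<>();
        for (LocalDateTime d : dates) {
            map.computeIfAbsent("_" + FMT.format(d), k -> new ArrayList<>()).add(d);
        }
        return map;
    }

    public static void main(String[] args) {
        List<LocalDateTime> dates = List.of(
                LocalDateTime.of(2022, 11, 1, 10, 0),
                LocalDateTime.of(2022, 11, 1, 12, 0),
                LocalDateTime.of(2022, 11, 2, 9, 0));
        // two buckets: _2022111 (two records) and _2022112 (one record)
        System.out.println(splitByDate(dates).keySet());
    }
}
```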

Then each batch is inserted with plain JDBC:

    @SneakyThrows
    private void batchSaveByJDBC(String sqlSpilt, List<XXX> XXXList, Connection conn){
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        PreparedStatement preparedStatement = null;
        //auto-commit must be disabled on this connection; we commit manually below
        try{
            String sql = config.getInsertSqlTable().intern() + sqlSpilt + config.getInsertSqlValue().intern();
            preparedStatement = conn.prepareStatement(sql);
            LocalDateTime now = LocalDateTime.now();
            for(XXX xxx : XXXList){
                ...
                preparedStatement.setLong(1, snowFlake.nextId());
                ...
                //addBatch() is required for executeBatch() below to have any effect
                preparedStatement.addBatch();
            }
            preparedStatement.executeBatch();
            conn.commit();
            stopWatch.stop();
            if(log.isDebugEnabled()){
                log.debug("the current mysql batch processing insert {} records use {} ms", XXXList.size(), stopWatch.getTotalTimeMillis());
            }
        } finally {
            if(preparedStatement != null){
                preparedStatement.clearBatch();
                preparedStatement.close();
            }
        }
    }

With that, the problem was solved: roughly 8,500 rows per second in batch inserts (on MySQL 8, 8 cores / 32 GB):

DEBUG org.xxx.MysqlRecordsBatchProcessor - the current mysql batch processing insert 13514 records use 1588 ms

4. Summary

This was done under time pressure and is not great; there may well be better solutions, to be written up another time.