ShardingSphere from Scratch


ShardingSphere's core value is database middleware that stays transparent to the business: through a JDBC driver or a proxy layer, the application works with sharded databases and tables exactly as if it were talking to a single database and table, while gaining enterprise features such as read/write splitting, distributed transactions, and data encryption. Understanding its execution pipeline of SQL parsing → routing → rewriting → execution → merging is the key to mastering ShardingSphere.

1. Why Shard: Background and Problems

1.1 Why Split Databases and Tables

graph TB
    subgraph "Single-database bottlenecks"
        A[Monolithic database] -->|User growth| B["Connections exhausted<br/>max_connections"]
        A -->|Data growth| C[Oversized single table<br/>slow queries / index bloat]
        A -->|Concurrency growth| D[CPU/IO bottleneck<br/>TPS/QPS capped]
        A -->|Storage growth| E[Disk running out<br/>slow backup and restore]
    end
    
    subgraph "What sharding solves"
        F[Database/table sharding] -->|Horizontal split| G[Data spread over multiple DBs and tables]
        F -->|Read/write splitting| H[Reads offloaded to replicas]
        F -->|Vertical split| I["Business decoupling, hot/cold separation"]
    end
    
    A -.->|Evolution| F
    
    style A fill:#ffcdd2
    style F fill:#c8e6c9
    style G fill:#e3f2fd
    style H fill:#fff3e0
    style I fill:#fce4ec

What drives sharding:

| Problem | Symptom | How sharding helps |
| --- | --- | --- |
| Connection exhaustion | MySQL defaults to max_connections=151; connections run out under high concurrency | Split databases to spread connection pressure |
| Oversized tables | Query performance drops sharply once a table passes roughly five million rows | Horizontal table split to spread the data |
| Write throughput ceiling | A single database tops out at roughly 2,000-3,000 write TPS | Split databases to raise write concurrency |
| Storage limit | A single database is bound by one machine's disk | More databases, more storage |
| Business coupling | Several businesses share one database and interfere with each other | Vertical split to isolate businesses |

1.2 The Challenges Sharding Brings

graph TB
    subgraph "Problems introduced by sharding"
        A[SQL routing] -->|Which DB/table should a statement go to| B[Sharding key choice<br/>routing algorithm]
        C[Distributed transactions] -->|Keeping ACID across databases| D[Eventual consistency<br/>compensation]
        E[Global IDs] -->|Auto-increment IDs collide across shards| F[Snowflake<br/>UUID]
        G[Cross-shard queries] -->|Aggregation / sorting / paging| H[Result-set merging<br/>performance cost]
        I[Data migration] -->|Scaling out smoothly| J[Dual write<br/>incremental sync]
        K[Operational complexity] -->|Many instances to manage| L[Monitoring / backup / failover]
    end
    
    style A fill:#ffcdd2
    style C fill:#ffcdd2
    style E fill:#ffcdd2
    style G fill:#ffcdd2
    style I fill:#ffcdd2
    style K fill:#ffcdd2

2. Core Sharding Concepts

2.1 Vertical vs Horizontal Partitioning


Vertical vs horizontal partitioning compared:

| Dimension | Vertical | Horizontal |
| --- | --- | --- |
| Split by | Columns / business domain | Rows |
| Schema per shard | Different | Identical |
| Problem solved | IO contention, too many columns | Data volume, write concurrency |
| Complexity | Lower | Higher |
| JOIN issue | Cross-database JOIN | Cross-shard aggregation |
| Scaling | Per business | Per data volume |

2.2 Core Concepts of Horizontal Sharding

graph TB
    subgraph "Horizontal sharding architecture"
        A["Logical table: t_order"] -->|Sharding rule| B["Sharding key: order_id"]
        B -->|Routing algorithm| C[ds_0.t_order_0]
        B -->|Routing algorithm| D[ds_0.t_order_1]
        B -->|Routing algorithm| E[ds_1.t_order_0]
        B -->|Routing algorithm| F[ds_1.t_order_1]
        
        G["Data source: ds_0"] --> C
        G --> D
        H["Data source: ds_1"] --> E
        H --> F
    end
    
    style A fill:#e3f2fd
    style B fill:#fff3e0
    style G fill:#c8e6c9
    style H fill:#c8e6c9

Core terminology:

| Term | Definition | Example |
| --- | --- | --- |
| Logical table | Table name the application sees | t_order |
| Actual table | Physical table in the database | t_order_0, t_order_1 |
| Data node | Data source plus actual table | ds_0.t_order_0 |
| Sharding key | Column used to shard by | order_id |
| Sharding algorithm | How rows are distributed across shards | hash(order_id) % 4 |
| Binding tables | Related tables sharing the same sharding rule | t_order, t_order_item |
| Broadcast table | Table replicated in every data source | t_config |
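To make the terms concrete, here is a minimal sketch (plain Java, not ShardingSphere API; the class and method names are invented) of how an INLINE-style expression such as `ds_${user_id % 2}.t_order_${order_id % 2}` maps a logical-table row to a data node:

```java
// Illustrative routing from logical table to data node.
// Mirrors the inline expressions ds_${user_id % 2} / t_order_${order_id % 2};
// class and method names are hypothetical, not part of ShardingSphere.
public class DataNodeRouter {

    /** Route one row of the logical table t_order to its data node. */
    public static String route(long userId, long orderId) {
        String dataSource = "ds_" + (userId % 2);        // database sharding by user_id
        String actualTable = "t_order_" + (orderId % 2); // table sharding by order_id
        return dataSource + "." + actualTable;           // data node, e.g. ds_1.t_order_0
    }

    public static void main(String[] args) {
        System.out.println(route(1, 100)); // ds_1.t_order_0
    }
}
```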

3. ShardingSphere Overview

3.1 What Is ShardingSphere

graph TB
    subgraph "The ShardingSphere ecosystem"
        A[Apache ShardingSphere] --> B["ShardingSphere-JDBC<br/>Java driver, direct-connect mode"]
        A --> C["ShardingSphere-Proxy<br/>proxy service, deployed standalone"]
        A --> D["ShardingSphere-Sidecar<br/>cloud native, sidecar mode"]
        
        B -->|Positioning| B1[Lightweight Java framework<br/>deployed with the application]
        C -->|Positioning| C1[Database proxy<br/>similar to MyCat<br/>any client language]
        D -->|Positioning| D1["Kubernetes sidecar<br/>cloud-native architecture"]
    end
    
    style A fill:#e3f2fd
    style B fill:#c8e6c9
    style C fill:#fff3e0
    style D fill:#fce4ec

The three access modes compared:

| Feature | ShardingSphere-JDBC | ShardingSphere-Proxy | ShardingSphere-Sidecar |
| --- | --- | --- | --- |
| Deployment | Embedded in the app | Standalone service | Kubernetes sidecar |
| Performance | Highest (no network hop) | High (one proxy hop) | High |
| Polyglot support | Java only | Any language | Any language |
| Ops cost | Low | Medium | Medium |
| Typical use | Java apps, high performance | Polyglot, central management | Cloud-native architecture |
| DB connection usage | Each app holds its own connections | Shared pool | Shared pool |

3.2 History

graph LR
    A[2016<br/>Sharding-JDBC 1.x] --> B[2018<br/>ShardingSphere 3.x<br/>Apache incubation]
    B --> C[2020<br/>ShardingSphere 4.x<br/>Apache Top-Level Project]
    C --> D[2021+<br/>ShardingSphere 5.x<br/>pluggable architecture]
    D --> E[2024+<br/>ShardingSphere 5.5+<br/>maturing ecosystem]
    
    style A fill:#e3f2fd
    style B fill:#c8e6c9
    style C fill:#fff3e0
    style D fill:#fce4ec
    style E fill:#e8f5e9

3.3 Core Features

graph TB
    subgraph "ShardingSphere feature landscape"
        A[Data sharding] --> A1[Database/table sharding]
        A --> A2[Read/write splitting]
        A --> A3[Sharding strategies]
        
        B[Distributed transactions] --> B1[XA strong consistency]
        B --> B2[BASE flexible transactions]
        B --> B3[Seata AT]
        
        C[Database governance] --> C1[Dynamic configuration]
        C --> C2[Orchestration and governance]
        C --> C3[Data encryption/masking]
        C --> C4[Access control]
        
        D[Observability] --> D1[Distributed tracing]
        D --> D2[Metrics]
        
        E[SQL compatibility] --> E1[MySQL/PostgreSQL/Oracle/SQLServer]
    end
    
    style A fill:#e3f2fd
    style B fill:#c8e6c9
    style C fill:#fff3e0
    style D fill:#fce4ec
    style E fill:#e8f5e9

4. ShardingSphere-JDBC in Depth

4.1 Architecture and Positioning

graph TB
    subgraph "ShardingSphere-JDBC architecture"
        A[Java application] -->|JDBC API| B[ShardingSphere-JDBC]
        B -->|SQL parsing| C[SQL Parser]
        C -->|Routing| D[Router]
        D -->|Rewriting| E[Rewriter]
        E -->|Execution| F[Executor]
        F -->|Result merging| G[Merger]
        
        B -->|Configuration| H[ShardingRule<br/>DataSource settings]
        
        F -->|JDBC connection| I[ds_0<br/>MySQL]
        F -->|JDBC connection| J[ds_1<br/>MySQL]
        F -->|JDBC connection| K[ds_2<br/>MySQL]
    end
    
    style A fill:#e3f2fd
    style B fill:#c8e6c9
    style C fill:#fff3e0
    style D fill:#fce4ec
    style E fill:#e8f5e9
    style F fill:#f3e5f5
    style G fill:#c8e6c9

The SQL execution flow:

sequenceDiagram
    participant App as Java application
    participant SS as ShardingSphere-JDBC
    participant P as SQL Parser
    participant R as Router
    participant RW as Rewriter
    participant E as Executor
    participant M as Merger
    participant DB as Database
    
    App->>SS: 1. Execute SQL<br/>SELECT * FROM t_order WHERE order_id = 100
    SS->>P: 2. Parse SQL
    P-->>SS: Abstract syntax tree (AST)
    
    SS->>R: 3. Compute route
    Note over R: order_id=100<br/>algorithm: order_id % 4 = 0<br/>route to: ds_0.t_order_0
    R-->>SS: Route result: ds_0.t_order_0
    
    SS->>RW: 4. Rewrite SQL
    Note over RW: Rewritten as:<br/>SELECT * FROM t_order_0 WHERE order_id = 100
    RW-->>SS: Rewritten SQL
    
    SS->>E: 5. Execute SQL
    E->>DB: Run SQL
    DB-->>E: Result set
    
    E->>M: 6. Process results
    M-->>SS: Merged result (single shard: nothing to merge)
    
    SS-->>App: 7. Return result
    
    

4.2 Quick Start Example

// ========== 1. Add the dependency ==========
// pom.xml
/*
<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>shardingsphere-jdbc-core</artifactId>
    <version>5.5.0</version>
</dependency>
*/

// ========== 2. Configure data sources (Java API) ==========
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import org.apache.shardingsphere.driver.api.ShardingSphereDataSourceFactory;
import org.apache.shardingsphere.infra.config.algorithm.AlgorithmConfiguration;
import org.apache.shardingsphere.sharding.api.config.ShardingRuleConfiguration;
import org.apache.shardingsphere.sharding.api.config.rule.ShardingTableRuleConfiguration;
import org.apache.shardingsphere.sharding.api.config.strategy.keygen.KeyGenerateStrategyConfiguration;
import org.apache.shardingsphere.sharding.api.config.strategy.sharding.StandardShardingStrategyConfiguration;

import javax.sql.DataSource;
import java.sql.*;
import java.util.*;

public class ShardingSphereDemo {
    
    public static void main(String[] args) throws Exception {
        // Create the physical data sources
        Map<String, DataSource> dataSourceMap = new HashMap<>();
        
        // First data source: ds_0
        HikariConfig hikariConfig0 = new HikariConfig();
        hikariConfig0.setDriverClassName("com.mysql.cj.jdbc.Driver");
        hikariConfig0.setJdbcUrl("jdbc:mysql://localhost:3306/db0?useSSL=false");
        hikariConfig0.setUsername("root");
        hikariConfig0.setPassword("password");
        dataSourceMap.put("ds_0", new HikariDataSource(hikariConfig0));
        
        // Second data source: ds_1
        HikariConfig hikariConfig1 = new HikariConfig();
        hikariConfig1.setDriverClassName("com.mysql.cj.jdbc.Driver");
        hikariConfig1.setJdbcUrl("jdbc:mysql://localhost:3306/db1?useSSL=false");
        hikariConfig1.setUsername("root");
        hikariConfig1.setPassword("password");
        dataSourceMap.put("ds_1", new HikariDataSource(hikariConfig1));
        
        // ========== 3. Configure sharding rules ==========
        ShardingRuleConfiguration shardingRuleConfig = new ShardingRuleConfiguration();
        
        // Sharding rule for t_order
        ShardingTableRuleConfiguration orderTableRuleConfig = 
            new ShardingTableRuleConfiguration("t_order", "ds_${0..1}.t_order_${0..1}");
        
        // Database strategy: shard databases by user_id
        orderTableRuleConfig.setDatabaseShardingStrategy(
            new StandardShardingStrategyConfiguration("user_id", "db_inline"));
        
        // Table strategy: shard tables by order_id
        orderTableRuleConfig.setTableShardingStrategy(
            new StandardShardingStrategyConfiguration("order_id", "table_inline"));
        
        // Distributed key generation
        orderTableRuleConfig.setKeyGenerateStrategy(
            new KeyGenerateStrategyConfiguration("order_id", "snowflake"));
        
        shardingRuleConfig.getTables().add(orderTableRuleConfig);
        
        // Sharding algorithms (AlgorithmConfiguration since 5.2;
        // earlier 5.x releases called it ShardingSphereAlgorithmConfiguration)
        Properties dbProps = new Properties();
        dbProps.setProperty("algorithm-expression", "ds_${user_id % 2}");
        shardingRuleConfig.getShardingAlgorithms().put("db_inline", 
            new AlgorithmConfiguration("INLINE", dbProps));
        
        Properties tableProps = new Properties();
        tableProps.setProperty("algorithm-expression", "t_order_${order_id % 2}");
        shardingRuleConfig.getShardingAlgorithms().put("table_inline", 
            new AlgorithmConfiguration("INLINE", tableProps));
        
        // Snowflake key generator
        Properties snowflakeProps = new Properties();
        snowflakeProps.setProperty("worker-id", "0");
        shardingRuleConfig.getKeyGenerators().put("snowflake", 
            new AlgorithmConfiguration("SNOWFLAKE", snowflakeProps));
        
        // ========== 4. Create the ShardingSphere data source ==========
        Properties props = new Properties();
        props.setProperty("sql-show", "true");  // log the rewritten SQL
        
        DataSource dataSource = ShardingSphereDataSourceFactory.createDataSource(
            dataSourceMap, 
            Collections.singleton(shardingRuleConfig), 
            props
        );
        
        // ========== 5. Use it (exactly like plain JDBC) ==========
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO t_order (user_id, order_name, status) VALUES (?, ?, ?)")) {
            
            ps.setLong(1, 1);           // user_id = 1 → ds_1 (1 % 2 = 1)
            ps.setString(2, "order 1");
            ps.setString(3, "PAID");
            
            int rows = ps.executeUpdate();
            System.out.println("Inserted rows: " + rows);
            // SQL log output:
            // Logic SQL: INSERT INTO t_order (user_id, order_name, status) VALUES (?, ?, ?)
            // Actual SQL: ds_1 ::: INSERT INTO t_order_1 (user_id, order_name, status, order_id) VALUES (?, ?, ?, ?)
        }
        
        // Query example
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT * FROM t_order WHERE user_id = ? AND order_id = ?")) {
            
            ps.setLong(1, 1);
            ps.setLong(2, 100);
            
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                System.out.println("order_id: " + rs.getLong("order_id"));
            }
        }
    }
}

4.3 YAML Configuration (Recommended)

# shardingsphere-config.yaml
mode:
  type: Standalone  # standalone mode; Cluster mode is also available
  repository:
    type: File

dataSources:
  ds_0:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:3306/db0?useSSL=false&serverTimezone=UTC
    username: root
    password: password
    maximumPoolSize: 10
  
  ds_1:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:3306/db1?useSSL=false&serverTimezone=UTC
    username: root
    password: password
    maximumPoolSize: 10

rules:
  - !SHARDING
    tables:
      t_order:
        actualDataNodes: ds_${0..1}.t_order_${0..1}
        tableStrategy:
          standard:
            shardingColumn: order_id
            shardingAlgorithmName: t_order_inline
        databaseStrategy:
          standard:
            shardingColumn: user_id
            shardingAlgorithmName: ds_inline
        keyGenerateStrategy:
          column: order_id
          keyGeneratorName: snowflake
      
      # Binding table: t_order_item uses the same sharding strategy as t_order
      t_order_item:
        actualDataNodes: ds_${0..1}.t_order_item_${0..1}
        tableStrategy:
          standard:
            shardingColumn: order_id
            shardingAlgorithmName: t_order_item_inline
        databaseStrategy:
          standard:
            shardingColumn: user_id
            shardingAlgorithmName: ds_inline
    
    # Broadcast tables: every database holds the full data
    # (note: in ShardingSphere 5.4+ these move to a separate !BROADCAST rule)
    broadcastTables:
      - t_config
    
    # Binding table rules
    bindingTables:
      - t_order,t_order_item
    
    # Default database sharding strategy
    defaultDatabaseStrategy:
      standard:
        shardingColumn: user_id
        shardingAlgorithmName: ds_inline
    
    # Default table sharding strategy
    defaultTableStrategy:
      none:
    
    # Sharding algorithm definitions
    shardingAlgorithms:
      ds_inline:
        type: INLINE
        props:
          algorithm-expression: ds_${user_id % 2}
      
      t_order_inline:
        type: INLINE
        props:
          algorithm-expression: t_order_${order_id % 2}
      
      t_order_item_inline:
        type: INLINE
        props:
          algorithm-expression: t_order_item_${order_id % 2}
    
    # Distributed key generators
    keyGenerators:
      snowflake:
        type: SNOWFLAKE
        props:
          worker-id: 0

props:
  sql-show: true  # log the rewritten SQL
// Using the YAML configuration
public class YamlConfigDemo {
    public static void main(String[] args) throws Exception {
        // Load the YAML file
        File yamlFile = new File("shardingsphere-config.yaml");
        DataSource dataSource = YamlShardingSphereDataSourceFactory.createDataSource(yamlFile);
        
        // From here on, use it exactly like any other DataSource
        try (Connection conn = dataSource.getConnection()) {
            // ...
        }
    }
}

5. Sharding Algorithms in Depth

5.1 Standard Sharding Strategies

graph TB
    subgraph "Sharding algorithm families"
        A[Auto sharding algorithms] --> A1[MOD<br/>modulo]
        A --> A2[HASH_MOD<br/>hash plus modulo]
        A --> A3[VOLUME_RANGE<br/>range by data volume]
        A --> A4[BOUNDARY_RANGE<br/>range by explicit boundaries]
        A --> A5[AUTO_INTERVAL<br/>automatic time intervals]
        
        B[Standard sharding algorithms] --> B1[INLINE<br/>inline expression]
        B --> B2["STANDARD<br/>standard (custom implementation)"]
        
        C[Complex sharding algorithms] --> C1[COMPLEX_INLINE<br/>complex inline expression]
        C --> C2[COMPLEX_STANDARD<br/>complex standard sharding]
        
        D[Hint sharding algorithms] --> D1[HINT_INLINE<br/>hint inline expression]
        
        E[Custom sharding algorithms] --> E1[CLASS_BASED<br/>custom class]
    end
    
    style A fill:#e3f2fd
    style B fill:#c8e6c9
    style C fill:#fff3e0
    style D fill:#fce4ec
    style E fill:#e8f5e9

5.2 Comparison of Common Sharding Algorithms

| Algorithm | Family | Use case | Pros | Cons |
| --- | --- | --- | --- | --- |
| INLINE | Standard | Simple modulo / range expressions | Easy to configure | Scaling out requires data migration |
| MOD | Auto | Uniform distribution | Built in | Hard to scale out |
| HASH_MOD | Auto | Avoiding hotspots | Hash spreads keys | Range queries hit every shard |
| BOUNDARY_RANGE | Auto | Ranges by ID | Easy to scale out | Possible hotspots |
| AUTO_INTERVAL | Auto | Time-based sharding | Creates shards automatically | Time columns only |
| VOLUME_RANGE | Auto | Sharding by data volume | Automatic scaling | More complex configuration |
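Assuming the semantics described in the table, the shard-index math behind MOD, HASH_MOD, and BOUNDARY_RANGE can be sketched as follows (illustrative Java mirroring the documented `sharding-count` and `sharding-ranges` properties, not ShardingSphere's internal code):

```java
import java.util.Arrays;

// Sketch of how three common auto algorithms pick a shard index.
public class AutoShardingMath {

    /** MOD: shard = value % sharding-count. */
    static int mod(long value, int shardingCount) {
        return (int) (value % shardingCount);
    }

    /** HASH_MOD: hash the value first, then take the modulus. */
    static int hashMod(Object value, int shardingCount) {
        return Math.floorMod(value.hashCode(), shardingCount);
    }

    /** BOUNDARY_RANGE: shard = number of boundaries the value has passed. */
    static int boundaryRange(long value, long[] sortedBoundaries) {
        int i = Arrays.binarySearch(sortedBoundaries, value);
        // binarySearch returns (-(insertionPoint) - 1) when the value is absent
        return i >= 0 ? i + 1 : -(i + 1);
    }

    public static void main(String[] args) {
        long[] bounds = {100000, 200000, 300000};          // from sharding-ranges
        System.out.println(mod(100, 4));                   // 0
        System.out.println(boundaryRange(150000, bounds)); // 1 (second shard)
    }
}
```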

5.3 Sharding Algorithm Configuration Examples

rules:
  - !SHARDING
    shardingAlgorithms:
      # 1. INLINE - inline expression (most common)
      order_inline:
        type: INLINE
        props:
          algorithm-expression: t_order_${order_id % 4}
          allow-range-query-with-inline-sharding: true  # allow range queries
      
      # 2. MOD - modulo sharding
      mod_algorithm:
        type: MOD
        props:
          sharding-count: 4  # number of shards
      
      # 3. HASH_MOD - hash plus modulo
      hash_mod_algorithm:
        type: HASH_MOD
        props:
          sharding-count: 4
      
      # 4. BOUNDARY_RANGE - range sharding by explicit boundaries
      boundary_range:
        type: BOUNDARY_RANGE
        props:
          sharding-ranges: 100000,200000,300000  # shard boundaries
      
      # 5. AUTO_INTERVAL - automatic time-interval sharding
      auto_interval:
        type: AUTO_INTERVAL
        props:
          datetime-lower: 2024-01-01 00:00:00
          datetime-upper: 2025-12-31 23:59:59
          sharding-seconds: 2592000  # one month
      
      # 6. CLASS_BASED - custom sharding algorithm
      custom_algorithm:
        type: CLASS_BASED
        props:
          strategy: standard
          algorithmClassName: com.example.CustomShardingAlgorithm

5.4 Custom Sharding Algorithms

import org.apache.shardingsphere.sharding.api.sharding.standard.PreciseShardingValue;
import org.apache.shardingsphere.sharding.api.sharding.standard.RangeShardingValue;
import org.apache.shardingsphere.sharding.api.sharding.standard.StandardShardingAlgorithm;

import com.google.common.collect.Range;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Properties;

/**
 * Custom standard sharding algorithm.
 * Shards by a hash of the user ID, so one user's rows stay in one shard.
 */
public class UserIdHashShardingAlgorithm implements StandardShardingAlgorithm<Long> {
    
    private Properties props;
    
    @Override
    public String doSharding(Collection<String> availableTargetNames, 
                            PreciseShardingValue<Long> shardingValue) {
        // The sharding key value
        Long userId = shardingValue.getValue();
        
        // Hash it
        int hash = Math.abs(userId.hashCode());
        
        // Modulo gives the shard index
        int index = hash % availableTargetNames.size();
        
        // Return the matching shard name
        return new ArrayList<>(availableTargetNames).get(index);
    }
    
    @Override
    public Collection<String> doSharding(Collection<String> availableTargetNames,
                                        RangeShardingValue<Long> shardingValue) {
        // Range query: return all shards (or narrow down by the range)
        Range<Long> range = shardingValue.getValueRange();
        
        // With both bounds known, the touched shards could be computed
        if (range.hasLowerBound() && range.hasUpperBound()) {
            Long lower = range.lowerEndpoint();
            Long upper = range.upperEndpoint();
            
            // Business-specific narrowing would go here;
            // simplified: return every shard
            return availableTargetNames;
        }
        
        return availableTargetNames;
    }
    
    @Override
    public void init(Properties props) {
        this.props = props;
    }
    
    @Override
    public String getType() {
        return "USER_ID_HASH";
    }
}

6. Distributed Primary Keys

6.1 Key Generation Strategies

graph TB
    subgraph "Distributed primary key options"
        A[SNOWFLAKE<br/>snowflake algorithm] -->|Default, recommended| B[64-bit long<br/>timestamp + worker id + sequence]
        C[UUID] -->|36-char string| D[Globally unique<br/>unordered, bulky]
        E[LEAF<br/>open-sourced by Meituan] -->|Segment mode| F[ID batches fetched from a database]
        E -->|Snowflake mode| G[Improved snowflake algorithm]
        H[REDIS] -->|INCR| I[Atomic increment<br/>depends on Redis]
    end
    
    style A fill:#c8e6c9
    style E fill:#e3f2fd
    style H fill:#fff3e0

How the snowflake algorithm works:

graph LR
    subgraph "Snowflake 64-bit layout"
        A[1 bit<br/>sign<br/>always 0] --> B[41 bits<br/>timestamp<br/>milliseconds]
        B --> C[10 bits<br/>worker id<br/>5-bit datacenter + 5-bit machine]
        C --> D[12 bits<br/>sequence<br/>4096 per millisecond]
    end
    
    Note[69 years of range<br/>1024 workers<br/>4096 IDs per worker per ms]
    D  --> Note
# Snowflake configuration
keyGenerators:
  snowflake:
    type: SNOWFLAKE
    props:
      worker-id: 0           # worker id (0-1023)
      max-vibration-offset: 1  # max jitter (mitigates clock rollback)
      max-tolerate-time-difference-milliseconds: 10  # max tolerated clock rollback in ms
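The 1+41+10+12 layout above can be sketched in a few lines of Java. This is a simplified illustration (no clock-rollback handling, and the epoch constant is an arbitrary example, not ShardingSphere's):

```java
// Simplified snowflake-style generator: sign(1) | timestamp(41) | worker(10) | sequence(12).
public class SnowflakeSketch {
    private static final long EPOCH = 1704067200000L; // 2024-01-01 UTC, example epoch
    private final long workerId;   // 0..1023 (10 bits)
    private long sequence;         // 0..4095 (12 bits)
    private long lastTimestamp = -1L;

    public SnowflakeSketch(long workerId) { this.workerId = workerId; }

    public synchronized long nextId() {
        long ts = System.currentTimeMillis();
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF;  // wrap within the same millisecond
        } else {
            sequence = 0;
        }
        lastTimestamp = ts;
        // Shift the timestamp past the 10 worker bits and 12 sequence bits
        return ((ts - EPOCH) << 22) | (workerId << 12) | sequence;
    }

    /** The worker id can be read back from bits 12..21. */
    public static long workerIdOf(long id) { return (id >> 12) & 0x3FF; }

    public static void main(String[] args) {
        SnowflakeSketch gen = new SnowflakeSketch(7);
        long id = gen.nextId();
        System.out.println(id + " from worker " + workerIdOf(id)); // worker 7
    }
}
```

Because the timestamp occupies the high bits, IDs generated on one node are monotonically increasing, which keeps B+tree inserts append-mostly.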

6.2 Custom Key Generators

import org.apache.shardingsphere.sharding.spi.KeyGenerateAlgorithm;

import java.util.Properties;
import java.util.UUID;

/**
 * Custom UUID key generator (demo).
 */
public class CustomUUIDKeyGenerator implements KeyGenerateAlgorithm {
    
    @Override
    public Comparable<?> generateKey() {
        // UUID without hyphens
        return UUID.randomUUID().toString().replace("-", "");
    }
    
    @Override
    public String getType() {
        return "CUSTOM_UUID";
    }
    
    @Override
    public Properties getProps() {
        return new Properties();
    }
    
    @Override
    public void init(Properties props) {
        // read configuration here if needed
    }
}

7. Read/Write Splitting

7.1 Architecture

graph TB
    subgraph "Read/write splitting"
        A[Application] -->|Reads and writes| B[ShardingSphere-JDBC]
        
        B -->|Writes| C[Primary ds_master<br/>writes and read-your-writes]
        B -->|Reads| D[Replica ds_slave_0<br/>reads]
        B -->|Reads| E[Replica ds_slave_1<br/>reads]
        
        C -->|Replication| D
        C -->|Replication| E
        
        B -->|Load balancing| F[Load-balancing strategy]
        F -->|Round robin| D
        F -->|Round robin| E
    end
    
    style C fill:#ffcdd2
    style D fill:#c8e6c9
    style E fill:#c8e6c9

7.2 Configuration

# Read/write splitting configuration
dataSources:
  ds_master:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    jdbcUrl: jdbc:mysql://master:3306/db?useSSL=false
    username: root
    password: password
  
  ds_slave_0:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    jdbcUrl: jdbc:mysql://slave0:3306/db?useSSL=false
    username: root
    password: password
  
  ds_slave_1:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    jdbcUrl: jdbc:mysql://slave1:3306/db?useSSL=false
    username: root
    password: password

rules:
  - !READWRITE_SPLITTING
    dataSources:
      readwrite_ds:
        staticStrategy:
          writeDataSourceName: ds_master
          readDataSourceNames:
            - ds_slave_0
            - ds_slave_1
        loadBalancerName: round_robin  # load-balancing algorithm
    
    loadBalancers:
      round_robin:
        type: ROUND_ROBIN  # round robin
      random:
        type: RANDOM      # random
      weight:
        type: WEIGHT      # weighted
        props:
          ds_slave_0: 2   # weight 2
          ds_slave_1: 1   # weight 1
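The ROUND_ROBIN strategy configured above amounts to cycling a counter over the read data sources. A sketch of that behavior (illustrative only, not ShardingSphere's implementation):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Round-robin read load balancer: each read picks the next replica in turn.
public class RoundRobinBalancer {
    private final List<String> readDataSources;
    private final AtomicLong counter = new AtomicLong();

    public RoundRobinBalancer(List<String> readDataSources) {
        this.readDataSources = readDataSources;
    }

    /** Pick the replica that should serve the next read query. */
    public String nextRead() {
        int i = (int) (counter.getAndIncrement() % readDataSources.size());
        return readDataSources.get(i);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb =
            new RoundRobinBalancer(List.of("ds_slave_0", "ds_slave_1"));
        System.out.println(lb.nextRead()); // ds_slave_0
        System.out.println(lb.nextRead()); // ds_slave_1
        System.out.println(lb.nextRead()); // ds_slave_0 again
    }
}
```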

Read/write splitting combined with sharding:

# Read/write splitting + sharding
rules:
  - !READWRITE_SPLITTING
    dataSources:
      readwrite_ds_0:
        staticStrategy:
          writeDataSourceName: ds_master_0
          readDataSourceNames:
            - ds_slave_0_0
            - ds_slave_0_1
        loadBalancerName: round_robin
      
      readwrite_ds_1:
        staticStrategy:
          writeDataSourceName: ds_master_1
          readDataSourceNames:
            - ds_slave_1_0
            - ds_slave_1_1
        loadBalancerName: round_robin

  - !SHARDING
    tables:
      t_order:
        actualDataNodes: readwrite_ds_${0..1}.t_order_${0..1}
        # sharding strategy configuration...

8. Distributed Transactions

8.1 Transaction Types Compared

graph TB
    subgraph "Distributed transaction options"
        A[XA] -->|Strong consistency| B[Two-phase commit<br/>lower performance]
        C[BASE] -->|Eventual consistency| D[Flexible transactions<br/>higher performance]
        E[Seata AT] -->|Automatic compensation| F[Parses SQL to generate undo SQL]
    end
    
    style A fill:#ffcdd2
    style C fill:#c8e6c9
    style E fill:#e3f2fd

| Transaction type | Consistency | Performance | Use case | Configuration effort |
| --- | --- | --- | --- | --- |
| LOCAL | Single-shard ACID | Highest | Single-shard transactions | Low |
| XA | Strong | Lower (2PC) | Finance, payments | Medium |
| BASE | Eventual | Higher | General business | Medium |
| Seata AT | Eventual | Higher | Complex business flows | Higher |

8.2 XA Transaction Configuration

# XA transaction configuration
rules:
  - !TRANSACTION
    defaultType: XA
    providerType: Atomikos  # or Narayana

# Using XA from Java code
Connection conn = dataSource.getConnection();
try {
    conn.setAutoCommit(false);
    
    try (PreparedStatement ps1 = conn.prepareStatement(
            "INSERT INTO t_order (user_id, order_name) VALUES (?, ?)")) {
        ps1.setLong(1, 1);
        ps1.setString(2, "order 1");
        ps1.executeUpdate();
    }
    
    try (PreparedStatement ps2 = conn.prepareStatement(
            "INSERT INTO t_order_item (order_id, item_name) VALUES (?, ?)")) {
        ps2.setLong(1, 1);
        ps2.setString(2, "item 1");
        ps2.executeUpdate();
    }
    
    conn.commit();  // XA two-phase commit
} catch (SQLException e) {
    conn.rollback();  // conn must stay in scope here, so no try-with-resources above
    throw e;
} finally {
    conn.close();
}

8.3 BASE Transactions (Seata AT)

# Seata AT mode configuration
rules:
  - !TRANSACTION
    defaultType: BASE
    providerType: Seata

# A Seata TC (transaction coordinator) server must also be running
# seata.conf
client {
    application.id = sharding-sphere
    transaction.service.group = my_test_tx_group
}
// Using Seata AT (the code looks like a local transaction)
@GlobalTransactional  // Seata global transaction annotation
public void createOrder(Order order) {
    // ShardingSphere parses the SQL automatically
    // and generates undo SQL for rollback
    orderMapper.insert(order);
    orderItemMapper.insert(order.getItems());
    // On commit, Seata coordinates all branch transactions
}

9. Data Encryption and Masking

9.1 Architecture

graph TB
    subgraph "Data encryption and masking"
        A[Application] -->|Plaintext query| B[ShardingSphere]
        B -->|Rewritten SQL| C[Database]
        C -->|Ciphertext| B
        B -->|Decrypt / mask| D[Plaintext back to the application]
        
        E[INSERT] -->|Encrypt on write| F[Ciphertext in the database]
        G[SELECT] -->|Decrypt / mask on read| H[Plaintext or masked value]
    end
    
    style B fill:#e3f2fd
    style F fill:#c8e6c9
    style H fill:#fff3e0

9.2 Configuration

rules:
  - !ENCRYPT
    tables:
      t_user:
        columns:
          # mobile number
          mobile:
            cipherColumn: mobile_cipher    # ciphertext column
            assistedQueryColumn: mobile_assisted  # assisted-query column (for fuzzy queries)
            plainColumn: mobile_plain      # plaintext column (optional, for migration)
            encryptorName: aes_encryptor   # encryption algorithm
            
          # national ID number
          id_card:
            cipherColumn: id_card_cipher
            encryptorName: aes_encryptor
            
          # password (one-way hash)
          password:
            cipherColumn: password_cipher
            encryptorName: md5_encryptor
    
    encryptors:
      aes_encryptor:
        type: AES
        props:
          aes-key-value: 1234567890abcdef  # AES key (store securely in production)
      
      md5_encryptor:
        type: MD5  # one-way hash, cannot be decrypted

# Queries are handled transparently
# SQL: SELECT * FROM t_user WHERE mobile = '13800138000'
# Actually executed: SELECT * FROM t_user WHERE mobile_cipher = '<ciphertext>'
# Results are decrypted automatically: mobile = '13800138000' (or masked as '138****8000')
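The `138****8000` masking mentioned above is, at its core, a plain string transformation. A sketch (illustrative only; in ShardingSphere the ENCRYPT rule and the configured encryptor handle storage, and this kind of display masking would live in application code):

```java
// Display-mask an 11-digit mobile number: keep the first 3 and last 4 digits.
public class MobileMasker {

    public static String mask(String mobile) {
        if (mobile == null || mobile.length() != 11) {
            throw new IllegalArgumentException("expected an 11-digit mobile number");
        }
        // chars 0..2 kept, chars 3..6 replaced, chars 7..10 kept
        return mobile.substring(0, 3) + "****" + mobile.substring(7);
    }

    public static void main(String[] args) {
        System.out.println(mask("13800138000")); // 138****8000
    }
}
```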

10. ShardingSphere-Proxy in Depth

10.1 Architecture and Positioning

graph TB
    subgraph "ShardingSphere-Proxy architecture"
        A[Java app] -->|MySQL protocol| B[ShardingSphere-Proxy<br/>port 3307]
        C[Python app] -->|MySQL protocol| B
        D[Go app] -->|MySQL protocol| B
        
        B -->|JDBC| E[ds_0<br/>MySQL 3306]
        B -->|JDBC| F[ds_1<br/>MySQL 3306]
        B -->|JDBC| G[ds_2<br/>MySQL 3306]
        
        H[Config center<br/>ZooKeeper/Nacos] -->|Dynamic configuration| B
    end
    
    style B fill:#e3f2fd
    style E fill:#c8e6c9
    style F fill:#c8e6c9
    style G fill:#c8e6c9
    style H fill:#fff3e0

10.2 Proxy Configuration and Startup

# server.yaml - Proxy server configuration
mode:
  type: Cluster
  repository:
    type: ZooKeeper
    props:
      namespace: governance_ds
      server-lists: localhost:2181
      retryIntervalMilliseconds: 500
      timeToLiveSeconds: 60

authority:
  users:
    - user: root@%
      password: root
    - user: sharding
      password: sharding
  privilege:
    type: ALL_PERMITTED

props:
  sql-show: true
  sql-simple: true
  kernel-executor-size: 16
# config-sharding.yaml - sharding configuration
databaseName: sharding_db

dataSources:
  ds_0:
    url: jdbc:mysql://127.0.0.1:3306/db0?serverTimezone=UTC&useSSL=false
    username: root
    password: root
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 50
    minPoolSize: 1
  
  ds_1:
    url: jdbc:mysql://127.0.0.1:3306/db1?serverTimezone=UTC&useSSL=false
    username: root
    password: root
    maxPoolSize: 50

rules:
  - !SHARDING
    tables:
      t_order:
        actualDataNodes: ds_${0..1}.t_order_${0..1}
        tableStrategy:
          standard:
            shardingColumn: order_id
            shardingAlgorithmName: t_order_inline
        keyGenerateStrategy:
          column: order_id
          keyGeneratorName: snowflake
    
    shardingAlgorithms:
      t_order_inline:
        type: INLINE
        props:
          algorithm-expression: t_order_${order_id % 2}
    
    keyGenerators:
      snowflake:
        type: SNOWFLAKE
# Start the Proxy
$ bin/start.sh

# Connect to the Proxy (just like a regular MySQL server)
$ mysql -h 127.0.0.1 -P 3307 -u root -p

# Use it
mysql> USE sharding_db;
mysql> SELECT * FROM t_order WHERE order_id = 100;
# The Proxy routes to the right shard automatically

11. ShardingSphere-JDBC vs Proxy

graph TB
    subgraph "Comparison"
        A[Performance] -->|JDBC wins| B["JDBC: no network hop<br/>Proxy: one extra hop"]
        C[Polyglot support] -->|Proxy wins| D["JDBC: Java only<br/>Proxy: any language"]
        E[Operations] -->|JDBC is simpler| F["JDBC: ships with the app<br/>Proxy: deployed and maintained separately"]
        G[DB connections] -->|Proxy wins| H["JDBC: each app consumes connections<br/>Proxy: shared pool"]
        I[Upgrades] -->|Proxy is more flexible| J["JDBC: app restart<br/>Proxy: upgraded independently"]
        K[Monitoring] -->|Proxy is centralized| L["JDBC: scattered<br/>Proxy: centralized"]
    end
    
    style B fill:#c8e6c9
    style D fill:#c8e6c9
    style F fill:#fff3e0
    style H fill:#c8e6c9
    style J fill:#c8e6c9
    style L fill:#c8e6c9

| Dimension | ShardingSphere-JDBC | ShardingSphere-Proxy |
| --- | --- | --- |
| Performance | ⭐⭐⭐⭐⭐ highest | ⭐⭐⭐⭐ high |
| Polyglot support | ❌ Java only | ✅ any language |
| DB connection usage | High (every app connects directly) | Low (shared pool) |
| Ops complexity | Low | Medium |
| Upgrade flexibility | Requires app restart | Upgraded independently |
| Central governance | Weak | Strong |
| Typical use | Java apps, high performance | Polyglot, central management |

12. A Complete Worked Example

12.1 Sharding an E-commerce Order System

graph TB
    subgraph "E-commerce order sharding design"
        A[User request] -->|Routed by user_id| B[ShardingSphere]
        
        B -->|user_id % 2 = 0| C[ds_0<br/>order DB 0]
        B -->|user_id % 2 = 1| D[ds_1<br/>order DB 1]
        
        C -->|order_id % 2 = 0| E[t_order_0]
        C -->|order_id % 2 = 1| F[t_order_1]
        D -->|order_id % 2 = 0| G[t_order_0]
        D -->|order_id % 2 = 1| H[t_order_1]
        
        C -->|order_id % 2 = 0| I[t_order_item_0<br/>binding table]
        C -->|order_id % 2 = 1| J[t_order_item_1<br/>binding table]
        
        E -.->|Same-database join| I
        F -.->|Same-database join| J
    end
    
    style B fill:#e3f2fd
    style C fill:#c8e6c9
    style D fill:#c8e6c9
    style E fill:#fff3e0
    style F fill:#fff3e0
    style I fill:#fce4ec
    style J fill:#fce4ec
# Full configuration for the e-commerce order system
databaseName: ecommerce

dataSources:
  ds_0:
    url: jdbc:mysql://192.168.1.10:3306/ecommerce_0?useSSL=false
    username: root
    password: password
    maxPoolSize: 100
  
  ds_1:
    url: jdbc:mysql://192.168.1.11:3306/ecommerce_1?useSSL=false
    username: root
    password: password
    maxPoolSize: 100

rules:
  - !SHARDING
    tables:
      # Orders: database by user_id, table by order_id
      t_order:
        actualDataNodes: ds_${0..1}.t_order_${0..3}
        databaseStrategy:
          standard:
            shardingColumn: user_id
            shardingAlgorithmName: db_by_user_id
        tableStrategy:
          standard:
            shardingColumn: order_id
            shardingAlgorithmName: table_by_order_id
        keyGenerateStrategy:
          column: order_id
          keyGeneratorName: snowflake
      
      # Order items: binding table, same sharding strategy as t_order
      t_order_item:
        actualDataNodes: ds_${0..1}.t_order_item_${0..3}
        databaseStrategy:
          standard:
            shardingColumn: user_id
            shardingAlgorithmName: db_by_user_id
        tableStrategy:
          standard:
            shardingColumn: order_id
            shardingAlgorithmName: table_by_order_id
      
      # Payments: binding table
      t_payment:
        actualDataNodes: ds_${0..1}.t_payment_${0..3}
        databaseStrategy:
          standard:
            shardingColumn: user_id
            shardingAlgorithmName: db_by_user_id
        tableStrategy:
          standard:
            shardingColumn: order_id
            shardingAlgorithmName: table_by_order_id
    
    # Binding tables: keep related rows in the same shard
    bindingTables:
      - t_order,t_order_item,t_payment
    
    # Broadcast tables: full copy in every database
    broadcastTables:
      - t_region
      - t_product_category
    
    shardingAlgorithms:
      db_by_user_id:
        type: INLINE
        props:
          algorithm-expression: ds_${user_id % 2}
      
      table_by_order_id:
        type: INLINE
        props:
          algorithm-expression: t_order_${order_id % 4}
          # note: t_order_item and t_payment need their own table expressions
    
    keyGenerators:
      snowflake:
        type: SNOWFLAKE
        props:
          worker-id: ${WORKER_ID}

  - !READWRITE_SPLITTING
    dataSources:
      ds_0:
        staticStrategy:
          writeDataSourceName: ds_0_master
          readDataSourceNames:
            - ds_0_slave_0
            - ds_0_slave_1
        loadBalancerName: round_robin
      
      ds_1:
        staticStrategy:
          writeDataSourceName: ds_1_master
          readDataSourceNames:
            - ds_1_slave_0
            - ds_1_slave_1
        loadBalancerName: round_robin

  - !ENCRYPT
    tables:
      t_user:
        columns:
          phone:
            cipherColumn: phone_cipher
            encryptorName: aes_encryptor
          id_card:
            cipherColumn: id_card_cipher
            encryptorName: aes_encryptor
    
    encryptors:
      aes_encryptor:
        type: AES
        props:
          aes-key-value: ${AES_KEY}

props:
  sql-show: true
  sql-federation-enabled: true  # enable federated queries (cross-shard JOINs)

12.2 Java Application Code

@SpringBootApplication
@MapperScan("com.example.mapper")
public class EcommerceApplication {
    public static void main(String[] args) {
        SpringApplication.run(EcommerceApplication.class, args);
    }
}

// ========== Entity ==========
@Data
public class Order {
    private Long orderId;      // distributed key, generated by the snowflake algorithm
    private Long userId;       // sharding key
    private String orderNo;
    private BigDecimal totalAmount;
    private String status;
    private LocalDateTime createTime;
}

// ========== Mapper ==========
@Mapper
public interface OrderMapper {
    
    @Insert("INSERT INTO t_order (user_id, order_no, total_amount, status) " +
            "VALUES (#{userId}, #{orderNo}, #{totalAmount}, #{status})")
    @Options(useGeneratedKeys = true, keyProperty = "orderId")
    int insert(Order order);
    
    @Select("SELECT * FROM t_order WHERE order_id = #{orderId}")
    Order selectById(@Param("orderId") Long orderId);
    
    @Select("SELECT * FROM t_order WHERE user_id = #{userId} ORDER BY create_time DESC")
    List<Order> selectByUserId(@Param("userId") Long userId);
}

// ========== Service ==========
@Service
public class OrderService {
    
    @Autowired
    private OrderMapper orderMapper;
    
    /**
     * Create an order.
     * ShardingSphere handles automatically:
     * 1. generating the snowflake order_id
     * 2. routing to the right database by user_id
     * 3. routing to the right table by order_id
     */
    @Transactional
    public Order createOrder(Long userId, BigDecimal amount) {
        Order order = new Order();
        order.setUserId(userId);
        order.setOrderNo(generateOrderNo());
        order.setTotalAmount(amount);
        order.setStatus("CREATED");
        
        orderMapper.insert(order);
        
        // orderId由雪花算法自动生成
        System.out.println("Generated orderId: " + order.getOrderId());
        
        return order;
    }
    
    /**
     * 查询订单
     * 携带分片键时,直接路由到对应分片
     */
    public Order getOrder(Long orderId) {
        return orderMapper.selectById(orderId);
    }
    
    /**
     * 查询用户订单列表
     * 使用user_id分片键,路由到对应库
     */
    public List<Order> getUserOrders(Long userId) {
        return orderMapper.selectByUserId(userId);
    }
    
    private String generateOrderNo() {
        return "ORD" + System.currentTimeMillis() + RandomUtil.randomNumbers(4);
    }
}
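createOrder 中的 order_id 由 ShardingSphere 内置的雪花算法自动生成。下面是一个极简的位分配示意(41 位时间戳 + 10 位机器 ID + 12 位序列号),其中 EPOCH 起始纪元是假设值,且未处理时钟回拨,仅用于说明 ID 结构,并非 ShardingSphere 官方实现:

```java
public class SnowflakeDemo {
    // 位分配:| 41位时间戳差值 | 10位机器ID | 12位同毫秒序列号 |
    private static final long EPOCH = 1609459200000L; // 假设纪元:2021-01-01

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeDemo(long workerId) {
        this.workerId = workerId & 0x3FF; // 机器ID占10位
    }

    public synchronized long nextId() {
        long ts = System.currentTimeMillis();
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF; // 同毫秒内序列自增,占12位
            if (sequence == 0) {               // 序列溢出,自旋等待下一毫秒
                while ((ts = System.currentTimeMillis()) <= lastTimestamp) { }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = ts;
        return ((ts - EPOCH) << 22) | (workerId << 12) | sequence;
    }

    public static void main(String[] args) {
        SnowflakeDemo gen = new SnowflakeDemo(1);
        long a = gen.nextId();
        long b = gen.nextId();
        System.out.println(a + " < " + b + " : " + (a < b)); // 趋势递增
    }
}
```

因为高位是时间戳,生成的 ID 整体趋势递增,既避免了分片后自增主键冲突,也对 B+ 树索引写入友好。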

十三、SQL执行原理深度解析

13.1 SQL解析与路由流程

graph TB
    subgraph SQL执行全流程
        A[原始SQL] -->|1. 词法分析| B[Token序列]
        B -->|2. 语法分析| C[AST抽象语法树]
        C -->|3. 语义分析| D[SQLStatement]
        
        D -->|4. 分片上下文提取| E[ShardingContext]
        E -->|提取分片键| F[ShardingCondition]
        
        F -->|5. 路由引擎| G[RouteContext]
        G -->|计算数据节点| H[RouteUnit集合]
        
        H -->|6. SQL改写| I[RewriteContext]
        I -->|表名改写| J[真实表名]
        I -->|分页改写| K[分页参数调整]
        I -->|聚合改写| L[ORDER BY/GROUP BY补充]
        
        J -->|7. SQL执行| M[ExecutionContext]
        K --> M
        L --> M
        
        M -->|8. 结果归并| N[MergeEngine]
        N -->|遍历| O[StreamMergedResult]
        N -->|内存| P[MemoryMergedResult]
        
        O -->|9. 返回结果| Q[ResultSet]
        P --> Q
    end
    
    style A fill:#e3f2fd
    style C fill:#c8e6c9
    style G fill:#fff3e0
    style I fill:#fce4ec
    style M fill:#e8f5e9
    style N fill:#f3e5f5
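以第 5 步路由引擎为例:路由计算本质上就是把提取到的分片键值代入分片表达式,得到目标数据节点。假设分片规则为内联表达式 `ds_${user_id % 2}` 与 `t_order_${order_id % 2}`(两库两表、精确分片)的一个极简示意:

```java
public class RouteDemo {
    // 假设分片规则:库 = ds_${user_id % 2},表 = t_order_${order_id % 2}
    static String route(long userId, long orderId) {
        String dataSource = "ds_" + (userId % 2);   // 分库:user_id 取模
        String table = "t_order_" + (orderId % 2);  // 分表:order_id 取模
        return dataSource + "." + table;            // 数据节点,即 RouteUnit 的目标
    }

    public static void main(String[] args) {
        System.out.println(route(10, 100)); // ds_0.t_order_0
        System.out.println(route(11, 101)); // ds_1.t_order_1
    }
}
```

当 WHERE 条件缺失某个分片键时,对应维度无法取模,只能退化为该维度的全路由,这也是后文强调"SQL 必须携带分片键"的原因。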

13.2 SQL改写示例

-- 原始逻辑SQL
SELECT o.order_id, o.user_id, i.item_name 
FROM t_order o 
JOIN t_order_item i ON o.order_id = i.order_id 
WHERE o.user_id = 10 AND o.order_id = 100
ORDER BY o.create_time DESC 
LIMIT 10;

-- ShardingSphere改写后的实际SQL(假设user_id=10路由到ds_1,order_id=100路由到t_order_0)
-- ds_1.t_order_0 和 ds_1.t_order_item_0 执行:
SELECT o.order_id, o.user_id, i.item_name 
FROM t_order_0 o 
JOIN t_order_item_0 i ON o.order_id = i.order_id 
WHERE o.user_id = 10 AND o.order_id = 100
ORDER BY o.create_time DESC 
LIMIT 10;

-- 如果是跨分片查询(如user_id IN (10, 11)),则会在多个分片执行后归并结果
-- ds_0.t_order_0, ds_0.t_order_1, ds_1.t_order_0, ds_1.t_order_1 都执行
-- 然后归并结果
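分页是最典型的改写场景:逻辑 SQL 中的 `LIMIT offset, count` 会被改写为各分片执行 `LIMIT 0, offset + count`,归并阶段再统一排序、跳过前 offset 条。下面用纯 Java 模拟两个分片按降序的归并分页(小数据量示意):

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class LimitMergeDemo {
    // 模拟 LIMIT offset, count 的跨分片归并:
    // 各分片先取前 offset+count 条(对应改写后的 SQL),归并排序后跳过 offset 条
    static List<Integer> mergeLimit(List<List<Integer>> shards, int offset, int count) {
        return shards.stream()
                .flatMap(s -> s.stream()
                        .sorted(Comparator.reverseOrder())
                        .limit(offset + count))        // 每个分片返回 offset+count 条
                .sorted(Comparator.reverseOrder())     // 归并排序
                .skip(offset)                          // 跳过前 offset 条
                .limit(count)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> shard0 = List.of(1, 3, 5, 7, 9);
        List<Integer> shard1 = List.of(2, 4, 6, 8, 10);
        // 逻辑 SQL: ... ORDER BY x DESC LIMIT 2, 3
        System.out.println(mergeLimit(List.of(shard0, shard1), 2, 3)); // [8, 7, 6]
    }
}
```

可以看出 offset 越大,各分片要返回的数据就越多,这正是后文"深分页跨分片性能差"的根源。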

十四、性能优化与最佳实践

14.1 分片键选择原则

graph TB
    subgraph 分片键选择
        A[查询频率高] -->|优先| B[WHERE条件常用字段]
        C[数据分布均匀] -->|避免| D[热点数据]
        E[业务稳定性] -->|避免| F[频繁变更的字段]
        G[关联查询] -->|优先| H[JOIN关联字段]
    end
    
    style B fill:#c8e6c9
    style D fill:#ffcdd2
    style F fill:#ffcdd2
    style H fill:#c8e6c9

分片键选择最佳实践:

| 场景 | 推荐分片键 | 说明 |
| --- | --- | --- |
| 用户相关表 | user_id | 用户维度查询最多 |
| 订单表 | user_id(分库)+ order_id(分表) | 按用户查询、按订单查询都支持 |
| 日志表 | create_time | 按时间范围查询、清理 |
| 商品表 | category_id | 按类目查询 |

14.2 避免全库表路由

// ❌ bad: 不带分片键,导致全库表路由
@Select("SELECT * FROM t_order WHERE status = #{status}")
List<Order> selectByStatus(@Param("status") String status);
// 会在所有库的所有分表上执行,性能极差

// ✅ good: 携带分片键,精确路由
@Select("SELECT * FROM t_order WHERE user_id = #{userId} AND status = #{status}")
List<Order> selectByUserAndStatus(@Param("userId") Long userId, @Param("status") String status);
// 只路由到特定分片

// ✅ good: 使用绑定表进行关联查询
@Select("SELECT o.*, i.item_name FROM t_order o " +
        "JOIN t_order_item i ON o.order_id = i.order_id " +
        "WHERE o.user_id = #{userId}")
List<OrderDetail> selectOrderDetail(@Param("userId") Long userId);
// 绑定表保证同库,避免跨库JOIN

14.3 分页优化

-- ❌  bad: 深分页跨分片性能差
SELECT * FROM t_order ORDER BY create_time DESC LIMIT 1000000, 10;

-- ✅ good: 使用流式分页或限制分页深度
-- 1. 限制最大分页深度
SELECT * FROM t_order WHERE user_id = 1 
ORDER BY create_time DESC LIMIT 100, 10;

-- 2. 游标分页:用上一页最后一条记录的 create_time 作为下一页的过滤条件
SELECT * FROM t_order 
WHERE user_id = 1 AND create_time < '2024-01-01 00:00:00'
ORDER BY create_time DESC LIMIT 10;
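上面第 2 种写法即"游标分页"(keyset pagination):每次用上一页最后一条记录的排序字段值作为下一页的过滤条件,各分片只需返回 count 条数据,避免了大偏移量归并。下面用纯 Java 模拟这个翻页逻辑(假设按 create_time 降序,时间戳用 long 模拟):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class KeysetPagingDemo {
    // 模拟一页查询,对应 SQL:
    // WHERE create_time < ? ORDER BY create_time DESC LIMIT ?
    static List<Long> nextPage(List<Long> createTimes, long cursor, int pageSize) {
        return createTimes.stream()
                .filter(t -> t < cursor)               // 游标过滤,代替大偏移量
                .sorted(Comparator.reverseOrder())     // 降序
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> data = new ArrayList<>();
        for (long t = 1; t <= 10; t++) data.add(t);    // 10条记录,create_time=1..10

        long cursor = Long.MAX_VALUE;                  // 第一页:无游标
        List<Long> page1 = nextPage(data, cursor, 3);  // [10, 9, 8]
        cursor = page1.get(page1.size() - 1);          // 游标 = 本页最后一条
        List<Long> page2 = nextPage(data, cursor, 3);  // [7, 6, 5]
        System.out.println(page1 + " -> " + page2);
    }
}
```

代价是不能随机跳页,只能顺序翻页;排序字段有重复值时还需要叠加唯一列(如 order_id)做二级游标。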

总结:ShardingSphere核心架构图

graph TB
    subgraph ShardingSphere全景
        A[接入层] -->|ShardingSphere-JDBC| B[Java应用内嵌]
        A -->|ShardingSphere-Proxy| C[独立代理服务]
        A -->|ShardingSphere-Sidecar| D[云原生Sidecar]
        
        E[核心功能层] -->|数据分片| F[分库分表/读写分离]
        E -->|分布式事务| G[XA/BASE/Seata]
        E -->|数据库治理| H[配置中心/注册中心]
        E -->|数据安全| I[数据脱敏/加密]
        E -->|可观测性| J[链路追踪/指标监控]
        
        K[基础设施层] -->|存储| L[MySQL/PostgreSQL/Oracle]
        K -->|协调| M[ZooKeeper/Nacos/Etcd]
        K -->|事务协调| N[Seata TC]
    end
    
    style A fill:#e3f2fd
    style E fill:#c8e6c9
    style K fill:#fff3e0