ShardingJDBC源码阅读(十)数据脱敏实战

1,543 阅读12分钟

前言

本章基于shardingsphere数据加密实现原理的官方文档,根据迁移前、迁移中、迁移后三个阶段,观察sql执行的变化,深入源码分析其中的原理。

建议先阅读下面的文章:

一、准备工作

分片策略配置

根据user_id%2分库,根据user_id%4/2分表。也就是user_id=1分配到ds_1.my_user_0;user_id=2分配到ds_0.my_user_0;user_id=3分配到ds_1.my_user_1。

public DataSource getDataSource() throws SQLException {
    ShardingRuleConfiguration shardingConfig = new ShardingRuleConfiguration();
    // 表规则
    TableRuleConfiguration userTable = new TableRuleConfiguration("my_user", "ds_${0..1}.my_user_${[0, 1]}");
    // 分库策略
    ShardingStrategyConfiguration shardingDb = new InlineShardingStrategyConfiguration("id", "ds_${id%2}");
    userTable.setDatabaseShardingStrategyConfig(shardingDb);
    // 分表策略
    ShardingStrategyConfiguration shardingTable = new StandardShardingStrategyConfiguration("id", (PreciseShardingAlgorithm<Integer>) (availableTargetNames, shardingValue) -> {
        Integer l = shardingValue.getValue() % 4 / 2;
        return availableTargetNames.stream().filter(t -> t.endsWith(String.valueOf(l))).findFirst().orElse(null);
    });
    userTable.setTableShardingStrategyConfig(shardingTable);
    shardingConfig.setTableRuleConfigs(Collections.singletonList(userTable));
    // 加密策略
    EncryptRuleConfiguration encryptRuleConfig = createEncryptRuleConfig();
    shardingConfig.setEncryptRuleConfig(encryptRuleConfig);
    // 全局配置
    Properties properties = new Properties();
    properties.setProperty(ConfigurationPropertyKey.SQL_SHOW.getKey(), "true");
    // query.with.cipher.column
    properties.setProperty(ConfigurationPropertyKey.QUERY_WITH_CIPHER_COLUMN.getKey(), "false");
    return ShardingDataSourceFactory.createDataSource(createDataSourceMap(), shardingConfig, properties);
}

DDL

private static void addCipherColumn(Connection connection) throws SQLException {
  String sql = "alter table my_user add column pwd_encrypt varchar(255) not null";
  Statement statement = connection.createStatement();
  statement.execute(sql);
  statement.close();
}

加密配置

private EncryptRuleConfiguration createEncryptRuleConfig() {
  EncryptRuleConfiguration configuration = new EncryptRuleConfiguration();
  // 加密解密器
  Properties properties = new Properties();
  properties.setProperty("aes.key.value", "123456");
  EncryptorRuleConfiguration aesRuleConfiguration = new EncryptorRuleConfiguration("aes", properties);
  configuration.getEncryptors().put("user_pwd_encryptor", aesRuleConfiguration);
  // 表 - 字段 - 加密规则
  EncryptColumnRuleConfiguration pwdEncryptConfig = new EncryptColumnRuleConfiguration("pwd", "pwd_encrypt", "", "user_pwd_encryptor");
  Map<String, EncryptColumnRuleConfiguration> columns = new HashMap<>();
  columns.put("pwd", pwdEncryptConfig);// 逻辑字段 - 加密配置
  configuration.getTables().put("my_user", new EncryptTableRuleConfiguration(columns));
  return configuration;
}

二、迁移前:query.with.cipher.column=false

老用户

查询

Logic SQL: select * from my_user where id = ? 
Actual SQL: ds_1 ::: select id, pwd AS pwd from my_user_1 where id = ? ::: [3] 

Logic SQL: select id, pwd from my_user where id = ? 
Actual SQL: ds_1 ::: select id, pwd AS pwd from my_user_1 where id = ? ::: [3] 

Logic SQL: select * from my_user where pwd = ? 
Actual SQL: ds_0 ::: select id, pwd AS pwd from my_user_0 where pwd = ? ::: [1608881254186] 
Actual SQL: ds_0 ::: select id, pwd AS pwd from my_user_1 where pwd = ? ::: [1608881254186] 
Actual SQL: ds_1 ::: select id, pwd AS pwd from my_user_0 where pwd = ? ::: [1608881254186] 
Actual SQL: ds_1 ::: select id, pwd AS pwd from my_user_1 where pwd = ? ::: [1608881254186] 

结论:

  • *会被重写为id, pwd as pwd;id, pwd会被重写为pwd As pwd。
  • where查询条件不会被重写。

为什么查询字段会被重写?

回顾SQLRewriteEntry创建SQL重写上下文。

在启用加密规则配置时,SQL重写SQLRewriteEntry#createSQLRewriteContext时会做两个事情,一个是执行EncryptSQLRewriteContextDecorator把加密相关的Token生成器(SQLTokenGenerator)放入重写上下文,另一个就是生成SQLToken,而SQLToken如果实现了Substitutable接口会导致sql中原有的部分字符串被替换(见AbstractSQLBuilder#toSQL)

public SQLRewriteContext createSQLRewriteContext(final String sql, final List<Object> parameters, final SQLStatementContext sqlStatementContext, final RouteContext routeContext) {
  SQLRewriteContext result = new SQLRewriteContext(schemaMetaData, sqlStatementContext, sql, parameters);
  // 执行所有SQLRewriteContextDecorator,重写Param,创建SQLTokenGenerators
  decorate(decorators, result, routeContext);
  // 生成SQLToken
  result.generateSQLTokens();
  return result;
}

EncryptSQLRewriteContextDecorator会把EncryptProjectionTokenGenerator放入生成器列表,最终会执行EncryptProjectionTokenGenerator。

private Collection<SubstitutableColumnNameToken> generateSQLTokens(final ProjectionsSegment segment, final String tableName, 
                                                                       final SelectStatementContext selectStatementContext, final EncryptTable encryptTable) {
  Collection<SubstitutableColumnNameToken> result = new LinkedList<>();
  for (ProjectionSegment each : segment.getProjections()) {
      if (each instanceof ColumnProjectionSegment) { // id, pwd
          if (encryptTable.getLogicColumns().contains(((ColumnProjectionSegment) each).getColumn().getIdentifier().getValue())) {
              result.add(generateSQLToken((ColumnProjectionSegment) each, tableName));
          }
      }
      if (each instanceof ShorthandProjectionSegment) { // *
          result.add(generateSQLToken((ShorthandProjectionSegment) each, selectStatementContext.getProjectionsContext(), tableName, encryptTable));
      }
  }
  return result;
}

对于ShorthandProjectionSegment,如select * 会循环所有实际字段(来源于TableMetaData,注意密文字段不会存在于这个列表里,EncryptTableMetaDataDecorator在装饰TableMetaData时,排除了密文字段),普通字段会直接加入结果集,加密规则配置的逻辑字段可能会被重写。

private SubstitutableColumnNameToken generateSQLToken(final ShorthandProjectionSegment segment, 
                                                          final ProjectionsContext projectionsContext, final String tableName, final EncryptTable encryptTable) {
    ShorthandProjection shorthandProjection = getShorthandProjection(segment, projectionsContext);
    List<String> shorthandExtensionProjections = new LinkedList<>();
    // 获取所有实际的列
    for (ColumnProjection each : shorthandProjection.getActualColumns()) {
        // 如果逻辑字段与sql实际字段一致,可能会被重写
        if (encryptTable.getLogicColumns().contains(each.getName())) {
            shorthandExtensionProjections.add(new ColumnProjection(each.getOwner(), getEncryptColumnName(tableName, each.getName()), each.getName()).getExpressionWithAlias());
        } else {
        // 其他普通字段,直接add
            shorthandExtensionProjections.add(each.getExpression());
        }
    }
    return new SubstitutableColumnNameToken(segment.getStartIndex(), segment.getStopIndex(), Joiner.on(", ").join(shorthandExtensionProjections));
}

重点要看getEncryptColumnName方法,如果query.with.cipher.column为true,返回密文字段,否则返回明文字段。

private String getEncryptColumnName(final String tableName, final String logicEncryptColumnName) {
    // 根据logicEncryptColumnName逻辑列名找到原文字段
    Optional<String> plainColumn = getEncryptRule().findPlainColumn(tableName, logicEncryptColumnName);
    // 如果原文字段存在 且 query.with.cipher.column=false,返回原文字段
    // 如果原文字段不存在 或 query.with.cipher.column=true,返回密文字段
    return plainColumn.isPresent() && !queryWithCipherColumn 
    	? plainColumn.get() 
    	: getEncryptRule().getCipherColumn(tableName, logicEncryptColumnName);
}

对于ColumnProjectionSegment,如id, pwd这种。pwd会匹配加密规则的逻辑字段,会进入generateSQLToken方法。也是根据上面的getEncryptColumnName方法,生成原文或密文,当sql中没有自定义as别名时,拼上逻辑字段名。

private SubstitutableColumnNameToken generateSQLToken(final ColumnProjectionSegment segment, final String tableName) {
    String encryptColumnName = getEncryptColumnName(tableName, segment.getColumn().getIdentifier().getValue());
    if (!segment.getAlias().isPresent()) {
        // 拼接 as 逻辑列名 保证不影响客户端逻辑
        encryptColumnName += " AS " + segment.getColumn().getIdentifier().getValue();
    }
    return segment.getColumn().getOwner().isPresent() 
    	? new SubstitutableColumnNameToken(segment.getColumn().getOwner().get().getStopIndex() + 2, segment.getStopIndex(), encryptColumnName)
        : new SubstitutableColumnNameToken(segment.getStartIndex(), segment.getStopIndex(), encryptColumnName);
}

更新

Logic SQL: update my_user set pwd = ? where id = ? 
Actual SQL: ds_1 ::: update my_user_1 set pwd_encrypt = ?, pwd = ? where id = ? ::: [UVZAHLRN/LfAgbNp2BP7KA==, 1608881254186, 3] 

为什么更新字段会新增加密字段?

1、新增了SQLToken

定位到SQLTokenGenerators#generateSQLTokens看看哪个SQLTokenGenerator创建了新的SQLToken。很快定位到了EncryptAssignmentTokenGenerator

public final class EncryptAssignmentTokenGenerator extends BaseEncryptSQLTokenGenerator implements CollectionSQLTokenGenerator {
    
  @Override
  public Collection<EncryptAssignmentToken> generateSQLTokens(final SQLStatementContext sqlStatementContext) {
      Collection<EncryptAssignmentToken> result = new LinkedList<>();
      String tableName = ((TableAvailable) sqlStatementContext).getAllTables().iterator().next().getTableName().getIdentifier().getValue();
      // 循环每个update的param,比如pwd 
      for (AssignmentSegment each : getSetAssignmentSegment(sqlStatementContext.getSqlStatement()).getAssignments()) {
          // 根据逻辑名找到EncryptColumnRuleConfiguration
          // 根据EncryptColumnRuleConfiguration的encryptor找到EncryptorRuleConfiguration
          if (getEncryptRule().findEncryptor(tableName, each.getColumn().getIdentifier().getValue()).isPresent()) {
              // 创建SQLToken放入result
              generateSQLToken(tableName, each).ifPresent(result::add);
          }
      }
      return result;
  }
}

这里创建EncryptAssignmentToken分为两种情况,一种是update my_user set pwd = ?带占位符的Param模式;一种是update my_user set pwd = 'xxx'普通的Literal模式。

private Optional<EncryptAssignmentToken> generateSQLToken(final String tableName, final AssignmentSegment assignmentSegment) {
  // param
  if (assignmentSegment.getValue() instanceof ParameterMarkerExpressionSegment) {
      return Optional.of(generateParameterSQLToken(tableName, assignmentSegment));
  }
  // literal
  if (assignmentSegment.getValue() instanceof LiteralExpressionSegment) {
      return Optional.of(generateLiteralSQLToken(tableName, assignmentSegment));
  }
  return Optional.empty();
}

看一下Param模式。

private EncryptAssignmentToken generateParameterSQLToken(final String tableName, final AssignmentSegment assignmentSegment) {
  EncryptParameterAssignmentToken result = new EncryptParameterAssignmentToken(assignmentSegment.getColumn().getStartIndex(), assignmentSegment.getStopIndex());
  String columnName = assignmentSegment.getColumn().getIdentifier().getValue();
  // 加入密文列
  addCipherColumn(tableName, columnName, result);
  // 加入辅助查询列
  addAssistedQueryColumn(tableName, columnName, result);
  // 加入原文列
  addPlainColumn(tableName, columnName, result);
  return result;
}

EncryptParameterAssignmentToken是最终生成的SQLToken,重点看toString方法,会作为之后AbstractSQLBuilder#toSQL拼接sql的String。

public final class EncryptParameterAssignmentToken extends EncryptAssignmentToken {
    private final Collection<String> columnNames = new LinkedList<>();
    public EncryptParameterAssignmentToken(final int startIndex, final int stopIndex) {
        super(startIndex, stopIndex);
    }
    public void addColumnName(final String columnName) {
        columnNames.add(columnName);
    }
    // 返回pwd_encrypt = ?, pwd = ?
    @Override
    public String toString() {
        Collection<String> items = Collections2.transform(columnNames, input -> String.format("%s = ?", input));
        return Joiner.on(", ").join(items);
    }
}

2、新增了Param

SQLRewriteEntry#createSQLRewriteContext,先执行了SQLRewriteContextDecorator重写param,然后执行SQLTokenGenerators创建SQLToken。

public SQLRewriteContext createSQLRewriteContext(final String sql, final List<Object> parameters, final SQLStatementContext sqlStatementContext, final RouteContext routeContext) {
  SQLRewriteContext result = new SQLRewriteContext(schemaMetaData, sqlStatementContext, sql, parameters);
  // 执行所有SQLRewriteContextDecorator,重写Param,创建SQLTokenGenerators
  decorate(decorators, result, routeContext);
  // 生成SQLToken
  result.generateSQLTokens();
  return result;
}

EncryptSQLRewriteContextDecorator执行所有ParameterRewriter。

public final class EncryptSQLRewriteContextDecorator implements SQLRewriteContextDecorator<EncryptRule> {
    
@Override
public void decorate(final EncryptRule encryptRule, final ConfigurationProperties properties, final SQLRewriteContext sqlRewriteContext) {
    boolean isQueryWithCipherColumn = properties.<Boolean>getValue(ConfigurationPropertyKey.QUERY_WITH_CIPHER_COLUMN);
    // 构造ParameterRewriter列表
    for (ParameterRewriter each : new EncryptParameterRewriterBuilder(encryptRule, isQueryWithCipherColumn).getParameterRewriters(sqlRewriteContext.getSchemaMetaData())) {
        if (!sqlRewriteContext.getParameters().isEmpty() && each.isNeedRewrite(sqlRewriteContext.getSqlStatementContext())) {
            // 执行ParameterRewriter
            each.rewrite(sqlRewriteContext.getParameterBuilder(), sqlRewriteContext.getSqlStatementContext(), sqlRewriteContext.getParameters());
        }
    }
    sqlRewriteContext.addSQLTokenGenerators(new EncryptTokenGenerateBuilder(encryptRule, isQueryWithCipherColumn).getSQLTokenGenerators());
}

最终定位到EncryptAssignmentParameterRewriter

public final class EncryptAssignmentParameterRewriter extends EncryptParameterRewriter<SQLStatementContext> {
    
  @Override
  public void rewrite(final ParameterBuilder parameterBuilder, final SQLStatementContext sqlStatementContext, final List<Object> parameters) {
      String tableName = ((TableAvailable) sqlStatementContext).getAllTables().iterator().next().getTableName().getIdentifier().getValue();
      for (AssignmentSegment each : getSetAssignmentSegment(sqlStatementContext.getSqlStatement()).getAssignments()) {
          // 是parameter模式,且这是个加密规则配置中的逻辑字段
          if (each.getValue() instanceof ParameterMarkerExpressionSegment && getEncryptRule().findEncryptor(tableName, each.getColumn().getIdentifier().getValue()).isPresent()) {
              StandardParameterBuilder standardParameterBuilder = parameterBuilder instanceof StandardParameterBuilder
                      ? (StandardParameterBuilder) parameterBuilder : ((GroupedParameterBuilder) parameterBuilder).getParameterBuilders().get(0);
              // 执行parameters重写
              encryptParameters(standardParameterBuilder, tableName, each, parameters);
          }
      }
  }
}

encryptParameters方法重写StandardParameterBuilder里的参数列表。

private void encryptParameters(final StandardParameterBuilder parameterBuilder, final String tableName, final AssignmentSegment assignmentSegment, final List<Object> parameters) {
  String columnName = assignmentSegment.getColumn().getIdentifier().getValue();
  // pwd初始值(明文)对应下标
  int parameterMarkerIndex = ((ParameterMarkerExpressionSegment) assignmentSegment.getValue()).getParameterMarkerIndex();
  // pwd初始值(明文)
  Object originalValue = parameters.get(parameterMarkerIndex);
  // 找到 Encryptor 执行加密 得到 加密值
  Object cipherValue = getEncryptRule().getEncryptValues(tableName, columnName, Collections.singletonList(originalValue)).iterator().next();
  // 替换 明文值为加密值
  parameterBuilder.addReplacedParameters(parameterMarkerIndex, cipherValue);
  Collection<Object> addedParameters = new LinkedList<>();
  // 加入辅助查询值
  if (getEncryptRule().findAssistedQueryColumn(tableName, columnName).isPresent()) {
      Object assistedQueryValue = getEncryptRule().getEncryptAssistedQueryValues(tableName, columnName, Collections.singletonList(originalValue)).iterator().next();
      addedParameters.add(assistedQueryValue);
  }
  // 加入原文值
  if (getEncryptRule().findPlainColumn(tableName, columnName).isPresent()) {
      addedParameters.add(originalValue);
  }
  if (!addedParameters.isEmpty()) {
      parameterBuilder.addAddedParameters(parameterMarkerIndex + 1, addedParameters);
  }
}

为什么这里是替换原文值为密文值,然后判断加密规则中的原文字段是否存在,再重新加入原文值呢?而不是选择直接加入密文值呢?这个就涉及到第四步迁移后删除原文列,因为原文字段不可能永远存在。

新用户

新增

Logic SQL: insert into my_user (id, pwd) values (?, ?) 
Actual SQL: ds_0 ::: insert into my_user_1 (id, pwd_encrypt, pwd) values (?, ?, ?) ::: [2, TcZjXe0lAwrT19nIJ9+0Iw==, 1608881254477]

为什么插入语句会新增加密字段?

1、新增了Param

EncryptInsertValueParameterRewriter执行加密规则insert语句的param重写。

@Override
public void rewrite(final ParameterBuilder parameterBuilder, final InsertStatementContext insertStatementContext, final List<Object> parameters) {
    String tableName = insertStatementContext.getSqlStatement().getTable().getTableName().getIdentifier().getValue();
    // 倒序排列的字段名
    Iterator<String> descendingColumnNames = insertStatementContext.getDescendingColumnNames();
    while (descendingColumnNames.hasNext()) {
        String columnName = descendingColumnNames.next();
        getEncryptRule()
                // 找到Encryptor
                .findEncryptor(tableName, columnName)
                // 重写Param放入ParameterBuilder
                .ifPresent(encryptor -> {
                    encryptInsertValues((GroupedParameterBuilder) parameterBuilder, insertStatementContext, encryptor, tableName, columnName);
                });
    }
}

最终定位到encryptInsertValue方法,将重写参数放入StandardParameterBuilder。

private void encryptInsertValue(final Encryptor encryptor, final String tableName, final int columnIndex,
                                    final Object originalValue, final StandardParameterBuilder parameterBuilder, final String encryptLogicColumnName) {
    // FIXME: can process all part of insert value is ? or literal, can not process mix ? and literal
    // For example: values (?, ?), (1, 1) can process
    // For example: values (?, 1), (?, 2) can not process
    // 加密明文为密文放入parameterBuilder
    parameterBuilder.addReplacedParameters(columnIndex, encryptor.encrypt(originalValue));
    Collection<Object> addedParameters = new LinkedList<>();
    // 辅助查询值放入parameterBuilder
    if (encryptor instanceof QueryAssistedEncryptor) {
        Optional<String> assistedColumnName = getEncryptRule().findAssistedQueryColumn(tableName, encryptLogicColumnName);
        Preconditions.checkArgument(assistedColumnName.isPresent(), "Can not find assisted query Column Name");
        addedParameters.add(((QueryAssistedEncryptor) encryptor).queryAssistedEncrypt(originalValue.toString()));
    }
    // 明文值放入parameterBuilder
    if (getEncryptRule().findPlainColumn(tableName, encryptLogicColumnName).isPresent()) {
        addedParameters.add(originalValue);
    }
    if (!addedParameters.isEmpty()) {
        if (!parameterBuilder.getAddedIndexAndParameters().containsKey(columnIndex + 1)) {
            parameterBuilder.getAddedIndexAndParameters().put(columnIndex + 1, new LinkedList<>());
        }
        parameterBuilder.getAddedIndexAndParameters().get(columnIndex + 1).addAll(addedParameters);
    }
}

上面的源码注释看到,配置了加密规则的情况下,不支持literal和param模式混合在一个数据行中做插入。比如下面这种插入方式,这真的是个大坑。

private static void insert(Connection connection, int id, String pwd) throws SQLException {
    String sql = "insert into my_user (id, pwd) values (9998, ?)";
    PreparedStatement statement = connection.prepareStatement(sql);
    statement.setString(1, String.valueOf(System.currentTimeMillis()));
    statement.execute();
    statement.close();
}

问题就是StandardParameterBuilder#getParameters这个方法执行时,会报java.lang.IndexOutOfBoundsException异常。

在SQLRouteRewriteEngine#rewrite重写引擎执行完sql语句拼接之后(AbstractSQLBuilder#toSQL),获取重写完param列表时调用StandardParameterBuilder#getParameters这个方法。

@RequiredArgsConstructor
public final class StandardParameterBuilder implements ParameterBuilder {
    // 原始参数列表
    private final List<Object> originalParameters;
    // index - 需要添加的参数
    @Getter
    private final Map<Integer, Collection<Object>> addedIndexAndParameters = new TreeMap<>();
    // index - 需要替换的参数
    private final Map<Integer, Object> replacedIndexAndParameters = new LinkedHashMap<>();
    // 需要移除的index
    private final List<Integer> removeIndexAndParameters = new ArrayList<>();
    // 暴露给外部
    public void addAddedParameters(final int index, final Collection<Object> parameters) {
        addedIndexAndParameters.put(index, parameters);
    }
    // 暴露给外部
    public void addReplacedParameters(final int index, final Object parameter) {
        replacedIndexAndParameters.put(index, parameter);
    }
    // 暴露给外部
    public void addRemovedParameters(final int index) {
        removeIndexAndParameters.add(index);
    }
    @Override
    public List<Object> getParameters() {
        List<Object> result = new LinkedList<>(originalParameters);
        // 替换下标对应的值
        for (Entry<Integer, Object> entry : replacedIndexAndParameters.entrySet()) {
            // IndexOutOfBoundsException
            result.set(entry.getKey(), entry.getValue());
        }
        // 新增下标对应的值
        for (Entry<Integer, Collection<Object>> entry : ((TreeMap<Integer, Collection<Object>>) addedIndexAndParameters).descendingMap().entrySet()) {
            if (entry.getKey() > result.size()) {
                result.addAll(entry.getValue());
            } else {
                result.addAll(entry.getKey(), entry.getValue());
            }
        }
        // 移除下标对应的值
        for (int index : removeIndexAndParameters) {
            result.remove(index);
        }
        return result;
    }
}

2、替换了SQLToken

EncryptInsertValuesTokenGenerator创建加密insert的SQLToken。没什么特别的,就是把(?,?)改为(?,?,?)。

 @Override
public InsertValuesToken generateSQLToken(final InsertStatementContext insertStatementContext) {
    Optional<SQLToken> insertValuesToken = findPreviousSQLToken(InsertValuesToken.class);
    if (insertValuesToken.isPresent()) {
        processPreviousSQLToken(insertStatementContext, (InsertValuesToken) insertValuesToken.get());
        return (InsertValuesToken) insertValuesToken.get();
    }
    return generateNewSQLToken(insertStatementContext);
}

// 核心逻辑
private void encryptToken(final InsertValue insertValueToken, final String tableName, final InsertStatementContext insertStatementContext, final InsertValueContext insertValueContext) {
    Optional<SQLToken> useDefaultInsertColumnsToken = findPreviousSQLToken(UseDefaultInsertColumnsToken.class);
    Iterator<String> descendingColumnNames = insertStatementContext.getDescendingColumnNames();
    while (descendingColumnNames.hasNext()) {
        String columnName = descendingColumnNames.next();
        // 找到逻辑列名对应的Encryptor
        Optional<Encryptor> encryptor = getEncryptRule().findEncryptor(tableName, columnName);
        if (encryptor.isPresent()) {
            int columnIndex = useDefaultInsertColumnsToken.map(sqlToken -> ((UseDefaultInsertColumnsToken) sqlToken).getColumns().indexOf(columnName))
                    .orElseGet(() -> insertStatementContext.getColumnNames().indexOf(columnName));
            // 明文
            Object originalValue = insertValueContext.getValue(columnIndex);
            // 添加原文列
            addPlainColumn(insertValueToken, columnIndex, tableName, columnName, insertValueContext, originalValue);
            // 添加辅助查询列
            addAssistedQueryColumn(insertValueToken, encryptor.get(), columnIndex, tableName, columnName, insertValueContext, originalValue);
            // 设置密文列
            setCipherColumn(insertValueToken, encryptor.get(), columnIndex, insertValueContext.getValueExpressions().get(columnIndex), originalValue);
        }
    }
}

历史数据更新

sharding-sphere没有提供历史数据的脱敏方案。pwd_encrypt字段始终为空,需要刷脚本初始化。

新增数据在插入时,就通过 Apache ShardingSphere 加密为密文数据,并被存储到了 cipherColumn。而现在就需要处理历史明文存量数据。 由于Apache ShardingSphere 目前并未提供相关迁移洗数工具,此时需要业务方自行将 pwd 中的明文数据进行加密处理存储到 pwd_cipher。

三、迁移中:query.with.cipher.column=true

老用户

查询

Logic SQL: select * from my_user where id = ? 
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_1 where id = ? ::: [3] 

Logic SQL: select id, pwd from my_user where id = ? 
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_1 where id = ? ::: [3] 

Logic SQL: select * from my_user where pwd = ? 
Actual SQL: ds_0 ::: select id, pwd_encrypt AS pwd from my_user_0 where pwd_encrypt = ? ::: [/sRcoTnWHoY1+PT9F6tpAg==] 
Actual SQL: ds_0 ::: select id, pwd_encrypt AS pwd from my_user_1 where pwd_encrypt = ? ::: [/sRcoTnWHoY1+PT9F6tpAg==] 
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_0 where pwd_encrypt = ? ::: [/sRcoTnWHoY1+PT9F6tpAg==] 
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_1 where pwd_encrypt = ? ::: [/sRcoTnWHoY1+PT9F6tpAg==] 

1、查询字段被重写 pwd_encrypt AS pwd

原因和前面迁移前一样,见EncryptProjectionTokenGenerator#getEncryptColumnName,由于query.with.cipher.column=true这里会返回密文字段pwd_encrypt。

private String getEncryptColumnName(final String tableName, final String logicEncryptColumnName) {
  // 根据logicEncryptColumnName逻辑列名找到原文字段
  Optional<String> plainColumn = getEncryptRule().findPlainColumn(tableName, logicEncryptColumnName);
  // 如果原文字段存在 且 query.with.cipher.column=false,返回原文字段
  // 如果原文字段不存在 或 query.with.cipher.column=true,返回密文字段
  return plainColumn.isPresent() && !queryWithCipherColumn ? plainColumn.get() : getEncryptRule().getCipherColumn(tableName, logicEncryptColumnName);
}

2、where条件被重写 where pwd_encrypt = ?

首先EncryptPredicateColumnTokenGenerator创建了SQLToken。

private Collection<SubstitutableColumnNameToken> generateSQLTokens(final SQLStatementContext sqlStatementContext, final AndPredicate andPredicate) {
  Collection<SubstitutableColumnNameToken> result = new LinkedList<>();
  for (PredicateSegment each : andPredicate.getPredicates()) {
      Optional<EncryptTable> encryptTable = findEncryptTable(sqlStatementContext, each);
      if (!encryptTable.isPresent() || !encryptTable.get().findEncryptor(each.getColumn().getIdentifier().getValue()).isPresent()) {
          continue;
      }
      int startIndex = each.getColumn().getOwner().isPresent() ? each.getColumn().getOwner().get().getStopIndex() + 2 : each.getColumn().getStartIndex();
      int stopIndex = each.getColumn().getStopIndex();
      // query.with.cipher.column=false 这里会取原文字段
      if (!queryWithCipherColumn) { 
          Optional<String> plainColumn = encryptTable.get().findPlainColumn(each.getColumn().getIdentifier().getValue());
          if (plainColumn.isPresent()) {
              result.add(new SubstitutableColumnNameToken(startIndex, stopIndex, plainColumn.get()));
              continue;
          }
      }
      // query.with.cipher.column=true
      // 判断辅助查询列是否匹配当前PredicateSegment断言,如果是辅助查询列则返回辅助查询列
      Optional<String> assistedQueryColumn = encryptTable.get().findAssistedQueryColumn(each.getColumn().getIdentifier().getValue());
      SubstitutableColumnNameToken encryptColumnNameToken = assistedQueryColumn.map(columnName -> new SubstitutableColumnNameToken(startIndex, stopIndex, columnName))
              // 否则返回密文列
              .orElseGet(() -> new SubstitutableColumnNameToken(startIndex, stopIndex, encryptTable.get().getCipherColumn(each.getColumn().getIdentifier().getValue())));
      result.add(encryptColumnNameToken);
  }
  return result;
}

其次EncryptPredicateParameterRewriter将查询参数重写为密文。

@Setter
public final class EncryptPredicateParameterRewriter extends EncryptParameterRewriter<SQLStatementContext> implements SchemaMetaDataAware, QueryWithCipherColumnAware {
  private SchemaMetaData schemaMetaData;
  private boolean queryWithCipherColumn;
  @Override
  public void rewrite(final ParameterBuilder parameterBuilder, final SQLStatementContext sqlStatementContext, final List<Object> parameters) {
      // 创建EncryptCondition,类似ShardingCondition,表示某个表某个字段的 = 或 in 断言
      List<EncryptCondition> encryptConditions = new EncryptConditionEngine(getEncryptRule(), schemaMetaData).createEncryptConditions(sqlStatementContext);
      if (encryptConditions.isEmpty()) {
          return;
      }
      for (EncryptCondition each : encryptConditions) {
          if (queryWithCipherColumn) {
              // 加密原始查询参数
              List<Object> encryptedValues = getEncryptedValues(each, each.getValues(parameters));
              // parameterBuilder替换查询参数
              encryptParameters(parameterBuilder, each.getPositionIndexMap(), encryptedValues);
          }
      }
  }
}

EncryptConditionEngine#createEncryptCondition创建EncryptCondition,不支持between...and...比较。

private Optional<EncryptCondition> createEncryptCondition(final PredicateSegment predicateSegment, final String tableName) {
  // 比较运算符 如 <> != =
  if (predicateSegment.getRightValue() instanceof PredicateCompareRightValue) {
      PredicateCompareRightValue compareRightValue = (PredicateCompareRightValue) predicateSegment.getRightValue();
      return isSupportedOperator(compareRightValue.getOperator()) ? createCompareEncryptCondition(tableName, predicateSegment, compareRightValue) : Optional.empty();
  }
  // in
  if (predicateSegment.getRightValue() instanceof PredicateInRightValue) {
      return createInEncryptCondition(tableName, predicateSegment, (PredicateInRightValue) predicateSegment.getRightValue());
  }
   // 不支持between and
  if (predicateSegment.getRightValue() instanceof PredicateBetweenRightValue) {
      throw new ShardingSphereException("The SQL clause 'BETWEEN...AND...' is unsupported in encrypt rule.");
  }
  return Optional.empty();
}

EncryptPredicateParameterRewriter#getEncryptedValues优先使用辅助查询值作为查询条件值。

private List<Object> getEncryptedValues(final EncryptCondition encryptCondition, final List<Object> originalValues) {
  String tableName = encryptCondition.getTableName();
  String columnName = encryptCondition.getColumnName();
  return getEncryptRule().findAssistedQueryColumn(tableName, columnName).isPresent()
          // 如果辅助查询存在,使用辅助查询值
          ? getEncryptRule().getEncryptAssistedQueryValues(tableName, columnName, originalValues)
          // 否则使用加密值
          : getEncryptRule().getEncryptValues(tableName, columnName, originalValues);
}

EncryptPredicateParameterRewriter#encryptParameters,还是将重写参数通过addReplacedParameters方法放入StandardParameterBuilder。

private void encryptParameters(final ParameterBuilder parameterBuilder, final Map<Integer, Integer> positionIndexes, final List<Object> encryptValues) {
    if (!positionIndexes.isEmpty()) {
        for (Entry<Integer, Integer> entry : positionIndexes.entrySet()) {
            ((StandardParameterBuilder) parameterBuilder).addReplacedParameters(entry.getValue(), encryptValues.get(entry.getKey()));
        }
    }
}

更新

Logic SQL: update my_user set pwd = ? where id = ? 
Actual SQL: ds_1 ::: update my_user_1 set pwd_encrypt = ?, pwd = ? where id = ? ::: [b9ValhsVZfJ2Mu7GKmsHag==, 1609060586708, 3]

和开启query.with.cipher.column之前一样,EncryptAssignmentTokenGenerator创建SQLToken,EncryptAssignmentParameterRewriter新增了加密参数,SQL没发生变化。

新用户

新增

Logic SQL: insert into my_user (id, pwd) values (?, ?) 
Actual SQL: ds_0 ::: insert into my_user_0 (id, pwd_encrypt, pwd) values (?, ?, ?) ::: [5140, NHj62xHlEngXKI9YV7twyg==, 1609061001092] 

和开启query.with.cipher.column之前一样,EncryptInsertValueParameterRewriter新增加密参数,EncryptInsertValuesTokenGenerator创建SQLToken。

四、迁移后

配置去除明文列

注意这里EncryptColumnRuleConfiguration配置的第一个参数,原来是pwd代表数据表实际的原文字段,这个时候去除了。

private EncryptRuleConfiguration createEncryptRuleConfig2() {
  EncryptRuleConfiguration configuration = new EncryptRuleConfiguration();
  // 加密解密器
  Properties properties = new Properties();
  properties.setProperty("aes.key.value", "123456");
  EncryptorRuleConfiguration aesRuleConfiguration = new EncryptorRuleConfiguration("aes", properties);
  configuration.getEncryptors().put("user_pwd_encryptor", aesRuleConfiguration);
  // 表 - 字段 - 加密规则 注意已经去除了pwdEncryptConfig的原文字段pwd
  EncryptColumnRuleConfiguration pwdEncryptConfig = new EncryptColumnRuleConfiguration(null, "pwd_encrypt", "", "user_pwd_encryptor");
  Map<String, EncryptColumnRuleConfiguration> columns = new HashMap<>();
  columns.put("pwd", pwdEncryptConfig);// 逻辑字段 - 加密配置
  configuration.getTables().put("my_user", new EncryptTableRuleConfiguration(columns));
  return configuration;
}

DDL删除明文字段

private static void dropPlainPwdColumn(Connection connection) throws SQLException {
  String sql = "alter table my_user drop column pwd";
  Statement statement = connection.createStatement();
  statement.execute(sql);
  statement.close();
}

增改查

Logic SQL: select * from my_user where id = ? 
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_1 where id = ? ::: [3] 

Logic SQL: select id, pwd from my_user where id = ? 
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_1 where id = ? ::: [3] 

Logic SQL: update my_user set pwd = ? where id = ? 
Actual SQL: ds_1 ::: update my_user_1 set pwd_encrypt = ? where id = ? ::: [4tzE6aM2SQXhj8sJcJrjmQ==, 3] 
Logic SQL: select * from my_user where pwd = ? 
Actual SQL: ds_0 ::: select id, pwd_encrypt AS pwd from my_user_0 where pwd_encrypt = ? ::: [4tzE6aM2SQXhj8sJcJrjmQ==] 
Actual SQL: ds_0 ::: select id, pwd_encrypt AS pwd from my_user_1 where pwd_encrypt = ? ::: [4tzE6aM2SQXhj8sJcJrjmQ==] 
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_0 where pwd_encrypt = ? ::: [4tzE6aM2SQXhj8sJcJrjmQ==] 
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_1 where pwd_encrypt = ? ::: [4tzE6aM2SQXhj8sJcJrjmQ==] 

Logic SQL: insert into my_user (id, pwd) values (?, ?) 
Actual SQL: ds_1 ::: insert into my_user_1 (id, pwd_encrypt) values (?, ?) ::: [9343, enAqjGiquCRS7lhheLgmnA==] 

对于select没有变化。

对于insert,比如EncryptInsertValuesTokenGenerator#addPlainColumn在创建SQLToken时,判断原文字段配置不存在,就不会生成原文字段SQLToken。

private void addPlainColumn(final InsertValue insertValueToken, final int columnIndex,
                          final String tableName, final String columnName, final InsertValueContext insertValueContext, final Object originalValue) {
  if (getEncryptRule().findPlainColumn(tableName, columnName).isPresent()) {
      DerivedSimpleExpressionSegment derivedExpressionSegment = insertValueContext.getParameters().isEmpty()
              ? new DerivedLiteralExpressionSegment(originalValue) : new DerivedParameterMarkerExpressionSegment(getParameterIndexCount(insertValueToken));
      insertValueToken.getValues().add(columnIndex + 1, derivedExpressionSegment);
  }
}

同样的,EncryptInsertValueParameterRewriter#encryptInsertValue重写参数列表时,也因为找不到原文字段配置,不会把原文参数放入参数列表。

private void encryptInsertValue(final Encryptor encryptor, final String tableName, final int columnIndex,
                                final Object originalValue, final StandardParameterBuilder parameterBuilder, final String encryptLogicColumnName) {
    // 加密明文为密文放入parameterBuilder
    parameterBuilder.addReplacedParameters(columnIndex, encryptor.encrypt(originalValue));
    Collection<Object> addedParameters = new LinkedList<>();
    // 辅助查询值放入parameterBuilder
    if (encryptor instanceof QueryAssistedEncryptor) {
        Optional<String> assistedColumnName = getEncryptRule().findAssistedQueryColumn(tableName, encryptLogicColumnName);
        Preconditions.checkArgument(assistedColumnName.isPresent(), "Can not find assisted query Column Name");
        addedParameters.add(((QueryAssistedEncryptor) encryptor).queryAssistedEncrypt(originalValue.toString()));
    }
    // 这里找不到原文字段配置,不会放入结果集
    if (getEncryptRule().findPlainColumn(tableName, encryptLogicColumnName).isPresent()) {
        addedParameters.add(originalValue);
    }
    if (!addedParameters.isEmpty()) {
        if (!parameterBuilder.getAddedIndexAndParameters().containsKey(columnIndex + 1)) {
            parameterBuilder.getAddedIndexAndParameters().put(columnIndex + 1, new LinkedList<>());
        }
        parameterBuilder.getAddedIndexAndParameters().get(columnIndex + 1).addAll(addedParameters);
    }
}

对于update也是一样,见EncryptAssignmentTokenGenerator#addPlainColumn和EncryptAssignmentParameterRewriter#encryptParameters。

四、辅助查询列

上面一直提到辅助查询列,为什么需要这个字段呢?主要是有这种需求:

即使是相同的数据,如两个用户的密码相同,它们在数据库里存储的加密数据也应当是不一样的。这种理念更有利于保护用户信息,防止撞库成功。

针对于这种需求,org.apache.shardingsphere.encrypt.strategy.spi.Encryptor有一个子接口org.apache.shardingsphere.encrypt.strategy.spi.QueryAssistedEncryptor会提供辅助查询方法,将明文值转换为辅助查询值去查询辅助查询字段(言外之意,只有where条件中有加密配置的逻辑字段限制条件时,这个才有用,如果平常业务用不到,只需要实现一个Encryptor保证可逆且同样的原文密文不同即可)。

public interface QueryAssistedEncryptor extends Encryptor {
    String queryAssistedEncrypt(String plaintext);
}

它提供三种函数进行实现,分别是encrypt(), decrypt(), queryAssistedEncrypt()。在encrypt()阶段,用户通过设置某个变动种子,例如时间戳。 针对原始数据+变动种子组合的内容进行加密,就能保证即使原始数据相同,也因为有变动种子的存在,致使加密后的加密数据是不一样的。在decrypt()可依据之前规定的加密算法,利用种子数据进行解密。

由于queryAssistedEncrypt()和encrypt()产生不同加密数据进行存储,而decrypt()可逆,queryAssistedEncrypt()不可逆。 在查询原始数据的时候,我们会自动对SQL进行解析、改写、路由,利用辅助查询列进行WHERE条件的查询,却利用 decrypt()对encrypt()加密后的数据进行解密,并将原始数据返回给用户。 这一切都是对用户透明化的。

下面是根据官方文档实现的案例,AES加密方法带上时间戳随机种子保证落库数据不一样,AES解密过后刨除随机种子返回给用户,queryAssistedEncrypt辅助查询使用MD5算法保证where查询。

QueryAssistedEncryptor实现如下,记得META-INFO/services下要通过SPI注入:

public class CustomQueryAssistedEncryptor implements QueryAssistedEncryptor {
	// 委托AES执行encrypt和decrypt
    private final Encryptor aesEncryptor;
    // 委托MD5执行queryAssistedEncrypt
    private final Encryptor digestEncryptor;
    private Properties properties;
	// 随机种子长度
    private final int seedLength = String.valueOf(System.currentTimeMillis()).length();

    public CustomQueryAssistedEncryptor() {
        this.aesEncryptor = new AESEncryptor();
        this.digestEncryptor = new MD5Encryptor();
    }

    @Override
    public String queryAssistedEncrypt(String plaintext) {
        String salt = this.getProperties().getProperty("md5.salt");
        return digestEncryptor.encrypt(plaintext + salt);
    }

    @Override
    public void init() {
    }

    @Override
    public String encrypt(Object plaintext) {
        return aesEncryptor.encrypt(plaintext + String.valueOf(System.currentTimeMillis()));
    }

    @Override
    public Object decrypt(String ciphertext) {
        String decrypt = (String) aesEncryptor.decrypt(ciphertext);
        return decrypt.substring(0, decrypt.length() - seedLength);
    }

    @Override
    public String getType() {
        return "CustomQueryAssistedEncryptor";
    }

    @Override
    public Properties getProperties() {
        return this.properties;
    }

    @Override
    public void setProperties(Properties properties) {
        this.properties = properties;
        aesEncryptor.setProperties(properties);
        digestEncryptor.setProperties(properties);
    }
}

加密配置如下:

 private EncryptRuleConfiguration createEncryptRuleConfig() {
    EncryptRuleConfiguration configuration = new EncryptRuleConfiguration();
    // 加密解密器
    Properties properties = new Properties();
    properties.setProperty("aes.key.value", "123456");
    properties.setProperty("md5.salt", "abcdef");
    EncryptorRuleConfiguration aesRuleConfiguration = new EncryptorRuleConfiguration("CustomQueryAssistedEncryptor", properties);
    configuration.getEncryptors().put("user_pwd_encryptor", aesRuleConfiguration);
    // 表 - 字段 - 加密规则
    EncryptColumnRuleConfiguration encryptConfig = new EncryptColumnRuleConfiguration(null, "pwd_encrypt", "pwd_assist", "user_pwd_encryptor");
    Map<String, EncryptColumnRuleConfiguration> columns = new HashMap<>();
    columns.put("pwd", encryptConfig);// 逻辑字段 - 加密配置
    configuration.getTables().put("my_user_assist", new EncryptTableRuleConfiguration(columns));
    return configuration;
}

对应DDL:

create table my_user_assist (id bigint(20) not null primary key, pwd_encrypt varchar(255) not null, pwd_assist varchar(255) not null)

增改查重写结果:

// 插入两个用户,密码都是123456,pwd_encrypt不同,pwd_assist相同
Logic SQL: insert into my_user_assist (id, pwd) values (?, ?) 
Actual SQL: ds_1 ::: insert into my_user_assist_1 (id, pwd_encrypt, pwd_assist) values (?, ?, ?) ::: [3, 1d/2wa8LM9FlepSZPmy3EC9HI2/xGzx9sq0qEBVSvC4=, 6f3b8ded65bd7a4db11625ac84e579bb] 
Logic SQL: insert into my_user_assist (id, pwd) values (?, ?) 
Actual SQL: ds_0 ::: insert into my_user_assist_0 (id, pwd_encrypt, pwd_assist) values (?, ?, ?) ::: [4, NPdqE1U22vN2awqLfcQGVmq90U7dDOEve/JY/7PbzCQ=, 6f3b8ded65bd7a4db11625ac84e579bb] 

Logic SQL: select * from my_user_assist where id = ? 
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_assist_1 where id = ? ::: [3] 

Logic SQL: select id, pwd from my_user_assist where id = ? 
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_assist_1 where id = ? ::: [3] 

Logic SQL: update my_user_assist set pwd = ? where id = ? 
Actual SQL: ds_1 ::: update my_user_assist_1 set pwd_encrypt = ?, pwd_assist = ? where id = ? ::: [4KpzieXQFNUpjdBToqGmYt1os93mcmEDK2zLftdNxIE=, 6f3b8ded65bd7a4db11625ac84e579bb, 3] 

Logic SQL: select * from my_user_assist where pwd = ? 
Actual SQL: ds_0 ::: select id, pwd_encrypt AS pwd from my_user_assist_0 where pwd_assist = ? ::: [6f3b8ded65bd7a4db11625ac84e579bb] 
Actual SQL: ds_0 ::: select id, pwd_encrypt AS pwd from my_user_assist_1 where pwd_assist = ? ::: [6f3b8ded65bd7a4db11625ac84e579bb] 
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_assist_0 where pwd_assist = ? ::: [6f3b8ded65bd7a4db11625ac84e579bb] 
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_assist_1 where pwd_assist = ? ::: [6f3b8ded65bd7a4db11625ac84e579bb] 

总结

  • sharding-jdbc的数据脱敏功能点的主要逻辑都集中在SQL重写这一步骤,而且是集中在SQLRewriteEntry#createSQLRewriteContext方法的第一步和第二步。

    • 第一步,由EncryptSQLRewriteContextDecorator执行ParameterRewriter重写Parameter。
    • 第二步,由EncryptSQLRewriteContextDecorator创建的SQLTokenGenerator,创建SQLToken。
  • query.with.cipher.column代表是否启用密文查询,如果是,即select字段需要查询密文字段出来后解密,where条件需要加密后再执行查询。

  • 需要注意的一个坑,只要配置了EncryptRuleConfiguration加密配置,要保证insert时要么全是问号占位符,要么全是死值。即不能有这种sql:insert into table (id, pwd, deleted) values (?, ?, 0),否则会报错。

  • 辅助查询列。如果同样的明文需要生成不同的密文落库,且又需要通过这个逻辑列做where条件查询,需要用到辅助查询列,需要用户自己实现QueryAssistedEncryptor接口。