前言
本章基于shardingsphere数据加密实现原理的官方文档,根据迁移前、迁移中、迁移后三个阶段,观察sql执行的变化,深入源码分析其中的原理。
建议先阅读下面的文章:
一、准备工作
分片策略配置
根据user_id%2分库,根据user_id%4/2分表。也就是user_id=1分配到ds_1.my_user_0;user_id=2分配到ds_0.my_user_0;user_id=3分配到ds_1.my_user_1。
public DataSource getDataSource() throws SQLException {
ShardingRuleConfiguration shardingConfig = new ShardingRuleConfiguration();
// 表规则
TableRuleConfiguration userTable = new TableRuleConfiguration("my_user", "ds_${0..1}.my_user_${[0, 1]}");
// 分库策略
ShardingStrategyConfiguration shardingDb = new InlineShardingStrategyConfiguration("id", "ds_${id%2}");
userTable.setDatabaseShardingStrategyConfig(shardingDb);
// 分表策略
ShardingStrategyConfiguration shardingTable = new StandardShardingStrategyConfiguration("id", (PreciseShardingAlgorithm<Integer>) (availableTargetNames, shardingValue) -> {
Integer l = shardingValue.getValue() % 4 / 2;
return availableTargetNames.stream().filter(t -> t.endsWith(String.valueOf(l))).findFirst().orElse(null);
});
userTable.setTableShardingStrategyConfig(shardingTable);
shardingConfig.setTableRuleConfigs(Collections.singletonList(userTable));
// 加密策略
EncryptRuleConfiguration encryptRuleConfig = createEncryptRuleConfig();
shardingConfig.setEncryptRuleConfig(encryptRuleConfig);
// 全局配置
Properties properties = new Properties();
properties.setProperty(ConfigurationPropertyKey.SQL_SHOW.getKey(), "true");
// query.with.cipher.column
properties.setProperty(ConfigurationPropertyKey.QUERY_WITH_CIPHER_COLUMN.getKey(), "false");
return ShardingDataSourceFactory.createDataSource(createDataSourceMap(), shardingConfig, properties);
}
DDL
private static void addCipherColumn(Connection connection) throws SQLException {
String sql = "alter table my_user add column pwd_encrypt varchar(255) not null";
Statement statement = connection.createStatement();
statement.execute(sql);
statement.close();
}
加密配置
private EncryptRuleConfiguration createEncryptRuleConfig() {
EncryptRuleConfiguration configuration = new EncryptRuleConfiguration();
// 加密解密器
Properties properties = new Properties();
properties.setProperty("aes.key.value", "123456");
EncryptorRuleConfiguration aesRuleConfiguration = new EncryptorRuleConfiguration("aes", properties);
configuration.getEncryptors().put("user_pwd_encryptor", aesRuleConfiguration);
// 表 - 字段 - 加密规则
EncryptColumnRuleConfiguration pwdEncryptConfig = new EncryptColumnRuleConfiguration("pwd", "pwd_encrypt", "", "user_pwd_encryptor");
Map<String, EncryptColumnRuleConfiguration> columns = new HashMap<>();
columns.put("pwd", pwdEncryptConfig);// 逻辑字段 - 加密配置
configuration.getTables().put("my_user", new EncryptTableRuleConfiguration(columns));
return configuration;
}
二、迁移前:query.with.cipher.column=false
老用户
查询
Logic SQL: select * from my_user where id = ?
Actual SQL: ds_1 ::: select id, pwd AS pwd from my_user_1 where id = ? ::: [3]
Logic SQL: select id, pwd from my_user where id = ?
Actual SQL: ds_1 ::: select id, pwd AS pwd from my_user_1 where id = ? ::: [3]
Logic SQL: select * from my_user where pwd = ?
Actual SQL: ds_0 ::: select id, pwd AS pwd from my_user_0 where pwd = ? ::: [1608881254186]
Actual SQL: ds_0 ::: select id, pwd AS pwd from my_user_1 where pwd = ? ::: [1608881254186]
Actual SQL: ds_1 ::: select id, pwd AS pwd from my_user_0 where pwd = ? ::: [1608881254186]
Actual SQL: ds_1 ::: select id, pwd AS pwd from my_user_1 where pwd = ? ::: [1608881254186]
结论:
- *会被重写为id, pwd as pwd;id, pwd会被重写为pwd As pwd。
- where查询条件不会被重写。
为什么查询字段会被重写?
回顾SQLRewriteEntry创建SQL重写上下文。
在启用加密规则配置时,SQL重写SQLRewriteEntry#createSQLRewriteContext时会做两个事情,一个是执行EncryptSQLRewriteContextDecorator把加密相关的Token生成器(SQLTokenGenerator)放入重写上下文,另一个就是生成SQLToken,而SQLToken如果实现了Substitutable接口会导致sql中原有的部分字符串被替换(见AbstractSQLBuilder#toSQL)
public SQLRewriteContext createSQLRewriteContext(final String sql, final List<Object> parameters, final SQLStatementContext sqlStatementContext, final RouteContext routeContext) {
SQLRewriteContext result = new SQLRewriteContext(schemaMetaData, sqlStatementContext, sql, parameters);
// 执行所有SQLRewriteContextDecorator,重写Param,创建SQLTokenGenerators
decorate(decorators, result, routeContext);
// 生成SQLToken
result.generateSQLTokens();
return result;
}
EncryptSQLRewriteContextDecorator会把EncryptProjectionTokenGenerator放入生成器列表,最终会执行EncryptProjectionTokenGenerator。
private Collection<SubstitutableColumnNameToken> generateSQLTokens(final ProjectionsSegment segment, final String tableName,
final SelectStatementContext selectStatementContext, final EncryptTable encryptTable) {
Collection<SubstitutableColumnNameToken> result = new LinkedList<>();
for (ProjectionSegment each : segment.getProjections()) {
if (each instanceof ColumnProjectionSegment) { // id, pwd
if (encryptTable.getLogicColumns().contains(((ColumnProjectionSegment) each).getColumn().getIdentifier().getValue())) {
result.add(generateSQLToken((ColumnProjectionSegment) each, tableName));
}
}
if (each instanceof ShorthandProjectionSegment) { // *
result.add(generateSQLToken((ShorthandProjectionSegment) each, selectStatementContext.getProjectionsContext(), tableName, encryptTable));
}
}
return result;
}
对于ShorthandProjectionSegment,如select * 会循环所有实际字段(来源于TableMetaData,注意密文字段不会存在于这个列表里,EncryptTableMetaDataDecorator在装饰TableMetaData时,排除了密文字段),普通字段会直接加入结果集,加密规则配置的逻辑字段可能会被重写。
private SubstitutableColumnNameToken generateSQLToken(final ShorthandProjectionSegment segment,
final ProjectionsContext projectionsContext, final String tableName, final EncryptTable encryptTable) {
ShorthandProjection shorthandProjection = getShorthandProjection(segment, projectionsContext);
List<String> shorthandExtensionProjections = new LinkedList<>();
// 获取所有实际的列
for (ColumnProjection each : shorthandProjection.getActualColumns()) {
// 如果逻辑字段与sql实际字段一致,可能会被重写
if (encryptTable.getLogicColumns().contains(each.getName())) {
shorthandExtensionProjections.add(new ColumnProjection(each.getOwner(), getEncryptColumnName(tableName, each.getName()), each.getName()).getExpressionWithAlias());
} else {
// 其他普通字段,直接add
shorthandExtensionProjections.add(each.getExpression());
}
}
return new SubstitutableColumnNameToken(segment.getStartIndex(), segment.getStopIndex(), Joiner.on(", ").join(shorthandExtensionProjections));
}
重点要看getEncryptColumnName方法,如果query.with.cipher.column为true,返回密文字段,否则返回明文字段。
private String getEncryptColumnName(final String tableName, final String logicEncryptColumnName) {
// 根据logicEncryptColumnName逻辑列名找到原文字段
Optional<String> plainColumn = getEncryptRule().findPlainColumn(tableName, logicEncryptColumnName);
// 如果原文字段存在 且 query.with.cipher.column=false,返回原文字段
// 如果原文字段不存在 或 query.with.cipher.column=true,返回密文字段
return plainColumn.isPresent() && !queryWithCipherColumn
? plainColumn.get()
: getEncryptRule().getCipherColumn(tableName, logicEncryptColumnName);
}
对于ColumnProjectionSegment,如id, pwd这种。pwd会匹配加密规则的逻辑字段,会进入generateSQLToken方法。也是根据上面的getEncryptColumnName方法,生成原文或密文,当sql中没有自定义as别名时,拼上逻辑字段名。
private SubstitutableColumnNameToken generateSQLToken(final ColumnProjectionSegment segment, final String tableName) {
String encryptColumnName = getEncryptColumnName(tableName, segment.getColumn().getIdentifier().getValue());
if (!segment.getAlias().isPresent()) {
// 拼接 as 逻辑列名 保证不影响客户端逻辑
encryptColumnName += " AS " + segment.getColumn().getIdentifier().getValue();
}
return segment.getColumn().getOwner().isPresent()
? new SubstitutableColumnNameToken(segment.getColumn().getOwner().get().getStopIndex() + 2, segment.getStopIndex(), encryptColumnName)
: new SubstitutableColumnNameToken(segment.getStartIndex(), segment.getStopIndex(), encryptColumnName);
}
更新
Logic SQL: update my_user set pwd = ? where id = ?
Actual SQL: ds_1 ::: update my_user_1 set pwd_encrypt = ?, pwd = ? where id = ? ::: [UVZAHLRN/LfAgbNp2BP7KA==, 1608881254186, 3]
为什么更新字段会新增加密字段?
1、新增了SQLToken
定位到SQLTokenGenerators#generateSQLTokens看看哪个SQLTokenGenerator创建了新的SQLToken。很快定位到了EncryptAssignmentTokenGenerator。
public final class EncryptAssignmentTokenGenerator extends BaseEncryptSQLTokenGenerator implements CollectionSQLTokenGenerator {
@Override
public Collection<EncryptAssignmentToken> generateSQLTokens(final SQLStatementContext sqlStatementContext) {
Collection<EncryptAssignmentToken> result = new LinkedList<>();
String tableName = ((TableAvailable) sqlStatementContext).getAllTables().iterator().next().getTableName().getIdentifier().getValue();
// 循环每个update的param,比如pwd
for (AssignmentSegment each : getSetAssignmentSegment(sqlStatementContext.getSqlStatement()).getAssignments()) {
// 根据逻辑名找到EncryptColumnRuleConfiguration
// 根据EncryptColumnRuleConfiguration的encryptor找到EncryptorRuleConfiguration
if (getEncryptRule().findEncryptor(tableName, each.getColumn().getIdentifier().getValue()).isPresent()) {
// 创建SQLToken放入result
generateSQLToken(tableName, each).ifPresent(result::add);
}
}
return result;
}
}
这里创建EncryptAssignmentToken分为两种情况,一种是update my_user set pwd = ?带占位符的Param模式;一种是update my_user set pwd = 'xxx'普通的Literal模式。
private Optional<EncryptAssignmentToken> generateSQLToken(final String tableName, final AssignmentSegment assignmentSegment) {
// param
if (assignmentSegment.getValue() instanceof ParameterMarkerExpressionSegment) {
return Optional.of(generateParameterSQLToken(tableName, assignmentSegment));
}
// literal
if (assignmentSegment.getValue() instanceof LiteralExpressionSegment) {
return Optional.of(generateLiteralSQLToken(tableName, assignmentSegment));
}
return Optional.empty();
}
看一下Param模式。
private EncryptAssignmentToken generateParameterSQLToken(final String tableName, final AssignmentSegment assignmentSegment) {
EncryptParameterAssignmentToken result = new EncryptParameterAssignmentToken(assignmentSegment.getColumn().getStartIndex(), assignmentSegment.getStopIndex());
String columnName = assignmentSegment.getColumn().getIdentifier().getValue();
// 加入密文列
addCipherColumn(tableName, columnName, result);
// 加入辅助查询列
addAssistedQueryColumn(tableName, columnName, result);
// 加入原文列
addPlainColumn(tableName, columnName, result);
return result;
}
EncryptParameterAssignmentToken是最终生成的SQLToken,重点看toString方法,会作为之后AbstractSQLBuilder#toSQL拼接sql的String。
public final class EncryptParameterAssignmentToken extends EncryptAssignmentToken {
private final Collection<String> columnNames = new LinkedList<>();
public EncryptParameterAssignmentToken(final int startIndex, final int stopIndex) {
super(startIndex, stopIndex);
}
public void addColumnName(final String columnName) {
columnNames.add(columnName);
}
// 返回pwd_encrypt = ?, pwd = ?
@Override
public String toString() {
Collection<String> items = Collections2.transform(columnNames, input -> String.format("%s = ?", input));
return Joiner.on(", ").join(items);
}
}
2、新增了Param
SQLRewriteEntry#createSQLRewriteContext,先执行了SQLRewriteContextDecorator重写param,然后执行SQLTokenGenerators创建SQLToken。
public SQLRewriteContext createSQLRewriteContext(final String sql, final List<Object> parameters, final SQLStatementContext sqlStatementContext, final RouteContext routeContext) {
SQLRewriteContext result = new SQLRewriteContext(schemaMetaData, sqlStatementContext, sql, parameters);
// 执行所有SQLRewriteContextDecorator,重写Param,创建SQLTokenGenerators
decorate(decorators, result, routeContext);
// 生成SQLToken
result.generateSQLTokens();
return result;
}
EncryptSQLRewriteContextDecorator执行所有ParameterRewriter。
public final class EncryptSQLRewriteContextDecorator implements SQLRewriteContextDecorator<EncryptRule> {
@Override
public void decorate(final EncryptRule encryptRule, final ConfigurationProperties properties, final SQLRewriteContext sqlRewriteContext) {
boolean isQueryWithCipherColumn = properties.<Boolean>getValue(ConfigurationPropertyKey.QUERY_WITH_CIPHER_COLUMN);
// 构造ParameterRewriter列表
for (ParameterRewriter each : new EncryptParameterRewriterBuilder(encryptRule, isQueryWithCipherColumn).getParameterRewriters(sqlRewriteContext.getSchemaMetaData())) {
if (!sqlRewriteContext.getParameters().isEmpty() && each.isNeedRewrite(sqlRewriteContext.getSqlStatementContext())) {
// 执行ParameterRewriter
each.rewrite(sqlRewriteContext.getParameterBuilder(), sqlRewriteContext.getSqlStatementContext(), sqlRewriteContext.getParameters());
}
}
sqlRewriteContext.addSQLTokenGenerators(new EncryptTokenGenerateBuilder(encryptRule, isQueryWithCipherColumn).getSQLTokenGenerators());
}
最终定位到EncryptAssignmentParameterRewriter。
public final class EncryptAssignmentParameterRewriter extends EncryptParameterRewriter<SQLStatementContext> {
@Override
public void rewrite(final ParameterBuilder parameterBuilder, final SQLStatementContext sqlStatementContext, final List<Object> parameters) {
String tableName = ((TableAvailable) sqlStatementContext).getAllTables().iterator().next().getTableName().getIdentifier().getValue();
for (AssignmentSegment each : getSetAssignmentSegment(sqlStatementContext.getSqlStatement()).getAssignments()) {
// 是parameter模式,且这是个加密规则配置中的逻辑字段
if (each.getValue() instanceof ParameterMarkerExpressionSegment && getEncryptRule().findEncryptor(tableName, each.getColumn().getIdentifier().getValue()).isPresent()) {
StandardParameterBuilder standardParameterBuilder = parameterBuilder instanceof StandardParameterBuilder
? (StandardParameterBuilder) parameterBuilder : ((GroupedParameterBuilder) parameterBuilder).getParameterBuilders().get(0);
// 执行parameters重写
encryptParameters(standardParameterBuilder, tableName, each, parameters);
}
}
}
}
encryptParameters方法重写StandardParameterBuilder里的参数列表。
private void encryptParameters(final StandardParameterBuilder parameterBuilder, final String tableName, final AssignmentSegment assignmentSegment, final List<Object> parameters) {
String columnName = assignmentSegment.getColumn().getIdentifier().getValue();
// pwd初始值(明文)对应下标
int parameterMarkerIndex = ((ParameterMarkerExpressionSegment) assignmentSegment.getValue()).getParameterMarkerIndex();
// pwd初始值(明文)
Object originalValue = parameters.get(parameterMarkerIndex);
// 找到 Encryptor 执行加密 得到 加密值
Object cipherValue = getEncryptRule().getEncryptValues(tableName, columnName, Collections.singletonList(originalValue)).iterator().next();
// 替换 明文值为加密值
parameterBuilder.addReplacedParameters(parameterMarkerIndex, cipherValue);
Collection<Object> addedParameters = new LinkedList<>();
// 加入辅助查询值
if (getEncryptRule().findAssistedQueryColumn(tableName, columnName).isPresent()) {
Object assistedQueryValue = getEncryptRule().getEncryptAssistedQueryValues(tableName, columnName, Collections.singletonList(originalValue)).iterator().next();
addedParameters.add(assistedQueryValue);
}
// 加入原文值
if (getEncryptRule().findPlainColumn(tableName, columnName).isPresent()) {
addedParameters.add(originalValue);
}
if (!addedParameters.isEmpty()) {
parameterBuilder.addAddedParameters(parameterMarkerIndex + 1, addedParameters);
}
}
为什么这里是替换原文值为密文值,然后判断加密规则中的原文字段是否存在,再重新加入原文值呢?而不是选择直接加入密文值呢?这个就涉及到第四步迁移后删除原文列,因为原文字段不可能永远存在。
新用户
新增
Logic SQL: insert into my_user (id, pwd) values (?, ?)
Actual SQL: ds_0 ::: insert into my_user_1 (id, pwd_encrypt, pwd) values (?, ?, ?) ::: [2, TcZjXe0lAwrT19nIJ9+0Iw==, 1608881254477]
为什么插入语句会新增加密字段?
1、新增了Param
EncryptInsertValueParameterRewriter执行加密规则insert语句的param重写。
@Override
public void rewrite(final ParameterBuilder parameterBuilder, final InsertStatementContext insertStatementContext, final List<Object> parameters) {
String tableName = insertStatementContext.getSqlStatement().getTable().getTableName().getIdentifier().getValue();
// 倒序排列的字段名
Iterator<String> descendingColumnNames = insertStatementContext.getDescendingColumnNames();
while (descendingColumnNames.hasNext()) {
String columnName = descendingColumnNames.next();
getEncryptRule()
// 找到Encryptor
.findEncryptor(tableName, columnName)
// 重写Param放入ParameterBuilder
.ifPresent(encryptor -> {
encryptInsertValues((GroupedParameterBuilder) parameterBuilder, insertStatementContext, encryptor, tableName, columnName);
});
}
}
最终定位到encryptInsertValue方法,将重写参数放入StandardParameterBuilder。
private void encryptInsertValue(final Encryptor encryptor, final String tableName, final int columnIndex,
final Object originalValue, final StandardParameterBuilder parameterBuilder, final String encryptLogicColumnName) {
// FIXME: can process all part of insert value is ? or literal, can not process mix ? and literal
// For example: values (?, ?), (1, 1) can process
// For example: values (?, 1), (?, 2) can not process
// 加密明文为密文放入parameterBuilder
parameterBuilder.addReplacedParameters(columnIndex, encryptor.encrypt(originalValue));
Collection<Object> addedParameters = new LinkedList<>();
// 辅助查询值放入parameterBuilder
if (encryptor instanceof QueryAssistedEncryptor) {
Optional<String> assistedColumnName = getEncryptRule().findAssistedQueryColumn(tableName, encryptLogicColumnName);
Preconditions.checkArgument(assistedColumnName.isPresent(), "Can not find assisted query Column Name");
addedParameters.add(((QueryAssistedEncryptor) encryptor).queryAssistedEncrypt(originalValue.toString()));
}
// 明文值放入parameterBuilder
if (getEncryptRule().findPlainColumn(tableName, encryptLogicColumnName).isPresent()) {
addedParameters.add(originalValue);
}
if (!addedParameters.isEmpty()) {
if (!parameterBuilder.getAddedIndexAndParameters().containsKey(columnIndex + 1)) {
parameterBuilder.getAddedIndexAndParameters().put(columnIndex + 1, new LinkedList<>());
}
parameterBuilder.getAddedIndexAndParameters().get(columnIndex + 1).addAll(addedParameters);
}
}
上面的源码注释看到,配置了加密规则的情况下,不支持literal和param模式混合在一个数据行中做插入。比如下面这种插入方式,这真的是个大坑。
private static void insert(Connection connection, int id, String pwd) throws SQLException {
String sql = "insert into my_user (id, pwd) values (9998, ?)";
PreparedStatement statement = connection.prepareStatement(sql);
statement.setString(1, String.valueOf(System.currentTimeMillis()));
statement.execute();
statement.close();
}
问题就是StandardParameterBuilder#getParameters这个方法执行时,会报java.lang.IndexOutOfBoundsException异常。
在SQLRouteRewriteEngine#rewrite重写引擎执行完sql语句拼接之后(AbstractSQLBuilder#toSQL),获取重写完param列表时调用StandardParameterBuilder#getParameters这个方法。
@RequiredArgsConstructor
public final class StandardParameterBuilder implements ParameterBuilder {
// 原始参数列表
private final List<Object> originalParameters;
// index - 需要添加的参数
@Getter
private final Map<Integer, Collection<Object>> addedIndexAndParameters = new TreeMap<>();
// index - 需要替换的参数
private final Map<Integer, Object> replacedIndexAndParameters = new LinkedHashMap<>();
// 需要移除的index
private final List<Integer> removeIndexAndParameters = new ArrayList<>();
// 暴露给外部
public void addAddedParameters(final int index, final Collection<Object> parameters) {
addedIndexAndParameters.put(index, parameters);
}
// 暴露给外部
public void addReplacedParameters(final int index, final Object parameter) {
replacedIndexAndParameters.put(index, parameter);
}
// 暴露给外部
public void addRemovedParameters(final int index) {
removeIndexAndParameters.add(index);
}
@Override
public List<Object> getParameters() {
List<Object> result = new LinkedList<>(originalParameters);
// 替换下标对应的值
for (Entry<Integer, Object> entry : replacedIndexAndParameters.entrySet()) {
// IndexOutOfBoundsException
result.set(entry.getKey(), entry.getValue());
}
// 新增下标对应的值
for (Entry<Integer, Collection<Object>> entry : ((TreeMap<Integer, Collection<Object>>) addedIndexAndParameters).descendingMap().entrySet()) {
if (entry.getKey() > result.size()) {
result.addAll(entry.getValue());
} else {
result.addAll(entry.getKey(), entry.getValue());
}
}
// 移除下标对应的值
for (int index : removeIndexAndParameters) {
result.remove(index);
}
return result;
}
}
2、替换了SQLToken
EncryptInsertValuesTokenGenerator创建加密insert的SQLToken。没什么特别的,就是把(?,?)改为(?,?,?)。
@Override
public InsertValuesToken generateSQLToken(final InsertStatementContext insertStatementContext) {
Optional<SQLToken> insertValuesToken = findPreviousSQLToken(InsertValuesToken.class);
if (insertValuesToken.isPresent()) {
processPreviousSQLToken(insertStatementContext, (InsertValuesToken) insertValuesToken.get());
return (InsertValuesToken) insertValuesToken.get();
}
return generateNewSQLToken(insertStatementContext);
}
// 核心逻辑
private void encryptToken(final InsertValue insertValueToken, final String tableName, final InsertStatementContext insertStatementContext, final InsertValueContext insertValueContext) {
Optional<SQLToken> useDefaultInsertColumnsToken = findPreviousSQLToken(UseDefaultInsertColumnsToken.class);
Iterator<String> descendingColumnNames = insertStatementContext.getDescendingColumnNames();
while (descendingColumnNames.hasNext()) {
String columnName = descendingColumnNames.next();
// 找到逻辑列名对应的Encryptor
Optional<Encryptor> encryptor = getEncryptRule().findEncryptor(tableName, columnName);
if (encryptor.isPresent()) {
int columnIndex = useDefaultInsertColumnsToken.map(sqlToken -> ((UseDefaultInsertColumnsToken) sqlToken).getColumns().indexOf(columnName))
.orElseGet(() -> insertStatementContext.getColumnNames().indexOf(columnName));
// 明文
Object originalValue = insertValueContext.getValue(columnIndex);
// 添加原文列
addPlainColumn(insertValueToken, columnIndex, tableName, columnName, insertValueContext, originalValue);
// 添加辅助查询列
addAssistedQueryColumn(insertValueToken, encryptor.get(), columnIndex, tableName, columnName, insertValueContext, originalValue);
// 设置密文列
setCipherColumn(insertValueToken, encryptor.get(), columnIndex, insertValueContext.getValueExpressions().get(columnIndex), originalValue);
}
}
}
历史数据更新
sharding-sphere没有提供历史数据的脱敏方案。pwd_encrypt字段始终为空,需要刷脚本初始化。
新增数据在插入时,就通过 Apache ShardingSphere 加密为密文数据,并被存储到了 cipherColumn。而现在就需要处理历史明文存量数据。 由于Apache ShardingSphere 目前并未提供相关迁移洗数工具,此时需要业务方自行将 pwd 中的明文数据进行加密处理存储到 pwd_cipher。
三、迁移中:query.with.cipher.column=true
老用户
查询
Logic SQL: select * from my_user where id = ?
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_1 where id = ? ::: [3]
Logic SQL: select id, pwd from my_user where id = ?
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_1 where id = ? ::: [3]
Logic SQL: select * from my_user where pwd = ?
Actual SQL: ds_0 ::: select id, pwd_encrypt AS pwd from my_user_0 where pwd_encrypt = ? ::: [/sRcoTnWHoY1+PT9F6tpAg==]
Actual SQL: ds_0 ::: select id, pwd_encrypt AS pwd from my_user_1 where pwd_encrypt = ? ::: [/sRcoTnWHoY1+PT9F6tpAg==]
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_0 where pwd_encrypt = ? ::: [/sRcoTnWHoY1+PT9F6tpAg==]
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_1 where pwd_encrypt = ? ::: [/sRcoTnWHoY1+PT9F6tpAg==]
1、查询字段被重写 pwd_encrypt AS pwd
原因和前面迁移前一样,见EncryptProjectionTokenGenerator#getEncryptColumnName,由于query.with.cipher.column=true这里会返回密文字段pwd_encrypt。
private String getEncryptColumnName(final String tableName, final String logicEncryptColumnName) {
// 根据logicEncryptColumnName逻辑列名找到原文字段
Optional<String> plainColumn = getEncryptRule().findPlainColumn(tableName, logicEncryptColumnName);
// 如果原文字段存在 且 query.with.cipher.column=false,返回原文字段
// 如果原文字段不存在 或 query.with.cipher.column=true,返回密文字段
return plainColumn.isPresent() && !queryWithCipherColumn ? plainColumn.get() : getEncryptRule().getCipherColumn(tableName, logicEncryptColumnName);
}
2、where条件被重写 where pwd_encrypt = ?
首先EncryptPredicateColumnTokenGenerator创建了SQLToken。
private Collection<SubstitutableColumnNameToken> generateSQLTokens(final SQLStatementContext sqlStatementContext, final AndPredicate andPredicate) {
Collection<SubstitutableColumnNameToken> result = new LinkedList<>();
for (PredicateSegment each : andPredicate.getPredicates()) {
Optional<EncryptTable> encryptTable = findEncryptTable(sqlStatementContext, each);
if (!encryptTable.isPresent() || !encryptTable.get().findEncryptor(each.getColumn().getIdentifier().getValue()).isPresent()) {
continue;
}
int startIndex = each.getColumn().getOwner().isPresent() ? each.getColumn().getOwner().get().getStopIndex() + 2 : each.getColumn().getStartIndex();
int stopIndex = each.getColumn().getStopIndex();
// query.with.cipher.column=false 这里会取原文字段
if (!queryWithCipherColumn) {
Optional<String> plainColumn = encryptTable.get().findPlainColumn(each.getColumn().getIdentifier().getValue());
if (plainColumn.isPresent()) {
result.add(new SubstitutableColumnNameToken(startIndex, stopIndex, plainColumn.get()));
continue;
}
}
// query.with.cipher.column=true
// 判断辅助查询列是否匹配当前PredicateSegment断言,如果是辅助查询列则返回辅助查询列
Optional<String> assistedQueryColumn = encryptTable.get().findAssistedQueryColumn(each.getColumn().getIdentifier().getValue());
SubstitutableColumnNameToken encryptColumnNameToken = assistedQueryColumn.map(columnName -> new SubstitutableColumnNameToken(startIndex, stopIndex, columnName))
// 否则返回密文列
.orElseGet(() -> new SubstitutableColumnNameToken(startIndex, stopIndex, encryptTable.get().getCipherColumn(each.getColumn().getIdentifier().getValue())));
result.add(encryptColumnNameToken);
}
return result;
}
其次EncryptPredicateParameterRewriter将查询参数重写为密文。
@Setter
public final class EncryptPredicateParameterRewriter extends EncryptParameterRewriter<SQLStatementContext> implements SchemaMetaDataAware, QueryWithCipherColumnAware {
private SchemaMetaData schemaMetaData;
private boolean queryWithCipherColumn;
@Override
public void rewrite(final ParameterBuilder parameterBuilder, final SQLStatementContext sqlStatementContext, final List<Object> parameters) {
// 创建EncryptCondition,类似ShardingCondition,表示某个表某个字段的 = 或 in 断言
List<EncryptCondition> encryptConditions = new EncryptConditionEngine(getEncryptRule(), schemaMetaData).createEncryptConditions(sqlStatementContext);
if (encryptConditions.isEmpty()) {
return;
}
for (EncryptCondition each : encryptConditions) {
if (queryWithCipherColumn) {
// 加密原始查询参数
List<Object> encryptedValues = getEncryptedValues(each, each.getValues(parameters));
// parameterBuilder替换查询参数
encryptParameters(parameterBuilder, each.getPositionIndexMap(), encryptedValues);
}
}
}
}
EncryptConditionEngine#createEncryptCondition创建EncryptCondition,不支持between...and...比较。
private Optional<EncryptCondition> createEncryptCondition(final PredicateSegment predicateSegment, final String tableName) {
// 比较运算符 如 <> != =
if (predicateSegment.getRightValue() instanceof PredicateCompareRightValue) {
PredicateCompareRightValue compareRightValue = (PredicateCompareRightValue) predicateSegment.getRightValue();
return isSupportedOperator(compareRightValue.getOperator()) ? createCompareEncryptCondition(tableName, predicateSegment, compareRightValue) : Optional.empty();
}
// in
if (predicateSegment.getRightValue() instanceof PredicateInRightValue) {
return createInEncryptCondition(tableName, predicateSegment, (PredicateInRightValue) predicateSegment.getRightValue());
}
// 不支持between and
if (predicateSegment.getRightValue() instanceof PredicateBetweenRightValue) {
throw new ShardingSphereException("The SQL clause 'BETWEEN...AND...' is unsupported in encrypt rule.");
}
return Optional.empty();
}
EncryptPredicateParameterRewriter#getEncryptedValues优先使用辅助查询值作为查询条件值。
private List<Object> getEncryptedValues(final EncryptCondition encryptCondition, final List<Object> originalValues) {
String tableName = encryptCondition.getTableName();
String columnName = encryptCondition.getColumnName();
return getEncryptRule().findAssistedQueryColumn(tableName, columnName).isPresent()
// 如果辅助查询存在,使用辅助查询值
? getEncryptRule().getEncryptAssistedQueryValues(tableName, columnName, originalValues)
// 否则使用加密值
: getEncryptRule().getEncryptValues(tableName, columnName, originalValues);
}
EncryptPredicateParameterRewriter#encryptParameters,还是将重写参数通过addReplacedParameters方法放入StandardParameterBuilder。
private void encryptParameters(final ParameterBuilder parameterBuilder, final Map<Integer, Integer> positionIndexes, final List<Object> encryptValues) {
if (!positionIndexes.isEmpty()) {
for (Entry<Integer, Integer> entry : positionIndexes.entrySet()) {
((StandardParameterBuilder) parameterBuilder).addReplacedParameters(entry.getValue(), encryptValues.get(entry.getKey()));
}
}
}
更新
Logic SQL: update my_user set pwd = ? where id = ?
Actual SQL: ds_1 ::: update my_user_1 set pwd_encrypt = ?, pwd = ? where id = ? ::: [b9ValhsVZfJ2Mu7GKmsHag==, 1609060586708, 3]
和开启query.with.cipher.column之前一样,EncryptAssignmentTokenGenerator创建SQLToken,EncryptAssignmentParameterRewriter新增了加密参数,SQL没发生变化。
新用户
新增
Logic SQL: insert into my_user (id, pwd) values (?, ?)
Actual SQL: ds_0 ::: insert into my_user_0 (id, pwd_encrypt, pwd) values (?, ?, ?) ::: [5140, NHj62xHlEngXKI9YV7twyg==, 1609061001092]
和开启query.with.cipher.column之前一样,EncryptInsertValueParameterRewriter新增加密参数,EncryptInsertValuesTokenGenerator创建SQLToken。
四、迁移后
配置去除明文列
注意这里EncryptColumnRuleConfiguration配置的第一个参数,原来是pwd代表数据表实际的原文字段,这个时候去除了。
private EncryptRuleConfiguration createEncryptRuleConfig2() {
EncryptRuleConfiguration configuration = new EncryptRuleConfiguration();
// 加密解密器
Properties properties = new Properties();
properties.setProperty("aes.key.value", "123456");
EncryptorRuleConfiguration aesRuleConfiguration = new EncryptorRuleConfiguration("aes", properties);
configuration.getEncryptors().put("user_pwd_encryptor", aesRuleConfiguration);
// 表 - 字段 - 加密规则 注意已经去除了pwdEncryptConfig的原文字段pwd
EncryptColumnRuleConfiguration pwdEncryptConfig = new EncryptColumnRuleConfiguration(null, "pwd_encrypt", "", "user_pwd_encryptor");
Map<String, EncryptColumnRuleConfiguration> columns = new HashMap<>();
columns.put("pwd", pwdEncryptConfig);// 逻辑字段 - 加密配置
configuration.getTables().put("my_user", new EncryptTableRuleConfiguration(columns));
return configuration;
}
DDL删除明文字段
private static void dropPlainPwdColumn(Connection connection) throws SQLException {
String sql = "alter table my_user drop column pwd";
Statement statement = connection.createStatement();
statement.execute(sql);
statement.close();
}
增改查
Logic SQL: select * from my_user where id = ?
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_1 where id = ? ::: [3]
Logic SQL: select id, pwd from my_user where id = ?
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_1 where id = ? ::: [3]
Logic SQL: update my_user set pwd = ? where id = ?
Actual SQL: ds_1 ::: update my_user_1 set pwd_encrypt = ? where id = ? ::: [4tzE6aM2SQXhj8sJcJrjmQ==, 3]
Logic SQL: select * from my_user where pwd = ?
Actual SQL: ds_0 ::: select id, pwd_encrypt AS pwd from my_user_0 where pwd_encrypt = ? ::: [4tzE6aM2SQXhj8sJcJrjmQ==]
Actual SQL: ds_0 ::: select id, pwd_encrypt AS pwd from my_user_1 where pwd_encrypt = ? ::: [4tzE6aM2SQXhj8sJcJrjmQ==]
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_0 where pwd_encrypt = ? ::: [4tzE6aM2SQXhj8sJcJrjmQ==]
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_1 where pwd_encrypt = ? ::: [4tzE6aM2SQXhj8sJcJrjmQ==]
Logic SQL: insert into my_user (id, pwd) values (?, ?)
Actual SQL: ds_1 ::: insert into my_user_1 (id, pwd_encrypt) values (?, ?) ::: [9343, enAqjGiquCRS7lhheLgmnA==]
对于select没有变化。
对于insert,比如EncryptInsertValuesTokenGenerator#addPlainColumn在创建SQLToken时,判断原文字段配置不存在,就不会生成原文字段SQLToken。
private void addPlainColumn(final InsertValue insertValueToken, final int columnIndex,
final String tableName, final String columnName, final InsertValueContext insertValueContext, final Object originalValue) {
if (getEncryptRule().findPlainColumn(tableName, columnName).isPresent()) {
DerivedSimpleExpressionSegment derivedExpressionSegment = insertValueContext.getParameters().isEmpty()
? new DerivedLiteralExpressionSegment(originalValue) : new DerivedParameterMarkerExpressionSegment(getParameterIndexCount(insertValueToken));
insertValueToken.getValues().add(columnIndex + 1, derivedExpressionSegment);
}
}
同样的,EncryptInsertValueParameterRewriter#encryptInsertValue重写参数列表时,也因为找不到原文字段配置,不会把原文参数放入参数列表。
private void encryptInsertValue(final Encryptor encryptor, final String tableName, final int columnIndex,
final Object originalValue, final StandardParameterBuilder parameterBuilder, final String encryptLogicColumnName) {
// 加密明文为密文放入parameterBuilder
parameterBuilder.addReplacedParameters(columnIndex, encryptor.encrypt(originalValue));
Collection<Object> addedParameters = new LinkedList<>();
// 辅助查询值放入parameterBuilder
if (encryptor instanceof QueryAssistedEncryptor) {
Optional<String> assistedColumnName = getEncryptRule().findAssistedQueryColumn(tableName, encryptLogicColumnName);
Preconditions.checkArgument(assistedColumnName.isPresent(), "Can not find assisted query Column Name");
addedParameters.add(((QueryAssistedEncryptor) encryptor).queryAssistedEncrypt(originalValue.toString()));
}
// 这里找不到原文字段配置,不会放入结果集
if (getEncryptRule().findPlainColumn(tableName, encryptLogicColumnName).isPresent()) {
addedParameters.add(originalValue);
}
if (!addedParameters.isEmpty()) {
if (!parameterBuilder.getAddedIndexAndParameters().containsKey(columnIndex + 1)) {
parameterBuilder.getAddedIndexAndParameters().put(columnIndex + 1, new LinkedList<>());
}
parameterBuilder.getAddedIndexAndParameters().get(columnIndex + 1).addAll(addedParameters);
}
}
对于update也是一样,见EncryptAssignmentTokenGenerator#addPlainColumn和EncryptAssignmentParameterRewriter#encryptParameters。
四、辅助查询列
上面一直提到辅助查询列,为什么需要这个字段呢?主要是有这种需求:
即使是相同的数据,如两个用户的密码相同,它们在数据库里存储的加密数据也应当是不一样的。这种理念更有利于保护用户信息,防止撞库成功。
针对于这种需求,org.apache.shardingsphere.encrypt.strategy.spi.Encryptor有一个子接口org.apache.shardingsphere.encrypt.strategy.spi.QueryAssistedEncryptor会提供辅助查询方法,将明文值转换为辅助查询值去查询辅助查询字段(言外之意,只有where条件中有加密配置的逻辑字段限制条件时,这个才有用,如果平常业务用不到,只需要实现一个Encryptor保证可逆且同样的原文密文不同即可)。
public interface QueryAssistedEncryptor extends Encryptor {
String queryAssistedEncrypt(String plaintext);
}
它提供三种函数进行实现,分别是encrypt(), decrypt(), queryAssistedEncrypt()。在encrypt()阶段,用户通过设置某个变动种子,例如时间戳。 针对原始数据+变动种子组合的内容进行加密,就能保证即使原始数据相同,也因为有变动种子的存在,致使加密后的加密数据是不一样的。在decrypt()可依据之前规定的加密算法,利用种子数据进行解密。
由于queryAssistedEncrypt()和encrypt()产生不同加密数据进行存储,而decrypt()可逆,queryAssistedEncrypt()不可逆。 在查询原始数据的时候,我们会自动对SQL进行解析、改写、路由,利用辅助查询列进行WHERE条件的查询,却利用 decrypt()对encrypt()加密后的数据进行解密,并将原始数据返回给用户。 这一切都是对用户透明化的。
下面是根据官方文档实现的案例,AES加密方法带上时间戳随机种子保证落库数据不一样,AES解密过后刨除随机种子返回给用户,queryAssistedEncrypt辅助查询使用MD5算法保证where查询。
QueryAssistedEncryptor实现如下,记得META-INFO/services下要通过SPI注入:
public class CustomQueryAssistedEncryptor implements QueryAssistedEncryptor {
// 委托AES执行encrypt和decrypt
private final Encryptor aesEncryptor;
// 委托MD5执行queryAssistedEncrypt
private final Encryptor digestEncryptor;
private Properties properties;
// 随机种子长度
private final int seedLength = String.valueOf(System.currentTimeMillis()).length();
public CustomQueryAssistedEncryptor() {
this.aesEncryptor = new AESEncryptor();
this.digestEncryptor = new MD5Encryptor();
}
@Override
public String queryAssistedEncrypt(String plaintext) {
String salt = this.getProperties().getProperty("md5.salt");
return digestEncryptor.encrypt(plaintext + salt);
}
@Override
public void init() {
}
@Override
public String encrypt(Object plaintext) {
return aesEncryptor.encrypt(plaintext + String.valueOf(System.currentTimeMillis()));
}
@Override
public Object decrypt(String ciphertext) {
String decrypt = (String) aesEncryptor.decrypt(ciphertext);
return decrypt.substring(0, decrypt.length() - seedLength);
}
@Override
public String getType() {
return "CustomQueryAssistedEncryptor";
}
@Override
public Properties getProperties() {
return this.properties;
}
@Override
public void setProperties(Properties properties) {
this.properties = properties;
aesEncryptor.setProperties(properties);
digestEncryptor.setProperties(properties);
}
}
加密配置如下:
private EncryptRuleConfiguration createEncryptRuleConfig() {
EncryptRuleConfiguration configuration = new EncryptRuleConfiguration();
// 加密解密器
Properties properties = new Properties();
properties.setProperty("aes.key.value", "123456");
properties.setProperty("md5.salt", "abcdef");
EncryptorRuleConfiguration aesRuleConfiguration = new EncryptorRuleConfiguration("CustomQueryAssistedEncryptor", properties);
configuration.getEncryptors().put("user_pwd_encryptor", aesRuleConfiguration);
// 表 - 字段 - 加密规则
EncryptColumnRuleConfiguration encryptConfig = new EncryptColumnRuleConfiguration(null, "pwd_encrypt", "pwd_assist", "user_pwd_encryptor");
Map<String, EncryptColumnRuleConfiguration> columns = new HashMap<>();
columns.put("pwd", encryptConfig);// 逻辑字段 - 加密配置
configuration.getTables().put("my_user_assist", new EncryptTableRuleConfiguration(columns));
return configuration;
}
对应DDL:
create table my_user_assist (id bigint(20) not null primary key, pwd_encrypt varchar(255) not null, pwd_assist varchar(255) not null)
增改查重写结果:
// 插入两个用户,密码都是123456,pwd_encrypt不同,pwd_assist相同
Logic SQL: insert into my_user_assist (id, pwd) values (?, ?)
Actual SQL: ds_1 ::: insert into my_user_assist_1 (id, pwd_encrypt, pwd_assist) values (?, ?, ?) ::: [3, 1d/2wa8LM9FlepSZPmy3EC9HI2/xGzx9sq0qEBVSvC4=, 6f3b8ded65bd7a4db11625ac84e579bb]
Logic SQL: insert into my_user_assist (id, pwd) values (?, ?)
Actual SQL: ds_0 ::: insert into my_user_assist_0 (id, pwd_encrypt, pwd_assist) values (?, ?, ?) ::: [4, NPdqE1U22vN2awqLfcQGVmq90U7dDOEve/JY/7PbzCQ=, 6f3b8ded65bd7a4db11625ac84e579bb]
Logic SQL: select * from my_user_assist where id = ?
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_assist_1 where id = ? ::: [3]
Logic SQL: select id, pwd from my_user_assist where id = ?
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_assist_1 where id = ? ::: [3]
Logic SQL: update my_user_assist set pwd = ? where id = ?
Actual SQL: ds_1 ::: update my_user_assist_1 set pwd_encrypt = ?, pwd_assist = ? where id = ? ::: [4KpzieXQFNUpjdBToqGmYt1os93mcmEDK2zLftdNxIE=, 6f3b8ded65bd7a4db11625ac84e579bb, 3]
Logic SQL: select * from my_user_assist where pwd = ?
Actual SQL: ds_0 ::: select id, pwd_encrypt AS pwd from my_user_assist_0 where pwd_assist = ? ::: [6f3b8ded65bd7a4db11625ac84e579bb]
Actual SQL: ds_0 ::: select id, pwd_encrypt AS pwd from my_user_assist_1 where pwd_assist = ? ::: [6f3b8ded65bd7a4db11625ac84e579bb]
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_assist_0 where pwd_assist = ? ::: [6f3b8ded65bd7a4db11625ac84e579bb]
Actual SQL: ds_1 ::: select id, pwd_encrypt AS pwd from my_user_assist_1 where pwd_assist = ? ::: [6f3b8ded65bd7a4db11625ac84e579bb]
总结
-
sharding-jdbc的数据脱敏功能点的主要逻辑都集中在SQL重写这一步骤,而且是集中在SQLRewriteEntry#createSQLRewriteContext方法的第一步和第二步。
- 第一步,由EncryptSQLRewriteContextDecorator执行ParameterRewriter重写Parameter。
- 第二步,由EncryptSQLRewriteContextDecorator创建的SQLTokenGenerator,创建SQLToken。
-
query.with.cipher.column代表是否启用密文查询,如果是,即select字段需要查询密文字段出来后解密,where条件需要加密后再执行查询。
-
需要注意的一个坑,只要配置了EncryptRuleConfiguration加密配置,要保证insert时要么全是问号占位符,要么全是死值。即不能有这种sql:insert into table (id, pwd, deleted) values (?, ?, 0),否则会报错。
-
辅助查询列。如果同样的明文需要生成不同的密文落库,且又需要通过这个逻辑列做where条件查询,需要用到辅助查询列,需要用户自己实现QueryAssistedEncryptor接口。