Preface
This article is a follow-up to an earlier one I wrote, "SpringBoot + Minio + ElasticSearch实现文件内容检索" (implementing file content search with SpringBoot + Minio + ElasticSearch). That article did not cover keeping MySQL and ElasticSearch in sync, so this post fills that gap.
1. Introduction to Canal
Before looking at Canal, you need to understand how MySQL master-slave replication works, so here is a quick overview:
- Every transaction on the master (inserts, deletes, updates, and so on) is recorded in the binary log (binlog), a binary-format log file that contains every event that modifies data. You can inspect it with SHOW BINLOG EVENTS.
- The slave reads the master's binary log and copies it into its own relay log.
- The slave then replays the events in the relay log, applying the data changes to its own data.
1.1 How Canal Works
- Canal emulates the MySQL master-slave replication protocol: it disguises itself as a MySQL slave and sends a dump request to the MySQL master.
- The MySQL master receives the dump request and starts pushing its binary log to the slave (i.e. Canal).
- Canal parses the binary log objects (which arrive as a raw byte stream).
1.2 Downloading Canal
Canal is a component open-sourced by Alibaba that parses MySQL incremental logs (the binlog) and provides incremental data subscription and consumption. It can be downloaded from github.com/alibaba/can… and I recommend using version 1.1.7.
After downloading, simply unzip it.
1.3 Configuring Canal
Before configuring Canal, we need to make sure MySQL's binary log is enabled, using the following commands:
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';
SHOW VARIABLES LIKE 'log_bin'; checks whether the binlog is enabled: if the Value column shows ON, the binlog is enabled; if it shows OFF, it is not. SHOW VARIABLES LIKE 'binlog_format'; shows the current binlog format; the Value column will be STATEMENT, ROW, or MIXED. We need the format to be ROW.
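If the binlog is already enabled and set to ROW, the output will look roughly like this (the exact rendering depends on your client):
mysql> SHOW VARIABLES LIKE 'log_bin';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_bin       | ON    |
+---------------+-------+
mysql> SHOW VARIABLES LIKE 'binlog_format';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+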
If the binary log is not enabled, we need to locate the MySQL configuration file: my.ini on Windows and my.cnf on Linux. On Windows it typically lives in the MySQL installation or data directory (for example C:\ProgramData\MySQL\MySQL Server 8.0\my.ini), and on Linux it is usually /etc/my.cnf or /etc/mysql/my.cnf.
Add the following at the end of the configuration file:
log-bin=mysql-bin # enable binlog
binlog-format=ROW # use ROW format
server_id=1 # required for MySQL replication; must not clash with Canal's slaveId
After the configuration is done, open a MySQL client and create a dedicated account for Canal to connect with; if the account already exists, just run the GRANT (do not use the root account). Once you are done, be sure to restart the database so the binlog settings take effect.
CREATE USER canal IDENTIFIED BY 'canal';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
-- GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%' ;
-- Refresh privileges
FLUSH PRIVILEGES;
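Optionally, you can double-check that the account has the replication privileges Canal needs:
SHOW GRANTS FOR 'canal'@'%';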
Next, open \canal-deployer-1.1.7\conf\example\instance.properties. Two places need to be modified:
# position info: change this to your own database information
# set this to your own database address and port
canal.instance.master.address=127.0.0.1:3306
canal.instance.master.journal.name=DESKTOP-FL9JG86-bin.000091
canal.instance.master.position=157
canal.instance.master.timestamp=
canal.instance.master.gtid=
# username/password: change to your own database credentials; do not use the root account
# (use the canal account granted above and its password)
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset = UTF-8
# enable druid Decrypt database password
canal.instance.enableDruid=false
#canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==
The values for canal.instance.master.journal.name and canal.instance.master.position can be obtained by running SHOW MASTER STATUS in the MySQL client, for example:
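For illustration, with the values used later in this article the result would look roughly like this (the file name and position will differ on your machine):
mysql> SHOW MASTER STATUS;
+----------------------------+----------+--------------+------------------+-------------------+
| File                       | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+----------------------------+----------+--------------+------------------+-------------------+
| DESKTOP-FL9JG86-bin.000091 |     2671 |              |                  |                   |
+----------------------------+----------+--------------+------------------+-------------------+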
Given this output, canal.instance.master.journal.name should be set to DESKTOP-FL9JG86-bin.000091 and canal.instance.master.position to 2671. Save the file once the changes are made, then run canal-deployer-1.1.7\bin\startup.bat and check the logs/canal/canal.log log. If it contains lines like the following, the server has started successfully:
2013-02-05 22:45:27.967 [main] INFO com.alibaba.otter.canal.deployer.CanalLauncher - ## start the canal server.
2013-02-05 22:45:28.113 [main] INFO com.alibaba.otter.canal.deployer.CanalController - ## start the canal server[10.1.29.120:11111]
2013-02-05 22:45:28.210 [main] INFO com.alibaba.otter.canal.deployer.CanalLauncher - ## the canal server is running now ......
If you get an error here, you need to modify the startup.bat file, because it passes JVM options meant for older Java versions. The corrected content is below; you can copy and paste it directly:
@echo off
@if not "%ECHO%" == "" echo %ECHO%
@if "%OS%" == "Windows_NT" setlocal
set ENV_PATH=.\
if "%OS%" == "Windows_NT" set ENV_PATH=%~dp0%
set conf_dir=%ENV_PATH%\..\conf
set canal_conf=%conf_dir%\canal.properties
@rem set canal_conf=%conf_dir%\canal_local.properties
if "%1" == "local" set canal_conf=%conf_dir%\canal_local.properties
set logback_configurationFile=%conf_dir%\logback.xml
set CLASSPATH=%conf_dir%
set CLASSPATH=%conf_dir%\..\lib\*;%CLASSPATH%
set JAVA_MEM_OPTS= -Xms128m -Xmx128m -XX:MaxMetaspaceSize=128m
set JAVA_OPTS_EXT= -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Dapplication.codeset=UTF-8 -Dfile.encoding=UTF-8
set JAVA_DEBUG_OPT= -server -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,address=9099,server=y,suspend=n
set CANAL_OPTS= -DappName=otter-canal -Dlogback.configurationFile="%logback_configurationFile%" -Dcanal.conf="%canal_conf%"
set JAVA_OPTS= %JAVA_MEM_OPTS% %JAVA_OPTS_EXT% %JAVA_DEBUG_OPT% %CANAL_OPTS%
set CMD_STR= java %JAVA_OPTS% -classpath "%CLASSPATH%" com.alibaba.otter.canal.deployer.CanalLauncher
echo start cmd : %CMD_STR%
java %JAVA_OPTS% -classpath "%CLASSPATH%" com.alibaba.otter.canal.deployer.CanalLauncher
Then start it again and it should run without problems.
2. Configuring RabbitMQ
Next, open /canal-deployer-1.1.7/conf/canal.properties, where two places also need to be changed:
canal.zkServers =
# flush data to zk
canal.zookeeper.flush.period = 1000
canal.withoutNetty = false
# options: tcp (default), kafka, rocketMQ, rabbitMQ, pulsarMQ
# change this to rabbitMQ
canal.serverMode = rabbitMQ
# flush meta cursor/parse position to file
##################################################
######### RabbitMQ #############
##################################################
# RabbitMQ settings
rabbitmq.host = 127.0.0.1:5672
rabbitmq.virtual.host = /
rabbitmq.exchange = canalFanoutExchange
rabbitmq.username = guest
rabbitmq.password = guest
rabbitmq.deliveryMode =
Save the changes and run startup.bat again, then check both logs/canal/canal.log and logs/example/example.log; if neither reports an error, the setup is working. If you see errors such as java.io.IOException: Received error packet: errno = 1236, sqlstate = HY000 errmsg = binlog truncated or Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: column size is not match for table, simply delete the meta.dat file of the corresponding instance under canal/conf/.
3. Implementing the Sync
With the configuration above in place, every operation on the master database is picked up by Canal and published to RabbitMQ. By consuming those messages we know exactly what MySQL did (insert, update, delete), and ElasticSearch then needs to perform the corresponding operation.
3.1 Creating the Exchange
// Declared in a Spring @Configuration class. The exchange name must match
// the rabbitmq.exchange value configured in canal.properties above.
@Bean
public FanoutExchange canalFanoutExchange() {
    return new FanoutExchange("canalFanoutExchange", true, false);
}

// Durable queue that will receive the Canal messages
@Bean
public Queue canalQueue() {
    return new Queue("canalQueue", true);
}

// Bind the queue to the fanout exchange
@Bean
public Binding canalBinding() {
    LOGGER.info("Binding the canal queue to the canal exchange");
    return BindingBuilder.bind(canalQueue()).to(canalFanoutExchange());
}
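For the Spring Boot application itself to reach the broker, the usual RabbitMQ connection settings are also needed. A minimal sketch using Spring Boot's standard spring.rabbitmq.* properties in application.properties, assuming the same local broker and guest credentials used in canal.properties (adjust them to your environment):
spring.rabbitmq.host=127.0.0.1
spring.rabbitmq.port=5672
spring.rabbitmq.username=guest
spring.rabbitmq.password=guest
spring.rabbitmq.virtual-host=/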
Now, by listening on the canalQueue queue we can receive whatever Canal pushes to RabbitMQ. The code looks like this:
@RabbitListener(queuesToDeclare = @Queue("canalQueue"))
public void receive3(@Payload String body, @Headers Map<String, Object> headers) {
    // Log the raw Canal message and its AMQP headers
    LOGGER.info("canalQueueBody:{}", body);
    LOGGER.info("canalQueueHeaders:{}", headers);
}
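Since we will later parse the payload from raw bytes, it can also be convenient to receive Spring AMQP's Message type instead of a String. A minimal sketch (the class and method names here are just placeholders, not part of the project above):
import org.springframework.amqp.core.Message;
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.stereotype.Component;

@Component
public class CanalMessageListener {

    @RabbitListener(queues = "canalQueue")
    public void onCanalMessage(Message message) {
        // The body is the raw byte[] that Canal published to the exchange
        byte[] payload = message.getBody();
        // ... hand the bytes to the parsing code shown below
    }
}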
Then insert a new row into the database. The listener prints the following information:
Because Canal publishes its data as a byte array, what we receive here is also a byte array, which we can parse with the following code:
byte[] byteData = new byte[]{123,34,100,97,116,97,34,58,91,123,34,105,100,34,58,34,49,51,34,44,34,102,105,108,101,95,110,97,109,101,34,58,34,100,97,106,119,100,106,97,119,34,44,34,102,105,108,101,95,116,121,112,101,34,58,34,100,111,99,120,34,44,34,102,105,108,101,95,115,105,122,101,34,58,34,49,53,57,48,34,44,34,102,105,108,101,95,112,97,116,104,34,58,34,49,34,44,34,105,115,95,100,101,108,101,116,101,100,34,58,34,48,34,125,93,44,34,100,97,116,97,98,97,115,101,34,58,34,100,98,55,34,44,34,101,115,34,58,49,55,51,48,56,56,54,57,48,49,48,48,48,44,34,105,100,34,58,57,44,34,105,115,68,100,108,34,58,102,97,108,115,101,44,34,109,121,115,113,108,84,121,112,101,34,58,123,34,105,100,34,58,34,105,110,116,34,44,34,102,105,108,101,95,110,97,109,101,34,58,34,118,97,114,99,104,97,114,40,50,53,53,41,34,44,34,102,105,108,101,95,116,121,112,101,34,58,34,118,97,114,99,104,97,114,40,53,48,41,34,44,34,102,105,108,101,95,115,105,122,101,34,58,34,98,105,103,105,110,116,34,44,34,102,105,108,101,95,112,97,116,104,34,58,34,116,101,120,116,34,44,34,105,115,95,100,101,108,101,116,101,100,34,58,34,116,105,110,121,105,110,116,40,49,41,34,125,44,34,111,108,100,34,58,110,117,108,108,44,34,112,107,78,97,109,101,115,34,58,91,34,105,100,34,93,44,34,115,113,108,34,58,34,34,44,34,115,113,108,84,121,112,101,34,58,123,34,105,100,34,58,52,44,34,102,105,108,101,95,110,97,109,101,34,58,49,50,44,34,102,105,108,101,95,116,121,112,101,34,58,49,50,44,34,102,105,108,101,95,115,105,122,101,34,58,45,53,44,34,102,105,108,101,95,112,97,116,104,34,58,50,48,48,53,44,34,105,115,95,100,101,108,101,116,101,100,34,58,45,54,125,44,34,116,97,98,108,101,34,58,34,102,105,108,101,95,116,97,98,108,101,34,44,34,116,115,34,58,49,55,51,48,56,56,54,57,48,49,57,52,51,44,34,116,121,112,101,34,58,34,73,78,83,69,82,84,34,125};
String jsonString = new String(byteData, StandardCharsets.UTF_8);
ObjectMapper mapper = new ObjectMapper();
JsonNode map = null;
try {
map = mapper.readTree(jsonString);
} catch (IOException e) {
throw new RuntimeException(e);
}
// Print the parsed JSON object
System.out.println("Parsed JSON object: " + map.toString());
After parsing it, you will see the following information:
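For reference, the byte array above decodes to the following JSON document (pretty-printed here):
{
  "data": [
    {
      "id": "13",
      "file_name": "dajwdjaw",
      "file_type": "docx",
      "file_size": "1590",
      "file_path": "1",
      "is_deleted": "0"
    }
  ],
  "database": "db7",
  "es": 1730886901000,
  "id": 9,
  "isDdl": false,
  "mysqlType": {
    "id": "int",
    "file_name": "varchar(255)",
    "file_type": "varchar(50)",
    "file_size": "bigint",
    "file_path": "text",
    "is_deleted": "tinyint(1)"
  },
  "old": null,
  "pkNames": ["id"],
  "sql": "",
  "sqlType": {
    "id": 4,
    "file_name": 12,
    "file_type": 12,
    "file_size": -5,
    "file_path": 2005,
    "is_deleted": -6
  },
  "table": "file_table",
  "ts": 1730886901943,
  "type": "INSERT"
}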
You can see that the data array contains exactly the row we just inserted into MySQL. For this structure I created an entity class to map the JSON object:
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import java.io.Serializable;
import java.util.List;
/**
* ClassName: CanalEntity
* Package: com.example.springbootrabbitmq.entity
* Description:
* Entity class for receiving Canal messages
*
* @Author ms
* @Create 2024/11/6 12:07
* @Version 1.0
*/
@Data
@AllArgsConstructor
@NoArgsConstructor
public class CanalEntity implements Serializable {
private List<FileTableData> data;
private String database;
private long es;
private int id;
private boolean isDdl;
private MysqlType mysqlType;
private Object old;
private List<String> pkNames;
private String sql;
private SqlType sqlType;
private String table;
private long ts;
private String type;
// getters and setters are generated by Lombok @Data
@Data
public static class FileTableData {
private String id;
private String fileName;
private String fileType;
private String fileSize;
private String filePath;
private String isDeleted;
}
@Data
public static class MysqlType {
private String id;
private String fileName;
private String fileType;
private String fileSize;
private String filePath;
private String isDeleted;
}
@Data
public static class SqlType {
private int id;
private int fileName;
private int fileType;
private int fileSize;
private int filePath;
private int isDeleted;
}
}
The utility class:
import com.example.springbootrabbitmq.entity.CanalEntity;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.util.Assert;
import java.lang.reflect.Field;
import java.lang.reflect.Type;
import java.nio.charset.StandardCharsets;
import java.util.*;
/**
* ClassName: ParseEntityUtils
* Package: com.example.springbootrabbitmq.utils
* Description:
* Converts Map-structured data into an entity class.
* Mainly used for consuming RabbitMQ messages.
*
* @Author ms
* @Create 2024/11/6 12:57
* @Version 1.0
*/
public class ParseEntityUtils {
private static final List<String> OBJECT_FIELDS = Arrays.asList("data", "mysqlType", "pkNames", "sqlType");
/**
* Converts raw byte data into an entity instance
*
* @param messageBody the message bytes
* @param clazz the Class object of the target entity
* @param <T> the entity type
* @return the populated entity
*/
public static <T> T parseEntity(byte[] messageBody, Class<T> clazz) {
// 1. Convert the byte array to a string
String jsonString = new String(messageBody, StandardCharsets.UTF_8);
// 2. Use the Jackson library to convert the String into a Map
ObjectMapper mapper = new ObjectMapper();
// 3. Convert the String into a Map
// todo: ObjectMapper could map the String straight onto an entity class, but that cannot be used directly here:
// the RabbitMQ payload is a multi-level nested object whose snake_case keys (e.g. file_name) do not match
// the camelCase fields of the nested inner classes, so the mapping is done manually
T t = null;
Map<String, Object> map;
try {
map = mapper.readValue(jsonString, Map.class);
t = clazz.getDeclaredConstructor().newInstance();
Class<?>[] innerClasses = clazz.getDeclaredClasses();
// Get all fields of the class; apart from the OBJECT_FIELDS entries (data, mysqlType, pkNames, sqlType),
// every field is a primitive or String and can be assigned directly
Field[] fields = clazz.getDeclaredFields();
for (Field field : fields) {
// Get the field name
String fieldName = field.getName();
if (!OBJECT_FIELDS.contains(fieldName)) {
// Assign the value directly
field.setAccessible(true);
field.set(t, map.get(fieldName));
}
// Fields of other types, such as List or Map, need extra handling
if (List.class.isAssignableFrom(field.getType())) {
// Determine the element type of the List
Class<?> elementType = findListElementType(field);
Assert.notNull(elementType, "The element type of the List must not be null");
// Find the matching inner class of clazz
for (Class<?> innerClass : innerClasses) {
if (elementType.getSimpleName().equals(innerClass.getSimpleName())) {
// Populate an instance of the inner class
Object innerObject = innerClass.getDeclaredConstructor().newInstance();
ArrayList<Map<String, Object>> innerValue = (ArrayList<Map<String, Object>>) map.get(replaceFieldToLower(fieldName));
ArrayList<Object> list = new ArrayList<>();
for (Map<String, Object> objectMap : innerValue) {
HashMap<String, Object> hashMap = new HashMap<>();
for (Map.Entry<String, Object> entry : objectMap.entrySet()) {
hashMap.put(replaceFieldToUpper(entry.getKey()), entry.getValue());
}
String str = mapper.writeValueAsString(hashMap);
Object object = mapper.readValue(str, innerClass);
list.add(object);
}
field.setAccessible(true);
field.set(t, list);
}
}
} else if (Map.class.isAssignableFrom(field.getType())) {
// // If the field type is Map
// field.setAccessible(true);
// field.set(t, map.get(fieldName));
} else if (CanalEntity.MysqlType.class.isAssignableFrom(field.getType()) || CanalEntity.SqlType.class.isAssignableFrom(field.getType())) {
// Otherwise the field is one of the nested object types (MysqlType / SqlType)
Class<?> aClass = field.getType();
// Get all of its fields
Field[] innerFields = aClass.getDeclaredFields();
// Get the raw values of those fields
HashMap<String, Object> innerFieldValue = (HashMap<String, Object>) map.get(field.getName());
HashMap<String, Object> hashMap = new HashMap<>();
for (Map.Entry<String, Object> entry : innerFieldValue.entrySet()) {
if (entry.getKey().contains("_")) {
hashMap.put(replaceFieldToUpper(entry.getKey()), entry.getValue());
} else {
hashMap.put(entry.getKey(), entry.getValue());
}
}
String string = mapper.writeValueAsString(hashMap);
Object value = mapper.readValue(string, aClass);
field.setAccessible(true);
field.set(t, value);
}
}
} catch (Exception e) {
e.printStackTrace();
throw new RuntimeException(e);
}
return t;
}
/**
* Converts a camelCase name to snake_case, e.g. userName -> user_name
*
* @param fieldName
* @return
*/
public static String replaceFieldToLower(String fieldName) {
StringBuilder result = new StringBuilder();
for (int i = 0; i < fieldName.length(); i++) {
char c = fieldName.charAt(i);
if (Character.isUpperCase(c) && i > 0) {
result.append("_");
}
result.append(Character.toLowerCase(c));
}
return result.toString();
}
/**
* Converts a snake_case name to camelCase, e.g. user_name -> userName
*
* @param fieldName
* @return
*/
public static String replaceFieldToUpper(String fieldName) {
StringBuilder result = new StringBuilder();
boolean nextToUpperCase = false;
for (char c : fieldName.toCharArray()) {
if (c == '_') {
nextToUpperCase = true;
} else {
if (nextToUpperCase) {
result.append(Character.toUpperCase(c));
nextToUpperCase = false;
} else {
result.append(c);
}
}
}
return result.toString();
}
/**
* Gets the element type of a List field
*
* @param field
* @return
*/
public static Class<?> findListElementType(Field field) {
Type genericType = field.getGenericType();
if (genericType instanceof java.lang.reflect.ParameterizedType) {
java.lang.reflect.ParameterizedType parameterizedType = (java.lang.reflect.ParameterizedType) genericType;
Type[] actualTypeArguments = parameterizedType.getActualTypeArguments();
if (actualTypeArguments.length > 0) {
if (actualTypeArguments[0] instanceof Class<?>) {
return (Class<?>) actualTypeArguments[0];
}
}
}
return null;
}
}
Finally, call the utility class to parse the JSON. The complete method:
@Test
public void contextLoads() {
byte[] byteData = new byte[]{123,34,100,97,116,97,34,58,91,123,34,105,100,34,58,34,49,51,34,44,34,102,105,108,101,95,110,97,109,101,34,58,34,100,97,106,119,100,106,97,119,34,44,34,102,105,108,101,95,116,121,112,101,34,58,34,100,111,99,120,34,44,34,102,105,108,101,95,115,105,122,101,34,58,34,49,53,57,48,34,44,34,102,105,108,101,95,112,97,116,104,34,58,34,49,34,44,34,105,115,95,100,101,108,101,116,101,100,34,58,34,48,34,125,93,44,34,100,97,116,97,98,97,115,101,34,58,34,100,98,55,34,44,34,101,115,34,58,49,55,51,48,56,56,54,57,48,49,48,48,48,44,34,105,100,34,58,57,44,34,105,115,68,100,108,34,58,102,97,108,115,101,44,34,109,121,115,113,108,84,121,112,101,34,58,123,34,105,100,34,58,34,105,110,116,34,44,34,102,105,108,101,95,110,97,109,101,34,58,34,118,97,114,99,104,97,114,40,50,53,53,41,34,44,34,102,105,108,101,95,116,121,112,101,34,58,34,118,97,114,99,104,97,114,40,53,48,41,34,44,34,102,105,108,101,95,115,105,122,101,34,58,34,98,105,103,105,110,116,34,44,34,102,105,108,101,95,112,97,116,104,34,58,34,116,101,120,116,34,44,34,105,115,95,100,101,108,101,116,101,100,34,58,34,116,105,110,121,105,110,116,40,49,41,34,125,44,34,111,108,100,34,58,110,117,108,108,44,34,112,107,78,97,109,101,115,34,58,91,34,105,100,34,93,44,34,115,113,108,34,58,34,34,44,34,115,113,108,84,121,112,101,34,58,123,34,105,100,34,58,52,44,34,102,105,108,101,95,110,97,109,101,34,58,49,50,44,34,102,105,108,101,95,116,121,112,101,34,58,49,50,44,34,102,105,108,101,95,115,105,122,101,34,58,45,53,44,34,102,105,108,101,95,112,97,116,104,34,58,50,48,48,53,44,34,105,115,95,100,101,108,101,116,101,100,34,58,45,54,125,44,34,116,97,98,108,101,34,58,34,102,105,108,101,95,116,97,98,108,101,34,44,34,116,115,34,58,49,55,51,48,56,56,54,57,48,49,57,52,51,44,34,116,121,112,101,34,58,34,73,78,83,69,82,84,34,125};
String jsonString = new String(byteData, StandardCharsets.UTF_8);
ObjectMapper mapper = new ObjectMapper();
JsonNode map = null;
try {
map = mapper.readTree(jsonString);
} catch (IOException e) {
throw new RuntimeException(e);
}
// Print the parsed JSON object
System.out.println("Parsed JSON object: " + map.toString());
CanalEntity canalEntity = ParseEntityUtils.parseEntity(byteData, CanalEntity.class);
System.out.println(canalEntity.toString());
}
The final output is the fully populated CanalEntity printed via its toString() method.
At this point the article is essentially done. The remaining sync work is straightforward: take the FileTableData information out of the CanalEntity and call the ElasticSearch API to perform the insert; updates and deletes follow exactly the same idea. A sketch of what that can look like is shown below.
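As an illustration only, here is a minimal sketch of that last step, assuming Spring Data Elasticsearch is on the classpath and that a FileDocument class mapped to the file index exists (FileDocument and toDocument are hypothetical names, not part of the code above):
import com.example.springbootrabbitmq.entity.CanalEntity;
import com.example.springbootrabbitmq.utils.ParseEntityUtils;
import org.springframework.amqp.core.Message;
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.stereotype.Component;

@Component
public class CanalSyncListener {

    private final ElasticsearchOperations elasticsearchOperations;

    public CanalSyncListener(ElasticsearchOperations elasticsearchOperations) {
        this.elasticsearchOperations = elasticsearchOperations;
    }

    @RabbitListener(queues = "canalQueue")
    public void sync(Message message) {
        // Parse the raw Canal payload into the entity described above
        CanalEntity entity = ParseEntityUtils.parseEntity(message.getBody(), CanalEntity.class);
        if (!"file_table".equals(entity.getTable())) {
            return; // only mirror changes of the file table
        }
        for (CanalEntity.FileTableData row : entity.getData()) {
            switch (entity.getType()) {
                case "INSERT":
                case "UPDATE":
                    // toDocument() is a hypothetical mapper that builds the ES document,
                    // e.g. reading the file content from Minio before indexing it
                    elasticsearchOperations.save(toDocument(row));
                    break;
                case "DELETE":
                    elasticsearchOperations.delete(row.getId(), FileDocument.class);
                    break;
                default:
                    // ignore DDL statements and other event types
                    break;
            }
        }
    }

    private FileDocument toDocument(CanalEntity.FileTableData row) {
        // Hypothetical mapping; fill in whatever fields your index actually stores
        FileDocument doc = new FileDocument();
        doc.setId(row.getId());
        doc.setFileName(row.getFileName());
        return doc;
    }
}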
Summary
Canal can actually be connected to ElasticSearch directly, without going through a Java application, to achieve synchronization. The reason this article goes through RabbitMQ is that the database data and the ES data do not map onto each other directly. Take the file table as an example: ES stores the file content so that it can be searched, whereas the database usually does not store the content at all, only the file's access path. That is the difference: because the ES documents and the database rows do not correspond one-to-one, the two cannot simply be synchronized directly.