Canal + RabbitMQ: Keeping MySQL and Elasticsearch Data Consistent


Preface

This article is a follow-up to an earlier article of mine, SpringBoot + Minio + ElasticSearch实现文件内容检索 (file content search with SpringBoot + Minio + Elasticsearch). That article did not consider keeping the data in MySQL and Elasticsearch consistent, so this one fills that gap.

1. Introduction to Canal

Before getting to know Canal, you need to understand how MySQL master-slave replication works. Briefly:

  • Every transaction on the master (insert, delete, update, and so on) is recorded in the binary log (binlog), a binary-format log file containing every event that modifies data. You can inspect it with SHOW BINLOG EVENTS (see the example after this list).
  • The slave reads the master's binary log and copies it into its own relay log.
  • The slave then replays the events in the relay log, applying the data changes to its own data.
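As a quick illustration, the following statements (run on the master) list the binlog files and show the events inside one of them; the file name mysql-bin.000001 is only an example and will differ on your server:

-- List the binary log files currently kept by the master
SHOW BINARY LOGS;

-- Inspect the first events recorded in one of those files
SHOW BINLOG EVENTS IN 'mysql-bin.000001' LIMIT 10;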

1.1 How Canal works

  • Canal mimics the MySQL master-slave interaction protocol: it disguises itself as a MySQL SLAVE (replica) and sends the dump protocol to the MySQL master.
  • The MySQL master receives the dump request and starts pushing its binary log to the SLAVE (i.e. Canal).
  • Canal parses the binary log objects (originally a byte stream).

1.2 Downloading Canal

Canal is an open-source component from Alibaba that parses MySQL incremental logs (binlog) and provides incremental data subscription and consumption. The download page is github.com/alibaba/can… and version 1.1.7 is recommended.

After the download finishes, just extract the archive.

1.3 Configuring Canal

Before configuring Canal, we need to make sure MySQL's binary log is enabled, using the following commands:

SHOW VARIABLES LIKE 'log_bin'; 
SHOW VARIABLES LIKE 'binlog_format';
  • SHOW VARIABLES LIKE 'log_bin'; checks whether the binlog is enabled. If the Value column shows ON, the binlog is enabled; OFF means it is not.
  • SHOW VARIABLES LIKE 'binlog_format'; shows the current binlog format; the Value column will be STATEMENT, ROW, or MIXED. We need the format to be ROW (sample output below).
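On a correctly configured server the two checks return something like this (the exact layout depends on your client):

mysql> SHOW VARIABLES LIKE 'log_bin';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_bin       | ON    |
+---------------+-------+

mysql> SHOW VARIABLES LIKE 'binlog_format';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+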

If the binary log is not enabled, locate the MySQL configuration file, my.ini on Windows (typically in the MySQL installation directory or under C:\ProgramData\MySQL\MySQL Server X.Y) and my.cnf on Linux (typically /etc/my.cnf or /etc/mysql/my.cnf), and add the following under the [mysqld] section:

log-bin=mysql-bin # enable binlog
binlog-format=ROW # use ROW mode
server_id=1 # required for MySQL replication; must not clash with Canal's slaveId

Once the configuration is done, open a MySQL client and create an account for Canal to connect with (if you already have one, just run the GRANT; do not use the root account). Be sure to restart the database afterwards so the binlog settings take effect.

CREATE USER canal IDENTIFIED BY 'canal';  
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
-- GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%' ;
-- flush privileges
FLUSH PRIVILEGES;
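Optionally, verify that the account ended up with the expected privileges:

SHOW GRANTS FOR 'canal'@'%';
-- should list REPLICATION SLAVE and REPLICATION CLIENT (or ALL PRIVILEGES)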

Next, open \canal-deployer-1.1.7\conf\example\instance.properties; two places need to be changed:

# position info: change this to your own database information
# your own database address and port
canal.instance.master.address=127.0.0.1:3306
canal.instance.master.journal.name=DESKTOP-FL9JG86-bin.000091
canal.instance.master.position=157
canal.instance.master.timestamp=
canal.instance.master.gtid=


# username/password: use your own database credentials; the root account must not be used
canal.instance.dbUsername=canal   # the account authorized above
canal.instance.dbPassword=canal   # its password
canal.instance.connectionCharset = UTF-8
# enable druid Decrypt database password
canal.instance.enableDruid=false
#canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==

The values for canal.instance.master.journal.name and canal.instance.master.position can be read in a MySQL client with SHOW MASTER STATUS, for example:
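The output looks roughly like this (the file name and position will differ on your machine; newer MySQL versions add a few more columns):

mysql> SHOW MASTER STATUS;
+----------------------------+----------+--------------+------------------+
| File                       | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+----------------------------+----------+--------------+------------------+
| DESKTOP-FL9JG86-bin.000091 |     2671 |              |                  |
+----------------------------+----------+--------------+------------------+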

Here canal.instance.master.journal.name is DESKTOP-FL9JG86-bin.000091 and canal.instance.master.position is 2671. Save the file after making the changes, then run canal-deployer-1.1.7\bin\startup.bat and check logs/canal/canal.log; output like the following means the server started successfully:

2013-02-05 22:45:27.967 [main] INFO  com.alibaba.otter.canal.deployer.CanalLauncher - ## start the canal server.
2013-02-05 22:45:28.113 [main] INFO  com.alibaba.otter.canal.deployer.CanalController - ## start the canal server[10.1.29.120:11111]
2013-02-05 22:45:28.210 [main] INFO  com.alibaba.otter.canal.deployer.CanalLauncher - ## the canal server is running now ......

If you get an error on startup, you need to edit startup.bat, because it uses JVM flags from older Java versions. The modified content is below; you can copy and paste it directly:

@echo off
@if not "%ECHO%" == ""  echo %ECHO%
@if "%OS%" == "Windows_NT"  setlocal

set ENV_PATH=.\
if "%OS%" == "Windows_NT" set ENV_PATH=%~dp0%

set conf_dir=%ENV_PATH%\..\conf
set canal_conf=%conf_dir%\canal.properties
@rem set canal_conf=%conf_dir%\canal_local.properties
if "%1" == "local" set canal_conf=%conf_dir%\canal_local.properties
set logback_configurationFile=%conf_dir%\logback.xml

set CLASSPATH=%conf_dir%
set CLASSPATH=%conf_dir%\..\lib\*;%CLASSPATH%

set JAVA_MEM_OPTS= -Xms128m -Xmx128m -XX:MaxMetaspaceSize=128m
set JAVA_OPTS_EXT= -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Dapplication.codeset=UTF-8 -Dfile.encoding=UTF-8
set JAVA_DEBUG_OPT= -server -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,address=9099,server=y,suspend=n
set CANAL_OPTS= -DappName=otter-canal -Dlogback.configurationFile="%logback_configurationFile%" -Dcanal.conf="%canal_conf%"

set JAVA_OPTS= %JAVA_MEM_OPTS% %JAVA_OPTS_EXT% %JAVA_DEBUG_OPT% %CANAL_OPTS%

set CMD_STR= java %JAVA_OPTS% -classpath "%CLASSPATH%" com.alibaba.otter.canal.deployer.CanalLauncher
echo start cmd : %CMD_STR%

java %JAVA_OPTS% -classpath "%CLASSPATH%" com.alibaba.otter.canal.deployer.CanalLauncher

Restart it and it should run without problems.

2. Configuring RabbitMQ

Next, open /canal-deployer-1.1.7/conf/canal.properties; here too, two places need to be changed:

canal.zkServers =
# flush data to zk
canal.zookeeper.flush.period = 1000
canal.withoutNetty = false
# tcp, kafka, rocketMQ, rabbitMQ, pulsarMQ   options: tcp (default), kafka, RocketMQ, rabbitmq, pulsarmq
 # change this to rabbitMQ
canal.serverMode = rabbitMQ 
# flush meta cursor/parse position to file



##################################################
######### 		    RabbitMQ	     #############
##################################################
# RabbitMQ settings
rabbitmq.host = 127.0.0.1:5672
rabbitmq.virtual.host = /
rabbitmq.exchange = canalFanoutExchange
rabbitmq.username = guest
rabbitmq.password = guest
rabbitmq.deliveryMode =

Save the changes and run startup.bat again, then check logs/canal/canal.log and logs/example/example.log; if neither shows errors, the setup succeeded. If you run into java.io.IOException: Received error packet: errno = 1236, sqlstate = HY000 errmsg = binlog truncated or Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: column size is not match for table, simply delete the instance's meta.dat file under canal/conf/ (here conf/example/meta.dat) and restart.

3. Implementing the synchronization

With the configuration above in place, every operation on the master database is picked up by Canal and published to RabbitMQ. By consuming those messages we know exactly what MySQL did (insert, update, delete), and Elasticsearch can then perform the corresponding operation.
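The Spring Boot side of the examples below assumes spring-boot-starter-amqp is on the classpath and that the broker connection is configured in application.properties; the values here are illustrative and simply mirror the defaults used in canal.properties above:

spring.rabbitmq.host=127.0.0.1
spring.rabbitmq.port=5672
spring.rabbitmq.username=guest
spring.rabbitmq.password=guest
spring.rabbitmq.virtual-host=/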

3.1 Creating the exchange

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.amqp.core.Binding;
import org.springframework.amqp.core.BindingBuilder;
import org.springframework.amqp.core.FanoutExchange;
import org.springframework.amqp.core.Queue;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RabbitConfig {

    private static final Logger LOGGER = LoggerFactory.getLogger(RabbitConfig.class);

    @Bean
    public FanoutExchange canalFanoutExchange() {
        // durable = true, autoDelete = false; the name must match rabbitmq.exchange in canal.properties
        return new FanoutExchange("canalFanoutExchange", true, false);
    }

    @Bean
    public Queue canalQueue() {
        return new Queue("canalQueue", true);
    }

    @Bean
    public Binding canalBinding() {
        LOGGER.info("Binding canalQueue to canalFanoutExchange");
        return BindingBuilder.bind(canalQueue()).to(canalFanoutExchange());
    }
}

With these beans in place, listening on the canalQueue queue gives us the messages Canal pushes to RabbitMQ:

@RabbitListener(queuesToDeclare = @Queue("canalQueue"))
public void receive3(@Payload String body, @Headers Map<String, Object> headers) {
    LOGGER.info("canalQueueBody:{}", body);
    LOGGER.info("canalQueueHeaders:{}", headers);
}

Then insert a row into the database.

The listener prints the following:

Because Canal sends the payload as a byte array, what we receive is also a byte array; the following code decodes and parses it:

byte[] byteData = new byte[]{123,34,100,97,116,97,34,58,91,123,34,105,100,34,58,34,49,51,34,44,34,102,105,108,101,95,110,97,109,101,34,58,34,100,97,106,119,100,106,97,119,34,44,34,102,105,108,101,95,116,121,112,101,34,58,34,100,111,99,120,34,44,34,102,105,108,101,95,115,105,122,101,34,58,34,49,53,57,48,34,44,34,102,105,108,101,95,112,97,116,104,34,58,34,49,34,44,34,105,115,95,100,101,108,101,116,101,100,34,58,34,48,34,125,93,44,34,100,97,116,97,98,97,115,101,34,58,34,100,98,55,34,44,34,101,115,34,58,49,55,51,48,56,56,54,57,48,49,48,48,48,44,34,105,100,34,58,57,44,34,105,115,68,100,108,34,58,102,97,108,115,101,44,34,109,121,115,113,108,84,121,112,101,34,58,123,34,105,100,34,58,34,105,110,116,34,44,34,102,105,108,101,95,110,97,109,101,34,58,34,118,97,114,99,104,97,114,40,50,53,53,41,34,44,34,102,105,108,101,95,116,121,112,101,34,58,34,118,97,114,99,104,97,114,40,53,48,41,34,44,34,102,105,108,101,95,115,105,122,101,34,58,34,98,105,103,105,110,116,34,44,34,102,105,108,101,95,112,97,116,104,34,58,34,116,101,120,116,34,44,34,105,115,95,100,101,108,101,116,101,100,34,58,34,116,105,110,121,105,110,116,40,49,41,34,125,44,34,111,108,100,34,58,110,117,108,108,44,34,112,107,78,97,109,101,115,34,58,91,34,105,100,34,93,44,34,115,113,108,34,58,34,34,44,34,115,113,108,84,121,112,101,34,58,123,34,105,100,34,58,52,44,34,102,105,108,101,95,110,97,109,101,34,58,49,50,44,34,102,105,108,101,95,116,121,112,101,34,58,49,50,44,34,102,105,108,101,95,115,105,122,101,34,58,45,53,44,34,102,105,108,101,95,112,97,116,104,34,58,50,48,48,53,44,34,105,115,95,100,101,108,101,116,101,100,34,58,45,54,125,44,34,116,97,98,108,101,34,58,34,102,105,108,101,95,116,97,98,108,101,34,44,34,116,115,34,58,49,55,51,48,56,56,54,57,48,49,57,52,51,44,34,116,121,112,101,34,58,34,73,78,83,69,82,84,34,125};
String jsonString = new String(byteData, StandardCharsets.UTF_8);
ObjectMapper mapper = new ObjectMapper();
JsonNode map = null;
try {
    map = mapper.readTree(jsonString);
} catch (IOException e) {
    throw new RuntimeException(e);
}

// Print the parsed JSON object
System.out.println("Parsed JSON object: " + map.toString());

After parsing it you will see the structure below: the data field holds exactly the row we just inserted into MySQL. Based on this structure I created an entity class to map the JSON:

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.io.Serializable;
import java.util.List;

/**
 * ClassName: CanalEntity
 * Package: com.example.springbootrabbitmq.entity
 * Description:
 * Entity class for receiving Canal messages
 *
 * @Author ms
 * @Create 2024/11/6 12:07
 * @Version 1.0
 */
@Data
@AllArgsConstructor
@NoArgsConstructor
public class CanalEntity implements Serializable {

    private List<FileTableData> data;
    private String database;
    private long es;
    private int id;
    private boolean isDdl;
    private MysqlType mysqlType;
    private Object old;
    private List<String> pkNames;
    private String sql;
    private SqlType sqlType;
    private String table;
    private long ts;
    private String type;

    // the necessary getters and setters are generated by Lombok's @Data

    @Data
    public static class FileTableData {
        private String id;
        private String fileName;
        private String fileType;
        private String fileSize;
        private String filePath;
        private String isDeleted;

    }

    @Data
    public static class MysqlType {
        private String id;
        private String fileName;
        private String fileType;
        private String fileSize;
        private String filePath;
        private String isDeleted;

    }

    @Data
    public static class SqlType {
        private int id;
        private int fileName;
        private int fileType;
        private int fileSize;
        private int filePath;
        private int isDeleted;
    }
}

The utility class:

import com.example.springbootrabbitmq.entity.CanalEntity;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.util.Assert;

import java.lang.reflect.Field;
import java.lang.reflect.Type;
import java.nio.charset.StandardCharsets;
import java.util.*;

/**
 * ClassName: ParseEntityUtils
 * Package: com.example.springbootrabbitmq.utils
 * Description:
 * Converts Map-structured data into an entity class
 * Mainly used for consuming RabbitMQ messages
 *
 * @Author ms
 * @Create 2024/11/6 12:57
 * @Version 1.0
 */
public class ParseEntityUtils {

    private static final List<String> OBJECT_FIELDS = Arrays.asList("data", "mysqlType", "pkNames", "sqlType");


    /**
     * Converts byte data into an entity instance
     *
     * @param messageBody the raw message bytes
     * @param clazz       the Class object of the entity
     * @param <T>
     * @return
     */
    public static <T> T parseEntity(byte[] messageBody, Class<T> clazz) {
        // 1. Convert the byte array to a String
        String jsonString = new String(messageBody, StandardCharsets.UTF_8);
        // 2. Use Jackson to parse the String into a Map
        ObjectMapper mapper = new ObjectMapper();
        // TODO: ObjectMapper can deserialize a String straight into an entity class, but it cannot be used directly here:
        //  the payload is a deeply nested object whose keys use snake_case while the nested classes use camelCase fields,
        //  which is why the mapping onto the nested classes is done manually below
        T t = null;

        Map<String, Object> map;
        try {
            map = mapper.readValue(jsonString, Map.class);

            t = clazz.getDeclaredConstructor().newInstance();

            Class<?>[] innerClasss = clazz.getDeclaredClasses();

            // Get all fields of the class; apart from data, mysqlType, pkNames and sqlType, every field is a primitive or String and can be assigned directly
            Field[] fields = clazz.getDeclaredFields();
            for (Field field : fields) {
                // Get the field name
                String fieldName = field.getName();
                if (!OBJECT_FIELDS.contains(fieldName)) {
                    // Assign directly
                    field.setAccessible(true);
                    field.set(t, map.get(fieldName));
                }

                // Fields of other types, such as List or Map, need extra handling
                if (List.class.isAssignableFrom(field.getType())) {
                    // Determine the element type of the List
                    Class<?> elementType = findListElementType(field);
                    Assert.notNull(elementType, "the element type of the List must not be null");
                    // Match it against the declared inner classes of clazz

                    for (Class<?> innerClass : innerClasss) {
                        if (elementType.getSimpleName().equals(innerClass.getSimpleName())) {
                            // Build instances of the inner class for each element
                            Object innerObject = innerClass.getDeclaredConstructor().newInstance();

                            ArrayList<Map<String, Object>> innerValue = (ArrayList<Map<String, Object>>) map.get(replaceFieldToLower(fieldName));
                            ArrayList<Object> list = new ArrayList<>();
                            for (Map<String, Object> objectMap : innerValue) {
                                HashMap<String, Object> hashMap = new HashMap<>();
                                for (Map.Entry<String, Object> entry : objectMap.entrySet()) {
                                    hashMap.put(replaceFieldToUpper(entry.getKey()), entry.getValue());
                                }
                                String str = mapper.writeValueAsString(hashMap);
                                Object object = mapper.readValue(str, innerClass);
                                list.add(object);
                            }
                            field.setAccessible(true);
                            field.set(t, list);
                        }
                    }
                } else if (Map.class.isAssignableFrom(field.getType())) {
//                    // If the field type is Map (not handled here)
//                    field.setAccessible(true);
//                    field.set(t, map.get(fieldName));
                } else if (CanalEntity.MysqlType.class.isAssignableFrom(field.getType()) || CanalEntity.SqlType.class.isAssignableFrom(field.getType())) {
                    // Otherwise the field is one of the nested object types (MysqlType / SqlType)
                    Class<?> aClass = field.getType();
                    // Get all of its fields
                    Field[] innerFields = aClass.getDeclaredFields();
                    // Get the raw values of those fields
                    HashMap<String, Object> innerFieldValue = (HashMap<String, Object>) map.get(field.getName());
                    HashMap<String, Object> hashMap = new HashMap<>();

                    for (Map.Entry<String, Object> entry : innerFieldValue.entrySet()) {
                        if (entry.getKey().contains("_")) {
                            hashMap.put(replaceFieldToUpper(entry.getKey()), entry.getValue());
                        } else {
                            hashMap.put(entry.getKey(), entry.getValue());
                        }
                    }
                    String string = mapper.writeValueAsString(hashMap);
                    Object value = mapper.readValue(string, aClass);
                    field.setAccessible(true);
                    field.set(t, value);
                }
            }

        } catch (Exception e) {
            e.printStackTrace();
            throw new RuntimeException(e);
        }
        return t;
    }


    /**
     * Converts camelCase to snake_case, e.g. userName -> user_name
     *
     * @param fieldName
     * @return
     */
    public static String replaceFieldToLower(String fieldName) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < fieldName.length(); i++) {
            char c = fieldName.charAt(i);
            if (Character.isUpperCase(c) && i > 0) {
                result.append("_");
            }
            result.append(Character.toLowerCase(c));
        }
        return result.toString();
    }


    /**
     * Converts snake_case to camelCase, e.g. user_name -> userName
     *
     * @param fieldName
     * @return
     */
    public static String replaceFieldToUpper(String fieldName) {
        StringBuilder result = new StringBuilder();
        boolean nextToUpperCase = false;
        for (char c : fieldName.toCharArray()) {
            if (c == '_') {
                nextToUpperCase = true;
            } else {
                if (nextToUpperCase) {
                    result.append(Character.toUpperCase(c));
                    nextToUpperCase = false;
                } else {
                    result.append(c);
                }
            }
        }
        return result.toString();
    }


    /**
     * Gets the element type of a List field
     *
     * @param field
     * @return
     */
    public static Class<?> findListElementType(Field field) {
        Type genericType = field.getGenericType();
        if (genericType instanceof java.lang.reflect.ParameterizedType) {
            java.lang.reflect.ParameterizedType parameterizedType = (java.lang.reflect.ParameterizedType) genericType;
            Type[] actualTypeArguments = parameterizedType.getActualTypeArguments();
            if (actualTypeArguments.length > 0) {
                if (actualTypeArguments[0] instanceof Class<?>) {
                    return (Class<?>) actualTypeArguments[0];
                }
            }
        }
        return null;
    }
}

Finally, call the utility class to parse the JSON. The complete test method:

@Test
public void contextLoads() {
    byte[] byteData = new byte[]{123,34,100,97,116,97,34,58,91,123,34,105,100,34,58,34,49,51,34,44,34,102,105,108,101,95,110,97,109,101,34,58,34,100,97,106,119,100,106,97,119,34,44,34,102,105,108,101,95,116,121,112,101,34,58,34,100,111,99,120,34,44,34,102,105,108,101,95,115,105,122,101,34,58,34,49,53,57,48,34,44,34,102,105,108,101,95,112,97,116,104,34,58,34,49,34,44,34,105,115,95,100,101,108,101,116,101,100,34,58,34,48,34,125,93,44,34,100,97,116,97,98,97,115,101,34,58,34,100,98,55,34,44,34,101,115,34,58,49,55,51,48,56,56,54,57,48,49,48,48,48,44,34,105,100,34,58,57,44,34,105,115,68,100,108,34,58,102,97,108,115,101,44,34,109,121,115,113,108,84,121,112,101,34,58,123,34,105,100,34,58,34,105,110,116,34,44,34,102,105,108,101,95,110,97,109,101,34,58,34,118,97,114,99,104,97,114,40,50,53,53,41,34,44,34,102,105,108,101,95,116,121,112,101,34,58,34,118,97,114,99,104,97,114,40,53,48,41,34,44,34,102,105,108,101,95,115,105,122,101,34,58,34,98,105,103,105,110,116,34,44,34,102,105,108,101,95,112,97,116,104,34,58,34,116,101,120,116,34,44,34,105,115,95,100,101,108,101,116,101,100,34,58,34,116,105,110,121,105,110,116,40,49,41,34,125,44,34,111,108,100,34,58,110,117,108,108,44,34,112,107,78,97,109,101,115,34,58,91,34,105,100,34,93,44,34,115,113,108,34,58,34,34,44,34,115,113,108,84,121,112,101,34,58,123,34,105,100,34,58,52,44,34,102,105,108,101,95,110,97,109,101,34,58,49,50,44,34,102,105,108,101,95,116,121,112,101,34,58,49,50,44,34,102,105,108,101,95,115,105,122,101,34,58,45,53,44,34,102,105,108,101,95,112,97,116,104,34,58,50,48,48,53,44,34,105,115,95,100,101,108,101,116,101,100,34,58,45,54,125,44,34,116,97,98,108,101,34,58,34,102,105,108,101,95,116,97,98,108,101,34,44,34,116,115,34,58,49,55,51,48,56,56,54,57,48,49,57,52,51,44,34,116,121,112,101,34,58,34,73,78,83,69,82,84,34,125};
    String jsonString = new String(byteData, StandardCharsets.UTF_8);
    ObjectMapper mapper = new ObjectMapper();
    JsonNode map = null;
    try {
        map = mapper.readTree(jsonString);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }

    // Print the parsed JSON object
    System.out.println("Parsed JSON object: " + map.toString());


    CanalEntity canalEntity = ParseEntityUtils.parseEntity(byteData, CanalEntity.class);
    System.out.println(canalEntity.toString());

}

The final output:

That essentially wraps up this article. The remaining synchronization work is straightforward: take the FileTableData objects out of the CanalEntity and call the Elasticsearch API to index them; updates and deletes follow the same idea. A rough sketch of this last step is shown below.
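For completeness, here is a minimal sketch of that step using Spring Data Elasticsearch; it is not taken from the original file-search article. The FileDocument class, its index name, and the ElasticsearchOperations-based persistence are assumptions, so adapt them to your own document model (in the file-search scenario you would also load and attach the file content before indexing):

import com.example.springbootrabbitmq.entity.CanalEntity;
import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.stereotype.Component;

// Hypothetical ES document class: the index name and fields are assumptions
@Data
@Document(indexName = "file_index")
class FileDocument {
    @Id
    private String id;
    private String fileName;
    private String fileType;
    private String filePath;
}

@Component
public class CanalSyncHandler {

    private final ElasticsearchOperations elasticsearchOperations;

    public CanalSyncHandler(ElasticsearchOperations elasticsearchOperations) {
        this.elasticsearchOperations = elasticsearchOperations;
    }

    /**
     * Called with the CanalEntity parsed from the RabbitMQ message.
     */
    public void sync(CanalEntity canalEntity) {
        // Only react to changes on the table we care about
        if (!"file_table".equals(canalEntity.getTable())) {
            return;
        }
        for (CanalEntity.FileTableData row : canalEntity.getData()) {
            if ("INSERT".equals(canalEntity.getType()) || "UPDATE".equals(canalEntity.getType())) {
                // Map the changed row onto the ES document and index it
                FileDocument doc = new FileDocument();
                doc.setId(row.getId());
                doc.setFileName(row.getFileName());
                doc.setFileType(row.getFileType());
                doc.setFilePath(row.getFilePath());
                elasticsearchOperations.save(doc);
            } else if ("DELETE".equals(canalEntity.getType())) {
                // Remove the corresponding document from the index
                elasticsearchOperations.delete(row.getId(), FileDocument.class);
            }
        }
    }
}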

Summary

Canal can actually connect to Elasticsearch directly, without going through a Java program, and achieve synchronization that way. The reason this article routes the data through RabbitMQ is that the database rows and the ES documents do not map one-to-one. Take the file table as an example: ES stores the file content in order to support full-text search, whereas the database usually stores only the file's access path rather than its content. Because the two datasets differ, they cannot simply be synchronized field-for-field, and a piece of application code is needed in between.