Canal实时同步数据从Mysql到ES中Nested类型的实践

511 阅读2分钟

持续创作,加速成长!这是我参与「掘金日新计划 · 6 月更文挑战」的第5天,点击查看活动详情

背景

业务使用场景中,存在ES 的字段类型是 nested 类型 ,虽然这个效率不高。不过业务中还是比较常见的。想使用 canal 进行数据同步到 ES 上。但貌似官方还不支持。

后面看了断点调试了下,找到方法。

1. 前置准备

1.1 es 索引

首先 创建是canal_test 索引 ,其中 addresses 是 nested 类型

curl -XPUT "http://localhost:9200/canal_test" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "dynamic": false,
    "properties": {
      "addresses": {
        "type": "nested",
        "properties": {
          "address": {
            "analyzer": "ik_max_word",
            "type": "text"
          },
          "houseId": {
            "type": "long"
          },
          "zxbs": {
            "type": "long"
          },
          "jwhdm": {
            "type": "keyword"
          },
          "id": {
            "type": "long"
          }
        }
      },
      "death": {
        "type": "boolean"
      },
      "gender": {
        "type": "keyword"
      },
      "nation": {
        "type": "keyword"
      },
      "zxbs": {
        "type": "keyword"
      },
      "name": {
        "type": "keyword",
        "fields": {
          "fulltext": {
            "type": "text"
          }
        }
      },
      "residentId": {
        "type": "long"
      },
      "type": {
        "type": "long"
      }
    }
  }
}'

1.2表结构

创建 两张表 t_address,t_rk

DROP TABLE IF EXISTS `t_address`;
CREATE TABLE `t_address` (
  `address` varchar(255) DEFAULT NULL,
  `houseId` int(11) DEFAULT NULL,
  `zxbs` int(11) DEFAULT NULL,
  `jwhdm` varchar(255) DEFAULT NULL,
  `id` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- ----------------------------
-- Table structure for t_rk
-- ----------------------------
DROP TABLE IF EXISTS `t_rk`;
CREATE TABLE `t_rk` (
  `residentId` int(11) NOT NULL AUTO_INCREMENT,
  `type` int(255) DEFAULT NULL,
  `zxbs` varchar(255) DEFAULT NULL,
  `name` varchar(255) DEFAULT NULL,
  `nation` varchar(255) DEFAULT NULL,
  `gender` varchar(255) DEFAULT NULL,
  `death` char(1) DEFAULT NULL,
  PRIMARY KEY (`residentId`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4;

1.3测试数据

INSERT INTO `demo`.`t_rk`(`residentId`, `type`, `zxbs`, `name`, `nation`, `gender`, `death`) VALUES (3, 1, '1', 'huang123', '113', '1', '1');
INSERT INTO `demo`.`t_address`(`address`, `houseId`, `zxbs`, `jwhdm`, `id`) VALUES ('厦门', 12, 23, '23', 3);
INSERT INTO `demo`.`t_address`(`address`, `houseId`, `zxbs`, `jwhdm`, `id`) VALUES ('漳州', 233, 23, '23', 3);
INSERT INTO `demo`.`t_address`(`address`, `houseId`, `zxbs`, `jwhdm`, `id`) VALUES ('泉州', 233, 23, '23', 3);

2. canal 配置

从 mysql 同步到 es,每个索引我们都知道要为其配置一份配置文件。

2.1 配置同步 yml 文件

创建canal_test.yml 内容如下

2.2 数组字段的配置方式:(这个配置不满足nested)

dataSourceKey: defaultDS
destination: example
groupId: g1
esMapping:
  _index: canal_test
  _type: _doc
  _id: _id
  upsert: true
#  pk: id
  sql: "SELECT
        	t1.residentId AS _id,
        	t1.`name`,
        	t1.death,
        	t1.gender,
        	t1.nation,
        	t1.type,
        	t1.zxbs,
        	c.address
        FROM
        	t_rk t1
        	LEFT JOIN (
        	SELECT
        		id,
        		GROUP_CONCAT(JSON_OBJECT('address',address,'houseId',houseId)) AS address
        	FROM
        		t_address
        	GROUP BY
        		id
        	) c ON c.id = t1.residentId"
  objFields:
    address: array:;
#  etlCondition: "where a.c_time>={}"
  commitBatch: 3000

  objFields:

   address: array:;

这个方案中,默认实现,每个json都被当成一个字符串

同步出来的结果如下:(错误的结果,对于nested不适用)

2.3 nested 配置的正确姿势

dataSourceKey: defaultDS
destination: example
groupId: g1
esMapping:
  _index: canal_test
  _type: _doc
  _id: _id
  upsert: true
#  pk: id
  sql: "SELECT
        t1.residentId AS _id,
        t1.`name`,
        t1.death,
        t1.gender,
        t1.nation,	
        t1.type,
        t1.zxbs,
        CONCAT('[',c.address,']') AS  addresses
        FROM
        t_rk t1
        LEFT JOIN (
        SELECT
        id,
        GROUP_CONCAT(JSON_OBJECT('address',address,'houseId',houseId)) AS address
        FROM
        t_address
        GROUP BY
        id
      ) c ON c.id = t1.residentId"
  objFields:
    addresses: object
#  etlCondition: "where a.c_time>={}"
  commitBatch: 3000

重点关注


  1. sql写法
SELECT
        t1.residentId AS _id,
        t1.`name`,
        t1.death,
        t1.gender,
        t1.nation,	
        t1.type,
        t1.zxbs,
        CONCAT('[',c.address,']') AS  addresses
        FROM
        t_rk t1
        LEFT JOIN (
        SELECT
        id,
        GROUP_CONCAT(JSON_OBJECT('address',address,'houseId',houseId)) AS address
        FROM
        t_address
        GROUP BY
        id
      ) c ON c.id = t1.residentId"
  1. 配置中的关键
  objFields:
    addresses: object
  1. 执行测试数据
  2. canal 执行日志

  1. 同步结果

  1. nested 搜一下啊
GET canal_test/_search
{
  "query":{
    "bool": {
      "must": [
        {
          "nested": {
            "path": "addresses",
            "query": {
              "bool": {
                "must": [
                  {
                    "match_phrase": {
                      "addresses.address": "漳州"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}