持续创作,加速成长!这是我参与「掘金日新计划 · 6 月更文挑战」的第5天,点击查看活动详情
背景
业务使用场景中,存在ES 的字段类型是 nested 类型 ,虽然这个效率不高。不过业务中还是比较常见的。想使用 canal 进行数据同步到 ES 上。但貌似官方还不支持。
后面看了断点调试了下,找到方法。
1. 前置准备
1.1 es 索引
首先 创建是canal_test 索引 ,其中 addresses 是 nested 类型
curl -XPUT "http://localhost:9200/canal_test" -H 'Content-Type: application/json' -d'
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"dynamic": false,
"properties": {
"addresses": {
"type": "nested",
"properties": {
"address": {
"analyzer": "ik_max_word",
"type": "text"
},
"houseId": {
"type": "long"
},
"zxbs": {
"type": "long"
},
"jwhdm": {
"type": "keyword"
},
"id": {
"type": "long"
}
}
},
"death": {
"type": "boolean"
},
"gender": {
"type": "keyword"
},
"nation": {
"type": "keyword"
},
"zxbs": {
"type": "keyword"
},
"name": {
"type": "keyword",
"fields": {
"fulltext": {
"type": "text"
}
}
},
"residentId": {
"type": "long"
},
"type": {
"type": "long"
}
}
}
}'
1.2表结构
创建 两张表 t_address,t_rk
DROP TABLE IF EXISTS `t_address`;
CREATE TABLE `t_address` (
`address` varchar(255) DEFAULT NULL,
`houseId` int(11) DEFAULT NULL,
`zxbs` int(11) DEFAULT NULL,
`jwhdm` varchar(255) DEFAULT NULL,
`id` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
-- ----------------------------
-- Table structure for t_rk
-- ----------------------------
DROP TABLE IF EXISTS `t_rk`;
CREATE TABLE `t_rk` (
`residentId` int(11) NOT NULL AUTO_INCREMENT,
`type` int(255) DEFAULT NULL,
`zxbs` varchar(255) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`nation` varchar(255) DEFAULT NULL,
`gender` varchar(255) DEFAULT NULL,
`death` char(1) DEFAULT NULL,
PRIMARY KEY (`residentId`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4;
1.3测试数据
INSERT INTO `demo`.`t_rk`(`residentId`, `type`, `zxbs`, `name`, `nation`, `gender`, `death`) VALUES (3, 1, '1', 'huang123', '113', '1', '1');
INSERT INTO `demo`.`t_address`(`address`, `houseId`, `zxbs`, `jwhdm`, `id`) VALUES ('厦门', 12, 23, '23', 3);
INSERT INTO `demo`.`t_address`(`address`, `houseId`, `zxbs`, `jwhdm`, `id`) VALUES ('漳州', 233, 23, '23', 3);
INSERT INTO `demo`.`t_address`(`address`, `houseId`, `zxbs`, `jwhdm`, `id`) VALUES ('泉州', 233, 23, '23', 3);
2. canal 配置
从 mysql 同步到 es,每个索引我们都知道要为其配置一份配置文件。
2.1 配置同步 yml 文件
创建canal_test.yml 内容如下
2.2 数组字段的配置方式:(这个配置不满足nested)
dataSourceKey: defaultDS
destination: example
groupId: g1
esMapping:
_index: canal_test
_type: _doc
_id: _id
upsert: true
# pk: id
sql: "SELECT
t1.residentId AS _id,
t1.`name`,
t1.death,
t1.gender,
t1.nation,
t1.type,
t1.zxbs,
c.address
FROM
t_rk t1
LEFT JOIN (
SELECT
id,
GROUP_CONCAT(JSON_OBJECT('address',address,'houseId',houseId)) AS address
FROM
t_address
GROUP BY
id
) c ON c.id = t1.residentId"
objFields:
address: array:;
# etlCondition: "where a.c_time>={}"
commitBatch: 3000
objFields:
address: array:;
这个方案中,默认实现,每个json都被当成一个字符串
同步出来的结果如下:(错误的结果,对于nested不适用)
2.3 nested 配置的正确姿势
dataSourceKey: defaultDS
destination: example
groupId: g1
esMapping:
_index: canal_test
_type: _doc
_id: _id
upsert: true
# pk: id
sql: "SELECT
t1.residentId AS _id,
t1.`name`,
t1.death,
t1.gender,
t1.nation,
t1.type,
t1.zxbs,
CONCAT('[',c.address,']') AS addresses
FROM
t_rk t1
LEFT JOIN (
SELECT
id,
GROUP_CONCAT(JSON_OBJECT('address',address,'houseId',houseId)) AS address
FROM
t_address
GROUP BY
id
) c ON c.id = t1.residentId"
objFields:
addresses: object
# etlCondition: "where a.c_time>={}"
commitBatch: 3000
重点关注
- sql写法
SELECT
t1.residentId AS _id,
t1.`name`,
t1.death,
t1.gender,
t1.nation,
t1.type,
t1.zxbs,
CONCAT('[',c.address,']') AS addresses
FROM
t_rk t1
LEFT JOIN (
SELECT
id,
GROUP_CONCAT(JSON_OBJECT('address',address,'houseId',houseId)) AS address
FROM
t_address
GROUP BY
id
) c ON c.id = t1.residentId"
- 配置中的关键
objFields:
addresses: object
- 执行测试数据
- canal 执行日志
- 同步结果
- nested 搜一下啊
GET canal_test/_search
{
"query":{
"bool": {
"must": [
{
"nested": {
"path": "addresses",
"query": {
"bool": {
"must": [
{
"match_phrase": {
"addresses.address": "漳州"
}
}
]
}
}
}
}
]
}
}
}