Setting up SQL Server Replication and Syncing the Replica to ClickHouse in Real Time


【Prerequisites】

  1. A Linux machine with Docker and ClickHouse installed.

  IP           | Port | Role
  -------------|------|--------
  172.16.10.16 | 1433 | primary
  172.16.10.16 | 1434 | replica

【Required background】

  1. Basic knowledge of SQL Server.
  2. Basic knowledge of Debezium (debezium.io/).
  3. Basic knowledge of Docker.
  4. Basic knowledge of Vector (vector.dev/).
  5. Basic knowledge of ClickHouse (clickhouse.com/).

Architecture

(architecture diagram: SQL Server primary → replica with CDC → Debezium → Kafka → Vector → ClickHouse)

Setting up SQL Server replication

Prepare the databases

Start two SQL Server instances, one as the primary and one as the replica, using the docker-compose.yml below. SQL Server 2019 is used here; the lowest version tested is SQL Server 2014 Enterprise, and in theory any SQL Server version that supports CDC should work.

version: '2.1'
services:
  sqlserver-master:
    image: mcr.microsoft.com/mssql/server:2019-latest
    container_name: sqlserver-master
    ports:
      - "1433:1433"
    environment:
      - "MSSQL_AGENT_ENABLED=true"
      - "MSSQL_PID=Standard"
      - "ACCEPT_EULA=Y"
      - "SA_PASSWORD=P@ssw0rd01"
    volumes:
      - /dataroot/tools/sqlserver/data-master:/var/opt/mssql
      - /dataroot/tools/sqlserver/ReplData:/var/opt/mssql/ReplData

  sqlserver-slave:
    image: mcr.microsoft.com/mssql/server:2019-latest
    container_name: sqlserver-slave
    ports:
      - "1434:1433"
    environment:
      - "MSSQL_AGENT_ENABLED=true"
      - "MSSQL_PID=Standard"
      - "ACCEPT_EULA=Y"
      - "SA_PASSWORD=P@ssw0rd02"
    volumes:
      - /dataroot/tools/sqlserver/data-slave:/var/opt/mssql
      - /dataroot/tools/sqlserver/ReplData:/var/opt/mssql/ReplData

  • Run docker-compose up -d to start both databases.
  • Run docker-compose down to stop them. Because the data directories are mapped to the host, the mapped directories must be deleted manually to remove all data.
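SQL Server can take a while to initialize after the containers report as started. Before continuing, it helps to confirm both ports accept TCP connections; a minimal readiness sketch in Python (a plain socket check only verifies the listener, not that the instance accepts logins; host and ports are taken from the table above):

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(1)
    return False

if __name__ == "__main__":
    # 1433 = primary, 1434 = replica, per the prerequisites table.
    for port in (1433, 1434):
        ok = wait_for_port("172.16.10.16", port)
        print(f"port {port}: {'up' if ok else 'not reachable'}")
```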

Download SSMS

Download SQL Server Management Studio (SSMS) from Microsoft: learn.microsoft.com/en-us/sql/s…

Prepare test data

Connect to the primary via SSMS (172.16.10.16,1433, user sa, password P@ssw0rd01) and run the following SQL to create test data:

-- Sqlserver
CREATE DATABASE inventory;
GO
USE inventory;
CREATE TABLE orders (
    id INTEGER IDENTITY(10001,1) NOT NULL PRIMARY KEY,
    order_date DATE NOT NULL,
    purchaser INTEGER NOT NULL,
    quantity INTEGER NOT NULL,
    product_id INTEGER NOT NULL
);
INSERT INTO orders(order_date,purchaser,quantity,product_id)
VALUES ('16-JAN-2016', 1001, 1, 102);
INSERT INTO orders(order_date,purchaser,quantity,product_id)
VALUES ('17-JAN-2016', 1002, 2, 105);
INSERT INTO orders(order_date,purchaser,quantity,product_id)
VALUES ('19-FEB-2016', 1002, 2, 106);
INSERT INTO orders(order_date,purchaser,quantity,product_id)
VALUES ('21-FEB-2016', 1003, 1, 107);
GO

Publish data from the primary

Create the publication by following the official documentation: learn.microsoft.com/zh-cn/sql/r…

  • When configuring distribution, if the wizard shows an IP address as the server name at the distributor step, creating the distributor may fail.
  • In that case, create the distributor with the SQL below instead. It does exactly what the wizard does, but lets you force the distributor to use the server name rather than an IP address.
use master
DECLARE @Server SYSNAME;
SELECT @Server = @@servername;
exec sp_adddistributor @distributor = @Server, @password = N'P@ssw0rd01'
GO
exec sp_adddistributiondb @database = N'distribution', @data_folder = N'/var/opt/mssql/ReplData', @min_distretention = 0, @max_distretention = 72, @history_retention = 48, @security_mode = 1
GO

use [distribution] 
if (not exists (select * from sysobjects where name = 'UIProperties' and type = 'U ')) 
	create table UIProperties(id int) 
if (exists (select * from ::fn_listextendedproperty('SnapshotFolder', 'user', 'dbo', 'table', 'UIProperties', null, null))) 
	EXEC sp_updateextendedproperty N'SnapshotFolder', N'/var/opt/mssql/ReplData', 'user', dbo, 'table', 'UIProperties' 
else 
	EXEC sp_addextendedproperty N'SnapshotFolder', N'/var/opt/mssql/ReplData', 'user', dbo, 'table', 'UIProperties'
GO
DECLARE @Server SYSNAME;
SELECT @Server = @@servername;
exec sp_adddistpublisher @publisher = @Server, @distribution_db = N'distribution', @security_mode = 0, @login = N'sa', @password = N'P@ssw0rd01', @working_directory = N'/var/opt/mssql/ReplData', @trusted = N'false', @thirdparty_flag = 0, @publisher_type = N'MSSQLSERVER'
GO
  • Once the distributor has been created, in SSMS go to Replication > Local Publications and create a new publication, following the wizard's steps.


  • To check whether the publication succeeded, right-click Replication on the primary in SSMS and select Launch Replication Monitor. When the snapshot agent shows 100%, the publication has succeeded; with large data volumes this step can take a long time.


Subscribe on the replica

  • Connect to the replica via SSMS (172.16.10.16,1434, user sa, password P@ssw0rd02). Official documentation for the subscription parameters: learn.microsoft.com/zh-cn/sql/r…
  • Right-click Replication > Local Subscriptions and create a new subscription.
  • Click Find SQL Server Publisher and enter the publisher created above: 172.16.10.16,1433 with sa/P@ssw0rd01.


  • For the subscription database, create a new database named inventory.


  • Use the replica's credentials here: sa/P@ssw0rd02.


  • To check whether the subscription succeeded, open Replication Monitor on the primary (note: the primary, not the replica) and confirm that the replica's subscription is running normally.
  • Then run a query on the replica to verify its data is identical to the primary's.

Installing ClickHouse

Installation is straightforward; see the official documentation: clickhouse.com/docs/en/ins…

Syncing the SQL Server data to ClickHouse

With replication in place, the replica is fully consistent with the primary, so all that remains is to sync the replica's data to ClickHouse.

  1. Enable CDC on the replica. Connect to it via SSMS and run the following SQL:
USE inventory;
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'orders', @role_name = NULL, @supports_net_changes = 0;
GO
  2. Start ZooKeeper:
docker run -it --rm --name zookeeper -p 2181:2181 -p 2888:2888 -p 3888:3888 quay.io/debezium/zookeeper:2.4
  3. Start Kafka:
docker run -it --rm --name kafka -p 9092:9092 --link zookeeper:zookeeper quay.io/debezium/kafka:2.4 
  4. Start Debezium Connect:
docker run -it --rm --name connect -p 8083:8083 -e GROUP_ID=ck-gid -e CONFIG_STORAGE_TOPIC=my_connect_configs -e OFFSET_STORAGE_TOPIC=my_connect_offsets -e STATUS_STORAGE_TOPIC=my_connect_statuses --link kafka:kafka quay.io/debezium/connect:2.4
  • Create the connector between SQL Server and Kafka by running the following curl command:
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{"name":"inventory-connector","config":{"connector.class":"io.debezium.connector.sqlserver.SqlServerConnector","tasks.max":"1","database.hostname":"172.16.10.16","database.port":"1434","database.user":"sa","database.password":"P@ssw0rd02","database.server.id":"184054","database.names":"inventory","topic.prefix":"oa","database.server.name":"","table.include.list":"dbo.orders,dbo.products","schema.history.internal.kafka.bootstrap.servers":"kafka:9092","schema.history.internal.kafka.topic":"schema-changes.inventory","database.encrypt":false}}'
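If you prefer to script the registration, the same call can be made with Python's standard library; a sketch mirroring the curl command above (the endpoint, connector name, and config values are copied verbatim from it; re-posting the same connector name makes Kafka Connect return 409 Conflict):

```python
import json
import urllib.request

# Connector definition, matching the curl command above.
connector = {
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
        "tasks.max": "1",
        "database.hostname": "172.16.10.16",
        "database.port": "1434",          # the replica, where CDC is enabled
        "database.user": "sa",
        "database.password": "P@ssw0rd02",
        "database.server.id": "184054",   # carried over from the original command
        "database.names": "inventory",
        "topic.prefix": "oa",
        "database.server.name": "",       # carried over from the original command
        "table.include.list": "dbo.orders,dbo.products",
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-changes.inventory",
        "database.encrypt": False,
    },
}

def register(base_url: str = "http://localhost:8083") -> int:
    """POST the connector config to Kafka Connect; returns the HTTP status."""
    req = urllib.request.Request(
        f"{base_url}/connectors/",
        data=json.dumps(connector).encode(),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```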
  5. Kafka UI: start it with the command below, then open http://<server IP>:8080 to check whether the SQL Server data has reached Kafka.
docker run -it -p 8080:8080 -e DYNAMIC_CONFIG_ENABLED=true provectuslabs/kafka-ui
  • Kafka should now show the 4 rows from the orders table.
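Each Kafka record is a Debezium change event: with the default JSON converter, the interesting fields sit under a payload envelope (before, after, source, op, ts_ms). A sketch of unwrapping one (the sample event below is abbreviated and hypothetical; note that Debezium encodes a SQL Server DATE as days since the epoch, which is why the ClickHouse tables later treat order_date as an integer and convert it with toDate):

```python
import json

# Abbreviated, hypothetical Debezium change event for one snapshotted order.
raw = json.dumps({
    "schema": {"type": "struct"},  # field schemas omitted for brevity
    "payload": {
        "before": None,
        "after": {"id": 10001, "order_date": 16816,  # days since 1970-01-01
                  "purchaser": 1001, "quantity": 1, "product_id": 102},
        "source": {"connector": "sqlserver", "db": "inventory", "table": "orders"},
        "op": "r",  # r = snapshot read, c = insert, u = update, d = delete
        "ts_ms": 1700000000000,
    },
})

def unwrap(message: str) -> dict:
    """Lift the payload envelope out of a Debezium JSON change event."""
    return json.loads(message)["payload"]

event = unwrap(raw)
print(event["op"], event["after"]["id"])  # → r 10001
```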
  6. Install Vector.
  • Installation is straightforward; see the official documentation: vector.dev/docs/setup/…
  • Use the configuration below. Since Kafka holds JSON change events, Vector's job here is simply to reshape that JSON and insert it into ClickHouse.
# Vector configuration
# Docs: https://vector.dev/docs

sources:
  oa:
    type: kafka
    bootstrap_servers: 172.16.10.16:9092
    group_id: '1'
    auto_offset_reset: 'beginning'
    topics:
      - oa.inventory.dbo.orders

transforms:
  parse_oa_message:
    type: remap
    inputs:
      - oa
    source: |
      .topic = string!(.topic)
      # Debezium wraps each change event in a "payload" envelope; lift its
      # fields (before, after, source, op, ts_ms) to the top level so they
      # match the columns of the ClickHouse table.
      . = merge(., object!(parse_json!(.message).payload))
  filter_oa_order:
    type: filter
    inputs:
      - parse_oa_message
    condition: '.topic == "oa.inventory.dbo.orders"'

sinks:
  ck_oa_orders:
    type: clickhouse
    inputs:
      - filter_oa_order
    endpoint: http://localhost:8123
    database: oa_sqlserver
    table: orders_changes
    # Ignore event fields (topic, message, ...) that have no matching column.
    skip_unknown_fields: true
    auth:
      user: default
      password: xc8NalcdDXBek
      strategy: basic

# Vector's GraphQL API, for `vector top` or http://localhost:8686
api:
  enabled: true
  address: "127.0.0.1:8686"

  7. Create the tables in ClickHouse that receive and transform the data. Here all fields land in the single before and after tuple columns; this could be refined further: if Vector kept only the fields under the payload envelope, each field of before and after could be declared as its own column.
DROP TABLE IF EXISTS oa_sqlserver.orders;
CREATE TABLE oa_sqlserver.orders
(
`id` UInt64,
`order_date` Date,
`purchaser` UInt32,
`quantity` UInt32,
`product_id` UInt64,
`version` String,
`deleted` UInt8
)
ENGINE = ReplacingMergeTree
PRIMARY KEY id
ORDER BY id;

DROP TABLE IF EXISTS oa_sqlserver.orders_changes;
CREATE TABLE oa_sqlserver.orders_changes
(
`before` Tuple(id Nullable(UInt64), order_date Nullable(UInt32), product_id Nullable(UInt64), purchaser Nullable(UInt32), quantity Nullable(UInt32)),
`after` Tuple(id Nullable(UInt64), order_date Nullable(UInt32), product_id Nullable(UInt64), purchaser Nullable(UInt32), quantity Nullable(UInt32)),
`source` Tuple(change_lsn Nullable(String), commit_lsn Nullable(String), db Nullable(String), connector Nullable(String), event_serial_no Nullable(UInt32), name Nullable(String), `schema` Nullable(String), `sequence` Nullable(String), snapshot Nullable(String), `table` Nullable(String), ts_ms Nullable(UInt64), version Nullable(String)),
`op` LowCardinality(String),
`ts_ms` UInt64
)
ENGINE = MergeTree
ORDER BY tuple();

DROP VIEW IF EXISTS oa_sqlserver.orders_mv;
CREATE MATERIALIZED VIEW oa_sqlserver.orders_mv TO oa_sqlserver.orders
(
`id` Nullable(UInt64),
`order_date` Nullable(Date),
`purchaser` Nullable(UInt32),
`quantity` Nullable(UInt32),
`product_id` Nullable(UInt64),
`version` String,
`deleted` UInt8
) AS
SELECT
if(op = 'd', before.1, after.1) AS id,
if(op = 'd', toDate(before.2), toDate(after.2)) AS order_date,
if(op = 'd', before.4, after.4) AS purchaser,
if(op = 'd', before.5, after.5) AS quantity,
if(op = 'd', before.3, after.3) AS product_id,
toString(ts_ms) AS version,
if(op = 'd', 1, 0) AS deleted
FROM oa_sqlserver.orders_changes
WHERE (op = 'c') OR (op = 'r') OR (op = 'u') OR (op = 'd');

Connect to ClickHouse and query the tables above; the synced order rows should be visible.

Data verification

Run the following SQL on the SQL Server primary:

INSERT INTO orders(order_date,purchaser,quantity,product_id) VALUES ('16-JAN-2024', 8888, 10, 102);

UPDATE inventory.dbo.orders SET order_date='2024-01-16', purchaser=1001, quantity=6, product_id=102 WHERE id=10001;

DELETE FROM inventory.dbo.orders WHERE id=10004;

These statements insert, update, and delete one row respectively; after they run, the primary reflects all three changes.

Querying ClickHouse shows the same changes.

This confirms that modifications on the SQL Server primary flow through replication, CDC, Debezium, Kafka, and Vector into ClickHouse.