TiDB Cluster Installation and Deployment



Deployment architecture diagram


I. Install tiup

curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
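The install script adds tiup to the shell profile, so the profile has to be reloaded before the command is available. A quick sanity check might look like this (a sketch, assuming a bash login shell; the profile file name may differ):

source ~/.bash_profile    # reload the profile modified by the install script
tiup --version            # confirm that tiup is on the PATH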

II. Deploy the TiDB cluster

Architecture diagram

Storage architecture diagram

SQL layer architecture diagram

1. Install the tiup cluster component
tiup install cluster
2. Create the cluster topology file
global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/tidb-deploy"
  data_dir: "/tidb-data"

# # Monitored variables are applied to all the machines.
monitored:
  node_exporter_port: 9100
  blackbox_exporter_port: 9115
  # deploy_dir: "/tidb-deploy/monitored-9100"
  # data_dir: "/tidb-data/monitored-9100"
  # log_dir: "/tidb-deploy/monitored-9100/log"
server_configs:
  tidb:
    log.slow-threshold: 300
    binlog.enable: false
    binlog.ignore-error: false
  tikv:
    # server.grpc-concurrency: 4
    # raftstore.apply-pool-size: 2
    # raftstore.store-pool-size: 2
    # rocksdb.max-sub-compactions: 1
    # storage.block-cache.capacity: "16GB"
    # readpool.unified.max-thread-count: 12
    readpool.storage.use-unified-pool: false
    readpool.coprocessor.use-unified-pool: true
  pd:
    schedule.leader-schedule-limit: 4
    schedule.region-schedule-limit: 2048
    schedule.replica-schedule-limit: 64
pd_servers:
  - host: 192.168.1.91
    name: pd-91
    # ssh_port: 22
    # client_port: 2379
    # peer_port: 2380
    # deploy_dir: "/tidb-deploy/pd-2379"
    # data_dir: "/tidb-data/pd-2379"
    # log_dir: "/tidb-deploy/pd-2379/log"
    # numa_node: "0,1"
    # # The following configs are used to overwrite the `server_configs.pd` values.
    # config:
    #   schedule.max-merge-region-size: 20
    #   schedule.max-merge-region-keys: 200000
  - host: 192.168.1.92
    name: pd-92
  - host: 192.168.1.93
    name: pd-93
tidb_servers: 
  - host: 192.168.1.91
  # ssh_port: 22
    # port: 4000
    # status_port: 10080
    # deploy_dir: "/tidb-deploy/tidb-4000"
    # log_dir: "/tidb-deploy/tidb-4000/log"
    # numa_node: "0,1"
    # # The following configs are used to overwrite the `server_configs.tidb` values.
    # config:
    #   log.slow-query-file: tidb-slow-overwrited.log
  - host: 192.168.1.92
  - host: 192.168.1.93
tikv_servers: 
  - host: 192.168.1.91
  # ssh_port: 22
    # port: 20160
    # status_port: 20180
    # deploy_dir: "/tidb-deploy/tikv-20160"
    # data_dir: "/tidb-data/tikv-20160"
    # log_dir: "/tidb-deploy/tikv-20160/log"
    # numa_node: "0,1"
    # # The following configs are used to overwrite the `server_configs.tikv` values.
    # config:
    #   server.grpc-concurrency: 4
    #   server.labels: { zone: "zone1", dc: "dc1", host: "host1" }
  - host: 192.168.1.92
  - host: 192.168.1.93
cdc_servers:
  - host: 10.0.1.3
monitoring_servers:
  - host: 192.168.1.91
  # ssh_port: 22
    # port: 9090
    # deploy_dir: "/tidb-deploy/prometheus-8249"
    # data_dir: "/tidb-data/prometheus-8249"
    # log_dir: "/tidb-deploy/prometheus-8249/log"
grafana_servers:
  - host: 10.0.1.10
    # port: 3000
    # deploy_dir: /tidb-deploy/grafana-3000

alertmanager_servers:
  - host: 10.0.1.10
    # ssh_port: 22
    # web_port: 9093
    # cluster_port: 9094
    # deploy_dir: "/tidb-deploy/alertmanager-9093"
    # data_dir: "/tidb-data/alertmanager-9093"
    # log_dir: "/tidb-deploy/alertmanager-9093/log"
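Before deploying, the topology can optionally be checked against the target machines (a sketch; it assumes the file above was saved as /tmp/topology.yaml and that root SSH access is available):

tiup cluster check /tmp/topology.yaml -u root -p    # verify OS settings, ports, disks and other prerequisites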
3. Deploy the cluster
tiup cluster deploy prod-cluster v4.0.11 /tmp/topology.yaml -u root -p 123456
  • prod-cluster - the cluster name

  • v4.0.11 - the version number

  • Modify TiDB's sql_mode so that GROUP BY with multiple non-aggregated columns is allowed:

    SHOW VARIABLES like 'sql_mode';
    SET GLOBAL sql_mode='';  -- clear sql_mode to lift the GROUP BY restriction
    
  • Switch to pessimistic transactions. MySQL uses pessimistic locking by default, so to keep behavior consistent with MySQL, set TiDB to pessimistic mode:

    SET GLOBAL tidb_txn_mode = 'pessimistic';
    
  • Start the cluster

    Disable the machine firewalls first, then start the cluster; a quick status check follows the command below.

  tiup cluster start prod-cluster
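Once the start command returns, the cluster can be verified with the display subcommand; it lists every node with its role, address and status and prints the Dashboard and Grafana URLs:

tiup cluster display prod-cluster    # all components should show status Up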
4. Scale out the cluster

Write a scale-out file similar to the topology file from step 2, for example the sketch below.
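A minimal scale-out file that adds a single extra TiDB instance might look like this (192.168.1.94 is a hypothetical new host, not part of the topology above):

cat > scale-out.yaml <<'EOF'
tidb_servers:
  - host: 192.168.1.94   # placeholder address of the new TiDB node
EOF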

Run the scale-out command

tiup cluster scale-out <cluster-name> scale-out.yaml -u root -p
5. Scale in the cluster
tiup cluster scale-in tidb-test -N 192.168.1.67:9001
--force forces the removal

If the scale-in fails, take the node offline manually:

curl -X POST 'http://<pd-address>/pd/api/v1/store/<store_id>/state?state=Tombstone' # set the store state to Tombstone
tiup ctl pd -u <pd-address> store delete <store_id> # delete the store from PD
tiup cluster edit-config <cluster-name>   # remove the node entry from the cluster topology
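The store_id used above can be looked up beforehand by listing all stores from PD (same pd-ctl invocation style as above; <pd-address> is one of the PD endpoints, e.g. 192.168.1.91:2379):

tiup ctl pd -u <pd-address> store    # list every store with its id, address and state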
6. Syncing MySQL data via binlog (Syncer)

Download the sync program (download link).

Deployment architecture

Write the syncer.meta file:

binlog-name = "mysql-bin.000006"
binlog-pos = 3025199
binlog-gtid = ""

binlog-name and binlog-pos are obtained on MySQL with SHOW MASTER STATUS.
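For example (a sketch; the connection parameters match the [from] section of config.toml below, and the File/Position values shown are the ones used in syncer.meta above):

mysql -h 192.168.1.67 -P 3306 -u asr_root -p -e "SHOW MASTER STATUS\G"
# *************************** 1. row ***************************
#              File: mysql-bin.000006
#          Position: 3025199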

Write the config.toml configuration file

log-level = "info"
log-file = "syncer.log"
log-rotate = "day"

server-id = 101

## path of the meta file
meta = "./syncer.meta"
worker-count = 16
batch = 100
flavor = "mysql"

## Address from which Prometheus pulls Syncer metrics; it is also Syncer's pprof debug address
status-addr = ":8271"

## If set to true, Syncer stops and exits when it encounters a DDL statement
stop-on-ddl = false

## Maximum number of retries when a SQL request fails due to network errors and the like
max-retry = 100

[from]
host = "192.168.1.67"
user = "asr_root"
password = "ASRADMIN"
port = 3306

[to]
host = "192.168.1.51"
user = "root"
password = ""
port = 4000

Start Syncer

./bin/syncer -config config.toml
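Once Syncer is running, the status-addr configured above can be queried to confirm it is alive (a sketch, assuming Syncer runs on the local machine and serves its metrics under /metrics):

curl -s http://127.0.0.1:8271/metrics | head    # Prometheus metrics served on status-addr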

Reference link

7. Replicating data to MySQL with TiCDC

Architecture diagram

Install TiCDC by scaling out the cluster:

cdc_servers:
  - host: 10.0.1.3

Commands to start the cdc server processes

cdc server --pd=http://10.0.10.25:2379 --log-file=ticdc_1.log --addr=0.0.0.0:8301 --advertise-addr=127.0.0.1:8301
cdc server --pd=http://10.0.10.25:2379 --log-file=ticdc_2.log --addr=0.0.0.0:8302 --advertise-addr=127.0.0.1:8302
cdc server --pd=http://10.0.10.25:2379 --log-file=ticdc_3.log --addr=0.0.0.0:8303 --advertise-addr=127.0.0.1:8303
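The registered capture nodes can then be listed to confirm that all three servers joined:

cdc cli capture list --pd=http://10.0.10.25:2379    # show the TiCDC captures registered in PD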

Create the replication task

cdc cli changefeed create --pd=http://10.0.10.25:2379 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --changefeed-id="simple-replication-task"
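The state of the changefeed can be checked afterwards with the same CLI:

cdc cli changefeed list --pd=http://10.0.10.25:2379                                            # list all changefeeds and their state
cdc cli changefeed query --pd=http://10.0.10.25:2379 --changefeed-id=simple-replication-task   # detailed status of this task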

Reference link

8. Install TiFlash

Scale-out file

server_configs:
  pd:
    replication.enable-placement-rules: true
tiflash_servers:
  - host: 192.168.1.67
    metrics_port: 8236
    tcp_port: 9000
    http_port: 8123
  - host: 192.168.1.68
    metrics_port: 8236
    tcp_port: 9000
    http_port: 8123

Run the scale-out command

tiup cluster scale-out <cluster-name> <scale-file> -u root -p

Set the number of TiFlash replicas

alter table <db-name>.<table-name> set tiflash replica 2;
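Replica construction is asynchronous; its progress can be followed through information_schema (a sketch; 192.168.1.91:4000 is one of the TiDB instances from the topology above):

mysql -h 192.168.1.91 -P 4000 -u root -e "SELECT TABLE_SCHEMA, TABLE_NAME, AVAILABLE, PROGRESS FROM information_schema.tiflash_replica;"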
9. Migrating MySQL data

Data backup (Dumpling)

./dumpling -u asr_root -p ASRADMIN -P 3306 -h 127.0.0.1 -B cdc_test --filetype sql --threads 32 -o ./sql -F 256MiB
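The export directory can be inspected before the import; Dumpling writes the schema .sql files, the table data .sql files and a metadata file into the -o directory:

ls -lh ./sql    # schema-create .sql files, table data .sql files and the metadata file produced by Dumpling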

config.toml for the data import (TiDB Lightning)

[lightning]
# Logging
level = "info"
file = "tidb-lightning.log"

[tikv-importer]
# Import backend to use (the 'tidb' backend here)
backend = "tidb"
# Temporary directory for the sorted key-value pairs; the target path must be an empty directory
sorted-kv-dir = "/home/tidb/tikv"

[mydumper]
# Source data directory.
data-source-dir = "/home/tidb/sql/"

# Wildcard filter rules. The default rule filters out every table in the mysql, sys, INFORMATION_SCHEMA, PERFORMANCE_SCHEMA, METRICS_SCHEMA and INSPECTION_SCHEMA system databases.
# If this item is not configured, a "schema not found" error occurs when importing system tables.
#filter = ['*.*']
[checkpoint]
# Whether to enable checkpoints.
# While importing data, TiDB Lightning records the current progress of each table.
# So even if TiDB Lightning or another component exits abnormally, already imported data is not imported again after a restart.
enable = false
# Name of the database used to store the checkpoints.
[tidb]
# Information about the target cluster
host = "192.168.1.51"
port = 4000
user = "root"
password = ""
# Table schema information is fetched from TiDB's status port.
status-port = 10080
# Address of the cluster's PD
pd-addr = "192.168.1.51:2379"
log-level = "error"

Import command

./tidb-lightning -config config.toml
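If the import succeeds, the last lines of tidb-lightning.log contain a completion message; a rough check (the exact wording can vary between versions):

grep -i "the whole procedure completed" tidb-lightning.log    # success marker written by TiDB Lightning
tail -n 5 tidb-lightning.log                                  # or simply inspect the last lines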

III. Install HAProxy

1. Install
yum -y install haproxy
2. Configuration file

vi /etc/haproxy/haproxy.cfg

#---------------------------------------------------------------------
# Example configuration for a possible web application.  See the
# full configuration options online.
#
#   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
#
#---------------------------------------------------------------------

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    # to have these messages end up in /var/log/haproxy.log you will
    # need to:
    #
    # 1) configure syslog to accept network log events.  This is done
    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
    #    /etc/sysconfig/syslog
    #
    # 2) configure local2 events to go to the /var/log/haproxy.log
    #   file. A line like the following can be added to
    #   /etc/sysconfig/syslog
    #
    #    local2.*                       /var/log/haproxy.log
    #
    log         127.0.0.1 local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    log                     global
    retries                 3
    timeout connect         2s
    timeout client          500m
    timeout server          500m

#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------

#---------------------------------------------------------------------
# static backend for serving up images, stylesheets and such
#---------------------------------------------------------------------
listen admin_stats                         # combined frontend/backend for the stats page; name the group as needed
   bind 0.0.0.0:8080                       # listening port for the stats page
   mode http                               # run the stats listener in http mode
   option httplog                          # enable HTTP request logging
   maxconn 10                              # maximum number of concurrent connections
   stats refresh 30s                       # refresh the stats page every 30 seconds
   stats uri /haproxy                      # URL of the stats page
   stats realm HAProxy                     # realm (prompt text) of the stats page
   stats auth admin:pingcap123             # user and password for the stats page; multiple users can be defined
   stats hide-version                      # hide the HAProxy version on the stats page
   stats  admin if TRUE
#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
listen tidb-cluster                        # load balancing for the TiDB servers
   bind 0.0.0.0:3306                       # floating IP and listening port
   mode tcp                                # HAProxy works at layer 4 (TCP) here
   balance leastconn                       # the server with the fewest connections receives the next one. leastconn is recommended for long sessions such as LDAP, SQL or TSE rather than short-session protocols like HTTP; the algorithm is dynamic, so the weight of slow-starting servers is adjusted at runtime.
   server tidb-1 192.168.1.91:4000 check inter 2000 rise 2 fall 3       # check port 4000 every 2000 ms; 2 consecutive successes mark the server as up again, 3 consecutive failures mark it as down
   server tidb-2 192.168.1.92:4000 check inter 2000 rise 2 fall 3
   server tidb-3 192.168.1.93:4000 check inter 2000 rise 2 fall 3
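The configuration can be syntax-checked before the service is started (assuming it was saved as /etc/haproxy/haproxy.cfg as above):

haproxy -c -f /etc/haproxy/haproxy.cfg    # parse the configuration and report errors without starting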

Start

# SELinux: allow HAProxy to connect to any port
setsebool -P haproxy_connect_any=1

systemctl start haproxy
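A quick way to confirm that HAProxy is forwarding to the TiDB servers is to connect through its 3306 listener (a sketch; <haproxy-host> is the machine HAProxy runs on, and the root password is empty as deployed above):

systemctl status haproxy                                               # the service should be active (running)
mysql -h <haproxy-host> -P 3306 -u root -e "SELECT tidb_version()\G"   # the request is balanced across the tidb-1/2/3 servers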

haproxy-check.sh health-check script

#!/bin/bash
# Restart HAProxy if it is not running.
A=`ps -C haproxy --no-header | wc -l`
if [ $A -eq 0 ]; then
    echo "`date` haproxy is dead" >> /tmp/haproxy.log
    systemctl start haproxy
fi
sleep 2
# If HAProxy still failed to start, stop Keepalived so the VIP fails over to the other node.
if [ `ps -C haproxy --no-header | wc -l` -eq 0 ]; then
    echo "`date` haproxy cannot start, stop keepalived" >> /tmp/haproxy.log
    systemctl stop keepalived
fi
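The Keepalived configuration below calls this script as /etc/keepalived/haproxy_check.sh, so it has to be placed there (note the underscore in the referenced name) and be executable:

cp haproxy-check.sh /etc/keepalived/haproxy_check.sh   # path referenced by vrrp_script below
chmod +x /etc/keepalived/haproxy_check.sh              # keepalived executes it every 5 seconds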

IV. Install Keepalived

Install with yum

yum install -y keepalived

If dependencies are missing, install the following packages:

yum install -y curl gcc openssl-devel libnl3-devel net-snmp-devel

Backup node configuration file

global_defs {
    router_id hp_03
}

vrrp_script chk_haproxy {
    script "/etc/keepalived/haproxy_check.sh" ## path of the script that checks HAProxy status
    interval 5 ## check interval in seconds
    weight -20 ## lower the priority by 20 if the check fails
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens192 
    virtual_router_id 51
    priority 90
    nopreempt
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_script {
        chk_haproxy
    }
    virtual_ipaddress {
        192.168.1.98
    }
}

Master node configuration file

global_defs {
    router_id hp_03
}

vrrp_script chk_haproxy {
    script "/etc/keepalived/haproxy_check.sh" ## path of the script that checks HAProxy status
    interval 5 ## check interval in seconds
    weight -20 ## lower the priority by 20 if the check fails
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens192
    virtual_router_id 51
    priority 100
    nopreempt
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_script {
        chk_haproxy
    }
    virtual_ipaddress {
        192.168.1.98
    }
}

Start

systemctl start keepalived
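Whether the VIP has been claimed can be checked on the HAProxy/Keepalived machines, and TiDB is then reachable through it (the interface name and VIP come from the configuration above):

ip addr show ens192 | grep 192.168.1.98                              # the VIP should appear on the active node only
mysql -h 192.168.1.98 -P 3306 -u root -e "SELECT tidb_version()\G"   # connect to TiDB through the VIP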
V. Implementation plan
1. Deploy the TiDB cluster: 3 TiKV, 3 PD, 3 TiDB and 3 TiFlash nodes, plus 2 HAProxy and 2 Keepalived instances.
2. Sync the MySQL data so that TiDB runs as a replica of MySQL, and let it run for one to two weeks.
3. Set up TiCDC replication back to MySQL, take a downtime window, and switch the primary database over to TiDB.