SR Cluster Upgrade (2.5 to 3.0) WBS


I. Ticket information

CIC filed a ticket to upgrade the StarRocks cluster in Taicang and configure its disks. The work can be done tomorrow. Ticket: YYYW202502271803318



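Upgrade the cluster to 3.1, add SSDs to the cluster, and change the hot/cold data tiering to storage_cooldown_second = 31536000 (365 days).

Notes:

  1. You must upgrade from 2.5 to 3.0.9 and then to 3.1; otherwise rollback is not possible.

  2. Upgrade in stages: first from 2.5 to 3.0.9, then to 3.1 after some time.

  3. Rolling upgrades are supported, and BE and CN are backward compatible. Upgrade BE and CN first, then FE.

  4. For FE nodes, you must upgrade all Follower FE nodes first and the Leader FE node last.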
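The FE ordering rule in note 4 can be sketched as a small shell helper. This is only an illustration: `fe_upgrade_order` is a hypothetical name, and the input format (one `IP ROLE` pair per line, copied by hand from the IP and Role columns of SHOW FRONTENDS) is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print FE hosts in the order they should be upgraded,
# every non-Leader FE first, the Leader last.
# Input (stdin): one "IP ROLE" pair per line; this format is an assumption.
fe_upgrade_order() {
  awk '
    $2 == "LEADER" { leader = leader $1 "\n"; next }  # hold the Leader back
    NF >= 2        { print $1 }                       # all other FEs go first
    END            { printf "%s", leader }            # Leader is printed last
  '
}
```

Feeding it the three FE rows of this cluster (172.25.214.49 as LEADER, .50 and .48 as FOLLOWER) would list the two Followers first and 172.25.214.49 last.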




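II. Preparation

  1. Prepare the upgrade package
  2. Cluster information

| Node IP       | Role  | Spec     | Version | BE path                             |
|---------------|-------|----------|---------|-------------------------------------|
| 172.25.214.48 | fe/be | 64C/256G | 2.5.22  | /mnt/disk6/StarRocks-2.2.9/be/conf  |
| 172.25.214.49 | fe/be | 64C/256G | 2.5.22  | /mnt/disk6/StarRocks-2.2.9/be/conf  |
| 172.25.214.50 | fe/be | 64C/256G | 2.5.22  | /mnt/disk6/StarRocks-2.2.9/be/conf  |

# Log in as root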
mysql -h172.25.214.48 -uroot -P9030 -p"***"


# frontends
mysql> show frontends;
+----------------------------------+---------------+-------------+----------+-----------+---------+----------+-----------+------+-------+-------------------+---------------------+----------+--------+---------------------+----------------+
| Name                             | IP            | EditLogPort | HttpPort | QueryPort | RpcPort | Role     | ClusterId | Join | Alive | ReplayedJournalId | LastHeartbeat       | IsHelper | ErrMsg | StartTime           | Version        |
+----------------------------------+---------------+-------------+----------+-----------+---------+----------+-----------+------+-------+-------------------+---------------------+----------+--------+---------------------+----------------+
| 172.25.214.49_9010_1675397952903 | 172.25.214.49 | 9010        | 8030     | 9030      | 9020    | LEADER   | 613338486 | true | true  | 604020423         | 2024-10-29 15:34:53 | true     |        | 2024-08-07 16:43:41 | 2.5.22-5dffd65 |
| 172.25.214.50_9010_1669709615101 | 172.25.214.50 | 9010        | 8030     | 9030      | 9020    | FOLLOWER | 613338486 | true | true  | 604020410         | 2024-10-29 15:34:53 | true     |        | 2024-08-07 16:45:46 | 2.5.22-5dffd65 |
| 172.25.214.48_9010_1675398150131 | 172.25.214.48 | 9010        | 8030     | 9030      | 9020    | FOLLOWER | 613338486 | true | true  | 604020410         | 2024-10-29 15:34:53 | true     |        | 2024-08-07 16:41:09 | 2.5.22-5dffd65 |
+----------------------------------+---------------+-------------+----------+-----------+---------+----------+-----------+------+-------+-------------------+---------------------+----------+--------+---------------------+----------------+
3 rows in set (0.00 sec)



mysql> show backends;
+-----------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------+----------------+--------------------------------------------------------+-------------------+-------------+----------+-------------------+------------+------------+
| BackendId | IP            | HeartbeatPort | BePort | HttpPort | BrpcPort | LastStartTime       | LastHeartbeat       | Alive | SystemDecommissioned | ClusterDecommissioned | TabletNum | DataUsedCapacity | AvailCapacity | TotalCapacity | UsedPct | MaxDiskUsedPct | ErrMsg | Version        | Status                                                 | DataTotalCapacity | DataUsedPct | CpuCores | NumRunningQueries | MemUsedPct | CpuUsedPct |
+-----------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------+----------------+--------------------------------------------------------+-------------------+-------------+----------+-------------------+------------+------------+
| 10067     | 172.25.214.48 | 9055          | 9053   | 9054     | 9056     | 2024-08-07 16:35:00 | 2024-10-29 15:35:33 | true  | false                | false                 | 133650    | 2.517 TB         | 26.446 TB     | 29.102 TB     | 9.13 %  | 10.08 %        |        | 2.5.22-5dffd65 | {"lastSuccessReportTabletsTime":"2024-10-29 15:34:56"} | 28.962 TB         | 8.69 %      | 64       | 0                 | 28.73 %    | 0.3 %      |
| 10066     | 172.25.214.49 | 9055          | 9053   | 9054     | 9056     | 2024-08-07 16:36:35 | 2024-10-29 15:35:33 | true  | false                | false                 | 133649    | 2.517 TB         | 26.486 TB     | 29.102 TB     | 8.99 %  | 9.42 %         |        | 2.5.22-5dffd65 | {"lastSuccessReportTabletsTime":"2024-10-29 15:34:40"} | 29.003 TB         | 8.68 %      | 64       | 0                 | 28.89 %    | 2.4 %      |
| 10003     | 172.25.214.50 | 9055          | 9053   | 9054     | 9056     | 2024-08-07 16:38:25 | 2024-10-29 15:35:33 | true  | false                | false                 | 133649    | 2.517 TB         | 26.491 TB     | 29.102 TB     | 8.97 %  | 9.48 %         |        | 2.5.22-5dffd65 | {"lastSuccessReportTabletsTime":"2024-10-29 15:34:48"} | 29.009 TB         | 8.68 %      | 64       | 0                 | 28.73 %    | 1.0 %      |
+-----------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------+----------------+--------------------------------------------------------+-------------------+-------------+----------+-------------------+------------+------------+
3 rows in set (0.01 sec)



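III. Disk configuration

Create a new directory: mkdir /mnt/disk7

Mount the sdh disk of every node at /mnt/disk7.

Log in to StarRocks over the MySQL protocol and change the parameter:

# Check the current value, then set the cooldown to 31536000 seconds (365 days)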
ADMIN SHOW FRONTEND CONFIG LIKE "storage_cooldown_second";


ADMIN SET FRONTEND CONFIG ("tablet_sched_storage_cooldown_second" = "31536000");



IV. Upgrading SR to 3.0

Official documentation

docs.starrocks.io/zh/docs/3.3…

  1. Preparation (done)

Adjust parameters

# Disable tablet clone and the balancer
ADMIN SET FRONTEND CONFIG ("tablet_sched_max_scheduling_tablets" = "0");
ADMIN SET FRONTEND CONFIG ("tablet_sched_max_balancing_tablets" = "0");
ADMIN SET FRONTEND CONFIG ("disable_balance"="true") ;
ADMIN SET FRONTEND CONFIG ("tablet_sched_disable_colocate_balance"="true");

# Verify the setting
ADMIN SHOW FRONTEND CONFIG LIKE "%disable_colocate_balance%";

  2. Upgrade BE (done)

On each BE node, edit the storage_root_path parameter in be.conf and append the new disk at the end:

storage_root_path = /mnt/disk6/StarRocks-2.2.9/be/storage;/mnt/disk2/StarRocks-2.2.9/be/storage;/mnt/disk3/StarRocks-2.2.9/be/storage;/mnt/disk4/StarRocks-2.2.9/be/storage;/mnt/disk7/StarRocks-2.2.9/be/storage
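Editing be.conf by hand on three nodes is easy to get wrong, so the append step can be sketched as a small idempotent helper. This is a sketch under assumptions: `append_storage_path` is a hypothetical name, GNU sed (`sed -i`) is assumed, and the entries are assumed to be plain paths separated by semicolons, as in the line above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one directory to storage_root_path in a be.conf.
# Assumes GNU sed and plain ';'-separated paths without spaces.
append_storage_path() {
  local conf="$1" new_path="$2" current
  # Read the current value (text after '=') with all whitespace removed.
  current=$(grep -E '^[[:space:]]*storage_root_path[[:space:]]*=' "$conf" \
              | tail -n 1 | cut -d= -f2- | tr -d '[:space:]')
  if [ -z "$current" ]; then
    echo "storage_root_path not found in $conf" >&2
    return 1
  fi
  # Do nothing if the path is already one of the entries.
  case ";${current};" in
    *";${new_path};"*) return 0 ;;
  esac
  cp -p "$conf" "${conf}.bak"   # keep a copy of the original file
  # Rewrite the line as "<key> = <old entries>;<new entry>".
  sed -i -E "s|^([[:space:]]*storage_root_path[[:space:]]*=).*|\1 ${current%;};${new_path}|" "$conf"
}
```

For this cluster the call would look like `append_storage_path /mnt/disk6/StarRocks-2.2.9/be/conf/be.conf /mnt/disk7/StarRocks-2.2.9/be/storage`. The new storage directory should already exist on the mounted disk before the BE is restarted.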

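Parameter for this step (note that tablet_sched_storage_cooldown_second is set through the FE, as the statement shows):

ADMIN SET FRONTEND CONFIG ("tablet_sched_storage_cooldown_second" = "31536000");

# Configuration-file form of the same setting
tablet_sched_storage_cooldown_second=31536000

  3. Upgrade FE

  4. Restore parameters


# Restore parameters after the upgrade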
ADMIN SET FRONTEND CONFIG ("tablet_sched_max_scheduling_tablets" = "10000");
ADMIN SET FRONTEND CONFIG ("tablet_sched_max_balancing_tablets" = "500");
ADMIN SET FRONTEND CONFIG ("disable_balance"="false") ;
ADMIN SET FRONTEND CONFIG ("tablet_sched_disable_colocate_balance"="false");
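The pause and restore statements in this section mirror each other, so they can be generated from one place to avoid restoring a different set than was paused. A minimal sketch, assuming a hypothetical helper named `balance_sql`; the "resume" values (10000 and 500) are simply the ones written in this runbook, not values read back from the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the FE statements that pause or resume tablet
# scheduling and balancing around an upgrade.
balance_sql() {
  local sched bal flag
  case "$1" in
    pause)  sched=0     bal=0   flag=true  ;;
    resume) sched=10000 bal=500 flag=false ;;
    *) echo "usage: balance_sql pause|resume" >&2; return 1 ;;
  esac
  cat <<EOF
ADMIN SET FRONTEND CONFIG ("tablet_sched_max_scheduling_tablets" = "${sched}");
ADMIN SET FRONTEND CONFIG ("tablet_sched_max_balancing_tablets" = "${bal}");
ADMIN SET FRONTEND CONFIG ("disable_balance" = "${flag}");
ADMIN SET FRONTEND CONFIG ("tablet_sched_disable_colocate_balance" = "${flag}");
EOF
}
```

The output can be piped into the client used earlier, for example `balance_sql pause | mysql -h172.25.214.48 -uroot -P9030 -p` before the upgrade and `balance_sql resume | ...` afterwards.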

Notes

[Image: 图片.png]

Solution:

Disable the FE parameter enable_shuffle_load.




V. Rollback plan

To roll back, restore the original bin and lib directories, then start the FE and BE services.
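The rollback only works if the original bin and lib were saved before the upgrade, so the two halves can be sketched together. The function names and the backup location are assumptions made for illustration; this is not taken from the StarRocks tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; function names and backup layout are assumptions.

# Save bin and lib of one component home (an fe or be directory).
backup_bin_lib() {
  local home="$1" backup="$2"
  mkdir -p "$backup" || return 1
  cp -a "$home/bin" "$home/lib" "$backup/"
}

# Put the saved bin and lib back. Refuses to delete anything
# if the backup is incomplete.
restore_bin_lib() {
  local home="$1" backup="$2"
  [ -d "$backup/bin" ] && [ -d "$backup/lib" ] || {
    echo "incomplete backup in $backup" >&2
    return 1
  }
  rm -rf "$home/bin" "$home/lib"
  cp -a "$backup/bin" "$backup/lib" "$home/"
}
```

A typical sequence would be `backup_bin_lib /mnt/disk6/StarRocks-2.2.9/be /some/backup/be` before replacing the files, and `restore_bin_lib` with the same arguments while the service is stopped, followed by starting the service again. The backup path here is a placeholder.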




VI. Additional notes

References

  1. StarRocks upgrade from 2.5.20 to 3.1.15: changes

  2. docs.starrocks.io/zh/docs/2.5…

Download link for the new package

www.mirrorship.cn/zh-CN/downl…




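VII. Managing the services with supervisor

Installing supervisor


# Install
yum install -y supervisor

# Configuration file
vim /etc/supervisord.conf
# Two parameters need adjusting. These are the defaults, which are too small; append 000 to each
minfds=1024
minprocs=200
# Change to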
minfds=1024000
minprocs=200000

ll /etc/supervisord.d/*.ini

# Start
/usr/bin/python /usr/bin/supervisord -c /etc/supervisord.conf
# Or start it with systemctl
systemctl start supervisord

# supervisor log
/var/log/supervisor/supervisord.log

systemctl enable supervisord
systemctl status supervisord

# Path of the supervisord child configuration files
/etc/supervisord.d/*.ini
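The minfds and minprocs change described above can also be applied without opening an editor. A minimal sketch, assuming GNU sed and that both keys already appear uncommented in the file; `raise_supervisord_limits` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set minfds and minprocs to the values used in this runbook.
# Assumes GNU sed and that both keys are present and uncommented.
# Any inline comment on those two lines is dropped by the rewrite.
raise_supervisord_limits() {
  local conf="$1"
  cp -p "$conf" "${conf}.bak" || return 1   # keep a copy of the original
  sed -i -E \
    -e 's/^[[:space:]]*minfds[[:space:]]*=.*/minfds=1024000/' \
    -e 's/^[[:space:]]*minprocs[[:space:]]*=.*/minprocs=200000/' \
    "$conf"
}
```

On the nodes in this document the call would be `raise_supervisord_limits /etc/supervisord.conf`, followed by a restart of supervisord so the new limits take effect.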


172.25.214.48

FE

vi /etc/supervisord.d/starrocks_fe.ini

[program:starrocks_fe]
process_name=%(program_name)s
directory=/mnt/disk6/StarRocks-2.2.9/fe
command=sh /mnt/disk6/StarRocks-2.2.9/fe/bin/start_fe.sh
autostart=true
autorestart=true
user=root
numprocs=1
startretries=3
stopasgroup=true
killasgroup=true
startsecs=1
stopwaitsecs=10

BE

vi /etc/supervisord.d/starrocks_be.ini

[program:starrocks_be]
process_name=%(program_name)s
directory=/mnt/disk6/StarRocks-2.2.9/be
command=sh /mnt/disk6/StarRocks-2.2.9/be/bin/start_be.sh
autostart=true
autorestart=true
user=root
numprocs=1
startretries=3
stopasgroup=true
killasgroup=true
startsecs=1
stopwaitsecs=10

172.25.214.49 (uses the existing supervisor)

supervisor configuration file path: /etc/supervisord/supervisord.conf

FE

vi /etc/supervisord.d/starrocks_fe.ini

[program:starrocks_fe]
process_name=%(program_name)s
directory=/mnt/disk6/StarRocks-2.2.9/fe
command=sh /mnt/disk6/StarRocks-2.2.9/fe/bin/start_fe.sh
autostart=true
autorestart=true
user=root
numprocs=1
startretries=3
stopasgroup=true
killasgroup=true
startsecs=1
stopwaitsecs=10

BE

vi /etc/supervisord.d/starrocks_be.ini

[program:starrocks_be]
process_name=%(program_name)s
directory=/mnt/disk6/StarRocks-2.2.9/be
command=sh /mnt/disk6/StarRocks-2.2.9/be/bin/start_be.sh
autostart=true
autorestart=true
user=root
numprocs=1
startretries=3
stopasgroup=true
killasgroup=true
startsecs=1
stopwaitsecs=10

172.25.214.50 (uses the existing supervisor)

supervisor configuration file path: /etc/supervisord/supervisord.conf

FE

vi /etc/supervisord.d/starrocks_fe.ini

[program:starrocks_fe]
process_name=%(program_name)s
directory=/mnt/disk6/StarRocks-2.2.9/fe
command=sh /mnt/disk6/StarRocks-2.2.9/fe/bin/start_fe.sh
autostart=true
autorestart=true
user=root
numprocs=1
startretries=3
stopasgroup=true
killasgroup=true
startsecs=1
stopwaitsecs=10

BE

vi /etc/supervisord.d/starrocks_be.ini

[program:starrocks_be]
process_name=%(program_name)s
directory=/mnt/disk6/StarRocks-2.2.9/be
command=sh /mnt/disk6/StarRocks-2.2.9/be/bin/start_be.sh
autostart=true
autorestart=true
user=root
numprocs=1
startretries=3
stopasgroup=true
killasgroup=true
startsecs=1
stopwaitsecs=10
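The FE and BE program files above are identical on all three nodes apart from the role, so they can be generated instead of typed six times. A minimal sketch: `write_program_ini` is a hypothetical name, and the settings are copied from the files shown above rather than tuned.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write starrocks_<role>.ini for supervisor.
# $1 = role (fe or be), $2 = StarRocks home, $3 = output directory.
write_program_ini() {
  local role="$1" home="$2" out_dir="$3"
  case "$role" in
    fe|be) ;;
    *) echo "role must be fe or be" >&2; return 1 ;;
  esac
  cat > "${out_dir}/starrocks_${role}.ini" <<EOF
[program:starrocks_${role}]
process_name=%(program_name)s
directory=${home}/${role}
command=sh ${home}/${role}/bin/start_${role}.sh
autostart=true
autorestart=true
user=root
numprocs=1
startretries=3
stopasgroup=true
killasgroup=true
startsecs=1
stopwaitsecs=10
EOF
}
```

For example, `write_program_ini fe /mnt/disk6/StarRocks-2.2.9 /etc/supervisord.d` and the same call with `be`. After the files are in place, `supervisorctl reread` followed by `supervisorctl update` would normally make supervisord pick them up. The start scripts are called without `--daemon` here, as in the files above, so the process stays in the foreground where supervisor can track it.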



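VIII. Follow-up items

  1. supervisord is configured correctly on 172.25.214.48.
  2. supervisord is not configured on 172.25.214.49; the services are started and stopped with commands.
  3. supervisord is not configured on 172.25.214.50; the services are started and stopped with commands.
  4. Schedule the upgrade to 3.1 at a later date.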