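I. Ticket information
CIC filed a ticket requesting an upgrade and disk configuration for the StarRocks cluster in Taicang; the work can be carried out tomorrow. Ticket: YYYW202502271803318
Scope: upgrade the cluster to 3.1, add SSDs to the cluster, and adjust hot/cold data tiering to storage_cooldown_second = 31536000 (365 days).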
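Notes:
- The cluster must be upgraded from 2.5 to 3.0.9 first and only then to 3.1; otherwise it cannot be rolled back.
- Upgrade in stages: go from 2.5 to 3.0.9 first, then move to 3.1 after the cluster has run for a while.
- Rolling upgrade is supported, and BE and CN nodes are backward compatible. Upgrade the BE and CN nodes first and the FE nodes last.
- For FE nodes, upgrade all Follower FE nodes first and the Leader FE node last.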
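The ordering rules above can be captured in a small helper, so that nobody skips the 3.0.9 hop by accident. This is only a sketch: `next_hop` and `upgrade_order` are hypothetical names, and the version path is simply the one stated in the notes.

```shell
#!/bin/sh
# Hypothetical helper: given the version a cluster currently runs, print the
# next version on the path described in the notes (2.5 -> 3.0.9 -> 3.1).
next_hop() {
    case "$1" in
        2.5|2.5.*) echo "3.0.9" ;;
        3.0|3.0.*) echo "3.1" ;;
        3.1|3.1.*) echo "done" ;;
        *)         echo "unknown"; return 1 ;;
    esac
}

# Order in which node roles are upgraded within a single hop.
upgrade_order() {
    printf '%s\n' "BE/CN" "Follower FE" "Leader FE"
}
```

For the cluster in this ticket, `next_hop 2.5.22-5dffd65` prints `3.0.9`, which is the target of the first maintenance window.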
II. Preparation
- Prepare the upgrade package
- Cluster information
| Node IP | Role | Spec | Version | BE path |
|---|---|---|---|---|
| 172.25.214.48 | fe/be | 64C/256G | 2.5.22 | /mnt/disk6/StarRocks-2.2.9/be/conf |
| 172.25.214.49 | fe/be | 64C/256G | 2.5.22 | /mnt/disk6/StarRocks-2.2.9/be/conf |
| 172.25.214.50 | fe/be | 64C/256G | 2.5.22 | /mnt/disk6/StarRocks-2.2.9/be/conf |
# root
mysql -h172.25.214.48 -uroot -P9030 -p"***"
# frontends
mysql> show frontends;
+----------------------------------+---------------+-------------+----------+-----------+---------+----------+-----------+------+-------+-------------------+---------------------+----------+--------+---------------------+----------------+
| Name | IP | EditLogPort | HttpPort | QueryPort | RpcPort | Role | ClusterId | Join | Alive | ReplayedJournalId | LastHeartbeat | IsHelper | ErrMsg | StartTime | Version |
+----------------------------------+---------------+-------------+----------+-----------+---------+----------+-----------+------+-------+-------------------+---------------------+----------+--------+---------------------+----------------+
| 172.25.214.49_9010_1675397952903 | 172.25.214.49 | 9010 | 8030 | 9030 | 9020 | LEADER | 613338486 | true | true | 604020423 | 2024-10-29 15:34:53 | true | | 2024-08-07 16:43:41 | 2.5.22-5dffd65 |
| 172.25.214.50_9010_1669709615101 | 172.25.214.50 | 9010 | 8030 | 9030 | 9020 | FOLLOWER | 613338486 | true | true | 604020410 | 2024-10-29 15:34:53 | true | | 2024-08-07 16:45:46 | 2.5.22-5dffd65 |
| 172.25.214.48_9010_1675398150131 | 172.25.214.48 | 9010 | 8030 | 9030 | 9020 | FOLLOWER | 613338486 | true | true | 604020410 | 2024-10-29 15:34:53 | true | | 2024-08-07 16:41:09 | 2.5.22-5dffd65 |
+----------------------------------+---------------+-------------+----------+-----------+---------+----------+-----------+------+-------+-------------------+---------------------+----------+--------+---------------------+----------------+
3 rows in set (0.00 sec)
mysql> show backends;
+-----------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------+----------------+--------------------------------------------------------+-------------------+-------------+----------+-------------------+------------+------------+
| BackendId | IP | HeartbeatPort | BePort | HttpPort | BrpcPort | LastStartTime | LastHeartbeat | Alive | SystemDecommissioned | ClusterDecommissioned | TabletNum | DataUsedCapacity | AvailCapacity | TotalCapacity | UsedPct | MaxDiskUsedPct | ErrMsg | Version | Status | DataTotalCapacity | DataUsedPct | CpuCores | NumRunningQueries | MemUsedPct | CpuUsedPct |
+-----------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------+----------------+--------------------------------------------------------+-------------------+-------------+----------+-------------------+------------+------------+
| 10067 | 172.25.214.48 | 9055 | 9053 | 9054 | 9056 | 2024-08-07 16:35:00 | 2024-10-29 15:35:33 | true | false | false | 133650 | 2.517 TB | 26.446 TB | 29.102 TB | 9.13 % | 10.08 % | | 2.5.22-5dffd65 | {"lastSuccessReportTabletsTime":"2024-10-29 15:34:56"} | 28.962 TB | 8.69 % | 64 | 0 | 28.73 % | 0.3 % |
| 10066 | 172.25.214.49 | 9055 | 9053 | 9054 | 9056 | 2024-08-07 16:36:35 | 2024-10-29 15:35:33 | true | false | false | 133649 | 2.517 TB | 26.486 TB | 29.102 TB | 8.99 % | 9.42 % | | 2.5.22-5dffd65 | {"lastSuccessReportTabletsTime":"2024-10-29 15:34:40"} | 29.003 TB | 8.68 % | 64 | 0 | 28.89 % | 2.4 % |
| 10003 | 172.25.214.50 | 9055 | 9053 | 9054 | 9056 | 2024-08-07 16:38:25 | 2024-10-29 15:35:33 | true | false | false | 133649 | 2.517 TB | 26.491 TB | 29.102 TB | 8.97 % | 9.48 % | | 2.5.22-5dffd65 | {"lastSuccessReportTabletsTime":"2024-10-29 15:34:48"} | 29.009 TB | 8.68 % | 64 | 0 | 28.73 % | 1.0 % |
+-----------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------+----------------+--------------------------------------------------------+-------------------+-------------+----------+-------------------+------------+------------+
3 rows in set (0.01 sec)
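Because the Leader FE has to be upgraded last, it helps to pull its IP out of `show frontends` rather than read it off the table by eye. The sketch below assumes the column layout shown above for 2.5.22 (IP in column 2, Role in column 7) and tab-separated input as produced by `mysql -N -B`; the function names are hypothetical.

```shell
#!/bin/sh
# Read tab-separated SHOW FRONTENDS rows on stdin and print the Leader FE IP.
# Assumes the 2.5.22 column order shown above: $2 = IP, $7 = Role.
leader_fe_ip() {
    awk -F '\t' '$7 == "LEADER" { print $2 }'
}

# Same input, but print the Follower FE IPs (these are upgraded first).
follower_fe_ips() {
    awk -F '\t' '$7 == "FOLLOWER" { print $2 }'
}

# Usage (not executed here):
#   mysql -h172.25.214.48 -uroot -P9030 -p"***" -N -B -e 'show frontends' | leader_fe_ip
```

The column positions may differ in other StarRocks versions, so check the header row before relying on this after the upgrade.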
III. Disk configuration
Create a new directory: mkdir /mnt/disk7
Mount the sdh disk on every node at /mnt/disk7
Log in to StarRocks over the MySQL protocol and change the parameter:
ADMIN SHOW FRONTEND CONFIG LIKE "%storage_cooldown_second%";
ADMIN SET FRONTEND CONFIG ("tablet_sched_storage_cooldown_second" = "31536000");
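The value 31536000 is 365 days expressed in seconds. A minimal sketch for deriving it and printing the matching statement follows; the function names are hypothetical, and the commented mount commands assume the device is `/dev/sdh` and already carries a filesystem, neither of which is stated in the ticket.

```shell
#!/bin/sh
# Convert a number of days into seconds (24 * 60 * 60 seconds per day).
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

# Print the ADMIN SET statement for a cooldown period given in days.
cooldown_sql() {
    printf 'ADMIN SET FRONTEND CONFIG ("tablet_sched_storage_cooldown_second" = "%s");\n' \
        "$(days_to_seconds "$1")"
}

# Disk preparation on each node (not executed here; device name is an assumption):
#   mkdir -p /mnt/disk7
#   mount /dev/sdh /mnt/disk7
```

`cooldown_sql 365` prints the same statement used above, which makes it easy to double-check the number if the retention period changes.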
IV. Upgrading SR to 3.0
Official documentation:
docs.starrocks.io/zh/docs/3.3…
- Preparation (done)
Adjust parameters:
# Disable tablet clone and the balancer
ADMIN SET FRONTEND CONFIG ("tablet_sched_max_scheduling_tablets" = "0");
ADMIN SET FRONTEND CONFIG ("tablet_sched_max_balancing_tablets" = "0");
ADMIN SET FRONTEND CONFIG ("disable_balance"="true") ;
ADMIN SET FRONTEND CONFIG ("tablet_sched_disable_colocate_balance"="true");
# Verify the setting
ADMIN SHOW FRONTEND CONFIG LIKE "%disable_colocate_balance%";
- Upgrade BE (done)
On every BE node, edit the storage_root_path parameter in be.conf and append the new disk at the end:
storage_root_path = /mnt/disk6/StarRocks-2.2.9/be/storage;/mnt/disk2/StarRocks-2.2.9/be/storage;/mnt/disk3/StarRocks-2.2.9/be/storage;/mnt/disk4/StarRocks-2.2.9/be/storage;/mnt/disk7/StarRocks-2.2.9/be/storage
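Editing a long `storage_root_path` line by hand on three nodes is error-prone, so the append can be scripted. This is a sketch under assumptions: `append_storage_path` is a hypothetical helper, it expects a single `storage_root_path = ...` line with `;`-separated entries as shown above, and it keeps a `.bak` copy next to the file.

```shell
#!/bin/sh
# append_storage_path CONF NEW_PATH
# Append NEW_PATH to the storage_root_path line of CONF unless already listed.
append_storage_path() {
    conf=$1
    new_path=$2
    # Split the existing value on ';' and look for an exact match.
    if grep -E '^[[:space:]]*storage_root_path[[:space:]]*=' "$conf" \
        | tr ';' '\n' | sed 's/.*=[[:space:]]*//' | grep -qxF "$new_path"; then
        return 0
    fi
    # Keep a backup, then add ";NEW_PATH" at the end of the matching line.
    cp "$conf" "$conf.bak"
    sed "/^[[:space:]]*storage_root_path[[:space:]]*=/ s|[[:space:]]*\$|;$new_path|" \
        "$conf.bak" > "$conf"
}

# Usage (not executed here):
#   append_storage_path /mnt/disk6/StarRocks-2.2.9/be/conf/be.conf /mnt/disk7/StarRocks-2.2.9/be/storage
```

Running it twice leaves the file unchanged the second time, so it is safe to re-run on a node that was already edited.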
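Parameter for each BE node (set through the FE, since tablet_sched_storage_cooldown_second is an FE configuration item):
ADMIN SET FRONTEND CONFIG ("tablet_sched_storage_cooldown_second" = "31536000");
# The same setting as a key=value entry
tablet_sched_storage_cooldown_second=31536000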
- Upgrade FE
- Restore parameters
# Restore the parameters after the upgrade
ADMIN SET FRONTEND CONFIG ("tablet_sched_max_scheduling_tablets" = "10000");
ADMIN SET FRONTEND CONFIG ("tablet_sched_max_balancing_tablets" = "500");
ADMIN SET FRONTEND CONFIG ("disable_balance"="false") ;
ADMIN SET FRONTEND CONFIG ("tablet_sched_disable_colocate_balance"="false");
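The "disable before, restore after" pair is easy to get out of sync when typed by hand. The sketch below prints either set of four statements from one place; `balance_sql` is a hypothetical name, and the values are exactly the ones used in these notes.

```shell
#!/bin/sh
# balance_sql pause|resume
# Print the four ADMIN SET statements used before (pause) or after (resume)
# the upgrade. Values are taken from the notes above.
balance_sql() {
    case "$1" in
        pause)  sched=0;     bal=0;   flag=true  ;;
        resume) sched=10000; bal=500; flag=false ;;
        *) echo "usage: balance_sql pause|resume" >&2; return 1 ;;
    esac
    cat <<EOF
ADMIN SET FRONTEND CONFIG ("tablet_sched_max_scheduling_tablets" = "$sched");
ADMIN SET FRONTEND CONFIG ("tablet_sched_max_balancing_tablets" = "$bal");
ADMIN SET FRONTEND CONFIG ("disable_balance" = "$flag");
ADMIN SET FRONTEND CONFIG ("tablet_sched_disable_colocate_balance" = "$flag");
EOF
}

# Usage (not executed here):
#   balance_sql pause  | mysql -h172.25.214.48 -uroot -P9030 -p"***"
#   balance_sql resume | mysql -h172.25.214.48 -uroot -P9030 -p"***"
```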
Caveats
Workaround:
Turn off the FE parameter enable_shuffle_load.
V. Rollback plan
To roll back, restore the original bin and lib directories, then start the FE and BE services.
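The rollback only works if the old bin and lib directories were kept before the upgrade. A minimal sketch of the backup and restore steps follows; the function names, the `.bak-STAMP` naming, and the `.upgraded` suffix are all assumptions made for illustration.

```shell
#!/bin/sh
# backup_bin_lib DIR STAMP
# Copy DIR/bin and DIR/lib to DIR/bin.bak-STAMP and DIR/lib.bak-STAMP.
backup_bin_lib() {
    dir=$1; stamp=$2
    for d in bin lib; do
        cp -a "$dir/$d" "$dir/$d.bak-$stamp" || return 1
    done
}

# restore_bin_lib DIR STAMP
# Move the current bin/lib aside as *.upgraded and put the backups back.
restore_bin_lib() {
    dir=$1; stamp=$2
    # Refuse to touch anything unless both backups exist.
    for d in bin lib; do
        [ -d "$dir/$d.bak-$stamp" ] || { echo "missing $dir/$d.bak-$stamp" >&2; return 1; }
    done
    for d in bin lib; do
        rm -rf "$dir/$d.upgraded"
        mv "$dir/$d" "$dir/$d.upgraded"
        mv "$dir/$d.bak-$stamp" "$dir/$d"
    done
}

# Usage (not executed here): stop the service, restore, then start it again.
#   restore_bin_lib /mnt/disk6/StarRocks-2.2.9/be 20250228
```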
VI. Supplementary notes
References
Download link for the new package:
www.mirrorship.cn/zh-CN/downl…
VII. Running under supervisor
Installing supervisor
# Install
yum install -y supervisor
# Configuration file
vim /etc/supervisord.conf
# Two parameters need adjusting. The defaults below are too small; append 000 to each.
minfds=1024
minprocs=200
# Change to
minfds=1024000
minprocs=200000
ll /etc/supervisord.d/*.ini
# Start
/usr/bin/python /usr/bin/supervisord -c /etc/supervisord.conf
# Or start it with systemctl
systemctl start supervisord
# supervisor log
/var/log/supervisor/supervisord.log
systemctl enable supervisord
systemctl status supervisord
# Path of supervisord's per-program config files
/etc/supervisord.d/*.ini
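The two limit changes can be applied with sed instead of an interactive editor, which is convenient when the same change is needed on several nodes. This is a sketch: `raise_supervisor_limits` is a hypothetical helper, it rewrites any existing `minfds=` / `minprocs=` line (commented out or not), drops trailing comments on those lines, and leaves a `.bak` copy.

```shell
#!/bin/sh
# raise_supervisor_limits CONF
# Set minfds=1024000 and minprocs=200000 in a supervisord config file.
raise_supervisor_limits() {
    conf=$1
    cp "$conf" "$conf.bak"
    sed -e 's/^[;[:space:]]*minfds[[:space:]]*=.*/minfds=1024000/' \
        -e 's/^[;[:space:]]*minprocs[[:space:]]*=.*/minprocs=200000/' \
        "$conf.bak" > "$conf"
}

# Usage (not executed here):
#   raise_supervisor_limits /etc/supervisord.conf
#   systemctl restart supervisord
```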
172.25.214.48
FE
vi /etc/supervisord.d/starrocks_fe.ini
[program:starrocks_fe]
process_name=%(program_name)s
directory=/mnt/disk6/StarRocks-2.2.9/fe
command=sh /mnt/disk6/StarRocks-2.2.9/fe/bin/start_fe.sh
autostart=true
autorestart=true
user=root
numprocs=1
startretries=3
stopasgroup=true
killasgroup=true
startsecs=1
stopwaitsecs=10
BE
vi /etc/supervisord.d/starrocks_be.ini
[program:starrocks_be]
process_name=%(program_name)s
directory=/mnt/disk6/StarRocks-2.2.9/be
command=sh /mnt/disk6/StarRocks-2.2.9/be/bin/start_be.sh
autostart=true
autorestart=true
user=root
numprocs=1
startretries=3
stopasgroup=true
killasgroup=true
startsecs=1
stopwaitsecs=10
172.25.214.49 (uses the existing supervisor)
supervisor config file path: /etc/supervisord/supervisord.conf
FE
vi /etc/supervisord.d/starrocks_fe.ini
[program:starrocks_fe]
process_name=%(program_name)s
directory=/mnt/disk6/StarRocks-2.2.9/fe
command=sh /mnt/disk6/StarRocks-2.2.9/fe/bin/start_fe.sh
autostart=true
autorestart=true
user=root
numprocs=1
startretries=3
stopasgroup=true
killasgroup=true
startsecs=1
stopwaitsecs=10
BE
vi /etc/supervisord.d/starrocks_be.ini
[program:starrocks_be]
process_name=%(program_name)s
directory=/mnt/disk6/StarRocks-2.2.9/be
command=sh /mnt/disk6/StarRocks-2.2.9/be/bin/start_be.sh
autostart=true
autorestart=true
user=root
numprocs=1
startretries=3
stopasgroup=true
killasgroup=true
startsecs=1
stopwaitsecs=10
172.25.214.50 (uses the existing supervisor)
supervisor config file path: /etc/supervisord/supervisord.conf
FE
vi /etc/supervisord.d/starrocks_fe.ini
[program:starrocks_fe]
process_name=%(program_name)s
directory=/mnt/disk6/StarRocks-2.2.9/fe
command=sh /mnt/disk6/StarRocks-2.2.9/fe/bin/start_fe.sh
autostart=true
autorestart=true
user=root
numprocs=1
startretries=3
stopasgroup=true
killasgroup=true
startsecs=1
stopwaitsecs=10
BE
vi /etc/supervisord.d/starrocks_be.ini
[program:starrocks_be]
process_name=%(program_name)s
directory=/mnt/disk6/StarRocks-2.2.9/be
command=sh /mnt/disk6/StarRocks-2.2.9/be/bin/start_be.sh
autostart=true
autorestart=true
user=root
numprocs=1
startretries=3
stopasgroup=true
killasgroup=true
startsecs=1
stopwaitsecs=10
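The six program sections above differ only in the role (fe or be), so they can be generated from one template. The sketch below reproduces the same settings; `render_program_ini` is a hypothetical helper, and the default install directory is the one used on these nodes.

```shell
#!/bin/sh
# render_program_ini ROLE [HOME]
# Print a supervisor program section for ROLE (fe or be).
render_program_ini() {
    role=$1
    home=${2:-/mnt/disk6/StarRocks-2.2.9}
    case "$role" in
        fe|be) ;;
        *) echo "role must be fe or be" >&2; return 1 ;;
    esac
    cat <<EOF
[program:starrocks_$role]
process_name=%(program_name)s
directory=$home/$role
command=sh $home/$role/bin/start_$role.sh
autostart=true
autorestart=true
user=root
numprocs=1
startretries=3
stopasgroup=true
killasgroup=true
startsecs=1
stopwaitsecs=10
EOF
}

# Usage (not executed here):
#   render_program_ini fe > /etc/supervisord.d/starrocks_fe.ini
#   render_program_ini be > /etc/supervisord.d/starrocks_be.ini
#   supervisorctl reread && supervisorctl update
#   supervisorctl status
```

On 172.25.214.49 and 172.25.214.50, where the config file lives at /etc/supervisord/supervisord.conf, `supervisorctl` would need `-c /etc/supervisord/supervisord.conf` to find it.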
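VIII. Follow-up items
- supervisord is configured correctly on 172.25.214.48.
- supervisord is not configured on 172.25.214.49; start and stop the services from the command line.
- supervisord is not configured on 172.25.214.50; start and stop the services from the command line.
- Schedule the upgrade to 3.1 at a later date.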