Fixing Run Command Timeout / SSH Timeout Errors When Upgrading a Cluster with TiUP


Author: 代晓磊

Original source: tidb.net/blog/fdbe79…

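(1) Symptom: while upgrading the cluster with TiUP, stopping a TiKV node timed out with ERROR Run Command Timeout, even though logging in to 192.168.1.43 showed that TiKV had in fact already stopped.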
2020-06-29T05:21:18.289+0800 INFO Stopping instance 192.168.1.43
2020-06-29T05:22:58.364+0800 INFO SSHCommand {"host": "192.168.1.43", "port": "22", "cmd": "export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin sudo -H -u root bash -c \"systemctl daemon-reload && systemctl stop tikv-20160.service\"", "stdout": "", "stderr": "Run Command Timeout!\n"}
2020-06-29T05:22:58.364+0800 ERROR Run Command Timeout!

2020-06-29T05:22:58.364+0800 INFO Execute command finished {"code": 1, "error": "failed to upgrade: failed to stop 192.168.1.43: failed to stop: tikv 192.168.1.43:20160: executor.ssh.execute_timedout: Execute command over SSH timedout for 'tidb@192.168.1.43:22' {ssh_stderr: Run Command Timeout!\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin sudo -H -u root bash -c \"systemctl daemon-reload && systemctl stop tikv-20160.service\"}", "errorVerbose": "executor.ssh.execute_timedout: Execute command over SSH timedout for 'tidb@192.168.1.43:22' {ssh_stderr: Run Command Timeout!\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin sudo -H -u root bash -c \"systemctl daemon-reload && systemctl stop tikv-20160.service\"}\n at github.com/pingcap/tiu…).Execute()\n\tgithub.com/pingcap/tiu…at github.com/pingcap/tiu…).Execute()\n\tgithub.com/pingcap/tiu…at github.com/pingcap/tiu…at github.com/pingcap/tiu…at github.com/pingcap/tiu…).Execute()\n\tgithub.com/pingcap/tiu…at github.com/pingcap/tiu…).Execute()\n\tgithub.com/pingcap/tiu…at github.com/pingcap/tiu…at github.com/pingcap/tiu…at github.com/spf13/cobra…).execute()\n\tgithub.com/spf13/cobra…at github.com/spf13/cobra…).ExecuteC()\n\tgithub.com/spf13/cobra…at github.com/spf13/cobra…).Execute()\n\tgithub.com/spf13/cobra…at github.com/pingcap/tiu…at main.main()\n\tgithub.com/pingcap/tiu…at runtime.main()\n\truntime/proc.go:203\n at runtime.goexit()\n\truntime/asm_amd64.s:1357\nfailed to stop: tikv 192.168.1.43:20160\ngithub.com/pingcap/tiu…).Execute\n\tgithub.com/pingcap/tiu…).Execute\n\tgithub.com/pingcap/tiu…).execute\n\tgithub.com/spf13/cobra…).ExecuteC\n\tgithub.com/spf13/cobra…).Execute\n\tgithub.com/spf13/cobra…to stop 192.168.1.43\nfailed to upgrade"}
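
To confirm from the control machine that the TiKV service really has stopped despite the timeout, one option is to ask systemd for the unit state over SSH. The sketch below is only an illustration: the helper name unit_stopped is hypothetical, and the user, host, and unit name in the usage comment are the ones that appear in the log above. The helper itself only interprets the state string that `systemctl is-active` prints.

```shell
# Hypothetical helper: map the state printed by `systemctl is-active`
# to a simple verdict. "inactive" and "failed" both mean the unit is not running.
unit_stopped() {
  case "$1" in
    inactive|failed)
      echo "stopped" ;;
    active|reloading|activating|deactivating)
      echo "running-or-transitioning" ;;
    *)
      echo "unknown" ;;
  esac
}

# Usage on the control machine (user, host, and unit taken from the log above):
#   state="$(ssh tidb@192.168.1.43 'systemctl is-active tikv-20160.service')"
#   unit_stopped "$state"
```

If this reports "stopped" while TiUP still reports a timeout, the stop itself succeeded and only the wait on the control machine was too short.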

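(2) Solution:
1. Upgrade tiup to the latest version: run tiup update --self && tiup update --all to update tiup itself and its components.
Why upgrade? The goal is to use the following two flags, which are available in the latest version of tiup: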
tiup cluster --help
Flags:
  -h, --help               help for tiup
      --ssh-timeout int    Timeout in seconds to connect host via SSH, ignored for operations that don't need an SSH connection. (default 5)
  -v, --version            version for tiup
      --wait-timeout int   Timeout in seconds to wait for an operation to complete, ignored for operations that don't fit. (default 60)

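If the error is related to the SSH timeout, the relevant flag is --ssh-timeout: the timeout for the control machine to establish an SSH connection to the tikv/pd/tidb hosts. If the network is slow or unreliable, increase this value.
If the error is ERROR Run Command Timeout, the relevant flag is --wait-timeout: the timeout for the control machine to wait for a command to finish on the tikv/pd/tidb hosts. If the command is slow to complete, increase this value.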
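
As a rough illustration of how the two flags fit into an upgrade, the snippet below prints (rather than runs) a `tiup cluster upgrade` command with larger timeouts. The function name build_upgrade_cmd, the cluster name tidb-test, the version v4.0.2, and the values 30 and 300 are placeholders assumed for this example and do not come from the original incident; pick values that match how slow the SSH connection or the stop operation actually is.

```shell
# Dry-run helper (hypothetical): print the upgrade command instead of executing it.
#   $1 = cluster name, $2 = target version,
#   $3 = --ssh-timeout in seconds (assumed default here: 30),
#   $4 = --wait-timeout in seconds (assumed default here: 300)
build_upgrade_cmd() {
  printf 'tiup cluster upgrade %s %s --ssh-timeout %s --wait-timeout %s\n' \
    "$1" "$2" "${3:-30}" "${4:-300}"
}

# Placeholder cluster name and version.
build_upgrade_cmd tidb-test v4.0.2
# prints: tiup cluster upgrade tidb-test v4.0.2 --ssh-timeout 30 --wait-timeout 300
```

Printing the command first makes it easy to review the values before pasting it into the control machine's shell.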

2. If you have increased the relevant timeouts and the upgrade still fails after several attempts, fall back to the last resort: --force

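A rolling upgrade upgrades the components one at a time. While upgrading TiKV, it migrates all leaders off each TiKV instance before stopping that instance. The default timeout for this is 5 minutes; once it is exceeded, the instance is stopped directly.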

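If you would rather not evict leaders and want to upgrade immediately, add --force to the upgrade command. This causes performance jitter (so it is strongly recommended to do it during the early-morning low-traffic period to minimize the impact), but it does not cause data loss.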
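
Because --force trades a smooth upgrade for speed, it can help to guard it with a simple time-of-day check. The sketch below is one possible guard, not part of TiUP: the function names, the 01:00 to 05:00 window, the cluster name tidb-test, and the version v4.0.2 are all assumptions made for illustration. It only prints the forced upgrade command when the given hour falls inside the assumed low-traffic window.

```shell
# Hypothetical guard: succeed only if the hour (0-23) is inside the
# assumed low-traffic window of 01:00 to 05:00.
is_low_peak_hour() {
  [ "$1" -ge 1 ] && [ "$1" -lt 5 ]
}

# Dry-run: print the forced upgrade command only during the window.
#   $1 = cluster name, $2 = target version, $3 = current hour (0-23)
force_upgrade_cmd() {
  if is_low_peak_hour "$3"; then
    printf 'tiup cluster upgrade %s %s --force\n' "$1" "$2"
  else
    echo "refusing: hour $3 is outside the low-traffic window" >&2
    return 1
  fi
}

# Placeholder cluster name, version, and hour.
force_upgrade_cmd tidb-test v4.0.2 3
# prints: tiup cluster upgrade tidb-test v4.0.2 --force
```

In real use the hour would come from something like `date +%H`; the fixed value here just keeps the example deterministic.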