Upgrading kube-ovn across versions with major changes may require a dedicated migration script. However, such scripts only exist for a few fairly old versions; to move to the latest version you generally have to do the migration by hand, and you may run into a few problems along the way.
My environment is currently on 1.11, and I plan to upgrade to the master branch in order to test some of the newer features.
Follow the official docs and use the latest install script: kubeovn.github.io/docs/v1.12.…
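To fetch the latest script (the path below is assumed from the current kube-ovn repository layout; confirm it against the docs page above):
wget https://raw.githubusercontent.com/kubeovn/kube-ovn/master/dist/images/install.sh
chmod +x install.sh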
[root@pc-node-1 tmp]# grep "\[Step" install.sh
echo "[Step 0/6] Generate SSL key and cert"
echo "[Step 1/6] Label kube-ovn-master node and label datapath type"
echo "[Step 2/6] Install OVN components"
echo "[Step 3/6] Install Kube-OVN"
echo "[Step 4/6] Delete pod that not in host network mode"
# Unless strictly necessary, skip this step: it deletes and recreates every pod that is not in host network mode, which is disruptive on a running cluster
echo "[Step 5/6] Add kubectl plugin PATH"
echo "[Step 6/6] Run network diagnose"
- Compare install.sh
diff install.sh install.sh.1.11
Sync the configuration in this cluster's 1.11 install.sh into the latest master install.sh, item by item.
# In my case I only need to make sure the settings above (and including) the VLAN ones are consistent
< VLAN_ID="100"
---
> VLAN_ID="0"
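A quick way to see only the configuration differences, assuming both scripts keep their settings as top-level NAME=value assignments near the top of the file (a heuristic, not an exact filter):
# compare only lines that look like shell variable assignments
diff <(grep -E '^[A-Za-z_][A-Za-z0-9_]*=' install.sh) \
     <(grep -E '^[A-Za-z_][A-Za-z0-9_]*=' install.sh.1.11)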
Before running install.sh, double-check once more that every setting that needs to be synced has actually been synced.
If install.sh gets stuck and some pods are unhealthy, first try restarting all kube-ovn pods:
kubectl get pod -A -o wide | grep ovn | awk '{print "kubectl delete pod -n " $1 " " $2}' | bash -x
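A gentler alternative is a rolling restart, assuming the default kube-system namespace and the standard workload names created by install.sh:
kubectl -n kube-system rollout restart deployment/kube-ovn-controller
kubectl -n kube-system rollout restart daemonset/kube-ovn-cni daemonset/ovs-ovn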
- Check the kube-ovn component logs
Go through the logs of the key components (kube-ovn-controller, kube-ovn-cni, ovs, etc.) one by one and look for anything abnormal, for example:
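The label selectors below are the ones used by the standard kube-ovn manifests; verify with kubectl get pod --show-labels if yours differ:
kubectl -n kube-system logs -l app=kube-ovn-controller --tail=100
kubectl -n kube-system logs -l app=kube-ovn-cni --tail=100
# ovs-ovn pods carry the app=ovs label
kubectl -n kube-system logs -l app=ovs --tail=100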
If the logs show that a component failed to start, use the kube-ovn port details and the OVN architecture below to pinpoint where the problem lies: kubeovn.github.io/docs/v1.12.… www.ovn.org/en/architec…
Port information
| Component | Port | Purpose |
|---|---|---|
| ovn-central | 6641/tcp, 6642/tcp, 6643/tcp, 6644/tcp | ovn-db and raft server listening ports |
| ovs-ovn | Geneve 6081/udp, STT 7471/tcp, Vxlan 4789/udp | Tunnel ports |
| kube-ovn-controller | 10660/tcp | Metrics listening port |
| kube-ovn-daemon | 10665/tcp | Metrics listening port |
| kube-ovn-monitor | 10661/tcp | Metrics listening port |
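With this table you can quickly verify that the expected ports are actually listening, for example on and against an ovn-central node:
# on an ovn-central node: are the ovn-db / raft ports up?
ss -tlnp | grep -E '664[1-4]'
# from any other node: can we reach the sb-db endpoint? (requires nc)
nc -z -v 10.5.32.21 6642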
CMS(neutron|kube-ovn-controller)
|
|
+-----------|-----------+
| | |
| OVN/CMS Plugin |
| | |
| | |
| OVN Northbound DB |
| | |
| | |
| ovn-northd |
| | |
+-----------|-----------+
|
|
+-------------------+
| OVN Southbound DB |
+-------------------+
|
|
+------------------+------------------+
| | |
HV 1 | | HV n |
+---------------|---------------+ . +---------------|---------------+
| | | . | | |
| ovn-controller | . | ovn-controller |
| | | | . | | | |
| | | | | | | |
| ovs-vswitchd ovsdb-server | | ovs-vswitchd ovsdb-server |
| | | |
+-------------------------------+ +-------------------------------+
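Per the diagram, each hypervisor's ovn-controller should hold a live connection to the southbound DB. One way to check it is from inside an ovs-ovn pod (the pod name here is illustrative):
kubectl -n kube-system exec -it ovs-ovn-xxxxx -- ovn-appctl -t ovn-controller connection-status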
# For example, in my case I found kube-ovn-controller restarting over and over
I0301 11:25:04.002139 7 ipam.go:79] allocate v4 10.5.49.0 mac 00:00:00:CE:D0:68 for snat-qa-hongkong
I0301 11:25:04.002177 7 ipam.go:79] allocate v4 100.64.0.2 mac 00:00:00:1B:E3:72 for node-k8s-ctrl-1
I0301 11:25:04.002227 7 ipam.go:79] allocate v4 100.64.0.4 mac 00:00:00:1E:D1:6C for node-k8s-ctrl-2
I0301 11:25:04.002269 7 ipam.go:79] allocate v4 100.64.0.3 mac 00:00:00:E6:34:24 for node-k8s-ctrl-3
I0301 11:25:04.002304 7 init.go:464] take 0.57 seconds to initialize IPAM
W0301 11:25:04.011181 7 ovn-sbctl.go:46] ovn-sbctl command error: ovn-sbctl --timeout=60 --db=tcp:[10.5.32.21]:6642,tcp:[10.5.32.22]:6642,tcp:[10.5.32.23]:6642 --format=csv --no-heading --data=bare --columns=name find chassis name=678b8a34-a715-4f20-8204-93c4728513a4 in 8ms
E0301 11:25:04.011263 7 init.go:823] failed to check chassis exist: failed to find node chassis 678b8a34-a715-4f20-8204-93c4728513a4, ovn-sbctl: tcp:[10.5.32.21]:6642,tcp:[10.5.32.22]:6642,tcp:[10.5.32.23]:6642: database connection failed (Connection refused)
, "exit status 1"
E0301 11:25:04.011300 7 klog.go:10] "failed to initialize node chassis" err=<
failed to find node chassis 678b8a34-a715-4f20-8204-93c4728513a4, ovn-sbctl: tcp:[10.5.32.21]:6642,tcp:[10.5.32.22]:6642,tcp:[10.5.32.23]:6642: database connection failed (Connection refused)
, "exit status 1"
# The logs show that IPAM initializes fine but the controller then cannot reach port 6642; that points at a broken sb-db, so the next step is to check ovn-central
# ovn-central can run as a single node or as a 3-node cluster; if it originally ran with 3 nodes, it is best to restore the 3-node mode
# After ovn-central was restored to 3 nodes, everything went back to normal
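To confirm the sb-db raft cluster really is healthy, you can query its status from inside an ovn-central pod (the socket path below matches recent OVN builds and may differ on older ones):
kubectl -n kube-system exec -it deploy/ovn-central -- \
    ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
If the kubectl plugin from Step 5 is installed, recent plugin versions also offer kubectl ko sb status as a shortcut.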
# kube-ovn-cni hit a problem where port 10665 was not listening
Readiness probe failed: dial tcp 10.5.32.23:10665: connect: connection refused
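On the affected node, first confirm whether anything is listening on 10665, then find and inspect the kube-ovn-cni pod scheduled there (the IP in the grep comes from the probe message above; the pod name is a placeholder):
# run on the node failing the probe
ss -tlnp | grep 10665
# locate the kube-ovn-cni pod on that node and read its logs
kubectl -n kube-system get pod -l app=kube-ovn-cni -o wide | grep 10.5.32.23
kubectl -n kube-system logs <kube-ovn-cni-pod-from-above>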
If you had to rebuild the ovn-central databases, restart the following pods once ovn-central is back up:
kube-ovn-controller ovs-ovn kube-ovn-cni
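For example, using the label selectors from the standard manifests (check yours first if in doubt):
kubectl -n kube-system delete pod -l app=kube-ovn-controller
# ovs-ovn pods carry the app=ovs label
kubectl -n kube-system delete pod -l app=ovs
kubectl -n kube-system delete pod -l app=kube-ovn-cni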
- Check that networking for existing pod resources still works (a verification sketch follows this list)
kube-ovn-pinger pods running normally, and able to reach the gateway
Connectivity between nodes and pods in the default vpc, same-node and cross-node
Connectivity among pods in the default vpc, same-node and cross-node
SNAT access to the public network, FIP access from outside, DNAT access from outside
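A minimal verification pass, assuming the kubectl plugin from Step 5 is installed; pod names and the target IP are placeholders:
# pinger already probes pod->pod, pod->node, pod->gateway and external targets
kubectl -n kube-system logs -l app=kube-ovn-pinger --tail=20
# manual same-node / cross-node check between two pods in the default vpc
kubectl exec <pod-a> -- ping -c 3 <pod-b-ip>
# full network diagnosis across all nodes
kubectl ko diagnose all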