Upgrading kube-ovn


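For releases with large changes, upgrading the kube-ovn components may require a dedicated migration script. Such scripts only exist for migrations between some older versions, though. If you want to move to the latest version, you usually have to do the migration by hand, and you may run into problems along the way.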

My environment currently runs 1.11. I plan to upgrade it to the master branch so that I can test some newer features.

Follow the official documentation and use the latest install script: kubeovn.github.io/docs/v1.12.…


[root@pc-node-1 tmp]# grep "\[Step" install.sh

echo "[Step 0/6] Generate SSL key and cert"
echo "[Step 1/6] Label kube-ovn-master node and label datapath type"
echo "[Step 2/6] Install OVN components"
echo "[Step 3/6] Install Kube-OVN"
echo "[Step 4/6] Delete pod that not in host network mode"
# Skip this step unless it is really necessary; it is best not to run it
echo "[Step 5/6] Add kubectl plugin PATH"
echo "[Step 6/6] Run network diagnose"


  1. Compare install.sh

diff install.sh install.sh.1.11

Copy the settings from this cluster's 1.11 install.sh into the latest master install.sh, one by one.

# In my case, I only need to make sure that the settings up to and including the VLAN section are identical
< VLAN_ID="100"
---
> VLAN_ID="0"
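A full `diff` also lists every upstream change to the script body, and that can bury the few settings that matter. The sketch below compares only the top-level `NAME=value` assignments. It assumes your settings are written at the start of a line, the way `VLAN_ID="100"` is above. `extract_vars` and `diff_install_vars` are hypothetical helper names, not part of kube-ovn.

```shell
# Print the top-level NAME=value assignments of an install script, sorted so
# that two versions of the script can be compared line by line.
# Indented assignments (inside functions or if-blocks) are ignored on purpose.
extract_vars() {
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$1" | LC_ALL=C sort
}

# Show only the assignments that differ between two install scripts.
# "old:" lines come from the first file, "new:" lines from the second.
diff_install_vars() {
  _old_vars=$(mktemp)
  _new_vars=$(mktemp)
  extract_vars "$1" > "$_old_vars"
  extract_vars "$2" > "$_new_vars"
  diff "$_old_vars" "$_new_vars" | sed -n -e 's/^< /old: /p' -e 's/^> /new: /p'
  rm -f "$_old_vars" "$_new_vars"
}
```

For example, run `diff_install_vars install.sh.1.11 install.sh`. Some pairs, such as the version string, are expected to differ. The remaining `old:` values are the candidates to copy into the new script.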

Before running install.sh, double-check that every setting that should be carried over has actually been carried over.

If install.sh gets stuck and some pods do not become healthy, restart all kube-ovn pods first:

kubectl get pod -A -o wide | grep ovn | awk '{print "kubectl delete pod -n " $1 " " $2}' | bash -x
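`grep ovn` matches any line that contains "ovn" anywhere, including workloads that have nothing to do with kube-ovn. The sketch below is a slightly narrower filter. It assumes kube-ovn runs in `kube-system` and that its pod names start with `kube-ovn-`, `ovn-central` or `ovs-ovn`. The helper name and the name pattern are assumptions; adjust them to your cluster. The function only prints the commands, so you can review them before anything is deleted.

```shell
# Read `kubectl get pod -A --no-headers` output on stdin and print one
# `kubectl delete pod` command per kube-ovn pod in the given namespace
# (default: kube-system). The name pattern is an assumption about how the
# kube-ovn workloads are named in your cluster.
ovn_pod_delete_cmds() {
  awk -v ns="${1:-kube-system}" '
    $1 == ns && $2 ~ /^(kube-ovn-|ovn-central|ovs-ovn)/ {
      print "kubectl delete pod -n " $1 " " $2
    }'
}
```

First review the output of `kubectl get pod -A --no-headers | ovn_pod_delete_cmds`. Then append `| bash -x` to run the commands.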

  2. Check the kube-ovn component logs

Go through the logs of the key components one by one (kube-ovn-controller, kube-ovn-cni, ovs and so on) and look for problems.

If the logs show that something failed to start, use the kube-ovn port list and the OVN architecture to narrow down where the problem is: kubeovn.github.io/docs/v1.12.… www.ovn.org/en/architec…

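Port reference:

| Component | Ports | Purpose |
| --- | --- | --- |
| ovn-central | 6641/tcp, 6642/tcp, 6643/tcp, 6644/tcp | ovn-db and raft server listening ports |
| ovs-ovn | Geneve 6081/udp, STT 7471/tcp, Vxlan 4789/udp | Tunnel ports |
| kube-ovn-controller | 10660/tcp | Metrics listening port |
| kube-ovn-daemon | 10665/tcp | Metrics listening port |
| kube-ovn-monitor | 10661/tcp | Metrics listening port |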
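When a log points at one of these ports, a quick TCP probe from another node tells you whether anything is listening there at all. The sketch below assumes bash and coreutils `timeout` are installed, because it relies on bash's `/dev/tcp`. `check_tcp` and `scan_tcp_ports` are hypothetical helpers. The UDP tunnel ports in the table cannot be probed this way.

```shell
# Return 0 if a TCP connection to HOST:PORT succeeds within 2 seconds.
# Uses bash's /dev/tcp pseudo-device, so bash and coreutils `timeout`
# must be installed on the machine running the check.
check_tcp() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Print "open" or "closed" for each TCP port given after the host.
# UDP ports (Geneve 6081, VXLAN 4789) cannot be probed this way.
scan_tcp_ports() {
  _host=$1
  shift
  for _port in "$@"; do
    if check_tcp "$_host" "$_port"; then
      echo "$_host:$_port open"
    else
      echo "$_host:$_port closed"
    fi
  done
}
```

Two examples:

- `scan_tcp_ports 10.5.32.21 6641 6642 6643 6644` probes one ovn-central member.
- `scan_tcp_ports <node IP> 10665` probes kube-ovn-daemon on a node.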

                                     CMS(neutron|kube-ovn-controller)
                                          |
                                          |
                              +-----------|-----------+
                              |           |           |
                              |     OVN/CMS Plugin    |
                              |           |           |
                              |           |           |
                              |   OVN Northbound DB   |
                              |           |           |
                              |           |           |
                              |       ovn-northd      |
                              |           |           |
                              +-----------|-----------+
                                          |
                                          |
                                +-------------------+
                                | OVN Southbound DB |
                                +-------------------+
                                          |
                                          |
                       +------------------+------------------+
                       |                  |                  |
         HV 1          |                  |    HV n          |
       +---------------|---------------+  .  +---------------|---------------+
       |               |               |  .  |               |               |
       |        ovn-controller         |  .  |        ovn-controller         |
       |         |          |          |  .  |         |          |          |
       |         |          |          |     |         |          |          |
       |  ovs-vswitchd   ovsdb-server  |     |  ovs-vswitchd   ovsdb-server  |
       |                               |     |                               |
       +-------------------------------+     +-------------------------------+
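Reading the diagram top-down gives a checking order:

1. The Northbound DB.
2. The Southbound DB, where each node's ovn-controller registers a chassis.
3. The OVS instance on each node.

The sketch below assumes the `kubectl ko` plugin that Step 5 of install.sh adds to the PATH is available. `ovn_layer_check` and the `KUBECTL` override are hypothetical. Set `KUBECTL=echo` for a dry run that prints the commands instead of running them.

```shell
# Walk the OVN layers from the diagram top-down, one command per layer.
# KUBECTL is left unquoted on purpose so it may carry extra flags;
# set KUBECTL=echo to print the commands instead of running them.
ovn_layer_check() {
  _kubectl=${KUBECTL:-kubectl}
  $_kubectl ko nbctl show              # OVN Northbound DB: logical switches/routers
  $_kubectl ko sbctl show              # OVN Southbound DB: one chassis per node
  for _node in "$@"; do
    $_kubectl ko vsctl "$_node" show   # per-node ovsdb-server / ovs-vswitchd
  done
}
```

For example, run `ovn_layer_check node-k8s-ctrl-1 node-k8s-ctrl-2`. If a node is missing from the chassis list in `sbctl show`, that corresponds to the "failed to find node chassis" error in the log below.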


# For example, I found that kube-ovn-controller kept restarting

I0301 11:25:04.002139       7 ipam.go:79] allocate v4 10.5.49.0 mac 00:00:00:CE:D0:68 for snat-qa-hongkong
I0301 11:25:04.002177       7 ipam.go:79] allocate v4 100.64.0.2 mac 00:00:00:1B:E3:72 for node-k8s-ctrl-1
I0301 11:25:04.002227       7 ipam.go:79] allocate v4 100.64.0.4 mac 00:00:00:1E:D1:6C for node-k8s-ctrl-2
I0301 11:25:04.002269       7 ipam.go:79] allocate v4 100.64.0.3 mac 00:00:00:E6:34:24 for node-k8s-ctrl-3
I0301 11:25:04.002304       7 init.go:464] take 0.57 seconds to initialize IPAM
W0301 11:25:04.011181       7 ovn-sbctl.go:46] ovn-sbctl command error: ovn-sbctl --timeout=60 --db=tcp:[10.5.32.21]:6642,tcp:[10.5.32.22]:6642,tcp:[10.5.32.23]:6642 --format=csv --no-heading --data=bare --columns=name find chassis name=678b8a34-a715-4f20-8204-93c4728513a4 in 8ms
E0301 11:25:04.011263       7 init.go:823] failed to check chassis exist: failed to find node chassis 678b8a34-a715-4f20-8204-93c4728513a4, ovn-sbctl: tcp:[10.5.32.21]:6642,tcp:[10.5.32.22]:6642,tcp:[10.5.32.23]:6642: database connection failed (Connection refused)
, "exit status 1"
E0301 11:25:04.011300       7 klog.go:10] "failed to initialize node chassis" err=<
    failed to find node chassis 678b8a34-a715-4f20-8204-93c4728513a4, ovn-sbctl: tcp:[10.5.32.21]:6642,tcp:[10.5.32.22]:6642,tcp:[10.5.32.23]:6642: database connection failed (Connection refused)
    , "exit status 1"
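Each of these error lines carries three DB addresses, so it helps to reduce the log to the set of endpoints that refused connections. The sketch below assumes GNU grep (for `-o`) and the `tcp:[IP]:PORT` address format shown above. `refused_db_endpoints` is a hypothetical helper.

```shell
# Read kube-ovn-controller logs on stdin and print each OVN DB endpoint
# (tcp:[IP]:PORT) that appears on a "Connection refused" line, once.
refused_db_endpoints() {
  grep 'Connection refused' | grep -oE 'tcp:\[[0-9a-fA-F.:]+]:[0-9]+' | LC_ALL=C sort -u
}
```

For example, run `kubectl -n kube-system logs deploy/kube-ovn-controller | refused_db_endpoints`. In the log above, all three SB members refuse connections on 6642. That points at ovn-central rather than at the controller itself.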


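# The log shows that once IPAM finished initializing, the controller could not connect to port 6642. That means the SB DB has a problem, so ovn-central is the next place to look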


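# ovn-central can run on a single node or on three nodes. If it originally ran on three nodes, it is best to restore the three-node mode
# After ovn-central was back on three nodes, everything returned to normal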
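To confirm that the three-node mode is really back, check that every ovn-central member reports the same raft cluster. The sketch below rests on two assumptions about the kube-ovn image; verify both in your cluster:

- the pods are labelled `app=ovn-central`;
- the SB control socket is at `/var/run/ovn/ovnsb_db.ctl`.

`sb_raft_status` and the `KUBECTL` override are hypothetical.

```shell
# Show the raft status of the OVN Southbound DB from each given ovn-central
# pod. The control-socket path is an assumption about the kube-ovn image.
# Set KUBECTL=echo to print the commands instead of running them.
sb_raft_status() {
  _kubectl=${KUBECTL:-kubectl}
  for _pod in "$@"; do
    echo "== $_pod"
    $_kubectl exec -n kube-system "$_pod" -- \
      ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
  done
}
```

Run it as `sb_raft_status $(kubectl -n kube-system get pod -l app=ovn-central -o name)`. Each member should report the same leader and list all three servers under `Servers:`. The same check with `ovnnb_db.ctl` and `OVN_Northbound` covers the NB DB.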



# kube-ovn-cni also had a problem: port 10665 was not listening

Readiness probe failed: dial tcp 10.5.32.23:10665: connect: connection refused



If you rebuilt any of the ovn-central databases, wait until ovn-central is back up, then restart the following pods:

kube-ovn-controller ovs-ovn kube-ovn-cni
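The restart can be scripted in the order listed. The sketch below deletes pods instead of using `kubectl rollout restart`, so it also works for DaemonSets whose update strategy is `OnDelete`. It assumes each pod name starts with its component name followed by `-`. `restart_cmds_after_ovn_central` is a hypothetical helper that only prints the commands.

```shell
# Read pod names (as printed by `kubectl get pod -n kube-system -o name`) on
# stdin and print delete commands for kube-ovn-controller, ovs-ovn and
# kube-ovn-cni pods, in that order. The "<component>-" name prefix is an
# assumption about how the pods are named.
restart_cmds_after_ovn_central() {
  _pods=$(cat)
  for _c in kube-ovn-controller ovs-ovn kube-ovn-cni; do
    printf '%s\n' "$_pods" | grep "^pod/$_c-" | while read -r _p; do
      echo "kubectl delete -n kube-system $_p"
    done
  done
}
```

For example: `kubectl get pod -n kube-system -o name | restart_cmds_after_ovn_central | bash -x`.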

  3. Check that networking still works for existing pods

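Is the kube-ovn-pinger pod running normally, and can it reach the gateway?

Node to pods in the default VPC: on the same node and across nodes

Pod to pod in the default VPC: on the same node and across nodes

SNAT access to the Internet; FIP access from outside; DNAT access from outside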
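The in-cluster part of this checklist comes down to "can X ping Y". The sketch below assumes the source pods' images contain `ping`. `ping_matrix`, its input file format and the `KUBECTL` override are hypothetical. The "from outside" FIP and DNAT checks still have to be run from a host outside the cluster.

```shell
# Ping every target IP from every source pod and report OK/FAIL per pair.
# $1: file with one "NAMESPACE/POD" per line (the sources).
# Remaining args: target IPs (pods, nodes, gateway, an external address).
# Set KUBECTL to override the kubectl command.
ping_matrix() {
  _kubectl=${KUBECTL:-kubectl}
  _sources=$1
  shift
  while read -r _src; do
    for _ip in "$@"; do
      # </dev/null keeps kubectl from consuming the while-loop's input
      if $_kubectl exec -n "${_src%%/*}" "${_src#*/}" -- \
          ping -c 2 -W 2 "$_ip" >/dev/null 2>&1 </dev/null; then
        echo "OK   $_src -> $_ip"
      else
        echo "FAIL $_src -> $_ip"
      fi
    done
  done < "$_sources"
}
```

For example, list one pod per node in a sources file. Then pass as targets:

- the node IPs;
- the other pods' IPs;
- the subnet gateway;
- an external address, which exercises SNAT.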