kube-ovn 1.15.0 使用中的错误 log 梳理

0 阅读16分钟

梳理顺序

ovs --> ovn --> kube-ovn

  1. ovs

grep -v INFO ovs-vswitchd.log | grep -v WARN

grep -v INFO ovsdb-server.log


image.png

WARN log 提交 issue:github.com/kubeovn/kub…

只有一条 warn log 值得关注,这条 log,在下述的 ovn-controller 和 ovn-northd 中都有类似的 due to excessive rate 的 warn log。

2026-01-22T07:23:10.064Z|00757|ofproto_dpif_xlate(handler1)|WARN|Dropped 1 log messages in last 30 seconds (most recently, 30 seconds ago) due to excessive rate

2. ovn


# ll
total 21068
drwxr-xr-x  2 nobody nogroup     4096 Jan  5 11:40 ./
drwxrwxr-x 20 root   syslog      4096 Jan 22 15:35 ../
-rw-r-----  1 nobody nogroup 20513283 Jan 22 15:28 ovn-controller.log
-rw-r-----  1 nobody nogroup   930467 Jan 22 15:09 ovn-northd.log
-rw-r-----  1 nobody nogroup    43068 Jan 22 15:08 ovsdb-server-nb.log
-rw-r-----  1 nobody nogroup    49811 Jan 22 15:09 ovsdb-server-sb.log

1. ovn-controller.log


# grep -v INFO ovn-controller.log | grep -v "Connection reset by peer" | grep -v "unknown interface"
2026-01-05T03:40:37.100Z|00012|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (No such file or directory)
2026-01-05T03:40:37.101Z|00016|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (No such file or directory)
2026-01-15T03:32:12.977Z|105013|lflow|WARN|error parsing match "ip4.src == $ovn.vm.jfcs1001_ip4": Syntax error at `$ovn.vm.jfcs1001_ip4' expecting address set name.
2026-01-16T06:32:01.180Z|00288|reconnect|ERR|tcp:[10.210.10.26]:6642: no response to inactivity probe after 10 seconds, disconnecting

# 分析了一下没什么问题

address 建出来之后,应该就没问题了,只是偶现了一下

# k ko sbctl list address_set | grep ovn.vm.jfcs1001_ip4
name                : ovn.vm.jfcs1001_ip4

2. ovn-northd.log


root@jfcs1001:/var/log/ovn# grep -v INFO ovn-northd.log
2026-01-05T03:40:54.202Z|00022|northd|WARN|No path for static route 0.0.0.0/0 on router ovn-cluster; next hop 100.64.0.1
2026-01-06T11:24:47.760Z|00024|jsonrpc|WARN|tcp:[10.210.10.26]:6641: receive error: Connection reset by peer
2026-01-06T11:24:47.760Z|00025|reconnect|WARN|tcp:[10.210.10.26]:6641: connection dropped (Connection reset by peer)
2026-01-06T11:24:52.949Z|00044|jsonrpc|WARN|tcp:[10.210.10.25]:6642: receive error: Connection reset by peer
2026-01-06T11:24:52.949Z|00045|reconnect|WARN|tcp:[10.210.10.25]:6642: connection dropped (Connection reset by peer)
2026-01-09T08:31:16.225Z|02682|northd|WARN|No path for routing policy priority 29010 on router ns8; next hop 10.177.255.253
2026-01-09T10:35:58.571Z|02683|northd|WARN|Dropped 3 log messages in last 7483 seconds (most recently, 7483 seconds ago) due to excessive rate
2026-01-09T10:35:58.571Z|02684|northd|WARN|No path for routing policy priority 100 on router vpc-69606b344630628adbe9a50d; next hop 172.18.1.254
2026-01-09T12:26:22.978Z|02697|northd|WARN|Dropped 5 log messages in last 6624 seconds (most recently, 6624 seconds ago) due to excessive rate
2026-01-09T12:26:22.978Z|02698|northd|WARN|No path for routing policy priority 100 on router vpc-69606b344630628adbe9a50d; next hop 172.18.1.254
2026-01-09T12:26:46.534Z|02708|northd|WARN|Dropped 1 log messages in last 24 seconds (most recently, 24 seconds ago) due to excessive rate
2026-01-09T12:26:46.534Z|02709|northd|WARN|No path for routing policy priority 100 on router vpc-69606b344630628adbe9a50d; next hop 172.18.1.254
2026-01-09T12:27:41.236Z|02740|northd|WARN|Dropped 3 log messages in last 54 seconds (most recently, 52 seconds ago) due to excessive rate
2026-01-09T12:27:41.236Z|02741|northd|WARN|No path for routing policy priority 100 on router vpc-69606b344630628adbe9a50d; next hop 172.18.1.254
2026-01-10T02:25:32.156Z|02765|northd|WARN|Dropped 1 log messages in last 50271 seconds (most recently, 50271 seconds ago) due to excessive rate
2026-01-10T02:25:32.156Z|02766|northd|WARN|No path for routing policy priority 100 on router vpc-69606b344630628adbe9a50d; next hop 172.18.1.254
2026-01-11T06:23:50.760Z|02808|northd|WARN|Dropped 1 log messages in last 100699 seconds (most recently, 100699 seconds ago) due to excessive rate
2026-01-11T06:23:50.760Z|02809|northd|WARN|No path for routing policy priority 100 on router vpc-69606b344630628adbe9a50d; next hop 172.18.1.254
2026-01-11T06:37:49.195Z|00023|northd|WARN|No path for routing policy priority 100 on router vpc-69606b344630628adbe9a50d; next hop 172.18.1.254
2026-01-11T13:51:06.974Z|00037|northd|WARN|Dropped 19 log messages in last 25998 seconds (most recently, 25996 seconds ago) due to excessive rate
2026-01-11T13:51:06.974Z|00038|northd|WARN|No path for routing policy priority 100 on router vpc-69606b344630628adbe9a50d; next hop 172.18.1.254
2026-01-11T13:52:30.905Z|00056|northd|WARN|Dropped 1 log messages in last 84 seconds (most recently, 84 seconds ago) due to excessive rate
2026-01-11T13:52:30.905Z|00057|northd|WARN|No path for routing policy priority 100 on router vpc-69606b344630628adbe9a50d; next hop 172.18.1.254
2026-01-12T11:03:02.026Z|00102|northd|WARN|Dropped 1 log messages in last 76231 seconds (most recently, 76231 seconds ago) due to excessive rate
2026-01-12T11:03:02.026Z|00103|northd|WARN|No path for routing policy priority 100 on router vpc-69606b344630628adbe9a50d; next hop 172.18.1.254
2026-01-12T11:04:49.349Z|00121|northd|WARN|Dropped 1 log messages in last 107 seconds (most recently, 107 seconds ago) due to excessive rate
2026-01-12T11:04:49.349Z|00122|northd|WARN|No path for routing policy priority 100 on router vpc-69606b344630628adbe9a50d; next hop 172.18.1.254
2026-01-12T13:51:45.480Z|00128|northd|WARN|Dropped 1 log messages in last 10017 seconds (most recently, 10017 seconds ago) due to excessive rate
2026-01-12T13:51:45.480Z|00129|northd|WARN|No path for routing policy priority 100 on router vpc-69606b344630628adbe9a50d; next hop 172.18.1.254
2026-01-12T13:53:13.201Z|00139|northd|WARN|Dropped 1 log messages in last 87 seconds (most recently, 87 seconds ago) due to excessive rate
2026-01-12T13:53:13.201Z|00140|northd|WARN|No path for routing policy priority 100 on router vpc-69606b344630628adbe9a50d; next hop 172.18.1.254
2026-01-13T01:26:27.118Z|00147|northd|WARN|Dropped 1 log messages in last 41594 seconds (most recently, 41594 seconds ago) due to excessive rate
2026-01-13T01:26:27.119Z|00148|northd|WARN|No path for routing policy priority 100 on router vpc-69606b344630628adbe9a50d; next hop 172.18.1.254
2026-01-13T15:14:16.666Z|00200|northd|WARN|Dropped 1 log messages in last 49670 seconds (most recently, 49670 seconds ago) due to excessive rate
2026-01-13T15:14:16.666Z|00201|northd|WARN|No path for routing policy priority 100 on router vpc-6960a1844630628adbe9a528; next hop 172.18.1.254
2026-01-13T15:15:41.284Z|00208|northd|WARN|No path for routing policy priority 100 on router vpc-6960a1844630628adbe9a528; next hop 172.18.1.254
2026-01-14T03:20:42.832Z|00224|northd|WARN|No path for routing policy priority 100 on router vpc-6960a1844630628adbe9a528; next hop 172.18.1.254
2026-01-15T05:39:18.776Z|00262|northd|WARN|No path for routing policy priority 100 on router vpc-6960a1844630628adbe9a528; next hop 172.18.1.254
2026-01-16T03:24:21.367Z|00324|reconnect|ERR|tcp:[10.210.10.25]:6641: no response to inactivity probe after 5 seconds, disconnecting
2026-01-16T03:24:21.367Z|00326|reconnect|ERR|tcp:[10.210.10.25]:6642: no response to inactivity probe after 5 seconds, disconnecting
2026-01-16T03:24:39.621Z|00350|reconnect|ERR|tcp:[10.210.10.25]:6641: no response to inactivity probe after 5 seconds, disconnecting
2026-01-16T03:24:39.621Z|00353|reconnect|ERR|tcp:[10.210.10.25]:6642: no response to inactivity probe after 5.01 seconds, disconnecting
2026-01-16T03:27:15.290Z|00376|reconnect|ERR|tcp:[10.210.10.25]:6641: no response to inactivity probe after 5 seconds, disconnecting
2026-01-16T03:27:15.291Z|00378|reconnect|ERR|tcp:[10.210.10.25]:6642: no response to inactivity probe after 5 seconds, disconnecting
2026-01-16T03:29:12.292Z|00384|reconnect|ERR|tcp:[10.210.10.26]:6641: no response to inactivity probe after 5 seconds, disconnecting
2026-01-16T03:29:15.306Z|00394|northd|WARN|No path for routing policy priority 100 on router vpc-6960a1844630628adbe9a528; next hop 172.18.1.254
2026-01-16T03:37:39.935Z|00396|jsonrpc|WARN|tcp:[10.210.10.25]:6641: receive error: Connection reset by peer
2026-01-16T03:37:39.935Z|00397|reconnect|WARN|tcp:[10.210.10.25]:6641: connection dropped (Connection reset by peer)
2026-01-16T03:37:55.249Z|00021|northd|WARN|No path for routing policy priority 100 on router vpc-6960a1844630628adbe9a528; next hop 172.18.1.254
2026-01-16T03:53:07.868Z|00035|northd|WARN|Dropped 9 log messages in last 912 seconds (most recently, 910 seconds ago) due to excessive rate
2026-01-16T03:53:07.868Z|00036|northd|WARN|No path for routing policy priority 100 on router vpc-6960a1844630628adbe9a528; next hop 172.18.1.254
2026-01-16T06:13:16.375Z|00183|reconnect|ERR|tcp:[10.210.10.26]:6642: no response to inactivity probe after 5 seconds, disconnecting
2026-01-16T06:13:18.429Z|00195|reconnect|ERR|tcp:[10.210.10.25]:6641: no response to inactivity probe after 5 seconds, disconnecting
2026-01-16T06:18:15.647Z|00014|reconnect|ERR|tcp:[10.210.10.25]:6641: no response to inactivity probe after 5 seconds, disconnecting
2026-01-18T11:20:39.945Z|01591|jsonrpc|WARN|tcp:[10.210.10.26]:6642: receive error: Connection reset by peer
2026-01-18T11:20:39.945Z|01592|reconnect|WARN|tcp:[10.210.10.26]:6642: connection dropped (Connection reset by peer)
2026-01-21T11:15:04.080Z|00276|northd|WARN|No path for routing policy priority 29010 on router ns6; next hop 10.177.255.253

# No path for routing policy priority 这种有点像还没同步到 datapath 的感觉

3. ovsdb-server-nb.log

没有什么值得注意的 log, 只是默认部署出来的就是选举触发间隔是 5s(这个可能要参考下 etcd 的设计和最佳实践)。


root@jfcs1001:/var/log/ovn# grep -v INFO ovsdb-server-nb.log | grep -v "inactivity probe"
2026-01-06T12:24:18.686Z|00014|raft|WARN|ignoring vote request received after only 4795 ms (minimum election time is 5000 ms)
2026-01-06T12:24:18.688Z|00015|raft|WARN|ignoring vote request received after only 4797 ms (minimum election time is 5000 ms)
2026-01-11T06:37:42.441Z|00014|raft|WARN|ignoring vote request received after only 4380 ms (minimum election time is 5000 ms)
2026-01-16T03:24:25.304Z|00054|jsonrpc|WARN|tcp:10.210.10.25:35980: receive error: Connection reset by peer
2026-01-16T03:24:25.305Z|00055|reconnect|WARN|tcp:10.210.10.25:35980: connection dropped (Connection reset by peer)
2026-01-16T03:24:41.817Z|00079|jsonrpc|WARN|tcp:10.210.10.25:56808: receive error: Connection reset by peer
2026-01-16T03:24:41.817Z|00080|reconnect|WARN|tcp:10.210.10.25:56808: connection dropped (Connection reset by peer)
2026-01-16T03:27:20.492Z|00099|raft|WARN|ignoring vote request received after only 1617 ms (minimum election time is 5000 ms)
2026-01-16T03:29:13.302Z|00105|raft|WARN|ignoring vote request received after only 964 ms (minimum election time is 5000 ms)
2026-01-16T03:37:49.508Z|00012|raft|WARN|ignoring vote request received after only 4812 ms (minimum election time is 5000 ms)
2026-01-16T03:37:49.509Z|00013|raft|WARN|ignoring vote request received after only 4814 ms (minimum election time is 5000 ms)
2026-01-16T06:13:14.727Z|00018|cooperative_multitasking|WARN|../ovsdb/jsonrpc-server.c:698: yield for raft_run(0x5600d65634a0): elapsed(2503) >= threshold(1832), overrun: 671
2026-01-16T06:31:41.664Z|00033|raft|WARN|ignoring vote request received as leader
2026-01-16T06:31:46.748Z|00034|raft|WARN|ignoring vote request received as leader
2026-01-16T06:32:47.164Z|00047|raft|WARN|ignoring vote request received as leader
2026-01-16T06:33:31.799Z|00049|jsonrpc|WARN|tcp:10.210.10.26:61806: receive error: Connection reset by peer
2026-01-16T06:33:31.799Z|00050|reconnect|WARN|tcp:10.210.10.26:61806: connection dropped (Connection reset by peer)
2026-01-18T11:20:48.298Z|00012|raft|WARN|ignoring vote request received after only 4166 ms (minimum election time is 5000 ms)

4. ovsdb-server-sb.log

和上述 类似


root@jfcs1001:/var/log/ovn# grep -v INFO ovsdb-server-nb.log | grep -v "inactivity probe" | grep -v "inactivity probe after"
2026-01-06T12:24:18.686Z|00014|raft|WARN|ignoring vote request received after only 4795 ms (minimum election time is 5000 ms)
2026-01-06T12:24:18.688Z|00015|raft|WARN|ignoring vote request received after only 4797 ms (minimum election time is 5000 ms)
2026-01-11T06:37:42.441Z|00014|raft|WARN|ignoring vote request received after only 4380 ms (minimum election time is 5000 ms)
2026-01-16T03:24:25.304Z|00054|jsonrpc|WARN|tcp:10.210.10.25:35980: receive error: Connection reset by peer
2026-01-16T03:24:25.305Z|00055|reconnect|WARN|tcp:10.210.10.25:35980: connection dropped (Connection reset by peer)
2026-01-16T03:24:41.817Z|00079|jsonrpc|WARN|tcp:10.210.10.25:56808: receive error: Connection reset by peer
2026-01-16T03:24:41.817Z|00080|reconnect|WARN|tcp:10.210.10.25:56808: connection dropped (Connection reset by peer)
2026-01-16T03:27:20.492Z|00099|raft|WARN|ignoring vote request received after only 1617 ms (minimum election time is 5000 ms)
2026-01-16T03:29:13.302Z|00105|raft|WARN|ignoring vote request received after only 964 ms (minimum election time is 5000 ms)
2026-01-16T03:37:49.508Z|00012|raft|WARN|ignoring vote request received after only 4812 ms (minimum election time is 5000 ms)
2026-01-16T03:37:49.509Z|00013|raft|WARN|ignoring vote request received after only 4814 ms (minimum election time is 5000 ms)
2026-01-16T06:13:14.727Z|00018|cooperative_multitasking|WARN|../ovsdb/jsonrpc-server.c:698: yield for raft_run(0x5600d65634a0): elapsed(2503) >= threshold(1832), overrun: 671
2026-01-16T06:31:41.664Z|00033|raft|WARN|ignoring vote request received as leader
2026-01-16T06:31:46.748Z|00034|raft|WARN|ignoring vote request received as leader
2026-01-16T06:32:47.164Z|00047|raft|WARN|ignoring vote request received as leader
2026-01-16T06:33:31.799Z|00049|jsonrpc|WARN|tcp:10.210.10.26:61806: receive error: Connection reset by peer
2026-01-16T06:33:31.799Z|00050|reconnect|WARN|tcp:10.210.10.26:61806: connection dropped (Connection reset by peer)
2026-01-18T11:20:48.298Z|00012|raft|WARN|ignoring vote request received after only 4166 ms (minimum election time is 5000 ms)

1. kube-ovn

发现了一些使用上的问题

###########################  我感觉这个 pod 获取 nad 的方式有问题,但 gw pod 又能创建成功。但又总是报错

是不是创建 pod 中关于 获取 nad 的逻辑中有不一致的逻辑?

E0121 15:07:51.660616       7 pod.go:1800] failed to get net-attach-def real-eip, networkattachmentdefinition.k8s.cni.cncf.io "real-eip" not found
E0121 15:07:51.660664       7 pod.go:1595] networkattachmentdefinition.k8s.cni.cncf.io "real-eip" not found
E0121 15:07:51.660680       7 pod.go:232] invalid pod net: networkattachmentdefinition.k8s.cni.cncf.io "real-eip" not found
E0121 15:07:51.660825       7 pod.go:1800] failed to get net-attach-def ovn-vpc-external-network, networkattachmentdefinition.k8s.cni.cncf.io "ovn-vpc-external-network" not found




###########################  我感觉这个 subnet 获取 nad 的方式也有类似问题,但 gw pod 又能创建成功。但又总是报错

E0121 15:07:52.043854       7 subnet.go:2940] failed to get NAD kube-system/fake-eip for subnet fake-eip: networkattachmentdefinition.k8s.cni.cncf.io "fake-eip" not found
E0121 15:07:52.043924       7 subnet.go:552] failed to get NAD kube-system/fake-eip for subnet fake-eip: networkattachmentdefinition.k8s.cni.cncf.io "fake-eip" not found
E0121 15:07:52.043991       7 controller.go:1546] "Unhandled Error" err="error syncing add/update subnet \"fake-eip\": failed to get NAD kube-system/fake-eip for subnet fake-eip: networkattachmentdefinition.k8s.cni.cncf.io \"fake-eip\" not found, requeuing" logger="UnhandledError"
E0121 15:07:52.048948       7 pod.go:1800] failed to get net-attach-def real-eip, networkattachmentdefinition.k8s.cni.cncf.io "real-eip" not found
E0121 15:07:52.048976       7 pod.go:445] networkattachmentdefinition.k8s.cni.cncf.io "real-eip" not found
E0121 15:07:52.048998       7 subnet.go:1318] failed to get pod nets networkattachmentdefinition.k8s.cni.cncf.io "real-eip" not found
E0121 15:07:52.049116       7 pod.go:1800] failed to get net-attach-def ovn-vpc-external-network, networkattachmentdefinition.k8s.cni.cncf.io "ovn-vpc-external-network" not found
E0121 15:07:52.049132       7 pod.go:445] networkattachmentdefinition.k8s.cni.cncf.io "ovn-vpc-external-network" not found
E0121 15:07:52.049148       7 subnet.go:1318] failed to get pod nets networkattachmentdefinition.k8s.cni.cncf.io "ovn-vpc-external-network" not found



########################### qos 不能更新么
E0122 16:11:46.151213       7 qos_policy.go:326] qos policy 10.210.10.203 is being used
E0122 16:11:46.151275       7 controller.go:1546] "Unhandled Error" err="error syncing update qos policy \"10.210.10.203\": qos policy 10.210.10.203 is being used, requeuing" logger="UnhandledError"
E0122 16:11:47.152190       7 qos_policy.go:326] qos policy 10.210.10.203 is being used
E0122 16:11:47.152264       7 controller.go:1546] "Unhandled Error" err="error syncing update qos policy \"10.210.10.203\": qos policy 10.210.10.203 is being used, requeuing" logger="UnhandledError"
E0122 16:11:48.153422       7 qos_policy.go:326] qos policy 10.210.10.203 is being used
E0122 16:11:48.153509       7 controller.go:1546] "Unhandled Error" err="error syncing update qos policy \"10.210.10.203\": qos policy 10.210.10.203 is being used, requeuing" logger="UnhandledError"


2. kube-ovn-cni

没问题


root@jfcs1001:/var/log/kube-ovn# k logs -f -n kube-system                    kube-ovn-cni-sqqv8 | grep -v I01
Defaulted container "cni-server" out of: cni-server, hostpath-init (init), install-cni (init)
W0122 14:09:44.045019   53650 handler.go:514] Pod vpc-nat-gw-ns6-bgp-gw-0 not found, proceeding with NIC deletion using ContainerID and NetNs
W0122 14:09:44.164814   53650 handler.go:514] Pod vpc-nat-gw-ns7-bgp-gw-0 not found, proceeding with NIC deletion using ContainerID and NetNs


2. kube-bgp-spaker

有一些 bug

比如日志没有同步到 node