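Environment: a k8s cluster built across the public network. Pods can ping each other, but a curl request to the nginx service of a pod deployed on another node fails.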
[root@master ~]# tcpdump -i eth0 -s0 -nnn port 8472
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:41:31.011945 IP 10.120.98.121.43580 > 139.19.17.11.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.244.0.0 > 10.244.1.4: ICMP echo request, id 14, seq 1, length 1408
16:41:31.012465 IP 139.19.17.11.58717 > 10.120.98.121.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.244.1.4 > 10.244.0.0: ICMP echo reply, id 14, seq 1, length 1408
17:00:33.161639 IP 10.120.98.121.33899 > 139.19.17.11.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.244.0.0.54052 > 10.244.1.4.80: Flags [S], seq 308308049, win 28200, options [mss 1410,sackOK,TS val 1449320609 ecr 0,nop,wscale 7], length 0
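As the capture shows, the VXLAN-encapsulated ping packet gets a reply, but the encapsulated TCP packet (the SYN to port 80) gets no response.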
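To make the "ping is answered, TCP is not" observation easier to spot in a longer capture, the inner packets can be counted by type. This is only a rough sketch: the function name summarize_inner is made up for illustration, and it assumes the tcpdump text format shown in the capture above (one line per encapsulated packet).

```shell
# Summarize the encapsulated (inner) packets from tcpdump text on stdin.
# Hypothetical helper; relies on the text format shown in the capture above.
summarize_inner() {
    awk '
        /ICMP echo request/ { req++ }      # inner ping request
        /ICMP echo reply/   { rep++ }      # inner ping reply
        /Flags \[S\],/      { syn++ }      # inner TCP SYN
        /Flags \[S\.\],/    { synack++ }   # inner TCP SYN-ACK
        END {
            printf "icmp_request=%d icmp_reply=%d tcp_syn=%d tcp_synack=%d\n",
                   req, rep, syn, synack
        }
    '
}
```

Possible usage on the master node: tcpdump -i eth0 -nn -c 20 port 8472 2>/dev/null | summarize_inner. A non-zero tcp_syn with tcp_synack=0 matches the symptom described here.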
Capture again with a more verbose command:
[root@master ~]# tcpdump -vvv -XX -nn -S -i eth0 port 8472
dropped privs to tcpdump
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:07:43.426305 IP (tos 0x0, ttl 64, id 31236, offset 0, flags [none], proto UDP (17), length 110)
10.120.98.121.32939 > 139.19.17.11.8472: [bad udp cksum 0x3899 -> 0xc110!] OTV, flags [I] (0x08), overlay 0, instance 1
IP (tos 0x0, ttl 64, id 6713, offset 0, flags [DF], proto TCP (6), length 60)
10.244.0.0.55284 > 10.244.1.4.80: Flags [S], cksum 0xac36 (correct), seq 3067985922, win 28200, options [mss 1410,sackOK,TS val 1449750873 ecr 0,nop,wscale 7], length 0
0x0000: 0254 99c4 89cb 5254 99fb 6f34 0800 4500 .T....RT..o4..E.
0x0010: 006e 7a04 0000 4011 5b55 0a78 6279 8bc6 .nz...@.[U.xby..
0x0020: ac6e 80ab 2118 005a 3899 0800 0000 0000 .n..!..Z8.......
0x0030: 0100 9e4d 923a eb6a 4a68 793f 1dbf 0800 ...M.:.jJhy?....
0x0040: 4500 003c 1a39 4000 4006 0998 0af4 0000 E..<.9@.@.......
0x0050: 0af4 0104 d7f4 0050 b6dd c002 0000 0000 .......P........
0x0060: a002 6e28 ac36 0000 0204 0582 0402 080a ..n(.6..........
0x0070: 5669 7159 0000 0000 0103 0307 ViqY........
17:09:19.262775 IP (tos 0x0, ttl 61, id 33094, offset 0, flags [none], proto UDP (17), length 134)
139.19.17.11.58717 > 10.120.98.121.8472: [udp sum ok] OTV, flags [I] (0x08), overlay 0, instance 1
IP (tos 0x0, ttl 63, id 3494, offset 0, flags [none], proto ICMP (1), length 84)
10.244.1.4 > 10.244.0.0: ICMP echo reply, id 21, seq 1, length 64
0x0000: 5254 99fb 6f34 0254 99c4 89cb 0800 4500 RT..o4.T......E.
0x0010: 0086 8146 0000 3d11 56fb 8bc6 ac6e 0a78 ...F..=.V....n.x
0x0020: 6279 e55d 2118 0072 4514 0800 0000 0000 by.]!..rE.......
0x0030: 0100 4a68 793f 1dbf 9e4d 923a eb6a 0800 ..Jhy?...M.:.j..
0x0040: 4500 0054 0da6 0000 3f01 5718 0af4 0104 E..T....?.W.....
0x0050: 0af4 0000 0000 2dd5 0015 0001 bfde 4c64 ......-.......Ld
0x0060: 0000 0000 03ff 0300 0000 0000 1011 1213 ................
0x0070: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
0x0080: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
0x0090: 3435 3637
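The capture above shows that the VXLAN packet carrying the TCP request has a bad outer UDP checksum (bad udp cksum 0x3899 -> 0xc110!), while the packet carrying the ICMP reply passes the check (udp sum ok). That is why the request does not succeed.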
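The checksum verdicts in a verbose capture can be tallied with a few lines of shell. This is a minimal sketch: udp_cksum_stats is a hypothetical helper name, and it only matches the "[bad udp cksum" and "[udp sum ok]" markers that tcpdump printed above. Keep in mind that a capture taken on the sending host can report bad checksums whenever checksum offload is enabled, so treat the count as a hint rather than proof.

```shell
# Count outer UDP checksum verdicts in `tcpdump -vvv` text read on stdin.
# Hypothetical helper; matches the markers seen in the capture above.
udp_cksum_stats() {
    awk '
        /\[bad udp cksum/ { bad++ }   # tcpdump flagged the UDP checksum as wrong
        /\[udp sum ok\]/  { ok++ }    # tcpdump verified the UDP checksum
        END { printf "bad=%d ok=%d\n", bad, ok }
    '
}
```

Possible usage: tcpdump -vvv -nn -c 20 -i eth0 port 8472 2>/dev/null | udp_cksum_stats.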
Solution
First, temporarily turn off TX checksum offload on flannel.1
# Check the current checksum offload settings
[root@master ~]# ethtool -k flannel.1 | grep checksum
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
# Turn off TX checksum offload
[root@master ~]# ethtool -K flannel.1 tx-checksum-ip-generic off
Actual changes:
tx-checksum-ip-generic: off
tx-tcp-segmentation: off [not requested]
tx-tcp-ecn-segmentation: off [not requested]
tx-tcp-mangleid-segmentation: off [not requested]
tx-tcp6-segmentation: off [not requested]
# Check again
[root@master ~]# ethtool -k flannel.1 | grep checksum
rx-checksumming: on
tx-checksumming: off
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: off
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
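Instead of reading the ethtool listing by eye, the state of a single feature can be extracted in a script. The following is a small sketch under the assumption that the output has the "name: on|off" shape shown above; feature_state is a hypothetical helper name, not part of ethtool.

```shell
# Print "on" or "off" for one offload feature, given `ethtool -k` output on stdin.
# Hypothetical helper; $1 is the feature name, e.g. tx-checksum-ip-generic.
feature_state() {
    awk -v name="$1" '
        {
            sub(/^[ \t]+/, "")                # ethtool may indent feature lines
            if (index($0, name ":") == 1) {   # line starts with the exact feature name
                print $2                      # second field is the state
                exit
            }
        }
    '
}
```

Possible usage: ethtool -k flannel.1 | feature_state tx-checksum-ip-generic, which should print off once the change above has been applied.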
Make the change permanent
cat >/etc/systemd/system/k8s-flannel-tx-checksum-off.service <<EOF
[Unit]
Description=Turn off checksum offload on flannel.1
After=sys-devices-virtual-net-flannel.1.device
[Install]
WantedBy=sys-devices-virtual-net-flannel.1.device
[Service]
Type=oneshot
ExecStart=/sbin/ethtool -K flannel.1 tx-checksum-ip-generic off
EOF
# Enable the service at boot and start it now
systemctl enable k8s-flannel-tx-checksum-off
systemctl start k8s-flannel-tx-checksum-off