How to find the peer interface of a VEth device (the VEth peer)

Preface

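Anyone familiar with container networking knows that containers communicate over VEth devices: one end of a VEth pair sits on the host and the other end inside the container, which connects the host's network namespace to the container's network namespace. The VEth pair acts as a virtual network cable between the two namespaces.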

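The host end of this "cable" shows up as a network interface on the host. It is visible directly in the output of ip a, usually named vethXXX (the device type can also be checked with ip -d link show <interface name>). But faced with a list of interfaces that all start with veth followed by a random string, it is easy to get lost: how do these interfaces map to the interfaces inside the containers, and which container is at the other end of each virtual cable?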

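Below are two methods I have worked out. The first is the officially recommended approach; the second is an idea that suddenly came to me 💡, so I wrote it down right away. I wonder whether anyone else has had the same thought : )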

Test environment

Two Pods (containers) running on the same node. You can also just create two containers with docker run; the exact setup does not matter here.

[root@10-10-40-84 ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
busybox 1/1 Running 0 47m 10.222.1.3 10-10-40-93 <none> <none>
busybox2 1/1 Running 0 45m 10.222.1.4 10-10-40-93 <none> <none>
[root@10-10-40-84 ~]#

Output of docker ps on the node:

[root@10-10-40-93 ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
70eebe80845b af2f74c517aa "sleep 3600" 31 minutes ago Up 31 minutes k8s_busybox_busybox2_default_247b9265-59f5-11e9-9c05-faf63cb42000_1
2060ba52f6ed af2f74c517aa "sleep 3600" 34 minutes ago Up 34 minutes k8s_busybox_busybox_default_c7bf5185-59f4-11e9-9c05-faf63cb42000_1
bcb7f08f8707 registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1 "/pause" 2 hours ago Up 2 hours k8s_POD_busybox2_default_247b9265-59f5-11e9-9c05-faf63cb42000_0
9a23d437bf97 registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1 "/pause" 2 hours ago Up 2 hours k8s_POD_busybox_default_c7bf5185-59f4-11e9-9c05-faf63cb42000_0
[root@10-10-40-93 ~]#

First, the output of ip a on the host where the two containers run:

[root@10-10-40-93 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:e7:af:c1:b5:00 brd ff:ff:ff:ff:ff:ff
    inet 10.10.40.93/24 brd 10.10.40.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f8e7:afff:fec1:b500/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:83:4d:a3:4e:01 brd ff:ff:ff:ff:ff:ff
    inet 172.16.130.91/24 brd 172.16.130.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::f883:4dff:fea3:4e01/64 scope link
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:42:ad:df:4f brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
    link/ether be:1f:af:bb:6e:f5 brd ff:ff:ff:ff:ff:ff
    inet 10.222.1.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::bc1f:afff:febb:6ef5/64 scope link
       valid_lft forever preferred_lft forever
6: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP qlen 1000
    link/ether e6:36:8b:52:21:62 brd ff:ff:ff:ff:ff:ff
    inet 10.222.1.1/24 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::e436:8bff:fe52:2162/64 scope link
       valid_lft forever preferred_lft forever
8: vethf0808a3e@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP
    link/ether b2:2f:ed:b3:d1:66 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::b02f:edff:feb3:d166/64 scope link
       valid_lft forever preferred_lft forever
9: vethd5962a6c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP
    link/ether be:14:67:cb:39:79 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::bc14:67ff:fecb:3979/64 scope link
       valid_lft forever preferred_lft forever
[root@10-10-40-93 ~]#

The host has two VEth interfaces, vethf0808a3e and vethd5962a6c. ip -d link show confirms that both are indeed of type veth:

[root@10-10-40-93 ~]# ip -d link show vethf0808a3e
8: vethf0808a3e@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT
    link/ether b2:2f:ed:b3:d1:66 brd ff:ff:ff:ff:ff:ff link-netnsid 1 promiscuity 1
    veth
    bridge_slave state forwarding priority 32 cost 2 hairpin on guard off root_block off fastleave off learning on flood on port_id 0x8002 port_no 0x2 designated_port 32770 designated_cost 0 designated_bridge 8000.e6:36:8b:52:21:62 designated_root 8000.e6:36:8b:52:21:62 hold_timer 0.00 message_age_timer 0.00 forward_delay_timer 0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on addrgenmode eui64
[root@10-10-40-93 ~]# ip -d link show vethd5962a6c
9: vethd5962a6c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT
    link/ether be:14:67:cb:39:79 brd ff:ff:ff:ff:ff:ff link-netnsid 2 promiscuity 1
    veth
    bridge_slave state forwarding priority 32 cost 2 hairpin on guard off root_block off fastleave off learning on flood on port_id 0x8003 port_no 0x3 designated_port 32771 designated_cost 0 designated_bridge 8000.e6:36:8b:52:21:62 designated_root 8000.e6:36:8b:52:21:62 hold_timer 0.00 message_age_timer 0.00 forward_delay_timer 0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on addrgenmode eui64
[root@10-10-40-93 ~]#
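
Checking interfaces one by one gets tedious on a busy node. Here is a minimal sketch that picks every veth device out of ip -d link show output. It assumes the output layout shown above, where the device type appears as the first word on a detail line under each interface. list_veth is a helper name made up for this example, not an iproute2 command.

```shell
# Print the name of every interface whose detail lines report type "veth".
# Reads `ip -d link show` output on stdin.
# list_veth is a hypothetical helper name used only in this sketch.
list_veth() {
    awk '
        # Header line, e.g. "8: vethf0808a3e@if3: <...>": remember the name.
        /^[0-9]+:/ {
            name = $2
            sub(/@.*/, "", name)   # drop the "@if3:" suffix if present
            sub(/:$/, "", name)    # drop the trailing colon otherwise
        }
        # Detail line whose first word is the device type "veth".
        $1 == "veth" { print name }'
}

# Usage on a live host (assumption: iproute2 is installed):
#   ip -d link show | list_veth
```

On reasonably recent iproute2 versions, ip link show type veth should do the same filtering directly.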

brctl show shows that both VEth interfaces are attached to the bridge cni0:

[root@10-10-40-93 ~]# brctl show
bridge name	bridge id	STP enabled	interfaces
cni0	8000.e6368b522162	no	vethd5962a6c
       vethf0808a3e
docker0	8000.024242addf4f	no
[root@10-10-40-93 ~]#

Finding the VEth peer through the interface indexes in the ip a output

Run ip a in each of the two Pods (containers) to see the network interfaces inside them:

[root@10-10-40-84 ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
busybox 1/1 Running 0 47m 10.222.1.3 10-10-40-93 <none> <none>
busybox2 1/1 Running 0 45m 10.222.1.4 10-10-40-93 <none> <none>
[root@10-10-40-84 ~]# kubectl exec -it busybox -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue
    link/ether a6:d1:b0:67:6a:55 brd ff:ff:ff:ff:ff:ff
    inet 10.222.1.3/24 scope global eth0
       valid_lft forever preferred_lft forever
[root@10-10-40-84 ~]# kubectl exec -it busybox2 -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue
    link/ether 5a:d8:0d:16:64:5e brd ff:ff:ff:ff:ff:ff
    inet 10.222.1.4/24 scope global eth0
       valid_lft forever preferred_lft forever
[root@10-10-40-84 ~]#

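Inside busybox the interface appears as eth0@if8, which points to the host interface with index 8, vethf0808a3e. Inside busybox2 the interface appears as eth0@if9, which points to the host interface with index 9, vethd5962a6c. Conversely, both host interfaces are shown with the suffix @if3, because eth0 has index 3 inside each container. To verify this with a packet capture, ping out from the busybox container and check on which host VEth interface the ICMP packets show up: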

[root@10-10-40-84 ~]# kubectl exec -it busybox sh
/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue
    link/ether a6:d1:b0:67:6a:55 brd ff:ff:ff:ff:ff:ff
    inet 10.222.1.3/24 scope global eth0
       valid_lft forever preferred_lft forever
/ # ping 10.222.1.4
PING 10.222.1.4 (10.222.1.4): 56 data bytes
^C
--- 10.222.1.4 ping statistics ---
49 packets transmitted, 0 packets received, 100% packet loss
/ #
[root@10-10-40-93 ~]# tcpdump -nn -i vethf0808a3e icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vethf0808a3e, link-type EN10MB (Ethernet), capture size 262144 bytes
21:36:23.262196 IP 10.222.1.3 > 10.222.1.4: ICMP echo request, id 5888, seq 19, length 64
21:36:24.262413 IP 10.222.1.3 > 10.222.1.4: ICMP echo request, id 5888, seq 20, length 64
21:36:25.262565 IP 10.222.1.3 > 10.222.1.4: ICMP echo request, id 5888, seq 21, length 64
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel
[root@10-10-40-93 ~]# tcpdump -nn -i vethd5962a6c icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vethd5962a6c, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
[root@10-10-40-93 ~]#

Only vethf0808a3e, the host interface with index 8, captured the ICMP packets, which confirms the mapping.
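
The index matching used in this section can be scripted. Here is a minimal sketch that works on ip a output as text: one helper extracts N from the container-side name eth0@ifN, the other looks up the host interface with index N. peer_index and name_for_index are helper names made up for this example, and the parsing assumes the ip a layout shown in the transcripts above.

```shell
# Both helpers are hypothetical names for this sketch, not iproute2 commands.

# Read container-side `ip a` output on stdin and print the peer index N
# of the interface named $1 (for "3: eth0@if8: ..." and $1=eth0: 8).
peer_index() {
    awk -v ifname="$1" '
        /^[0-9]+:/ {
            split($2, parts, "@")            # "eth0@if8:" -> "eth0", "if8:"
            if (parts[1] == ifname && parts[2] ~ /^if[0-9]+:$/) {
                idx = parts[2]
                gsub(/[^0-9]/, "", idx)      # "if8:" -> "8"
                print idx
            }
        }'
}

# Read host-side `ip a` output on stdin and print the name of the
# interface whose index is $1.
name_for_index() {
    awk -v want="$1" '
        /^[0-9]+:/ {
            idx = $1
            sub(/:$/, "", idx)               # "8:" -> "8"
            if (idx == want) {
                name = $2
                sub(/@.*/, "", name)         # "vethf0808a3e@if3:" -> name
                sub(/:$/, "", name)
                print name
            }
        }'
}

# Usage (assumption: run where kubectl can reach the Pod, then on its node):
#   idx=$(kubectl exec busybox -- ip a | peer_index eth0)
#   ip a | name_for_index "$idx"
```

Interface indexes are only meaningful within one network namespace, so the second lookup has to run against the host's ip a output. The same peer index can also be read inside the container from /sys/class/net/eth0/iflink.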

Finding the VEth peer through the Linux Bridge forwarding table

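The other method is more of a trick: use the MAC address table kept by the Linux Bridge to find the peer of a VEth device. In this setup the host end of every VEth pair is attached to a Linux Bridge, and since the bridge is the middleman that forwards packets, it has to know what is reachable through each of its ports. How else could it forward anything?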

  • Look up the mapping between MAC addresses and bridge ports on the Linux Bridge
[root@10-10-40-93 ~]# brctl show
bridge name	bridge id	STP enabled	interfaces
cni0	8000.e6368b522162	no	vethd5962a6c
       vethf0808a3e
docker0	8000.024242addf4f	no
[root@10-10-40-93 ~]#
[root@10-10-40-93 ~]# brctl showmacs cni0
port no	mac addr	is local?	ageing timer
  3	5a:d8:0d:16:64:5e	no	80.94
  2	a6:d1:b0:67:6a:55	no	72.95
  2	b2:2f:ed:b3:d1:66	yes	0.00
  2	b2:2f:ed:b3:d1:66	yes	0.00
  3	be:14:67:cb:39:79	yes	0.00
  3	be:14:67:cb:39:79	yes	0.00
[root@10-10-40-93 ~]#

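The bridge has two ports, port 2 and port 3. The first two entries, whose is local? flag is no, are the MAC addresses of the container-side ends of the VEth pairs, learned by the bridge; the entries marked yes are the MAC addresses of the host-side VEth interfaces attached to the bridge. Entries with the same port number belong to the same VEth pair. Comparing these MAC addresses with the ip a output on the host and in the containers gives the same result as the first method, and it can be verified with a packet capture in the same way : )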
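
The MAC comparison can also be scripted. Here is a minimal sketch under two assumptions: the brctl showmacs output has the four-column layout shown above, and the container's MAC has been learned by the bridge recently (non-local entries age out, as the ageing timer column suggests). host_mac_for and name_for_mac are helper names made up for this example.

```shell
# Both helpers are hypothetical names for this sketch.

# Read `brctl showmacs <bridge>` output on stdin. Given a container MAC in $1
# (an entry with "is local?" = no), print the local MAC on the same port,
# i.e. the MAC of the host-side VEth interface.
host_mac_for() {
    awk -v mac="$1" '
        # Learned entry for the container MAC: remember its port number.
        $3 == "no" && $2 == mac { want = $1; found = 1 }
        # Local entries are listed twice in the output above; keep each once.
        $3 == "yes" && !(($1, $2) in seen) {
            seen[$1, $2] = 1
            ports[++n] = $1
            macs[n] = $2
        }
        END {
            if (!found) exit
            for (i = 1; i <= n; i++)
                if (ports[i] == want) print macs[i]
        }'
}

# Read host-side `ip a` output on stdin and print the name of the
# interface whose link/ether address is $1.
name_for_mac() {
    awk -v mac="$1" '
        /^[0-9]+:/ { name = $2; sub(/@.*/, "", name); sub(/:$/, "", name) }
        $1 == "link/ether" && $2 == mac { print name }'
}

# Usage on the node (assumption: bridge-utils is installed and the bridge
# is cni0; the MAC is the eth0 address seen inside busybox):
#   hmac=$(brctl showmacs cni0 | host_mac_for a6:d1:b0:67:6a:55)
#   ip a | name_for_mac "$hmac"
```

If the container has been silent for a while, its MAC may no longer be in the table; sending a ping from the container should make the bridge learn it again.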