Operating Systems - 5.2 Linux Networking


Network protocol stack

A brief summary of the Linux network protocol stack

An overview of network layering

iptables

iptables explained in detail

MASQUERADE (address masquerading) is a special case of SNAT that performs the SNAT automatically.

The meaning of SNAT, DNAT, and MASQUERADE in iptables
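
As a rough illustration of how the three targets differ, the sketch below prints the kind of rule each one would use. The subnet 172.17.0.0/16 and the interface eth0 come from the capture later in this post; the address 203.0.113.10 and the helper names (run, masquerade, snat, dnat) are made up for the example. The commands are only printed by default; setting DRY_RUN=0 would execute them and requires root.

```shell
# Print each command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# MASQUERADE: rewrite the source to whatever address the outgoing interface has.
masquerade() {   # $1 = internal subnet, $2 = outgoing interface
  run iptables -t nat -A POSTROUTING -s "$1" -o "$2" -j MASQUERADE
}

# SNAT: rewrite the source to one fixed address.
snat() {         # $1 = internal subnet, $2 = outgoing interface, $3 = public address
  run iptables -t nat -A POSTROUTING -s "$1" -o "$2" -j SNAT --to-source "$3"
}

# DNAT: redirect traffic arriving on a port to an internal host.
dnat() {         # $1 = incoming interface, $2 = port, $3 = internal host:port
  run iptables -t nat -A PREROUTING -i "$1" -p tcp --dport "$2" -j DNAT --to-destination "$3"
}

masquerade 172.17.0.0/16 eth0
snat 172.17.0.0/16 eth0 203.0.113.10
dnat eth0 80 172.17.0.3:80
```

MASQUERADE takes the source address from the outgoing interface, which is why it suits links whose address is assigned dynamically, while SNAT with a fixed --to-source suits a static address.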

TCP

Understanding the protocol with tcpdump

Capture port 80 with tcpdump and run curl www.baidu.com to watch the data transfer.

First terminal session

[root@hadoop3 client]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.17.0.1      0.0.0.0         UG    0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0
[root@hadoop3 client]# arp -an
? (172.17.0.1) at 02:42:34:d8:1b:a2 [ether] on eth0
? (172.17.0.2) at 02:42:ac:11:00:02 [ether] on eth0
# delete the gateway's ARP entry so that the ARP request shows up in the capture
[root@hadoop3 client]# arp -d 172.17.0.1 & curl www.baidu.com

Second terminal session

[root@hadoop3 fd]# tcpdump -nn -i eth0 port 80 or arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
# ARP request for the MAC address of the default gateway listed in the routing table
00:05:59.307023 ARP, Request who-has 172.17.0.1 tell 172.17.0.3, length 20
00:05:59.307029 ARP, Reply 172.17.0.1 is at 02:42:34:d8:1b:a2, length 46
# three-way handshake
00:05:59.307046 IP 172.17.0.3.36974 > 61.135.185.32.80: Flags [S], seq 2178843908, win 29200, options [mss 1460,sackOK,TS val 2119316142 ecr 0,nop,wscale 7], length 0
00:05:59.313330 IP 61.135.185.32.80 > 172.17.0.3.36974: Flags [S.], seq 452870419, ack 2178843909, win 65535, options [mss 1460,wscale 2,eol], length 0
00:05:59.313388 IP 172.17.0.3.36974 > 61.135.185.32.80: Flags [.], ack 1, win 229, length 0
# data transfer
00:05:59.313575 IP 172.17.0.3.36974 > 61.135.185.32.80: Flags [P.], seq 1:78, ack 1, win 229, length 77: HTTP: GET / HTTP/1.1
00:05:59.313877 IP 61.135.185.32.80 > 172.17.0.3.36974: Flags [.], ack 78, win 65535, length 0
00:05:59.321110 IP 61.135.185.32.80 > 172.17.0.3.36974: Flags [P.], seq 1:1413, ack 78, win 65535, length 1412: HTTP: HTTP/1.1 200 OK
00:05:59.321156 IP 172.17.0.3.36974 > 61.135.185.32.80: Flags [.], ack 1413, win 251, length 0
00:05:59.321304 IP 61.135.185.32.80 > 172.17.0.3.36974: Flags [P.], seq 1413:2782, ack 78, win 65535, length 1369: HTTP
00:05:59.321336 IP 172.17.0.3.36974 > 61.135.185.32.80: Flags [.], ack 2782, win 274, length 0
# four-way teardown
00:05:59.321761 IP 172.17.0.3.36974 > 61.135.185.32.80: Flags [F.], seq 78, ack 2782, win 274, length 0
00:05:59.321949 IP 61.135.185.32.80 > 172.17.0.3.36974: Flags [.], ack 79, win 65535, length 0
00:05:59.326636 IP 61.135.185.32.80 > 172.17.0.3.36974: Flags [F.], seq 2782, ack 79, win 65535, length 0
00:05:59.326686 IP 172.17.0.3.36974 > 61.135.185.32.80: Flags [.], ack 2783, win 274, length 0
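
tcpdump abbreviates the TCP flags inside the Flags [...] field. When reading a capture like the one above, a small helper can expand them. This is only a sketch; flags_of_line and describe_flags are names invented here and are not part of tcpdump.

```shell
# Pull the flag letters out of one line of tcpdump output.
flags_of_line() {
  printf '%s\n' "$1" | sed -n 's/.*Flags \[\([^]]*\)\].*/\1/p'
}

# Expand tcpdump's shorthand: S=SYN, F=FIN, P=PSH, R=RST, .=ACK.
describe_flags() {
  out=""
  case $1 in *S*) out="$out+SYN" ;; esac
  case $1 in *F*) out="$out+FIN" ;; esac
  case $1 in *P*) out="$out+PSH" ;; esac
  case $1 in *R*) out="$out+RST" ;; esac
  case $1 in *.*) out="$out+ACK" ;; esac
  echo "${out#+}"   # drop the leading "+"
}

# Second packet of the handshake from the capture above.
line='00:05:59.313330 IP 61.135.185.32.80 > 172.17.0.3.36974: Flags [S.], seq 452870419, ack 2178843909, win 65535, options [mss 1460,wscale 2,eol], length 0'
describe_flags "$(flags_of_line "$line")"
```

For the sample line this prints SYN+ACK, the server's half of the three-way handshake.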

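TCP connections and file descriptors

A TCP/UDP socket can be opened in the bash shell with the following syntax: exec {file-descriptor}<>/dev/{protocol}/{host}/{port}. The "file descriptor" is a unique non-negative integer associated with each socket. File descriptors 0, 1, and 2 are reserved for stdin, stdout, and stderr, so you must specify 3 or higher (whichever is unused). "<>" means the socket is opened for both reading and writing; depending on your needs, you can also open it read-only (<) or write-only (>).

Data can be sent to a TCP/UDP socket from the bash shell with the following syntax: echo -n "TEST MESSAGE" >/dev/udp/127.0.0.1/10015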

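# open a TCP connection on fd 9
exec 9<>/dev/tcp/www.baidu.com/80
# list this shell's file descriptors; the socket appears as fd 9
ls -l /proc/$$/fd
# write an HTTP request to the fd
echo -e "GET / HTTP/1.0\n" >&9
# read the response; do it promptly, because an idle TCP connection is dropped after a timeout
cat <&9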
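
The same steps can be wrapped in a function. This is a sketch that assumes bash (the /dev/tcp path is handled by bash itself and is not a real file) and a plain-HTTP server on port 80; build_request and http_get are names invented for the example.

```shell
#!/usr/bin/env bash
# Build a minimal HTTP/1.0 request; HTTP lines end in CRLF.
build_request() {   # $1 = host, $2 = path
  printf 'GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' "$2" "$1"
}

# Fetch a path over a bash /dev/tcp socket opened on fd 9.
http_get() {        # $1 = host, $2 = path (defaults to /)
  exec 9<>"/dev/tcp/$1/80" || return 1
  build_request "$1" "${2:-/}" >&9
  cat <&9           # read until the server closes the connection
  exec 9>&-         # close the socket
}

# Usage (needs network access):
#   http_get www.baidu.com / | head -n 5
```

Using printf with explicit \r\n avoids depending on how echo -e treats escapes, and closing fd 9 afterwards keeps the descriptor from leaking into later commands.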

Network protocol stack and troubleshooting

Notes on troubleshooting network failures in server development

Linux virtual networking

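A virtual machine's networking is provided by virtual NICs (vNICs). The hypervisor can create one or more vNICs for each virtual machine, and from the VM's point of view these vNICs are equivalent to physical NICs. To provide the same network functions as a traditional physical network, the switch is virtualized as well, into a virtual switch (for example Open vSwitch). Each vNIC connects to a port on a vSwitch (such as br-int), and the vSwitches reach the external physical network through the physical server's NICs.

For a virtual layer-2 network, the main task is to virtualize two kinds of network device: physical NICs and switching devices. On Linux, network device virtualization mainly takes the following forms: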

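1) TAP/TUN/VETH

Any discussion of how Neutron implements virtual networking has to start with the virtual devices provided by the Linux kernel. TAP, TUN, and VETH are virtual network devices implemented in the kernel. TAP works at layer 2 and sends and receives MAC-layer frames; TUN works at layer 3 and sends and receives IP packets. The kernel delivers data through a TAP/TUN device to the user program bound to it; conversely, the user program can send data through the TAP/TUN device as if it were driving a hardware network device. A TAP device provides the function of a virtual NIC. The user program reaches it through a character device file, which it opens and then reads and writes like an ordinary file (on Linux this is the clone device /dev/net/tun; /dev/tapX nodes appear on some other systems and for macvtap devices). VETH devices always come in pairs: data sent into one end comes out of the other, so a pair can be thought of as a virtual network cable.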

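2) Linux Bridge

Linux Bridge (the bridge implemented by the Linux kernel) is a virtual network device that works at layer 2 and functions much like a physical switch. It works by attaching other Linux network devices to the bridge and treating them as its ports. Why is Linux Bridge still needed when OVS is available? Because Neutron's qbrxxx devices are Linux bridges, and they provide the Security Group function that OVS itself does not support in that setup.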

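3) Open vSwitch

For virtual networks in cloud computing, virtualizing the switching device is a key step. The vSwitch connects vNICs to the physical NIC and also bridges the vNICs of the VMs on the same physical server. It can therefore be configured like a physical switch: VMs attached to Open vSwitch can be assigned to different VLANs for network isolation, QoS can be configured for a VM on its OVS port, and OVS supports standard management interfaces and protocols such as NetFlow and sFlow, through which VM traffic can be monitored. Multiple vSwitches running on the same or different virtualization platforms in a cloud environment form a distributed virtual switch, and the vSwitch on one physical server can communicate transparently with the vSwitches on other servers. For more detail on OVS, see other references.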
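
The VLAN and QoS configuration described above might look like the following with ovs-vsctl. This is a sketch under assumptions: the bridge name br-demo, the port tap0, VLAN 100, and the rate values are placeholders, and run and ovs_attach are helper names invented here. Commands are only printed unless DRY_RUN=0, in which case Open vSwitch and root are required.

```shell
# Print each command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Attach a port to an OVS bridge as an access port of one VLAN and rate-limit it.
ovs_attach() {   # $1 = bridge, $2 = port, $3 = VLAN id
  run ovs-vsctl --may-exist add-br "$1"
  run ovs-vsctl add-port "$1" "$2" tag="$3"
  # Ingress policing: rate in kbps, burst in kb.
  run ovs-vsctl set interface "$2" ingress_policing_rate=10000 ingress_policing_burst=1000
}

ovs_attach br-demo tap0 100
```

Ports tagged with different VLAN ids on the same bridge are isolated from each other at layer 2, which is the isolation the paragraph refers to.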

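Linux networking in practice

Bridge

A bridge works at the data link layer and is a layer-2 network device. Linux implements bridging through a virtual bridge device, which can have several Ethernet interfaces attached to it and bridges them together.

Configuring a bridge on Linux requires bridge-utils.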

yum install bridge-utils
[root@A06-R12-302F0202-I32-139 test]# brctl  help
never heard of command [help]
Usage: brctl [commands]
commands:
        addbr           <bridge>                add bridge
        delbr           <bridge>                delete bridge
        addif           <bridge> <device>       add interface to bridge
        delif           <bridge> <device>       delete interface from bridge
        hairpin         <bridge> <port> {on|off}        turn hairpin on/off
        setageing       <bridge> <time>         set ageing time
        setbridgeprio   <bridge> <prio>         set bridge priority
        setfd           <bridge> <time>         set bridge forward delay
        sethello        <bridge> <time>         set hello time
        setmaxage       <bridge> <time>         set max message age
        setpathcost     <bridge> <port> <cost>  set path cost
        setportprio     <bridge> <port> <prio>  set port priority
        show            [ <bridge> ]            show a list of bridges
        showmacs        <bridge>                show a list of mac addrs
        showstp         <bridge>                show bridge stp info
        stp             <bridge> {on|off}       turn stp on/off
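
Using the subcommands listed in the help output, creating a bridge and attaching interfaces could look like the sketch below. The names br0, veth0, and tap0 are placeholders, and run and make_bridge are helpers invented here; commands are printed unless DRY_RUN=0, which needs root and bridge-utils.

```shell
# Print each command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Create a bridge, attach the given interfaces as ports, and bring it up.
make_bridge() {   # $1 = bridge name, remaining arguments = interfaces to attach
  br=$1
  shift
  run brctl addbr "$br"            # iproute2 equivalent: ip link add name BR type bridge
  for dev in "$@"; do
    run brctl addif "$br" "$dev"   # iproute2 equivalent: ip link set dev DEV master BR
  done
  run ip link set dev "$br" up
  run brctl show "$br"
}

make_bridge br0 veth0 tap0
```

The iproute2 forms in the comments do the same job on systems where bridge-utils is not installed.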

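VLAN

A VLAN logically divides the users of one physical LAN into separate broadcast domains; VLANs communicate with each other through layer-3 routing. The VLAN device in Linux is an in-kernel software implementation of the 802.1Q protocol.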
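
An 802.1Q tag carries a 16-bit field made of a 3-bit priority, a 1-bit drop-eligible flag, and a 12-bit VLAN id, which is where the 0-4095 id range and the 3-bit vlan_qos in the vconfig help come from. The sketch below splits such a field with shell arithmetic; decode_tci is a name invented here and the value 0x6064 is an arbitrary example.

```shell
# Split an 802.1Q Tag Control Information value into its three parts.
decode_tci() {   # $1 = 16-bit TCI, decimal or 0x-prefixed hex
  tci=$(( $1 ))
  pcp=$(( (tci >> 13) & 7 ))    # top 3 bits: priority
  dei=$(( (tci >> 12) & 1 ))    # next bit: drop eligible indicator
  vid=$(( tci & 0xFFF ))        # low 12 bits: VLAN id
  echo "pcp=$pcp dei=$dei vid=$vid"
}

decode_tci 0x6064
```

For 0x6064 this prints pcp=3 dei=0 vid=100, that is, VLAN 100 with priority 3.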

On Linux, VLANs can be configured with vconfig

[root@A06-R12-302F0202-I32-139 test]# yum -y install epel-release
[root@A06-R12-302F0202-I32-139 test]# yum install vconfig
[root@A06-R12-302F0202-I32-139 test]# modprobe  8021q

The vconfig command

[root@A06-R12-302F0202-I32-139 test]# vconfig
Expecting argc to be 3-5, inclusive.  Was: 1
Usage: add             [interface-name] [vlan_id]
       rem             [vlan-name]
       set_flag        [interface-name] [flag-num]       [0 | 1]
       set_egress_map  [vlan-name]      [skb_priority]   [vlan_qos]
       set_ingress_map [vlan-name]      [skb_priority]   [vlan_qos]
       set_name_type   [name-type]
* The [interface-name] is the name of the ethernet card that hosts
  the VLAN you are talking about.
* The vlan_id is the identifier (0-4095) of the VLAN you are operating on.
* skb_priority is the priority in the socket buffer (sk_buff).
* vlan_qos is the 3 bit priority in the VLAN header
* name-type:  VLAN_PLUS_VID (vlan0005), VLAN_PLUS_VID_NO_PAD (vlan5),
              DEV_PLUS_VID (eth0.0005), DEV_PLUS_VID_NO_PAD (eth0.5)
* bind-type:  PER_DEVICE  # Allows vlan 5 on eth0 and eth1 to be unique.
              PER_KERNEL  # Forces vlan 5 to be unique across all devices.
* FLAGS:  1 REORDER_HDR  When this is set, the VLAN device will move the
            ethernet header around to make it look exactly like a real
            ethernet device.  This may help programs such as DHCPd which
            read the raw ethernet packet and make assumptions about the
            location of bytes.  If you don't need it, don't turn it on, because
            there will be at least a small performance degradation.  Default

VLAN configuration: the VLAN device is named interface-name.vlan_id

[root@controller ~]# vconfig  add enp0s3 100
Added VLAN with VID == 100 to IF -:enp0s3:-
[root@controller ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 08:00:27:4a:43:a8 brd ff:ff:ff:ff:ff:ff
3: enp0s3.100@enp0s3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT
    link/ether 08:00:27:4a:43:a8 brd ff:ff:ff:ff:ff:ff
[root@controller ~]# ip addr add 10.0.100.15/24 dev enp0s3.100
[root@controller ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:4a:43:a8 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
       valid_lft 85520sec preferred_lft 85520sec
    inet6 fe80::a00:27ff:fe4a:43a8/64 scope link
       valid_lft forever preferred_lft forever
3: enp0s3.100@enp0s3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether 08:00:27:4a:43:a8 brd ff:ff:ff:ff:ff:ff
    inet 10.0.100.15/24 scope global enp0s3.100
       valid_lft forever preferred_lft forever
 
[root@controller ~]# cat /proc/net/vlan/enp0s3.100
enp0s3.100  VID: 100     REORDER_HDR: 1  dev->priv_flags: 1
         total frames received            0
          total bytes received            0
      Broadcast/Multicast Rcvd            0
      total frames transmitted            0
       total bytes transmitted            0
Device: enp0s3
INGRESS priority mappings: 0:0  1:0  2:0  3:0  4:0  5:0  6:0 7:0
 EGRESS priority mappings:
 
[root@controller ~]# vconfig rem enp0s3.100
Removed VLAN -:enp0s3.100:-
[root@controller ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 08:00:27:4a:43:a8 brd ff:ff:ff:ff:ff:ff
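
The same add and remove steps can also be done with iproute2 alone, which helps on systems where vconfig is not available. The sketch below reuses enp0s3 and VLAN id 100 from the transcript; run, vlan_add, and vlan_del are helper names invented here. Commands are printed unless DRY_RUN=0, which needs root and the 8021q module.

```shell
# Print each command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Create the VLAN sub-interface parent.vid and bring it up.
vlan_add() {   # $1 = parent interface, $2 = VLAN id
  run ip link add link "$1" name "${1}.${2}" type vlan id "$2"
  run ip link set dev "${1}.${2}" up
}

# Remove the VLAN sub-interface again.
vlan_del() {   # $1 = parent interface, $2 = VLAN id
  run ip link delete "${1}.${2}"
}

vlan_add enp0s3 100
vlan_del enp0s3 100
```

Note that in the transcript enp0s3.100 stays in state DOWN after the address is assigned; the ip link set ... up step here is what would activate it.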

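tun/tap

TUN and TAP are virtual network devices in the operating system kernel. TAP is equivalent to an Ethernet device and handles layer-2 data such as Ethernet frames. TUN emulates a network-layer device and handles layer-3 data such as IP packets. A tun device is point-to-point, whereas a tap device behaves like an ordinary Ethernet NIC. A tun device needs no hardware address at all: the packets it sends and receives need no ARP and carry no link-layer header. A tap device has a full hardware address and handles complete Ethernet frames.

They can be created as follows: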

[root@controller ~]# ip  tuntap add mode tap tap0
[root@controller ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 08:00:27:4a:43:a8 brd ff:ff:ff:ff:ff:ff
4: tap0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 500
    link/ether ba:a3:0d:c4:8c:c4 brd ff:ff:ff:ff:ff:ff
[root@controller ~]# ip tuntap add mode tun tun0
[root@controller ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 08:00:27:4a:43:a8 brd ff:ff:ff:ff:ff:ff
4: tap0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 500
    link/ether ba:a3:0d:c4:8c:c4 brd ff:ff:ff:ff:ff:ff
5: tun0: <POINTOPOINT,MULTICAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 500
    link/none
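
The devices created above are persistent, so they stay until they are deleted explicitly. A possible create, activate, and delete cycle is sketched below; run and tuntap_cycle are helper names invented here, and the commands are printed unless DRY_RUN=0, which needs root.

```shell
# Print each command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Create a tun or tap device, bring it up, then remove it.
tuntap_cycle() {   # $1 = device name, $2 = mode (tun or tap)
  run ip tuntap add dev "$1" mode "$2"
  run ip link set dev "$1" up
  run ip tuntap del dev "$1" mode "$2"
}

tuntap_cycle tap0 tap
tuntap_cycle tun0 tun
```

The mode has to be repeated when deleting, because ip tuntap identifies the device by both name and mode.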

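veth

A veth is a virtual Ethernet interface that usually comes as a pair: packets sent from one end are received by the other, so a pair can serve as a channel between two network namespaces.

A veth device can be created with the following command: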

[root@controller ~]# ip link add name veth0 type veth peer name veth1
[root@controller ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 08:00:27:4a:43:a8 brd ff:ff:ff:ff:ff:ff
13: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 7a:66:43:cf:f6:7e brd ff:ff:ff:ff:ff:ff
14: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 76:ae:b4:86:42:b8 brd ff:ff:ff:ff:ff:ff
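
To use the pair as a channel between namespaces, one end is moved into another namespace and both ends get addresses. The sketch below assumes a namespace called ns1 and the subnet 10.1.1.0/24, both chosen only for illustration; run and veth_to_netns are helper names invented here. Commands are printed unless DRY_RUN=0, which needs root.

```shell
# Print each command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Connect the default namespace to a new namespace through a veth pair.
veth_to_netns() {   # $1 = namespace name
  run ip netns add "$1"
  run ip link add name veth0 type veth peer name veth1
  run ip link set veth1 netns "$1"                       # move one end into the namespace
  run ip addr add 10.1.1.1/24 dev veth0
  run ip link set dev veth0 up
  run ip netns exec "$1" ip addr add 10.1.1.2/24 dev veth1
  run ip netns exec "$1" ip link set dev veth1 up
  run ip netns exec "$1" ping -c 1 10.1.1.1              # reach the other end
}

veth_to_netns ns1
```

Deleting the namespace with ip netns del ns1 also removes veth1, and because the two ends exist only as a pair, veth0 disappears with it.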