MetalLB
Kubernetes does not ship an implementation of network load balancers (Services of type LoadBalancer) for bare-metal clusters. MetalLB provides a network load-balancer implementation that integrates with standard network equipment. It supports Layer 2 mode (ARP/NDP) and BGP mode.
Deployment and Usage
Deployment
If the cluster runs kube-proxy in IPVS mode, then starting from Kubernetes v1.14.2 strict ARP mode must be enabled:
$ kubectl get configmap kube-proxy -n kube-system -o yaml | \
sed -e "s/strictARP: false/strictARP: true/" | \
kubectl diff -f - -n kube-system
- strictARP: false
+ strictARP: true
# apply the change
$ kubectl get configmap kube-proxy -n kube-system -o yaml | \
sed -e "s/strictARP: false/strictARP: true/" | \
kubectl apply -f - -n kube-system
By default, MetalLB is deployed into the metallb-system namespace.
# native mode (covers Layer 2; BGP via MetalLB's own implementation)
$ kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.7/config/manifests/metallb-native.yaml
# FRR mode (BGP via FRR; required for BFD support)
$ kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.7/config/manifests/metallb-frr.yaml
$ kubectl get po -n metallb-system
NAME READY STATUS RESTARTS AGE
controller-577b5bdfcc-vd4dr 1/1 Running 0 32s
speaker-fl9mw 1/1 Running 0 32s
speaker-nllmp 1/1 Running 0 32s
speaker-xkxpb 1/1 Running 0 32s
Layer 2 Mode
In Layer 2 mode, all traffic for a service IP goes to a single node, and kube-proxy then spreads the traffic across all of the service's pods. Layer 2 mode is not really a load balancer; what it implements is a failover mechanism.
Comparison with Keepalived:
- Keepalived uses the Virtual Router Redundancy Protocol (VRRP). Keepalived instances continuously exchange VRRP messages with each other to elect a leader and to notice when that leader goes away.
- MetalLB relies on memberlist to learn when a node in the cluster becomes unreachable, at which point the service IPs on that node are moved elsewhere.
Configuration
Create an IPAddressPool, specifying the range of IPs available for allocation:
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: ip-pool
  namespace: metallb-system
spec:
  addresses:
  - 10.246.159.235-10.246.159.240 # IP range handed out to LoadBalancer services
Create the L2 advertisement. If no IP pool is specified in the spec, all pools are advertised by default:
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2adver
  namespace: metallb-system
spec:
  ipAddressPools: # optionally restrict to specific pools
  - ip-pool
  nodeSelectors: # optionally restrict which nodes may announce
  - matchLabels:
      kubernetes.io/hostname: kube-master65
  interfaces: # optionally restrict which interfaces announce
  - eth0
Verify:
$ kubectl get IPAddressPool -n metallb-system
NAME AUTO ASSIGN AVOID BUGGY IPS ADDRESSES
ip-pool true false ["10.246.159.235-10.246.159.240"]
$ kubectl get L2Advertisement -n metallb-system
NAME IPADDRESSPOOLS IPADDRESSPOOL SELECTORS INTERFACES
l2adver
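By default MetalLB hands out any free address from any pool. As a hedged sketch (annotation names as documented for MetalLB v0.13; IP value here is an assumption), a Service can request a specific pool, or specific addresses, via annotations:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
  annotations:
    metallb.universe.tf/address-pool: ip-pool              # allocate from this pool only
    # metallb.universe.tf/loadBalancerIPs: 10.246.159.236  # or request a fixed address
spec:
  type: LoadBalancer
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 80
```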
BGP Mode
In BGP mode, every node in the cluster establishes a BGP peering session with your network router(s) and uses that session to advertise the IPs of external cluster services. After a packet arrives at a node, kube-proxy handles the final hop, delivering the packet to one specific pod of the service.
The exact load-balancing behavior depends on your particular router model and configuration, but the common behavior is per-connection balancing based on packet hashing (3-tuple or 5-tuple).
Configuration
In FRR mode, BGP sessions can optionally be backed by BFD sessions, which provide faster path failure detection than BGP itself:
apiVersion: metallb.io/v1beta1
kind: BFDProfile
metadata:
  name: testbfdprofile
  namespace: metallb-system
spec:
  receiveInterval: 380  # milliseconds
  transmitInterval: 270 # milliseconds
Create a BGPPeer:
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: sample
  namespace: metallb-system
spec:
  myASN: 64500
  peerASN: 64501
  peerAddress: 10.0.0.1
  #bfdProfile: testbfdprofile # optional: back the session with BFD
Configure the IP address pool (same as in L2 mode):
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: first-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.10.0/24
Create the BGP advertisement:
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgpadver
  namespace: metallb-system
# uncomment the following to restrict the advertisement to specific pools
#spec:
#  ipAddressPools:
#  - first-pool
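By default each service IP is advertised as a /32 host route. As an illustrative sketch (aggregationLength and peers are fields of the BGPAdvertisement CRD; the name and values here are assumptions), routes can be aggregated or limited to particular peers:

```yaml
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgpadver-agg
  namespace: metallb-system
spec:
  ipAddressPools:
  - first-pool
  aggregationLength: 24 # advertise the covering /24 instead of per-IP /32 routes
  #peers:               # optionally limit which BGPPeers receive this advertisement
  #- sample
```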
Testing in L2 Mode
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
spec:
  selector:
    app: myapp
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: LoadBalancer # the service type must be LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deployment
  labels:
    app: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: nginx
        image: nginx:1.19.4
        ports:
        - containerPort: 80
Check the LoadBalancer:
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
myapp-svc LoadBalancer 10.140.178.75 10.246.159.235 80:31077/TCP 9s
$ nc -v 10.246.159.235 80
Connection to 10.246.159.235 port 80 [tcp/http] succeeded!
Normal case test
# run a load test from another machine
$ ab -n 100000 -c 100 http://10.246.159.235:80/
# capture packets on kube-master65
kube-master65:~# tcpdump -ni eth0 -c 100 host 10.246.159.235
14:28:12.512808 IP 10.246.159.235.80 > 10.246.159.76.51894: Flags [S.], seq 811677203, ack 1143793368, win 64260, options [mss 1440,sackOK,TS val 2680405645 ecr 786231733,nop,wscale 7], length 0
14:28:12.512826 IP 10.246.159.235.80 > 10.246.159.76.51896: Flags [S.], seq 777228997, ack 898907841, win 64260, options [mss 1440,sackOK,TS val 2680405645 ecr 786231733,nop,wscale 7], length 0
14:28:12.512846 IP 10.246.159.235.80 > 10.246.159.76.51898: Flags [S.], seq 2589275264, ack 165492660, win 64260, options [mss 1440,sackOK,TS val 2680405645 ecr 786231733,nop,wscale 7], length 0
100 packets captured
111 packets received by filter
0 packets dropped by kernel
However, if nodeSelectors is configured to match only a single node, then when that node goes down no node satisfies the selector and the IP cannot fail over. Either configure nodeSelectors to match multiple nodes, or omit it altogether.
With multiple matching nodes, when one node goes down the client keeps retransmitting SYNs (at roughly 1, 2, 4, and 8 second backoff intervals, as seen in the timestamps below) until the IP has migrated to another node:
14:46:18.244308 IP 10.246.159.76.54092 > 10.246.159.235.80: Flags [S], seq 1393918368, win 64240, options [mss 1460,sackOK,TS val 787317457 ecr 0,nop,wscale 7], length 0
14:46:19.264137 IP 10.246.159.76.54092 > 10.246.159.235.80: Flags [S], seq 1393918368, win 64240, options [mss 1460,sackOK,TS val 787318476 ecr 0,nop,wscale 7], length 0
14:46:21.280119 IP 10.246.159.76.54092 > 10.246.159.235.80: Flags [S], seq 1393918368, win 64240, options [mss 1460,sackOK,TS val 787320492 ecr 0,nop,wscale 7], length 0
14:46:25.312180 IP 10.246.159.76.54092 > 10.246.159.235.80: Flags [S], seq 1393918368, win 64240, options [mss 1460,sackOK,TS val 787324524 ecr 0,nop,wscale 7], length 0
14:46:33.504243 IP 10.246.159.76.54092 > 10.246.159.235.80: Flags [S], seq 1393918368, win 64240, options [mss 1460,sackOK,TS val 787332716 ecr 0,nop,wscale 7], length 0
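One related, standard-Kubernetes knob worth noting: with externalTrafficPolicy: Local, kube-proxy only forwards to pods on the node that received the traffic, preserving the client source IP; in L2 mode MetalLB then announces the IP only from nodes that actually have a ready endpoint, which also changes failover behavior. A hedged sketch, reusing the myapp Service from above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local # keep client source IP; route only to local endpoints
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 80
```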