Fully replacing kube-proxy with Cilium


This document first sets up a Kubernetes cluster without kube-proxy, and then uses Cilium as a full replacement for it. For simplicity, the cluster is created with kubeadm.

Cilium's kube-proxy replacement relies on the host-reachable services (socket-level load balancing) feature and therefore needs a relatively new kernel. Kernels from v5.8 onward also include implementations that further optimize the kube-proxy replacement, so use 5.8 or later.
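
You can check the running kernel first; anything below 5.8 is worth upgrading as described in step 1.

# show the currently running kernel version
uname -r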

1. Upgrade the kernel

# Update CentOS Repositories
yum -y update
# Enable the ELRepo Repository
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# Install the ELRepo repository
rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
# List Available Kernels
yum list available --disablerepo='*' --enablerepo=elrepo-kernel
>Loaded plugins: fastestmirror
>Loading mirror speeds from cached hostfile
> * elrepo-kernel: mirror.rackspace.com
>Available Packages
>kernel-lt.x86_64                     5.4.192-1.el7.elrepo            elrepo-kernel
>kernel-lt-devel.x86_64               5.4.192-1.el7.elrepo            elrepo-kernel
>kernel-lt-doc.noarch                 5.4.192-1.el7.elrepo            elrepo-kernel
>kernel-lt-headers.x86_64             5.4.192-1.el7.elrepo            elrepo-kernel
>kernel-lt-tools.x86_64               5.4.192-1.el7.elrepo            elrepo-kernel
>kernel-lt-tools-libs.x86_64          5.4.192-1.el7.elrepo            elrepo-kernel
>kernel-lt-tools-libs-devel.x86_64    5.4.192-1.el7.elrepo            elrepo-kernel
>kernel-ml-doc.noarch                 5.17.6-1.el7.elrepo             elrepo-kernel
>kernel-ml-tools.x86_64               5.17.6-1.el7.elrepo             elrepo-kernel
>kernel-ml-tools-libs.x86_64          5.17.6-1.el7.elrepo             elrepo-kernel
>kernel-ml-tools-libs-devel.x86_64    5.17.6-1.el7.elrepo             elrepo-kernel
>perf.x86_64                          5.17.6-1.el7.elrepo             elrepo-kernel
>python-perf.x86_64                   5.17.6-1.el7.elrepo             elrepo-kernel
# Install New CentOS Kernel Version
yum --enablerepo=elrepo-kernel install kernel-ml kernel-ml-devel kernel-ml-headers
# Set Default Kernel Version
vim /etc/default/grub  # Once the file opens, look for the line that says GRUB_DEFAULT=X, and change it to GRUB_DEFAULT=0 (zero). This line will instruct the boot loader to default to the first kernel on the list, which is the latest.
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
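
After the reboot, a quick sanity check (a minimal sketch; grubby is the standard RHEL/CentOS tool) confirms that the new kernel is both running and set as the default boot entry:

# the running kernel should now report the newly installed kernel-ml version
uname -r
# the boot loader default should point at the same kernel image
grubby --default-kernel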

If you are comfortable with kubespray, the manual steps below (2, 3, 4, 5) can be handled by kubespray directly, including skipping the kube-proxy installation: github.com/kubernetes-…

2. Install containerd and crictl

Enable IPv4 forwarding and make bridged traffic visible to iptables:

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

# Apply sysctl params without reboot
sudo sysctl --system
# generate containerd config (ensure the config directory exists first)
mkdir -p /etc/containerd
containerd config default | tee /etc/containerd/config.toml
# change containerd use SystemdCgroup
sed -i "s/SystemdCgroup = false/SystemdCgroup = true/g" /etc/containerd/config.toml
# restart containerd
systemctl restart containerd
# crictl
cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
EOF
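
Before moving on, it is worth confirming that the sysctl settings took effect, that the SystemdCgroup change landed in the generated config, and that crictl can reach containerd over the CRI socket:

# both values should print 1
sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward
# should show SystemdCgroup = true
grep SystemdCgroup /etc/containerd/config.toml
# should report both the client and the containerd runtime version
crictl version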

3. Install kubeadm

See the kubeadm official documentation for repository setup and installation.
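
On CentOS the installation itself boils down to something like the following sketch; it assumes the Kubernetes yum repository has already been configured as described in the official docs, and that you pin the versions to match the cluster you intend to run (v1.24 in this walkthrough):

# install kubelet, kubeadm and kubectl from the Kubernetes yum repo
# (--disableexcludes assumes the repo definition from the official docs,
#  which excludes these packages from normal updates)
yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
# kubelet must be enabled so that kubeadm can drive it during cluster bootstrap
systemctl enable --now kubelet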

4. Install helm

See the Helm official documentation.
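
For reference, the installer script from the Helm project is enough here (taken from the Helm docs; assumes the node has internet access):

# download and run the official Helm 3 install script
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh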

5. Create the Kubernetes cluster

Initialize the control-plane node via kubeadm init and skip the installation of the kube-proxy add-on:

kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/16 --skip-phases=addon/kube-proxy
# optional
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
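
Once kubeadm init finishes, point kubectl at the new cluster (kubeadm prints these exact steps) so that the taint commands above and everything that follows can actually run:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config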

Now the key part begins...

6. Install Cilium with Helm

helm repo add cilium https://helm.cilium.io/

When installing Cilium, be sure to adjust the parameters to match your own cluster configuration.

See the Cilium Helm chart values reference for the full list of options.

helm install cilium cilium/cilium --version 1.11.4 \
--namespace kube-system \
--set operator.replicas=1 \
--set nodeinit.enabled=true \
--set nodeinit.restartPods=true \
--set externalIPs.enabled=true \
--set nodePort.enabled=true \
--set hostPort.enabled=true \
--set tunnel=disabled \
--set bpf.masquerade=true \
--set bpf.clockProbe=true \
--set bpf.waitForMount=true \
--set bpf.preallocateMaps=true \
--set bpf.tproxy=true \
--set bpf.hostRouting=true \
--set autoDirectNodeRoutes=true \
--set localRedirectPolicy=true \
--set enableCiliumEndpointSlice=true \
--set enableK8sEventHandover=true \
--set enableK8sEndpointSlice=true \
--set wellKnownIdentities.enabled=true \
--set sockops.enabled=true \
--set ipam.operator.clusterPoolIPv4PodCIDRList=10.244.0.0/16 \
--set ipv4NativeRoutingCIDR=10.244.0.0/16 \
--set nodePort.directRoutingDevice=eth0 \
--set devices=eth0 \
--set bandwidthManager=true \
--set hubble.enabled=true \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set installNoConntrackIptablesRules=true \
--set egressGateway.enabled=true \
--set endpointRoutes.enabled=true \
--set pullPolicy=IfNotPresent \
--set kubeProxyReplacement=strict \
--set loadBalancer.algorithm=maglev \
--set loadBalancer.mode=dsr \
--set hostServices.enabled=true \
--set k8sServiceHost=172.16.127.45 \
--set k8sServicePort=6443
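
If you later need to double-check which values actually went into the release, or adjust a single one without repeating the whole command line, Helm can do both (operator.replicas=2 below is just an illustrative example):

# show the user-supplied values of the cilium release
helm get values cilium -n kube-system
# change one value in place, keeping everything else as installed
helm upgrade cilium cilium/cilium --version 1.11.4 -n kube-system --reuse-values --set operator.replicas=2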

Check the Cilium pod status:

root@test ~ 10:31:38 # kubectl -n kube-system get pod -o wide
NAME                               READY   STATUS    RESTARTS   AGE   IP              NODE   NOMINATED NODE   READINESS GATES
cilium-8wvvp                       1/1     Running   0          32m   172.16.127.45   test   <none>           <none>
cilium-node-init-f5rmz             1/1     Running   0          32m   172.16.127.45   test   <none>           <none>
cilium-operator-7469d54548-4pr9c   1/1     Running   0          32m   172.16.127.45   test   <none>           <none>
coredns-6d4b75cb6d-956sh           1/1     Running   0          33m   10.244.0.146    test   <none>           <none>
coredns-6d4b75cb6d-wjk9p           1/1     Running   0          33m   10.244.0.105    test   <none>           <none>
etcd-test                          1/1     Running   1          34m   172.16.127.45   test   <none>           <none>
kube-apiserver-test                1/1     Running   0          34m   172.16.127.45   test   <none>           <none>
kube-controller-manager-test       1/1     Running   0          34m   172.16.127.45   test   <none>           <none>
kube-scheduler-test                1/1     Running   1          34m   172.16.127.45   test   <none>           <none>

Inspect the detailed cilium status:

root@test ~ 10:40:14 # kubectl exec -it -n kube-system cilium-8wvvp -- cilium status --verbose
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), wait-for-node-init (init), clean-cilium-state (init)
KVStore:                Ok   Disabled
Kubernetes:             Ok   1.24 (v1.24.0) [linux/amd64]
Kubernetes APIs:        ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEgressNATPolicy", "cilium/v2::CiliumLocalRedirectPolicy", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "cilium/v2alpha1::CiliumEndpointSlice", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:   Strict   [eth0 172.16.127.45 (Direct Routing)]
Host firewall:          Disabled
Cilium:                 Ok   1.11.4 (v1.11.4-9d25463)
NodeMonitor:            Disabled
Cilium health daemon:   Ok   
IPAM:                   IPv4: 4/254 allocated from 10.244.0.0/24, 
Allocated addresses:
  10.244.0.105 (kube-system/coredns-6d4b75cb6d-wjk9p)
  10.244.0.146 (kube-system/coredns-6d4b75cb6d-956sh)
  10.244.0.225 (router)
  10.244.0.239 (health)
BandwidthManager:       EDT with BPF   [eth0]
Host Routing:           Legacy
Masquerading:           BPF   [eth0]   10.244.0.0/16 [IPv4: Enabled, IPv6: Disabled]
Clock Source for BPF:   ktime
Controller Status:      30/30 healthy
  Name                                  Last success   Last error   Count   Message
  bpf-map-sync-cilium_ipcache           1s ago         34m13s ago   0       no error   
  bpf-map-sync-cilium_throttle          5s ago         never        0       no error   
  cilium-health-ep                      54s ago        never        0       no error   
  dns-garbage-collector-job             12s ago        never        0       no error   
  endpoint-1338-regeneration-recovery   never          never        0       no error   
  endpoint-428-regeneration-recovery    never          never        0       no error   
  endpoint-723-regeneration-recovery    never          never        0       no error   
  endpoint-815-regeneration-recovery    never          never        0       no error   
  endpoint-gc                           4m13s ago      never        0       no error   
  ipcache-inject-labels                 34m9s ago      34m11s ago   0       no error   
  k8s-heartbeat                         13s ago        never        0       no error   
  mark-k8s-node-as-available            33m55s ago     never        0       no error   
  metricsmap-bpf-prom-sync              2s ago         never        0       no error   
  resolve-identity-1338                 3m55s ago      never        0       no error   
  resolve-identity-428                  3m56s ago      never        0       no error   
  resolve-identity-723                  3m55s ago      never        0       no error   
  resolve-identity-815                  3m55s ago      never        0       no error   
  sync-endpoints-and-host-ips           56s ago        never        0       no error   
  sync-lb-maps-with-k8s-services        33m56s ago     never        0       no error   
  sync-node-with-ciliumnode (test)      34m10s ago     34m11s ago   0       no error   
  sync-policymap-1338                   43s ago        never        0       no error   
  sync-policymap-428                    42s ago        never        0       no error   
  sync-policymap-723                    43s ago        never        0       no error   
  sync-policymap-815                    43s ago        never        0       no error   
  sync-to-k8s-ciliumendpoint (1338)     5s ago         never        0       no error   
  sync-to-k8s-ciliumendpoint (428)      5s ago         never        0       no error   
  sync-to-k8s-ciliumendpoint (723)      5s ago         never        0       no error   
  sync-to-k8s-ciliumendpoint (815)      4s ago         never        0       no error   
  template-dir-watcher                  never          never        0       no error   
  update-k8s-node-annotations           34m11s ago     never        0       no error   
Proxy Status:   OK, ip 10.244.0.225, 0 redirects active on ports 10000-20000
Hubble:         Disabled
KubeProxyReplacement Details:
  Status:                 Strict
  Socket LB Protocols:    TCP, UDP
  Devices:                eth0 172.16.127.45 (Direct Routing)
  Mode:                   DSR
  Backend Selection:      Maglev (Table Size: 16381)
  Session Affinity:       Enabled
  Graceful Termination:   Enabled
  XDP Acceleration:       Disabled
  Services:
  - ClusterIP:      Enabled
  - NodePort:       Enabled (Range: 30000-32767) 
  - LoadBalancer:   Enabled 
  - externalIPs:    Enabled 
  - HostPort:       Enabled
BPF Maps:   dynamic sizing: on (ratio: 0.002500)
  Name                          Size
  Non-TCP connection tracking   65536
  TCP connection tracking       131072
  Endpoint policy               65535
  Events                        64
  IP cache                      512000
  IP masquerading agent         16384
  IPv4 fragmentation            8192
  IPv4 service                  65536
  IPv6 service                  65536
  IPv4 service backend          65536
  IPv6 service backend          65536
  IPv4 service reverse NAT      65536
  IPv6 service reverse NAT      65536
  Metrics                       1024
  NAT                           131072
  Neighbor table                131072
  Global policy                 16384
  Per endpoint policy           65536
  Session affinity              65536
  Signal                        64
  Sockmap                       65535
  Sock reverse NAT              65536
  Tunnel                        65536
Encryption:          Disabled
Cluster health:      1/1 reachable   (2022-05-12T02:41:03Z)
  Name               IP              Node        Endpoints
  test (localhost)   172.16.127.45   reachable   reachable

Confirm the services; the list is roughly equivalent to what ipvs would show:

root@test ~ 10:43:41 # kubectl exec -it -n kube-system cilium-8wvvp -- cilium service list
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), wait-for-node-init (init), clean-cilium-state (init)
ID   Frontend          Service Type   Backend                   
1    10.96.0.1:443     ClusterIP      1 => 172.16.127.45:6443   
2    10.96.0.10:53     ClusterIP      1 => 10.244.0.146:53      
                                      2 => 10.244.0.105:53      
3    10.96.0.10:9153   ClusterIP      1 => 10.244.0.146:9153    
                                      2 => 10.244.0.105:9153

Both iptables and ipvs now have no service entries:

root@test ~ 10:43:17 # iptables-save | grep KUBE-SVC
(no output)
root@test ~ 10:43:12 # ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
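
As a final smoke test (the deployment name nginx-test is just a throwaway example), a NodePort service can be hit directly on the node to confirm that service handling works without kube-proxy:

# create a test deployment and expose it as a NodePort service
kubectl create deployment nginx-test --image=nginx
kubectl expose deployment nginx-test --port=80 --type=NodePort
# look up the allocated node port, then curl the node IP on that port
kubectl get svc nginx-test
curl -I http://172.16.127.45:<nodePort>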

REF

www.yfdou.com/archives/us…