This document first brings up a Kubernetes cluster without kube-proxy, then uses Cilium as a drop-in replacement for it. For simplicity, the cluster is created with kubeadm.
Cilium's kube-proxy replacement relies on its host-reachable services feature (socket-level load balancing), which needs a fairly recent kernel. Kernels leading up to v5.8 added several implementations that further optimize the kube-proxy replacement, so a kernel at v5.8 or newer is sufficient.
1. Upgrade the kernel
# Update CentOS Repositories
yum -y update
# Enable the ELRepo Repository
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# Install the ELRepo repository
rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
# List Available Kernels
yum list available --disablerepo='*' --enablerepo=elrepo-kernel
>Loaded plugins: fastestmirror
>Loading mirror speeds from cached hostfile
> * elrepo-kernel: mirror.rackspace.com
>Available Packages
>kernel-lt.x86_64                    5.4.192-1.el7.elrepo   elrepo-kernel
>kernel-lt-devel.x86_64              5.4.192-1.el7.elrepo   elrepo-kernel
>kernel-lt-doc.noarch                5.4.192-1.el7.elrepo   elrepo-kernel
>kernel-lt-headers.x86_64            5.4.192-1.el7.elrepo   elrepo-kernel
>kernel-lt-tools.x86_64              5.4.192-1.el7.elrepo   elrepo-kernel
>kernel-lt-tools-libs.x86_64         5.4.192-1.el7.elrepo   elrepo-kernel
>kernel-lt-tools-libs-devel.x86_64   5.4.192-1.el7.elrepo   elrepo-kernel
>kernel-ml-doc.noarch                5.17.6-1.el7.elrepo    elrepo-kernel
>kernel-ml-tools.x86_64              5.17.6-1.el7.elrepo    elrepo-kernel
>kernel-ml-tools-libs.x86_64         5.17.6-1.el7.elrepo    elrepo-kernel
>kernel-ml-tools-libs-devel.x86_64   5.17.6-1.el7.elrepo    elrepo-kernel
>perf.x86_64                         5.17.6-1.el7.elrepo    elrepo-kernel
>python-perf.x86_64                  5.17.6-1.el7.elrepo    elrepo-kernel
# Install New CentOS Kernel Version
yum --enablerepo=elrepo-kernel install kernel-ml kernel-ml-devel kernel-ml-headers
# Set Default Kernel Version
vim /etc/default/grub   # set GRUB_DEFAULT=0 (zero) so the boot loader defaults to the first kernel in the list, which is the newest
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
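After the node comes back up, confirm that the new kernel is active:
uname -r   # should now print the kernel-ml version installed above (5.17.x)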
If you are familiar with kubespray, the manual steps below (2, 3, 4, 5) can instead be handled by kubespray, including skipping the kube-proxy installation: github.com/kubernetes-…
2. Install containerd and crictl
Enable IPv4 forwarding and allow iptables to see bridged traffic:
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
# Apply sysctl params without reboot
sudo sysctl --system
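The configuration steps below assume containerd itself is already installed. If it is not, a minimal sketch for CentOS 7 (assuming Docker's upstream yum repo as the source of the containerd.io package; this is not part of the original steps):
yum install -y yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum install -y containerd.io
mkdir -p /etc/containerd   # ensure the config directory exists before generating the config below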
# generate containerd config
containerd config default | tee /etc/containerd/config.toml
# switch containerd to the SystemdCgroup driver (kubeadm defaults kubelet to the systemd cgroup driver)
sed -i "s/SystemdCgroup = false/SystemdCgroup = true/g" /etc/containerd/config.toml
# restart containerd
systemctl restart containerd
# crictl
cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
EOF
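crictl itself ships as a static binary in the cri-tools releases; a sketch of installing and smoke-testing it (the VERSION value here is an assumption, pick a release matching your Kubernetes minor version):
VERSION="v1.24.2"   # assumption: match your cluster's minor version
curl -L "https://github.com/kubernetes-sigs/cri-tools/releases/download/${VERSION}/crictl-${VERSION}-linux-amd64.tar.gz" | tar -C /usr/local/bin -xz
crictl info   # should reach containerd via the endpoint configured above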
3. Install kubeadm
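The document leaves this step to the reader; a typical CentOS 7 install, sketched after the upstream kubeadm guide of the time (the Google-hosted yum repo below was the then-current package location):
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF
sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
sudo systemctl enable --now kubelet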
4. Install helm
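Also left to the reader; the official installer script is the quickest route:
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version   # verify the install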
5. Create the Kubernetes cluster
Initialize the control-plane node via kubeadm init and skip the installation of the kube-proxy add-on:
kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/16 --skip-phases=addon/kube-proxy
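Before the kubectl commands below, copy the admin kubeconfig into place (kubeadm prints these exact instructions at the end of init):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config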
# optional
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
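Note that with kube-proxy skipped and no CNI plugin installed yet, the node will report NotReady and CoreDNS will stay Pending; both resolve once Cilium is deployed below. You can confirm with:
kubectl get nodes   # expect STATUS NotReady until the CNI is installed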
The key part starts here...
6. Install Cilium with helm
helm repo add cilium https://helm.cilium.io/
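Refresh the local chart index so the pinned version below resolves:
helm repo update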
When installing Cilium, adjust the parameters to your cluster: the pod/native-routing CIDRs, the network device, and the API server address and port below are specific to this test machine.
helm install cilium cilium/cilium --version 1.11.4 \
  --namespace kube-system \
  --set operator.replicas=1 \
  --set nodeinit.enabled=true \
  --set nodeinit.restartPods=true \
  --set externalIPs.enabled=true \
  --set nodePort.enabled=true \
  --set hostPort.enabled=true \
  --set tunnel=disabled \
  --set bpf.masquerade=true \
  --set bpf.clockProbe=true \
  --set bpf.waitForMount=true \
  --set bpf.preallocateMaps=true \
  --set bpf.tproxy=true \
  --set bpf.hostRouting=true \
  --set autoDirectNodeRoutes=true \
  --set localRedirectPolicy=true \
  --set enableCiliumEndpointSlice=true \
  --set enableK8sEventHandover=true \
  --set enableK8sEndpointSlice=true \
  --set wellKnownIdentities.enabled=true \
  --set sockops.enabled=true \
  --set ipam.operator.clusterPoolIPv4PodCIDRList=10.244.0.0/16 \
  --set ipv4NativeRoutingCIDR=10.244.0.0/16 \
  --set nodePort.directRoutingDevice=eth0 \
  --set devices=eth0 \
  --set bandwidthManager=true \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set installNoConntrackIptablesRules=true \
  --set egressGateway.enabled=true \
  --set endpointRoutes.enabled=true \
  --set pullPolicy=IfNotPresent \
  --set kubeProxyReplacement=strict \
  --set loadBalancer.algorithm=maglev \
  --set loadBalancer.mode=dsr \
  --set hostServices.enabled=true \
  --set k8sServiceHost=172.16.127.45 \
  --set k8sServicePort=6443
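It can take a minute for the agent to come up; you can block until the DaemonSet finishes rolling out:
kubectl -n kube-system rollout status daemonset/cilium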
Check the Cilium pod status:
root@test ~ 10:31:38 # kubectl -n kube-system get pod -o wide
NAME                               READY   STATUS    RESTARTS   AGE   IP              NODE   NOMINATED NODE   READINESS GATES
cilium-8wvvp                       1/1     Running   0          32m   172.16.127.45   test   <none>           <none>
cilium-node-init-f5rmz             1/1     Running   0          32m   172.16.127.45   test   <none>           <none>
cilium-operator-7469d54548-4pr9c   1/1     Running   0          32m   172.16.127.45   test   <none>           <none>
coredns-6d4b75cb6d-956sh           1/1     Running   0          33m   10.244.0.146    test   <none>           <none>
coredns-6d4b75cb6d-wjk9p           1/1     Running   0          33m   10.244.0.105    test   <none>           <none>
etcd-test                          1/1     Running   1          34m   172.16.127.45   test   <none>           <none>
kube-apiserver-test                1/1     Running   0          34m   172.16.127.45   test   <none>           <none>
kube-controller-manager-test       1/1     Running   0          34m   172.16.127.45   test   <none>           <none>
kube-scheduler-test                1/1     Running   1          34m   172.16.127.45   test   <none>           <none>
Look at the verbose cilium status:
root@test ~ 10:40:14 # kubectl exec -it -n kube-system cilium-8wvvp -- cilium status --verbose
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), wait-for-node-init (init), clean-cilium-state (init)
KVStore: Ok Disabled
Kubernetes: Ok 1.24 (v1.24.0) [linux/amd64]
Kubernetes APIs: ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEgressNATPolicy", "cilium/v2::CiliumLocalRedirectPolicy", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "cilium/v2alpha1::CiliumEndpointSlice", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement: Strict [eth0 172.16.127.45 (Direct Routing)]
Host firewall: Disabled
Cilium: Ok 1.11.4 (v1.11.4-9d25463)
NodeMonitor: Disabled
Cilium health daemon: Ok
IPAM: IPv4: 4/254 allocated from 10.244.0.0/24,
Allocated addresses:
10.244.0.105 (kube-system/coredns-6d4b75cb6d-wjk9p)
10.244.0.146 (kube-system/coredns-6d4b75cb6d-956sh)
10.244.0.225 (router)
10.244.0.239 (health)
BandwidthManager: EDT with BPF [eth0]
Host Routing: Legacy
Masquerading: BPF [eth0] 10.244.0.0/16 [IPv4: Enabled, IPv6: Disabled]
Clock Source for BPF: ktime
Controller Status: 30/30 healthy
  Name                                  Last success   Last error   Count   Message
  bpf-map-sync-cilium_ipcache           1s ago         34m13s ago   0       no error
  bpf-map-sync-cilium_throttle          5s ago         never        0       no error
  cilium-health-ep                      54s ago        never        0       no error
  dns-garbage-collector-job             12s ago        never        0       no error
  endpoint-1338-regeneration-recovery   never          never        0       no error
  endpoint-428-regeneration-recovery    never          never        0       no error
  endpoint-723-regeneration-recovery    never          never        0       no error
  endpoint-815-regeneration-recovery    never          never        0       no error
  endpoint-gc                           4m13s ago      never        0       no error
  ipcache-inject-labels                 34m9s ago      34m11s ago   0       no error
  k8s-heartbeat                         13s ago        never        0       no error
  mark-k8s-node-as-available            33m55s ago     never        0       no error
  metricsmap-bpf-prom-sync              2s ago         never        0       no error
  resolve-identity-1338                 3m55s ago      never        0       no error
  resolve-identity-428                  3m56s ago      never        0       no error
  resolve-identity-723                  3m55s ago      never        0       no error
  resolve-identity-815                  3m55s ago      never        0       no error
  sync-endpoints-and-host-ips           56s ago        never        0       no error
  sync-lb-maps-with-k8s-services        33m56s ago     never        0       no error
  sync-node-with-ciliumnode (test)      34m10s ago     34m11s ago   0       no error
  sync-policymap-1338                   43s ago        never        0       no error
  sync-policymap-428                    42s ago        never        0       no error
  sync-policymap-723                    43s ago        never        0       no error
  sync-policymap-815                    43s ago        never        0       no error
  sync-to-k8s-ciliumendpoint (1338)     5s ago         never        0       no error
  sync-to-k8s-ciliumendpoint (428)      5s ago         never        0       no error
  sync-to-k8s-ciliumendpoint (723)      5s ago         never        0       no error
  sync-to-k8s-ciliumendpoint (815)      4s ago         never        0       no error
  template-dir-watcher                  never          never        0       no error
  update-k8s-node-annotations           34m11s ago     never        0       no error
Proxy Status: OK, ip 10.244.0.225, 0 redirects active on ports 10000-20000
Hubble: Disabled
KubeProxyReplacement Details:
Status: Strict
Socket LB Protocols: TCP, UDP
Devices: eth0 172.16.127.45 (Direct Routing)
Mode: DSR
Backend Selection: Maglev (Table Size: 16381)
Session Affinity: Enabled
Graceful Termination: Enabled
XDP Acceleration: Disabled
Services:
- ClusterIP: Enabled
- NodePort: Enabled (Range: 30000-32767)
- LoadBalancer: Enabled
- externalIPs: Enabled
- HostPort: Enabled
BPF Maps: dynamic sizing: on (ratio: 0.002500)
  Name                          Size
  Non-TCP connection tracking   65536
  TCP connection tracking       131072
  Endpoint policy               65535
  Events                        64
  IP cache                      512000
  IP masquerading agent         16384
  IPv4 fragmentation            8192
  IPv4 service                  65536
  IPv6 service                  65536
  IPv4 service backend          65536
  IPv6 service backend          65536
  IPv4 service reverse NAT      65536
  IPv6 service reverse NAT      65536
  Metrics                       1024
  NAT                           131072
  Neighbor table                131072
  Global policy                 16384
  Per endpoint policy           65536
  Session affinity              65536
  Signal                        64
  Sockmap                       65535
  Sock reverse NAT              65536
  Tunnel                        65536
Encryption: Disabled
Cluster health: 1/1 reachable (2022-05-12T02:41:03Z)
  Name               IP              Node        Endpoints
  test (localhost)   172.16.127.45   reachable   reachable
Check the service list; it is roughly equivalent to what you would see with ipvs:
root@test ~ 10:43:41 # kubectl exec -it -n kube-system cilium-8wvvp -- cilium service list
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), wait-for-node-init (init), clean-cilium-state (init)
ID   Frontend          Service Type   Backend
1    10.96.0.1:443     ClusterIP      1 => 172.16.127.45:6443
2    10.96.0.10:53     ClusterIP      1 => 10.244.0.146:53
                                      2 => 10.244.0.105:53
3    10.96.0.10:9153   ClusterIP      1 => 10.244.0.146:9153
                                      2 => 10.244.0.105:9153
Both iptables and ipvs show no service entries:
root@test ~ 10:43:17 # iptables-save | grep KUBE-SVC
(no output)
root@test ~ 10:43:12 # ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
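As a final sanity check (a sketch; this nginx deployment is illustrative and not part of the original walkthrough), confirm that Cilium's service handling actually forwards traffic:
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=NodePort
kubectl get svc nginx                      # note the allocated NodePort
curl -sI http://172.16.127.45:<nodePort>   # replace <nodePort> with the port shown above; expect HTTP/1.1 200 OK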