Cilium agent BPF mount
root@cili-control:~# k get po -n kube-system cilium-f2rh4 -o yaml | grep "/sys/fs/bpf"
- mountPath: /sys/fs/bpf
- mount | grep "/sys/fs/bpf type bpf" || mount -t bpf bpf /sys/fs/bpf
- mountPath: /sys/fs/bpf
- mountPath: /sys/fs/bpf
path: /sys/fs/bpf
- mountPath: /sys/fs/bpf
- mountPath: /sys/fs/bpf
- mountPath: /sys/fs/bpf
root@cili-control:~# k get po -n kube-system cilium-f2rh4 -o yaml | grep "bpf"
- mountPath: /sys/fs/bpf
name: bpf-maps
- mount | grep "/sys/fs/bpf type bpf" || mount -t bpf bpf /sys/fs/bpf
name: mount-bpf-fs
- mountPath: /sys/fs/bpf
name: bpf-maps
key: clean-cilium-bpf-state
- mountPath: /sys/fs/bpf
name: bpf-maps
path: /sys/fs/bpf
name: bpf-maps
- mountPath: /sys/fs/bpf
name: bpf-maps
name: mount-bpf-fs
- mountPath: /sys/fs/bpf
name: bpf-maps
- mountPath: /sys/fs/bpf
name: bpf-maps
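Pieced back together, those grep hits come from the Cilium DaemonSet: a hostPath volume named bpf-maps mounted into the agent and init containers, plus a mount-bpf-fs init container that mounts bpffs if it is missing. A reconstructed fragment (only the names and paths seen above are verbatim; mountPropagation and the hostPath type are assumptions, not taken from this output):

```yaml
# Sketch reconstructed from the grep output above; not the full DaemonSet.
initContainers:
- name: mount-bpf-fs
  command:
  - /bin/bash
  - -c
  - mount | grep "/sys/fs/bpf type bpf" || mount -t bpf bpf /sys/fs/bpf
  volumeMounts:
  - mountPath: /sys/fs/bpf
    name: bpf-maps
    mountPropagation: Bidirectional   # assumption: lets the new mount propagate back to the host
volumes:
- name: bpf-maps
  hostPath:
    path: /sys/fs/bpf
    type: DirectoryOrCreate           # assumption
```

The `mount | grep ... || mount -t bpf` idiom makes the init container idempotent: the second `mount` only runs when the grep finds no existing bpffs mount.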
############### Only tc is used here
root@cili-control:~# tree /sys/fs/bpf/
/sys/fs/bpf/
└── tc
└── globals
├── cilium_auth_map
├── cilium_call_policy
├── cilium_calls_00360
├── cilium_calls_hostns_02993
├── cilium_calls_netdev_00006
├── cilium_calls_overlay_2
├── cilium_ct4_global
├── cilium_ct_any4_global
├── cilium_encrypt_state
├── cilium_events
├── cilium_ipcache
├── cilium_ipv4_frag_datagrams
├── cilium_l2_responder_v4
├── cilium_lb4_backends_v3
├── cilium_lb4_reverse_nat
├── cilium_lb4_services_v2
├── cilium_lxc
├── cilium_metrics
├── cilium_node_map
├── cilium_policy_00360
├── cilium_policy_02993
├── cilium_ratelimit
├── cilium_runtime_config
├── cilium_signals
└── cilium_tunnel_map
2 directories, 25 files
In Cilium, tc (Traffic Control) is used primarily to insert BPF programs into the Linux kernel's network stack, enabling fine-grained control over container network traffic. Combined with the BPF maps under /sys/fs/bpf/tc/globals shown above, the role of tc breaks down as follows:
1. Attaching BPF programs at the network-device level
tc is the kernel's traffic-control framework. It allows BPF programs to be attached to the ingress or egress path of a network interface (a container's virtual NIC, the host's physical NIC, and so on). Cilium uses this mechanism to attach BPF programs at the key points of the container network (a container's veth interface, the host bridge or physical NIC), so traffic is processed in real time as it enters or leaves a container.
2. Core functional scenarios
Judging from the BPF maps above (cilium_policy_*, cilium_ct4_global, etc.), the BPF programs attached via tc are mainly responsible for:
- **Network policy enforcement**: maps such as cilium_policy_* store network policy rules; the tc-level BPF programs check source/destination IP, port, protocol and so on as traffic passes, and decide whether to allow or drop it (implementing L3/L4 and even L7 policy).
- **Connection tracking**: maps such as cilium_ct4_global (IPv4 connection tracking) record connection state (new, established, closing); the tc-level BPF programs rely on this state to enforce stateful policy (allowing return packets, dropping invalid connections).
- **Load balancing**: cilium_lb4_services_v2 (service info) and cilium_lb4_backends_v3 (backends) store the load-balancing rules for Kubernetes Services; the tc-level BPF programs make the forwarding decision here (dispatching requests to backend Pods).
- **Address translation (NAT)**: maps such as cilium_lb4_reverse_nat record reverse-NAT rules; the tc-level BPF programs handle address translation when Pods communicate with the outside (e.g. translating a Pod IP to the host IP for external access).
- **Tunnel encapsulation/decapsulation (e.g. VXLAN)**: cilium_tunnel_map stores inter-node tunnel information; the tc-level BPF programs may take part in encapsulating (adding a VXLAN header) or decapsulating traffic for cross-node container communication.
- **Traffic monitoring and metrics**: maps such as cilium_metrics collect traffic statistics (throughput, drops); the tc-level BPF programs update these counters in real time for Cilium's monitoring and alerting.
3. Why tc rather than another mechanism?
Cilium chose tc because it processes traffic at an early stage of the network stack, close to the NIC driver layer. Compared with traditional iptables, which operates higher up the stack, this gives lower latency and higher throughput. tc also supports dynamically attaching and updating BPF programs, which matches Cilium's requirement that network policy take effect in real time.
In short, tc in Cilium is the attachment point for BPF programs: by inserting BPF logic on the traffic path of network devices, it implements the core functions of the container network (policy control, load balancing, connection tracking) and is a key building block of Cilium's high-performance networking.
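To see these attachments on a live node, one could inspect the tc filters on the Cilium devices and list the pinned maps. A sketch (the `tc filter show` commands are illustrative and require root on a Cilium node; device names are taken from the `ip a` output below; the last line degrades gracefully on any host):

```shell
# Show BPF programs Cilium attached via tc (run on a Cilium node, as root):
#   tc filter show dev cilium_vxlan ingress
#   tc filter show dev lxc_health egress
# The pinned maps can be listed without tc; fall back gracefully off-node:
ls /sys/fs/bpf/tc/globals 2>/dev/null || echo "no pinned cilium maps on this host"
```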
Environment information
root@cili-control:~# k get node -A -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
cili-control Ready control-plane 7m15s v1.33.1 11.0.1.200 <none> Ubuntu 22.04.5 LTS 5.15.0-160-generic containerd://1.7.13
cili-work1 Ready worker 7m5s v1.33.1 11.0.1.201 <none> Ubuntu 22.04.5 LTS 5.15.0-160-generic containerd://1.7.13
cili-work2 Ready worker 7m5s v1.33.1 11.0.1.202 <none> Ubuntu 22.04.5 LTS 5.15.0-160-generic containerd://1.7.13
root@cili-control:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:0c:29:c6:51:a4 brd ff:ff:ff:ff:ff:ff
altname enp3s0
inet 11.0.1.200/24 brd 11.0.1.255 scope global ens160
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fec6:51a4/64 scope link
valid_lft forever preferred_lft forever
3: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:0c:29:c6:51:ae brd ff:ff:ff:ff:ff:ff
altname enp11s0
inet 11.0.2.200/24 brd 11.0.2.255 scope global ens192
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fec6:51ae/64 scope link
valid_lft forever preferred_lft forever
4: nodelocaldns: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether 56:3c:49:e5:6c:d2 brd ff:ff:ff:ff:ff:ff
inet 169.254.25.10/32 scope global nodelocaldns
valid_lft forever preferred_lft forever
5: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether ca:7c:9f:83:93:25 brd ff:ff:ff:ff:ff:ff
inet 10.233.0.1/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.233.0.3/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.233.250.103/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
6: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether fa:73:c8:76:ae:b4 brd ff:ff:ff:ff:ff:ff
inet6 fe80::f873:c8ff:fe76:aeb4/64 scope link
valid_lft forever preferred_lft forever
7: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 5a:21:54:a0:8d:00 brd ff:ff:ff:ff:ff:ff
inet 10.222.2.61/32 scope global cilium_host
valid_lft forever preferred_lft forever
inet6 fe80::5821:54ff:fea0:8d00/64 scope link
valid_lft forever preferred_lft forever
8: cilium_vxlan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 1e:61:6a:27:d7:f9 brd ff:ff:ff:ff:ff:ff
inet6 fe80::1c61:6aff:fe27:d7f9/64 scope link
valid_lft forever preferred_lft forever
10: lxc_health@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 8a:a0:f5:26:fd:fe brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::88a0:f5ff:fe26:fdfe/64 scope link
valid_lft forever preferred_lft forever
root@cili-control:~#
root@cili-control:~# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 11.0.1.2 0.0.0.0 UG 0 0 0 ens160
10.222.0.0 10.222.2.61 255.255.255.0 UG 0 0 0 cilium_host
10.222.1.0 10.222.2.61 255.255.255.0 UG 0 0 0 cilium_host
10.222.2.0 10.222.2.61 255.255.255.0 UG 0 0 0 cilium_host
10.222.2.61 0.0.0.0 255.255.255.255 UH 0 0 0 cilium_host
11.0.1.0 0.0.0.0 255.255.255.0 U 0 0 0 ens160
11.0.2.0 0.0.0.0 255.255.255.0 U 0 0 0 ens192
root@cili-control:~# kgp | grep coredns
kube-system coredns-5847cf56c7-8mx6p 1/1 Running 0 3m23s 10.222.1.33 cili-work1 <none> <none>
kube-system coredns-5847cf56c7-hqk4f 1/1 Running 0 3m23s 10.222.1.158 cili-work1 <none> <none>
root@cili-control:~# kgp | grep cilium
kube-system cilium-2hnpl 1/1 Running 0 3m53s 11.0.1.202 cili-work2 <none> <none>
kube-system cilium-f2rh4 1/1 Running 0 3m53s 11.0.1.200 cili-control <none> <none>
kube-system cilium-operator-7c6b45754-lp9nd 1/1 Running 0 3m53s 11.0.1.201 cili-work1 <none> <none>
kube-system cilium-st9hs 1/1 Running 0 3m53s 11.0.1.201 cili-work1 <none> <none>
3. Calico also uses BPF
If we switch from Calico to Cilium, will the two conflict in their use of the bpf directory (i.e. overwrite the same directory)?
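Whether the two CNIs collide can be read straight off the bpffs layout, since each pins its objects under its own top-level directory. A quick check sketch (directory names taken from the tree output in the next section; harmless to run on any host):

```shell
# Each CNI pins under a distinct bpffs prefix, so a collision would show up
# as both writing the same top-level directory. Report what is present:
for d in calico cilium tc; do
  if [ -d "/sys/fs/bpf/$d" ]; then echo "$d: present"; else echo "$d: absent"; fi
done
```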
3.2 After switching directly from Calico to Cilium
root@k8s-work2:~/test# tree /sys/fs/bpf/
/sys/fs/bpf/
├── calico
│ ├── calico_failsafe_ports_v1
│ └── xdp
├── cilium
│ ├── devices
│ │ ├── cilium_geneve
│ │ │ └── links
│ │ ├── cilium_host
│ │ │ └── links
│ │ ├── cilium_net
│ │ │ └── links
│ │ ├── ens160
│ │ │ └── links
│ │ ├── ens256
│ │ │ └── links
│ │ └── tunl0
│ │ └── links
│ ├── endpoints
│ │ ├── 125
│ │ │ └── links
│ │ ├── 1642
│ │ │ └── links
│ │ ├── 1780
│ │ │ └── links
│ │ └── 623
│ │ └── links
│ └── socketlb
│ └── links
│ └── cgroup
│ ├── cil_sock4_connect
│ ├── cil_sock4_getpeername
│ ├── cil_sock4_post_bind
│ ├── cil_sock4_recvmsg
│ ├── cil_sock4_sendmsg
│ ├── cil_sock6_connect
│ ├── cil_sock6_getpeername
│ ├── cil_sock6_post_bind
│ ├── cil_sock6_recvmsg
│ ├── cil_sock6_sendmsg
│ └── cil_sock_release
└── tc
└── globals
├── cilium_auth_map
├── cilium_call_policy
├── cilium_calls_00125
├── cilium_calls_00623
├── cilium_calls_01642
├── cilium_calls_01780
├── cilium_calls_hostns_00713
├── cilium_calls_netdev_00002
├── cilium_calls_netdev_00003
├── cilium_calls_netdev_00007
├── cilium_calls_netdev_00025
├── cilium_calls_overlay_2
├── cilium_ct4_global
├── cilium_ct_any4_global
├── cilium_egresscall_policy
├── cilium_events
├── cilium_ipcache_v2
├── cilium_ipv4_frag_datagrams
├── cilium_l2_responder_v4
├── cilium_lb4_affinity
├── cilium_lb4_backends_v3
├── cilium_lb4_maglev
├── cilium_lb4_reverse_nat
├── cilium_lb4_reverse_sk
├── cilium_lb4_services_v2
├── cilium_lb4_source_range
├── cilium_lb_affinity_match
├── cilium_lxc
├── cilium_metrics
├── cilium_node_map_v2
├── cilium_nodeport_neigh4
├── cilium_policystats
├── cilium_policy_v2_00125
├── cilium_policy_v2_00623
├── cilium_policy_v2_00713
├── cilium_policy_v2_01642
├── cilium_policy_v2_01780
├── cilium_ratelimit
├── cilium_ratelimit_metrics
├── cilium_runtime_config
├── cilium_signals
├── cilium_skip_lb4
├── cilium_snat_v4_alloc_retries
└── cilium_snat_v4_external
30 directories, 56 files
As the output shows, the two CNIs do not overwrite each other's directories: Calico pins under /sys/fs/bpf/calico, while this Cilium version pins under /sys/fs/bpf/cilium and /sys/fs/bpf/tc/globals. Note, however, that the leftover calico entries are still present after the switch.