Kubernetes v1.33.5 Highly Available Cluster Deployment Tutorial


1 Installation Notes

Deployment overview

OS and component versions used in this deployment:

Item          Version
OS            AlmaLinux release 10.0 (administered much like CentOS 7)
Kernel        6.12.0
Kubernetes    v1.33.5
containerd    2.1.4
CNI plugins   v1.8.0
crictl        1.34.0
etcd          3.5.21-0

Offline installation package: link: pan.baidu.com/s/19CjX1Imi… extraction code: 8888


2 Before You Begin

  • A Linux host compatible with a Debian- or Red Hat-based distribution (or one without a package manager).
  • If you are not on AlmaLinux or a similar system, make sure the kernel version is ≥ v5.13 (see the official documentation).
  • At least 2 GB of RAM per machine; control-plane nodes should have ≥ 2 CPUs.
  • Full network connectivity between all machines in the cluster.
  • A unique hostname, MAC address, and product_uuid on every node (a quick check is sketched below).
  • Swap disabled.
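
A minimal sanity check for the uniqueness and swap requirements; run it on every node and compare the outputs:

hostname                             # must differ on every node
ip link | awk '/ether/ {print $2}'   # MAC addresses
cat /sys/class/dmi/id/product_uuid   # product_uuid
free -h | grep -i swap               # should show 0B everywhere once swap is off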

3 Cluster Installation

3.1 Basic network / hostname / static IP configuration

IP address       Hostname     Notes
192.168.1.11     master01     Control plane 1; 4C / 4G / 40G
192.168.1.12     master02     Control plane 2; same specs
192.168.1.13     master03     Control plane 3; same specs
192.168.1.100    master-lb    VIP (virtual IP); must not conflict with the office LAN

Kubernetes Service subnet: 10.96.0.0/12   Pod subnet: 10.244.0.0/16


3.2 System environment & basic configuration (all nodes)

3.2.1 Confirm the OS version

cat /etc/redhat-release
# Should print: AlmaLinux release 10.0 (Purple Lion)

3.2.2 Edit /etc/hosts

On all nodes:

echo '192.168.1.11 master01
192.168.1.12 master02
192.168.1.13 master03
192.168.1.100 master-lb' >> /etc/hosts

3.2.3 Disable the firewall and SELinux

systemctl disable --now firewalld

setenforce 0
sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/sysconfig/selinux
sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config

3.2.4 Disable swap

swapoff -a
sed -i.bak '/swap/s/^/#/' /etc/fstab

3.2.5 Time synchronization

  • Use chronyd (the AlmaLinux default), or
  • install another NTP daemon
dnf install -y chrony
# make sure it is enabled and running
systemctl enable --now chronyd

3.2.6 System limits

echo "* soft nofile 65536" >> /etc/security/limits.conf
echo "* hard nofile 65536" >> /etc/security/limits.conf
echo "* soft nproc 65536"  >> /etc/security/limits.conf
echo "* hard nproc 65536"  >> /etc/security/limits.conf
echo "* soft memlock unlimited" >> /etc/security/limits.conf
echo "* hard memlock unlimited" >> /etc/security/limits.conf

3.2.7 Passwordless SSH login (Master01 -> all nodes)

On the primary control host, Master01:

ssh-keygen -t rsa   # press Enter to accept all defaults
for i in master01 master02 master03; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub $i
done

3.3 Kernel and IPVS configuration

3.3.1 Install ipvsadm and related modules

dnf install -y ipvsadm ipset sysstat conntrack libseccomp
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack

3.3.2 Load the IPVS modules at boot

cat > /etc/modules-load.d/ipvs.conf <<EOF
ip_vs
ip_vs_lc
ip_vs_wlc
ip_vs_rr
ip_vs_wrr
ip_vs_lblc
ip_vs_lblcr
ip_vs_dh
ip_vs_sh
ip_vs_fo
ip_vs_nq
ip_vs_sed
ip_vs_ftp
nf_conntrack
ip_tables
ip_set
xt_set
ipt_set
ipt_rpfilter
ipt_REJECT
ipip
EOF

systemctl enable --now systemd-modules-load.service

Check that the modules loaded:

lsmod | grep -e ip_vs -e nf_conntrack

3.3.3 Configure kernel parameters

Create /etc/sysctl.d/k8s.conf on all nodes:

cat <<EOF > /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
fs.may_detach_mounts = 1
vm.overcommit_memory = 1
net.ipv4.conf.all.route_localnet = 1

vm.panic_on_oom = 0
fs.inotify.max_user_watches = 89100
fs.file-max = 52706963
fs.nr_open = 52706963
net.netfilter.nf_conntrack_max = 2310720

net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 327680
net.ipv4.tcp_orphan_retries = 3
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_timestamps = 0
net.core.somaxconn = 16384
EOF

sysctl --system

Reboot to make sure the changes take effect:

reboot

After the reboot, confirm the kernel modules are still loaded:

lsmod | grep --color=auto -e ip_vs -e nf_conntrack

3.4 Install containerd + CRI tools

Kernel prerequisites: enable IPv4 forwarding and let iptables see bridged traffic.

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# Apply the sysctl parameters without rebooting
sudo sysctl --system

Confirm that the br_netfilter and overlay modules are loaded:

lsmod | grep br_netfilter
lsmod | grep overlay

Check that these kernel parameters are set to 1:

sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward

3.4.1 Download and install containerd

wget https://github.com/containerd/containerd/releases/download/v2.1.4/containerd-2.1.4-linux-amd64.tar.gz
tar xvf containerd-2.1.4-linux-amd64.tar.gz
mv bin/* /usr/local/bin/
mkdir /etc/containerd
containerd config default > /etc/containerd/config.toml

3.4.2 containerd systemd unit

cat > /usr/lib/systemd/system/containerd.service <<EOF
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/containerd
Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=infinity
TasksMax=infinity
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now containerd

3.4.3 Install runc

install -m 755 runc.amd64 /usr/local/sbin/runc   # runc.amd64 comes from the offline package (or download it as sketched below)
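
If you are installing online instead, a sketch for fetching runc first (the release version here is an assumption; check github.com/opencontainers/runc/releases for the current one):

wget https://github.com/opencontainers/runc/releases/download/v1.3.0/runc.amd64   # assumed version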

3.4.4 Install the CNI plugins

mkdir -p /opt/cni/bin
tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v1.8.0.tgz

3.4.5 Install crictl

tar -xf crictl-v1.34.0-linux-amd64.tar.gz -C /usr/local/bin

cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 30
debug: false
pull-image-on-create: false
EOF

3.4.6 Enable the systemd cgroup driver

For a detailed introduction to cgroups, see the official documentation.

Edit the corresponding section of /etc/containerd/config.toml:

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc.options]
    ShimCgroup = ''      # locate this existing line...
    SystemdCgroup = true # ...and add this one below it (absent by default)

Restart containerd:

systemctl restart containerd
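
To confirm the running daemon picked up the change (containerd's config dump subcommand prints the merged active configuration):

containerd config dump | grep SystemdCgroup
# expect: SystemdCgroup = true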

3.5 High-availability components: HAProxy + Keepalived

3.5.1 Installation

On all Master nodes:

dnf install -y haproxy keepalived

3.5.2 Configure HAProxy

All Master nodes share the same configuration file, /etc/haproxy/haproxy.cfg:

cat > /etc/haproxy/haproxy.cfg << EOF
global
  maxconn 2000
  ulimit-n 16384
  log 127.0.0.1 local0 err
  stats timeout 30s

defaults
  log global
  mode http
  option httplog
  timeout connect 5000
  timeout client 50000
  timeout server 50000
  timeout http-request 15s
  timeout http-keep-alive 15s

frontend k8s-master
  bind 0.0.0.0:8443
  mode tcp
  option tcplog
  tcp-request inspect-delay 5s
  default_backend k8s-master

backend k8s-master
  mode tcp
  balance roundrobin
  option tcp-check
  default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
  server master01 192.168.1.11:6443 check
  server master02 192.168.1.12:6443 check
  server master03 192.168.1.13:6443 check
EOF

3.5.3 Keepalived configuration (slightly different per node)

Master01:

cat > /etc/keepalived/keepalived.conf << EOF
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script chk_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 5
    weight -5
    fall 2
    rise 1
}
vrrp_instance VI_1 {
    state MASTER
    interface ens33          # adjust to this node's NIC name
    mcast_src_ip 192.168.1.11
    virtual_router_id 51
    priority 100
    nopreempt
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        192.168.1.100
    }
    track_script {
        chk_apiserver
    }
}
EOF

Master02:

cat > /etc/keepalived/keepalived.conf << EOF
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script chk_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 5 
    weight -5
    fall 2
    rise 1
}
vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    mcast_src_ip 192.168.1.12
    virtual_router_id 51
    priority 99
    nopreempt
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        192.168.1.100
    }
    track_script {
        chk_apiserver
    }
}
EOF

Master03:

cat > /etc/keepalived/keepalived.conf << EOF
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script chk_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 5 
    weight -5
    fall 2
    rise 1
}
vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    mcast_src_ip 192.168.1.13
    virtual_router_id 51
    priority 98
    nopreempt
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        192.168.1.100
    }
    track_script {
        chk_apiserver
    }
}
EOF

Health-check script /etc/keepalived/check_apiserver.sh:

cat > /etc/keepalived/check_apiserver.sh << 'EOF'   # quote EOF so the $(...) below reaches the script unexpanded
#!/bin/bash

err=0
for k in $(seq 1 3)
do
    check_code=$(pgrep haproxy)
    if [[ $check_code == "" ]]; then
        err=$(expr $err + 1)
        sleep 1
        continue
    else
        err=0
        break
    fi
done

if [[ $err != "0" ]]; then
    echo "systemctl stop keepalived"
    /usr/bin/systemctl stop keepalived
    exit 1
else
    exit 0
fi
EOF

chmod +x /etc/keepalived/check_apiserver.sh

Start the services:

systemctl daemon-reload
systemctl enable --now haproxy
systemctl enable --now keepalived

Test that the VIP answers ping:

ping 192.168.1.100
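
The VIP should sit on whichever node currently has the highest keepalived priority (master01 at first); to see which node holds it:

ip addr show ens33 | grep 192.168.1.100   # prints a line only on the VIP holder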

3.6 Install the Kubernetes core components: kubeadm, kubelet, kubectl

3.6.1 Configure the yum repository

cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.33/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.33/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF

3.6.2 Install and enable the services

dnf install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
systemctl enable --now kubelet

3.7 Initialize Master01 (the first control-plane node)

3.7.1 List the required images and pre-pull them

kubeadm config images list

Required images (for v1.33.5):

  • registry.k8s.io/kube-apiserver:v1.33.5
  • registry.k8s.io/kube-controller-manager:v1.33.5
  • registry.k8s.io/kube-scheduler:v1.33.5
  • registry.k8s.io/kube-proxy:v1.33.5
  • registry.k8s.io/coredns/coredns:v1.12.0
  • registry.k8s.io/pause:3.10
  • registry.k8s.io/etcd:3.5.21-0

Importing the images, for example:

ctr -n k8s.io image import <image-tarball>   # or push them to your own registry and pull from there
# After importing, list them with crictl; ctr can list them too, just less readably
crictl images
# ctr namespaces its images (hence -n k8s.io); if that feels cumbersome, a Docker-compatible client can also manage containerd
ctr -n k8s.io images ls
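
If the hosts have Internet access, kubeadm can pull everything directly instead of importing tarballs:

kubeadm config images pull --kubernetes-version v1.33.5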

3.7.2 Generate and edit the init configuration

kubeadm config print init-defaults > kubeadm-init.yaml

Edit the generated kubeadm-init.yaml; for example:

cat  > ./kubeadm-init.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta4

# Bootstrap token (the defaults are fine)
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication

# Local API endpoint
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.1.11
  bindPort: 6443

nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  imagePullSerial: true
  name: master01
  taints: null

# Timeouts (the defaults are fine)
timeouts:
  controlPlaneComponentHealthCheck: 4m0s
  discovery: 5m0s
  etcdAPICall: 2m0s
  kubeletHealthCheck: 4m0s
  kubernetesAPICall: 1m0s
  tlsBootstrap: 5m0s
  upgradeManifests: 5m0s

---
apiServer: {}
apiVersion: kubeadm.k8s.io/v1beta4
caCertificateValidityPeriod: 87600h0m0s
certificateValidityPeriod: 8760h0m0s
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes

controllerManager: {}
dns: {}
encryptionAlgorithm: RSA-2048

etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.k8s.io

kind: ClusterConfiguration
kubernetesVersion: 1.33.5

networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16 

# If this is not a highly available cluster, delete the line below
controlPlaneEndpoint: "192.168.1.100:8443"

proxy: {}
scheduler: {}
EOF

3.7.3 Run the initialization

  • Initialization generates the certificates and config files under /etc/kubernetes; the other Master nodes then join through Master01.
  • For verbose logs during initialization, append --v=5
kubeadm init --config kubeadm-init.yaml --upload-certs

If initialization fails, reset and try again:

kubeadm reset -f ; ipvsadm --clear  ; rm -rf ~/.kube

After a successful init, configure kubeconfig:

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
# or, if you are the root user
export KUBECONFIG=/etc/kubernetes/admin.conf
  • A successful initialization ends with output like the following.

The output explained: to start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

export KUBECONFIG=/etc/kubernetes/admin.conf

Next, you should deploy a Pod network to the cluster. Choose one of the options listed at the link below and run kubectl apply -f [podnetwork].yaml:

kubernetes.io/docs/concep…

You can now join any number of control-plane nodes by running the following command on each as root:

kubeadm join 192.168.1.100:8443 --token abcdef.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:3fcd0d0ac88c9a4f1321f6d15cb484b8f67b1492c10282f5faa3070b5741635f \
  --control-plane --certificate-key bf521ccd59a5d33a2d8370e0ae9f10b7f00db3412f1c066aafd0e516c80664ae

Please note that the certificate-key gives access to cluster-sensitive data, so keep it secret! As a safeguard, the uploaded certificates are deleted after two hours; if needed, you can run "kubeadm init phase upload-certs --upload-certs" later to re-upload them.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.1.100:8443 --token abcdef.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:3fcd0d0ac88c9a4f1321f6d15cb484b8f67b1492c10282f5faa3070b5741635f
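
The default token expires after 24 hours and, as noted above, the uploaded certificates after two; both can be regenerated later:

kubeadm token create --print-join-command        # prints a fresh worker join command
kubeadm init phase upload-certs --upload-certs   # re-uploads certs and prints a new certificate key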

3.8 Deploy the network plugin (Calico)

Download: github.com/projectcali… After downloading, edit the configuration:

# Point Calico at the etcd cluster
sed -i 's#etcd_endpoints: "http://<ETCD_IP>:<ETCD_PORT>"#etcd_endpoints: "https://192.168.1.11:2379,https://192.168.1.12:2379,https://192.168.1.13:2379"#g' calico-etcd.yaml

# Embed the etcd certificates
ETCD_CA=`cat /etc/kubernetes/pki/etcd/ca.crt | base64 | tr -d '\n'`
ETCD_CERT=`cat /etc/kubernetes/pki/etcd/server.crt | base64 | tr -d '\n'`
ETCD_KEY=`cat /etc/kubernetes/pki/etcd/server.key | base64 | tr -d '\n'`
sed -i "s@# etcd-key: null@etcd-key: ${ETCD_KEY}@g; s@# etcd-cert: null@etcd-cert: ${ETCD_CERT}@g; s@# etcd-ca: null@etcd-ca: ${ETCD_CA}@g" calico-etcd.yaml

# Set the certificate paths
sed -i 's#etcd_ca: ""#etcd_ca: "/calico-secrets/etcd-ca"#g; s#etcd_cert: ""#etcd_cert: "/calico-secrets/etcd-cert"#g; s#etcd_key: "" #etcd_key: "/calico-secrets/etcd-key" #g' calico-etcd.yaml

# Set the Pod subnet
POD_SUBNET="10.244.0.0/16"
sed -i 's@# - name: CALICO_IPV4POOL_CIDR@- name: CALICO_IPV4POOL_CIDR@g; s@#   value: "192.168.0.0/16"@  value: '"${POD_SUBNET}"'@g' calico-etcd.yaml

Once everything is edited, double-check it and deploy:

kubectl create -f calico-etcd.yaml
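
Then wait for the Calico pods to come up (assuming the manifest's usual k8s-app labels):

kubectl -n kube-system get pods -l k8s-app=calico-node -o wide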

Once the deployment succeeds, the cluster status looks healthy:

[root@master01 ~]# kubectl get node 
NAME       STATUS     ROLES           AGE     VERSION
master01   Ready      control-plane   1h45m   v1.33.5
master02   Ready      control-plane   1h24m   v1.33.5
master03   Ready      control-plane   1h23m   v1.33.5

3.9 Deploy the Metrics Server

Remove the control-plane taints first, since every node here is a control-plane node:

kubectl taint node --all   node-role.kubernetes.io/control-plane:NoSchedule-

The official manifest is at github.com/kubernetes-… but applied as-is it fails with a missing-certificate error.

The version below adds the certificate path and the corresponding volume mount; the certificate, /etc/kubernetes/pki/front-proxy-ca.crt, was generated automatically when the cluster was deployed.

cat > ./components.yaml << EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - appProtocol: https
    name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=10250
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls
        - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt 
        - --requestheader-username-headers=X-Remote-User
        - --requestheader-group-headers=X-Remote-Group
        - --requestheader-extra-headers-prefix=X-Remote-Extra-
        image: registry.k8s.io/metrics-server/metrics-server:v0.8.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 10250
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
        - mountPath: /etc/kubernetes/pki
          name: k8s-certs
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
      - hostPath:
          path: /etc/kubernetes/pki
        name: k8s-certs
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
EOF
kubectl create -f components.yaml
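
After a minute or so, the aggregated API registered by the manifest above should report Available:

kubectl get apiservice v1beta1.metrics.k8s.io   # expect AVAILABLE=True once the pod is Ready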

3.10 Switch kube-proxy to IPVS mode

kubectl edit cm kube-proxy -n kube-system
# set mode to "ipvs"

Roll the kube-proxy Pods so they pick up the change:

kubectl patch daemonset kube-proxy -n kube-system -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"$(date +'%s')\"}}}}}"

Verify the mode:

curl 127.0.0.1:10249/proxyMode
# should print ipvs
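
With IPVS active, the Service virtual servers also become visible through ipvsadm:

ipvsadm -Ln   # lists one virtual server per Service, e.g. 10.96.0.1:443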

3.11 Other tools, Ingress, storage, etc. (optional)

Note: the components in the linked tutorials are versions from over a year ago; if you need newer ones, download the latest from the official sites and install them the same way.

Install Helm (link)

Install an Ingress controller (link)

Install Rook storage (link)


4 Notes

  • Certificates issued by kubeadm are valid for one year by default; for production, consider extending them or automating renewal (inspection commands are sketched after this list).
  • The control-plane components (kube-apiserver, controller-manager, scheduler, etcd) run as static Pods whose manifests live in /etc/kubernetes/manifests; when a manifest changes, kubelet automatically restarts the corresponding Pod.
  • kubelet's configuration lives in /etc/sysconfig/kubelet and /var/lib/kubelet/config.yaml.
  • By default, control-plane/master nodes carry a taint that keeps ordinary Pods off them; to schedule Pods on the masters, remove it:
## show the taints
kubectl describe node | grep Taint
## remove the taint
kubectl taint node --all node-role.kubernetes.io/control-plane:NoSchedule-
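
kubeadm ships commands for inspecting and renewing those one-year certificates:

kubeadm certs check-expiration   # show the expiry date of every kubeadm-managed certificate
kubeadm certs renew all          # renew them all (restart the static Pods afterwards)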

5 Install Kuboard (management platform, optional)

  • Website: Kuboard (supports online and offline installation)
  • Install command:
kubectl apply -f https://addons.kuboard.cn/kuboard/kuboard-v3.yaml

6 Verify Cluster Status

6.1 Check node status

kubectl get nodes

Make sure every node is Ready and the roles are as expected (e.g. control-plane, worker).

6.2 Check the system component Pods

kubectl get pods -n kube-system

Core services (CoreDNS, kube-proxy, calico-node, etc.) should be Running.

6.3 Confirm control-plane component health

kubectl get componentstatuses

(Note: kubectl get cs/componentstatuses is deprecated in recent Kubernetes releases, but still works for basic diagnostics.)

6.4 View cluster info

kubectl cluster-info

This prints the addresses of the API server, DNS, and other services; make sure they are all reachable.


6.5 API Server health check

  • Query the readiness endpoint to check the API's health:

    kubectl get --raw='/readyz?verbose'

    A final ok means the API is ready to handle requests.


6.6 CNI network check

  • Can CoreDNS resolve names? Launch a throwaway Pod (e.g. dnsutils or busybox) and run nslookup:

    kubectl run dnsutils --image=tutum/dnsutils --command -- sleep inf
    kubectl exec -ti dnsutils -- nslookup kubernetes.default

    A successful lookup means the cluster network is working.


6.7 Application test

  • Deploy a test Pod / Deployment

    kubectl apply -f https://k8s.io/examples/application/deployment.yaml
    kubectl get pods

    Check that the Pods are created and reach Running.

  • Expose the service

    kubectl expose deployment nginx-deployment --port=80 --type=NodePort

    Hit a node IP plus the assigned NodePort and make sure the service responds; a quick check is sketched below.
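
    A way to fetch the assigned port and test it (using master01's IP from the host table above):

    NODEPORT=$(kubectl get svc nginx-deployment -o jsonpath='{.spec.ports[0].nodePort}')
    curl -s http://192.168.1.11:$NODEPORT | head -n 4   # expect the nginx welcome page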


6.8 Resource metrics test (optional)

If the Metrics Server is installed:

kubectl top nodes
kubectl top pods -n kube-system

These return CPU/memory usage for nodes and Pods, which confirms the Metrics API is working.


6.9 Events and debugging

  • View the cluster events; resource-scheduling or service-startup failures show up here quickly:

    kubectl get events --sort-by='.metadata.creationTimestamp'

  • You can also dump the entire cluster state for diagnosis:

    kubectl cluster-info dump