This document is my self-study notes for Kubernetes, used to build a test environment. I used the UTM virtualization tool on a Mac to create three Ubuntu 22.04 arm64 virtual machines, each with 2 CPU cores and 2 GB of RAM, in a 1-master / 2-worker architecture.
Created: 2024-09-21
1. Deployment Architecture Plan
| Role | Hostname | Components | IP | Spec |
|---|---|---|---|---|
| master01 | server01 | etcd, apiserver, controller-manager, scheduler, kubelet, proxy, flannel, runc | 192.168.64.7 | 2 cores / 2 GB |
| worker01 | server02 | pod, kubelet, proxy, flannel, runc | 192.168.64.8 | 2 cores / 2 GB |
| worker02 | server03 | pod, kubelet, proxy, flannel, runc | 192.168.64.9 | 2 cores / 2 GB |
2. Software Version Record
- docker server: 27.2.1
- containerd: 1.7.22
- kubeadm: v1.28.2
- kubelet: v1.28.2
- kubectl: v1.28.2
3. Cluster Server Initialization (all 3.x steps run on every machine)
3.1 Add hostnames
cat >> /etc/hosts <<EOF
192.168.64.7 m1 server01
192.168.64.8 w1 server02
192.168.64.9 w2 server03
EOF
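The heredoc above appends unconditionally, so running it twice duplicates every entry. A small sketch of an idempotent variant (HOSTS_FILE defaults to a scratch file here so it can be tried safely; point it at /etc/hosts on the real machines):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Target file; defaults to a scratch copy for a dry run.
HOSTS_FILE="${HOSTS_FILE:-/tmp/hosts.test}"

# Node entries from the plan in Part 1.
entries=(
  "192.168.64.7 m1 server01"
  "192.168.64.8 w1 server02"
  "192.168.64.9 w2 server03"
)

for entry in "${entries[@]}"; do
  ip=${entry%% *}
  # Only append when no line for this IP exists yet, so reruns are safe.
  if ! grep -q "^${ip}\b" "$HOSTS_FILE" 2>/dev/null; then
    echo "$entry" >> "$HOSTS_FILE"
  fi
done

cat "$HOSTS_FILE"
```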
3.2 Keep host clocks consistent, using timedatectl
timedatectl set-timezone Asia/Shanghai   # set the timezone (persists across reboots)
timedatectl status                       # check the current timezone and time
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime   # alternative: set the timezone by overwriting the file directly
3.3 Disable Ubuntu's default ufw firewall
systemctl status ufw
systemctl disable ufw && systemctl stop ufw
## It should now show "Active: inactive (dead)"
3.4 Disable the swap partition
swapoff -a   # disable swap for the current boot only
sed -ri 's/.*swap.*/#&/' /etc/fstab   # comment out the swap entry so the change survives reboots
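The swap line in /etc/fstab varies between installs, so it is worth trying the sed expression on a throwaway copy first. A sketch using a sample file resembling a default Ubuntu install:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a sample fstab with a root entry and a swap entry.
cat > /tmp/fstab.sample <<'EOF'
UUID=abcd-1234 / ext4 defaults 0 1
/swap.img none swap sw 0 0
EOF

# Comment out every line mentioning "swap" (same expression as above,
# pointed at the sample instead of /etc/fstab).
sed -ri 's/.*swap.*/#&/' /tmp/fstab.sample

cat /tmp/fstab.sample
```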
3.5 Configure kernel forwarding and bridge filtering
## These settings ensure correct packet forwarding, make bridged traffic visible to iptables, and set the memory-management policy Kubernetes expects.
## Add the module-load config file
cat << EOF | tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
## Load the modules
modprobe overlay
modprobe br_netfilter
## Add the bridge-filter and kernel-forwarding sysctl config file
cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
vm.swappiness = 0
EOF
## Apply the settings
sysctl --system
3.6 Install a container runtime
The container runtime is what lets Kubernetes actually run containers: it manages their execution and lifecycle within the cluster.
Since Kubernetes 1.24, dockershim (which let the kubelet talk to Docker directly) is no longer supported; a CRI-compatible container runtime is now mandatory as the bridge between the kubelet and containers. There are two ways to install one:
(1) Install Docker, which bundles containerd.io as the container runtime.
(2) Install containerd + runc + the CNI plugins individually as the container runtime.
For convenience, go with option 1 and install Docker; to understand the components more deeply, choose option 2.
3.6.1 Install Docker
Reference: the official docs at docs.docker.com/engine/inst…
- Remove old packages and leftovers
for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done
- Add Docker's official apt repository key and source
## Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
## Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
If that network is unreachable, add the Aliyun mirror key and source instead (the key is saved to the same docker.asc path that the signed-by option below expects):
sudo curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://mirrors.aliyun.com/docker-ce/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
- Install the latest Docker packages
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
- Verify the Docker version
docker version
## The output looks like this:
Client: Docker Engine - Community
Version: 27.2.1
API version: 1.47
Go version: go1.22.7
Git commit: 9e34c9b
Built: Fri Sep 6 12:09:00 2024
OS/Arch: linux/arm64
Context: default
Server: Docker Engine - Community
Engine:
Version: 27.2.1
API version: 1.47 (minimum version 1.24)
Go version: go1.22.7
Git commit: 8b539b8
Built: Fri Sep 6 12:09:00 2024
OS/Arch: linux/arm64
Experimental: false
containerd: ## serves as the container runtime interface
Version: 1.7.22
GitCommit: 7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
runc: ## the low-level runtime that interacts with the Linux kernel
Version: 1.1.14
GitCommit: v1.1.14-0-g2c9f560
docker-init:
Version: 0.19.0
GitCommit: de40ad0
- Check service status, start Docker, and enable it at boot
## Start Docker and enable it at boot
systemctl start docker
systemctl enable docker
## All three services should be in the running state
systemctl status containerd.service
systemctl status docker.service
systemctl status docker.socket
3.6.2 Manually install the container runtime components
1. Download the containerd binary and extract it into /usr/local
wget https://github.com/containerd/containerd/releases/download/v1.7.11/containerd-1.7.11-linux-arm64.tar.gz
sudo tar Cxzvf /usr/local containerd-1.7.11-linux-arm64.tar.gz
2. Generate the containerd config file
sudo mkdir /etc/containerd
containerd config default > config.toml
sudo cp config.toml /etc/containerd
3. Install a systemd unit file for containerd
wget https://raw.githubusercontent.com/containerd/containerd/main/containerd.service
sudo cp containerd.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now containerd
sudo systemctl restart containerd
## systemctl status containerd should now show Active: active (running)
4. Install runc (the actual runtime that interacts with the Linux kernel)
wget https://github.com/opencontainers/runc/releases/download/v1.1.10/runc.arm64
sudo install -m 755 runc.arm64 /usr/local/sbin/runc
5. Install the CNI plugins so containerd can set up container networking
wget https://github.com/containernetworking/plugins/releases/download/v1.4.0/cni-plugins-linux-arm64-v1.4.0.tgz
sudo mkdir -p /opt/cni/bin
sudo tar Cxzvf /opt/cni/bin cni-plugins-linux-arm64-v1.4.0.tgz
3.7 Configure the systemd cgroup driver for the container runtime
## Generate the default config file
containerd config default > /etc/containerd/config.toml
## Edit the config file
vim /etc/containerd/config.toml
In the [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options] section, change the default:
SystemdCgroup = false
to:
SystemdCgroup = true
## If the VM has unrestricted internet access, the default sandbox image below also works
In the [plugins."io.containerd.grpc.v1.cri"] section, change the default:
sandbox_image = "registry.k8s.io/pause:3.6"
to:
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"
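Instead of editing in vim, both changes can be scripted with sed. A sketch against a trimmed sample of the generated config (on a real node, point the sed commands at /etc/containerd/config.toml instead):

```shell
#!/usr/bin/env bash
set -euo pipefail

CFG=/tmp/config.toml.sample

# Minimal excerpt of `containerd config default` showing the two defaults.
cat > "$CFG" <<'EOF'
[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "registry.k8s.io/pause:3.6"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = false
EOF

# Switch the cgroup driver to systemd.
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' "$CFG"
# Point the pause (sandbox) image at the Aliyun mirror.
sed -i 's#sandbox_image = ".*"#sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"#' "$CFG"

cat "$CFG"
```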
## Restart containerd
sudo systemctl daemon-reload
sudo systemctl restart containerd
sudo systemctl status containerd
3.8 Add the official Kubernetes apt repository key and source
sudo mkdir -p /etc/apt/keyrings  ## ensure the keyrings directory exists (missing on some releases)
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt update
If that network is unreachable, add a domestic mirror key and source instead
sudo curl -fsSL https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
sudo add-apt-repository "deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main"
sudo apt update
3.9 Install the Kubernetes components
## First install a few helper packages
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
## Install the pinned 1.28 version (matching the versions recorded in Part 2; -1.1 is the deb package revision)
sudo apt-get install -y kubelet=1.28.2-1.1 kubeadm=1.28.2-1.1 kubectl=1.28.2-1.1
## Hold the packages so an automatic upgrade cannot break the cluster
sudo apt-mark hold kubelet kubeadm kubectl
## Check the service state (kubelet has not started yet)
systemctl status kubelet
## A failed kubelet is expected here: one of its startup prerequisites is its config under /var/lib/kubelet, which does not exist yet; it starts normally once the master runs kubeadm init (or a worker runs kubeadm join).
4. Kubernetes Initialization (master node only)
4.1 Pre-pull the images
## The command below lists the images kubeadm uses by default, hosted on registries outside China
kubeadm config images list --kubernetes-version=v1.28.2
## Output:
registry.k8s.io/kube-apiserver:v1.28.2
registry.k8s.io/kube-controller-manager:v1.28.2
registry.k8s.io/kube-scheduler:v1.28.2
registry.k8s.io/kube-proxy:v1.28.2
registry.k8s.io/pause:3.9
registry.k8s.io/etcd:3.5.9-0
registry.k8s.io/coredns/coredns:v1.10.1
## List the equivalent images available from the Aliyun mirror
kubeadm config images list --image-repository registry.aliyuncs.com/google_containers --kubernetes-version=v1.28.2
## Output:
registry.aliyuncs.com/google_containers/kube-apiserver:v1.28.2
registry.aliyuncs.com/google_containers/kube-controller-manager:v1.28.2
registry.aliyuncs.com/google_containers/kube-scheduler:v1.28.2
registry.aliyuncs.com/google_containers/kube-proxy:v1.28.2
registry.aliyuncs.com/google_containers/pause:3.9
registry.aliyuncs.com/google_containers/etcd:3.5.9-0
registry.aliyuncs.com/google_containers/coredns:v1.10.1
## Pull from the domestic mirror or the official registry, depending on your connectivity
kubeadm config images pull --kubernetes-version v1.28.2 --image-repository registry.aliyuncs.com/google_containers
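The two lists differ only in the registry prefix, so when a pull from one source fails the names can be rewritten mechanically. A sketch of the mapping with sed, fed from two sample names so it runs anywhere (on the master, pipe in the output of `kubeadm config images list --kubernetes-version=v1.28.2` instead):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Rewrite official registry.k8s.io image names to the Aliyun mirror.
# Special case first: coredns is nested (coredns/coredns) on the official
# registry but flat under google_containers on the mirror.
map_to_aliyun() {
  sed -e 's#^registry.k8s.io/coredns/coredns#registry.aliyuncs.com/google_containers/coredns#' \
      -e 's#^registry.k8s.io/#registry.aliyuncs.com/google_containers/#'
}

map_to_aliyun <<'EOF' | tee /tmp/images.aliyun.txt
registry.k8s.io/kube-apiserver:v1.28.2
registry.k8s.io/coredns/coredns:v1.10.1
EOF
```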
4.2 Initialize the master
There are two equivalent ways to initialize, pick either: (1) pass the parameters to kubeadm init directly; (2) generate a config file, edit it, and initialize from it.
Method 1: initialize with explicit kubeadm init flags
kubeadm init \
--control-plane-endpoint="192.168.64.7" \
--image-repository registry.aliyuncs.com/google_containers \
--kubernetes-version v1.28.2 \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16 \
--ignore-preflight-errors=all
### --apiserver-advertise-address  # the address the cluster advertises; with a single master, the master's own IP
### --control-plane-endpoint="kubeapi.magedu.com"  # alternative to the flag above: a planned virtual hostname for the control plane
### --image-repository  # the default registry k8s.gcr.io is unreachable from mainland China, so point at the Aliyun mirror
### --kubernetes-version  # the Kubernetes version, matching the packages installed above
### --service-cidr  # the cluster-internal virtual network that gives Pods a unified Service entry point; the value above can be used as-is
### --pod-network-cidr  # the Pod network; must match the CNI component's YAML deployed below; the value above can be used as-is
Method 2: initialize from a generated kubeadm config file
kubeadm config print init-defaults > kubeadm.yaml
Edit the defaults with vim kubeadm.yaml (four changes in total):
1. Set localAPIEndpoint.advertiseAddress to the master's IP;
2. Set nodeRegistration.name to the current node's name;
3. Set imageRepository to the domestic mirror: registry.aliyuncs.com/google_containers;
4. Add networking.podSubnet; its range must overlap neither networking.serviceSubnet nor the node network 192.168.64.0/24, so I set it to 10.244.0.0/16.
The edited file looks like this:
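The no-overlap requirement in point 4 can be checked mechanically rather than by eye. A small sketch in pure bash (no external tools) that masks each pair of networks to the shorter prefix and compares:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed (exit 0) when the two CIDR blocks overlap.
cidr_overlap() {
  local n1=${1%/*} l1=${1#*/} n2=${2%/*} l2=${2#*/}
  local min=$(( l1 < l2 ? l1 : l2 ))
  local mask=$(( (0xFFFFFFFF << (32 - min)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$n1") & mask )) -eq $(( $(ip_to_int "$n2") & mask )) ]
}

pod=10.244.0.0/16      # networking.podSubnet
svc=10.96.0.0/12       # networking.serviceSubnet
node=192.168.64.0/24   # the VM network

for pair in "$pod $svc" "$pod $node" "$svc $node"; do
  set -- $pair
  if cidr_overlap "$1" "$2"; then
    echo "OVERLAP: $1 vs $2"
  else
    echo "ok: $1 vs $2"
  fi
done | tee /tmp/cidr_check.txt
```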
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.64.7
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: m1
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.28.2
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16  # Pod subnet added
  serviceSubnet: 10.96.0.0/12
scheduler: {}
Run the initialization
sudo kubeadm init --config kubeadm.yaml
On success it prints the following; run the commands as instructed:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes by copying certificate authorities and service account keys on each node and then running the following as root:
kubeadm join kubeapi.beety.com:6443 --token avjbin.trcl3dwub19jjcau \
    --discovery-token-ca-cert-hash sha256:80943437321a578c95a90bc2dae6267f4955b1d09fdf0b8c4b76e938993a780e \
    --control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join kubeapi.beety.com:6443 --token avjbin.trcl3dwub19jjcau \
    --discovery-token-ca-cert-hash sha256:80943437321a578c95a90bc2dae6267f4955b1d09fdf0b8c4b76e938993a780e
At this point the nodes are still NotReady, because no network plugin is installed yet (you can see coredns has not started).
4.3 Install the cluster network
1. Download the kube-flannel.yml manifest
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
2. Pull the images referenced in the manifest
grep image: kube-flannel.yml
docker pull docker.io/flannel/flannel-cni-plugin:v1.5.1-flannel2
docker pull docker.io/flannel/flannel:v0.25.6
## If the images cannot be pulled, try quay.io/coreos/flannel:$tag and update the image references in the manifest to match
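Updating the image references in the manifest can also be scripted. A sketch on a two-line excerpt of kube-flannel.yml, swapping in the quay.io mirror name mentioned above (on the real file, skip the sample heredoc and point sed at kube-flannel.yml directly):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Excerpt of the image references in kube-flannel.yml.
cat > /tmp/kube-flannel.sample.yml <<'EOF'
        image: docker.io/flannel/flannel-cni-plugin:v1.5.1-flannel2
        image: docker.io/flannel/flannel:v0.25.6
EOF

# Redirect the flannel image to the quay.io mirror, keeping the tag.
sed -i 's#docker.io/flannel/flannel:#quay.io/coreos/flannel:#' /tmp/kube-flannel.sample.yml

grep image: /tmp/kube-flannel.sample.yml
```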
3. Create the flannel network
kubectl create -f kube-flannel.yml
## Output of the create command
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created
## Check pod status: kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-49klq 1/1 Running 0 14h
kube-flannel kube-flannel-ds-kx58p 1/1 Running 12 (13h ago) 14h
kube-flannel kube-flannel-ds-tbhz4 1/1 Running 12 (13h ago) 14h
kube-system coredns-66f779496c-95v9f 1/1 Running 0 3d3h
kube-system coredns-66f779496c-fw8c7 1/1 Running 0 3d3h
kube-system etcd-m1 1/1 Running 2 (14h ago) 3d3h
kube-system kube-apiserver-m1 1/1 Running 2 (14h ago) 3d3h
kube-system kube-controller-manager-m1 1/1 Running 2 (14h ago) 3d3h
kube-system kube-proxy-7dd5l 1/1 Running 13 (13h ago) 14h
kube-system kube-proxy-cjcgl 1/1 Running 2 (14h ago) 3d3h
kube-system kube-proxy-mxjr5 1/1 Running 12 (13h ago) 14h
kube-system kube-scheduler-m1 1/1 Running 2 (14h ago) 3d3h
## Once all pods are Running, you can move on to joining the worker nodes
4.4 Join the worker nodes (run on each worker)
Run the join command printed by kubeadm init on each worker:
kubeadm join kubeapi.beety.com:6443 --token avjbin.trcl3dwub19jjcau \
    --discovery-token-ca-cert-hash sha256:80943437321a578c95a90bc2dae6267f4955b1d09fdf0b8c4b76e938993a780e
(The bootstrap token expires after 24 hours; if it has lapsed, run kubeadm token create --print-join-command on the master to get a fresh join command.)
Check the node list: kubectl get nodes
NAME STATUS ROLES AGE VERSION
m1 Ready control-plane 3d3h v1.28.2
server02 Ready <none> 14h v1.28.2
server03 Ready <none> 14h v1.28.2
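When scripting the rollout it helps to assert the final state instead of eyeballing it. A sketch that feeds a captured `kubectl get nodes` listing through awk and fails if any node is not Ready (on a live cluster, replace the heredoc with the real command, as noted in the comment):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Captured listing; on a live cluster use: nodes_output=$(kubectl get nodes)
nodes_output=$(cat <<'EOF'
NAME       STATUS   ROLES           AGE    VERSION
m1         Ready    control-plane   3d3h   v1.28.2
server02   Ready    <none>          14h    v1.28.2
server03   Ready    <none>          14h    v1.28.2
EOF
)

# Count data rows whose STATUS column (field 2) is not "Ready".
not_ready=$(echo "$nodes_output" | awk 'NR > 1 && $2 != "Ready"' | wc -l)

if [ "$not_ready" -eq 0 ]; then
  echo "all nodes Ready" | tee /tmp/nodes_check.txt
else
  echo "$not_ready node(s) not Ready" | tee /tmp/nodes_check.txt
  exit 1
fi
```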
That completes a fully working Kubernetes cluster for a test environment; you can now practice creating Deployments, Services, and other resources.
5. Problems Encountered and Solutions
Both issues below showed up as network problems.
5.1 kube-proxy or flannel pods stuck in CrashLoopBackOff
One worker node had been set up without the SystemdCgroup = true change to the containerd driver config (section 3.7); making the change and restarting containerd resolved it.
5.2 flannel fails to deploy on a worker node, pod stuck in Init:Error
If the network plugin's images have to be pulled manually, remember that every host in the cluster needs the flannel images, not just the master.