Deploying a K8S Cluster on Alibaba Cloud by Hand


A step-by-step guide to building a K8S cluster.

Buying the machines

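You need two machines, each with at least 2 GB of memory and at least 2 CPU cores. I recommend pay-as-you-go instances, which cost less than one yuan per hour. I bought two machines in Alibaba Cloud's Zhangjiakou region, running CentOS 7.9 with security hardening disabled.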

After the purchase, note the private IPs of the two machines. Assume they are 1.1.1.1 and 2.2.2.2; we treat the former as k8s-master and the latter as k8s-worker.

Step 1

Create /path/to/step_1_on_master.sh with the following content:

hostnamectl set-hostname k8s-master
# NOTE: replace 1.1.1.1 and 2.2.2.2 below with the private IPs of your k8s-master and k8s-worker
cat >> /etc/hosts << EOF
1.1.1.1 k8s-master
2.2.2.2 k8s-worker
EOF
swapoff -a
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
yum install -y ntpdate
wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo
yum -y install docker-ce-18.06.1.ce-3.el7
cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
# Pin the packages to v1.22.4, the version passed to kubeadm init in step 2
yum install -y kubelet-1.22.4 kubeadm-1.22.4 kubectl-1.22.4 --disableexcludes=kubernetes

Run the script on the machine with ssh root@k8s-master-public-ip 'bash -s' < step_1_on_master.sh, then reboot the machine once it finishes.
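Because the script hard-codes the placeholder IPs, one way to avoid editing it by hand is to substitute the real addresses while streaming it over ssh. This is a minimal sketch, assuming the hosts entries still start with the literal placeholders 1.1.1.1 and 2.2.2.2; the function name render_step1 and the example addresses are hypothetical.

```shell
# render_step1 MASTER_IP WORKER_IP
# Reads the step-1 script on stdin and writes it to stdout with the
# placeholder IPs at the start of the /etc/hosts entries replaced.
# (Hypothetical helper, not part of any tool.)
render_step1() {
  local master_ip=$1 worker_ip=$2
  sed -e "s/^1\.1\.1\.1 /${master_ip} /" \
      -e "s/^2\.2\.2\.2 /${worker_ip} /"
}

# Example usage from your own machine (the addresses are made up):
#   render_step1 172.16.0.10 172.16.0.11 < step_1_on_master.sh \
#     | ssh root@k8s-master-public-ip 'bash -s'
```

The patterns are anchored to the start of the line so that only the two hosts entries are touched.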

Then make a copy named step_1_on_worker.sh and change only the first line to

hostnamectl set-hostname k8s-worker 

That is the only change. Run it on the worker the same way, and reboot the same way.
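The copy-and-edit step can also be done with sed. This is a sketch, assuming the master script contains the hostnamectl line exactly as written above; make_worker_script is a hypothetical name.

```shell
# Derive the worker script from the master script by rewriting only the
# hostnamectl line; every other line (including the /etc/hosts entries)
# is copied unchanged.
make_worker_script() {
  sed 's/^hostnamectl set-hostname k8s-master$/hostnamectl set-hostname k8s-worker/'
}

# Usage:
#   make_worker_script < step_1_on_master.sh > step_1_on_worker.sh
```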

Once both machines have run their script and rebooted, move on to step 2.

Step 2

First run docker info |grep -i cgroup to check whether Docker's cgroup driver is cgroupfs or systemd. The cgroup-driver setting below must match it; this guide assumes cgroupfs.
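To avoid copying the driver name by hand, the value can be extracted from the docker info output and reused. This is a sketch, assuming the output contains a line of the form "Cgroup Driver: <name>"; parse_cgroup_driver is a hypothetical helper name.

```shell
# Print the cgroup driver named in `docker info` output read from stdin.
parse_cgroup_driver() {
  awk -F': *' 'tolower($1) ~ /cgroup driver/ { print $2; exit }'
}

# Usage on a node where the Docker daemon is already running:
#   driver=$(docker info 2>/dev/null | parse_cgroup_driver)
#   echo "KUBELET_EXTRA_ARGS=--cgroup-driver=${driver}" > /etc/sysconfig/kubelet
```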

Start the cluster on k8s-master (you can run this remotely over ssh, and be sure to replace 1.1.1.1 with your own IP):

systemctl stop firewalld
systemctl disable firewalld
sysctl --system
ntpdate time.windows.com  
systemctl enable docker
systemctl start docker

echo "KUBELET_EXTRA_ARGS=--cgroup-driver=cgroupfs" > /etc/sysconfig/kubelet 

systemctl enable kubelet  

kubeadm init   --apiserver-advertise-address=1.1.1.1   --image-repository registry.aliyuncs.com/google_containers  --kubernetes-version v1.22.4   --service-cidr=10.1.0.0/16   --pod-network-cidr=10.244.0.0/16
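The --service-cidr and --pod-network-cidr ranges should not overlap each other, and ideally not the network your nodes live in either. If you change the values, a quick check like the following can catch a clash before running kubeadm init. This is a sketch for IPv4 only; ip_to_int and cidr_overlap are hypothetical helper names.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# cidr_overlap CIDR1 CIDR2
# Succeeds (exit 0) when the two IPv4 ranges share at least one address.
# Two CIDR blocks overlap exactly when they agree on the shorter prefix.
cidr_overlap() {
  local ip1=${1%/*} len1=${1#*/} ip2=${2%/*} len2=${2#*/}
  local len=$(( len1 < len2 ? len1 : len2 ))
  local mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip1") & mask) == ($(ip_to_int "$ip2") & mask) ))
}

# The values used in this article: service CIDR vs. pod CIDR.
if cidr_overlap 10.1.0.0/16 10.244.0.0/16; then
  echo "overlap: pick different ranges"
else
  echo "no overlap"
fi
```

The same function can be called a second time with your nodes' own network range as one of the arguments.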

If nothing fails, you will see the following output:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 1.1.1.1:6443 --token v2djav.rfpat4j8g1uf7uoy \
        --discovery-token-ca-cert-hash sha256:65d640b3798fde3c1e9ef8f1abbf26c01b512d5d28947a6c7bc921e3dcb8f88b

Copy the commands from the output and run them. Here is what I ran:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config


# Alternatively, if you are the root user, you can run:
# The next line is optional; skip it if you are not sure
export KUBECONFIG=/etc/kubernetes/admin.conf

Then open the URL from the output: kubernetes.io/docs/concep… . Pick a network add-on and install it. I followed the Flannel link, and its README.md happened to contain this line:

For Kubernetes v1.17+ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

So I copied it and ran it. The result:

Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created

That means it worked!

The last part of the kubeadm init output contains a crucial command:

kubeadm join 1.1.1.1:6443 --token v2djav.rfpat4j8g1uf7uoy \
        --discovery-token-ca-cert-hash sha256:65d640b3798fde3c1e9ef8f1abbf26c01b512d5d28947a6c7bc921e3dcb8f88b

I will call this command the "join code"; it is needed in step 3.
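If you lose the join code, or come back after the bootstrap token has expired (kubeadm tokens are valid for 24 hours by default), you can print a fresh one on k8s-master. The sketch below wraps that kubeadm subcommand and adds a small parser for checking which token a saved join code uses; both function names are hypothetical.

```shell
# Print a fresh join command; run this on k8s-master as root.
# `kubeadm token create --print-join-command` is a real kubeadm
# subcommand; only the wrapper name is made up.
fresh_join_command() {
  kubeadm token create --print-join-command
}

# join_token: read a saved join command on stdin (line continuations are
# fine) and print only its token, e.g. to compare with `kubeadm token list`.
join_token() {
  tr -d '\\\n' |
    awk '{ for (i = 1; i < NF; i++) if ($i == "--token") print $(i + 1) }'
}
```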

Next comes step 3.

Step 3

Run the following on k8s-worker:

systemctl stop firewalld
systemctl disable firewalld
sysctl --system
ntpdate time.windows.com  
systemctl enable docker
systemctl start docker

echo "KUBELET_EXTRA_ARGS=--cgroup-driver=cgroupfs" > /etc/sysconfig/kubelet 

systemctl enable kubelet

Then run the "join code". You will see the following output:

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

This means k8s-worker has joined the cluster created by k8s-master!

Step 4

Back on k8s-master, run kubectl get nodes to list the nodes:

NAME         STATUS   ROLES                  AGE   VERSION
k8s-master   Ready    control-plane,master   19m   v1.22.4
k8s-worker   Ready    <none>                 12m   v1.22.4

Both machines should eventually reach the Ready state. Next, run kubectl get pod -n kube-system; the result:

NAME                                 READY   STATUS    RESTARTS   AGE
coredns-7f6cbbb7b8-4hzvm             1/1     Running   0          20m
coredns-7f6cbbb7b8-hh5rm             1/1     Running   0          20m
etcd-k8s-master                      1/1     Running   2          20m
kube-apiserver-k8s-master            1/1     Running   2          20m
kube-controller-manager-k8s-master   1/1     Running   2          20m
kube-flannel-ds-bz29s                1/1     Running   0          13m
kube-flannel-ds-ddk84                1/1     Running   0          16m
kube-proxy-sxcrt                     1/1     Running   0          20m
kube-proxy-x5fvc                     1/1     Running   0          13m
kube-scheduler-k8s-master            1/1     Running   2          20m

Every pod's READY column should show 1/1.
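Rather than reading the two tables by eye, the checks can be scripted by filtering the kubectl output. This is a sketch, assuming the default column layout shown above; not_ready_nodes and unhealthy_pods are hypothetical helper names.

```shell
# not_ready_nodes: read `kubectl get nodes` output on stdin and print the
# name of every node whose STATUS column is not exactly "Ready".
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# unhealthy_pods: read `kubectl get pod` output on stdin and print every
# pod whose READY column is not n/n or whose STATUS is not "Running".
unhealthy_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2] || $3 != "Running") print $1 }'
}

# Usage on k8s-master (empty output means everything looks healthy):
#   kubectl get nodes | not_ready_nodes
#   kubectl get pod -n kube-system | unhealthy_pods
```

Nodes and pods can take a minute or two to settle after the join, so a non-empty result right away is not necessarily an error.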

Step 5

Let's try deploying Nginx to the cluster:

docker pull nginx
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=NodePort

Then run kubectl get pods,svc to check the status of service/nginx:

NAME                         READY   STATUS    RESTARTS   AGE
pod/nginx-6799fc88d8-56852   1/1     Running   0          7m24s

NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
service/kubernetes   ClusterIP   10.1.0.1      <none>        443/TCP        30m
service/nginx        NodePort    10.1.254.34   <none>        80:30999/TCP   7m15s

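Here 80:30999 means that port 80 of Nginx is mapped to port 30999 on each node (your port will probably differ from mine), so the service is reachable at NodeIP:30999. Remember the two private IPs 1.1.1.1 and 2.2.2.2? Try requesting them: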

curl http://k8s-master:30999
curl http://k8s-worker:30999

Both curl requests return the Nginx welcome page:

<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
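Since the node port is assigned at random, it is handy to look it up instead of reading it from the table. The sketch below parses a PORT(S) value such as 80:30999/TCP; the helper name node_port is hypothetical, and the commented usage shows the jsonpath query that kubectl offers for the same purpose.

```shell
# node_port PORTS: given a PORT(S) value such as "80:30999/TCP",
# print the node port (the number between ":" and "/").
node_port() {
  local ports=$1
  ports=${ports#*:}      # drop the service port and the colon
  echo "${ports%%/*}"    # drop the protocol suffix
}

# Usage on k8s-master (kubectl can also return the port directly):
#   port=$(kubectl get svc nginx -o jsonpath='{.spec.ports[0].nodePort}')
#   for host in k8s-master k8s-worker; do
#     curl -s -o /dev/null -w "${host}: %{http_code}\n" "http://${host}:${port}"
#   done
```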

Can this page be reached through a public IP as well? Yes, provided you first allow port 30999 in the security policy of Alibaba Cloud or Tencent Cloud. It worked for me.

Conclusion

Today we learned how to build a K8S cluster from two machines and serve Nginx from it.

Go write a blog post about it, and don't forget to release the pay-as-you-go servers on Alibaba Cloud.

