Building a Self-Hosted K8S Cluster


K8S Cluster Deployment

Environment preparation

Set the hostname

1. Check the current hostname with hostnamectl
hostnamectl 
   Static hostname: VM-99-127-centos
         Icon name: computer-vm
           Chassis: vm
        Machine ID: xxxxxx
           Boot ID: xxxxxx
    Virtualization: kvm
  Operating System: Tencent tlinux 2.2 (Final)
       CPE OS Name: cpe:/o:tlinux:linux:2
            Kernel: Linux 3.10.107-1-tlinux2_kvm_guest-0052
      Architecture: x86-64

2. Set the hostname
hostnamectl set-hostname master
On the other nodes, set the hostname to nodeX, for example:
hostnamectl set-hostname node1

3. On every node in the cluster, map the hostnames in /etc/hosts so the nodes can resolve each other by name
cat /etc/hosts
x.x.x.x master
x.x.x.x node1
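
The /etc/hosts entries above have to be repeated on every node, so it can help to add them idempotently. The following is a minimal sketch, not part of the original setup: the helper names has_host and add_host_entry are hypothetical, and the addresses in the usage comment are placeholders from the documentation range.

```shell
# has_host FILE NAME: succeed if a non-comment line of FILE already maps NAME.
has_host() {
    awk -v name="$2" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "$1" 2>/dev/null
}

# add_host_entry FILE IP NAME: append "IP NAME" to FILE unless NAME is already mapped.
add_host_entry() {
    has_host "$1" "$3" || printf '%s %s\n' "$2" "$3" >> "$1"
}

# Example (as root, on every node; replace the placeholder addresses):
#   add_host_entry /etc/hosts 192.0.2.10 master
#   add_host_entry /etc/hosts 192.0.2.11 node1
```

Running it twice leaves the file unchanged, which makes it safe to include in a provisioning script.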

Install Docker on all machines

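yum install -y yum-utils device-mapper-persistent-data lvm2
# Assumption: if the docker-ce repo is not configured yet, add it first (yum-config-manager comes from yum-utils)
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum install -y --setopt=obsoletes=0 docker-ce-18.09.7-3.el7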

mkdir -p /etc/docker/
tee /etc/docker/daemon.json <<-'EOF'
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF

systemctl daemon-reload && \
systemctl restart docker && \
systemctl enable docker && \
systemctl status docker.service
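
The daemon.json above switches Docker to the systemd cgroup driver, which should match the driver the kubelet uses. A small check like the one below can confirm the setting before moving on. It is only a sketch: the helper name uses_systemd_cgroup is hypothetical and it simply greps the config file rather than parsing the JSON.

```shell
# uses_systemd_cgroup FILE: succeed if FILE (a Docker daemon.json)
# selects the systemd cgroup driver.
uses_systemd_cgroup() {
    grep -q 'native\.cgroupdriver=systemd' "$1"
}

# Example:
#   uses_systemd_cgroup /etc/docker/daemon.json && echo "cgroup driver OK"
# On a node where Docker is running, the effective driver can also be queried:
#   docker info --format '{{.CgroupDriver}}'
```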

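Configure system parameters

1. Disable swap (swapoff only lasts until the next reboot)
swapoff -a
2. Disable the firewall
systemctl stop firewalld && systemctl disable firewalld
3. Set the kernel parameters for k8s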
cat <<EOF >  /etc/sysctl.d/k8s.conf
# Let bridged traffic pass through iptables/ip6tables (required)
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
# Disable IPv6 (required)
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv4.ip_forward = 1
# Do not panic on OOM; let the OOM killer handle it (the default)
vm.panic_on_oom = 0
# Avoid swapping as far as possible
vm.swappiness = 0
# Always allow memory overcommit, without checking whether physical memory is sufficient
vm.overcommit_memory = 1
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 1048576
# System-wide limit on file handles
fs.file-max = 52706963
# Per-process limit on open files
fs.nr_open = 52706963
net.netfilter.nf_conntrack_max = 2310720
EOF

sysctl -p /etc/sysctl.d/k8s.conf 

4. Set the system time zone
timedatectl set-timezone Asia/Shanghai
timedatectl set-local-rtc 0

5. Load the IPVS kernel modules
cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF

chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4
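
Two of the settings in this section are easy to lose after a reboot. swapoff -a does not persist, so any swap entry in /etc/fstab would bring swap back. The sketch below comments such entries out; the helper name disable_swap_in_fstab is hypothetical, and it assumes the usual whitespace-separated fstab layout. Separately, the net.bridge.* keys only exist once the br_netfilter module is loaded, so if sysctl -p reports them as missing, running modprobe br_netfilter first is the usual fix.

```shell
# disable_swap_in_fstab FILE: comment out every active swap entry in an
# fstab-style FILE. The original is kept as FILE.bak.
disable_swap_in_fstab() {
    sed -i.bak -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$1"
}

# Example (as root):
#   disable_swap_in_fstab /etc/fstab
# If the net.bridge.* sysctl keys are reported as missing:
#   modprobe br_netfilter && sysctl -p /etc/sysctl.d/k8s.conf
```

Lines that are already commented out are left alone, so the function can be run more than once.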

Deploying the k8s services

Install the k8s components

1. Configure the Aliyun yum repository
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

yum install -y kubectl-1.18.2-0 kubelet-1.18.2-0 kubeadm-1.18.2-0 --disableexcludes=kubernetes

systemctl enable --now kubelet
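
Since the three packages are pinned to 1.18.2, it is worth confirming that every node really ended up with the same version before initializing the cluster. This is a rough sketch with hypothetical helper names (extract_version, same_version); the commands in the usage comment are the standard version queries for this release.

```shell
# extract_version: print the first vX.Y.Z token found on stdin.
extract_version() {
    grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# same_version V1 V2 ...: succeed if all arguments are identical.
same_version() {
    local first="$1" v
    for v in "$@"; do
        [ "$v" = "$first" ] || return 1
    done
}

# Example on a node:
#   same_version \
#       "$(kubeadm version -o short | extract_version)" \
#       "$(kubelet --version | extract_version)" \
#       "$(kubectl version --client --short | extract_version)" \
#     && echo "versions match"
```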

Install the master node

1. Initialize the master node
kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version 1.18.2 --apiserver-advertise-address x.x.x.x --pod-network-cidr=x.x.0.0/16 --token-ttl 0
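2. On success, the output includes the command that nodes use to join the cluster; it can also be printed again with the following command
kubeadm token create --print-join-command
3. Configure kubectl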
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
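
The kubectl setup above works as-is for root. When kubectl is meant for another user, the copied admin.conf should be readable only by that user (kubeadm's own post-init message also suggests a chown). A minimal sketch follows; the helper name install_kubeconfig is hypothetical.

```shell
# install_kubeconfig SRC DEST_HOME: copy the kubeconfig SRC to
# DEST_HOME/.kube/config and restrict it to the owner.
install_kubeconfig() {
    local src="$1" dest_home="$2"
    mkdir -p "$dest_home/.kube" &&
    cp "$src" "$dest_home/.kube/config" &&
    chmod 600 "$dest_home/.kube/config"
}

# Example:
#   install_kubeconfig /etc/kubernetes/admin.conf "$HOME"
# When copying for a non-root user, also hand over ownership:
#   chown "$(id -u):$(id -g)" "$HOME/.kube/config"
```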

Join nodes to the cluster

1. On the master node, print the join command, then run the printed command on the node
kubeadm token create --print-join-command

If the node has already joined once, reset it before joining again
kubeadm reset

If the join fails with the error "container runtime is not running":
rm /etc/containerd/config.toml
systemctl restart containerd
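
The join command is usually copied by hand from the master to each node, and a truncated token or hash only shows up as a failed join. The sketch below checks the general shape of the command before it is run. The helper name is_join_command is hypothetical, and the pattern assumes the usual kubeadm format: a token of 6 and 16 lowercase alphanumeric characters separated by a dot, plus a sha256 CA certificate hash.

```shell
# is_join_command STRING: succeed if STRING has the shape of a kubeadm join command.
is_join_command() {
    printf '%s\n' "$1" | grep -Eq \
        '^kubeadm join [^ ]+:[0-9]+ +--token [a-z0-9]{6}\.[a-z0-9]{16} +--discovery-token-ca-cert-hash sha256:[0-9a-f]{64} *$'
}

# Example (on the master):
#   cmd=$(kubeadm token create --print-join-command)
#   is_join_command "$cmd" && echo "$cmd"
```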

Install the flannel plugin

kubectl apply -f kube-flannel.yml
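
Before applying the manifest, it is worth checking that the "Network" value in its net-conf.json section matches the --pod-network-cidr passed to kubeadm init; a mismatch typically leaves pods without working networking. The following sketch extracts that value. The helper name flannel_network is hypothetical, and it assumes kube-flannel.yml has already been downloaded to the current directory.

```shell
# flannel_network FILE: print the "Network" CIDR from the net-conf.json
# section of a flannel manifest.
flannel_network() {
    sed -n 's/.*"Network":[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Example: compare against the CIDR used for kubeadm init
#   flannel_network kube-flannel.yml
```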

Check that the cluster is healthy

kubectl get nodes
NAME     STATUS   ROLES    AGE   VERSION
master   Ready    master   34m   v1.18.2
node1    Ready    <none>   20m   v1.18.2

kubectl get pods --all-namespaces
NAMESPACE     NAME                             READY   STATUS    RESTARTS   AGE
kube-system   coredns-7ff77c879f-n7k7n         1/1     Running   0          34m
kube-system   coredns-7ff77c879f-trr59         1/1     Running   0          34m
kube-system   etcd-master                      1/1     Running   0          34m
kube-system   kube-apiserver-master            1/1     Running   0          34m
kube-system   kube-controller-manager-master   1/1     Running   0          34m
kube-system   kube-flannel-ds-amd64-dmx2z      1/1     Running   0          9m59s
kube-system   kube-flannel-ds-amd64-t5s68      1/1     Running   0          9m59s
kube-system   kube-proxy-6z5n5                 1/1     Running   0          20m
kube-system   kube-proxy-n562q                 1/1     Running   0          34m
kube-system   kube-scheduler-master            1/1     Running   0          34m
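
For a scripted health check, the node listing can be filtered instead of read by eye. The sketch below prints every node whose STATUS column is anything other than Ready. The helper name not_ready_nodes is hypothetical; note that a cordoned node (status "Ready,SchedulingDisabled") is also reported by this simple comparison.

```shell
# not_ready_nodes: read `kubectl get nodes --no-headers` output on stdin and
# print the names of nodes whose STATUS is not exactly "Ready".
not_ready_nodes() {
    awk '$2 != "Ready" { print $1 }'
}

# Example:
#   kubectl get nodes --no-headers | not_ready_nodes
```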

Install metrics-server to monitor cluster resource usage

1. Download the deployment YAML
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml -O metrics-server-components.yaml
2. In mainland China, change the image to the Aliyun mirror
image: registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server
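3. In a test environment, kubelet TLS certificate verification can be skipped by adding --kubelet-insecure-tls to the container args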
- --metric-resolution=15s
- --kubelet-insecure-tls
4. Deploy the service
kubectl create -f metrics-server-components.yaml
5. Check that the service started correctly
kubectl get pods -n kube-system
6. View resource usage
kubectl top node
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
master   209m         2%     5991Mi          38%       
node1    387m         0%     42392Mi         33%       
node2    212m         0%     25762Mi         40%     
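
The manual edit in step 3 can also be scripted. The sketch below inserts the --kubelet-insecure-tls argument right after the existing --metric-resolution line, reusing that line's indentation. The helper name add_insecure_tls_flag is hypothetical, and it assumes the downloaded manifest contains a "- --metric-resolution=..." argument as shown above; as in the original, this is meant for test environments only.

```shell
# add_insecure_tls_flag FILE: print FILE with "- --kubelet-insecure-tls" added
# right after the "- --metric-resolution=..." argument, keeping the indentation.
# The file is printed unchanged if the flag is already present.
add_insecure_tls_flag() {
    if grep -q -- '--kubelet-insecure-tls' "$1"; then
        cat "$1"
    else
        awk '{ print }
             /- --metric-resolution=/ {
                 sub(/- --metric-resolution=.*/, "- --kubelet-insecure-tls")
                 print
             }' "$1"
    fi
}

# Example:
#   add_insecure_tls_flag metrics-server-components.yaml > patched.yaml
#   kubectl create -f patched.yaml
```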

Pitfalls

1. After installing flannel, the master node stays in the NotReady state

kubectl get nodes
NAME     STATUS     ROLES    AGE   VERSION
master   NotReady   master   29m   v1.18.2
node1    Ready      <none>   14m   v1.18.2
Checking the pods shows that one flannel pod is stuck in Init:ImagePullBackOff
kubectl get pods --all-namespaces
NAMESPACE     NAME                             READY   STATUS                  RESTARTS   AGE
kube-system   coredns-7ff77c879f-n7k7n         1/1     Running                 0          30m
kube-system   coredns-7ff77c879f-trr59         1/1     Running                 0          30m
kube-system   etcd-master                      1/1     Running                 0          30m
kube-system   kube-apiserver-master            1/1     Running                 0          30m
kube-system   kube-controller-manager-master   1/1     Running                 0          30m
kube-system   kube-flannel-ds-amd64-dmx2z      0/1     Init:ImagePullBackOff   0          6m14s
kube-system   kube-flannel-ds-amd64-t5s68      1/1     Running                 0          6m14s
kube-system   kube-proxy-6z5n5                 1/1     Running                 0          16m
kube-system   kube-proxy-n562q                 1/1     Running                 0          30m
kube-system   kube-scheduler-master            1/1     Running                 0          30m

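Cause:
The flannel image had not been downloaded on the master node
Fix:
Export the image on a node that already has it and load it on the master; after that the node became Ready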
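
The diagnosis and fix above can be sketched as follows. The filter lists pods that are failing on an image pull, and the commented commands outline the export and import with docker save and docker load. The helper name image_pull_failures is hypothetical, and <pod-name> and <flannel-image> are placeholders, since the exact image name depends on the kube-flannel.yml in use.

```shell
# image_pull_failures: read `kubectl get pods --all-namespaces --no-headers`
# output on stdin and print the names of pods stuck on an image pull.
image_pull_failures() {
    awk '$4 ~ /ImagePullBackOff|ErrImagePull/ { print $2 }'
}

# Example:
#   kubectl get pods --all-namespaces --no-headers | image_pull_failures
#   kubectl -n kube-system describe pod <pod-name>    # shows which image failed
# On a node that already has the image:
#   docker save <flannel-image> -o flannel.tar
#   scp flannel.tar master:/tmp/
# On the master:
#   docker load -i /tmp/flannel.tar
```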