Deploying Rancher v2.4.4 with Docker, and Troubleshooting
1. Environment preparation
Operating system: CentOS 7.5
Docker: 19.03.13
Rancher: v2.4.4
2. Pull the Rancher v2.4.4 image
[root@bogon ~]# docker pull rancher/rancher:v2.4.4
v2.4.4: Pulling from rancher/rancher
23884877105a: Pull complete
bc38caa0f5b9: Pull complete
2910811b6c42: Pull complete
36505266dcc6: Pull complete
99447ff7670f: Pull complete
879c87dc86fd: Pull complete
5b954e5aebf8: Pull complete
664e1faf26b5: Pull complete
bf7ac75d932b: Pull complete
7e972d16ff5b: Pull complete
08314b1e671c: Pull complete
d5ce20b3d070: Pull complete
20e75cd9c8e9: Pull complete
80daa2770be8: Pull complete
7fb927855713: Pull complete
af20d79674f1: Pull complete
d6a9086242eb: Pull complete
887a8f050cee: Pull complete
834df47e622f: Pull complete
Digest: sha256:cd9c4574606eb88d63dd9c84e6a7f4ee9998c1f0f4e4ee323cae884c95769041
Status: Downloaded newer image for rancher/rancher:v2.4.4
docker.io/rancher/rancher:v2.4.4
3. Deploy Rancher
3.1. Prepare the mount points
[root@bogon ~]# mkdir -p /docker_volume/rancher_home/rancher
[root@bogon ~]# mkdir -p /docker_volume/rancher_home/auditlog
3.2. Start Rancher
[root@bogon ~]# docker run -d --restart=unless-stopped -p 80:80 -p 443:443 \
-v /docker_volume/rancher_home/rancher:/var/lib/rancher \
-v /docker_volume/rancher_home/auditlog:/var/log/auditlog \
--name rancher rancher/rancher:v2.4.4
30329e53ae9f388a1a11ddb43e6f52e24616dbd41d2a0987a7446ebfac72817d
3.3. Check the container status
The container takes about two minutes to start.
[root@bogon ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
30329e53ae9f rancher/rancher:v2.4.4 "entrypoint.sh" 2 minutes ago Up 2 minutes 0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp rancher
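Rather than re-running docker ps by hand, the two-minute wait can be scripted. The following is a minimal sketch, not part of the original setup: wait_for_url is a hypothetical helper name, and the /ping path in the example is an assumption about what the Rancher server answers. Any URL that returns an HTTP success code once Rancher is up would work equally well.

```shell
#!/usr/bin/env bash
# Poll a URL until it answers with an HTTP success code, or give up.
# Arguments: URL, number of attempts (default 30), delay in seconds (default 5).
wait_for_url() {
  local url="$1" attempts="${2:-30}" delay="${3:-5}" i
  for ((i = 1; i <= attempts; i++)); do
    # -k: accept the self-signed certificate, -s: silent, -f: fail on HTTP errors
    if curl -ksf --max-time 5 "$url" >/dev/null; then
      echo "ready after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "not ready after ${attempts} attempt(s)" >&2
  return 1
}

# Example (host taken from this article, /ping is an assumed path):
#   wait_for_url "https://192.168.81.250/ping" 30 5
```

With the defaults, the helper waits up to roughly two and a half minutes, which covers the start-up time observed above.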
3.4. View the logs
[root@bogon ~]# docker logs -f rancher
4. Access Rancher
https://192.168.81.250/
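Default username/password: admin/admin
On first login you are prompted to set a new password.
4.1. Set the language to Chinese
4.2. Set the site URL
4.3. Rancher home page
5. Add a Kubernetes cluster
5.1. Click Add Cluster
5.2. Choose to import an existing Kubernetes cluster
5.3. Enter a cluster name and click Create
5.4. Save the import commands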
kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user [USER_ACCOUNT]
kubectl apply -f https://192.168.81.180/v3/import/hktmls54zxmfwrlqjw4gcjmct5r7z9869pvjblk4j6b46bdcl275xx.yaml
curl --insecure -sfL https://192.168.81.180/v3/import/hktmls54zxmfwrlqjw4gcjmct5r7z9869pvjblk4j6b46bdcl275xx.yaml | kubectl apply -f -
5.5. Run the import command on the master node
[root@k8s-master ~]# kubectl apply -f https://192.168.81.180/v3/import/hktmls54zxmfwrlqjw4gcjmct5r7z9869pvjblk4j6b46bdcl275xx.yaml
Unable to connect to the server: x509: certificate is valid for 127.0.0.1, 172.17.0.2, not 192.168.81.180
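This error occurs because Rancher is serving its default self-signed certificate, which does not list 192.168.81.180, so kubectl refuses the connection. Running the following command, which skips certificate verification, works around it: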
[root@k8s-master ~]# curl --insecure -sfL https://192.168.81.180/v3/import/hktmls54zxmfwrlqjw4gcjmct5r7z9869pvjblk4j6b46bdcl275xx.yaml | kubectl apply -f -
clusterrole.rbac.authorization.k8s.io/proxy-clusterrole-kubeapiserver created
clusterrolebinding.rbac.authorization.k8s.io/proxy-role-binding-kubernetes-master created
namespace/cattle-system created
serviceaccount/cattle created
clusterrolebinding.rbac.authorization.k8s.io/cattle-admin-binding created
secret/cattle-credentials-bc8df60 created
clusterrole.rbac.authorization.k8s.io/cattle-admin created
deployment.apps/cattle-cluster-agent created
daemonset.apps/cattle-node-agent created
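The x509 error above names the addresses the certificate is valid for. To check this directly, the certificate's Subject Alternative Names can be printed with openssl. This is a minimal sketch: show_cert_sans is a hypothetical helper name, and it assumes openssl is installed on the machine running it.

```shell
#!/usr/bin/env bash
# Print the Subject Alternative Names of the certificate served at host:port.
# Arguments: host, port (default 443).
show_cert_sans() {
  local host="$1" port="${2:-443}"
  # s_client fetches the server certificate; x509 -text decodes it;
  # grep keeps the SAN header line and the line listing the names.
  openssl s_client -connect "${host}:${port}" </dev/null 2>/dev/null \
    | openssl x509 -noout -text \
    | grep -A1 'Subject Alternative Name'
}

# Example (address taken from the import URL in this article):
#   show_cert_sans 192.168.81.180
```

If the address used in the import URL is missing from the printed list, kubectl apply -f against that URL fails with the same x509 error, and the curl --insecure variant is the one to use.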
Wait for the pods to start.
[root@k8s-master ~]# kubectl get pod -n cattle-system
Once all pods in the cattle-system namespace are running, the cluster is successfully connected to Rancher.
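Instead of repeating kubectl get pod until everything is up, the wait can be delegated to kubectl wait. This is a minimal sketch: wait_for_rancher_agents is a hypothetical helper name and the 300s default timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Block until every pod in the cattle-system namespace reports Ready,
# or until the timeout expires (kubectl then exits with a non-zero status).
# Argument: timeout (default 300s).
wait_for_rancher_agents() {
  local timeout="${1:-300s}"
  kubectl wait --namespace cattle-system \
    --for=condition=Ready pods --all \
    --timeout="$timeout"
}

# Example:
#   wait_for_rancher_agents 600s
```

Note that kubectl wait reports an error if the namespace contains no pods yet, so it is best run after the import command has created the agents.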
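After the cluster is connected, the monitoring dashboard is displayed.
If a component here turns red, see section 6.2 for the fix.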
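5.6. View the system pods
Click the cluster, then the System project.
5.7. How to upgrade a pod
Select the pod to upgrade, click the three-dot menu, then Upgrade.
Change the image to complete the upgrade.
5.8. How to run commands in a pod
Select the pod, click More, then Execute Shell.
This opens a command line inside the pod.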
6. Troubleshooting
6.1. The cattle-cluster-agent-c66cd4f58-xhfhs pod stays in ContainerCreating
The cattle-cluster-agent-c66cd4f58-xhfhs pod stays in the ContainerCreating state, so the cluster cannot connect to Rancher.
Event output:
Events:
  Type     Reason           Age                  From                Message
  Normal   Scheduled        25m                  default-scheduler   Successfully assigned cattle-system/cattle-cluster-agent-c66cd4f58-xhfhs to k8s-node1
  Warning  NetworkNotReady  25s (x753 over 25m)  kubelet, k8s-node1  network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
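Judging from the events, this is a Kubernetes network problem.
Checking the network pods confirmed it: one of them was down.
Solution: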
Find out which node the pod is running on:
[root@k8s-master ~]# kubectl get pod -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-5b8b769fcd-ktgpz 1/1 Running 1 121m 10.100.235.199 k8s-master <none> <none>
calico-node-4nsml 1/1 Running 0 13m 192.168.81.190 k8s-node1 <none> <none>
calico-node-dlkgp 1/1 Running 1 121m 192.168.81.180 k8s-master <none> <none>
Load the image archive on that node:
[root@k8s-node1 k8s_1_18_6_image]# docker load -i cni.tar.gz
Restart the pod by deleting it:
kubectl delete pod calico-node-q4c7n -n kube-system
This resolves the problem.
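To spot the broken network pod quickly, the pod listing can be filtered down to the entries that are not running. This is a minimal sketch: not_running_pods is a hypothetical helper name, and the column positions assume the kubectl get pod -o wide layout shown above (STATUS in column 3, NODE in column 7).

```shell
#!/usr/bin/env bash
# Print name, status and node of every pod in a namespace whose STATUS
# is neither Running nor Completed.
# Argument: namespace.
not_running_pods() {
  local ns="$1"
  kubectl get pods -n "$ns" -o wide --no-headers \
    | awk '$3 != "Running" && $3 != "Completed" { print $1, $3, $7 }'
}

# Example:
#   not_running_pods kube-system
```

An empty result means every pod in the namespace is up. Any line that is printed names the node on which the image should be loaded.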
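6.2. Fixing the red Rancher dashboard
6.2.1. Problem description
The dashboard turns red because the cluster's health-check ports are not open. This does not affect normal use.
Run kubectl get cs to see the component status. The output below is the healthy state; any other output causes the Rancher dashboard to turn red.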
[Figure: kubectl get cs output in the healthy state]
What the red dashboard looks like:
[Figure: Rancher dashboard with components shown in red]
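The same check can be done from the command line without looking at the dashboard. This is a minimal sketch: unhealthy_components is a hypothetical helper name, and it assumes the STATUS value is in the second column of the kubectl get cs output.

```shell
#!/usr/bin/env bash
# Print the name of every component whose STATUS is not "Healthy".
unhealthy_components() {
  kubectl get cs --no-headers | awk '$2 != "Healthy" { print $1 }'
}

# Example:
#   unhealthy_components
```

An empty result corresponds to the healthy state. If scheduler or controller-manager is printed, the fix in the next section applies.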
6.2.2. Solution
Edit the kube-scheduler manifest:
[root@k8s-master ~]# vim /etc/kubernetes/manifests/kube-scheduler.yaml
Comment out the line containing port=0.
Edit the kube-controller-manager manifest:
[root@k8s-master ~]# vim /etc/kubernetes/manifests/kube-controller-manager.yaml
Comment out the line containing port=0.
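The change takes effect without a manual restart, because the kubelet recreates the static pods when their manifests change. Once ports 10251 and 10252 are listening, the dashboard no longer shows red.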
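The manual edit can also be scripted. This is a minimal sketch, assuming GNU sed and assuming the argument appears in the manifest as a list item of the form "- --port=0"; disable_port_zero and the default backup directory are hypothetical names. The backup is deliberately written outside /etc/kubernetes/manifests, since the kubelet may try to load any file placed in that directory as a manifest.

```shell
#!/usr/bin/env bash
# Comment out the "- --port=0" list item in a manifest, keeping a backup.
# Arguments: manifest path, backup directory (default /root/manifest-backup).
disable_port_zero() {
  local manifest="$1" backup_dir="${2:-/root/manifest-backup}"
  mkdir -p "$backup_dir"
  # Keep an untouched copy outside the manifests directory.
  cp -p "$manifest" "$backup_dir/$(basename "$manifest").bak"
  # Preserve the indentation and prefix the item with "# ".
  sed -i 's/^\([[:space:]]*\)- --port=0[[:space:]]*$/\1# - --port=0/' "$manifest"
}

# Example:
#   disable_port_zero /etc/kubernetes/manifests/kube-scheduler.yaml
#   disable_port_zero /etc/kubernetes/manifests/kube-controller-manager.yaml
#   ss -lnt | grep -E ':(10251|10252) '   # both ports should be listed afterwards
```

Running the helper a second time leaves the file unchanged, because the already commented line no longer matches the pattern.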