openssl req -new -key server.key -subj "/CN=k8s-master" -out server.csr
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -extfile extfile.cnf -out server.crt -days 10000
如果修改 /etc/kubernetes/apiserver 的配置文件参数的话,通过 systemctl start kube-apiserver 启动失败,出错信息为:
Validate server run options failed: unable to load client CA file: open /root/keys/ca.crt: permission denied
但可以通过命令行启动 API Server
/usr/bin/kube-apiserver --logtostderr=true --v=0 --etcd-servers=http://k8s-master:2379 --address=0.0.0.0 --port=8080 --kubelet-port=10250 --allow-privileged=true --service-cluster-ip-range=10.254.0.0/16 --admission-control=ServiceAccount --insecure-bind-address=0.0.0.0 --client-ca-file=/root/keys/ca.crt --tls-cert-file=/root/keys/server.crt --tls-private-key-file=/root/keys/server.key --basic-auth-file=/root/keys/basic_auth.csv --secure-port=443 &>> /var/log/kubernetes/kube-apiserver.log &
命令行启动 Controller-manager
/usr/bin/kube-controller-manager --logtostderr=true --v=0 --master=http://k8s-master:8080 --root-ca-file=/root/keys/ca.crt --service-account-private-key-file=/root/keys/server.key & >>/var/log/kubernetes/kube-controller-manage.log
## ETCD 启动不起来-问题<1>
etcd是kubernetes 集群的zookeeper进程,几乎所有的service都依赖于etcd的启动,比如flanneld,apiserver,docker…在启动etcd是报错日志如下:
May 24 13:39:09 k8s-master systemd: Stopped Flanneld overlay address etcd agent. May 24 13:39:28 k8s-master systemd: Starting Etcd Server... May 24 13:39:28 k8s-master etcd: recognized and used environment variable ETCD_ADVERTISE_CLIENT_URLS=http://etcd:2379,http://etcd:4001 May 24 13:39:28 k8s-master etcd: recognized environment variable ETCD_NAME, but unused: shadowed by corresponding flag May 24 13:39:28 k8s-master etcd: recognized environment variable ETCD_DATA_DIR, but unused: shadowed by corresponding flag May 24 13:39:28 k8s-master etcd: recognized environment variable ETCD_LISTEN_CLIENT_URLS, but unused: shadowed by corresponding flag May 24 13:39:28 k8s-master etcd: etcd Version: 3.1.3 May 24 13:39:28 k8s-master etcd: Git SHA: 21fdcc6 May 24 13:39:28 k8s-master etcd: Go Version: go1.7.4 May 24 13:39:28 k8s-master etcd: Go OS/Arch: linux/amd64 May 24 13:39:28 k8s-master etcd: setting maximum number of CPUs to 1, total number of available CPUs is 1 May 24 13:39:28 k8s-master etcd: the server is already initialized as member before, starting as etcd member... May 24 13:39:28 k8s-master etcd: listening for peers on http://localhost:2380 May 24 13:39:28 k8s-master etcd: listening for client requests on 0.0.0.0:2379 May 24 13:39:28 k8s-master etcd: listening for client requests on 0.0.0.0:4001 May 24 13:39:28 k8s-master etcd: recovered store from snapshot at index 140014 May 24 13:39:28 k8s-master etcd: name = master May 24 13:39:28 k8s-master etcd: data dir = /var/lib/etcd/default.etcd May 24 13:39:28 k8s-master etcd: member dir = /var/lib/etcd/default.etcd/member May 24 13:39:28 k8s-master etcd: heartbeat = 100ms May 24 13:39:28 k8s-master etcd: election = 1000ms May 24 13:39:28 k8s-master etcd: snapshot count = 10000 May 24 13:39:28 k8s-master etcd: advertise client URLs = http://etcd:2379,http://etcd:4001 May 24 13:39:28 k8s-master etcd: ignored file 0000000000000001-0000000000012700.wal.broken in wal May 24 13:39:29 k8s-master etcd: restarting member 8e9e05c52164694d in cluster cdf818194e3a8c32 at commit index 148905 May 24 13:39:29 k8s-master etcd: 8e9e05c52164694d became follower at term 12 May 24 13:39:29 k8s-master etcd: newRaft 8e9e05c52164694d [peers: [8e9e05c52164694d], term: 12, commit: 148905, applied: 140014, lastindex: 148905, lastterm: 12] May 24 13:39:29 k8s-master etcd: enabled capabilities for version 3.1 May 24 13:39:29 k8s-master etcd: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32 from store May 24 13:39:29 k8s-master etcd: set the cluster version to 3.1 from store May 24 13:39:29 k8s-master etcd: starting server... [version: 3.1.3, cluster version: 3.1] May 24 13:39:29 k8s-master etcd: raft save state and entries error: open /var/lib/etcd/default.etcd/member/wal/0.tmp: is a directory May 24 13:39:29 k8s-master systemd: etcd.service: main process exited, code=exited, status=1/FAILURE May 24 13:39:29 k8s-master systemd: Failed to start Etcd Server. May 24 13:39:29 k8s-master systemd: Unit etcd.service entered failed state. May 24 13:39:29 k8s-master systemd: etcd.service failed. May 24 13:39:29 k8s-master systemd: etcd.service holdoff time over, scheduling restart.
核心语句:
raft save state and entries error: open /var/lib/etcd/default.etcd/member/wal/0.tmp: is a directory
进入相关目录,删除 0.tmp,然后就可以启动啦!
## ETCD启动不起来-超时问题<2>
问题背景:当前部署了 3 个 etcd 节点,突然有一天 3 台集群全部停电宕机了。重新启动之后发现 K8S 集群是可以正常使用的,但是检查了一遍组件之后,发现有一个节点的 etcd 启动不了。
经过一遍探查,发现时间不准确,通过以下命令 ntpdate ntp.aliyun.com 重新将时间调整正确,重新启动 etcd,发现还是起不来,报错如下:
Mar 05 14:27:15 k8s-node2 etcd[3248]: etcd Version: 3.3.13 Mar 05 14:27:15 k8s-node2 etcd[3248]: Git SHA: 98d3084 Mar 05 14:27:15 k8s-node2 etcd[3248]: Go Version: go1.10.8 Mar 05 14:27:15 k8s-node2 etcd[3248]: Go OS/Arch: linux/amd64 Mar 05 14:27:15 k8s-node2 etcd[3248]: setting maximum number of CPUs to 4, total number of available CPUs is 4 Mar 05 14:27:15 k8s-node2 etcd[3248]: the server is already initialized as member before, starting as etcd member ... Mar 05 14:27:15 k8s-node2 etcd[3248]: peerTLS: cert = /opt/etcd/ssl/server.pem, key = /opt/etcd/ssl/server-key.pe m, ca = , trusted-ca = /opt/etcd/ssl/ca.pem, client-cert-auth = false, crl-file = Mar 05 14:27:15 k8s-node2 etcd[3248]: listening for peers on https://192.168.25.226:2380 Mar 05 14:27:15 k8s-node2 etcd[3248]: The scheme of client url http://127.0.0.1:2379 is HTTP while peer key/cert files are presented. Ignored key/cert files. Mar 05 14:27:15 k8s-node2 etcd[3248]: listening for client requests on 127.0.0.1:2379 Mar 05 14:27:15 k8s-node2 etcd[3248]: listening for client requests on 192.168.25.226:2379 Mar 05 14:27:15 k8s-node2 etcd[3248]: member 9c166b8b7cb6ecb8 has already been bootstrapped Mar 05 14:27:15 k8s-node2 systemd[1]: etcd.service: main process exited, code=exited, status=1/FAILURE Mar 05 14:27:15 k8s-node2 systemd[1]: Failed to start Etcd Server. Mar 05 14:27:15 k8s-node2 systemd[1]: Unit etcd.service entered failed state. Mar 05 14:27:15 k8s-node2 systemd[1]: etcd.service failed. Mar 05 14:27:15 k8s-node2 systemd[1]: etcd.service failed. Mar 05 14:27:15 k8s-node2 systemd[1]: etcd.service holdoff time over, scheduling restart. Mar 05 14:27:15 k8s-node2 systemd[1]: Starting Etcd Server... Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_NAME, but unused: shadowed by correspo nding flag Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_DATA_DIR, but unused: shadowed by corr esponding flag Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_LISTEN_PEER_URLS, but unused: shadowed by corresponding flag Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_LISTEN_CLIENT_URLS, but unused: shadow ed by corresponding flag Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_INITIAL_ADVERTISE_PEER_URLS, but unuse d: shadowed by corresponding flag Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_ADVERTISE_CLIENT_URLS, but unused: sha dowed by corresponding flag Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_INITIAL_CLUSTER, but unused: shadowed by corresponding flag Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_INITIAL_CLUSTER_TOKEN, but unused: sha dowed by corresponding flag Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_INITIAL_CLUSTER_STATE, but unused: sha dowed by corresponding flag
**解决方法:**
检查日志发现并没有特别明显的错误,根据经验来讲,etcd 节点坏掉一个其实对集群没有大的影响,这时集群已经可以正常使用了,但是这个坏掉的 etcd 节点并没有启动,解决方法如下:
进入 etcd 的数据存储目录进行备份 备份原有数据:
cd /var/lib/etcd/default.etcd/member/ cp * /data/bak/
删除这个目录下的所有数据文件
rm -rf /var/lib/etcd/default.etcd/member/*
停止另外两台 etcd 节点,因为 etcd 节点启动时需要所有节点一起启动,启动成功后即可使用。
#master 节点 systemctl stop etcd systemctl restart etcd
#node1 节点 systemctl stop etcd systemctl restart etcd
#node2 节点 systemctl stop etcd systemctl restart etcd
## CentOS下配置主机互信
在每台服务器需要建立主机互信的用户名执行以下命令生成公钥/密钥,默认回车即可
ssh-keygen -t rsa
可以看到生成个公钥的文件。
互传公钥,第一次需要输入密码,之后就OK了。
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.199.132 (-p 2222)
-p 端口 默认端口不加-p,如果更改过端口,就得加上-p. 可以看到是在.ssh/下生成了个 authorized\_keys的文件,记录了能登陆这台服务器的其他服务器的公钥。
测试看是否能登陆:
ssh 192.168.199.132 (-p 2222)
## CentOS 主机名的修改
hostnamectl set-hostname k8s-master1
## Virtualbox 实现 CentOS 复制和粘贴功能
如果不安装或者不输出,可以将 update 修改成 install 再运行。
yum install update yum update kernel yum update kernel-devel yum install kernel-headers yum install gcc yum install gcc make
运行完后
sh VBoxLinuxAdditions.run
## 删除Pod一直处于Terminating状态
可以通过下面命令强制删除
kubectl delete pod NAME --grace-period=0 --force
## 删除namespace一直处于Terminating状态
可以通过以下脚本强制删除
[root@k8s-master1 k8s]# cat delete-ns.sh #!/bin/bash set -e
useage(){ echo "useage:" echo " delns.sh NAMESPACE" }
if [ $# -lt 1 ];then useage exit fi
NAMESPACE={NAMESPACE}.json
kubectl get ns "{JSONFILE}"
vi "{JSONFLE}"
http://127.0.0.1:8001/api/v1/namespaces/"${NAMESPACE}"/finalize
## 容器包含有效的 CPU/内存 requests 且没有指定 limits 可能会出现什么问题?
下面我们创建一个对应的容器,该容器只有 requests 设定,但是没有 limits 设定,
- name: busybox-cnt02 image: busybox command: ["/bin/sh"] args: ["-c", "while true; do echo hello from cnt02; sleep 10;done"] resources: requests: memory: "100Mi"