Kubernetes pod stuck in the ContainerCreating state


Problem description

After creating the rc with kubectl create -f, I ran kubectl get pods to check the pod status and found the pod stuck in ContainerCreating. The steps were as follows:

# kubectl create -f mysql-rc.yaml
replicationcontroller "mysql" created
# kubectl get pods
NAME          READY     STATUS              RESTARTS   AGE
mysql-nznsb   0/1       ContainerCreating   0          12m

My mysql-rc.yaml:

apiVersion: v1
kind: ReplicationController
metadata:
  name: mysql
spec:
  replicas: 1
  selector:
    app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql/mysql-server:8.0.18-1.1.13
        ports:
        - containerPort: 3306
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "123456"

Troubleshooting and fix

Use kubectl describe to view the pod's recent events:

# kubectl describe pod mysql
Name:		mysql-dkh46
Namespace:	default
Node:		127.0.0.1/127.0.0.1
Start Time:	Sun, 04 Jul 2021 19:48:13 +0800
Labels:		app=mysql
Status:		Pending
IP:
Controllers:	ReplicationController/mysql
Containers:
  mysql:
    Container ID:
    Image:		mysql/mysql-server:8.0.18-1.1.13
    Image ID:
    Port:		3306/TCP
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Volume Mounts:	<none>
    Environment Variables:
      MYSQL_ROOT_PASSWORD:	123456
Conditions:
  Type		Status
  Initialized 	True
  Ready 	False
  PodScheduled 	True
No volumes.
QoS Class:	BestEffort
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----			-------------	--------	------		-------
  1m		1m		1	{default-scheduler }			Normal		Scheduled	Successfully assigned mysql-dkh46 to 127.0.0.1
  1m		28s		4	{kubelet 127.0.0.1}			Warning		FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request.  details: (open /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt: no such file or directory)"

  1m	2s	5	{kubelet 127.0.0.1}		Warning	FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "POD" with ImagePullBackOff: "Back-off pulling image \"registry.access.redhat.com/rhel7/pod-infrastructure:latest\""

The image pull reported the following error:

Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request.  details: (open /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt: no such file or directory)"
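When the describe output is long, the missing file can be pulled out of the event message mechanically. This is only a sketch: the helper name missing_path is made up for illustration, and it assumes the message keeps the "(open PATH: no such file or directory)" wording shown above.

```shell
# missing_path: read kubelet event messages on stdin and print the path from
# any "(open PATH: no such file or directory)" detail they contain.
# (hypothetical helper, for illustration only)
missing_path() {
    sed -n 's/.*(open \([^:]*\): no such file or directory).*/\1/p'
}

# Example (needs a working cluster):
#   kubectl describe pod mysql | missing_path
```

Lines without that detail produce no output, so the helper stays silent when the pod fails for some other reason.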

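The cause is that the certificate /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt cannot be found. Running ll on that path shows it is a symbolic link, and the file it points to does not exist either: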

# ll /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt
lrwxrwxrwx. 1 root root 27 7月   4 17:26 /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt -> /etc/rhsm/ca/redhat-uep.pem
# ll /etc/rhsm/ca | grep redhat | wc -l
0
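The two ll checks can also be folded into a single test. A minimal sketch, assuming a shell with readlink available; the function name check_link is hypothetical and not part of any tool mentioned here.

```shell
# check_link PATH: tell apart a usable path, a dangling symlink, and a path
# that does not exist at all. Returns non-zero unless PATH is usable.
# (hypothetical helper, for illustration only)
check_link() {
    if [ -e "$1" ]; then
        # -e follows symlinks, so this also covers links with a valid target
        echo "ok: $1"
    elif [ -L "$1" ]; then
        # the link itself exists, but its target does not
        echo "dangling symlink: $1 -> $(readlink "$1")"
        return 1
    else
        echo "missing: $1"
        return 1
    fi
}

# Example:
#   check_link /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt
```

On a host in the state shown above, the call in the comment should report a dangling symlink pointing at /etc/rhsm/ca/redhat-uep.pem.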

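From what I found online, the rhsm packages belong to Red Hat's subscription service. CentOS is rebuilt from Red Hat's releases, so it uses rhsm as well. The path in the error message is only a symbolic link; the certificate that is actually missing is /etc/rhsm/ca/redhat-uep.pem. Up to some version this certificate was provided by the python-rhsm-certificates package, but on CentOS 7 that package is reported as replaced by subscription-manager-rhsm-certificates. The catch is that the replacement package has a bug: it installs and reports success, yet ships no certificate. See the related issue for details.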

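The issue offers a way that requires neither downloading a package nor extracting the certificate from the old python-rhsm-certificates package. Just run the following command: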

openssl s_client -showcerts -servername registry.access.redhat.com -connect registry.access.redhat.com:443 </dev/null 2>/dev/null | openssl x509 -text > /etc/rhsm/ca/redhat-uep.pem
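Because the pipeline discards stderr, a failed connection can leave an empty or unusable file behind without any visible error. Before retrying the pod it may be worth checking that the file really parses as a certificate. A sketch, assuming openssl is installed; the helper name is_pem_cert is hypothetical.

```shell
# is_pem_cert FILE: succeed only if openssl can parse FILE as an X.509
# certificate. (hypothetical helper, for illustration only)
is_pem_cert() {
    openssl x509 -in "$1" -noout >/dev/null 2>&1
}

cert=/etc/rhsm/ca/redhat-uep.pem
if is_pem_cert "$cert"; then
    # show who the certificate was issued to and when it expires
    openssl x509 -in "$cert" -noout -subject -enddate
else
    echo "not a valid certificate: $cert"
fi
```

If the check passes, pulling the image by hand with docker pull registry.access.redhat.com/rhel7/pod-infrastructure:latest is a quick way to confirm the fix before waiting for the kubelet to retry.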

Once the certificate has been extracted, the pod starts normally:

# kubectl get pods
NAME          READY     STATUS    RESTARTS   AGE
mysql-nznsb   1/1       Running   0          29m
