Problems mounting Ceph volumes in k8s


Problem 1: the PVC is still in use

Finding the problem

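During a routine check, I found that some pods were stuck before reaching Running. They show up below as ContainerCreating and Init:0/2, both of which fall under the Pending phase: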


[root@sc-master-1 kube-prometheus-stack]# kubectl get po -A -owide|awk '{if($4!="Running"){print $0}}'
NAMESPACE          NAME                                                       READY   STATUS              RESTARTS          AGE     IP               NODE             NOMINATED NODE   READINESS GATES
dgraph111          dgraph-dgraph-alpha-2                                      0/1     ContainerCreating   0                 14d     <none>           sc-midware-1     <none>           <none>
kafka2             kafka-controller-2                                         0/1     Init:0/2            0                 14d     <none>           sc-midware-3     <none>
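The awk filter above tests the fourth column, which is STATUS only because -A adds a NAMESPACE column in front. As a minimal sketch, the same filter can be wrapped in a function that skips the header row and also ignores Completed pods. The name non_running_pods is made up for illustration and is not part of kubectl.

```shell
# Hypothetical helper: reads `kubectl get po -A` output on stdin and prints
# "namespace/name status" for pods that are neither Running nor Completed.
# Assumes the -A column layout: NAMESPACE NAME READY STATUS ...
non_running_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}

# Usage against a live cluster (not executed here):
#   kubectl get po -A | non_running_pods
```

Printing only the namespace, name, and status keeps the result short enough to feed into a follow-up describe loop.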

The events show why the pod is stuck:

[root@sc-master-1 ~]# kubectl -n dgraph111 describe po dgraph-dgraph-alpha-2
......
Events:
  Type     Reason       Age                   From     Message
  ----     ------       ----                  ----     -------
  Warning  FailedMount  36m (x1860 over 14d)  kubelet  Unable to attach or mount volumes: unmounted volumes=[datadir], unattached volumes=[kube-api-access-lgd9x datadir tls-volume]: timed out waiting for the condition
  Warning  FailedMount  29m (x1946 over 14d)  kubelet  Unable to attach or mount volumes: unmounted volumes=[datadir], unattached volumes=[tls-volume kube-api-access-lgd9x datadir]: timed out waiting for the condition
  Warning  FailedMount  11m (x5478 over 14d)  kubelet  Unable to attach or mount volumes: unmounted volumes=[datadir], unattached volumes=[datadir tls-volume kube-api-access-lgd9x]: timed out waiting for the condition
  Warning  FailedMount  40s (x7282 over 14d)  kubelet  MountVolume.MountDevice failed for volume "pvc-282463f7-4e8d-435e-b3e9-f59b0c5a7a40" : rpc error: code = Internal desc = rbd image rbd-pool/csi-vol-d485ea06-4081-457c-a44a-43622f74e1c2 is still being used

The last event ("rbd image rbd-pool/csi-vol-d485ea06-4081-457c-a44a-43622f74e1c2 is still being used") shows that the RBD image backing the PVC is still in use, so it cannot be mounted for the current pod's container.
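When several pods are stuck, it helps to pull the pool/image name out of the events instead of copying it by hand. The following is a small sketch that relies only on the wording of the message shown above ("rbd image ... is still being used"); the function name busy_rbd_images is hypothetical.

```shell
# Hypothetical helper: reads event text on stdin and prints each distinct
# "pool/image" named in an "rbd image ... is still being used" message.
busy_rbd_images() {
  sed -n 's/.*rbd image \([^ ]*\) is still being used.*/\1/p' | sort -u
}

# Usage against a live cluster (not executed here):
#   kubectl -n dgraph111 describe po dgraph-dgraph-alpha-2 | busy_rbd_images
```

The sort -u step collapses the repeated events into one line per image.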

Fixing the problem

Log in to the Ceph admin node (here, the rook-ceph-tools pod) and check who is using the image:

[root@sc-master-1 monitoring]# kubectl -n rook-ceph exec -it pod/rook-ceph-tools-765f749d9b-v9v5w -- /bin/bash
bash-4.4$ rbd status rbd-pool/csi-vol-d485ea06-4081-457c-a44a-43622f74e1c2
Watchers:
        watcher=172.70.10.23:0/2960244433 client.7520909 cookie=18446462598732841120
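The watcher line carries the client address in front of the first colon. As a rough sketch, the address can be extracted from the rbd status output like this. It assumes IPv4 watcher addresses in the format shown above, and the function name watcher_ips is hypothetical.

```shell
# Hypothetical helper: reads `rbd status` output on stdin and prints the
# distinct IPv4 addresses of the watchers (the part before the first ":").
watcher_ips() {
  sed -n 's/^[[:space:]]*watcher=\([0-9.]*\):.*/\1/p' | sort -u
}

# Usage inside the tools pod (not executed here):
#   rbd status rbd-pool/csi-vol-d485ea06-4081-457c-a44a-43622f74e1c2 | watcher_ips
# To find the matching node by its address afterwards, one option is:
#   kubectl get nodes -o wide | grep -wF 172.70.10.23
```

An image with no "watcher=" lines produces no output, which usually means nothing holds it open any more.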

The image is held by a watcher on 172.70.10.23. Log in to that machine and find the csi-rbdplugin container:

[root@sc-midware-2 ~]# nerdctl -n k8s.io ps |grep csi-rbdplugin   
4283442adb41    quay.io/cephcsi/cephcsi:v3.9.0                                  "/usr/local/bin/ceph…"    2 weeks ago     Up                 k8s://rook-ceph/csi-rbdplugin-rlh6d/csi-rbdplugin
470f29254ac2    registry.k8s.io/pause:3.8                                       "/pause"                  2 weeks ago     Up                 k8s://rook-ceph/csi-rbdplugin-rlh6d
bbbb8d304611    registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.8.0    "/csi-node-driver-re…"    2 weeks ago     Up                 k8s://rook-ceph/csi-rbdplugin-rlh6d/driver-registrar 

The container is 4283442adb41. Enter it and remove the stale mapping:

[root@sc-midware-2 ~]# nerdctl -n k8s.io exec -it 4283442adb41 /bin/bash
[root@sc-midware-2 /]# rbd showmapped|grep csi-vol-d485ea06-4081-457c-a44a-43622f74e1c2
7   rbd-pool             csi-vol-d485ea06-4081-457c-a44a-43622f74e1c2  -     /dev/rbd7 
[root@sc-midware-2 /]# rbd unmap -o force /dev/rbd7
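A forced unmap cuts the device off from anything that still has it open, so it is worth checking first that the device is not mounted. The sketch below shows one way to do that with two hypothetical helpers, mapped_device and device_is_mounted. It assumes the rbd showmapped layout shown above, where the device is the last column, and a mount table in /proc/mounts format. Which mount table is the right one to inspect depends on where the command runs, so treat that part as an assumption.

```shell
# Hypothetical helper: reads `rbd showmapped` output on stdin and prints the
# device mapped for the image named in $1. The device is taken from the last
# column because the namespace column may be empty.
mapped_device() {
  awk -v img="$1" '{ for (i = 1; i <= NF; i++) if ($i == img) { print $NF; exit } }'
}

# Hypothetical helper: succeeds if device $1 appears as a mount source in
# mount-table text (the /proc/mounts format) read from stdin.
device_is_mounted() {
  awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }'
}

# Usage inside the csi-rbdplugin container (not executed here):
#   dev=$(rbd showmapped | mapped_device csi-vol-d485ea06-4081-457c-a44a-43622f74e1c2)
#   if [ -n "$dev" ] && ! device_is_mounted "$dev" < /proc/mounts; then
#     rbd unmap -o force "$dev"
#   fi
```

If the device turns out to be mounted, unmounting it first is the gentler route than forcing the unmap.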

After that, the pod starts normally:

[root@sc-midware-2 ~]# kubectl -n dgraph111 get po dgraph-dgraph-alpha-2 
NAME                    READY   STATUS    RESTARTS   AGE
dgraph-dgraph-alpha-2   1/1     Running   0          14d

Reposted from: blog.csdn.net/alex_yangch…