Problem 1: PVC stuck in an "in use" state
Finding the problem
During a routine inspection, some pods were found stuck in a non-Running state:
[root@sc-master-1 kube-prometheus-stack]# kubectl get po -A -owide|awk '{if($4!="Running"){print $0}}'
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
dgraph111 dgraph-dgraph-alpha-2 0/1 ContainerCreating 0 14d <none> sc-midware-1 <none> <none>
kafka2 kafka-controller-2 0/1 Init:0/2 0 14d <none> sc-midware-3 <none>
Describe one of the pods to see why it is stuck:
[root@sc-master-1 ~]# kubectl -n dgraph111 describe po dgraph-dgraph-alpha-2
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 36m (x1860 over 14d) kubelet Unable to attach or mount volumes: unmounted volumes=[datadir], unattached volumes=[kube-api-access-lgd9x datadir tls-volume]: timed out waiting for the condition
Warning FailedMount 29m (x1946 over 14d) kubelet Unable to attach or mount volumes: unmounted volumes=[datadir], unattached volumes=[tls-volume kube-api-access-lgd9x datadir]: timed out waiting for the condition
Warning FailedMount 11m (x5478 over 14d) kubelet Unable to attach or mount volumes: unmounted volumes=[datadir], unattached volumes=[datadir tls-volume kube-api-access-lgd9x]: timed out waiting for the condition
Warning FailedMount 40s (x7282 over 14d) kubelet MountVolume.MountDevice failed for volume "pvc-282463f7-4e8d-435e-b3e9-f59b0c5a7a40" : rpc error: code = Internal desc = rbd image rbd-pool/csi-vol-d485ea06-4081-457c-a44a-43622f74e1c2 is still being used
The events show that the RBD image backing the PVC is still marked as in use by another client, so it cannot be attached and mounted for the new pod's container.
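The pool/image name buried in the FailedMount message is the key input for the next step. A small sketch (plain shell; the sample text is copied from the event above, and in practice you would feed in the real event message) that extracts it:

```shell
# Sample FailedMount event message, copied from the kubelet event above.
event_msg='MountVolume.MountDevice failed for volume "pvc-282463f7-4e8d-435e-b3e9-f59b0c5a7a40" : rpc error: code = Internal desc = rbd image rbd-pool/csi-vol-d485ea06-4081-457c-a44a-43622f74e1c2 is still being used'

# Capture the pool/image token between "rbd image" and "is still being used".
image=$(printf '%s\n' "$event_msg" | sed -n 's/.*rbd image \([^ ]*\) is still being used.*/\1/p')
echo "$image"   # rbd-pool/csi-vol-d485ea06-4081-457c-a44a-43622f74e1c2
```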
Fixing the problem
Log in to the Ceph toolbox pod and check who is using that image:
[root@sc-master-1 monitoring]# kubectl -n rook-ceph exec -it pod/rook-ceph-tools-765f749d9b-v9v5w -- /bin/bash
bash-4.4$ rbd status rbd-pool/csi-vol-d485ea06-4081-457c-a44a-43622f74e1c2
Watchers:
watcher=172.70.10.23:0/2960244433 client.7520909 cookie=18446462598732841120
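The watcher address identifies which node still holds the RBD map. A sketch that parses the IP out of the `rbd status` output (sample text from the session above; pipe in the real output in practice), which you can then match against `kubectl get nodes -o wide`:

```shell
# Sample `rbd status` output, copied from the toolbox session above.
status_output='Watchers:
 watcher=172.70.10.23:0/2960244433 client.7520909 cookie=18446462598732841120'

# Split on "=" and ":" and take the field right after "watcher=".
watcher_ip=$(printf '%s\n' "$status_output" | awk -F'[=:]' '/watcher=/{print $2}')
echo "$watcher_ip"   # 172.70.10.23
# Then map the IP to a node, e.g.: kubectl get nodes -o wide | grep "$watcher_ip"
```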
The image is still being watched by the host 172.70.10.23. Log in to 172.70.10.23 and locate the csi-rbdplugin container:
[root@sc-midware-2 ~]# nerdctl -n k8s.io ps |grep csi-rbdplugin
4283442adb41 quay.io/cephcsi/cephcsi:v3.9.0 "/usr/local/bin/ceph…" 2 weeks ago Up k8s://rook-ceph/csi-rbdplugin-rlh6d/csi-rbdplugin
470f29254ac2 registry.k8s.io/pause:3.8 "/pause" 2 weeks ago Up k8s://rook-ceph/csi-rbdplugin-rlh6d
bbbb8d304611 registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.8.0 "/csi-node-driver-re…" 2 weeks ago Up k8s://rook-ceph/csi-rbdplugin-rlh6d/driver-registrar
The container in question is 4283442adb41. Exec into it and force-unmap the stale device:
[root@sc-midware-2 ~]# nerdctl -n k8s.io exec -it 4283442adb41 /bin/bash
[root@sc-midware-2 /]# rbd showmapped|grep csi-vol-d485ea06-4081-457c-a44a-43622f74e1c2
7 rbd-pool csi-vol-d485ea06-4081-457c-a44a-43622f74e1c2 - /dev/rbd7
[root@sc-midware-2 /]# rbd unmap -o force /dev/rbd7
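The two steps above can be chained into a small helper. A sketch, where the `mapped` variable holds sample `rbd showmapped` output copied from the session above; inside the plugin container you would run the real `rbd showmapped` and then the `rbd unmap`:

```shell
# Image name taken from the FailedMount event.
VOLUME=csi-vol-d485ea06-4081-457c-a44a-43622f74e1c2

# Sample `rbd showmapped` output (replace with the real command's output).
mapped='id pool     namespace image                                        snap device
7  rbd-pool           csi-vol-d485ea06-4081-457c-a44a-43622f74e1c2 -    /dev/rbd7'

# Find the device mapped to our image (last field of the matching line).
dev=$(printf '%s\n' "$mapped" | awk -v v="$VOLUME" '$0 ~ v {print $NF}')
echo "$dev"   # /dev/rbd7
# Then, inside the plugin container: rbd unmap -o force "$dev"
```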
After the force-unmap, the kubelet retries the mount and the pod starts normally:
[root@sc-midware-2 ~]# kubectl -n dgraph111 get po dgraph-dgraph-alpha-2
NAME READY STATUS RESTARTS AGE
dgraph-dgraph-alpha-2 1/1 Running 0 14d