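Problem 1
1. Description
Longhorn had been running on a single node for a long time, so after the node was restarted, replicas could not be scheduled. In the meantime I tried to add a disk and trigger scheduling again, but the following message appeared and the operation could not proceed: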
admission webhook "validator.longhorn.io" denied the request: spec and status of disks on node node4 are being syncing and please retry later
2. Solution
Case 1. The user can still edit the disks
- Delete the evicted disk from the Longhorn node resource:
  Navigate to UI > Node page, click Edit node and disks, and then delete the evicted disks from the node.
- Or disable eviction of the disk in the Longhorn node resource:
  Navigate to UI > Node page, click Edit node and disks, and then set the disk eviction to false.
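The same flags can also be inspected from the command line before opening the UI. The sketch below is a hypothetical helper (the function name list_disk_flags is not part of Longhorn); it assumes the node resource exposes evictionRequested and allowScheduling fields, and it only reads the resource.

```shell
# Hypothetical helper: print every eviction and scheduling flag found in a
# Longhorn node resource, prefixed with its line number in the YAML output.
list_disk_flags() {
  node="$1"
  kubectl -n longhorn-system get nodes.longhorn.io "$node" -o yaml |
    grep -nE 'evictionRequested|allowScheduling'
}

# Usage (node name taken from the error message above):
#   list_disk_flags node4
```

A disk that still shows evictionRequested: true is a candidate for one of the two UI actions above.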
Case 2. The user gets the error message "spec and status of disks on node %v are being syncing and please retry later" while editing disks
Step 1. Edit validatingwebhookconfigurations.admissionregistration.k8s.io longhorn-webhook-validator
# kubectl edit validatingwebhookconfigurations.admissionregistration.k8s.io longhorn-webhook-validator
... OMITTED ...
webhooks:
- admissionReviewVersions:
  - v1
  clientConfig:
    caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJwakNDQVUyZ0F3SUJBZ0lCQURBS0JnZ3Foa2pPUFFRREFqQTdNUnd3R2dZRFZRUUtFeE5rZVc1aGJXbGoK
    service:
      name: longhorn-admission-webhook
      namespace: longhorn-system
      path: /v1/webhook/validaton
      port: 9443
  failurePolicy: Fail
  matchPolicy: Exact
  name: validator.longhorn.io
  namespaceSelector: {}
  objectSelector: {}
  rules:
  - apiGroups:
    - longhorn.io
    apiVersions:
    - v1beta2
    operations:
    - UPDATE
    resources:
    - nodes
    scope: Namespaced
  - apiGroups:
    - longhorn.io
    apiVersions:
    - v1beta2
    operations:
    - CREATE
    - UPDATE
    resources:
    - settings
    scope: Namespaced
... OMITTED ...
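Since Step 2 removes a validation rule by hand, it may be worth saving a copy of the configuration first. This is a minimal sketch, not part of the original procedure; the function name backup_validator and the default path under /tmp are assumptions.

```shell
# Hypothetical helper: dump the current validating webhook configuration to a
# file and print the file path, so the original rules can be reviewed later.
backup_validator() {
  out="${1:-/tmp/longhorn-webhook-validator.backup.yaml}"
  kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io \
    longhorn-webhook-validator -o yaml > "$out" || return 1
  echo "$out"
}

# Usage:
#   backup_validator                      # writes to the default path
#   backup_validator /root/validator.yaml # writes to a chosen path
```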
Step 2. Remove the following block (the rule for node validation), then save.
  - apiGroups:
    - longhorn.io
    apiVersions:
    - v1beta2
    operations:
    - UPDATE
    resources:
    - nodes
    scope: Namespaced
Step 3. Remove the evicted disks from the nodes in the Longhorn UI, as described in Case 1 above. This is the most critical step.
Step 4. Delete the admission webhook pod. The pod is re-created automatically and regenerates the rules.
kubectl -n longhorn-system delete pod -l app=longhorn-admission-webhook
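After the pod has been re-created, a rough way to confirm that the node validation rule is back is to look for the nodes resource entry in the configuration. The helper below is a hypothetical sketch (node_rule_present is not a Longhorn command), and it assumes the YAML output lists the resource as a "- nodes" item, as in the listing from Step 1.

```shell
# Hypothetical helper: succeed (exit status 0) if the validating webhook
# configuration contains a "- nodes" resource entry, fail otherwise.
node_rule_present() {
  kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io \
    longhorn-webhook-validator -o yaml |
    grep -Eq '^[[:space:]]*-[[:space:]]+nodes[[:space:]]*$'
}

# Usage:
#   node_rule_present && echo "node rule restored" || echo "node rule missing"
```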
In practice, other errors that appear while editing disks, such as:
Only the disk with disabled scheduling and no storage scheduled can be deleted
can also be resolved with the steps above.
Problem 2
1. Description
While restoring the disk, the node became schedulable but the disk did not. The UI showed the following message: the disk default-disk-fd0000000000(/var/lib/longhorn/) on the node node4 has 457388851200 available, but requires reserved 153461723136, minimal 25% to schedule more replicas
The cause is the available-space percentage configured for the disk.
Adjusting that setting resolves the problem.
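The notes do not record which setting was changed. As an assumption, the "minimal 25%" in the message most likely corresponds to the Longhorn setting storage-minimal-available-percentage, and the reserved value to the per-disk Storage Reserved field (seeded from storage-reserved-percentage-for-default-disk when the default disk is created). A minimal sketch for viewing both settings, with a hypothetical function name:

```shell
# Hypothetical helper: show the two Longhorn settings that (by assumption)
# relate to the "reserved" and "minimal 25%" parts of the message.
show_storage_settings() {
  for name in storage-minimal-available-percentage \
              storage-reserved-percentage-for-default-disk; do
    kubectl -n longhorn-system get settings.longhorn.io "$name"
  done
}

# Usage:
#   show_storage_settings
# To change a value, edit the setting, for example:
#   kubectl -n longhorn-system edit settings.longhorn.io storage-minimal-available-percentage
```

The reserved space of an existing disk is changed per disk under Edit node and disks in the UI, not through the global default.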
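2. A Longhorn volume cannot be deleted
Usually this is because the volume still has associated snapshots. Delete those snapshots, and the volume can then be deleted.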
[root@node6 longhorn]# kubectl -n longhorn-system get Snapshot|grep pvc-beed9bde-984d-4c88-8de3-546a9873c775
9f812c65-9c0b-440e-a8dd-c904d12f5224 pvc-beed9bde-984d-4c88-8de3-546a9873c775 2023-04-10T05:08:01Z false 21474836480 0 3d21h
[root@node6 longhorn]# kubectl -n longhorn-system delete Snapshot 9f812c65-9c0b-440e-a8dd-c904d12f5224
snapshot.longhorn.io "9f812c65-9c0b-440e-a8dd-c904d12f5224" deleted
finalizers:
- longhorn.io
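When a volume has many snapshots, the grep step above can be narrowed to print only the snapshot names. This is a sketch under the assumption that the second column of the listing is the volume name, as in the output shown earlier; snapshots_of_volume is a hypothetical helper name.

```shell
# Hypothetical helper: print the names of all Snapshot resources whose
# VOLUME column (second column) exactly matches the given volume name.
snapshots_of_volume() {
  kubectl -n longhorn-system get snapshots.longhorn.io --no-headers |
    awk -v vol="$1" '$2 == vol { print $1 }'
}

# Usage: review the list first, then delete each snapshot by name.
#   snapshots_of_volume pvc-beed9bde-984d-4c88-8de3-546a9873c775
#   kubectl -n longhorn-system delete snapshots.longhorn.io <name>
```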
Common commands
kubectl -n longhorn-system get replicas.longhorn.io pvc-e769316c-d435-4ef5-a80e-dbbc56157fa9-r-ea4cf0e8 -oyaml > /tmp/pvc-e769316c-d435-4ef5-a80e-dbbc56157fa9-r-ea4cf0e8.yaml
kubectl -n longhorn-system edit replicas.longhorn.io pvc-e769316c-d435-4ef5-a80e-dbbc56157fa9-r-ea4cf0e8
kubectl edit validatingwebhookconfigurations.admissionregistration.k8s.io longhorn-webhook-validator
kubectl -n longhorn-system get pod -l app=longhorn-admission-webhook
kubectl -n longhorn-system delete pod -l app=longhorn-admission-webhook
kubectl -n longhorn-system get pods -owide |grep node4
ll|awk '{if($9~"pvc-e769316c-d435-4ef5-a80e-dbbc56157fa9*") print "du -sh "$9}'
57145b4b2ed7-r-4297c9d5
d850ba7bf18d-r-fb861938
kubectl get pods -A -owide -w|grep node4
List orphaned replica directories: kubectl -n longhorn-system get orphans
Delete an orphaned replica directory: kubectl -n longhorn-system delete orphan <name>
Enable global automatic deletion: kubectl -n longhorn-system edit settings orphan-auto-deletion