Longhorn Troubleshooting


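Problem 1

1. Description

Longhorn had been running on a single node for a long time, and after that node was restarted, volumes could no longer be scheduled. I tried to add a disk and trigger scheduling again, but the edit was rejected with the following message and could not be completed: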

admission webhook "validator.longhorn.io" denied the request: spec and status of disks on node node4 are being syncing and please retry later

2. Solution

Reference: github.com/longhorn/lo…

Case 1. If user can still edit the disks

  • Delete the evicted disk from the Longhorn node resource
    Navigate to UI > Node page. Click "Edit node and disks", then delete the evicted disks from the node.
    or
  • Disable the eviction of the disk from the Longhorn node resource
    Navigate to UI > Node page. Click "Edit node and disks", then set the disk eviction to false.

Case 2. If the user encounters the error message "spec and status of disks on node %v are being syncing and please retry later" while editing disks
Step 1. Edit validatingwebhookconfigurations.admissionregistration.k8s.io longhorn-webhook-validator

# kubectl edit validatingwebhookconfigurations.admissionregistration.k8s.io longhorn-webhook-validator

... OMITTED ...

webhooks:
- admissionReviewVersions:
  - v1
  clientConfig:
    caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJwakNDQVUyZ0F3SUJBZ0lCQURBS0JnZ3Foa2pPUFFRREFqQTdNUnd3R2dZRFZRUUtFeE5rZVc1aGJXbGoK
    service:
      name: longhorn-admission-webhook
      namespace: longhorn-system
      path: /v1/webhook/validaton
      port: 9443
  failurePolicy: Fail
  matchPolicy: Exact
  name: validator.longhorn.io
  namespaceSelector: {}
  objectSelector: {}
  rules:
  - apiGroups:
    - longhorn.io
    apiVersions:
    - v1beta2
    operations:
    - UPDATE
    resources:
    - nodes
    scope: Namespaced
  - apiGroups:
    - longhorn.io
    apiVersions:
    - v1beta2
    operations:
    - CREATE
    - UPDATE
    resources:
    - settings
    scope: Namespaced

... OMITTED ...

Step 2. Remove the following part (the rule for node validation), then save.

  - apiGroups:
    - longhorn.io
    apiVersions:
    - v1beta2
    operations:
    - UPDATE
    resources:
    - nodes
    scope: Namespaced

Step 3. Remove the evicted disks from the nodes in the Longhorn UI as described in Workaround > Case 1. This is the most critical step.

Step 4. Delete the admission webhook pod. The pod is then re-created and regenerates the rules.

 kubectl -n longhorn-system delete pod -l app=longhorn-admission-webhook

In practice, other errors hit while editing disks, such as "Only the disk with disabled scheduling and no storage scheduled can be deleted", can be resolved with the same procedure.
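Because Case 2 edits a cluster-wide webhook configuration, it can be worth saving a copy first and, after Step 4, confirming that the rule for nodes has come back. The following is only a minimal sketch: it assumes kubectl is on the PATH and already points at the right cluster, and the function names and the backup path are made up for illustration.

```shell
# Hypothetical helpers; the resource name is the one edited in Step 1.
WEBHOOK=validatingwebhookconfigurations.admissionregistration.k8s.io/longhorn-webhook-validator

# Save the current webhook configuration to the file given as $1.
backup_webhook() {
  kubectl get "$WEBHOOK" -o yaml > "$1"
}

# Exit 0 if some rule in the webhook configuration covers the "nodes" resource,
# non-zero otherwise (for example right after Step 2).
webhook_validates_nodes() {
  kubectl get "$WEBHOOK" -o jsonpath='{.webhooks[*].rules[*].resources[*]}' \
    | tr ' ' '\n' | grep -qx 'nodes'
}
```

A possible way to use it: run backup_webhook /tmp/longhorn-webhook-validator.yaml before Step 1, then run webhook_validates_nodes && echo restored once the new admission webhook pod is running.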

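Problem 2

1. Description

While restoring the disk, the node became schedulable but the disk did not. The UI showed this message: the disk default-disk-fd0000000000(/var/lib/longhorn/) on the node node4 has 457388851200 available, but requires reserved 153461723136, minimal 25% to schedule more replicas

[Screenshot: 11.png]

The cause is the available-space percentage configured for the disk.

Changing the following setting resolves it: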

[Screenshot: 1222.png]
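To check the percentage from the command line, a minimal sketch follows. It assumes that the 25% in the message corresponds to the Longhorn setting storage-minimal-available-percentage and that the percentage is taken of the disk's total size; both are assumptions, and the helper names are made up for illustration.

```shell
# Print the value of the Longhorn setting named in $1.
longhorn_setting() {
  kubectl -n longhorn-system get settings.longhorn.io "$1" -o jsonpath='{.value}'
}

# Bytes that must stay free on a disk of total size $1 (bytes)
# when the minimal available percentage is $2.
# Assumption: the percentage applies to the total disk size.
min_free_bytes() {
  echo $(( $1 * $2 / 100 ))
}
```

For example, longhorn_setting storage-minimal-available-percentage prints the current percentage, and min_free_bytes 500000000000 25 shows how many bytes a disk with a made-up total size of 500000000000 bytes would have to keep free at 25%.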

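2. A Longhorn volume cannot be deleted

Usually this is because snapshots are still associated with the volume. Delete those snapshots and the volume can then be deleted.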

[root@node6 longhorn]# kubectl -n longhorn-system get Snapshot|grep pvc-beed9bde-984d-4c88-8de3-546a9873c775                                                                                              
9f812c65-9c0b-440e-a8dd-c904d12f5224   pvc-beed9bde-984d-4c88-8de3-546a9873c775   2023-04-10T05:08:01Z   false        21474836480     0               3d21h                                               
[root@node6 longhorn]# kubectl -n longhorn-system delete Snapshot 9f812c65-9c0b-440e-a8dd-c904d12f5224                                                                                                    
snapshot.longhorn.io "9f812c65-9c0b-440e-a8dd-c904d12f5224" deleted    
  finalizers:
    - longhorn.io
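When a volume has many snapshots, the grep-then-delete steps above can be wrapped in two small functions. This is a sketch under assumptions: the column layout is the one in the output above (name first, volume second), and the function names are hypothetical.

```shell
# Print the names of Longhorn snapshots whose second column (the volume) equals $1.
snapshots_of_volume() {
  kubectl -n longhorn-system get snapshots.longhorn.io --no-headers \
    | awk -v vol="$1" '$2 == vol { print $1 }'
}

# Delete every snapshot of volume $1, one at a time; does nothing if there are none.
delete_snapshots_of_volume() {
  snapshots_of_volume "$1" | while read -r name; do
    kubectl -n longhorn-system delete snapshots.longhorn.io "$name" </dev/null
  done
}
```

Running snapshots_of_volume pvc-beed9bde-984d-4c88-8de3-546a9873c775 first and reading the list before calling delete_snapshots_of_volume keeps the deletion deliberate.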

Common commands

kubectl -n longhorn-system get replicas.longhorn.io pvc-e769316c-d435-4ef5-a80e-dbbc56157fa9-r-ea4cf0e8 -oyaml > /tmp/pvc-e769316c-d435-4ef5-a80e-dbbc56157fa9-r-ea4cf0e8.yaml
kubectl -n longhorn-system edit replicas.longhorn.io pvc-e769316c-d435-4ef5-a80e-dbbc56157fa9-r-ea4cf0e8


kubectl edit validatingwebhookconfigurations.admissionregistration.k8s.io longhorn-webhook-validator

kubectl -n longhorn-system get pod -l app=longhorn-admission-webhook

kubectl -n longhorn-system delete pod -l app=longhorn-admission-webhook
kubectl -n longhorn-system get pods -owide | grep node4

ls -l | awk '$9 ~ /^pvc-e769316c-d435-4ef5-a80e-dbbc56157fa9/ {print "du -sh " $9}'
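The command above only prints the du commands. A small function can run them directly instead. This is a sketch: the function name is hypothetical, and the directory holding the replica data is passed in as an argument because it depends on the disk path.

```shell
# Print the disk usage of every entry in directory $1 whose name starts with $2.
replica_dir_sizes() {
  local dir=$1 prefix=$2 path
  for path in "$dir"/"$prefix"*; do
    [ -e "$path" ] || continue   # the glob matched nothing
    du -sh "$path"
  done
}
```

Called as replica_dir_sizes . pvc-e769316c-d435-4ef5-a80e-dbbc56157fa9 from inside the replica directory, it prints one line per matching replica directory.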



57145b4b2ed7-r-4297c9d5

d850ba7bf18d-r-fb861938


kubectl get pods -A -owide -w | grep node4


List orphaned replica directories: kubectl -n longhorn-system get orphans
Delete an orphaned replica directory: kubectl -n longhorn-system delete orphan <name>
Enable global automatic deletion: kubectl -n longhorn-system edit settings orphan-auto-deletion
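If there are many orphans and automatic deletion is not enabled, the list and delete commands can be chained. This is a minimal sketch with hypothetical function names. It relies on kubectl's -o name output, which prints each object as type/name, and it deletes every orphan it finds, so review the list first.

```shell
# Print one orphan per line in "type/name" form.
list_orphans() {
  kubectl -n longhorn-system get orphans.longhorn.io -o name
}

# Delete each listed orphan in turn.
delete_all_orphans() {
  list_orphans | while read -r res; do
    kubectl -n longhorn-system delete "$res" </dev/null
  done
}
```

Running list_orphans on its own is harmless. delete_all_orphans removes the orphan objects one by one.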