Cross-cluster Kubernetes backup with Velero

As long as we point each Velero instance at the same object storage, Velero can migrate resources from one cluster to another. It also supports scheduled backups (see the sketch below), triggering backup hooks, and more; consult the official documentation for details. Installation follows the official documentation.

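For example, once the server side is installed, a daily scheduled backup can be created from the CLI. This is only a sketch: the schedule name, cron expression, and namespace are illustrative, and the flags mirror the backup command used later in this post.

velero schedule create nginx-daily \
    --schedule="0 3 * * *" \
    --include-namespaces=ceph-csi \
    --default-volumes-to-fs-backup
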
A more detailed reference document

Download Velero

Download address: github.com/vmware-tanz…

[root@server14 velero]# ls
velero-v1.10.0-linux-amd64.tar.gz
[root@server14 velero]# tar -zxvf velero-v1.10.0-linux-amd64.tar.gz 
velero-v1.10.0-linux-amd64/LICENSE
velero-v1.10.0-linux-amd64/examples/README.md
velero-v1.10.0-linux-amd64/examples/minio
velero-v1.10.0-linux-amd64/examples/minio/00-minio-deployment.yaml
velero-v1.10.0-linux-amd64/examples/nginx-app
velero-v1.10.0-linux-amd64/examples/nginx-app/README.md
velero-v1.10.0-linux-amd64/examples/nginx-app/base.yaml
velero-v1.10.0-linux-amd64/examples/nginx-app/with-pv.yaml
velero-v1.10.0-linux-amd64/velero
[root@server14 velero]# ll
total 35008
drwxr-xr-x 3 root root       67 Dec 22 13:52 velero-v1.10.0-linux-amd64
-rw-r--r-- 1 root root 35844454 Dec 22 13:51 velero-v1.10.0-linux-amd64.tar.gz
[root@server14 velero]# cd velero-v1.10.0-linux-amd64/
[root@server14 velero-v1.10.0-linux-amd64]# ls
examples  LICENSE  velero
[root@server14 velero-v1.10.0-linux-amd64]# mv velero /usr/local/bin/
[root@server14 velero-v1.10.0-linux-amd64]# velero version
Client:
        Version: v1.10.0
        Git commit: 367f563072659f0bcd809bc33507fd75cd722344
<error getting server version: no matches for kind "ServerStatusRequest" in version "velero.io/v1">

The server-side version error above is expected at this point: the Velero server has not been installed into the cluster yet.

Install the Velero server

  1. Prepare the MinIO credentials file
[root@server14 velero]# cat minio-velero 
[default]
aws_access_key_id = P18QXB27F3FBH0
aws_secret_access_key = 0DVXEWDGQDX73M818mw9EGFGFF99e
  2. Install Velero in cluster A
[root@server14 velero]# BUCKET="velero"
[root@server14 velero]# REGION="minio"
[root@node1 velero]# velero install \
>     --provider aws \
>     --plugins velero/velero-plugin-for-aws:v1.6.0  \
>     --bucket $BUCKET \
>     --secret-file ./minio-velero \
>     --use-node-agent \
>     --default-volumes-to-fs-backup \
>     --use-volume-snapshots=false \
>     --backup-location-config region=$REGION,s3ForcePathStyle="true",s3Url=https://kstg.knowdee.com \
>     --uploader-type=kopia
CustomResourceDefinition/backuprepositories.velero.io: attempting to create resource
CustomResourceDefinition/backuprepositories.velero.io: attempting to create resource client
CustomResourceDefinition/backuprepositories.velero.io: created
CustomResourceDefinition/backups.velero.io: attempting to create resource
CustomResourceDefinition/backups.velero.io: attempting to create resource client
CustomResourceDefinition/backups.velero.io: created
CustomResourceDefinition/backupstoragelocations.velero.io: attempting to create resource
CustomResourceDefinition/backupstoragelocations.velero.io: attempting to create resource client
CustomResourceDefinition/backupstoragelocations.velero.io: created
CustomResourceDefinition/deletebackuprequests.velero.io: attempting to create resource
CustomResourceDefinition/deletebackuprequests.velero.io: attempting to create resource client
CustomResourceDefinition/deletebackuprequests.velero.io: created
CustomResourceDefinition/downloadrequests.velero.io: attempting to create resource
CustomResourceDefinition/downloadrequests.velero.io: attempting to create resource client
CustomResourceDefinition/downloadrequests.velero.io: created
CustomResourceDefinition/podvolumebackups.velero.io: attempting to create resource
CustomResourceDefinition/podvolumebackups.velero.io: attempting to create resource client
CustomResourceDefinition/podvolumebackups.velero.io: created
CustomResourceDefinition/podvolumerestores.velero.io: attempting to create resource
CustomResourceDefinition/podvolumerestores.velero.io: attempting to create resource client
CustomResourceDefinition/podvolumerestores.velero.io: created
CustomResourceDefinition/restores.velero.io: attempting to create resource
CustomResourceDefinition/restores.velero.io: attempting to create resource client
CustomResourceDefinition/restores.velero.io: created
CustomResourceDefinition/schedules.velero.io: attempting to create resource
CustomResourceDefinition/schedules.velero.io: attempting to create resource client
CustomResourceDefinition/schedules.velero.io: created
CustomResourceDefinition/serverstatusrequests.velero.io: attempting to create resource
CustomResourceDefinition/serverstatusrequests.velero.io: attempting to create resource client
CustomResourceDefinition/serverstatusrequests.velero.io: created
CustomResourceDefinition/volumesnapshotlocations.velero.io: attempting to create resource
CustomResourceDefinition/volumesnapshotlocations.velero.io: attempting to create resource client
CustomResourceDefinition/volumesnapshotlocations.velero.io: created
Waiting for resources to be ready in cluster...
Namespace/velero: attempting to create resource
Namespace/velero: attempting to create resource client
Namespace/velero: created
ClusterRoleBinding/velero: attempting to create resource
ClusterRoleBinding/velero: attempting to create resource client
ClusterRoleBinding/velero: created
ServiceAccount/velero: attempting to create resource
ServiceAccount/velero: attempting to create resource client
ServiceAccount/velero: created
Secret/cloud-credentials: attempting to create resource
Secret/cloud-credentials: attempting to create resource client
Secret/cloud-credentials: created
BackupStorageLocation/default: attempting to create resource
BackupStorageLocation/default: attempting to create resource client
BackupStorageLocation/default: created
Deployment/velero: attempting to create resource
Deployment/velero: attempting to create resource client
Deployment/velero: created
DaemonSet/node-agent: attempting to create resource
DaemonSet/node-agent: attempting to create resource client
DaemonSet/node-agent: created
Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.
[root@node1 velero]# kubectl -n velero get pods -owide
NAME                      READY   STATUS    RESTARTS   AGE   IP              NODE    NOMINATED NODE   READINESS GATES
node-agent-bf22w          1/1     Running   0          14s   10.244.139.53   node6   <none>           <none>
node-agent-rn9wr          1/1     Running   0          14s   10.244.3.83     node4   <none>           <none>
node-agent-z5xjz          1/1     Running   0          15s   10.244.33.174   node5   <none>           <none>
velero-6db79cdc6f-lpdzc   1/1     Running   0          14s   10.244.139.56   node6   <none>           <none>

Back up an application with a PVC

  1. Create a test application
[root@node1 ceph-csi]# kubectl create ns ceph-csi
namespace/ceph-csi created
[root@node1 ceph-csi]# kubectl apply -f nginx.yaml -f pvc.yaml 
deployment.apps/my-nginx-test-rbd-repeat created
[root@node1 ceph-csi]# kubectl -n ceph-csi get pods
NAME                                        READY   STATUS    RESTARTS   AGE
my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv   1/1     Running   0          17s
[root@node1 ceph-csi]# kubectl -n ceph-csi exec -it pod/my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv -- /bin/bash
root@my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv:/# cd /usr/share/rbd/
root@my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv:/usr/share/rbd# ls
lost+found
root@my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv:/usr/share/rbd# cat "caoyong mark backup " > mark.txt
cat: 'caoyong mark backup ': No such file or directory
root@my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv:/usr/share/rbd# echo "caoyong mark backup " > mark.txt
root@my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv:/usr/share/rbd# cat mark.txt 
caoyong mark backup 
root@my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv:/usr/share/rbd# exit
exit

The nginx.yaml content is as follows:

[root@node1 ceph-csi]# cat nginx.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx-test-rbd-repeat
  namespace: ceph-csi
spec:
  selector:
    matchLabels:
      app: nginx
#  serviceName: ngx-service
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
        volumeMounts:
        - name: rbd
          mountPath: /usr/share/rbd
      volumes:
      - name: rbd
        persistentVolumeClaim:              # reference the PVC
          claimName: rbd-pvc #cephfs-pvc

The pvc.yaml content:

[root@node1 ceph-csi]# cat pvc.yaml 
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc
  namespace: ceph-csi 
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block

  2. Run the Velero backup on cluster A
[root@node1 ceph-csi]# velero backup create nginx-backup-rbd   --include-namespaces=ceph-csi   --default-volumes-to-fs-backup 
Backup request "nginx-backup-rbd" submitted successfully.
Run `velero backup describe nginx-backup-rbd` or `velero backup logs nginx-backup-rbd` for more details.
[root@node1 ceph-csi]# velero backup get
NAME               STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
nginx-backup-rbd   Completed   0        0          2022-12-23 18:30:41 +0800 CST   29d       default            <none>

Restore

  1. On cluster B, install the Velero client and server, following exactly the same steps as for cluster A.
  2. Run the restore on cluster B
[root@server14 workspace]#  velero backup get
NAME               STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
nginx-backup-rbd   Completed   0        0          2022-12-23 18:30:41 +0800 CST   29d       default            <none>
[root@server14 velero]# velero restore create --from-backup=nginx-backup-rbd --include-namespaces=ceph-csi  
Restore request "nginx-backup-rbd-20221223183141" submitted successfully.
Run `velero restore describe nginx-backup-rbd-20221223183141` or `velero restore logs nginx-backup-rbd-20221223183141` for more details.
[root@server14 velero]# velero restore get
NAME                              BACKUP             STATUS       STARTED                         COMPLETED   ERRORS   WARNINGS   CREATED                         SELECTOR
nginx-backup-rbd-20221223183141   nginx-backup-rbd   InProgress   2022-12-23 18:31:41 +0800 CST   <nil>       0        0          2022-12-23 18:31:41 +0800 CST   <none>
[root@server14 velero]# velero restore get
NAME                              BACKUP             STATUS      STARTED                         COMPLETED                       ERRORS   WARNINGS   CREATED                         SELECTOR
nginx-backup-rbd-20221223183141   nginx-backup-rbd   Completed   2022-12-23 18:31:41 +0800 CST   2022-12-23 18:32:19 +0800 CST   0        1          2022-12-23 18:31:41 +0800 CST   <none>
[root@server14 velero]# kubectl -n ceph-csi get pods
NAME                                        READY   STATUS    RESTARTS   AGE
my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv   1/1     Running   0          3m50s
[root@server14 velero]# kubectl -n ceph-csi exec -it pod/my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv -- cat /usr/share/rbd/mark.txt
Defaulted container "nginx" out of: nginx, restore-wait (init)
caoyong mark backup 
[root@server14 velero]# kubectl -n ceph-csi exec -it pod/my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv -c nginx -- cat /usr/share/rbd/mark.txt
caoyong mark backup 
[root@server14 velero]# kubectl -n ceph-csi get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
rbd-pvc   Bound    pvc-03412729-a396-4449-9adc-18cdfeea9f34   1Gi        RWO            rook-ceph-block   31m

Troubleshooting

  1. During a restore, the restore status stays at New

During the restore, after running velero restore create --from-backup=nginx-backup-rbd --include-namespaces=ceph-csi, the restore status stayed at New. Checking the logs, running velero debug, and so on turned up no useful information:

[root@server14 velero]# velero debug --restore nginx-backup-rbd-20221223163049
2022/12/23 16:45:24 Collecting velero resources in namespace: velero
2022/12/23 16:45:25 Collecting velero deployment logs in namespace: velero
2022/12/23 16:45:25 Collecting log and information for restore: nginx-backup-rbd-20221223163049
2022/12/23 16:45:25 Generated debug information bundle: /root/velero/bundle-2022-12-23-16-45-24.tar.gz
[root@server14 velero]# ls
bundle-2022-12-23-16-45-24.tar.gz  external-snapshotter-6.1.0.tar.gz  minio-velero                velero-v1.10.0-linux-amd64.tar.gz
external-snapshotter-6.1.0         logs                               velero-v1.10.0-linux-amd64

I then found material explaining the behavior:

When a backup or restore task starts, tasks are processed serially by their controller. In other words, if the previous backup has not finished, a newly created backup task will not be processed until the previous one completes or times out. If the first task hits a problem, deleting its CR does not help: the task's state is still held in memory, and Velero's controller reconcile logic does not return until the task completes or times out, so the task is not terminated just because its CR was deleted. Therefore, even after deleting the stuck task, the next one will not be processed right away; you have to wait for the stuck task to time out (its CR has been deleted, so it can no longer complete normally).

Workaround

When you hit this problem, a simple workaround is to delete the Velero deployment's pod so that it restarts; after the restart the in-memory task state is gone and the next task starts processing, as shown in the sketch below.

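A minimal sketch of that workaround, assuming Velero was installed into the default velero namespace as above:

# Restart the Velero server Deployment; stuck in-memory task state is dropped with the old pod
kubectl -n velero rollout restart deployment/velero
# Equivalently, look up the velero pod with "kubectl -n velero get pods" and delete it by name;
# the Deployment recreates it automatically
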
Changing the PV/PVC storage class

The backup cluster and the restore cluster should have the same storage class; otherwise the volumes cannot be restored as-is. Velero can, however, change the storage class of persistent volumes and persistent volume claims during a restore. To configure a storage class mapping, create a ConfigMap in the Velero namespace like the following:

apiVersion: v1
kind: ConfigMap
metadata:
  # any name can be used; Velero uses the labels (below)
  # to identify it rather than the name
  name: change-storage-class-config
  # must be in the velero namespace
  namespace: velero
  # the below labels should be used verbatim in your
  # ConfigMap.
  labels:
    # this value-less label identifies the ConfigMap as
    # config for a plugin (i.e. the built-in change storage
    # class restore item action plugin)
    velero.io/plugin-config: ""
    # this label identifies the name and kind of plugin
    # that this ConfigMap is for.
    velero.io/change-storage-class: RestoreItemAction
data:
  # add 1+ key-value pairs here, where the key is the old
  # storage class name and the value is the new storage
  # class name.
  <old-storage-class>: <new-storage-class>

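For example, if the backup was taken from a cluster whose PVCs use rook-ceph-block (as in pvc.yaml above) and the target cluster only offers a class named, say, ceph-rbd (an illustrative name), the mapping would look like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: change-storage-class-config
  namespace: velero
  labels:
    velero.io/plugin-config: ""
    velero.io/change-storage-class: RestoreItemAction
data:
  # old storage class name -> new storage class name
  rook-ceph-block: ceph-rbd
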
References

Backing up with YAML via the Kubernetes API (see the sketch below)

Reference: velero.io/docs/v1.10/…
velero.io/docs/v1.10/…

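As a sketch of the YAML/API approach referenced above, a backup can also be declared as a Backup custom resource and applied with kubectl. The manifest below mirrors the CLI backup created earlier; the field names follow the velero.io/v1 Backup API as of v1.10, while the resource name and TTL value are illustrative:

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: nginx-backup-rbd-yaml
  namespace: velero
spec:
  includedNamespaces:
    - ceph-csi
  defaultVolumesToFsBackup: true
  storageLocation: default
  ttl: 720h0m0s
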
Other notes

Uninstalling the external-snapshotter plugin from cluster A:

[root@node1 velero]# cd external-snapshotter-6.1.0
[root@node1 external-snapshotter-6.1.0]# ls
CHANGELOG  cloudbuild.yaml  code-of-conduct.md  deploy    go.mod  LICENSE   OWNERS          pkg        release-tools      vendor
client     cmd              CONTRIBUTING.md     examples  go.sum  Makefile  OWNERS_ALIASES  README.md  SECURITY_CONTACTS
[root@node1 external-snapshotter-6.1.0]# kubectl -n kube-system kustomize deploy/kubernetes/snapshot-controller | kubectl delete -f -
serviceaccount "snapshot-controller" deleted
role.rbac.authorization.k8s.io "snapshot-controller-leaderelection" deleted
clusterrole.rbac.authorization.k8s.io "snapshot-controller-runner" deleted
rolebinding.rbac.authorization.k8s.io "snapshot-controller-leaderelection" deleted
clusterrolebinding.rbac.authorization.k8s.io "snapshot-controller-role" deleted
deployment.apps "snapshot-controller" deleted
[root@node1 external-snapshotter-6.1.0]# kubectl kustomize client/config/crd | kubectl delete -f -
customresourcedefinition.apiextensions.k8s.io "volumesnapshotclasses.snapshot.storage.k8s.io" deleted
customresourcedefinition.apiextensions.k8s.io "volumesnapshotcontents.snapshot.storage.k8s.io" deleted
customresourcedefinition.apiextensions.k8s.io "volumesnapshots.snapshot.storage.k8s.io" deleted

All.