Backing up a k8s cluster with Velero (cross-cluster)
As long as each Velero instance is pointed at the same object storage, Velero can migrate resources from one cluster to another. It also supports scheduled backups, backup hooks, and more; see the official documentation for details. The installation below follows the official docs.
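In outline, the cross-cluster flow used in the rest of this post is: back up on the source cluster, then restore on the target cluster that points at the same bucket. The backup name and namespace below are placeholders, not commands run in this environment:

# On cluster A (source), with Velero configured against the shared bucket
velero backup create <backup-name> --include-namespaces=<namespace>
# On cluster B (target), with Velero configured against the same bucket
velero restore create --from-backup=<backup-name>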
Download Velero
Download: github.com/vmware-tanz…
[root@server14 velero]# ls
velero-v1.10.0-linux-amd64.tar.gz
[root@server14 velero]# tar -zxvf velero-v1.10.0-linux-amd64.tar.gz
velero-v1.10.0-linux-amd64/LICENSE
velero-v1.10.0-linux-amd64/examples/README.md
velero-v1.10.0-linux-amd64/examples/minio
velero-v1.10.0-linux-amd64/examples/minio/00-minio-deployment.yaml
velero-v1.10.0-linux-amd64/examples/nginx-app
velero-v1.10.0-linux-amd64/examples/nginx-app/README.md
velero-v1.10.0-linux-amd64/examples/nginx-app/base.yaml
velero-v1.10.0-linux-amd64/examples/nginx-app/with-pv.yaml
velero-v1.10.0-linux-amd64/velero
[root@server14 velero]# ll
total 35008
drwxr-xr-x 3 root root       67 Dec 22 13:52 velero-v1.10.0-linux-amd64
-rw-r--r-- 1 root root 35844454 Dec 22 13:51 velero-v1.10.0-linux-amd64.tar.gz
[root@server14 velero]# cd velero-v1.10.0-linux-amd64/
[root@server14 velero-v1.10.0-linux-amd64]# ls
examples LICENSE velero
[root@server14 velero-v1.10.0-linux-amd64]# mv velero /usr/local/bin/
[root@server14 velero-v1.10.0-linux-amd64]# velero version
Client:
Version: v1.10.0
Git commit: 367f563072659f0bcd809bc33507fd75cd722344
<error getting server version: no matches for kind "ServerStatusRequest" in version "velero.io/v1">
Install the Velero server
(The "error getting server version" above is expected at this point: only the client is installed so far.)
- Prepare the MinIO credentials file (the target bucket itself has to exist in MinIO; see the mc sketch after the file)
[root@server14 velero]# cat minio-velero
[default]
aws_access_key_id = P18QXB27F3FBH0
aws_secret_access_key = 0DVXEWDGQDX73M818mw9EGFGFF99e
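A minimal sketch of creating that bucket with the MinIO mc client; the alias name velero-minio is made up here, and the endpoint and credentials are the ones used elsewhere in this post:

# Register the MinIO endpoint under a (hypothetical) alias
mc alias set velero-minio https://kstg.knowdee.com P18QXB27F3FBH0 0DVXEWDGQDX73M818mw9EGFGFF99e
# Create the bucket Velero will store backups in
mc mb velero-minio/velero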
- Install Velero in cluster A
[root@server14 velero]# BUCKET="velero"
[root@server14 velero]# REGION="minio"
[root@node1 velero]# velero install \
> --provider aws \
> --plugins velero/velero-plugin-for-aws:v1.6.0 \
> --bucket $BUCKET \
> --secret-file ./minio-velero \
> --use-node-agent \
> --default-volumes-to-fs-backup \
> --use-volume-snapshots=false \
> --backup-location-config region=$REGION,s3ForcePathStyle="true",s3Url=https://kstg.knowdee.com \
> --uploader-type=kopia
CustomResourceDefinition/backuprepositories.velero.io: attempting to create resource
CustomResourceDefinition/backuprepositories.velero.io: attempting to create resource client
CustomResourceDefinition/backuprepositories.velero.io: created
CustomResourceDefinition/backups.velero.io: attempting to create resource
CustomResourceDefinition/backups.velero.io: attempting to create resource client
CustomResourceDefinition/backups.velero.io: created
CustomResourceDefinition/backupstoragelocations.velero.io: attempting to create resource
CustomResourceDefinition/backupstoragelocations.velero.io: attempting to create resource client
CustomResourceDefinition/backupstoragelocations.velero.io: created
CustomResourceDefinition/deletebackuprequests.velero.io: attempting to create resource
CustomResourceDefinition/deletebackuprequests.velero.io: attempting to create resource client
CustomResourceDefinition/deletebackuprequests.velero.io: created
CustomResourceDefinition/downloadrequests.velero.io: attempting to create resource
CustomResourceDefinition/downloadrequests.velero.io: attempting to create resource client
CustomResourceDefinition/downloadrequests.velero.io: created
CustomResourceDefinition/podvolumebackups.velero.io: attempting to create resource
CustomResourceDefinition/podvolumebackups.velero.io: attempting to create resource client
CustomResourceDefinition/podvolumebackups.velero.io: created
CustomResourceDefinition/podvolumerestores.velero.io: attempting to create resource
CustomResourceDefinition/podvolumerestores.velero.io: attempting to create resource client
CustomResourceDefinition/podvolumerestores.velero.io: created
CustomResourceDefinition/restores.velero.io: attempting to create resource
CustomResourceDefinition/restores.velero.io: attempting to create resource client
CustomResourceDefinition/restores.velero.io: created
CustomResourceDefinition/schedules.velero.io: attempting to create resource
CustomResourceDefinition/schedules.velero.io: attempting to create resource client
CustomResourceDefinition/schedules.velero.io: created
CustomResourceDefinition/serverstatusrequests.velero.io: attempting to create resource
CustomResourceDefinition/serverstatusrequests.velero.io: attempting to create resource client
CustomResourceDefinition/serverstatusrequests.velero.io: created
CustomResourceDefinition/volumesnapshotlocations.velero.io: attempting to create resource
CustomResourceDefinition/volumesnapshotlocations.velero.io: attempting to create resource client
CustomResourceDefinition/volumesnapshotlocations.velero.io: created
Waiting for resources to be ready in cluster...
Namespace/velero: attempting to create resource
Namespace/velero: attempting to create resource client
Namespace/velero: created
ClusterRoleBinding/velero: attempting to create resource
ClusterRoleBinding/velero: attempting to create resource client
ClusterRoleBinding/velero: created
ServiceAccount/velero: attempting to create resource
ServiceAccount/velero: attempting to create resource client
ServiceAccount/velero: created
Secret/cloud-credentials: attempting to create resource
Secret/cloud-credentials: attempting to create resource client
Secret/cloud-credentials: created
BackupStorageLocation/default: attempting to create resource
BackupStorageLocation/default: attempting to create resource client
BackupStorageLocation/default: created
Deployment/velero: attempting to create resource
Deployment/velero: attempting to create resource client
Deployment/velero: created
DaemonSet/node-agent: attempting to create resource
DaemonSet/node-agent: attempting to create resource client
DaemonSet/node-agent: created
Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.
[root@node1 velero]# kubectl -n velero get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
node-agent-bf22w 1/1 Running 0 14s 10.244.139.53 node6 <none> <none>
node-agent-rn9wr 1/1 Running 0 14s 10.244.3.83 node4 <none> <none>
node-agent-z5xjz 1/1 Running 0 15s 10.244.33.174 node5 <none> <none>
velero-6db79cdc6f-lpdzc 1/1 Running 0 14s 10.244.139.56 node6 <none> <none>
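Before running a backup, it is worth confirming that the BackupStorageLocation can reach MinIO; for example (output shape varies by version):

velero backup-location get
kubectl -n velero get backupstoragelocations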
Back up an application (with a PVC)
- Create a test application
[root@node1 ceph-csi]# kubectl create ns ceph-csi
namespace/ceph-csi created
[root@node1 ceph-csi]# kubectl apply -f nginx.yaml -f pvc.yaml
deployment.apps/my-nginx-test-rbd-repeat created
[root@node1 ceph-csi]# kubectl -n ceph-csi get pods
NAME READY STATUS RESTARTS AGE
my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv 1/1 Running 0 17s
[root@node1 ceph-csi]# kubectl -n ceph-csi exec -it pod/my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv -- /bin/bash
root@my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv:/# cd /usr/share/rbd/
root@my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv:/usr/share/rbd# ls
lost+found
root@my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv:/usr/share/rbd# cat "caoyong mark backup " > mark.txt
cat: 'caoyong mark backup ': No such file or directory
root@my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv:/usr/share/rbd# echo "caoyong mark backup " > mark.txt
root@my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv:/usr/share/rbd# cat mark.txt
caoyong mark backup
root@my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv:/usr/share/rbd# exit
exit
The YAML is as follows:
[root@node1 ceph-csi]# cat nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx-test-rbd-repeat
  namespace: ceph-csi
spec:
  selector:
    matchLabels:
      app: nginx
  # serviceName: ngx-service
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
        volumeMounts:
        - name: rbd
          mountPath: /usr/share/rbd
      volumes:
      - name: rbd
        persistentVolumeClaim:   # use the PVC below
          claimName: rbd-pvc     # alternatively cephfs-pvc
pvc.yaml contents:
[root@node1 ceph-csi]# cat pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc
  namespace: ceph-csi
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
- Run the Velero backup on cluster A
[root@node1 ceph-csi]# velero backup create nginx-backup-rbd --include-namespaces=ceph-csi --default-volumes-to-fs-backup
Backup request "nginx-backup-rbd" submitted successfully.
Run `velero backup describe nginx-backup-rbd` or `velero backup logs nginx-backup-rbd` for more details.
[root@node1 ceph-csi]# velero backup get
NAME STATUS ERRORS WARNINGS CREATED EXPIRES STORAGE LOCATION SELECTOR
nginx-backup-rbd Completed 0 0 2022-12-23 18:30:41 +0800 CST 29d default <none>
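The intro mentioned scheduled backups; the same backup can also be put on a schedule. A hedged sketch, assuming schedule create accepts the same flags as the one-off backup above; the schedule name and cron expression are illustrative:

# Hypothetical daily backup of the ceph-csi namespace at 01:00
velero schedule create nginx-daily --schedule="0 1 * * *" --include-namespaces=ceph-csi --default-volumes-to-fs-backup
velero schedule get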
Restore
- On cluster B, install the Velero client and server, following exactly the same steps as on cluster A.
- Run the restore on cluster B
[root@server14 workspace]# velero backup get
NAME STATUS ERRORS WARNINGS CREATED EXPIRES STORAGE LOCATION SELECTOR
nginx-backup-rbd Completed 0 0 2022-12-23 18:30:41 +0800 CST 29d default <none>
[root@server14 velero]# velero restore create --from-backup=nginx-backup-rbd --include-namespaces=ceph-csi
Restore request "nginx-backup-rbd-20221223183141" submitted successfully.
Run `velero restore describe nginx-backup-rbd-20221223183141` or `velero restore logs nginx-backup-rbd-20221223183141` for more details.
[root@server14 velero]# velero restore get
NAME BACKUP STATUS STARTED COMPLETED ERRORS WARNINGS CREATED SELECTOR
nginx-backup-rbd-20221223183141 nginx-backup-rbd InProgress 2022-12-23 18:31:41 +0800 CST <nil> 0 0 2022-12-23 18:31:41 +0800 CST <none>
[root@server14 velero]# velero restore get
NAME BACKUP STATUS STARTED COMPLETED ERRORS WARNINGS CREATED SELECTOR
nginx-backup-rbd-20221223183141 nginx-backup-rbd Completed 2022-12-23 18:31:41 +0800 CST 2022-12-23 18:32:19 +0800 CST 0 1 2022-12-23 18:31:41 +0800 CST <none>
[root@server14 velero]# kubectl -n ceph-csi get pods
NAME READY STATUS RESTARTS AGE
my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv 1/1 Running 0 3m50s
[root@server14 velero]# kubectl -n ceph-csi exec -it pod/my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv -- cat /usr/share/rbd/mark.txt
Defaulted container "nginx" out of: nginx, restore-wait (init)
caoyong mark backup
[root@server14 velero]# kubectl -n ceph-csi exec -it pod/my-nginx-test-rbd-repeat-5d8f9ddd8b-gzdrv -c nginx -- cat /usr/share/rbd/mark.txt
caoyong mark backup
[root@server14 velero]# kubectl -n ceph-csi get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
rbd-pvc Bound pvc-03412729-a396-4449-9adc-18cdfeea9f34 1Gi RWO rook-ceph-block 31m
Troubleshooting
- The restore status stays New
During recovery, after running velero restore create --from-backup=nginx-backup-rbd --include-namespaces=ceph-csi, the restore status stayed New; checking the logs, velero debug, and so on turned up nothing useful:
[root@server14 velero]# velero debug --restore nginx-backup-rbd-20221223163049
2022/12/23 16:45:24 Collecting velero resources in namespace: velero
2022/12/23 16:45:25 Collecting velero deployment logs in namespace: velero
2022/12/23 16:45:25 Collecting log and information for restore: nginx-backup-rbd-20221223163049
2022/12/23 16:45:25 Generated debug information bundle: /root/velero/bundle-2022-12-23-16-45-24.tar.gz
[root@server14 velero]# ls
bundle-2022-12-23-16-45-24.tar.gz external-snapshotter-6.1.0.tar.gz minio-velero velero-v1.10.0-linux-amd64.tar.gz
external-snapshotter-6.1.0 logs velero-v1.10.0-linux-amd64
Eventually I found an explanation:
When a backup or restore task starts, tasks are processed serially by the corresponding controller. In other words, if the previous backup has not finished, the next one will not be processed even if it has already been created, until the previous one completes or times out. If the first task runs into trouble, deleting its CR does not help: the task's state is still held in memory, and Velero's controller reconcile logic does not return until the task finishes or times out, so the task does not end just because its CR was deleted. As a result, even after deleting the stuck task, the next one is not handled immediately; it has to wait for the stuck task to time out (and since its CR is gone, it can never complete normally).
Workaround
When this happens, a simple workaround is to delete the velero deployment's pod so that it restarts; after the restart the in-memory task is gone and the next task starts, as sketched below.
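One way to force that restart (equivalent to deleting the pod by hand):

# Restart the Velero server pod; the in-memory task state is discarded on restart
kubectl -n velero rollout restart deployment/velero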
Changing the PV/PVC storage class
The backup cluster and the restore cluster should have the same StorageClass; otherwise the restore cannot complete.
Velero can change the storage class of persistent volumes and persistent volume claims during a restore. To configure a storage class mapping, create a ConfigMap in the Velero namespace as follows:
apiVersion: v1
kind: ConfigMap
metadata:
  # any name can be used; Velero uses the labels (below)
  # to identify it rather than the name
  name: change-storage-class-config
  # must be in the velero namespace
  namespace: velero
  # the below labels should be used verbatim in your
  # ConfigMap.
  labels:
    # this value-less label identifies the ConfigMap as
    # config for a plugin (i.e. the built-in change storage
    # class restore item action plugin)
    velero.io/plugin-config: ""
    # this label identifies the name and kind of plugin
    # that this ConfigMap is for.
    velero.io/change-storage-class: RestoreItemAction
data:
  # add 1+ key-value pairs here, where the key is the old
  # storage class name and the value is the new storage
  # class name.
  <old-storage-class>: <new-storage-class>
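For this walkthrough the same ConfigMap could also be created imperatively; the target class below is a placeholder, not a class that exists in this environment:

# Map the source cluster's rook-ceph-block class to a placeholder target class
kubectl -n velero create configmap change-storage-class-config --from-literal=rook-ceph-block='<new-storage-class>'
kubectl -n velero label configmap change-storage-class-config velero.io/plugin-config="" velero.io/change-storage-class=RestoreItemAction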
Backing up via YAML and the Kubernetes API
References:
velero.io/docs/v1.10/…
velero.io/docs/v1.10/…
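Instead of the CLI, a backup can be declared as a Backup custom resource and applied with kubectl. A minimal sketch that mirrors the CLI backup above; the name and TTL are illustrative:

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: nginx-backup-rbd-yaml   # illustrative name
  namespace: velero             # Backup resources live in the velero namespace
spec:
  includedNamespaces:
    - ceph-csi
  defaultVolumesToFsBackup: true   # same effect as --default-volumes-to-fs-backup
  ttl: 720h0m0s                    # illustrative retention (30 days)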
Other
Uninstalling the external-snapshotter (CSI snapshot) components from cluster A:
[root@node1 velero]# cd external-snapshotter-6.1.0
[root@node1 external-snapshotter-6.1.0]# ls
CHANGELOG cloudbuild.yaml code-of-conduct.md deploy go.mod LICENSE OWNERS pkg release-tools vendor
client cmd CONTRIBUTING.md examples go.sum Makefile OWNERS_ALIASES README.md SECURITY_CONTACTS
[root@node1 external-snapshotter-6.1.0]# kubectl -n kube-system kustomize deploy/kubernetes/snapshot-controller | kubectl delete -f -
serviceaccount "snapshot-controller" deleted
role.rbac.authorization.k8s.io "snapshot-controller-leaderelection" deleted
clusterrole.rbac.authorization.k8s.io "snapshot-controller-runner" deleted
rolebinding.rbac.authorization.k8s.io "snapshot-controller-leaderelection" deleted
clusterrolebinding.rbac.authorization.k8s.io "snapshot-controller-role" deleted
deployment.apps "snapshot-controller" deleted
[root@node1 external-snapshotter-6.1.0]# kubectl kustomize client/config/crd | kubectl delete -f -
customresourcedefinition.apiextensions.k8s.io "volumesnapshotclasses.snapshot.storage.k8s.io" deleted
customresourcedefinition.apiextensions.k8s.io "volumesnapshotcontents.snapshot.storage.k8s.io" deleted
customresourcedefinition.apiextensions.k8s.io "volumesnapshots.snapshot.storage.k8s.io" deleted
That's all.