Deploying Ceph with Rook


Installation and deployment

Steps based on: blog.csdn.net/networken/a…

Rook website: rook.io
GitHub: github.com/rook/rook

Preparing the deployment environment
1. Prerequisites (see the check commands after this list):

  • At least 3 nodes, all schedulable for pods, to satisfy Ceph's replica high-availability requirements
  • Nodes must have available raw disks or raw partitions with no formatted filesystem on them
  • If LVM is used, the lvm2 package must be installed on the nodes
  • Ceph requires a Linux kernel built with the RBD module; test the Kubernetes nodes by running modprobe rbd
  • For volumes created from a Ceph shared file system (CephFS), the recommended minimum kernel version is 4.17
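
A quick way to check these prerequisites on each node (a sketch; adjust device names to your environment):

# RBD kernel module available?
modprobe rbd && lsmod | grep rbd
# Kernel version (>= 4.17 recommended when CephFS volumes will be used)
uname -r
# Raw devices: FSTYPE must be empty for the disks Ceph is supposed to consume
lsblk -f
# lvm2 installed (only required if LVM-based OSDs are used)
rpm -q lvm2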

Installation steps:

  1. Download the source:
[root@node1 ~]# git clone --single-branch --branch v1.10.5 https://github.com/rook/rook.git
Cloning into 'rook'...
remote: Enumerating objects: 83792, done.
remote: Counting objects: 100% (188/188), done.
remote: Compressing objects: 100% (104/104), done.
Receiving objects:  23% (19273/83792), 11.71 MiB | 1.15 MiB/s
[root@node3 examples]# cd /root/rook-1.10.5/deploy/examples
  2. Deploy the Rook Operator:
[root@node3 examples]#  kubectl create -f crds.yaml -f common.yaml -f operator.yaml
customresourcedefinition.apiextensions.k8s.io/cephblockpoolradosnamespaces.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephblockpools.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephbucketnotifications.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephbuckettopics.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephclients.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephclusters.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephfilesystemmirrors.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephfilesystems.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephfilesystemsubvolumegroups.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephnfses.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectrealms.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectstores.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectstoreusers.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectzonegroups.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectzones.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephrbdmirrors.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/objectbucketclaims.objectbucket.io created
customresourcedefinition.apiextensions.k8s.io/objectbuckets.objectbucket.io created
namespace/rook-ceph created
clusterrole.rbac.authorization.k8s.io/cephfs-csi-nodeplugin created
clusterrole.rbac.authorization.k8s.io/cephfs-external-provisioner-runner created
clusterrole.rbac.authorization.k8s.io/rbd-csi-nodeplugin created
clusterrole.rbac.authorization.k8s.io/rbd-external-provisioner-runner created
clusterrole.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
clusterrole.rbac.authorization.k8s.io/rook-ceph-global created
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-system created
clusterrole.rbac.authorization.k8s.io/rook-ceph-object-bucket created
clusterrole.rbac.authorization.k8s.io/rook-ceph-osd created
clusterrole.rbac.authorization.k8s.io/rook-ceph-system created
clusterrolebinding.rbac.authorization.k8s.io/cephfs-csi-provisioner-role created
clusterrolebinding.rbac.authorization.k8s.io/rbd-csi-nodeplugin created
clusterrolebinding.rbac.authorization.k8s.io/rbd-csi-provisioner-role created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-global created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-object-bucket created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-osd created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-system created
role.rbac.authorization.k8s.io/cephfs-external-provisioner-cfg created
role.rbac.authorization.k8s.io/rbd-csi-nodeplugin created
role.rbac.authorization.k8s.io/rbd-external-provisioner-cfg created
role.rbac.authorization.k8s.io/rook-ceph-cmd-reporter created
role.rbac.authorization.k8s.io/rook-ceph-mgr created
role.rbac.authorization.k8s.io/rook-ceph-osd created
role.rbac.authorization.k8s.io/rook-ceph-purge-osd created
role.rbac.authorization.k8s.io/rook-ceph-rgw created
role.rbac.authorization.k8s.io/rook-ceph-system created
rolebinding.rbac.authorization.k8s.io/cephfs-csi-provisioner-role-cfg created
rolebinding.rbac.authorization.k8s.io/rbd-csi-nodeplugin-role-cfg created
rolebinding.rbac.authorization.k8s.io/rbd-csi-provisioner-role-cfg created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cmd-reporter created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-system created
rolebinding.rbac.authorization.k8s.io/rook-ceph-osd created
rolebinding.rbac.authorization.k8s.io/rook-ceph-purge-osd created
rolebinding.rbac.authorization.k8s.io/rook-ceph-rgw created
rolebinding.rbac.authorization.k8s.io/rook-ceph-system created
serviceaccount/rook-ceph-cmd-reporter created
serviceaccount/rook-ceph-mgr created
serviceaccount/rook-ceph-osd created
serviceaccount/rook-ceph-purge-osd created
serviceaccount/rook-ceph-rgw created
serviceaccount/rook-ceph-system created
serviceaccount/rook-csi-cephfs-plugin-sa created
serviceaccount/rook-csi-cephfs-provisioner-sa created
serviceaccount/rook-csi-rbd-plugin-sa created
serviceaccount/rook-csi-rbd-provisioner-sa created
configmap/rook-ceph-operator-config created
deployment.apps/rook-ceph-operator created
## Check the operator pod that was created
[root@node3 examples]# kubectl -n rook-ceph get pods
NAME                                  READY   STATUS    RESTARTS   AGE
rook-ceph-operator-7b4f6fd594-jdglb   1/1     Running   0          165m
  3. Next, create the Ceph cluster:
# Add the following disks in cluster.yaml
[root@node3 examples]# vim cluster.yaml
238     nodes:
239     - name: "node4"
240       devices: # specific devices to use for storage can be specified for each node
241       - name: "/dev/vdb"
242       - name: "/dev/vdc"
243     - name: "node5"
244       devices: # specific devices to use for storage can be specified for each node
245       - name: "/dev/vdb"
246       - name: "/dev/vdc"
247     - name: "node6"
248       devices: # specific devices to use for storage can be specified for each node
249       - name: "/dev/vdb"
250       - name: "/dev/vdc"

[root@node3 examples]# kubectl create -f cluster.yaml
cephcluster.ceph.rook.io/rook-ceph created
[root@node3 examples]# kubectl get pods -n rook-ceph
NAME                                              READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-2xvrs                            2/2     Running     0          3h2m
csi-cephfsplugin-5bx4l                            2/2     Running     0          3h2m
csi-cephfsplugin-hwrfn                            2/2     Running     0          3h2m
csi-cephfsplugin-provisioner-75875b5887-cch5l     5/5     Running     0          3h2m
csi-cephfsplugin-provisioner-75875b5887-zbq2j     5/5     Running     0          3h2m
csi-rbdplugin-dbskr                               2/2     Running     0          3h2m
csi-rbdplugin-fbkv7                               2/2     Running     0          3h2m
csi-rbdplugin-provisioner-56d69f5d8-2vngj         5/5     Running     0          3h2m
csi-rbdplugin-provisioner-56d69f5d8-b7hd8         5/5     Running     0          3h2m
csi-rbdplugin-wwzhn                               2/2     Running     0          3h2m
rook-ceph-crashcollector-node4-788d65dcc4-87z5d   1/1     Running     0          3h1m
rook-ceph-crashcollector-node5-5b87ff6fc5-bbwsm   1/1     Running     0          3h
rook-ceph-crashcollector-node6-76679cb79b-mlwnm   1/1     Running     0          3h
rook-ceph-mgr-a-59bcc59d7c-8jrcp                  3/3     Running     0          3h1m
rook-ceph-mgr-b-5dbd588748-mxg65                  3/3     Running     0          3h1m
rook-ceph-mon-a-544b58cd97-8td94                  2/2     Running     0          3h2m
rook-ceph-mon-b-587674cd95-8gh89                  2/2     Running     0          3h1m
rook-ceph-mon-c-5b5c696bfd-ghm6k                  2/2     Running     0          3h1m
rook-ceph-operator-64fb475fcb-zvpfp               1/1     Running     0          3h3m
rook-ceph-osd-0-596ffcbb78-66dbm                  2/2     Running     0          3h
rook-ceph-osd-1-7f5fdcff4b-vqch8                  2/2     Running     0          3h
rook-ceph-osd-2-5896657d68-b7r64                  2/2     Running     0          3h
rook-ceph-osd-3-784f998f77-4pfl5                  2/2     Running     0          3h
rook-ceph-osd-4-d75df46df-9fxlt                   2/2     Running     0          3h
rook-ceph-osd-5-6c84469594-pkfhb                  2/2     Running     0          3h
rook-ceph-osd-prepare-node4-779cg                 0/1     Completed   0          3h
rook-ceph-osd-prepare-node5-mh8ds                 0/1     Completed   0          3h
rook-ceph-osd-prepare-node6-mcfww                 0/1     Completed   0          3h
  4. Configure the Ceph dashboard:
[root@node3 examples]# kubectl create -f dashboard-external-http.yaml
service/rook-ceph-mgr-dashboard-external-http created
[root@node3 examples]# kubectl -n rook-ceph get service
NAME                                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
rook-ceph-mgr                           ClusterIP   10.98.189.149   <none>        9283/TCP            33m
rook-ceph-mgr-dashboard                 ClusterIP   10.107.23.234   <none>        7000/TCP            33m
rook-ceph-mgr-dashboard-external-http   NodePort    10.103.119.23   <none>        7000:32331/TCP      6s
rook-ceph-mon-a                         ClusterIP   10.101.25.118   <none>        6789/TCP,3300/TCP   34m
rook-ceph-mon-b                         ClusterIP   10.101.22.52    <none>        6789/TCP,3300/TCP   34m
rook-ceph-mon-c                         ClusterIP   10.110.72.46    <none>        6789/TCP,3300/TCP   33m

Then came an unpleasant surprise: accessing the dashboard returned a 303 redirect, and the redirect target was the pod backing the service. Checking with curl:

root@yong:/home/cyxinda/download/rook-1.10.5/deploy/examples# curl -vvv http://172.70.10.185:32331/
*   Trying 172.70.10.185:32331...
* TCP_NODELAY set
* Connected to 172.70.10.185 (172.70.10.185) port 32331 (#0)
> GET / HTTP/1.1
> Host: 172.70.10.185:32331
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 303 See Other
< Content-Type: text/html;charset=utf-8
< Server: Ceph-Dashboard
< Date: Wed, 16 Nov 2022 07:22:47 GMT
< Content-Security-Policy: frame-ancestors 'self';
< X-Content-Type-Options: nosniff
< Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
< Location: http://10.244.139.6:7000/
< Vary: Accept-Encoding
< Content-Length: 96
< 
* Connection #0 to host 172.70.10.185 left intact
This resource can be found at <a href="http://10.244.139.6:7000/">http://10.244.139.6:7000/</a>.

Kubernetes Services are supposed to act as reverse proxies, so why a 303 redirect? After some searching, the explanation turned up in the Ceph documentation: when a request lands on an mgr instance in standby state, it is automatically redirected to the active mgr instance, which is exactly the 303 seen above. The toolbox confirms that mgr.b is in standby. So when a request reaches mgr.b through the Service, mgr.b redirects it to mgr.a; the browser ends up with a pod IP that is unreachable from outside the cluster and cannot load any data. Adding a label specific to mgr.a to the selector in dashboard-external-http.yaml solves the problem (a sketch of the modified Service follows the password command below), after which the Ceph dashboard is reachable in the browser. The 303 redirect could also be disabled outright, but since the cause was clear I did not experiment with that. The dashboard password can be obtained with the following command:

[root@node3 examples]# kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
(KW=6{zQg%{n1.dB)U:2
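
For reference, a minimal sketch of the modified external Service pinned to the active mgr (mgr.a in this cluster). It assumes the mgr pods carry a ceph_daemon_id label; verify the actual labels with kubectl -n rook-ceph get pods -l app=rook-ceph-mgr --show-labels:

apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-mgr-dashboard-external-http
  namespace: rook-ceph
  labels:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
spec:
  type: NodePort
  ports:
    - name: dashboard
      port: 7000
      protocol: TCP
      targetPort: 7000
  selector:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
    ceph_daemon_id: a   # extra label so traffic only hits the active mgr

Keep in mind that if the active mgr fails over to mgr.b, this selector has to be updated accordingly.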
  5. Create the toolbox:
[root@node3 examples]# kubectl create -f toolbox.yaml
deployment.apps/rook-ceph-tools created
[root@node3 examples]# kubectl get pods -n rook-ceph
NAME                                              READY   STATUS      RESTARTS   AGE
rook-ceph-tools-5679b7d8f-jzbkr                   1/1     Running     0          179m
[root@node1 ~]# kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
bash-4.4$ 
bash-4.4$ ceph osd status
ID  HOST    USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE      
 0  node4  19.5M   499G      0        0       0        0   exists,up  
 1  node5  20.8M   499G      0        0       0        0   exists,up  
 2  node6  19.5M   499G      0        0       0        0   exists,up  
 3  node4  21.1M   499G      0        0       0        0   exists,up  
 4  node5  19.5M   499G      0        0       0        0   exists,up  
 5  node6  20.8M   499G      0        0       0        0   exists,up  
bash-4.4$ ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    2.9 TiB  2.9 TiB  121 MiB   121 MiB          0
TOTAL  2.9 TiB  2.9 TiB  121 MiB   121 MiB          0
 
--- POOLS ---
POOL  ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr   1    1  449 KiB        2  449 KiB      0    950 GiB
bash-4.4$ ceph -s
  cluster:
    id:     0055fdfc-9741-40e5-b108-87e02574e98b
    health: HEALTH_WARN
            clock skew detected on mon.b
 
  services:
    mon: 3 daemons, quorum a,b,c (age 2h)
    mgr: a(active, since 2h), standbys: b
    osd: 6 osds: 6 up (since 2h), 6 in (since 2h)
 
  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   121 MiB used, 2.9 TiB / 2.9 TiB avail
    pgs:     1 active+clean
 
bash-4.4$ ceph osd status
ID  HOST    USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE      
 0  node4  19.5M   499G      0        0       0        0   exists,up  
 1  node5  20.8M   499G      0        0       0        0   exists,up  
 2  node6  19.5M   499G      0        0       0        0   exists,up  
 3  node4  21.1M   499G      0        0       0        0   exists,up  
 4  node5  19.5M   499G      0        0       0        0   exists,up  
 5  node6  20.8M   499G      0        0       0        0   exists,up  
bash-4.4$ ceph health detail
HEALTH_WARN clock skew detected on mon.b
[WRN] MON_CLOCK_SKEW: clock skew detected on mon.b
    mon.b clock skew 0.158903s > max 0.05s (latency 0.0186243s)

mon.b's clock was skewed, which left the Ceph cluster unhealthy. On node6, where mon.b runs, it turned out that the chrony client was not syncing at all. After checking the documentation, I added the cluster's IP range to the allow list of the chrony server on node1 and restarted chronyd:

[root@node1 rook]# vim /etc/chrony.conf 
 26 #allow 192.168.0.0/16
 27 allow 172.0.0.0/8
[root@node1 rook]# systemctl restart chronyd
[root@node1 rook]# systemctl status chronyd
● chronyd.service - NTP client/server
   Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2022-11-16 18:17:39 CST; 5s ago
     Docs: man:chronyd(8)
           man:chrony.conf(5)
  Process: 22003 ExecStartPost=/usr/libexec/chrony-helper update-daemon (code=exited, status=0/SUCCESS)
  Process: 21999 ExecStart=/usr/sbin/chronyd $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 22001 (chronyd)
    Tasks: 1
   Memory: 484.0K
   CGroup: /system.slice/chronyd.service
           └─22001 /usr/sbin/chronyd

Nov 16 18:17:39 node1 systemd[1]: Starting NTP client/server...
Nov 16 18:17:39 node1 chronyd[22001]: chronyd version 3.4 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 +DEBUG)
Nov 16 18:17:39 node1 chronyd[22001]: Frequency -13.820 +/- 0.026 ppm read from /var/lib/chrony/drift
Nov 16 18:17:39 node1 systemd[1]: Started NTP client/server.
Nov 16 18:17:44 node1 chronyd[22001]: Selected source 203.107.6.88

After waiting a few minutes, the client was back in sync.
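
The recovery can be confirmed from node6 with chronyc (a quick check, assuming chrony is installed on the client node):

# List NTP sources; the node1 server should be marked '^*' (currently selected)
chronyc sources -v
# Show offset and sync status against the selected source
chronyc tracking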

The dashboard then shows the Ceph cluster healthy again.

In the toolbox, you can see:

[root@node1 ~]# kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
bash-4.4$ ceph health detail
HEALTH_OK
bash-4.4$ 

About crash events

bash-4.4$ ceph health detail
HEALTH_WARN 3 mgr modules have recently crashed
[WRN] RECENT_MGR_MODULE_CRASH: 3 mgr modules have recently crashed
    mgr module nfs crashed in daemon mgr.a on host rook-ceph-mgr-a-744cc86b75-n8x6w at 2022-11-17T05:34:36.364991Z
    mgr module nfs crashed in daemon mgr.a on host rook-ceph-mgr-a-744cc86b75-n8x6w at 2022-11-17T05:34:39.298479Z
    mgr module nfs crashed in daemon mgr.a on host rook-ceph-mgr-a-744cc86b75-wwbk5 at 2022-11-17T06:17:45.972042Z
bash-4.4$ ceph crash ls
ID                                                                ENTITY  NEW  
2022-11-17T05:34:36.364991Z_1257681d-877b-41e1-96fa-6519032b7f2d  mgr.a    *   
2022-11-17T05:34:39.298479Z_bd2244fe-352e-4a12-b7a0-19f31cb94f58  mgr.a    *   
2022-11-17T06:17:45.972042Z_ac3ac293-5c65-496d-b7b1-43be3f0c836e  mgr.a    *   
bash-4.4$ ceph crash info 2022-11-17T06:17:45.972042Z_ac3ac293-5c65-496d-b7b1-43be3f0c836e
{
    "backtrace": [
        "  File \"/usr/share/ceph/mgr/nfs/module.py\", line 154, in cluster_ls\n    return available_clusters(self)",
        "  File \"/usr/share/ceph/mgr/nfs/utils.py\", line 38, in available_clusters\n    completion = mgr.describe_service(service_type='nfs')",
        "  File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 1479, in inner\n    completion = self._oremote(method_name, args, kwargs)",
        "  File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 1546, in _oremote\n    raise NoOrchestrator()",
        "orchestrator._interface.NoOrchestrator: No orchestrator configured (try `ceph orch set backend`)"
    ],
    "ceph_version": "17.2.5",
    "crash_id": "2022-11-17T06:17:45.972042Z_ac3ac293-5c65-496d-b7b1-43be3f0c836e",
    "entity_name": "mgr.a",
    "mgr_module": "nfs",
    "mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
    "mgr_python_exception": "NoOrchestrator",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "dad5ab00e8109633a6f99a44e3e3c67aa44aad3613d396e491b3ebd3ae1e9dad",
    "timestamp": "2022-11-17T06:17:45.972042Z",
    "utsname_hostname": "rook-ceph-mgr-a-744cc86b75-wwbk5",
    "utsname_machine": "x86_64",
    "utsname_release": "5.17.6-1.el7.elrepo.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PREEMPT Fri May 6 09:08:57 EDT 2022"
}
bash-4.4$ ceph crash archive-all
## or
bash-4.4$ ceph crash archive <id>
## This clears the unhealthy-cluster notice from ceph health detail
bash-4.4$ ceph crash ls
ID                                                                ENTITY  NEW  
2022-11-17T05:34:36.364991Z_1257681d-877b-41e1-96fa-6519032b7f2d  mgr.a        
2022-11-17T05:34:39.298479Z_bd2244fe-352e-4a12-b7a0-19f31cb94f58  mgr.a        
2022-11-17T06:17:45.972042Z_ac3ac293-5c65-496d-b7b1-43be3f0c836e  mgr.a 
bash-4.4$ ceph health detail
HEALTH_OK

Setting up the orchestrator

Reference

bash-4.4$ ceph orch status
Error ENOENT: No orchestrator configured (try `ceph orch set backend`)
bash-4.4$ ceph mgr module enable rook
module 'rook' is already enabled
bash-4.4$ ceph orch set backend rook
bash-4.4$ ceph orch status
Backend: rook
Available: Yes
bash-4.4$ ceph orch host ls
HOST   ADDR                 LABELS  STATUS  
node1  172.70.10.181/node1                  
node2  172.70.10.182/node2                  
node3  172.70.10.183/node3                  
node4  172.70.10.184/node4                  
node5  172.70.10.185/node5  

Using disks on the master nodes

The master nodes carry the taint node-role.kubernetes.io/control-plane:NoSchedule:

[root@node3 examples]# kubectl describe node/node1 -n rook-ceph
Name:               node1
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node1
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 172.70.10.181/24
                    projectcalico.org/IPv4VXLANTunnelAddr: 10.244.166.128
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 16 Nov 2022 13:59:19 +0800
Taints:             node-role.kubernetes.io/control-plane:NoSchedule

Tolerations need to be added in cluster.yaml:

  placement:
    all:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: "NoSchedule"

After that, add the devices:

    nodes:
    - name: "node1"
      devices: # specific devices to use for storage can be specified for each node
      - name: "/dev/vdb"
      - name: "/dev/vdc"
    - name: "node2"
      devices: # specific devices to use for storage can be specified for each node
      - name: "/dev/vdb"
      - name: "/dev/vdc"
    - name: "node3"
      devices: # specific devices to use for storage can be specified for each node
      - name: "/dev/vdb"
      - name: "/dev/vdc"

You can then see the OSD and related pods running on the master nodes:

[root@node3 examples]# kubectl get pods -n rook-ceph -o wide |grep node1
rook-ceph-crashcollector-node1-8866f454f-m5kht    1/1     Running     0               3h38m   10.244.166.161   node1   <none>           <none>
rook-ceph-osd-6-66fccf8f69-lq7zw                  2/2     Running     0               3h38m   10.244.166.160   node1   <none>           <none>
rook-ceph-osd-8-66c98ddc5c-gnb54                  2/2     Running     0               3h38m   10.244.166.158   node1   <none>           <none>
rook-ceph-osd-prepare-node1-5sjvd                 0/1     Completed   0               6m27s   10.244.166.185   node1   <none>           <none>
[root@node3 examples]# kubectl get pods -n rook-ceph -o wide |grep node2
rook-ceph-crashcollector-node2-78c64b56bb-m44x4   1/1     Running     0               3h36m   10.244.104.10    node2   <none>           <none>
rook-ceph-mgr-a-55ff688b45-pljzc                  3/3     Running     0               3h42m   10.244.104.5     node2   <none>           <none>
rook-ceph-osd-10-5d96f775f9-7lwpg                 2/2     Running     0               3h28m   10.244.104.15    node2   <none>           <none>
rook-ceph-osd-9-777c866497-74rll                  2/2     Running     0               3h36m   10.244.104.9     node2   <none>           <none>
rook-ceph-osd-prepare-node2-c886s                 0/1     Completed   0               6m35s   10.244.104.25    node2   <none>           <none>
[root@node3 examples]# kubectl get pods -n rook-ceph -o wide |grep node3
rook-ceph-crashcollector-node3-65569859c9-9rdlx   1/1     Running     0               3h38m   10.244.135.42    node3   <none>           <none>
rook-ceph-mgr-b-795cbf984d-rt2k6                  3/3     Running     0               3h42m   10.244.135.37    node3   <none>           <none>
rook-ceph-osd-11-747d4cb964-7jsxt                 2/2     Running     0               162m    10.244.135.51    node3   <none>           <none>
rook-ceph-osd-7-58d996b97-kgtk8                   2/2     Running     0               88m     10.244.135.56    node3   <none>           <none>
rook-ceph-osd-prepare-node3-7mf55                 0/1     Completed   0               6m31s   10.244.135.57    node3   <none>           <none>

When OSDs fail to join the rook-ceph cluster

The following error appears:

Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 9 --monmap /var/lib/ceph/osd/ceph-9/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-9/ --osd-uuid 672afdb9-ffa6-4b42-85d1-7ae476bc6d6c --setuser ceph --setgroup ceph
 stderr: 2022-11-18T10:56:45.103+0000 7fb67e97d3c0 -1 bluestore(/var/lib/ceph/osd/ceph-9/) _read_fsid unparsable uuid
 stderr: 2022-11-18T10:56:45.214+0000 7fb67e97d3c0 -1 bluefs _replay 0x0: stop: uuid 00000000-0000-0000-0000-000000000000 != super.uuid 592cee20-ba67-4e4b-bb10-884b0678491f, block dump:
 stderr: 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 stderr: *
 stderr: 00000ff0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 stderr: 00001000
 stderr: 2022-11-18T10:56:49.986+0000 7fb67e97d3c0 -1 rocksdb: verify_sharding unable to list column families: NotFound:
 stderr: 2022-11-18T10:56:49.986+0000 7fb67e97d3c0 -1 bluestore(/var/lib/ceph/osd/ceph-9/) _open_db erroring opening db:
 stderr: 2022-11-18T10:56:50.407+0000 7fb67e97d3c0 -1 OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
 stderr: 2022-11-18T10:56:50.407+0000 7fb67e97d3c0 -1  ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-9/: (5) Input/output error
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.9 --yes-i-really-mean-it
 stderr: purged osd.9
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/prepare.py", line 91, in safe_prepare
    self.prepare()
  File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/prepare.py", line 134, in prepare
    tmpfs,
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/prepare.py", line 68, in prepare_bluestore
    db=db
  File "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line 484, in osd_mkfs_bluestore
    raise RuntimeError('Command failed with exit code %!s(MISSING): %!s(MISSING)' %!((MISSING)returncode, ' '.join(command)))
RuntimeError: Command failed with exit code 250: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 9 --monmap /var/lib/ceph/osd/ceph-9/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-9/ --osd-uuid 672afdb9-ffa6-4b42-85d1-7ae476bc6d6c --setuser ceph --setgroup ceph

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/sbin/ceph-volume", line 11, in <module>
    load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
  File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in __init__
    self.main(self.argv)
  File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/main.py", line 32, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/prepare.py", line 169, in main
    self.safe_prepare(self.args)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/prepare.py", line 95, in safe_prepare
    rollback_osd(self.args, self.osd_id)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/common.py", line 35, in rollback_osd
    Zap(['--destroy', '--osd-id', osd_id]).main()
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/zap.py", line 404, in main
    self.zap_osd()
  File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/zap.py", line 301, in zap_osd
    devices = find_associated_devices(self.args.osd_id, self.args.osd_fsid)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/zap.py", line 88, in find_associated_devices
    '%!s(MISSING)' %!o(MISSING)sd_id or osd_fsid)
RuntimeError: Unable to find any LV for zapping OSD: 9: exit status 1}
2022-11-18 10:59:12.259144 I | ceph-cluster-controller: reconciling ceph cluster in namespace "rook-ceph"

This error usually means the device behind the OSD needs to be wiped. First remove that device from cluster.yaml, then run the cleanup procedure below.
Reference 1: stackoverflow.com/questions/5…
Reference 2: zhuanlan.zhihu.com/p/140486398

#Mark the OSD out
bash-4.4$ ceph osd out osd.7
marked out osd.7. 
#Remove the OSD from the crush map
bash-4.4$ ceph osd crush remove osd.7
device 'osd.7' does not appear in the crush map
#Delete its auth key (caps)
bash-4.4$ ceph auth del osd.7
updated
#Remove the OSD
bash-4.4$ ceph osd rm osd.7
removed osd.7
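
In a Rook cluster the OSD daemon itself runs as a Deployment, so before wiping the disk it also helps to stop that deployment, otherwise the pod keeps crash-looping against the old data (a sketch, using the deployment naming seen in the pod listings above):

# Stop the Rook-managed OSD daemon for osd.7
kubectl -n rook-ceph scale deployment rook-ceph-osd-7 --replicas=0
# Or remove the deployment entirely and let the operator recreate the OSD once the disk is clean
kubectl -n rook-ceph delete deployment rook-ceph-osd-7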

Then:

## Format the block device
[root@node3 mapper]# mkfs.ext4 /dev/vdc
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
32768000 inodes, 131072000 blocks
6553600 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2279604224
4000 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
        102400000
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
## Disk cleanup procedure
[root@node3 ~]# DISK="/dev/vdc"
[root@node3 ~]# if [ ! -z $1 ];then
> DISK=$1
> fi
[root@node3 ~]# # Wipe the disk
[root@node3 ~]# # DISK="/dev/vdc"
[root@node3 ~]# # Zap the disk to a fresh, usable state (zap-all is important, b/c MBR has to be clean)
[root@node3 ~]# # You will have to run this step for all disks.
[root@node3 ~]# sgdisk --zap-all $DISK
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
[root@node3 ~]# dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 16.5448 s, 6.3 MB/s
[root@node3 ~]# partprobe $DISK
[root@node3 ~]# kubectl get pods -n rook-ceph -o wide|grep node3
rook-ceph-crashcollector-node3-65569859c9-9rdlx   1/1     Running            0               128m    10.244.135.42    node3   <none>           <none>
rook-ceph-mgr-b-795cbf984d-rt2k6                  3/3     Running            0               132m    10.244.135.37    node3   <none>           <none>
rook-ceph-osd-11-747d4cb964-7jsxt                 2/2     Running            0               72m     10.244.135.51    node3   <none>           <none>
rook-ceph-osd-7-58d996b97-vshvz                   1/2     CrashLoopBackOff   28 (99s ago)    128m    10.244.135.41    node3   <none>           <none>
rook-ceph-osd-prepare-node3-zzhwp                 0/1     Completed          0               49m     10.244.135.55    node3   <none>           <none>
[root@node3 ~]# kubectl -n rook-ceph logs pod/rook-ceph-osd-7-58d996b97-vshvz 
Defaulted container "osd" out of: osd, log-collector, activate (init), chown-container-data-dir (init)
debug 2022-11-21T04:27:48.818+0000 7fc114dcd700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
debug 2022-11-21T04:27:48.819+0000 7fc1145cc700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
debug 2022-11-21T04:27:48.819+0000 7fc1155ce700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
failed to fetch mon config (--no-mon-config to skip)
[root@node3 ~]# kubectl -n rook-ceph delete pod/rook-ceph-osd-7-58d996b97-vshvz 
pod "rook-ceph-osd-7-58d996b97-vshvz" deleted
[root@node3 ~]# kubectl get pods -n rook-ceph -o wide|grep node3
rook-ceph-crashcollector-node3-65569859c9-9rdlx   1/1     Running     0               130m    10.244.135.42    node3   <none>           <none>
rook-ceph-mgr-b-795cbf984d-rt2k6                  3/3     Running     0               134m    10.244.135.37    node3   <none>           <none>
rook-ceph-osd-11-747d4cb964-7jsxt                 2/2     Running     0               74m     10.244.135.51    node3   <none>           <none>
rook-ceph-osd-7-58d996b97-kgtk8                   1/2     Running     0               11s     10.244.135.56    node3   <none>           <none>
rook-ceph-osd-prepare-node3-zzhwp                 0/1     Completed   0               52m     10.244.135.55    node3   <none>           <none>
[root@node3 examples]# kubectl get pods -n rook-ceph -o wide|grep node3
rook-ceph-crashcollector-node3-65569859c9-9rdlx   1/1     Running     0               3h25m   10.244.135.42    node3   <none>           <none>
rook-ceph-mgr-b-795cbf984d-rt2k6                  3/3     Running     0               3h29m   10.244.135.37    node3   <none>           <none>
rook-ceph-osd-11-747d4cb964-7jsxt                 2/2     Running     0               149m    10.244.135.51    node3   <none>           <none>
rook-ceph-osd-7-58d996b97-kgtk8                   2/2     Running     0               75m     10.244.135.56    node3   <none>           <none>
rook-ceph-osd-prepare-node3-zzhwp                 0/1     Completed   0               127m    10.244.135.55    node3   <none>           <none>

Detailed disk cleanup procedure:
Reference: github.com/rook/rook/i…

#!/usr/bin/env bash
DISK="/dev/sda"

# Zap the disk to a fresh, usable state (zap-all is important, b/c MBR has to be clean)

# You will have to run this step for all disks.
sgdisk --zap-all $DISK

# Clean hdds with dd
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync

# Clean disks such as ssd with blkdiscard instead of dd
blkdiscard $DISK

# These steps only have to be run once on each node
# If rook sets up osds using ceph-volume, teardown leaves some devices mapped that lock the disks.
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %

# ceph-volume setup can leave ceph-<UUID> directories in /dev and /dev/mapper (unnecessary clutter)
rm -rf /dev/ceph-*
rm -rf /dev/mapper/ceph--*

# Inform the OS of partition table changes
partprobe $DISK

The cluster reports a MGR_MODULE_ERROR

The dashboard shows: MGR_MODULE_ERROR Module 'rook' has failed

bash-4.4$ ceph health detail
HEALTH_ERR Module 'rook' has failed: ({'type': 'ERROR', 'object': {'api_version': 'v1',
 'kind': 'Status',
 'metadata': {'annotations': None,
              'cluster_name': None,
              'creation_timestamp': None,
              'deletion_grace_period_seconds': None,
              'deletion_timestamp': None,
              'finalizers': None,
              'generate_name': None,
              'generation': None,
              'initializers': None,
              'labels': None,
              'managed_fields': None,
              'name': None,
              'namespace': None,
              'owner_references': None,
              'resource_version': None,
              'self_link': None,
              'uid': None},
 'spec': None,
 'status': {'addresses': None,
            'allocatable': None,
            'capacity': None,
            'conditions': None,
            'config': None,
            'daemon_endpoints': None,
            'images': None,
            'node_info': None,
            'phase': None,
            'volumes_attached': None,
            'volumes_in_use': None}}, 'raw_object': {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'too old resource version: 1909844 (1909877)', 'reason': 'Expired', 'code': 410}})
Reason: None

Restarting the mgr and disabling the affected mgr modules clears the error message, provided the mgr service itself is fine:

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status     # reports HEALTH_ERR
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph mgr module disable prometheus
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph mgr module disable dashboard
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status     # reports HEALTH_OK
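
If the mgr itself turns out to be healthy, the modules can be re-enabled afterwards, and the mgr can simply be restarted by deleting its pods so the Deployments recreate them (a sketch, assuming the standard app=rook-ceph-mgr pod label):

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph mgr module enable prometheus
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph mgr module enable dashboard
# Restart the mgr daemons; the Deployments will bring the pods back
kubectl -n rook-ceph delete pod -l app=rook-ceph-mgr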

About the images:

[root@node3 examples]# cat images.txt 
 quay.io/ceph/ceph:v17.2.5
 quay.io/cephcsi/cephcsi:v3.7.2
 quay.io/csiaddons/k8s-sidecar:v0.5.0
 registry.k8s.io/sig-storage/csi-attacher:v4.0.0
 registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.5.1
 registry.k8s.io/sig-storage/csi-provisioner:v3.3.0
 registry.k8s.io/sig-storage/csi-resizer:v1.6.0
 registry.k8s.io/sig-storage/csi-snapshotter:v6.1.0
 rook/ceph:v1.10.5
[root@node3 examples]# pwd
/root/rook-1.10.5/deploy/examples
[root@node3 examples]# 

The images can be pulled through a proxy, saved locally, and then imported on each cluster node, as sketched below.
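
A minimal sketch of that workflow for a single image, assuming Docker on a machine with proxy access and containerd on the cluster nodes (as the cri-socket above indicates):

# On a machine that can reach quay.io / registry.k8s.io
docker pull quay.io/ceph/ceph:v17.2.5
docker save -o ceph_v17.2.5.tar quay.io/ceph/ceph:v17.2.5
# Copy the tarball to each cluster node, then import it into containerd's k8s.io namespace
ctr -n k8s.io images import ceph_v17.2.5.tar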

Common commands

kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l \
"app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- bin/bash
# Pause OSD out-marking and data movement before maintenance
ceph osd set noout
ceph osd set nobackfill
ceph osd set norebalance
ceph osd set norecover
exit

# Clear the flags again once maintenance is done
ceph osd unset noout
ceph osd unset nobackfill
ceph osd unset norebalance
ceph osd unset norecover

Cleaning up Rook

See: the official teardown documentation (linked below)

[root@node1 rbd]# kubectl delete -n rook-ceph cephblockpool rbdpool
cephblockpool.ceph.rook.io "rbdpool" deleted
[root@node1 rbd]# kubectl get sc
NAME                        PROVISIONER                  RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
rook-ceph-block (default)   rook-ceph.rbd.csi.ceph.com   Delete          Immediate           true                   27h
[root@node1 rbd]# kubectl delete storageclass  rook-ceph-block
storageclass.storage.k8s.io "rook-ceph-block" deleted

[root@node1 examples]# kubectl -n rook-ceph patch cephcluster rook-ceph --type merge -p '{"spec":{"cleanupPolicy":{"confirmation":"yes-really-destroy-data"}}}'
cephcluster.ceph.rook.io/rook-ceph patched
[root@node1 examples]# 
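
If the cleanupPolicy above is not set before the CephCluster is deleted, the teardown docs also require removing the data directory left on every node by hand (the default dataDirHostPath is shown; adjust the path if it was changed in cluster.yaml):

# Run on every node that hosted Rook/Ceph daemons
rm -rf /var/lib/rook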



https://rook.io/docs/rook/v1.10/Getting-Started/ceph-teardown/#delete-the-block-and-file-artifacts
[root@node1 examples]# kubectl delete -f operator.yaml
configmap "rook-ceph-operator-config" deleted
deployment.apps "rook-ceph-operator" deleted
[root@node1 examples]# kubectl delete -f common.yaml
namespace "rook-ceph" deleted
clusterrole.rbac.authorization.k8s.io "cephfs-csi-nodeplugin" deleted
clusterrole.rbac.authorization.k8s.io "cephfs-external-provisioner-runner" deleted
clusterrole.rbac.authorization.k8s.io "rbd-csi-nodeplugin" deleted
clusterrole.rbac.authorization.k8s.io "rbd-external-provisioner-runner" deleted
clusterrole.rbac.authorization.k8s.io "rook-ceph-cluster-mgmt" deleted
clusterrole.rbac.authorization.k8s.io "rook-ceph-global" deleted
clusterrole.rbac.authorization.k8s.io "rook-ceph-mgr-cluster" deleted
clusterrole.rbac.authorization.k8s.io "rook-ceph-mgr-system" deleted
clusterrole.rbac.authorization.k8s.io "rook-ceph-object-bucket" deleted
clusterrole.rbac.authorization.k8s.io "rook-ceph-osd" deleted
clusterrole.rbac.authorization.k8s.io "rook-ceph-system" deleted
clusterrolebinding.rbac.authorization.k8s.io "cephfs-csi-provisioner-role" deleted
clusterrolebinding.rbac.authorization.k8s.io "rbd-csi-nodeplugin" deleted
clusterrolebinding.rbac.authorization.k8s.io "rbd-csi-provisioner-role" deleted
clusterrolebinding.rbac.authorization.k8s.io "rook-ceph-global" deleted
clusterrolebinding.rbac.authorization.k8s.io "rook-ceph-mgr-cluster" deleted
clusterrolebinding.rbac.authorization.k8s.io "rook-ceph-object-bucket" deleted
clusterrolebinding.rbac.authorization.k8s.io "rook-ceph-osd" deleted
clusterrolebinding.rbac.authorization.k8s.io "rook-ceph-system" deleted
role.rbac.authorization.k8s.io "cephfs-external-provisioner-cfg" deleted
role.rbac.authorization.k8s.io "rbd-csi-nodeplugin" deleted
role.rbac.authorization.k8s.io "rbd-external-provisioner-cfg" deleted
role.rbac.authorization.k8s.io "rook-ceph-cmd-reporter" deleted
role.rbac.authorization.k8s.io "rook-ceph-mgr" deleted
role.rbac.authorization.k8s.io "rook-ceph-osd" deleted
role.rbac.authorization.k8s.io "rook-ceph-purge-osd" deleted
role.rbac.authorization.k8s.io "rook-ceph-rgw" deleted
role.rbac.authorization.k8s.io "rook-ceph-system" deleted
rolebinding.rbac.authorization.k8s.io "cephfs-csi-provisioner-role-cfg" deleted
rolebinding.rbac.authorization.k8s.io "rbd-csi-nodeplugin-role-cfg" deleted
rolebinding.rbac.authorization.k8s.io "rbd-csi-provisioner-role-cfg" deleted
rolebinding.rbac.authorization.k8s.io "rook-ceph-cluster-mgmt" deleted
rolebinding.rbac.authorization.k8s.io "rook-ceph-cmd-reporter" deleted
rolebinding.rbac.authorization.k8s.io "rook-ceph-mgr" deleted
rolebinding.rbac.authorization.k8s.io "rook-ceph-mgr-system" deleted
rolebinding.rbac.authorization.k8s.io "rook-ceph-osd" deleted
rolebinding.rbac.authorization.k8s.io "rook-ceph-purge-osd" deleted
rolebinding.rbac.authorization.k8s.io "rook-ceph-rgw" deleted
rolebinding.rbac.authorization.k8s.io "rook-ceph-system" deleted
serviceaccount "rook-ceph-cmd-reporter" deleted
serviceaccount "rook-ceph-mgr" deleted
serviceaccount "rook-ceph-osd" deleted
serviceaccount "rook-ceph-purge-osd" deleted
serviceaccount "rook-ceph-rgw" deleted
serviceaccount "rook-ceph-system" deleted
serviceaccount "rook-csi-cephfs-plugin-sa" deleted
serviceaccount "rook-csi-cephfs-provisioner-sa" deleted
serviceaccount "rook-csi-rbd-plugin-sa" deleted
serviceaccount "rook-csi-rbd-provisioner-sa" deleted

[root@node1 examples]# kubectl delete -f psp.yaml
Error from server (NotFound): error when deleting "psp.yaml": clusterroles.rbac.authorization.k8s.io "psp:rook" not found
Error from server (NotFound): error when deleting "psp.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-ceph-system-psp" not found
Error from server (NotFound): error when deleting "psp.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-csi-cephfs-plugin-sa-psp" not found
Error from server (NotFound): error when deleting "psp.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-csi-cephfs-provisioner-sa-psp" not found
Error from server (NotFound): error when deleting "psp.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-csi-rbd-plugin-sa-psp" not found
Error from server (NotFound): error when deleting "psp.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-csi-rbd-provisioner-sa-psp" not found
Error from server (NotFound): error when deleting "psp.yaml": rolebindings.rbac.authorization.k8s.io "rook-ceph-cmd-reporter-psp" not found
Error from server (NotFound): error when deleting "psp.yaml": rolebindings.rbac.authorization.k8s.io "rook-ceph-default-psp" not found
Error from server (NotFound): error when deleting "psp.yaml": rolebindings.rbac.authorization.k8s.io "rook-ceph-mgr-psp" not found
Error from server (NotFound): error when deleting "psp.yaml": rolebindings.rbac.authorization.k8s.io "rook-ceph-osd-psp" not found
Error from server (NotFound): error when deleting "psp.yaml": rolebindings.rbac.authorization.k8s.io "rook-ceph-purge-osd-psp" not found
Error from server (NotFound): error when deleting "psp.yaml": rolebindings.rbac.authorization.k8s.io "rook-ceph-rgw-psp" not found
resource mapping not found for name: "00-rook-privileged" namespace: "" from "psp.yaml": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"
ensure CRDs are installed first
[root@node1 examples]# kubectl delete -f crds.yaml
customresourcedefinition.apiextensions.k8s.io "cephblockpoolradosnamespaces.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephblockpools.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephbucketnotifications.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephbuckettopics.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephclients.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephclusters.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephfilesystemmirrors.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephfilesystems.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephfilesystemsubvolumegroups.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephnfses.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectrealms.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectstores.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectstoreusers.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectzonegroups.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectzones.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephrbdmirrors.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "objectbucketclaims.objectbucket.io" deleted
customresourcedefinition.apiextensions.k8s.io "objectbuckets.objectbucket.io" deleted
[root@node1 examples]# 
[root@node1 examples]# kubectl get ns

While adding OSDs there were recurring failures; it is best to remove all taints from the nodes (or add matching tolerations as above) to avoid pod scheduling problems, as sketched below.
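
A sketch of removing the control-plane taint (the trailing dash removes it; the same command without the dash adds it back later):

kubectl taint nodes node1 node-role.kubernetes.io/control-plane:NoSchedule-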

Setting other limits

docs.mirantis.com/container-c…
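
Resource limits for the Ceph daemons, for example, are set in the resources section of cluster.yaml; a minimal sketch with placeholder values (not tuned recommendations):

  resources:
    mgr:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        memory: "1Gi"
    osd:
      requests:
        cpu: "500m"
        memory: "2Gi"
      limits:
        memory: "4Gi"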

Mapping between Ceph disks and OSDs: ceph device ls | sort -k2