Background
I have recently been looking into building a production Kubernetes cluster. A core question is how to set up a CSI layer that supplies efficient, highly available storage to the whole cluster. Based on what I have learned so far, I decided to try the Rook + Ceph approach, with the following goals:
- provide persistent volume provisioning for stateful middleware (MySQL, Redis, MQ, etc.);
- store attachments and other basic files;
To verify that this approach is feasible, I need a working understanding of Rook and Ceph, focusing on the following:
- Functionality: how to stand up a new Ceph cluster and make it available to other Pods in the cluster;
- Performance: taking MySQL as an example, whether the network latency introduced by Ceph has an acceptable impact on SQL performance;
- High availability: capacity overcommitment and multi-node fault tolerance brought by distribution, and testing how node failures affect business continuity;
- Backup and data safety: how to recover data manually if the cluster is destroyed, and produce an emergency response plan;
Next, these capabilities will be verified in a development environment. The environment and software versions follow:
《ubuntu 22.04 开发环境 Kubernetes 集群搭建》
The development Ubuntu 22.04 runs inside a Win10 + VMware Workstation 16 virtual machine.
The main procedure follows the official Rook documentation.
1. Setting up Rook
1.1. Prerequisites
According to the [Prerequisites] section of the official docs (rook.github.io/docs/rook/l…), a few spare disks must be available, so I first added three 10 GB disks in VMware Workstation.
Create three of them, named ceph-0 / ceph-1 / ceph-2, and confirm; three extra disks now show up in the VM.
After rebooting Ubuntu, fdisk -l shows that sdb / sdc / sdd have been recognized as three raw disks. Following the prerequisites doc, lsblk -f also lists these three raw disks, which meets the requirements.
Then install lvm2:
sudo apt-get install -y lvm2
Check whether the Linux kernel supports the RBD module by running sudo modprobe rbd; as long as it does not report "not found", you are good.
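One caveat: if a disk was used before and lsblk -f shows a leftover filesystem or partition signature on it, Rook will not consume it. In that case the disk can be wiped first; a small sketch (device names are the ones from this VM, and this destroys everything on them):
# WARNING: wipes all filesystem/partition signatures on the target disks
sudo wipefs --all /dev/sdb /dev/sdc /dev/sdd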
1.2. Installation
1.2.1. Pre-downloading the images
Because of network restrictions, the required images need to be downloaded ahead of time. The list:
docker.io/rook/ceph:v1.13.1
quay.io/ceph/ceph:v18.2.1
quay.io/cephcsi/cephcsi:v3.10.1
registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.9.1
registry.k8s.io/sig-storage/csi-resizer:v1.9.2
registry.k8s.io/sig-storage/csi-attacher:v4.4.2
registry.k8s.io/sig-storage/csi-snapshotter:v6.3.2
registry.k8s.io/sig-storage/csi-provisioner:v3.6.2
To download them, pull through a proxy that can reach the outside network, for example:
# if the current user has no permission to talk to containerd, run as root
https_proxy=socks5://192.168.154.1:1080 ctr -n=k8s.io i pull registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.9.1
Once pulled, keep a copy for later production use: export the images to files and store them somewhere safe:
ctr -n=k8s.io i export ceph:v1.13.1.img docker.io/rook/ceph:v1.13.1
ctr -n=k8s.io i export ceph:v18.2.1.img quay.io/ceph/ceph:v18.2.1
ctr -n=k8s.io i export cephcsi:v3.10.1.img quay.io/cephcsi/cephcsi:v3.10.1
ctr -n=k8s.io i export csi-node-driver-registrar:v2.9.1.img registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.9.1
ctr -n=k8s.io i export csi-resizer:v1.9.2.img registry.k8s.io/sig-storage/csi-resizer:v1.9.2
ctr -n=k8s.io i export csi-attacher:v4.4.2.img registry.k8s.io/sig-storage/csi-attacher:v4.4.2
ctr -n=k8s.io i export csi-snapshotter:v6.3.2.img registry.k8s.io/sig-storage/csi-snapshotter:v6.3.2
ctr -n=k8s.io i export csi-provisioner:v3.6.2.img registry.k8s.io/sig-storage/csi-provisioner:v3.6.2
Later on, these images can be imported into the production cluster offline:
ctr -n=k8s.io i import ceph:v1.13.1.img
ctr -n=k8s.io i import ceph:v18.2.1.img
ctr -n=k8s.io i import cephcsi:v3.10.1.img
ctr -n=k8s.io i import csi-node-driver-registrar:v2.9.1.img
ctr -n=k8s.io i import csi-resizer:v1.9.2.img
ctr -n=k8s.io i import csi-attacher:v4.4.2.img
ctr -n=k8s.io i import csi-snapshotter:v6.3.2.img
ctr -n=k8s.io i import csi-provisioner:v3.6.2.img
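To avoid typing each image by hand, the pull-and-export steps can also be driven by a small loop. A sketch under the same assumptions (the socks5 proxy address and the image list are the ones above):
#!/usr/bin/env bash
# pull each required image through the proxy, then export it to an .img file
set -euo pipefail
PROXY=socks5://192.168.154.1:1080
IMAGES=(
  docker.io/rook/ceph:v1.13.1
  quay.io/ceph/ceph:v18.2.1
  quay.io/cephcsi/cephcsi:v3.10.1
  registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.9.1
  registry.k8s.io/sig-storage/csi-resizer:v1.9.2
  registry.k8s.io/sig-storage/csi-attacher:v4.4.2
  registry.k8s.io/sig-storage/csi-snapshotter:v6.3.2
  registry.k8s.io/sig-storage/csi-provisioner:v3.6.2
)
for img in "${IMAGES[@]}"; do
  file="$(basename "${img}").img"   # e.g. ceph:v1.13.1.img, same naming as the commands above
  https_proxy="${PROXY}" ctr -n=k8s.io i pull "${img}"
  ctr -n=k8s.io i export "${file}" "${img}"
done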
1.2.2. Running the installation
Note that the release and the git tag correspond to each other; don't change them arbitrarily, otherwise the image versions may also need to change accordingly. (If you want the latest release, the same procedure applies, just follow the latest instructions on the official site.)
$ git clone --single-branch --branch v1.13.1 https://github.com/rook/rook.git
cd rook/deploy/examples
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
kubectl create -f cluster.yaml
When applying cluster.yaml above, a bunch of images under registry.k8s.io cannot be pulled from a mainland-China network, so pre-import them into containerd as described earlier.
# verify the rook-ceph-operator is in the `Running` state before proceeding
kubectl -n rook-ceph get pod
Checking the list of pods created afterwards, the output looks like this:
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-7jq8m 2/2 Running 0 33m
csi-cephfsplugin-provisioner-cc86b75b8-bmccf 5/5 Running 2 (33m ago) 33m
csi-rbdplugin-ngf97 2/2 Running 0 33m
csi-rbdplugin-provisioner-5bc465bbcc-mfl7s 5/5 Running 0 33m
rook-ceph-operator-5d9455bdcd-vz8j4 1/1 Running 0 3h1m
Compared with the official docs, three kinds of pods are missing: rook-ceph-mon, rook-ceph-mgr, and rook-ceph-osd. The docs suggest consulting the common issues page; my guess is that the development machine currently has only a single master node, which cannot host them all.
The docs also list three conditions for a healthy cluster, so the next task is to satisfy them:
- enough mon (monitor) daemons;
- an active mgr (manager) daemon;
- at least three OSDs that are up and in;
See this section: rook.github.io/docs/rook/l…
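These conditions are also reflected in the CephCluster resource's status, so a first check can be done from kubectl without entering any pod. A small sketch (the status field path is my understanding of the CephCluster CRD; double-check against your Rook version):
kubectl -n rook-ceph get cephcluster
kubectl -n rook-ceph get cephcluster rook-ceph -o jsonpath='{.status.ceph.health}{"\n"}'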
1.2.3. Creating worker nodes
A single master node obviously cannot satisfy these conditions, so I deployed three additional worker nodes, detached the three raw disks created earlier from the master, and attached one to each worker.
The deployment steps follow: juejin.cn/post/732047…
After bringing them up and sorting out the usual basic issues, we effectively have one master node and three worker nodes.
Confirm that most of the Pods have come up; roughly, this is the situation:
$ k get po -n=rook-ceph -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
csi-cephfsplugin-7jq8m 2/2 Running 2 (9h ago) 2d16h 192.168.154.128 alfred-pc <none> <none>
csi-cephfsplugin-db4n5 2/2 Running 4 (8h ago) 9h 192.168.154.131 k8s-wk1 <none> <none>
csi-cephfsplugin-gxbqp 2/2 Running 4 (8h ago) 37h 192.168.154.130 k8s-wk0 <none> <none>
csi-cephfsplugin-provisioner-cc86b75b8-b4s2h 5/5 Running 2 (7h56m ago) 8h 192.168.74.149 k8s-wk1 <none> <none>
csi-cephfsplugin-provisioner-cc86b75b8-bmccf 5/5 Running 7 (9h ago) 2d16h 192.168.179.244 alfred-pc <none> <none>
csi-cephfsplugin-qwnt6 2/2 Running 6 (8h ago) 9h 192.168.154.132 k8s-wk2 <none> <none>
csi-rbdplugin-5rqtw 2/2 Running 4 (8h ago) 37h 192.168.154.130 k8s-wk0 <none> <none>
csi-rbdplugin-lvglc 2/2 Running 6 (8h ago) 9h 192.168.154.132 k8s-wk2 <none> <none>
csi-rbdplugin-ngf97 2/2 Running 2 (9h ago) 2d16h 192.168.154.128 alfred-pc <none> <none>
csi-rbdplugin-provisioner-5bc465bbcc-hw7sh 5/5 Running 1 (7h56m ago) 8h 192.168.74.147 k8s-wk1 <none> <none>
csi-rbdplugin-provisioner-5bc465bbcc-mfl7s 5/5 Running 5 (9h ago) 2d16h 192.168.179.247 alfred-pc <none> <none>
csi-rbdplugin-rj5ng 2/2 Running 4 (8h ago) 9h 192.168.154.131 k8s-wk1 <none> <none>
rook-ceph-crashcollector-alfred-pc-5477c9b8f7-vfn8d 1/1 Running 0 9h 192.168.179.255 alfred-pc <none> <none>
rook-ceph-crashcollector-k8s-wk0-78c58c869b-2rdmx 1/1 Running 0 8h 192.168.129.213 k8s-wk0 <none> <none>
rook-ceph-crashcollector-k8s-wk1-7b4859f6cc-64nr7 1/1 Running 0 8h 192.168.74.145 k8s-wk1 <none> <none>
rook-ceph-crashcollector-k8s-wk2-79c56475c8-vjftb 1/1 Running 0 8h 192.168.36.11 k8s-wk2 <none> <none>
rook-ceph-exporter-alfred-pc-cb476c5d5-nl9sv 1/1 Running 0 9h 192.168.179.199 alfred-pc <none> <none>
rook-ceph-exporter-k8s-wk0-c6787bc7b-n4w5v 1/1 Running 0 8h 192.168.129.212 k8s-wk0 <none> <none>
rook-ceph-exporter-k8s-wk1-6c4868c665-w9zmw 1/1 Running 1 (7h35m ago) 8h 192.168.74.146 k8s-wk1 <none> <none>
rook-ceph-exporter-k8s-wk2-dbb57f6f8-ch7w5 1/1 Running 0 8h 192.168.36.10 k8s-wk2 <none> <none>
rook-ceph-mgr-a-78d945bf94-v9z9l 3/3 Running 0 9h 192.168.179.254 alfred-pc <none> <none>
rook-ceph-mgr-b-6d979dbd4f-979rd 2/3 CrashLoopBackOff 8 (7h35m ago) 5h33m 192.168.36.19 k8s-wk2 <none> <none>
rook-ceph-mon-a-6c8d6bb8b6-qbhbg 2/2 Running 0 9h 192.168.179.251 alfred-pc <none> <none>
rook-ceph-mon-c-75645448db-vnlkb 2/2 Running 0 8h 192.168.74.144 k8s-wk1 <none> <none>
rook-ceph-mon-d-5578547686-fn9jd 2/2 Running 1 (8h ago) 8h 192.168.36.13 k8s-wk2 <none> <none>
rook-ceph-operator-5d9455bdcd-jtg2x 1/1 Running 1 (9h ago) 37h 192.168.179.246 alfred-pc <none> <none>
rook-ceph-osd-0-7456b4f497-r67sl 1/2 CrashLoopBackOff 21 (7h35m ago) 8h 192.168.36.12 k8s-wk2 <none> <none>
rook-ceph-osd-1-785d897f79-k87tw 1/2 CrashLoopBackOff 16 (7h31m ago) 8h 192.168.74.148 k8s-wk1 <none> <none>
rook-ceph-osd-2-6f6dd79b78-sx2g9 1/2 CrashLoopBackOff 16 (7h35m ago) 8h 192.168.129.211 k8s-wk0 <none> <none>
rook-ceph-osd-prepare-alfred-pc-ncsp6 0/1 Completed 0 8h 192.168.179.201 alfred-pc <none> <none>
rook-ceph-osd-prepare-k8s-wk0-hhg94 0/1 Completed 0 8h 192.168.129.215 k8s-wk0 <none> <none>
rook-ceph-osd-prepare-k8s-wk1-h9jkl 0/1 Completed 0 8h 192.168.74.151 k8s-wk1 <none> <none>
rook-ceph-osd-prepare-k8s-wk2-krxmh 0/1 Completed 0 8h 192.168.36.14 k8s-wk2 <none> <none>
Checking against the three health conditions:
First: enough mon daemons
mon-a, mon-c, and mon-d are healthy, which is enough.
As for why there is no mon-b, let's see which node each of mon-a, mon-c, and mon-d landed on: a, c, and d sit on the master, wk1, and wk2 respectively, and no mon was scheduled onto wk0. For now I'll assume three are considered enough, so no further one was started.
Second: an active mgr
There are two mgr pods: mgr-a (on the master) and b (on wk2). b is in CrashLoopBackOff, but strictly speaking the condition is still met.
Still, let's look at why mgr-b keeps crashing:
k describe po -n=rook-ceph rook-ceph-mgr-b-6d979dbd4f-979rd
From the Events, the real cause is:
Warning Unhealthy 7h44m (x2 over 7h56m) kubelet Startup probe failed: command "env -i sh -c \noutp=\"$(ceph --admin-daemon /run/ceph/ceph-mgr.b.asok status 2>&1)\"\nrc=$?\nif [ $rc -ne 0 ]; then\n\techo \"ceph daemon health check failed with the following output:\"\n\techo \"$outp\" | sed -e 's/^/> /g'\n\texit $rc\nfi\n" timed out
Looking more closely at the node's state, there was an OOM kill: not enough memory. After increasing each of the three workers from 1 GB to 2 GB of RAM, both mgr pods became healthy.
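If you want to confirm an OOM kill like this yourself, it is usually visible in the node's kernel log and in the node description; a quick sketch (the node name k8s-wk2 is just the one from this cluster):
# on the affected worker: look for recent OOM kills in the kernel log
dmesg -T | grep -iE 'out of memory|oom-kill' | tail
# from the master: check allocatable memory and memory-related conditions on the node
kubectl describe node k8s-wk2 | grep -iA5 memory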
Third: at least three healthy OSDs
There are three osd pods, osd-0, osd-1, and osd-2, scheduled on wk2, wk1, and wk0 respectively.
But all of them are in CrashLoopBackOff, so let's see what's wrong:
k describe po -n=rook-ceph rook-ceph-osd-1-785d897f79-k87tw
Warning BackOff 7h29m (x370 over 8h) kubelet Back-off restarting failed container osd in pod rook-ceph-osd-1-785d897f79-k87tw_rook-ceph(a7de3be1-d118-4919-a547-8ba6c392a494)
That doesn't show anything useful, so go into wk0 and check the kubelet with systemctl status kubelet -l; there the problem shows up:
1月 07 01:33:43 k8s-wk0 kubelet[17787]: I0107 01:33:43.198786 17787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"host-dev\" (UniqueName: \"kubernetes.io/host-path/2d25b699-6c99-4f53-863f-9c61161c831c-host-dev\") pod \"csi-cephfsplugin-gxbqp\" (UID: \"2d25b699-6c99-4f53-863f-9c61161c831c\") " pod="rook-ceph/csi-cephfsplugin-gxbqp"
It looks like something went wrong when verifying volume attachment. Since memory was just added, try restarting the kubelet with systemctl restart kubelet; checking the kubelet status again afterwards, it is surprisingly fine:
1月 07 01:33:43 k8s-wk0 kubelet[17787]: I0107 01:33:43.410776 17787 scope.go:117] "RemoveContainer" containerID="ece95d18c5be16767ff35af8f7a120844b126c6b9ec04f28c2ee09dbea54296f"
1月 07 01:33:43 k8s-wk0 kubelet[17787]: I0107 01:33:43.412747 17787 kubelet_resources.go:45] "Allocatable" allocatable={"cpu":"1","ephemeral-storage":"17394Mi","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"1941632Ki","pods":"110"}
1月 07 01:33:43 k8s-wk0 kubelet[17787]: I0107 01:33:43.412857 17787 kubelet_resources.go:45] "Allocatable" allocatable={"cpu":"1","ephemeral-storage":"17394Mi","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"1941632Ki","pods":"110"}
1月 07 01:33:43 k8s-wk0 kubelet[17787]: I0107 01:33:43.412882 17787 kubelet_resources.go:45] "Allocatable" allocatable={"cpu":"1","ephemeral-storage":"17394Mi","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"1941632Ki","pods":"110"}
1月 07 01:33:43 k8s-wk0 kubelet[17787]: I0107 01:33:43.412901 17787 kubelet_resources.go:45] "Allocatable" allocatable={"cpu":"1","ephemeral-storage":"17394Mi","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"1941632Ki","pods":"110"}
Back on the master, running k get po -owide again shows that this osd pod is now up:
rook-ceph rook-ceph-osd-2-6f6dd79b78-sx2g9 2/2 Running 22 (7h36m ago) 9h 192.168.129.211 k8s-wk0 <none> <none>
Doing the same on wk1 and wk2 brings all of them online.
1.2.4. Debugging and management tools
Once the Ceph cluster has been created, we need the ceph CLI to try it out and to debug the Ceph services.
This requires rook-ceph-tools, installed as follows.
cd rook/deploy/examples
kubectl create -f toolbox.yaml
deployment.apps/rook-ceph-tools created
$ k get po -A | grep ceph-tools
rook-ceph rook-ceph-tools-66b77b8df5-m2fj8 1/1 Running 0 77s
A new rook-ceph-tools pod has come up; once inside it, the ceph command can be used to operate on the Ceph cluster.
Enter the pod:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
Inside it, the following CLI commands can be run (a one-liner alternative that needs no interactive shell follows the list below):
ceph status
ceph health detail
ceph osd status
ceph osd df
ceph osd utilization
ceph osd pool stats
ceph osd tree
ceph pg stat
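Entering the pod is not strictly required; each of these can also be run as a one-off exec from the master, for example:
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd df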
Here is the sample output of each of these commands:
1. ceph status
bash-4.4$ ceph status
  cluster:
    id:     77a08ac4-65a9-406e-b983-3ce1ab8b4ffb
    health: HEALTH_WARN
            clock skew detected on mon.c, mon.d
            Degraded data redundancy: 2/6 objects degraded (33.333%), 1 pg degraded, 1 pg undersized
            2 daemons have recently crashed

  services:
    mon: 3 daemons, quorum a,c,d (age 12m)
    mgr: a(active, since 13h), standbys: b
    osd: 3 osds: 2 up (since 11m), 2 in (since 19s)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   55 MiB used, 20 GiB / 20 GiB avail
    pgs:     2/6 objects degraded (33.333%)
             1 active+undersized+degraded
There are warnings; let's first understand them literally before figuring out how to fix them:
- clock skew detected;
- degraded data redundancy;
- two daemons have recently crashed.
So even though the pods all look up, the cluster is still unhealthy. The three health conditions above are only necessary conditions; the real verdict comes from the status reported by the ceph commands.
The clock skew problem
This is mainly NTP not being synchronized. Check the time sync status on each node with timedatectl status:
$ timedatectl status
Local time: 日 2024-01-07 14:15:34 CST
Universal time: 日 2024-01-07 06:15:34 UTC
RTC time: 日 2024-01-07 06:15:34
Time zone: Asia/Shanghai (CST, +0800)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no
The output above, from the master node, looks correct.
[root@k8s-wk1 ~]# timedatectl status
Local time: 日 2024-01-07 11:17:28 CST
Universal time: 日 2024-01-07 03:17:28 UTC
RTC time: 日 2024-01-07 05:35:52
Time zone: Asia/Shanghai (CST, +0800)
NTP enabled: yes
NTP synchronized: no
RTC in local TZ: no
DST active: n/a
Checking the worker nodes, wk0 and wk2 are fine, but wk1 shows the situation above: the NTP service is running but not synchronized, and the time is off by quite a lot.
So install ntp on it and get it working properly:
yum install -y ntp ntpdate
systemctl enable ntpd
systemctl stop ntpd
ntpd -gq
hwclock -w
systemctl start ntpd
After that, the warning is cleared.
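The yum commands above match the CentOS-style workers; on an Ubuntu 22.04 node (like the master here), the equivalent would go through systemd-timesyncd or chrony instead. A sketch, not verified on this exact setup:
# Ubuntu: turn on NTP sync via systemd-timesyncd
sudo timedatectl set-ntp true
timedatectl status
# or use chrony to keep the clock in sync
sudo apt-get install -y chrony
sudo systemctl enable --now chrony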
References:
- blog.csdn.net/qq_19734597…
- developer.aliyun.com/article/114… (for ubuntu)
- cloud.tencent.com/developer/a… (for centos7)
The "two daemons have recently crashed" problem
This means one or more Ceph daemons crashed recently and the administrator has not yet archived (acknowledged) the crash. It may indicate a software bug, a hardware problem (for example a failing disk), or something else.
The fix is to look at these crashes and archive them manually; the status then returns to normal.
bash-4.4$ ceph crash ls-new
ID ENTITY NEW
2024-01-06T15:43:06.114309Z_e5f1efcd-6430-4077-af9c-85b47c364f11 client.ceph-exporter *
2024-01-06T17:07:14.789138Z_0e3836e1-aca4-4d41-835e-57c2ef69a028 client.ceph-exporter *
bash-4.4$ ceph crash archive-all
After this, ceph status comes back to health: HEALTH_OK.
2. ceph osd status
bash-4.4$ ceph osd status
ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 k8s-wk2 27.6M 9.97G 0 0 0 0 exists,up
1 k8s-wk1 28.1M 9.97G 0 0 0 0 exists,up
2 k8s-wk0 28.0M 9.97G 0 0 0 0 exists,up
It looks like all three 10 GB disks are up and running.
3. ceph osd df
bash-4.4$ ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
2 hdd 0.00980 1.00000 10 GiB 28 MiB 1.1 MiB 1 KiB 27 MiB 10 GiB 0.27 1.00 1 up
1 hdd 0.00980 1.00000 10 GiB 28 MiB 1.1 MiB 1 KiB 27 MiB 10 GiB 0.27 1.01 1 up
0 hdd 0.00980 1.00000 10 GiB 28 MiB 1.1 MiB 1 KiB 27 MiB 10 GiB 0.27 0.99 1 up
TOTAL 30 GiB 84 MiB 3.2 MiB 3.5 KiB 81 MiB 30 GiB 0.27
MIN/MAX VAR: 0.99/1.01 STDDEV: 0.00
This looks fine too: all three disks are recognized.
4. ceph osd utilization
bash-4.4$ ceph osd utilization
avg 1
stddev 0 (expected baseline 0.816497)
min osd.0 with 1 pgs (1 * mean)
max osd.0 with 1 pgs (1 * mean)
This shows the utilization across OSDs.
5. ceph osd pool stats
bash-4.4$ ceph osd pool stats
pool .mgr id 1
nothing is going on
6. ceph osd tree
bash-4.4$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.02939 root default
-3 0.00980 host k8s-wk0
2 hdd 0.00980 osd.2 up 1.00000 1.00000
-5 0.00980 host k8s-wk1
1 hdd 0.00980 osd.1 up 1.00000 1.00000
-7 0.00980 host k8s-wk2
0 hdd 0.00980 osd.0 up 1.00000 1.00000
7. ceph pg stat
bash-4.4$ ceph pg stat
1 pgs: 1 active+clean; 449 KiB data, 88 MiB used, 30 GiB / 30 GiB avail
Taken together, these give a good picture of the overall state of the Ceph cluster.
1.2.5. Ceph Dashboard
TODO:
1.2.6. StorageClass
The official docs provide an example StorageClass in storageclass.yaml:
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
# Change "rook-ceph" provisioner prefix to match the operator namespace if needed
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  # clusterID is the namespace where the rook cluster is running
  clusterID: rook-ceph
  # Ceph pool into which the RBD image shall be created
  pool: replicapool

  # (optional) mapOptions is a comma-separated list of map options.
  # For krbd options refer
  # https://docs.ceph.com/docs/master/man/8/rbd/#kernel-rbd-krbd-options
  # For nbd options refer
  # https://docs.ceph.com/docs/master/man/8/rbd-nbd/#options
  # mapOptions: lock_on_read,queue_depth=1024

  # (optional) unmapOptions is a comma-separated list of unmap options.
  # For krbd options refer
  # https://docs.ceph.com/docs/master/man/8/rbd/#kernel-rbd-krbd-options
  # For nbd options refer
  # https://docs.ceph.com/docs/master/man/8/rbd-nbd/#options
  # unmapOptions: force

  # RBD image format. Defaults to "2".
  imageFormat: "2"

  # RBD image features
  # Available for imageFormat: "2". Older releases of CSI RBD
  # support only the `layering` feature. The Linux kernel (KRBD) supports the
  # full complement of features as of 5.4
  # `layering` alone corresponds to Ceph's bitfield value of "2" ;
  # `layering` + `fast-diff` + `object-map` + `deep-flatten` + `exclusive-lock` together
  # correspond to Ceph's OR'd bitfield value of "63". Here we use
  # a symbolic, comma-separated format:
  # For 5.4 or later kernels:
  #imageFeatures: layering,fast-diff,object-map,deep-flatten,exclusive-lock
  # For 5.3 or earlier kernels:
  imageFeatures: layering

  # The secrets contain Ceph admin credentials.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

  # Specify the filesystem type of the volume. If not specified, csi-provisioner
  # will set default as `ext4`. Note that `xfs` is not recommended due to potential deadlock
  # in hyperconverged settings where the volume is mounted on the same node as the osds.
  csi.storage.k8s.io/fstype: ext4

# Delete the rbd volume when a PVC is deleted
reclaimPolicy: Delete

# Optional, if you want to add dynamic resize for PVC.
# For now only ext3, ext4, xfs resize support provided, like in Kubernetes itself.
allowVolumeExpansion: true
Edit it by hand, save it, and run:
kubectl create -f storageclass.yaml
Once this is done, the StorageClass is up:
$ k get sc -owide
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
rook-ceph-block rook-ceph.rbd.csi.ceph.com Delete Immediate true 73s
The replicapool object created at the same time is also visible:
$ k get cephblockpool.ceph.rook.io/replicapool -nrook-ceph
NAME PHASE
replicapool Ready
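Optionally, if you want PVCs that don't specify a storageClassName to land on this pool, the class can be marked as the cluster default; a one-line sketch:
kubectl patch storageclass rook-ceph-block -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'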
1.3. Using it for real PVCs
In principle, if the StorageClass is working, every PVC that is actually declared should automatically be assigned a PV and become usable. Let's build an example to test that:
TODO:
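Pending that, here is a minimal sketch of what the test could look like; the PVC name test-rbd-pvc and the 1Gi size are made up for illustration, and it only assumes the rook-ceph-block StorageClass created above:
# create a throwaway PVC against the rook-ceph-block StorageClass
cat <<'EOF' | kubectl create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-rbd-pvc
spec:
  storageClassName: rook-ceph-block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
# if the provisioner is healthy, the claim should reach Bound shortly
kubectl get pvc test-rbd-pvc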
References
- Introduction to distributed storage with Ceph and its architecture: www.infoq.cn/article/brj…
- Ceph operations handbook: lihaijing.gitbooks.io/ceph-handbo…
- Ceph troubleshooting guide: access.redhat.com/documentati…