一、Purchase the cloud disk
1.1、Purchase the cloud disk and confirm the attachment
- k8s cluster with identical configuration on every node; each CBS disk is 20 GB and the instance types are identical as well
- Attach the disk to the cloud VM instance
- Tick "Release with instance" (choose freely according to your situation)
# Confirm the attached disks
[root@VM-16-14-centos ~]# fdisk -l
Disk /dev/vda: 50 GiB, 53687091200 bytes, 104857600 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x89ee0607
Device     Boot Start       End   Sectors Size Id Type
/dev/vda1  *     2048 104857566 104855519  50G 83 Linux
Disk /dev/vdb: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
[root@VM-16-14-centos ~]# df -h | grep -v overlay | grep -v '/var/lib/docker/containers/' | grep -v '/var/lib/kubelet/'
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs 1.9G 0 1.9G 0% /dev
tmpfs 1.9G 24K 1.9G 1% /dev/shm
tmpfs 1.9G 3.2M 1.9G 1% /run
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/vda1 50G 15G 34G 30% /
tmpfs 374M 0 374M 0% /run/user/0
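- Before handing the disk to Ceph, it is worth confirming that /dev/vdb is still a raw device with no partition table, filesystem, or mount point. A minimal check (not part of the original transcript):
# /dev/vdb should show empty FSTYPE and MOUNTPOINT columns
lsblk -f /dev/vdb
# List any leftover filesystem/RAID/LVM signatures; a clean disk prints nothing
wipefs /dev/vdb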
1.2、Create the ceph account and set up passwordless SSH across the k8s cluster
- master/node1/node2, CentOS 8.4
1.3、Create the account [master/node1/node2] [optional]
- Required on every node of the k8s cluster
[root@VM-16-14-centos ~]# groupadd -g 3000 ceph
[root@VM-16-14-centos ~]# useradd -u 3000 -g ceph ceph
[root@VM-16-14-centos ~]# echo "ceph" | passwd --stdin ceph
Changing password for user ceph.
passwd: all authentication tokens updated successfully.
[root@VM-16-14-centos ~]# echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
ceph ALL = (root) NOPASSWD:ALL
[root@VM-16-14-centos ~]# chmod 0440 /etc/sudoers.d/ceph
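- A quick sanity check of the new account and its sudoers rule (expected behaviour, not shown in the transcript): the command below should print root without prompting for a password.
# Run sudo as the ceph user; NOPASSWD in /etc/sudoers.d/ceph means no password prompt
su - ceph -c 'sudo whoami'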
1.4、Passwordless SSH across the k8s cluster
- master must be able to log in to node1/node2 without a password
# Run on master
[ceph@VM-16-14-centos ~]$ ssh-keygen
[ceph@VM-16-14-centos ~]$ ssh-copy-id ceph@node1
[ceph@VM-16-14-centos ~]$ ssh-copy-id ceph@node2
[ceph@VM-16-14-centos ~]$ ssh-copy-id ceph@master
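- To confirm the keys were distributed correctly, a non-interactive login from master should now succeed (assuming node1/node2/master resolve via /etc/hosts or DNS):
# Each command should print the remote hostname without asking for a password
ssh ceph@node1 hostname
ssh ceph@node2 hostname
ssh ceph@master hostname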
1.5、Install Ceph [master]
- This article installs the master node components as root; sections 1.3 and 1.4 are optional as needed
Option 1 (ceph-ansible):
ceph-ansible deploys and manages Ceph clusters with Ansible.
(1) ceph-ansible is widely deployed.
(2) ceph-ansible is not integrated with the new Orchestrator APIs introduced in Nautilus and Octopus, which means the newer management features and dashboard integration are not available.
Option 2 (ceph-deploy):
ceph-deploy is no longer actively maintained. It is not tested on Ceph releases newer than Nautilus, and it does not support RHEL 8, CentOS 8, or newer operating systems.
Option 3 (cephadm) [recommended]
Version notes:
Ceph version: Octopus / v15.2
Installation method: cephadm
OS version: CentOS 8.4
Python: python3
# Switch to root
[ceph@VM-16-14-centos ~]$ sudo su - root
# Add the yum repository
[root@VM-16-14-centos ~]# cat /etc/redhat-release
CentOS Linux release 8.4.2105
[root@VM-16-14-centos ~]# vim /etc/yum.repos.d/ceph.repo
[Ceph]
name=Ceph packages
baseurl=https://mirrors.aliyun.com/ceph/rpm-octopus/el8/x86_64/
gpgcheck=0
[Ceph-noarch]
name=Ceph noarch packages
baseurl=https://mirrors.aliyun.com/ceph/rpm-octopus/el8/noarch/
gpgcheck=0
[root@VM-16-14-centos ~]# yum install -y epel-release
[root@VM-16-14-centos ~]# yum makecache
[root@VM-16-14-centos ~]# yum update
[root@VM-16-14-centos ~]# yum search ceph
[root@VM-16-14-centos ~]# yum install ceph ceph-osd ceph-mds ceph-mon ceph-radosgw
[root@VM-16-14-centos ~]# dnf -y install yum-utils createrepo
[root@VM-16-14-centos ~]# dnf makecache
[root@VM-16-14-centos ~]# dnf repolist
# Important: the script is downloaded from GitHub, which may require a proxy when accessed from mainland China
➜ ~ curl --silent --remote-name --location https://github.com/ceph/ceph/raw/octopus/src/cephadm/cephadm
➜ ~ scp cephadm root@master:/data/
[root@VM-16-14-centos ~]# ll /data/cephadm
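- curl does not set the execute bit, so if ll shows the script is not executable, make it so before running it (a small assumption; skip if the bit is already set):
# Make the downloaded bootstrap script executable
chmod +x /data/cephadm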
[ceph@VM-16-14-centos data]$ ./cephadm add-repo --release octopus
ERROR: cephadm should be run as root
# Run cephadm from the root account
[root@VM-16-14-centos ~]# cd /data/
[root@VM-16-14-centos data]# ./cephadm add-repo --release octopus
Writing repo to /etc/yum.repos.d/ceph.repo...
Enabling EPEL...
[root@VM-16-14-centos data]# ./cephadm install
Installing packages ['cephadm']...
[root@VM-16-14-centos data]# ll /etc/ceph
total 4
-rw-r--r-- 1 root root 92 Jun 30 06:35 rbdmap
[root@VM-16-14-centos data]# mkdir -p /etc/ceph
[root@VM-16-14-centos ceph]# cephadm bootstrap --mon-ip 10.206.16.14
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman|docker (/usr/bin/docker) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: 1df24ab6-4548-11ec-85be-525400d2062a
Verifying IP 10.206.16.14 port 3300 ...
Verifying IP 10.206.16.14 port 6789 ...
Mon IP 10.206.16.14 is in CIDR network 10.206.16.0/20
Pulling container image quay.io/ceph/ceph:v15...
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network...
Creating mgr...
Verifying port 9283 ...
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Wrote config to /etc/ceph/ceph.conf
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/10)...
mgr not available, waiting (2/10)...
mgr not available, waiting (3/10)...
mgr not available, waiting (4/10)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for Mgr epoch 5...
Mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Generating ssh key...
Wrote public SSH key to to /etc/ceph/ceph.pub
Adding key to root@localhost's authorized_keys...
Adding host VM-16-14-centos...
Deploying mon service with default placement...
Deploying mgr service with default placement...
Deploying crash service with default placement...
Enabling mgr prometheus module...
Deploying prometheus service with default placement...
Deploying grafana service with default placement...
Deploying node-exporter service with default placement...
Deploying alertmanager service with default placement...
Enabling the dashboard module...
Waiting for the mgr to restart...
Waiting for Mgr epoch 13...
Mgr epoch 13 is available
Generating a dashboard self-signed certificate...
Creating initial admin user...
Fetching dashboard port number...
Ceph Dashboard is now available at:
URL: https://VM-16-14-centos:8443/
User: admin
Password: 3dyrvv478y
You can access the Ceph CLI with:
sudo /usr/sbin/cephadm shell --fsid 1df24ab6-4548-11ec-85be-525400d2062a -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring
Please consider enabling telemetry to help improve Ceph:
ceph telemetry on
For more information see:
https://docs.ceph.com/docs/master/mgr/telemetry/
Bootstrap complete.
- Dashboard: open https://xxxx:8443 in a browser. The password must be changed on the first login.
- The configuration has been written to /etc/ceph
[root@VM-16-14-centos ceph]# pwd
/etc/ceph
[root@VM-16-14-centos ceph]# ll
total 16
-rw------- 1 root root 63 Nov 14 20:42 ceph.client.admin.keyring
-rw-r--r-- 1 root root 175 Nov 14 20:42 ceph.conf
-rw-r--r-- 1 root root 595 Nov 14 20:42 ceph.pub
-rw-r--r-- 1 root root 92 Jun 30 06:35 rbdmap
[root@VM-16-14-centos ceph]# cat ceph.conf
# minimal ceph.conf for 1df24ab6-4548-11ec-85be-525400d2062a
[global]
fsid = 1df24ab6-4548-11ec-85be-525400d2062a
mon_host = [v2:10.206.16.14:3300/0,v1:10.206.16.14:6789/0]
1.6、Enable the Ceph CLI
- Run cephadm shell
- Configure the hostnames in /etc/hosts; each must stay unique throughout the later steps. The names in /etc/hosts must match the actual hostnames; aliases cannot be used
# cephadm shell
[root@VM-16-14-centos ceph]# cephadm shell
Inferring fsid 1df24ab6-4548-11ec-85be-525400d2062a
Inferring config /var/lib/ceph/1df24ab6-4548-11ec-85be-525400d2062a/mon.VM-16-14-centos/config
Using recent ceph image quay.io/ceph/ceph@sha256:a2c23b6942f7fbc1e15d8cfacd6655a681fe0e44f288e4a158db22030b8d58e3
[ceph: root@VM-16-14-centos /]# ceph -s
cluster:
id: 1df24ab6-4548-11ec-85be-525400d2062a
health: HEALTH_WARN
OSD count 0 < osd_pool_default_size 3
services:
mon: 1 daemons, quorum VM-16-14-centos (age 14m)
mgr: VM-16-14-centos.blyexk(active, since 14m)
osd: 0 osds: 0 up, 0 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:
[root@VM-16-14-centos ceph]# cephadm add-repo --release octopus
Writing repo to /etc/yum.repos.d/ceph.repo...
Enabling EPEL...
[root@VM-16-14-centos ceph]# cephadm install ceph-common
Installing packages ['ceph-common']...
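- With ceph-common on the host, the ceph CLI can be used directly instead of entering cephadm shell. A quick sanity check (not in the original transcript):
# Verify the client version and that the cluster answers using /etc/ceph/ceph.conf and the admin keyring
ceph -v
ceph status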
# The names in /etc/hosts must match the actual hostnames; aliases cannot be used
[root@VM-16-14-centos ~]# cat /etc/hosts
10.206.16.6 node1 VM-16-6-centos
10.206.16.4 node2 VM-16-4-centos
[root@VM-16-14-centos ~]# hostnamectl
Static hostname: VM-16-14-centos
Icon name: computer-vm
Chassis: vm
Machine ID: 507dcd6e2d7f40188332a41637f6f2fa
Boot ID: 26fd37766d06423bba1ad5bf001295bd
Virtualization: kvm
Operating System: CentOS Linux 8
CPE OS Name: cpe:/o:centos:centos:8
Kernel: Linux 4.18.0-305.19.1.el8_4.x86_64
Architecture: x86-64
# Use the actual hostnames here
[root@VM-16-14-centos ~]# ssh-copy-id -f -i /etc/ceph/ceph.pub VM-16-4-centos
[root@VM-16-14-centos ~]# ssh-copy-id -f -i /etc/ceph/ceph.pub VM-16-6-centos
1.7、Enable NTP and add the Ceph nodes
- The NTP (chronyd) service must be running on master/node1/node2, otherwise errors will be reported
# NTP service [master/node1/node2]
[root@VM-16-6-centos ~]# systemctl restart chronyd.service && systemctl enable chronyd.service
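- Time synchronisation can be verified on each host before it is added (a quick optional check):
# Confirm chrony is synchronised to an upstream time source
chronyc tracking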
# Add the Ceph nodes
[root@VM-16-14-centos ~]# ceph orch host add VM-16-4-centos
Added host 'VM-16-4-centos'
[root@VM-16-14-centos ~]# ceph orch host add VM-16-6-centos
Added host 'VM-16-6-centos'
# Host inventory
[root@VM-16-14-centos ~]# ceph orch host ls
HOST ADDR LABELS STATUS
VM-16-14-centos VM-16-14-centos
VM-16-4-centos VM-16-4-centos
VM-16-6-centos VM-16-6-centos
# Deploy additional monitors; Ceph decides automatically on which nodes to place them
[root@VM-16-14-centos ~]# ceph config set mon public_network 10.206.16.0/20
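- If you prefer to pin the monitors to specific hosts instead of letting the orchestrator choose, a placement spec can be applied explicitly (an optional sketch using the hostnames above):
# Explicitly run three monitors, one on each host
ceph orch apply mon --placement="3 VM-16-14-centos VM-16-4-centos VM-16-6-centos"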
1.8、Automatically create OSDs on unused devices
# Show the inventory of storage devices in the cluster
[root@VM-16-14-centos ~]# ceph orch device ls
Hostname Path Type Serial Size Health Ident Fault Available
VM-16-14-centos /dev/vdb hdd disk-rpged3ym 21.4G Unknown N/A N/A Yes
# Automatically create OSDs on all unused devices
[root@VM-16-14-centos ~]# ceph orch apply osd --all-available-devices
Scheduled osd.all-available-devices update...
# Check the device inventory again
[root@VM-16-14-centos ~]# ceph orch device ls
Hostname Path Type Serial Size Health Ident Fault Available
VM-16-14-centos /dev/vdb hdd disk-rpged3ym 21.4G Unknown N/A N/A Yes
[root@VM-16-14-centos ~]# ceph -s
cluster:
id: 1df24ab6-4548-11ec-85be-525400d2062a
health: HEALTH_WARN
OSD count 1 < osd_pool_default_size 3
services:
mon: 1 daemons, quorum VM-16-14-centos (age 20m)
mgr: VM-16-14-centos.blyexk(active, since 19m)
osd: 1 osds: 1 up (since 24s), 1 in (since 24s)
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 1.0 GiB used, 19 GiB / 20 GiB avail
pgs: 100.000% pgs not active
1 undersized+peered
[root@VM-16-14-centos ~]# ceph orch daemon add osd VM-16-14-centos:/dev/vdb
Created osd(s) 0 on host 'VM-16-14-centos'
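- Once the OSDs are up, the CRUSH tree and the cephadm daemon list should reflect them (a quick verification, not in the transcript):
# Show the OSDs and how they map onto hosts
ceph osd tree
# Confirm the osd containers are running under cephadm
ceph orch ps | grep osd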
1.9、Deploy the MDS
- CephFS needs two pools, cephfs_data and cephfs_metadata, which store the file data and the file metadata respectively
- Note: on public-cloud VMs (here the master's data disk) the MDS deployment takes a very long time
[root@VM-16-14-centos ~]# ceph osd pool create cephfs_data 64 64
pool 'cephfs_data' created
[root@VM-16-14-centos ~]# ceph osd pool create cephfs_metadata 64 64
pool 'cephfs_metadata' created
# Create a CephFS named cephfs
[root@VM-16-14-centos ~]# ceph fs new cephfs cephfs_metadata cephfs_data
new fs with metadata pool 3 and data pool 2
# Deploy the MDS
[root@VM-16-14-centos ~]# ceph orch apply mds cephfs --placement="3 VM-16-14-centos VM-16-4-centos VM-16-6-centos"
Scheduled mds.cephfs update...
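- Progress can be followed from the master, e.g. with watch, which is where the "Every 1.0s: ceph -s" output below comes from:
# Refresh the cluster status every second while the MDS containers are pulled and started
watch -n 1 ceph -s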
# MDS deployment in progress; on public-cloud VMs this takes a very long time
Every 1.0s: ceph -s VM-16-14-centos: Mon Nov 15 00:36:34 2021
cluster:
id: 1df24ab6-4548-11ec-85be-525400d2062a
health: HEALTH_ERR
failed to probe daemons or devices
1 filesystem is offline
1 filesystem is online with fewer MDS than max_mds
services:
mon: 1 daemons, quorum VM-16-14-centos (age 64m)
mgr: VM-16-14-centos.blyexk(active, since 64m)
mds: cephfs:0
osd: 1 osds: 1 up (since 44m), 1 in (since 44m)
data:
pools: 3 pools, 129 pgs
objects: 0 objects, 0 B
usage: 1.0 GiB used, 19 GiB / 20 GiB avail
pgs: 100.000% pgs not active
129 undersized+peered
progress:
PG autoscaler decreasing pool 3 PGs from 64 to 16 (39m)
[............................]
# MDS deployment finished
[root@VM-16-14-centos ~]# ceph -s
cluster:
id: 1df24ab6-4548-11ec-85be-525400d2062a
health: HEALTH_WARN
2 hosts fail cephadm check
2 daemons have recently crashed
298 slow ops, oldest one blocked for 185 sec, daemons [mon.VM-16-14-centos,mon.VM-16-6-centos] have slow ops.
services:
mon: 2 daemons, quorum VM-16-14-centos,VM-16-6-centos (age 7s)
mgr: VM-16-4-centos.aegzbd(active, since 19m), standbys: VM-16-14-centos.blyexk
mds: cephfs:1 {0=cephfs.VM-16-6-centos.yxlhyv=up:active} 2 up:standby
osd: 3 osds: 3 up (since 19m), 3 in (since 2h)
data:
pools: 3 pools, 81 pgs
objects: 22 objects, 3.3 KiB
usage: 3.1 GiB used, 57 GiB / 60 GiB avail
pgs: 81 active+clean
- Verify that at least one MDS has entered the active state. By default Ceph runs only one active MDS; the remaining MDS daemons act as standbys (see the max_mds sketch after the status output below).
[root@VM-16-14-centos ~]# ceph fs status cephfs
cephfs - 0 clients
======
RANK STATE MDS ACTIVITY DNS INOS
0 active cephfs.VM-16-6-centos.yxlhyv Reqs: 0 /s 10 13
POOL TYPE USED AVAIL
cephfs_metadata metadata 1553k 17.9G
cephfs_data data 0 17.9G
STANDBY MDS
cephfs.VM-16-4-centos.lglcaq
cephfs.VM-16-14-centos.mkrrsh
MDS version: ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)
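- If more than one active MDS is wanted, max_mds can be raised and one of the standbys is promoted (an optional sketch, not part of the transcript):
# Allow up to two active MDS daemons for the cephfs filesystem
ceph fs set cephfs max_mds 2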
1.10、Deploy the RGW
[root@VM-16-14-centos ~]# ceph orch apply rgw myorg cn-east-1 --placement="3 VM-16-14-centos VM-16-4-centos VM-16-6-centos"
Scheduled rgw.myorg.cn-east-1 update...
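- The RGW daemons can be checked once the orchestrator has scheduled them (a quick verification, not in the transcript):
# List cephadm-managed daemons and filter for the rgw instances
ceph orch ps | grep rgw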
二、Usage
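- As a minimal usage sketch (assuming the cluster above, the admin keyring on the master, and the kernel CephFS client; the mount point /mnt/cephfs and the use of the admin credentials are illustrative only), CephFS can be mounted directly on a host:
# Create a mount point and mount CephFS with the kernel client using the admin key
mkdir -p /mnt/cephfs
mount -t ceph 10.206.16.14:6789:/ /mnt/cephfs -o name=admin,secret=$(ceph auth get-key client.admin)
# Verify the mount
df -h /mnt/cephfs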
三、Remove Ceph
[root@VM-16-14-centos data]# ceph orch host rm VM-16-4-centos
Removed host 'VM-16-4-centos'
[root@VM-16-14-centos data]# ceph orch host rm VM-16-6-centos
Removed host 'VM-16-6-centos'
[root@VM-16-14-centos data]# ceph orch host rm VM-16-14-centos
Removed host 'VM-16-14-centos'
[root@VM-16-14-centos data]# ceph osd ls
0
1
2
[root@VM-16-14-centos data]# ceph osd stop 0
stop osd.0.
[root@VM-16-14-centos data]# ceph osd stop 1
stop osd.1.
[root@VM-16-14-centos data]# ceph osd stop 2
stop osd.2.
[root@VM-16-14-centos data]# ceph osd out osd.1
marked out osd.1.
[root@VM-16-14-centos data]# ceph osd out osd.2
marked out osd.2.
[root@VM-16-14-centos data]# ceph osd out osd.0
marked out osd.0.
[ceph: root@VM-16-14-centos /]# ceph orch apply mon --unmanaged
Scheduled mon update...
[ceph: root@VM-16-14-centos /]# ceph osd tree
[ceph: root@VM-16-14-centos /]# ceph osd rm 0
removed osd.0
[ceph: root@VM-16-14-centos /]# ceph osd rm 1
removed osd.1
[ceph: root@VM-16-14-centos /]# ceph osd rm 2
removed osd.2
[ceph: root@VM-16-14-centos /]# ceph osd crush rm 0
device '0' does not appear in the crush map
[ceph: root@VM-16-14-centos /]# ceph osd crush rm 1
device '1' does not appear in the crush map
[ceph: root@VM-16-14-centos /]# ceph osd crush rm 2
device '2' does not appear in the crush map
[ceph: root@VM-16-14-centos /]# ceph auth list | grep osd.1
installed auth entries:
osd.1
[ceph: root@VM-16-14-centos /]# ceph auth del osd.1
updated
[ceph: root@VM-16-14-centos /]# ceph auth del osd.2
updated
[ceph: root@VM-16-14-centos /]# ceph auth del osd.0
updated
[ceph: root@VM-16-14-centos /]# ceph mon stat
[root@VM-16-14-centos data]# systemctl status ceph-mgr.target
● ceph-mgr.target - ceph target allowing to start/stop all ceph-mgr@.service instances at once
Loaded: loaded (/usr/lib/systemd/system/ceph-mgr.target; enabled; vendor preset: enabled)
Active: active since Mon 2021-11-15 13:52:26 CST; 1h 15min ago
[root@VM-16-14-centos data]# systemctl stop ceph-mgr.target
[root@VM-16-14-centos data]# ceph -s
[root@VM-16-14-centos data]# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
[root@VM-16-14-centos data]# ceph mds fail 0
failed mds gid 34119
[root@VM-16-14-centos ~]# ceph config set mon mon_allow_pool_delete true
[root@VM-16-14-centos ~]# ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it
pool 'cephfs_data' removed
[root@VM-16-14-centos ~]# ceph osd pool delete device_health_metrics device_health_metrics --yes-i-really-really-mean-it
pool 'device_health_metrics' removed
[root@VM-16-14-centos ~]# ceph osd pool delete cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it
pool 'cephfs_metadata' removed
[root@VM-16-14-centos data]# ceph fs fail cephfs
cephfs marked not joinable; MDS cannot join the cluster. All MDS ranks marked failed.
[root@VM-16-14-centos data]# ceph fs rm cephfs --yes-i-really-mean-it
[root@VM-16-14-centos ~]# ceph mon remove VM-16-14-centos
removing mon.VM-16-14-centos at [v2:10.206.16.14:3300/0,v1:10.206.16.14:6789/0], there will be 1 monitors
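- After the hosts, OSDs, pools, and filesystem have been removed, the remaining cephadm daemons and data can be wiped per host with rm-cluster (destructive; fsid from the bootstrap above; an optional sketch):
# Remove every daemon and all data belonging to this fsid on the local host (run on each host)
cephadm rm-cluster --fsid 1df24ab6-4548-11ec-85be-525400d2062a --force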