Replicated Pools
Concept
Ceph protects data in a replicated pool by creating multiple copies of each object. Each copy is called a replica. The replicas are stored on different OSDs in the pool, so that if one OSD fails, the object's data remains protected.
Characteristics
- Uses more raw space: usable capacity is 1/n of raw capacity, where n is the replica count (e.g. 1/3 with three replicas)
- Writes are faster than with erasure-coded pools
Creation
ceph osd pool create pool-name pg-num pgp-num
# Parameters that can be specified at creation time
pool-name: name of the pool
pg-num: number of placement groups (PGs)
pgp-num: number of placement groups for placement (PGPs)
crush-ruleset-name: name of the CRUSH rule set to use
expected-num-objects: expected number of objects in the pool
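A minimal sketch of the full syntax with the optional arguments (the pool name mypool is hypothetical; replicated_rule is the default CRUSH rule):
[root@node1 ~]# ceph osd pool create mypool 64 64 replicated replicated_rule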
Viewing Pools
# Create a pool
[root@node1 ~]# ceph osd pool create tst1 128 128
# List pools
[root@node1 ~]# ceph osd lspools
1 .mgr
2 tst1
# Show detailed pool information
[root@node1 ~]# ceph osd pool ls detail
pool 1 '.mgr' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 8.00
pool 2 'tst1' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 94 pgp_num 94 pg_num_target 32 pgp_num_target 32 autoscale_mode on last_change 178 lfor 0/178/176 flags hashpspool stripe_width 0 read_balance_score 1.36
# Show pool usage
[root@node1 ~]# ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
ssd 160 GiB 160 GiB 275 MiB 275 MiB 0.17
TOTAL 160 GiB 160 GiB 275 MiB 275 MiB 0.17
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
.mgr 1 1 449 KiB 2 904 KiB 0 76 GiB
tst1 2 60 0 B 0 0 B 0 76 GiB
# Show pool statistics
[root@node1 ~]# ceph osd pool stats tst1
pool tst1 id 2
nothing is going on
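To watch these numbers change, a test object can be stored with rados and the usage checked again; a sketch assuming /etc/hosts as an arbitrary input file:
[root@node1 ~]# rados -p tst1 put obj1 /etc/hosts   # store /etc/hosts as object obj1
[root@node1 ~]# rados -p tst1 ls                    # list objects in the pool
[root@node1 ~]# ceph df                             # STORED/OBJECTS for tst1 should now be non-zero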
Enabling a Ceph Application on a Pool
After creating a pool, the administrator must explicitly specify the type of Ceph application that will use it: Ceph Block Device (also known as RADOS Block Device, or RBD), Ceph Object Gateway (also known as RADOS Gateway, or RGW), or Ceph File System (CephFS).
ceph osd pool application enable pool-name application-type   # application-type is one of rbd, rgw, cephfs
[root@node1 ~]# ceph osd pool application enable tst1 rbd
enabled application 'rbd' on pool 'tst1'
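To verify, the applications enabled on a pool can be queried; a quick sketch:
[root@node1 ~]# ceph osd pool application get tst1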
Common Pool Operations
# Rename a pool
[root@node1 ~]# ceph osd pool rename tst1 ceph125
pool 'tst1' renamed to 'ceph125'
[root@node1 ~]# ceph osd lspools
1 .mgr
2 ceph125
# Set quotas on a pool
# Limit the maximum number of bytes
[root@node1 ~]# ceph osd pool set-quota ceph125 max_bytes 51200000000
set-quota max_bytes = 51200000000 for pool ceph125
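A quota can also limit the object count, and get-quota shows the limits currently in effect; a brief sketch:
[root@node1 ~]# ceph osd pool set-quota ceph125 max_objects 100000
[root@node1 ~]# ceph osd pool get-quota ceph125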
# Create a pool snapshot
[root@node1 ~]# ceph osd pool mksnap ceph125 ceph125-snap01
created pool ceph125 snap ceph125-snap01
[root@node1 ~]# ceph osd pool ls detail
pool 1 '.mgr' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 8.00
pool 2 'ceph125' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 491 lfor 0/486/484 flags hashpspool,pool_snaps max_bytes 51200000000 stripe_width 0 application rbd read_balance_score 1.25
snap 1 'ceph125-snap01' 2023-09-13T23:58:45.664645+0800
# Remove a pool snapshot
[root@node1 ~]# ceph osd pool rmsnap ceph125 ceph125-snap01
removed pool ceph125 snap ceph125-snap01
# Retrieve an object from a snapshot (the object does not exist in this pool, so rados returns an error)
[root@node1 ~]# rados -p ceph125 -s ceph125-snap01 get object test.txt
selected snap 3 'ceph125-snap01'
error getting ceph125/object: (2) No such file or directory
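A full round trip would store the object first, snapshot the pool, and then read the object back through the snapshot; a sketch assuming a hypothetical local file test.txt and snapshot name ceph125-snap02:
[root@node1 ~]# rados -p ceph125 put object test.txt                            # store test.txt as object 'object'
[root@node1 ~]# ceph osd pool mksnap ceph125 ceph125-snap02                     # snapshot the pool
[root@node1 ~]# rados -p ceph125 -s ceph125-snap02 get object /tmp/object.out   # read it back via the snapshot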
# Get all pool parameters
[root@node1 ~]# ceph osd pool get ceph125 all
size: 2
min_size: 1
pg_num: 32
pgp_num: 32
crush_rule: replicated_rule
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
fast_read: 0
pg_autoscale_mode: on
eio: false
bulk: false
# Set a pool parameter (here, the replica count)
[root@node1 ~]# ceph osd pool set ceph125 size 3
set pool 2 size to 3
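When the replica count changes, min_size (the minimum number of replicas that must be available for I/O) is often adjusted alongside it; a sketch:
[root@node1 ~]# ceph osd pool set ceph125 min_size 2
[root@node1 ~]# ceph osd pool get ceph125 min_size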
Viewing Ceph OSD Status
# Check the health of the Ceph cluster
[root@node1 ~]# ceph health
HEALTH_WARN mons are allowing insecure global_id reclaim; 1 daemons have recently crashed; 11 mgr modules have recently crashed
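The crash warnings above can be inspected with the crash module; a sketch (crash IDs are cluster-specific):
[root@node1 ~]# ceph health detail       # expand each warning
[root@node1 ~]# ceph crash ls            # list recorded daemon crashes
[root@node1 ~]# ceph crash archive-all   # acknowledge them so the warning clears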
# Check OSD status
[root@node1 ~]# ceph osd status
ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 node1 36.0M 19.9G 0 0 0 0 exists,up
1 node4 36.0M 19.9G 0 0 0 0 exists,up
2 node2 36.5M 19.9G 0 0 0 0 exists,up
3 node3 36.0M 19.9G 0 0 0 0 exists,up
4 node4 36.0M 19.9G 0 0 0 0 exists,up
5 node1 36.0M 19.9G 0 0 0 0 exists,up
6 node2 36.0M 19.9G 0 0 0 0 exists,up
7 node3 36.5M 19.9G 0 0 0 0 exists,up
[root@node1 ~]# ceph osd stat
8 osds: 8 up (since 2h), 8 in (since 2h); epoch: e495
# Show which OSDs are on which host
[root@node1 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.15588 root default
-3 0.03897 host node1
0 ssd 0.01949 osd.0 up 1.00000 1.00000
5 ssd 0.01949 osd.5 up 1.00000 1.00000
-7 0.03897 host node2
2 ssd 0.01949 osd.2 up 1.00000 1.00000
6 ssd 0.01949 osd.6 up 1.00000 1.00000
-9 0.03897 host node3
3 ssd 0.01949 osd.3 up 1.00000 1.00000
7 ssd 0.01949 osd.7 up 1.00000 1.00000
-5 0.03897 host node4
1 ssd 0.01949 osd.1 up 1.00000 1.00000
4 ssd 0.01949 osd.4 up 1.00000 1.00000
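Per-OSD utilization and PG counts can be shown alongside the tree; a brief sketch:
[root@node1 ~]# ceph osd df        # usage, weight and PG count per OSD
[root@node1 ~]# ceph osd df tree   # the same data grouped by the CRUSH hierarchy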
Viewing Ceph MON Status
[root@node1 ~]# ceph mon stat
e1: 3 mons at {node1=[v2:192.168.202.129:3300/0,v1:192.168.202.129:6789/0],node2=[v2:192.168.202.130:3300/0,v1:192.168.202.130:6789/0],node3=[v2:192.168.202.131:3300/0,v1:192.168.202.131:6789/0]} removed_ranks: {1}, election epoch 22, leader 0 node1, quorum 0,1,2 node1,node2,node3
# Show MON version information
[root@node1 ~]# ceph mon versions
{
"ceph version 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef (stable)": 3
}
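The monitor map and quorum details can also be dumped; a brief sketch:
[root@node1 ~]# ceph mon dump       # monitor map: epoch, ranks, addresses
[root@node1 ~]# ceph quorum_status  # current quorum members and election epoch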
Erasure-Coded Pools
Concept
Erasure-coded pools protect object data with erasure coding instead of replication.
Erasure Coding
EC (erasure coding) is a fault-tolerant encoding technique that originated in the telecommunications industry to cope with partial data loss during transmission. The idea is to split the transmitted signal into segments, add parity information, and make the segments interdependent, so that even if part of the signal is lost in transit, the receiver can still reconstruct the complete information algorithmically.
In data storage, erasure coding splits data into fragments, encodes and spreads redundant chunks, and stores them in different locations, such as different disks, storage nodes, or geographic sites.
How Erasure Coding Works
- Each object's data is split into k data chunks
- m coding chunks are computed from them
- Each coding chunk is the same size as a data chunk
- The object is stored across a total of k + m OSDs
- n = k + m (see the worked sketch after this list)
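A worked sketch of these relationships, assuming the k=3, m=2 profile created later in this section:
# n = k + m = 3 + 2 = 5 chunks, each stored on a different OSD
# usable capacity = k / (k + m) = 3 / 5 = 60% of raw capacity
# fault tolerance: any m = 2 of the 5 chunks can be lost and the object is still recoverable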
Creating an Erasure-Coded Pool
ceph osd pool create pool-name pg-num [pgp-num] erasure [erasure-code-profile] [crush-ruleset-name] [expected-num-objects]
# Parameters
pool-name is the name of the new pool
pg-num is the number of placement groups for the pool
erasure specifies that this is an erasure-coded pool
erasure-code-profile is the name of the erasure code profile to use
crush-ruleset-name is the name of the CRUSH rule set to use for this pool
expected-num-objects is the expected number of objects in the pool
Viewing the Default Erasure Code Profile
[root@node1 ~]# ceph osd erasure-code-profile get default
k=2 # two data chunks
m=2 # two coding chunks
plugin=jerasure
technique=reed_sol_van
Listing Erasure Code Profiles
[root@node1 ~]# ceph osd erasure-code-profile ls
ceph125
default
Creating an Erasure Code Profile
# Create an erasure code profile named ceph125 with an OSD-level failure domain
[root@node1 ~]# ceph osd erasure-code-profile set ceph125 k=3 m=2 crush-failure-domain=osd
[root@node1 ~]# ceph osd erasure-code-profile get ceph125
crush-device-class=
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=3
m=2
plugin=jerasure
technique=reed_sol_van
w=8
Creating an Erasure-Coded Pool from the Profile
[root@node1 ~]# ceph osd pool create ceph125-erasure erasure ceph125
pool 'ceph125-erasure' created
[root@node1 ~]# ceph osd pool ls detail
pool 1 '.mgr' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 8.00
pool 2 'ceph125' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 494 lfor 0/486/484 flags hashpspool,pool_snaps max_bytes 51200000000 stripe_width 0 application rbd read_balance_score 1.25
snap 3 'ceph125-snap01' 2023-09-14T00:01:39.508235+0800
pool 3 'ceph125-erasure' erasure profile ceph125 size 5 min_size 4 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 504 lfor 0/0/502 flags hashpspool stripe_width 12288
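Like a replicated pool, the new erasure-coded pool still needs an application enabled before use; for RBD or CephFS data, overwrites must also be allowed on EC pools. A sketch, not yet run against this cluster:
[root@node1 ~]# ceph osd pool application enable ceph125-erasure rgw
[root@node1 ~]# ceph osd pool set ceph125-erasure allow_ec_overwrites true   # needed only when rbd/cephfs data lives on an EC pool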