Ceph Notes: Deployment


Ceph deployment

Ceph components and OS versions
~# uname -a
Linux node001 5.4.0-81-generic #91-Ubuntu SMP Thu Jul 15 19:09:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
~# ceph -v
ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
# ceph-deploy version used
ceph-deploy 2.0.1
Network setup
~# ip a | grep ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    inet 192.168.1.251/24 brd 192.168.1.255 scope global ens33
    inet 192.168.10.251/16 brd 192.168.255.255 scope global ens33
# Dual NICs: 192.168.1.0 => public-network, 192.168.10.0 => cluster-network
System environment setup
Set up clock synchronization
timedatectl set-timezone Asia/Shanghai
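The command above only sets the time zone; for actual clock synchronization, something along these lines works on Ubuntu 20.04 with systemd-timesyncd (a minimal sketch; chrony could be installed instead):
timedatectl set-ntp true
timedatectl status | grep -i synchronized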
Hostname resolution
cat >> /etc/hosts << EOF
192.168.1.251 node001
192.168.1.252 node002
192.168.1.253 node003
192.168.1.254 node004
EOF
Disable the firewall service
ufw disable
Stop and disable SELinux (on Ubuntu this step is usually not applicable)
Preparing to deploy the Ceph cluster
Create a dedicated user account for deploying Ceph

The deployment tool ceph-deploy must log in to each target node of the Ceph cluster as a regular user, and this user needs passwordless sudo privileges so that installing packages and generating configuration files does not get interrupted by password prompts.

Create the user on each Ceph node

First, as an administrator on each node, create a dedicated user account for ceph-deploy and set its password:

groupadd cephadm
useradd -g cephadm -d /home/cephadm -m cephadm -s /bin/bash
passwd cephadm
#=================================
echo "123456" | passwd --stdin cephadm #[ubuntu 操作失败]

Then make sure the newly created cephadm user on each node can run sudo commands without a password:

echo "cephadm ALL = (root) NOPASSWD:ALL" |sudo tee /etc/sudoers.d/cephadm
chmod 0440 /etc/sudoers.d/cephadm
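A quick way to confirm that passwordless sudo really works for cephadm (sudo -n fails instead of prompting if a password would be required):
su - cephadm -c 'sudo -n true' && echo "passwordless sudo OK"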
Configure key-based SSH authentication for the user

ceph-deploy does not support entering passwords mid-run, so an SSH key must be generated on the admin node and its public key distributed to every node of the Ceph cluster. Generate the SSH key pair as the cephadm user:

ssh-keygen -t rsa -P ""
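Then distribute the public key to every node (a sketch; it assumes the cephadm password set above and the hostnames from /etc/hosts):
for node in node001 node002 node003 node004; do ssh-copy-id cephadm@$node; done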
Install ceph-deploy on the admin node
apt install ceph-deploy
or:
pip3 install git+https://github.com/ceph/ceph-deploy.git

Deploy the RADOS storage cluster

Initialize the RADOS cluster

1. First, on the admin node, as the cephadm user, create a directory for the cluster configuration files:

mkdir ceph-cluster
cd ceph-cluster

2. Initialize the initial mon node(s), preparing to create the cluster:

~/ceph-cluster$ ceph-deploy new --cluster-network 192.168.10.0/16 --public-network 192.168.1.0/24 node001 node002 node003 node004

3. Edit the generated ceph.conf configuration file:

public network = x.x.x.x
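For reference, with the networks used in this setup the generated ceph.conf looks roughly like the following (a sketch; the fsid is generated by ceph-deploy new and is not reproduced here):
[global]
fsid = <generated by ceph-deploy new>
mon_initial_members = node001, node002, node003, node004
mon_host = 192.168.1.251,192.168.1.252,192.168.1.253,192.168.1.254
public network = 192.168.1.0/24
cluster network = 192.168.10.0/16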

4. Install Ceph on the cluster nodes

The ceph-deploy command can connect remotely to each node of the Ceph cluster to install the packages and perform related tasks:

~/ceph-cluster$ ceph-deploy install node002 node003 node004
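If the default repository would pull in an unexpected version, the target release can also be named explicitly (a sketch; the release name here is assumed to match the cluster built above):
~/ceph-cluster$ ceph-deploy install --release octopus node002 node003 node004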

5. Bootstrap the initial mon node(s) and gather all the keys:

~/ceph-cluster$ ceph-deploy mon create-initial

6. Copy the configuration file and admin keyring to each node of the Ceph cluster, so that running ceph commands does not require explicitly specifying the mon address and ceph.client.admin.keyring every time:

~/ceph-cluster$ ceph-deploy admin node001 node002 node003 node004

On any node of the Ceph cluster that needs to run ceph commands, as root, grant the cephadm user read access to the ceph.client.admin.keyring file:

~/ceph-cluster$ setfacl -m u:cephadm:rw /etc/ceph/ceph.client.admin.keyring

7. Configure the manager node and start the ceph-mgr daemon:

~/ceph-cluster$ ceph-deploy mgr create node001
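At this point the overall cluster state can be checked from any node holding the admin keyring; the mgr created above should show up as active:
~/ceph-cluster$ ceph -s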

Add OSDs to the RADOS cluster

Create OSDs
~/ceph-cluster$ ceph-deploy osd create --data /dev/sdb node004 [node001...node004]
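To bring up one OSD per node, the same command can simply be repeated for each host (a sketch; it assumes every node has an unused /dev/sdb), then the result verified:
~/ceph-cluster$ for node in node001 node002 node003 node004; do ceph-deploy osd create --data /dev/sdb $node; done
~/ceph-cluster$ ceph osd tree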
Deploy ceph-mgr-dashboard
apt install ceph-mgr-dashboard
Set the dashboard login user and password
echo "admin" > 123.txt
ceph dashboard set-login-credentials admin -i ./123.txt
Enable the dashboard
ceph mgr module disable dashboard
ceph mgr module enable dashboard
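The URL of the now-enabled dashboard can be read back from the active mgr:
ceph mgr services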

CephFS file storage on the Ceph cluster

Deploy the MDS
~/ceph-cluster$ ceph-deploy mds create node001
Create the CephFS file system
Create the pools
~/ceph-cluster$ ceph osd pool create cephfs_metadata 16 16
~/ceph-cluster$ ceph osd pool create cephfs_data 16 16
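The two trailing numbers are pg_num and pgp_num (16 placement groups each, a small value that is fine for this four-node test cluster); they can be checked afterwards:
~/ceph-cluster$ ceph osd pool get cephfs_data pg_num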
Create the file system
~/ceph-cluster$ ceph fs new cephfs cephfs_metadata cephfs_data
View file system information
~/ceph-cluster$ ceph fs ls
~/ceph-cluster$ ceph mds stat
Mount CephFS using the kernel driver
~/ceph-cluster$ sudo mkdir /ceph_fs
# Mounting requires a monitor address of the Ceph cluster and a user name
~/ceph-cluster$ sudo mount -t ceph 192.168.1.251:6789:/ /ceph_fs -o name=admin
~/ceph-cluster$ cd /ceph_fs/
#Test: write a large chunk of data
~/ceph-cluster$ sudo dd if=/dev/zero of=test.img bs=1M count=10240
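Note that the mount above passes only name=admin; on most setups the kernel client also needs the matching key, for example by pulling it straight from the cluster (a sketch):
sudo mount -t ceph 192.168.1.251:6789:/ /ceph_fs -o name=admin,secret=$(sudo ceph auth get-key client.admin)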
Log in to the dashboard
https://192.168.1.251:8443/

Troubleshooting notes

Issue 1:
[node004][ERROR ] Can't communicate with remote host, possibly because python3 is not installed there
[ceph_deploy][ERROR ] RuntimeError: connecting to host: node004 resulted in errors: OSError cannot send (already closed?)
Solution
1. Check, per the log, whether python3 is installed on the node
2. Check that passwordless sudo works for the deploy user on the node
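If python3 really is missing, it can be installed remotely before retrying (a sketch; it assumes cephadm already has passwordless sudo on node004):
ssh cephadm@node004 'sudo apt -y install python3'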
Issue 2:
dpkg: error processing archive /var/cache/apt/archives/ceph-deploy_2.0.1-0ubuntu1_all.deb (--unpack):
 trying to overwrite '/usr/share/man/man8/ceph-deploy.8.gz', which is also in package ceph-base 15.2.13-0ubuntu0.20.04.1
Errors were encountered while processing:
 /var/cache/apt/archives/ceph-deploy_2.0.1-0ubuntu1_all.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
Solution:
dpkg -i --force-overwrite /var/cache/apt/archives/ceph-deploy_2.0.1-0ubuntu1_all.deb
apt install ceph-deploy
Issue 3:
[node002][INFO  ] Running command: sudo fdisk -l
[ceph_deploy][ERROR ] Traceback (most recent call last):
[ceph_deploy][ERROR ]   File "/home/cephadm/.local/lib/python3.8/site-packages/ceph_deploy/util/decorators.py", line 69, in newfunc
[ceph_deploy][ERROR ]     return f(*a, **kw)
[ceph_deploy][ERROR ]   File "/home/cephadm/.local/lib/python3.8/site-packages/ceph_deploy/cli.py", line 166, in _main
[ceph_deploy][ERROR ]     return args.func(args)
[ceph_deploy][ERROR ]   File "/home/cephadm/.local/lib/python3.8/site-packages/ceph_deploy/osd.py", line 435, in disk
[ceph_deploy][ERROR ]     disk_list(args, cfg)
[ceph_deploy][ERROR ]   File "/home/cephadm/.local/lib/python3.8/site-packages/ceph_deploy/osd.py", line 375, in disk_list
[ceph_deploy][ERROR ]     line = line.decode('utf-8')
[ceph_deploy][ERROR ] AttributeError: 'str' object has no attribute 'decode'
[ceph_deploy][ERROR ]
Unresolved. (The traceback indicates that ceph-deploy 2.0.1 calls .decode() on output that is already a str under Python 3.)
Issue 4:
health: HEALTH_WARN
		mons are allowing insecure global_id reclaim
Solution
ceph config set mon auth_allow_insecure_global_id_reclaim false
Issue 5:
curl https://node001:8443
curl: (60) SSL certificate problem: self signed certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
Solution:
Option 1: access the dashboard directly via https://<ip>:8443
Option 2: on the host that accesses the URL, add the hostname-to-IP mapping to its hosts file