4. Disable the firewall
service iptables stop
service ip6tables stop
5. Install dependencies
yum -y install libtool
yum -y install unixODBC
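6. Install ZooKeeper
III. Installing ClickHouse
Official installation guide: Installation and Deployment | ClickHouse Documentation
Altinity installation guide: github.com/Altinity/cl…
Kancloud installation guide: 1.2 ClickHouse single-node installation · ClickHouse · Kancloud
Community single-node deployment: Installing ClickHouse 20.8.3.18 (single node) on CentOS 7.5 - clickhouseclub
Community source build: Compiling ClickHouse on CentOS 7.4 - clickhouseclub
Community cluster deployment: Building a ClickHouse cluster from 0 to 1 - clickhouseclub
Docker on Windows 10: Installing ClickHouse with Docker on Windows - Tencent Cloud Community
1. Online RPM installation
ClickHouse download (rpm packages): Index of /clickhouse/rpm/stable/x86_64/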
sudo yum -y install yum-utils
sudo rpm --import repo.clickhouse.tech/CLICKHOUSE-…
sudo yum-config-manager --add-repo repo.clickhouse.tech/rpm/clickho…
sudo yum -y install clickhouse-server
sudo yum -y install clickhouse-client
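2. Online TGZ installation (recommended)
ClickHouse download (tgz packages): repo.clickhouse.tech/tgz/stable
Assign the latest ClickHouse version to the variable LATEST_VERSION. In this case the result, 21.10.1.8013, turned out not to be available for download yet.
export LATEST_VERSION=$(curl https://api.github.com/repos/ClickHouse/ClickHouse/tags 2>/dev/null | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
So the version is pinned manually to 21.9.2.17 here (check the official site for version numbers).
For production, pin the latest stable version. Versions are listed at: Tags · ClickHouse/ClickHouse · GitHub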
export LATEST_VERSION=21.9.2.17
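The one-liner above keeps whichever version number appears first in the API response, which is how an unreleased version got picked. A minimal sketch of a more defensive variant follows. The function name `latest_version_from_tags` and the sample JSON are illustrative, the assumption that usable tags end in `-stable` or `-lts` should be checked against the real tag list, and `sort -V` assumes GNU coreutils.

```shell
# Hypothetical helper: read a GitHub tags API response on stdin and print
# the highest version whose tag ends in -stable or -lts (assumed naming).
latest_version_from_tags() {
    grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+-(stable|lts)' \
        | sed 's/-.*//' \
        | sort -V \
        | tail -n 1
}

# Illustrative input only; real tag names come from the API.
sample_tags='[{"name":"v21.9.2.17-stable"},{"name":"v21.10.1.8013-prestable"},{"name":"v21.8.8.29-lts"}]'
printf '%s\n' "$sample_tags" | latest_version_from_tags
```

Against the live API this would be used as `curl -s https://api.github.com/repos/ClickHouse/ClickHouse/tags | latest_version_from_tags`. Pinning a version by hand remains the safer choice for production.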
curl -O repo.clickhouse.tech/tgz/clickho…
curl -O repo.clickhouse.tech/tgz/clickho…
curl -O repo.clickhouse.tech/tgz/clickho…
curl -O repo.clickhouse.tech/tgz/clickho…
tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz
sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh
tar -xzvf clickhouse-common-static-dbg-$LATEST_VERSION.tgz
sudo clickhouse-common-static-dbg-$LATEST_VERSION/install/doinst.sh
tar -xzvf clickhouse-server-$LATEST_VERSION.tgz
sudo clickhouse-server-$LATEST_VERSION/install/doinst.sh
sudo /etc/init.d/clickhouse-server start
tar -xzvf clickhouse-client-$LATEST_VERSION.tgz
sudo clickhouse-client-$LATEST_VERSION/install/doinst.sh
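Extracting the archives showed that the tgz packages from the official site were broken (a real pitfall), so they had to be downloaded manually and uploaded to the server before installing: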
Linux tgz package clickhouse-client-21.9.2.17.tgz
Linux tgz package clickhouse-common-static-21.9.2.17.tgz
Linux tgz package clickhouse-common-static-dbg-21.9.2.17.tgz
Linux tgz package clickhouse-server-21.9.2.17.tgz
Linux tgz package clickhouse-test-21.9.2.17.tgz
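Because the downloaded archives turned out to be unusable, it can help to test each file before running its installer. The sketch below is one way to do that; `check_tgz` is a hypothetical helper name, and it only verifies that `tar` can list the archive, not that the contents are what ClickHouse published.

```shell
# Hypothetical helper: succeed only if every given file is a readable
# gzip-compressed tar archive; report each file as ok or broken.
check_tgz() {
    rc=0
    for archive in "$@"; do
        if tar -tzf "$archive" >/dev/null 2>&1; then
            echo "ok      $archive"
        else
            echo "broken  $archive"
            rc=1
        fi
    done
    return "$rc"
}
```

A possible call is `check_tgz clickhouse-*-"$LATEST_VERSION".tgz`; a non-zero exit status means at least one archive should be downloaded again.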
tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz
sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh
tar -xzvf clickhouse-common-static-dbg-$LATEST_VERSION.tgz
sudo clickhouse-common-static-dbg-$LATEST_VERSION/install/doinst.sh
tar -xzvf clickhouse-server-$LATEST_VERSION.tgz
sudo clickhouse-server-$LATEST_VERSION/install/doinst.sh
sudo /etc/init.d/clickhouse-server start
tar -xzvf clickhouse-client-$LATEST_VERSION.tgz
sudo clickhouse-client-$LATEST_VERSION/install/doinst.sh
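The extract-and-install steps repeat the same two commands for each package, so they can be generated from a list. The sketch below only prints the commands, in the same order as the manual steps, so they can be reviewed first; `tgz_install_commands` is a hypothetical helper name.

```shell
# Hypothetical helper: print (not run) the extract/install commands for
# one version, in the same order as the manual steps above.
tgz_install_commands() {
    version=$1
    for pkg in clickhouse-common-static clickhouse-common-static-dbg \
               clickhouse-server clickhouse-client; do
        echo "tar -xzvf $pkg-$version.tgz"
        echo "sudo $pkg-$version/install/doinst.sh"
        # The manual steps start the server right after installing it.
        if [ "$pkg" = clickhouse-server ]; then
            echo "sudo /etc/init.d/clickhouse-server start"
        fi
    done
}

tgz_install_commands 21.9.2.17
```

Piping the output to `sh` would execute the commands; printing them first keeps the run auditable.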
3. Offline RPM installation
mkdir -p /home/software/clickhouse
cd /home/software/clickhouse
wget repo.red-soft.biz/repos/click…
wget repo.red-soft.biz/repos/click…
wget repo.red-soft.biz/repos/click…
wget repo.red-soft.biz/repos/click…
wget repo.red-soft.biz/repos/click…
rpm -qa | grep clickhouse
rpm -Uvh *.rpm
rpm -qa | grep clickhouse
nohup clickhouse-server --config-file=/etc/clickhouse-server/config.xml >/dev/null 2>&1 &
IV. ClickHouse commands
On CentOS 7 and later, only the systemctl commands work
1. Start the ClickHouse service
sudo /etc/init.d/clickhouse-server start
service clickhouse-server start
systemctl start clickhouse-server.service
2. Stop the ClickHouse service
sudo /etc/init.d/clickhouse-server stop
service clickhouse-server stop
systemctl stop clickhouse-server.service
3. Restart the ClickHouse service
sudo /etc/init.d/clickhouse-server restart
service clickhouse-server restart
systemctl restart clickhouse-server.service
4. Check the ClickHouse service status
sudo /etc/init.d/clickhouse-server status
service clickhouse-server status
systemctl status clickhouse-server.service
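Since each action above is listed in three forms, a small wrapper can pick the form that fits the machine. This is a sketch only: `clickhouse_service_cmd` is a hypothetical helper, and it prints the command instead of running it.

```shell
# Hypothetical helper: print the command that would perform ACTION
# (start, stop, restart, status) with the init tooling found on this host.
clickhouse_service_cmd() {
    action=$1
    if command -v systemctl >/dev/null 2>&1; then
        echo "systemctl $action clickhouse-server.service"
    elif command -v service >/dev/null 2>&1; then
        echo "service clickhouse-server $action"
    else
        echo "/etc/init.d/clickhouse-server $action"
    fi
}

clickhouse_service_cmd status
```

The printed line can be run with `sudo sh -c "$(clickhouse_service_cmd restart)"` once it looks right.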
5. Start the ClickHouse client
clickhouse-client
6. List ClickHouse processes
ps -ef | grep clickhouse
7. Force-stop all ClickHouse-related processes
ps -ef | grep clickhouse | grep -v grep | awk '{print $2}' | xargs kill -9
8. View the regular ClickHouse log
tail -n 300 /var/log/clickhouse-server/clickhouse-server.log
9. View the ClickHouse error log
tail -n 300 /var/log/clickhouse-server/clickhouse-server.err.log
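When scanning these logs, a quick count per log level shows whether errors are piling up. The sketch below assumes the level appears in each line as an angle-bracketed word such as `<Error>`; `count_log_levels` is a hypothetical helper and the sample lines are made up for illustration.

```shell
# Hypothetical helper: count occurrences of each log level marker
# (<Error>, <Warning>, ...) in the text read from stdin.
count_log_levels() {
    grep -Eo '<(Fatal|Critical|Error|Warning|Notice|Information|Debug|Trace)>' \
        | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Illustrative log lines, not real server output.
sample_log='2021.09.28 10:00:00.000001 [ 101 ] {} <Information> Application: starting up
2021.09.28 10:00:01.000002 [ 101 ] {} <Error> Application: sample error
2021.09.28 10:00:02.000003 [ 102 ] {} <Error> Application: another sample error'
printf '%s\n' "$sample_log" | count_log_levels
```

On a server this could be fed from the log itself, for example `tail -n 300 /var/log/clickhouse-server/clickhouse-server.log | count_log_levels`.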
10. Disable ClickHouse start on boot (non-production environments)
sudo systemctl disable clickhouse-server
11. View the ClickHouse cluster configuration
clickhouse-client -u default --password "" --query "SELECT * FROM system.clusters"
12. List all ClickHouse tables
echo "SELECT database,name,engine FROM system.tables WHERE database != 'system'" | clickhouse-client
13. Show memory usage per system user
ps aux | tail -n +2 | awk '{ printf("%s\t%s\n", $1, $4) }' | clickhouse local -S "user String, memory Float64" -q "SELECT user, round(sum(memory), 2) as memoryTotal FROM table GROUP BY user ORDER BY memoryTotal DESC FORMAT Pretty"
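To make clear what that pipeline computes, the same aggregation can be sketched with awk alone: group the rows by user, sum the memory column, and sort by the total. `sum_memory_by_user` is a hypothetical helper and the input rows are made up.

```shell
# Hypothetical helper: sum the second column (memory %) per user from
# tab-separated "user<TAB>memory" lines on stdin, highest total first.
sum_memory_by_user() {
    LC_ALL=C awk -F '\t' '{ total[$1] += $2 }
        END { for (user in total) printf "%s\t%.2f\n", user, total[user] }' \
        | LC_ALL=C sort -t "$(printf '\t')" -k2,2nr
}

# Illustrative input; on a real machine the lines would come from
#   ps aux | tail -n +2 | awk '{ printf("%s\t%s\n", $1, $4) }'
printf 'root\t0.5\nclickhouse\t12.3\nroot\t1.5\nclickhouse\t0.7\n' | sum_memory_by_user
```

The clickhouse local version stays preferable when the result needs SQL formatting such as `FORMAT Pretty`.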
14. Measure ClickHouse query execution metrics
echo "SELECT * FROM system.numbers LIMIT 1000" | clickhouse-benchmark -i 5 -h localhost -h localhost
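15. Change the ClickHouse user password
① Method 1: edit the /etc/clickhouse-server/users.xml file
vim /etc/clickhouse-server/users.xml
<password>123456</password>
② Method 2: edit the /etc/clickhouse-client/config.xml file
vim /etc/clickhouse-client/config.xml
<user>username</user>
<password>password</password>
<secure>False</secure>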
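The password above is stored in plain text. users.xml also accepts a hashed value in a `<password_sha256_hex>` element in place of `<password>`, and the sketch below shows one way to produce it. `password_sha256_hex` as a shell function name is a hypothetical choice, and `sha256sum` assumes GNU coreutils.

```shell
# Hypothetical helper: print the SHA-256 hex digest of a password, in the
# form expected by <password_sha256_hex> in users.xml.
password_sha256_hex() {
    # printf avoids the trailing newline that echo would add to the input.
    printf '%s' "$1" | sha256sum | awk '{ print $1 }'
}

password_sha256_hex '123456'
```

The printed 64-character string goes between `<password_sha256_hex>` and `</password_sha256_hex>` for the user in question.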
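V. ClickHouse cluster
ClickHouse cluster deployment: Tutorial | ClickHouse Documentation
ClickHouse cluster configuration: clickhouse.tech/docs/zh/ope…
ClickHouse replication engines: Data Replication | ClickHouse Documentation
ClickHouse Distributed engine: Distributed | ClickHouse Documentation
The cluster here is configured with both shards and replicas. In ClickHouse, only tables in the MergeTree family support replication.
Replicas provide high availability; shards provide horizontal scaling of the data and disaster tolerance.
A table created on a single ClickHouse node exists only on that node. To use a replicated table,
specify a table engine with the Replicated prefix at creation time, then create the same table on every node.
Replication synchronizes data only, not table creation, so the table has to be created by hand on each machine.
Config file that is identical on every machine: /etc/clickhouse-server/config.xml (it differs between machines when no external metrika.xml is included)
Config file that differs on every machine: /etc/metrika.xml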
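As a sketch of what creating the same replicated table on every node can look like, the helper below prints a CREATE TABLE statement that uses the `{shard}` and `{replica}` macros. The function name, the table name `events_local`, the columns, and the ZooKeeper path layout are all illustrative assumptions, not taken from this setup.

```shell
# Hypothetical helper: print a CREATE TABLE statement for a replicated
# table. The columns are placeholders; {shard} and {replica} are filled
# in by each server from its own <macros> section.
replicated_table_ddl() {
    db=$1
    table=$2
    cat <<EOF
CREATE TABLE IF NOT EXISTS $db.$table
(
    id UInt64,
    event_date Date
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/$db/$table', '{replica}')
PARTITION BY toYYYYMM(event_date)
ORDER BY id
EOF
}

replicated_table_ddl default events_local
```

The same statement would then be run once per node, for example with `replicated_table_ddl default events_local | clickhouse-client --host hadoop001` and likewise for the other hosts.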
1. ClickHouse directories and files
| Item | Path |
| Data directory | /var/lib/clickhouse |
| Log directory | /var/log/clickhouse-server |
| Default shard/cluster configuration | /etc/metrika.xml |
| Server configuration file | /etc/clickhouse-server/config.xml |
| Client configuration file | /etc/clickhouse-client/config.xml |
| Cron job configuration | /etc/cron.d/clickhouse-server |
| systemd service file | /etc/systemd/system/clickhouse-server.service |
| File handle limit configuration | /etc/security/limits.d/clickhouse.conf |
| Main program executable | /usr/bin/clickhouse |
| Client executable | /usr/bin/clickhouse-client |
| Server executable | /usr/bin/clickhouse-server |
| Compression tool executable | /usr/bin/clickhouse-compressor |
| Server log file | /var/log/clickhouse-server/clickhouse-server.log |
| Server error log file | /var/log/clickhouse-server/clickhouse-server.err.log |
2. ClickHouse cluster plan
| Host | ZooKeeper | ClickHouse | Shard | Replica |
| hadoop001 | √ | √ | shard01 | replica_01_02 |
| hadoop002 | √ | √ | shard02 | replica_02_02 |
| hadoop003 | √ | √ | shard03 | replica_03_02 |
| hadoop004 | | √ | shard01 | replica_01_01 |
| hadoop005 | | √ | shard02 | replica_02_01 |
| hadoop006 | | √ | shard03 | replica_03_01 |
3. ClickHouse core configuration
cp /etc/clickhouse-server/config.xml /etc/clickhouse-server/config.xml.init
chmod 664 /etc/clickhouse-server/config.xml
chown -R clickhouse:clickhouse /etc/clickhouse-server
vim /etc/clickhouse-server/config.xml
<listen_host>0.0.0.0</listen_host>
<path>/home/clickhouse/data/</path>
<tmp_path>/home/clickhouse/tmp/</tmp_path>
<user_files_path>/home/clickhouse/data/user_files/</user_files_path>
mkdir -p /home/clickhouse/data/
mkdir -p /home/clickhouse/tmp/
mkdir -p /home/clickhouse/data/user_files/
chown -R clickhouse:clickhouse /home/clickhouse/
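4. ClickHouse cluster configuration (rpm version)
**Note:** With the cluster configured, the {shard} and {replica} macros can be used when creating tables. The shard and replica values can also be written out directly in the CREATE statement; they are not limited to the values defined in the cluster configuration, and the developer is free to choose them. The metadata of cluster tables is stored in ZooKeeper. Changing the cluster configuration behind a table that has already been created may leave the table read-only; in that case, check in ZooKeeper whether the ClickHouse metadata still matches the current table.
In the rpm-installed version, the default server configuration /etc/clickhouse-server/config.xml states that
/etc/metrika.xml is loaded by default as the substitution file for remote servers; here the file is placed in a different directory instead.
The default cluster section name is clickhouse_remote_servers, as given by the incl attribute: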
<remote_servers incl="clickhouse_remote_servers" />
Create the shard/replica cluster file metrika.xml by hand in the /etc/clickhouse-server/config.d/ directory
chmod 664 /etc/clickhouse-server/config.d/metrika.xml
chown clickhouse:clickhouse /etc/clickhouse-server/config.d/metrika.xml
vim /etc/clickhouse-server/config.d/metrika.xml
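The configuration file differs on every node of the cluster; the difference lies in the <shard> and <replica> tags inside <macros>.
See the ClickHouse cluster plan above for details.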
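<yandex>
<clickhouse_remote_servers>
    <cluster_3shards_2replicas>
        <shard>
            <internal_replication>true</internal_replication>
            <replica><host>hadoop004</host><port>9000</port></replica>
            <replica><host>hadoop001</host><port>9000</port></replica>
        </shard>
        <shard>
            <internal_replication>true</internal_replication>
            <replica><host>hadoop005</host><port>9000</port></replica>
            <replica><host>hadoop002</host><port>9000</port></replica>
        </shard>
        <shard>
            <internal_replication>true</internal_replication>
            <replica><host>hadoop006</host><port>9000</port></replica>
            <replica><host>hadoop003</host><port>9000</port></replica>
        </shard>
    </cluster_3shards_2replicas>
</clickhouse_remote_servers>
<zookeeper-servers>
    <node><host>hadoop001</host><port>2181</port></node>
    <node><host>hadoop002</host><port>2181</port></node>
    <node><host>hadoop003</host><port>2181</port></node>
</zookeeper-servers>
<macros>
    <shard>shard01</shard>
    <replica>replica_01_02</replica>
</macros>
<networks>
    <ip>::/0</ip>
</networks>
<clickhouse_compression>
    <case>
        <min_part_size>10000000000</min_part_size>
        <min_part_size_ratio>0.01</min_part_size_ratio>
        <method>lz4</method>
    </case>
</clickhouse_compression>
</yandex>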
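Because the shard blocks in this file follow one pattern (two replica hosts per shard, port 9000), they can be generated from a short host list and pasted in. The sketch below assumes that pattern; `render_shards` is a hypothetical helper and writes only to standard output.

```shell
# Hypothetical helper: read "replica_host replica_host" pairs from stdin,
# one shard per line, and print the matching <shard> blocks.
render_shards() {
    while read -r first second; do
        [ -n "$first" ] || continue
        echo "    <shard>"
        echo "        <internal_replication>true</internal_replication>"
        for host in $first $second; do
            echo "        <replica><host>$host</host><port>9000</port></replica>"
        done
        echo "    </shard>"
    done
}

echo "<cluster_3shards_2replicas>"
render_shards <<'EOF'
hadoop004 hadoop001
hadoop005 hadoop002
hadoop006 hadoop003
EOF
echo "</cluster_3shards_2replicas>"
```

Generating the blocks keeps the six nodes' files consistent, since this section must be identical on every machine.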
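Only the <macros> tag differs between machines. For the 3-shard, 2-replica setup configured here, the per-node values are:
<!-- hadoop001 -->
<shard>shard01</shard>
<replica>replica_01_02</replica>
<!-- hadoop002 -->
<shard>shard02</shard>
<replica>replica_02_02</replica>
<!-- hadoop003 -->
<shard>shard03</shard>
<replica>replica_03_02</replica>
<!-- hadoop004 -->
<shard>shard01</shard>
<replica>replica_01_01</replica>
<!-- hadoop005 -->
<shard>shard02</shard>
<replica>replica_02_01</replica>
<!-- hadoop006 -->
<shard>shard03</shard>
<replica>replica_03_01</replica>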
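Since the macros are the only per-node difference, they can be derived from the host name instead of being edited by hand on six machines. The sketch below hard-codes the assignment from the cluster plan above; `macros_for_host` is a hypothetical helper, and it assumes each machine's host name matches the names in that plan.

```shell
# Hypothetical helper: print the <macros> block for one host, using the
# shard/replica assignment from the cluster plan.
macros_for_host() {
    case $1 in
        hadoop001) shard=shard01 replica=replica_01_02 ;;
        hadoop002) shard=shard02 replica=replica_02_02 ;;
        hadoop003) shard=shard03 replica=replica_03_02 ;;
        hadoop004) shard=shard01 replica=replica_01_01 ;;
        hadoop005) shard=shard02 replica=replica_02_01 ;;
        hadoop006) shard=shard03 replica=replica_03_01 ;;
        *) echo "unknown host: $1" >&2; return 1 ;;
    esac
    printf '<macros>\n    <shard>%s</shard>\n    <replica>%s</replica>\n</macros>\n' \
        "$shard" "$replica"
}

macros_for_host hadoop004
```

On a node, `macros_for_host "$(hostname)"` would print the block for that machine, and an unknown host name fails loudly instead of producing a wrong replica name.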
sed -n '78, 81p' /etc/clickhouse-server/config.d/metrika.xml
clickhouse-client -u default --password "" --query "SELECT * FROM system.clusters"
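The output of that query can also be checked mechanically against the cluster plan. The sketch below reads tab-separated `cluster` and `host_name` values and lists the planned hosts that are absent; `missing_cluster_hosts` is a hypothetical helper and the sample rows are made up.

```shell
# Hypothetical helper: read tab-separated "cluster<TAB>host_name" rows on
# stdin and print every expected host (given as arguments after the
# cluster name) that does not appear in that cluster.
missing_cluster_hosts() {
    cluster=$1
    shift
    rows=$(cat)
    for host in "$@"; do
        printf '%s\n' "$rows" \
            | grep -Fxq "$(printf '%s\t%s' "$cluster" "$host")" \
            || echo "$host"
    done
}

# Illustrative rows; hadoop002 is deliberately left out.
printf 'cluster_3shards_2replicas\thadoop001\ncluster_3shards_2replicas\thadoop004\n' \
    | missing_cluster_hosts cluster_3shards_2replicas hadoop001 hadoop002 hadoop004
```

With a running server, the input could come from `clickhouse-client --query "SELECT cluster, host_name FROM system.clusters FORMAT TabSeparated"`; empty output means every planned host is present.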
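After configuration, once the cluster has been used, ClickHouse writes the local node's settings to a macros file in its root directory (/home/clickhouse here)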
cat /home/clickhouse/macros
[root@hadoop001 ~]# cat /home/clickhouse/macros
export shard=shard01
export replica=replica_01_02
..............................................
[root@hadoop006 ~]# cat /home/clickhouse/macros
export shard=shard03
export replica=replica_03_01
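5. ClickHouse main cluster configuration (tgz version)
In the tgz-installed version, the default server configuration /etc/clickhouse-server/config.xml already defines several local test clusters:
Local single-shard test cluster: test_shard_localhost
Local two-shard test cluster: test_cluster_two_shards_localhost
Two-shard test cluster: test_cluster_two_shards
Two-shard test cluster with internal replication: test_cluster_two_shards_internal_replication
Local single-shard secure cluster: test_shard_localhost_secure
Test cluster with an unavailable shard: test_unavailable_shard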
clickhouse-client -u default --password "" --query "SELECT * FROM system.clusters"
① Edit /etc/clickhouse-server/config.xml and comment out the default contents of the remote_servers tag
② Add the following configuration to /etc/clickhouse-server/config.xml:
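<include_from>/etc/metrika.xml</include_from>
<remote_servers>
    <cluster_3shards_2replicas>
        <shard>
            <internal_replication>true</internal_replication>
            <replica><host>hadoop004</host><port>9000</port></replica>
            <replica><host>hadoop001</host><port>9000</port></replica>
        </shard>
        <shard>
            <internal_replication>true</internal_replication>
            <replica><host>hadoop005</host><port>9000</port></replica>
            <replica><host>hadoop002</host><port>9000</port></replica>
        </shard>
        <shard>
            <internal_replication>true</internal_replication>
            <replica><host>hadoop006</host><port>9000</port></replica>
            <replica><host>hadoop003</host><port>9000</port></replica>
        </shard>
    </cluster_3shards_2replicas>
</remote_servers>
<macros>
    <shard>shard01</shard>
    <replica>replica_01_02</replica>
</macros>
<zookeeper>
    <node><host>hadoop001</host><port>2181</port></node>
    <node><host>hadoop002</host><port>2181</port></node>
    <node><host>hadoop003</host><port>2181</port></node>
</zookeeper>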
③ Check the <macros> tag configuration
sed -n '751, 755p' /etc/clickhouse-server/config.xml
sed -n '78, 81p' /etc/clickhouse-server/config.d/metrika.xml
chmod 664 /etc/clickhouse-server/config.xml
chown clickhouse:clickhouse /etc/clickhouse-server/config.xml
ll /etc/clickhouse-server/config.xml
④ Add the /etc/metrika.xml file; its contents are the same as the rpm-install configuration in section 4 above
chmod 664 /etc/metrika.xml