Things have been relatively quiet on my project lately, so I'm taking some time to write down how we deploy ClickHouse on Kubernetes with clickhouse-operator.
What is an Operator?
https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
"Operators are software extensions to Kubernetes that make use of custom resources to manage applications and their components. Operators follow Kubernetes principles, notably the control loop." You can read the details on the official site linked above; in short, an Operator lets you define a custom controller that watches your application and performs custom tasks based on its state.
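To make that concrete: a custom resource is simply a YAML object whose kind is defined by a CustomResourceDefinition (CRD), and the operator's control loop continuously reconciles the actual cluster state toward the desired state described in its spec. A purely illustrative (hypothetical) resource could look like this:

apiVersion: example.com/v1   # group/version defined by a CRD
kind: MyApp                  # custom kind the operator watches
metadata:
  name: demo
spec:                        # desired state; the operator's control loop
  replicas: 3                # works to make reality match this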
1 - Installing clickhouse-operator
If you don't need a specific namespace, the operator installs into kube-system by default and you can simply run:
kubectl apply -f https://raw.githubusercontent.com/Altinity/clickhouse-operator/master/deploy/operator/clickhouse-operator-install-bundle.yaml
If you want to install it into a specific namespace instead, run:
curl -s https://raw.githubusercontent.com/Altinity/clickhouse-operator/master/deploy/operator-web-installer/clickhouse-operator-install.sh | OPERATOR_NAMESPACE=default bash
The installation output looks like this:
kubectl create -f clickhouse-operator-install-bundle.yaml
customresourcedefinition.apiextensions.k8s.io/clickhouseinstallations.clickhouse.altinity.com created
customresourcedefinition.apiextensions.k8s.io/clickhouseinstallationtemplates.clickhouse.altinity.com created
customresourcedefinition.apiextensions.k8s.io/clickhouseoperatorconfigurations.clickhouse.altinity.com created
serviceaccount/clickhouse-operator created
clusterrole.rbac.authorization.k8s.io/clickhouse-operator-kube-system created
clusterrolebinding.rbac.authorization.k8s.io/clickhouse-operator-kube-system created
configmap/etc-clickhouse-operator-files created
configmap/etc-clickhouse-operator-confd-files created
configmap/etc-clickhouse-operator-configd-files created
configmap/etc-clickhouse-operator-templatesd-files created
configmap/etc-clickhouse-operator-usersd-files created
deployment.apps/clickhouse-operator created
service/clickhouse-operator-metrics created
After the installation completes, you can check the result with:
kubectl get pod -n kube-system
GitHub also provides documentation on the installation details and how to verify them:
https://github.com/Altinity/clickhouse-operator/blob/master/docs/operator_installation_details.md
kubectl get serviceaccounts -n kube-system|grep clickhouse
kubectl get customresourcedefinitions|grep clickhouse
kubectl get clusterrolebinding|grep clickhouse
kubectl get deployments --namespace kube-system|grep clickhouse
Once these checks pass, the operator installation is essentially done.
2 - Installing ZooKeeper
At the current stage ClickHouse still depends on ZooKeeper.
In big-data scenarios ZooKeeper provides coordination and data-consistency guarantees. In ClickHouse it is mainly used to synchronize data between replicas of replicated tables (the ReplicatedMergeTree engine) and for operations on distributed tables (the Distributed engine).
Let's look at how to install ZooKeeper.
The YAML files for installing ZooKeeper can be found in the GitHub repository under clickhouse-operator-release-0.19.1/deploy/zookeeper/quick-start-persistent-volume.
Create the namespace:
kubectl create namespace zoons
To persist the ZooKeeper data we use Local Persistent Volumes, which represent a local disk attached directly to a worker node.
First create the StorageClass:
kubectl create -f sc.yaml -n zoons
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: zk-local-storage
  annotations:
provisioner: kubernetes.io/no-provisioner
reclaimPolicy: Retain
volumeBindingMode: Immediate
Create the PV:
kubectl create -f zk_volume.yaml -n zoons
apiVersion: v1
kind: PersistentVolume
metadata:
  name: volume-zk01
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: zk-local-storage
  local:
    path: /app/k8soperator/zookeeper/data
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - docker-desktop
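One caveat with local volumes: Kubernetes does not create the directory referenced by local.path for you, so it must already exist on the target node (here the docker-desktop node) before the pod starts, for example:

mkdir -p /app/k8soperator/zookeeper/data

The same applies to the ClickHouse data path used later in this post.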
Modify the volume section of the provided zookeeper-1-node-1GB-for-tests-only.yaml. Since resources are limited here, I install only a single node.
volumeClaimTemplates:
  - metadata:
      name: datadir-volume
    spec:
      storageClassName: zk-local-storage
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
Create ZooKeeper:
kubectl apply -f zookeeper-1-node-1GB-for-tests-only.yaml -n zoons
Check the result:
kubectl get pod -n zoons
NAME READY STATUS RESTARTS AGE
zookeeper-0 0/1 Running 0 6s
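Besides the pod, the quick-start manifests should also create a client service and a headless service in the zoons namespace; the headless service is what gives the pod its stable DNS name, zookeeper-0.zookeepers.zoons, which we will point ClickHouse at later. You can list them with:

kubectl get svc -n zoons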
To remove it later, run:
kubectl delete -f zookeeper-1-node-1GB-for-tests-only.yaml -n zoons
3 - Deploying ClickHouse
With the preparation finally out of the way, we can now actually deploy ClickHouse.
First, set up persistent storage for ClickHouse:
kubectl create -f ck_sc.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ck-local-storage
  annotations:
provisioner: kubernetes.io/no-provisioner
reclaimPolicy: Retain
volumeBindingMode: Immediate
kubectl create -f ck_volume.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: volume-ck01
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ck-local-storage
  local:
    path: /Users/lizu/app/k8soperator/clickhouse/data
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - docker-desktop
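As with the ZooKeeper volume, make sure this local path exists on the node before the pod tries to mount it:

mkdir -p /Users/lizu/app/k8soperator/clickhouse/data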
Next, modify the YAML used to deploy ClickHouse. You can search for ClickHouse server images with
docker search clickhouse-server
We will use the yandex/clickhouse-server:21.7.5 image.
Here I deploy only a single shard with a single replica; adjust this to your own needs.
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
name: "repl-05"
spec:
defaults:
templates:
dataVolumeClaimTemplate: default
podTemplate: clickhouse:21.7.5
configuration:
zookeeper:
nodes:
- host: zookeeper-0.zookeepers.zoons ## 配置zk的信息
port: 2181
clusters:
- name: replicated
layout:
shardsCount: 1
replicasCount: 1 ## 我是单机,所以单分片单副本
templates:
volumeClaimTemplates:
- name: data-storage-vc-template
spec:
storageClassName: ck-local-storage
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 500Mi
podTemplates:
- name: clickhouse:21.7.5
spec:
containers:
- name: clickhouse-pod
image: yandex/clickhouse-server:21.7.5
volumeMounts:
- name: data-storage-vc-template
mountPath: /var/lib/clickhouse
resources:
limits:
cpu: "1"
memory: 2Gi
requests:
cpu: 200m
memory: 1Gi
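For reference, the operator turns the configuration.zookeeper section above into a ClickHouse server configuration fragment, delivered through ConfigMaps mounted into the pod; the exact file layout is an operator implementation detail, but the generated fragment looks roughly like this:

<yandex>
    <zookeeper>
        <node>
            <host>zookeeper-0.zookeepers.zoons</host>
            <port>2181</port>
        </node>
    </zookeeper>
</yandex>

Now apply the manifest: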
kubectl apply -f replication-zookeeper-05-simple-PV.yaml
After applying it, check the status:
kubectl get pod -A|grep chi
default chi-repl-05-replicated-0-0-0 0/1 ContainerCreating 0 8s
kubectl get pod|grep chi
chi-repl-05-replicated-0-0-0 0/1 Running 0 13s
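The operator also creates Services for the installation: typically one per replica plus an installation-wide service (named something like clickhouse-repl-05) exposing the HTTP port 8123 and the native port 9000. You can list them with:

kubectl get svc | grep repl-05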
If the pod does not start properly, these commands help with troubleshooting:
kubectl describe pod chi-repl-05-replicated-0-0-0
kubectl get ClickHouseInstallation
kubectl get StatefulSet
To delete the installation later, run:
kubectl delete -f replication-zookeeper-05-simple-PV.yaml
Finally, let's verify the deployment following the documentation.
Enter the container:
kubectl exec -it chi-repl-05-replicated-0-0-0 bash
Start the client:
clickhouse-client -m
Create the local table:
CREATE TABLE events_local on cluster '{cluster}' (
event_date Date,
event_type Int32,
article_id Int32,
title String
) engine=ReplicatedMergeTree('/clickhouse/{installation}/{cluster}/tables/{shard}/{database}/{table}', '{replica}')
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_type, article_id);
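The {cluster}, {installation}, {shard} and {replica} placeholders in the ZooKeeper path are macros that the operator injects into each pod's configuration, which is why the same DDL works on every replica. You can inspect the values substituted on the current node with:

SELECT * FROM system.macros;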
Create the distributed table:
CREATE TABLE events on cluster '{cluster}' AS events_local ENGINE = Distributed('{cluster}', default, events_local, rand());
Insert some data:
INSERT INTO events SELECT today(), rand()%3, number, 'my title' FROM numbers(100);
Query the data:
SELECT count()
FROM events
Query id: c30dd112-6def-4f4c-b1d2-8044668dc991
┌─count()─┐
│     100 │
└─────────┘
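As a final sanity check that the replicated table is really registered in ZooKeeper, you can look at system.replicas; with a single replica, total_replicas and active_replicas should both be 1:

SELECT database, table, engine, is_leader, total_replicas, active_replicas
FROM system.replicas;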