从用户角度体验 ClickHouse 快速开始作为一个脱胎于 Yandex，开源的，面向 OLAP（online ana

作为一个脱胎于 Yandex，开源的，面向 OLAP（online analytical processing of queries）的分布式纵列数据库管理系统，ClickHouse 一直活跃在明星数据库的舞台上。

在 2022 年 4 月 26 日的一篇博客 ClickHouse Docs have a new look and feel! 中，Rich Raposa 表示，鉴于文档对帮助用户（开发者、DBA、架构师等）理解和使用产品的重要作用，ClickHouse 的文档做了很多努力在改善现有内容和增加新内容上。除去传统的面向功能的数据库操作手册（面向 DBA）和开发指南（面向数据库开发人员），ClickHouse 重点增加了一系列教程文档，方便开发者和 DBA 快速上手。这些文档包括：

帮助用户快速上手的快速开始和教程文档
一系列帮助用户连接知名 UI/BI 工具的教程，涵盖 Grafana, Metabase, Superset, and Tableau
一系列帮助用户连接外部工具的教程，例如 Kafka, AWS S3, PostgreSQL, MySQL, Airbyte

总体来看，ClickHouse 的新文档重点改善了新用户的上手体验，同时整体提升了文档的可操作性，增加了大量可实操的教程。另外，新文档补充了大量数据库与外部生态结合的内容，这就让技术内容不再局限于数据库本身。这对个人开发者无疑是非常友好的。

实际体验

当然，我们作为文档工程师，不能只看宣传，也要看实际效果。现在我们就跟着 ClickHouse 的教程实际操作一下。

根据官方文档，ClickHouse 兼容多种操作系统和 CPU 架构。

操作系统：

Linux
FreeBSD
macOS

CPU 架构：

x86_64
AArch64
PowerPC64LE

我们选择 Ubuntu 20.04 系统，体验 ClickHouse 的最新教程。

安装并启动 ClickHouse

ClickHouse 官方提供了一个 bash 脚本，帮助你自动化下载和安装。

# 下载 ClickHouse
curl https://clickhouse.com/ | sh

bash 脚本会自动进行下载。

 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1292    0  1292    0     0    548      0 --:--:--  0:00:02 --:--:--   548

Will download https://builds.clickhouse.com/master/amd64/clickhouse

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0 2140M    0 2848k    0     0   525k      0  1:09:28  0:00:05  1:09:23  569k

完成之后，运行以下命令进行安装：

sudo ./clickhouse install

Copying ClickHouse binary to /usr/bin/clickhouse.new
Renaming /usr/bin/clickhouse.new to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-server to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-client to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-local to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-benchmark to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-copier to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-obfuscator to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-git-import to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-compressor to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-format to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-extract-from-config to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-keeper to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-keeper-converter to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-disks to /usr/bin/clickhouse.
Creating clickhouse group if it does not exist.
 groupadd -r clickhouse
Creating clickhouse user if it does not exist.
 useradd -r --shell /bin/false --home-dir /nonexistent -g clickhouse clickhouse
Will set ulimits for clickhouse user in /etc/security/limits.d/clickhouse.conf.
Creating config directory /etc/clickhouse-server.
Creating config directory /etc/clickhouse-server/config.d that is used for tweaks of main server configuration.
Creating config directory /etc/clickhouse-server/users.d that is used for tweaks of users configuration.
Data path configuration override is saved to file /etc/clickhouse-server/config.d/data-paths.xml.
Log path configuration override is saved to file /etc/clickhouse-server/config.d/logger.xml.
User directory path configuration override is saved to file /etc/clickhouse-server/config.d/user-directories.xml.
OpenSSL path configuration override is saved to file /etc/clickhouse-server/config.d/openssl.xml.
Creating log directory /var/log/clickhouse-server.
Creating data directory /var/lib/clickhouse.
Creating pid directory /var/run/clickhouse-server.
 chown -R clickhouse:clickhouse '/var/log/clickhouse-server'
 chown -R clickhouse:clickhouse '/var/run/clickhouse-server'
 chown  clickhouse:clickhouse '/var/lib/clickhouse'
Enter password for default user:

在这里输入密码或直接回车。

ClickHouse has been successfully installed.

Start clickhouse-server with:
 sudo clickhouse start

Start clickhouse-client with:
 clickhouse-client

两条命令搞定安装。再敲一条命令，就来到了传说中的 A-ha moment。

sudo clickhouse start

$ sudo clickhouse start
 chown -R clickhouse: '/var/run/clickhouse-server/'
Will run clickhouse su 'clickhouse' /usr/bin/clickhouse-server --config-file /etc/clickhouse-server/config.xml --pid-file /var/run/clickhouse-server/clickhouse-server.pid --daemon
Waiting for server to start
Waiting for server to start
Server started

数据库服务端启动了。

在 ClickHouse 里 CRUD

教程里提供的 Connect to ClickHouse 部分使用了自带的 HTML 操作界面，可用于 SQL 查询。但由于我使用的 Ubuntu 系统是不带 UI 的，所以只能跳到 The ClickHouse Client 部分使用 CLI 客户端。

感觉这里可以加个说明，告诉没有 UI 的用户直接去 CLI 客户端。体验更顺滑。

运行客户端

运行 CLI 客户端：

clickhouse-client

运行成功

$ clickhouse-client
ClickHouse client version 22.7.1.2100 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 22.7.1 revision 54456.

Warnings:
 * Linux is not using a fast TSC clock source. Performance can be degraded. Check /sys/devices/system/clocksource/clocksource0/current_clocksource
 * Linux threads max count is too low. Check /proc/sys/kernel/threads-max
 * Maximum number of threads is lower than 30000. There could be problems with handling a lot of simultaneous queries.

localhost.localdomain :)

创建数据库

我使用的测试机性能不足，因此报了上述 warning。这里就从教程前面的建库开始：

CREATE DATABASE IF NOT EXISTS helloworld

localhost.localdomain :) CREATE DATABASE IF NOT EXISTS helloworld

CREATE DATABASE IF NOT EXISTS helloworld

Query id: 2807edaa-c084-4501-bb0f-a32701f359ff

Ok.

0 rows in set. Elapsed: 0.006 sec.

创建数据表

接下来建表。可以看到 ClickHouse 里的任何表都需要一个 ENGINE。教程里推荐使用 MergeTree。

CREATE TABLE helloworld.my_first_table
(
    user_id UInt32,
    message String,
    timestamp DateTime,
    metric Float32
)
ENGINE = MergeTree()
PRIMARY KEY (user_id, timestamp)

插入数据

在操作之前，文档专门通过 A Brief Intro to Primary Keys 讲解了 ClickHouse 的一个独特之处，即主键（Primary Key）不是独一无二的！这一点确实和大部分数据库不同。这个讲解看似突兀，实际必要。否则，用户看到下面插入的数据就会产生疑惑。

在产生疑惑之前打消疑惑是非常有效的做法。

插入数据：

INSERT INTO helloworld.my_first_table (user_id, message, timestamp, metric) VALUES
    (101, 'Hello, ClickHouse!',                                 now(),       -1.0    ),
    (102, 'Insert a lot of rows per batch',                     yesterday(), 1.41421 ),
    (102, 'Sort your data based on your commonly-used queries', today(),     2.718   ),
    (101, 'Granules are the smallest chunks of data read',      now() + 5,   3.14159 )

插入成功：

localhost.localdomain :) INSERT INTO helloworld.my_first_table (user_id, message, timestamp, metric) VALUES
                             (101, 'Hello, ClickHouse!',                                 now(),       -1.0    ),
                             (102, 'Insert a lot of rows per batch',                     yesterday(), 1.41421 ),
                             (102, 'Sort your data based on your commonly-used queries', today(),     2.718   ),
                             (101, 'Granules are the smallest chunks of data read',      now() + 5,   3.14159 )

INSERT INTO helloworld.my_first_table (user_id, message, timestamp, metric) FORMAT Values

Query id: 4d376221-2786-4d44-9ef9-129a8f03ddcf

Ok.

4 rows in set. Elapsed: 0.009 sec.

检查插入的数据：

SELECT * FROM helloworld.my_first_table

结果如下：

localhost.localdomain :) SELECT * FROM helloworld.my_first_table

SELECT *
FROM helloworld.my_first_table

Query id: 867f2212-6d8b-43ec-b1ff-efce3c879f73

┌─user_id─┬─message────────────────────────────────────────────┬───────────timestamp─┬──metric─┐
│     101 │ Hello, ClickHouse!                                 │ 2022-07-16 00:19:30 │      -1 │
│     101 │ Granules are the smallest chunks of data read      │ 2022-07-16 00:19:35 │ 3.14159 │
│     102 │ Insert a lot of rows per batch                     │ 2022-07-15 00:00:00 │ 1.41421 │
│     102 │ Sort your data based on your commonly-used queries │ 2022-07-16 00:00:00 │   2.718 │
└─────────┴────────────────────────────────────────────────────┴─────────────────────┴─────────┘

4 rows in set. Elapsed: 0.002 sec.

从 CSV 文件插入数据

新建一个 CSV 文件 data.csv 并输入以下内容：

102,This is data in a file,2022-02-22 10:43:28,123.45
101,It is comma-separated,2022-02-23 00:00:00,456.78
103,Use FORMAT to specify the format,2022-02-21 10:43:30,678.90

将 CSV 的数据插入 helloworld 数据库的 my_first_table 表：

clickhouse-client --query='INSERT INTO helloworld.my_first_table FORMAT CSV' < data.csv

这里官方文档是通过 Web UI 来看数据的，我们通过客户端 CLI 操作一下：

SELECT *
                         FROM helloworld.my_first_table
                         ORDER BY timestamp ASC

结果如下，证明数据插入成功：

SELECT *
FROM helloworld.my_first_table
ORDER BY timestamp ASC

Query id: d7216864-2b85-4ad2-9073-6c0bef7ed0c6

┌─user_id─┬─message────────────────────────────────────────────┬───────────timestamp─┬──metric─┐
│     103 │ Use FORMAT to specify the format                   │ 2022-02-21 10:43:30 │   678.9 │
│     102 │ This is data in a file                             │ 2022-02-22 10:43:28 │  123.45 │
│     101 │ It is comma-separated                              │ 2022-02-23 00:00:00 │  456.78 │
│     102 │ Insert a lot of rows per batch                     │ 2022-07-15 00:00:00 │ 1.41421 │
│     102 │ Sort your data based on your commonly-used queries │ 2022-07-16 00:00:00 │   2.718 │
│     101 │ Hello, ClickHouse!                                 │ 2022-07-16 00:19:30 │      -1 │
│     101 │ Granules are the smallest chunks of data read      │ 2022-07-16 00:19:35 │ 3.14159 │
└─────────┴────────────────────────────────────────────────────┴─────────────────────┴─────────┘

操作体验

整体来说非常顺滑。ClickHouse 本身也做了非常大的努力来提升数据库的易用性和文档的可操作性。对于 ENGINE、Primary Key 等概念，尽量避免了过于复杂的解释，而专注于引导用户按照文档顺利完成任务。

客户端部分，建议使用一个统一的客户端，例如 CLI 走完全程。而不必分成 Web UI 和 CLI 两个部分。