Clickhouse通过jdbc连接获取Mysql/Oracle数据

208 阅读4分钟

Clickhouse通过jdbc连接获取Mysql/Oracle数据

引言

Clickhouse连接获取Mysql/Oracle数据的方式主要为jdbc和odbc两种。

  • jdbc:

​ 通过jdbc的方式来连接到外部数据库,为实现这连接,需要借助clickhouse-jdbc-bridge这一组件

clickhouse-jdbc-bridge

​ clickhouse-jdbc-bridge主要充当无状态代理,主要作用为将查询从clickhouse传递到外部数据。通过这一扩展,可在clickhouse上实时跨多个数据源运行分布式查询,从而简化为数据仓储、监控和完整性检查等构件数据管道的过程。


jdbc使用步骤

1. 下载clickhouse-jdbc-bridge插件

# 在线访问 https://github.com/ClickHouse/clickhouse-jdbc-bridge/releases/download/v2.0.2/clickhouse-jdbc-bridge-2.0.2-1.noarch.rpm

# rz上传至虚拟机并安装
yum localinstall -y clickhouse-jdbc-bridge-2.0.2-1.noarch.rpm

2.导入jdbc mysql/oracle依赖

cd /etc/clickhouse-jdbc-bridge
mkdir drivers
cd drivers

# rz上传mysql/oracle jar包
mysql-connector-java-5.1.34.jar  oracle-jdbc-14.jar

3. 配置json文件(也可不配置json)

# 进入目标文档
cd /etc/clickhouse-jdbc-bridge/config/datasources

vi mysql.json

说明:

mysql-server为这个bridge的别名;

driverUrl为存放mysql连接jar的地址

driverClassName、jdbcUrl、username、password自行修改

{
  "$schema": "../datasource.jschema",
  "mysql-server": {
    "aliases": [
      "self"
    ],
    "driverUrls": [
      "/etc/clickhouse-jdbc-bridge/drivers/mysql-connector-java-5.1.34.jar"
    ],
    "driverClassName": "com.mysql.jdbc.Driver",
    "jdbcUrl": "jdbc:mysql://0.0.0.0.:3306/test?compress=false",
    "username": "root",
    "password": "root",
    "maximumPoolSize": 5
  }
}

oracle.json如下

{
  "$schema": "../datasource.jschema",
  "oracle-server": {
    "aliases": [
      "oracle"
    ],
    "driverUrls": [
      "/etc/clickhouse-jdbc-bridge/drivers/ojdbc6-11.2.0.3.jar"
    ],
    "driverClassName": "oracle.jdbc.driver.OracleDriver",
    "jdbcUrl": "jdbc:oracle:thin:@0.0.0.0:1521:test",
    "username": "root",
    "password": "root",
    "maximumPoolSize": 5
  }
}

4. 启动clickhouse-jdbc-bridge

# 另起一虚拟机客户端
clickhouse-jdbc-bridge

5. 操作实例

mysql源数据

-- 创建mysql测试表
CREATE TABLE `tablea` (
  `id` int(11) NOT NULL,
  `age` int(11) DEFAULT NULL,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- 插入数据
INSERT INTO `tablea` values(1,18,a),(2,18,b),(3,18,c),(4,18,d);

clickhouse目标数据

/* 说明
clickhouse建表时的列 和 mysql表中的列 关系为 真包含 关系;即clickhouse中的列必须为mysql中已存在的列,然则会报错(如Test3)。
且得将mysql中的列类型适配为clickhouse的列类型,然则也会出错(如Test4)
JDBC引擎参数为(bridge别名,测试库名,测试表名)
*/

-- Test1 读取mysql表中id,age,name列
-- 创建clickhouse测试表
CREATE TABLE jdbc_table1
(
    `id` Int32,
    `age` Int32,
    `name` String
)
ENGINE = JDBC('mysql-server', 'test', 'tablea') 

-- 查看jdbc_table1表数据
SELECT *
FROM jdbc_table1

┌─id─┬─age─┬─name─┐
│  118 │ a    │
│  218 │ b    │
│  318 │ c    │
│  418 │ d    │
└────┴─────┴──────┘

4 rows in set. Elapsed: 0.002 sec. 
-- 数据读取成功


-- Test2 读取mysql表中id,age列
-- 创建clickhouse测试表
CREATE TABLE jdbc_table2
(
    `id` Int32,
    `age` Int32
)
ENGINE = JDBC('mysql-server', 'test', 'tablea') 

-- 查看jdbc_table2表数据
SELECT *
FROM jdbc_table2

┌─id─┬─age─┐
│  118 │
│  218 │
│  318 │
│  418 │
└────┴─────┘

4 rows in set. Elapsed: 0.002 sec. 
-- 数据读取成功


-- Test3 读取mysql表中id,age,name,gender列
-- 创建clickhouse测试表
CREATE TABLE jdbc_table9
(
    `id` Int32,
    `age` Int32,
    `name` String,
    `gender` String
)
ENGINE = JDBC('mysql-server', 'test', 'tablea') 

-- 查看jdbc_table3表数据
SELECT *
FROM jdbc_table9

0 rows in set. Elapsed: 0.023 sec. 

Received exception from server (version 21.2.4):
Code: 86. DB::Exception: Received from localhost:19000. DB::Exception: Received error from remote server /?connection_string=mysql-server&columns=columns%20format%20version%3A%201%0A4%20columns%3A%0A%60id%60%20Int32%0A%60age%60%20Int32%0A%60name%60%20String%0A%60gender%60%20String%0A&max_block_size=65505. HTTP status code: 500 Internal Server Error, body: Failed to access [mysql-server] due to: SQLState(42S22) VendorCode(1054) Unknown column 'gender' in 'field list'. 
-- 数据读取失败,显示mysql中并没有gender列


-- Test4 读取mysql表中id,age,name列,列适配错误
-- 创建clickhouse测试表
CREATE TABLE jdbc_table4
(
    `id` Int32,
    `age` Int32,
    `name` Int32
)
ENGINE = JDBC('mysql-server', 'test', 'tablea') 

-- 查看jdbc_table4表数据
SELECT *
FROM jdbc_table10

0 rows in set. Elapsed: 0.019 sec. 

Received exception from server (version 21.2.4):
Code: 86. DB::Exception: Received from localhost:19000. DB::Exception: Received error from remote server /?connection_string=mysql-server&columns=columns%20format%20version%3A%201%0A3%20columns%3A%0A%60id%60%20Int32%0A%60age%60%20Int32%0A%60name%60%20Int32%0A&max_block_size=65505. HTTP status code: 500 Internal Server Error, body: Failed to query against [mysql-server] due to: SQLState(S1009) VendorCode(0) Invalid value for getInt() - 'a'. 
-- 数据读取失败,显示mysql中name列的值不是int

连接读取oracle中数据的区别除了json文件不一样,其余步骤和上述步骤一致

不配置json文件

由于配置json文件太过繁琐,也可不配置json文件,在jdbc引擎里直接指定数据库url driverUrls为jdbc jar包存放位置

cd /etc/clickhouse-jdbc-bridge/config/datasources
# 编辑script.json
# 将文件内容修改为如下内容
{
        "script": {
                "type": "script",
                "driverUrls": [
                        "/etc/clickhouse-jdbc-bridge/drivers"
                ]
        }
}

# 接着启动clickhouse-jdbc-bridge
clickhouse-jdbc-bridge

# 配置json文件时,JDBC引擎的参数为(bridge别名,测试库名,测试表名)
# 不配置时,JDBC引擎的参数为(url,测试库名,测试表名)
#测试
SELECT *
FROM jdbc('jdbc:mysql://0.0.0.0:3306/?user=root&password=root', 'test', 'tablea')
LIMIT 3

Query id: e206ae6e-92be-4246-b6b9-88eb6223a382

┌─id─┬─age─┬─name─┐
│  1 │  18 │ a    │
│  2 │  18 │ b    │
│  3 │  18 │ c    │
└────┴─────┴──────┘

# 结果与配置json文件的一致

两种方法都可行,可根据实际情况选用


总结

使用clickhouse-jdbc-bridge连接mysql/oracle的基本步骤为

  1. 启动bridge
  2. 编写连接数据库的json文件
  3. 建立适配数据库列类型的表
  4. 数据获取完成