Clickhouse通过jdbc连接获取Mysql/Oracle数据
引言
Clickhouse连接获取Mysql/Oracle数据的方式主要为jdbc和odbc两种。
- jdbc:
通过jdbc的方式来连接到外部数据库,为实现这连接,需要借助clickhouse-jdbc-bridge这一组件
clickhouse-jdbc-bridge
clickhouse-jdbc-bridge主要充当无状态代理,主要作用为将查询从clickhouse传递到外部数据。通过这一扩展,可在clickhouse上实时跨多个数据源运行分布式查询,从而简化为数据仓储、监控和完整性检查等构件数据管道的过程。
jdbc使用步骤
1. 下载clickhouse-jdbc-bridge插件
# 在线访问 https://github.com/ClickHouse/clickhouse-jdbc-bridge/releases/download/v2.0.2/clickhouse-jdbc-bridge-2.0.2-1.noarch.rpm
# rz上传至虚拟机并安装
yum localinstall -y clickhouse-jdbc-bridge-2.0.2-1.noarch.rpm
2.导入jdbc mysql/oracle依赖
cd /etc/clickhouse-jdbc-bridge
mkdir drivers
cd drivers
# rz上传mysql/oracle jar包
mysql-connector-java-5.1.34.jar oracle-jdbc-14.jar
3. 配置json文件(也可不配置json)
# 进入目标文档
cd /etc/clickhouse-jdbc-bridge/config/datasources
vi mysql.json
说明:
mysql-server为这个bridge的别名;
driverUrl为存放mysql连接jar的地址
driverClassName、jdbcUrl、username、password自行修改
{
"$schema": "../datasource.jschema",
"mysql-server": {
"aliases": [
"self"
],
"driverUrls": [
"/etc/clickhouse-jdbc-bridge/drivers/mysql-connector-java-5.1.34.jar"
],
"driverClassName": "com.mysql.jdbc.Driver",
"jdbcUrl": "jdbc:mysql://0.0.0.0.:3306/test?compress=false",
"username": "root",
"password": "root",
"maximumPoolSize": 5
}
}
oracle.json如下
{
"$schema": "../datasource.jschema",
"oracle-server": {
"aliases": [
"oracle"
],
"driverUrls": [
"/etc/clickhouse-jdbc-bridge/drivers/ojdbc6-11.2.0.3.jar"
],
"driverClassName": "oracle.jdbc.driver.OracleDriver",
"jdbcUrl": "jdbc:oracle:thin:@0.0.0.0:1521:test",
"username": "root",
"password": "root",
"maximumPoolSize": 5
}
}
4. 启动clickhouse-jdbc-bridge
# 另起一虚拟机客户端
clickhouse-jdbc-bridge
5. 操作实例
mysql源数据
-- 创建mysql测试表
CREATE TABLE `tablea` (
`id` int(11) NOT NULL,
`age` int(11) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- 插入数据
INSERT INTO `tablea` values(1,18,a),(2,18,b),(3,18,c),(4,18,d);
clickhouse目标数据
/* 说明
clickhouse建表时的列 和 mysql表中的列 关系为 真包含 关系;即clickhouse中的列必须为mysql中已存在的列,然则会报错(如Test3)。
且得将mysql中的列类型适配为clickhouse的列类型,然则也会出错(如Test4)
JDBC引擎参数为(bridge别名,测试库名,测试表名)
*/
-- Test1 读取mysql表中id,age,name列
-- 创建clickhouse测试表
CREATE TABLE jdbc_table1
(
`id` Int32,
`age` Int32,
`name` String
)
ENGINE = JDBC('mysql-server', 'test', 'tablea')
-- 查看jdbc_table1表数据
SELECT *
FROM jdbc_table1
┌─id─┬─age─┬─name─┐
│ 1 │ 18 │ a │
│ 2 │ 18 │ b │
│ 3 │ 18 │ c │
│ 4 │ 18 │ d │
└────┴─────┴──────┘
4 rows in set. Elapsed: 0.002 sec.
-- 数据读取成功
-- Test2 读取mysql表中id,age列
-- 创建clickhouse测试表
CREATE TABLE jdbc_table2
(
`id` Int32,
`age` Int32
)
ENGINE = JDBC('mysql-server', 'test', 'tablea')
-- 查看jdbc_table2表数据
SELECT *
FROM jdbc_table2
┌─id─┬─age─┐
│ 1 │ 18 │
│ 2 │ 18 │
│ 3 │ 18 │
│ 4 │ 18 │
└────┴─────┘
4 rows in set. Elapsed: 0.002 sec.
-- 数据读取成功
-- Test3 读取mysql表中id,age,name,gender列
-- 创建clickhouse测试表
CREATE TABLE jdbc_table9
(
`id` Int32,
`age` Int32,
`name` String,
`gender` String
)
ENGINE = JDBC('mysql-server', 'test', 'tablea')
-- 查看jdbc_table3表数据
SELECT *
FROM jdbc_table9
0 rows in set. Elapsed: 0.023 sec.
Received exception from server (version 21.2.4):
Code: 86. DB::Exception: Received from localhost:19000. DB::Exception: Received error from remote server /?connection_string=mysql-server&columns=columns%20format%20version%3A%201%0A4%20columns%3A%0A%60id%60%20Int32%0A%60age%60%20Int32%0A%60name%60%20String%0A%60gender%60%20String%0A&max_block_size=65505. HTTP status code: 500 Internal Server Error, body: Failed to access [mysql-server] due to: SQLState(42S22) VendorCode(1054) Unknown column 'gender' in 'field list'.
-- 数据读取失败,显示mysql中并没有gender列
-- Test4 读取mysql表中id,age,name列,列适配错误
-- 创建clickhouse测试表
CREATE TABLE jdbc_table4
(
`id` Int32,
`age` Int32,
`name` Int32
)
ENGINE = JDBC('mysql-server', 'test', 'tablea')
-- 查看jdbc_table4表数据
SELECT *
FROM jdbc_table10
0 rows in set. Elapsed: 0.019 sec.
Received exception from server (version 21.2.4):
Code: 86. DB::Exception: Received from localhost:19000. DB::Exception: Received error from remote server /?connection_string=mysql-server&columns=columns%20format%20version%3A%201%0A3%20columns%3A%0A%60id%60%20Int32%0A%60age%60%20Int32%0A%60name%60%20Int32%0A&max_block_size=65505. HTTP status code: 500 Internal Server Error, body: Failed to query against [mysql-server] due to: SQLState(S1009) VendorCode(0) Invalid value for getInt() - 'a'.
-- 数据读取失败,显示mysql中name列的值不是int
连接读取oracle中数据的区别除了json文件不一样,其余步骤和上述步骤一致
不配置json文件
由于配置json文件太过繁琐,也可不配置json文件,在jdbc引擎里直接指定数据库url driverUrls为jdbc jar包存放位置
cd /etc/clickhouse-jdbc-bridge/config/datasources
# 编辑script.json
# 将文件内容修改为如下内容
{
"script": {
"type": "script",
"driverUrls": [
"/etc/clickhouse-jdbc-bridge/drivers"
]
}
}
# 接着启动clickhouse-jdbc-bridge
clickhouse-jdbc-bridge
# 配置json文件时,JDBC引擎的参数为(bridge别名,测试库名,测试表名)
# 不配置时,JDBC引擎的参数为(url,测试库名,测试表名)
#测试
SELECT *
FROM jdbc('jdbc:mysql://0.0.0.0:3306/?user=root&password=root', 'test', 'tablea')
LIMIT 3
Query id: e206ae6e-92be-4246-b6b9-88eb6223a382
┌─id─┬─age─┬─name─┐
│ 1 │ 18 │ a │
│ 2 │ 18 │ b │
│ 3 │ 18 │ c │
└────┴─────┴──────┘
# 结果与配置json文件的一致
两种方法都可行,可根据实际情况选用
总结
使用clickhouse-jdbc-bridge连接mysql/oracle的基本步骤为
- 启动bridge
- 编写连接数据库的json文件
- 建立适配数据库列类型的表
- 数据获取完成