ClickHouse (十五)解析json数据

3,591 阅读2分钟

背景

使用字符串行式保存在ClickHouse的json数据,需要我们解析提取相关字段,将json行转多列。如提取json数据的各个字段的值,以方便查询。

WITH 
    '[{"name":"天台","tall":100,"model":"M779011"},{"name":"楼顶","tall":90,"model":"M669011"}]' AS new, 
    'S123' AS num
SELECT 
    new, 
    num

┌─new────────────────────────────────────────────────────────────────────────────────────────┬─num──┐
│ [{"name":"天台","tall":100,"model":"M779011"},{"name":"楼顶","tall":90,"model":"M669011"}] │ S123 │
└────────────────────────────────────────────────────────────────────────────────────────────┴──────┘

准备 ClickHouse提供了JSON函数,方便我们操作json数据:

准备 ClickHouse提供了JSON函数,方便我们操作json数据: 

  • visitParamExtractBool(json,name) → 提取json中的name字段,返回UInt8,0或1。 
  • visitParamExtractInt(json,name) →提取json中的name字段,返回Int型的值。 
  • visitParamExtractFloat (json,name)→ 提取json中的name字段,返回Float型的值。 
  • visitParamExtractString (json,name)→提取json中的name字段,返回String型的值。 
  • visitParamExtractRaw (json,name)→ 提取json中的name字段,返回字段的值,包含空格符。 

示例:

SELECT 
    visitParamExtractBool('{"name":true}', 'name') AS bool, 
    visitParamExtractInt('{"name":123}', 'name') AS int, 
    visitParamExtractFloat('{"name":0.1}', 'name') AS float, 
    visitParamExtractString('{"name":"你好"}', 'name') AS str, 
    visitParamExtractRaw('{"name":"你好"}', 'name') AS raw

┌─bool─┬─int─┬─float─┬─str──┬─raw────┐
│    11230.1 │ 你好 │ "你好" │
└──────┴─────┴───────┴──────┴────────┘

测试,解析json数组

使用JSONExtractArrayRaw()函数,将字符串转化为json数组:

SELECT 
    visitParamExtractString(json, 'name') AS name, 
    visitParamExtractInt(json, 'tall') AS tall, 
    visitParamExtractString(json, 'model') AS model, 
    num
FROM 
(
    WITH 
        '[{"name":"天台","tall":100,"model":"M779011"},      {"name":"楼顶","tall":90,"model":"M669011"},      {"name":"秀儿","tall":80,"model":"M559011"}]' AS new, 
        'S123' AS num
    SELECT 
        new, 
        num, 
        JSONExtractArrayRaw(new) AS arr, 
        arrayJoin(arr) AS json
)

┌─name─┬─tall─┬─model───┬─num──┐
│ 天台 │  100 │ M779011 │ S123 │
│ 楼顶 │   90 │ M669011 │ S123 │
│ 秀儿 │   80 │ M559011 │ S123 │
└──────┴──────┴─────────┴──────┘

还可以使用字符截取:

WITH 
    '[{"name":"天台","tall":100,"model":"M779011"},         {"name":"楼顶","tall":90,"model":"M669011"},       {"name":"秀儿","tall":80,"model":"M559011"}]' AS new, 
    replaceAll(replaceAll(new, '[', ''), ']', '') AS out, 
    concat(arrayJoin(splitByString('},', out)), '}') AS json, 
    'S123' AS num
SELECT 
    visitParamExtractString(json, 'name') AS name, 
    visitParamExtractInt(json, 'tall') AS tall, 
    visitParamExtractString(json, 'model') AS model, 
    num

┌─name─┬─tall─┬─model───┬─num──┐
│ 天台 │  100M779011S123 │
│ 楼顶 │   90M669011S123 │
│ 秀儿 │   80M559011S123 │
└──────┴──────┴─────────┴──────┘