PostgreSQL jsonpath使用实践

620 阅读3分钟

本文已参与「新人创作礼」活动,一起开启掘金创作之路


jsonpath是用来解析json数据的工具,类似于xpath,jsonpath可以解析十分复杂的json数据。

PostgreSQL json发展历史:

PostgreSQL从9.2开始就支持json数据类型,但是由于解析json数据的性能很差,导致并不受大家青睐,而是选择使用nosql数据库代替。于是从pg9.4开始支持了jsonb数据类型,相较于json类型,jsonb由于并不需要每次使用时都去进行解析,因此性能提升很多,都是还支持索引查询等。

而从pg12开始对于json的支持更加强大:sql 2016的sql/json标准有15条, PG 12 实现了14条, 远远超过oracle(18c 11/15), mysql(8.0.4 5/15), sqlserver(2017 2/15)最新版本。

同时在pg12中引入了jsonpath类型,以及一系列相关的函数,使得json数据的查询性能更进一步,功能也愈发强大。

JSONPATH语法:

JSONpath 函数表达式语法如下:

  • 点号 . 表示引用 Json 数据的元素
  • 方括号 [] 表示引用数组元素
  • Json 数据中的数组元素下标从0开始

JSONpath中的变量如下:

  1. $ 符号表示要查询的Json文本的变量
  2. $varname 表示指定变量
  3. @ 指在 filter 表达式中表示当前路径元素的变量

JSONPATH使用举例:

简单查询:

bill@bill=>SELECT jsonb_path_query_array('[1,2,3,4,5]', '$[*] ? (@ > 3)');
 jsonb_path_query_array
------------------------
 [4, 5]
(1 row)

创建测试表:

CREATE TABLE house(js jsonb);
INSERT INTO house VALUES
('{
	"address": {
		   "city":"Moscow",
		   "street": "Ulyanova, 7A"
	},
	"lift": false,
	"floor": [
		 {
			"level": 1,
		 	"apt": [
		       	       {"no": 1, "area": 40, "rooms": 1},
		       	       {"no": 2, "area": 80, "rooms": 3},
		       	       {"no": 3, "area": 50, "rooms": 2}
			]
		},
		{
			"level": 2,
			"apt": [
		       	       {"no": 4, "area": 100, "rooms": 3},
		       	       {"no": 5, "area": 60, "rooms": 2}
			]
		}
	]
}');

查询:

bill@bill=>select jsonb_pretty(js) from house ;
           jsonb_pretty
----------------------------------
 {                               +
     "lift": false,              +
     "floor": [                  +
         {                       +
             "apt": [            +
                 {               +
                     "no": 1,    +
                     "area": 40, +
                     "rooms": 1  +
                 },              +
                 {               +
                     "no": 2,    +
                     "area": 80, +
                     "rooms": 3  +
                 },              +
                 {               +
                     "no": 3,    +
                     "area": 50, +
                     "rooms": 2  +
                 }               +
             ],                  +
             "level": 1          +
         },                      +
         {                       +
             "apt": [            +
                 {               +
                     "no": 4,    +
                     "area": 100,+
                     "rooms": 3  +
                 },              +
                 {               +
                     "no": 5,    +
                     "area": 60, +
                     "rooms": 2  +
                 }               +
             ],                  +
             "level": 2          +
         }                       +
     ],                          +
     "address": {                +
         "city": "Moscow",       +
         "street": "Ulyanova, 7A"+
     }                           +
 }
(1 row)

该数据的层次结构如下图:
在这里插入图片描述
看上去该数据层次挺复杂的,但是实际中可能数据层次远比这个复杂的多,那么我们看看如何使用jsonpath来进行查询的:

bill@bill=>SELECT jsonb_path_query_array(js, '$.floor[0, 1].apt[1 to last]')  FROM house;
                                          jsonb_path_query_array
-----------------------------------------------------------------------------------------------------------
 [{"no": 2, "area": 80, "rooms": 3}, {"no": 3, "area": 50, "rooms": 2}, {"no": 5, "area": 60, "rooms": 2}]
(1 row)

而如果不用jsonpath的话,我们可能需要这么写:

bill@bill=>SELECT jsonb_agg(apt) FROM (
bill(#         SELECT apt->generate_series(1, jsonb_array_length(apt) - 1) FROM (
bill(#              SELECT js->'floor'->unnest(array[0, 1])->'apt' FROM house
bill(#         ) apts(apt)
bill(# ) apts(apt);
                                                 jsonb_agg
-----------------------------------------------------------------------------------------------------------
 [{"no": 2, "area": 80, "rooms": 3}, {"no": 3, "area": 50, "rooms": 2}, {"no": 5, "area": 60, "rooms": 2}]
(1 row)

相比之下,使用jsonpath相关的函数查询简便太多了。

又比如,我们需要判断json数据中是否包含某个值,可以这样:

bill@bill=>SELECT jsonb_path_exists(js, '$.** ? (@ == "Moscow")') FROM house;
 jsonb_path_exists
-------------------
 t
(1 row)

而如果不使用jsonpath呢?

bill@bill=>WITH RECURSIVE t(value) AS (
bill(#       SELECT * FROM house UNION ALL (
bill(#             SELECT COALESCE(kv.value, e.value) AS value
bill(#             FROM t
bill(#             LEFT JOIN LATERAL jsonb_each (
bill(#                 CASE WHEN jsonb_typeof(t.value) = 'object' THEN t.value
bill(#                 ELSE NULL END
bill(#             ) kv ON true
bill(#             LEFT JOIN LATERAL jsonb_array_elements (
bill(#                 CASE WHEN jsonb_typeof(t.value) = 'array' THEN t.value
bill(#                 ELSE NULL END
bill(#             ) e ON true
bill(#             WHERE kv.value IS NOT NULL OR e.value IS NOT NULL
bill(#       )
bill(# ) SELECT EXISTS (SELECT 1 FROM t WHERE value = '"Moscow"');
 exists
--------
 t
(1 row)

那么如何使用jsonpath进行数据过滤呢?已上面这张表为例,我们查询apt.no大于3的数据:

bill@bill=>SELECT jsonb_path_query(js, '$.floor.apt.no ? (@>3)') FROM house;
 jsonb_path_query
------------------
 4
 5
(2 rows)

同样,jsonpath也支持索引的使用:

bill@bill=>CREATE INDEX ON house USING gin (js);
CREATE INDEX
bill@bill=>SET ENABLE_SEQSCAN TO OFF;
SET
bill@bill=>EXPLAIN (COSTS OFF) SELECT * FROM house
bill-#       WHERE js @? '$.floor[*].apt[*] ? (@.rooms == 3)';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Bitmap Heap Scan on house
   Recheck Cond: (js @? '$."floor"[*]."apt"[*]?(@."rooms" == 3)'::jsonpath)
   ->  Bitmap Index Scan on house_js_idx
         Index Cond: (js @? '$."floor"[*]."apt"[*]?(@."rooms" == 3)'::jsonpath)
(4 rows)

除此之外,jsonpath支持了20多种相关的函数,是不是十分强大,赶快用起来吧!

postgres=# \df *.*json*path*
                                                                        List of functions
   Schema   |             Name             | Result data type |                                    Argument data types
 | Type
------------+------------------------------+------------------+------------------------------------------------------------------------------------------
-+------
 pg_catalog | gin_consistent_jsonb_path    | boolean          | internal, smallint, jsonb, integer, internal, internal, internal, internal
 | func
 pg_catalog | gin_extract_jsonb_path       | internal         | jsonb, internal, internal
 | func
 pg_catalog | gin_extract_jsonb_query_path | internal         | jsonb, internal, smallint, internal, internal, internal, internal
 | func
 pg_catalog | gin_triconsistent_jsonb_path | "char"           | internal, smallint, jsonb, integer, internal, internal, internal
 | func
 pg_catalog | json_extract_path            | json             | from_json json, VARIADIC path_elems text[]
 | func
 pg_catalog | json_extract_path_text       | text             | from_json json, VARIADIC path_elems text[]
 | func
 pg_catalog | jsonb_delete_path            | jsonb            | jsonb, text[]
 | func
 pg_catalog | jsonb_extract_path           | jsonb            | from_json jsonb, VARIADIC path_elems text[]
 | func
 pg_catalog | jsonb_extract_path_text      | text             | from_json jsonb, VARIADIC path_elems text[]
 | func
 pg_catalog | jsonb_path_exists            | boolean          | target jsonb, path jsonpath, vars jsonb DEFAULT '{}'::jsonb, silent boolean DEFAULT false
 | func
 pg_catalog | jsonb_path_exists_opr        | boolean          | jsonb, jsonpath
 | func
 pg_catalog | jsonb_path_exists_tz         | boolean          | target jsonb, path jsonpath, vars jsonb DEFAULT '{}'::jsonb, silent boolean DEFAULT false
 | func
 pg_catalog | jsonb_path_match             | boolean          | target jsonb, path jsonpath, vars jsonb DEFAULT '{}'::jsonb, silent boolean DEFAULT false
 | func
 pg_catalog | jsonb_path_match_opr         | boolean          | jsonb, jsonpath
 | func
 pg_catalog | jsonb_path_match_tz          | boolean          | target jsonb, path jsonpath, vars jsonb DEFAULT '{}'::jsonb, silent boolean DEFAULT false
 | func
 pg_catalog | jsonb_path_query             | SETOF jsonb      | target jsonb, path jsonpath, vars jsonb DEFAULT '{}'::jsonb, silent boolean DEFAULT false
 | func
 pg_catalog | jsonb_path_query_array       | jsonb            | target jsonb, path jsonpath, vars jsonb DEFAULT '{}'::jsonb, silent boolean DEFAULT false
 | func
 pg_catalog | jsonb_path_query_array_tz    | jsonb            | target jsonb, path jsonpath, vars jsonb DEFAULT '{}'::jsonb, silent boolean DEFAULT false
 | func
 pg_catalog | jsonb_path_query_first       | jsonb            | target jsonb, path jsonpath, vars jsonb DEFAULT '{}'::jsonb, silent boolean DEFAULT false
 | func
 pg_catalog | jsonb_path_query_first_tz    | jsonb            | target jsonb, path jsonpath, vars jsonb DEFAULT '{}'::jsonb, silent boolean DEFAULT false
 | func
 pg_catalog | jsonb_path_query_tz          | SETOF jsonb      | target jsonb, path jsonpath, vars jsonb DEFAULT '{}'::jsonb, silent boolean DEFAULT false
 | func
 pg_catalog | jsonpath_in                  | jsonpath         | cstring
 | func
 pg_catalog | jsonpath_out                 | cstring          | jsonpath
 | func
 pg_catalog | jsonpath_recv                | jsonpath         | internal
 | func
 pg_catalog | jsonpath_send                | bytea            | jsonpath
 | func
(25 rows)

参考链接:
github.com/digoal/blog…
www.postgres.cn/docs/12/fun…
www.postgres.cn/docs/12/dat…