09 | Compound Queries



1. Compound queries in ES

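What is a compound query? A compound query combines multiple query conditions with some logic. Put simply, it groups several filter conditions together into a single query.

The main compound queries are: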

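  • bool query: the default query for combining multiple leaf or compound query clauses as must, should, must_not, or filter clauses.

  • boosting query: returns documents that match the positive query, while lowering the score of documents that also match the negative query.

  • constant_score: wraps another query and executes it in filter context. Every matching document is given the same "constant" _score.

  • dis_max: the best-match query. It accepts multiple queries and returns any document that matches any of the query clauses. Whereas the bool query combines the scores of all matching queries, dis_max uses the score of the single best-matching clause.

  • function_score: modifies the scores returned by the main query with functions, to account for factors such as popularity, recency, distance, or a custom algorithm implemented with a script.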

This article and the next work through these queries with several examples.

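2. Bool query

2.1 Clauses of the bool query

The bool query is the most commonly used compound query. ES returns a document only when it satisfies the combined conditions of the sub-queries, according to the rule of each clause type. A bool query lets you combine clauses freely: must match (must), should match (should), and must not match (must_not).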

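  • must: the clause must match. Only documents that match these queries are returned. It corresponds to a logical AND.
  • should: the clause should match. It corresponds to a logical OR, and a document that matches it scores higher.

If the query already contains a filter or must clause, should only affects scoring: documents that match none of the should conditions are still returned. If the query has neither filter nor must, a document has to match at least one of the should conditions.

  • must_not: the clause must not match. Only documents that do not match these queries are returned.
  • filter: like must, only documents that match the queries under filter are returned. The difference from must is that filter does not score (it has no effect on _score) and only filters. It also corresponds to a logical AND.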
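The matching rules above can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not how Elasticsearch is implemented: the class BoolMatchSketch, the Clause type, and the English stand-in documents are all hypothetical, and a document is modeled as a plain field map.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** In-memory model of how bool clauses decide a match. Not Elasticsearch code. */
final class BoolMatchSketch {

    /** Hypothetical clause type: a predicate over a document modeled as a field map. */
    interface Clause extends Predicate<Map<String, Object>> {}

    static boolean matches(Map<String, Object> doc, List<Clause> must, List<Clause> filter,
                           List<Clause> should, List<Clause> mustNot) {
        // must and filter: every clause has to match (logical AND).
        boolean required = must.stream().allMatch(c -> c.test(doc))
                && filter.stream().allMatch(c -> c.test(doc));
        // must_not: a single matching clause excludes the document.
        boolean excluded = mustNot.stream().anyMatch(c -> c.test(doc));
        // should: at least one match is required only when must and filter are both absent.
        boolean shouldRequired = must.isEmpty() && filter.isEmpty() && !should.isEmpty();
        boolean shouldOk = !shouldRequired || should.stream().anyMatch(c -> c.test(doc));
        return required && !excluded && shouldOk;
    }

    public static void main(String[] args) {
        // English stand-in for a product that does not match the should clause.
        Map<String, Object> battery = Map.of("basePrice", 800.0, "skuName", "batteries");
        Clause priceIs800 = d -> Double.valueOf(800.0).equals(d.get("basePrice"));
        Clause nameHasBook = d -> String.valueOf(d.get("skuName")).contains("book");
        List<Clause> none = List.of();

        // With must present, a failed should clause does not exclude the document.
        System.out.println(matches(battery, List.of(priceIs800), none, List.of(nameHasBook), none)); // prints true
        // Without must or filter, at least one should clause has to match.
        System.out.println(matches(battery, none, none, List.of(nameHasBook), none)); // prints false
    }
}
```

The sketch covers only whether a document is returned. Scoring is a separate concern: must and should clauses contribute to _score, while filter and must_not do not.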

2.2 Bool queries in the DSL

  • must query

The requirement: find the products whose base price is exactly 800.

GET /goods_item_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "basePrice": {
              "value": "800.00"
            }
          }
        }
      ]
    }
  }
}

term is the query condition; it matches the exact value of a field.

The command above returns the products whose basePrice must equal 800.00.

  • must_not

Next, use must_not to return the products that must not match the condition.

GET /goods_item_index/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "term": {
            "basePrice": {
              "value": "800.00"
            }
          }
        }
      ]
    }
  }
}

  • should query

Now find the products whose basePrice equals 800.00 and whose skuName should equal "书".

GET /goods_item_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "basePrice": {
              "value": "800.00"
            }
          }
        }
      ],
      "should": [
        {"term": {
          "skuName": {
            "value": "书"
          }
        }}
      ]
    }
  }
}

The results are as follows:

"hits": [
      {
        "_index": "goods_item_index",
        "_id": "96788",
        "_score": 6.0290394,
        "_source": {
          "id": 96788,
          "category": "10",
          "basePrice": 800,
          "marketPrice": 650.09,
          "stockNum": 3481,
          "skuImgUrl": "https://lorempixel.com/1920/1200/nature/",
          "skuId": "74050301848",
          "skuName": "书架",
          "createTime": "1993-10-27 07:46:37",
          "updateTime": "1985-08-22 06:20:54"
        }
      },
      {
        "_index": "goods_item_index",
        "_id": "81113",
        "_score": 1,
        "_source": {
          "id": 81113,
          "category": "3",
          "basePrice": 800,
          "marketPrice": 295.82,
          "stockNum": 727,
          "skuImgUrl": "https://lorempixel.com/g/1680/1050/city/",
          "skuId": "35020730408",
          "skuName": "各种型号电池",
          "createTime": "1995-05-04 15:53:58",
          "updateTime": "1997-02-22 10:02:17"
        }
      }
      ...
      ...
      ...
  }

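Only part of the result is shown. The hit whose skuName contains "书" has a higher _score, while the hits that do not contain it all score 1.

  • filter query: not demonstrated again here, since the previous section already includes an example.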
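The score gap follows from how bool combines scores: the scores of the matching must and should clauses are added together, while filter and must_not contribute nothing. A minimal sketch of that arithmetic follows. The class BoolScoreSketch is hypothetical, and the split of 6.0290394 into 1 (must) plus 5.0290394 (should) is inferred from the hits above rather than reported by ES.

```java
import java.util.List;

/** Arithmetic sketch of bool scoring. Not Elasticsearch code. */
final class BoolScoreSketch {

    /** Adds up the scores of the matching must and should clauses. */
    static double boolScore(List<Double> mustScores, List<Double> matchedShouldScores) {
        double score = 0.0;
        for (double s : mustScores) {
            score += s;
        }
        for (double s : matchedShouldScores) {
            score += s;
        }
        return score;
    }

    public static void main(String[] args) {
        // Hit 81113: only the must clause matches, so the score stays at 1.
        System.out.println(boolScore(List.of(1.0), List.of()));
        // Hit 96788: the should clause also matches; 5.0290394 is an inferred value.
        System.out.println(boolScore(List.of(1.0), List.of(5.0290394)));
    }
}
```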

2.3 Bool query with the Java Client

Add a bool query method to the service interface:


    /**
     * Bool query
     * @param basePrice base price
     * @param skuName sku name
     * @return the matching products
     */
    List<GoodsItemRep> boolQuery(String basePrice, String skuName);

Implementation:

    @Override
    public List<GoodsItemRep> boolQuery(String basePrice, String skuName) {
        List<GoodsItemRep> goodsItemReps = new ArrayList<>();

        // Combine the query conditions
        Query query = BoolQuery.of(b -> b
                .must(q -> q.term(t -> t.field("basePrice").value(basePrice)))   // must: required match
                .should(q -> q.term(t -> t.field("skuName").value(skuName)))     // should: optional, raises the score
        )._toQuery();

        List<GoodsItem> result = client.search(index, GoodsItem.class, query);

        // Copy each hit into the response type
        result.forEach(goodsItem -> {
            GoodsItemRep goodsItemRep = new GoodsItemRep();
            BeanUtils.copyProperties(goodsItem, goodsItemRep);
            goodsItemReps.add(goodsItemRep);
        });

        return goodsItemReps;
    }

Call the endpoint from Postman, or with the equivalent curl command:

curl --location --request POST 'http://localhost:8089/goods/boolQuery?basePrice=800&skuName=书'

Result:

image-20221126171037556.png

3. Boosting query

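3.1 What is a boosting query?

In practice, whatever database you store your data in, you will run into scenarios where query results have to be ordered so that the results you most want to see come first and everything else comes last. This is where the boosting query comes in. ES uses relevance to express how well a result matches a query. A boosting query lowers the score of the documents you would rather not see, so that better-matching documents rank first and weaker matches rank last.

3.2 Boosting query in the DSL

Let's walk through a demonstration. First insert a few documents into the index for testing: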

POST /goods_item_index/_bulk
{"index":{"_id":1000002}}
{"id":1000002,"category":"8","basePrice":778.35,"marketPrice":53.66,"stockNum":3303,"skuImgUrl":"tmall","skuId":"13242377281","skuName":"apple phone","createTime":"1994-03-31 07:03:44","updateTime":"1988-09-10 00:45:46"}
{"index":{"_id":1000003}}
{"id":1000003,"category":"8","basePrice":778.35,"marketPrice":53.66,"stockNum":3303,"skuImgUrl":"taobao","skuId":"13242377281","skuName":"apple phone","createTime":"1994-03-31 07:03:44","updateTime":"1988-09-10 00:45:46"}
{"index":{"_id":1000004}}
{"id":1000004,"category":"8","basePrice":778.35,"marketPrice":53.66,"stockNum":3303,"skuImgUrl":"jd","skuId":"13242377281","skuName":"apple phone","createTime":"1994-03-31 07:03:44","updateTime":"1988-09-10 00:45:46"}

First run a simple query with the DSL in Kibana:

GET /goods_item_index/_search
{
  "query": {
    "match": {
      "skuName": "apple"
    }
  }
}

The results are as follows. The scores of the documents are all nearly the same:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 27.226557,
    "hits": [
      {
        "_index": "goods_item_index",
        "_id": "1000002",
        "_score": 27.226557,
        "_source": {
          "id": 1000002,
          "category": "8",
          "basePrice": 778.35,
          "marketPrice": 53.66,
          "stockNum": 3303,
          "skuImgUrl": "tmall",
          "skuId": "13242377281",
          "skuName": "apple phone",
          "createTime": "1994-03-31 07:03:44",
          "updateTime": "1988-09-10 00:45:46"
        }
      },
      {
        "_index": "goods_item_index",
        "_id": "1000003",
        "_score": 27.099957,
        "_source": {
          "id": 1000003,
          "category": "8",
          "basePrice": 778.35,
          "marketPrice": 53.66,
          "stockNum": 3303,
          "skuImgUrl": "taobao",
          "skuId": "13242377281",
          "skuName": "apple phone",
          "createTime": "1994-03-31 07:03:44",
          "updateTime": "1988-09-10 00:45:46"
        }
      },
      {
        "_index": "goods_item_index",
        "_id": "1000004",
        "_score": 27.096817,
        "_source": {
          "id": 1000004,
          "category": "8",
          "basePrice": 778.35,
          "marketPrice": 53.66,
          "stockNum": 3303,
          "skuImgUrl": "jd",
          "skuId": "13242377281",
          "skuName": "apple phone",
          "createTime": "1994-03-31 07:03:44",
          "updateTime": "1988-09-10 00:45:46"
        }
      }
    ]
  }
}

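Next we want to query for skuName "apple phone" while pushing the documents whose skuImgUrl is tmall to the back. The DSL is shown below: positive holds the query whose matches should rank first, negative holds the query whose matches should rank last, and negative_boost is the weight, a number between 0 and 1, that multiplies the score of the documents matching negative.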

GET /goods_item_index/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": {
          "skuName": "apple phone"
        }
      },
      "negative": {
        "match": {
          "skuImgUrl": "tmall"
        }
      },
      "negative_boost": 0.5
    }
  }
}

Run the query again. In the results below, the score of the last hit is clearly much lower, close to half of the others.

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 27.677588,
    "hits": [
      {
        "_index": "goods_item_index",
        "_id": "1000004",
        "_score": 27.677588,
        "_source": {
          "id": 1000004,
          "category": "8",
          "basePrice": 778.35,
          "marketPrice": 53.66,
          "stockNum": 3303,
          "skuImgUrl": "jd",
          "skuId": "13242377281",
          "skuName": "apple phone",
          "createTime": "1994-03-31 07:03:44",
          "updateTime": "1988-09-10 00:45:46"
        }
      },
      {
        "_index": "goods_item_index",
        "_id": "1000003",
        "_score": 27.457375,
        "_source": {
          "id": 1000003,
          "category": "8",
          "basePrice": 778.35,
          "marketPrice": 53.66,
          "stockNum": 3303,
          "skuImgUrl": "taobao",
          "skuId": "13242377281",
          "skuName": "apple phone",
          "createTime": "1994-03-31 07:03:44",
          "updateTime": "1988-09-10 00:45:46"
        }
      },
      {
        "_index": "goods_item_index",
        "_id": "1000002",
        "_score": 13.582527,
        "_source": {
          "id": 1000002,
          "category": "8",
          "basePrice": 778.35,
          "marketPrice": 53.66,
          "stockNum": 3303,
          "skuImgUrl": "tmall",
          "skuId": "13242377281",
          "skuName": "apple phone",
          "createTime": "1994-03-31 07:03:44",
          "updateTime": "1988-09-10 00:45:46"
        }
      }
    ]
  }
}
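The halved score follows from the boosting rule: a document that matches the negative query keeps its positive score multiplied by negative_boost, and every other document keeps its positive score unchanged. A minimal sketch of that re-ranking follows. The class BoostingSketch and its methods are hypothetical, and the unboosted score 27.165054 for document 1000002 is inferred as 13.582527 / 0.5 rather than taken from ES output.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Arithmetic sketch of boosting-query scoring. Not Elasticsearch code. */
final class BoostingSketch {

    /** Scales the positive score by negativeBoost only when the negative query matched. */
    static double boostedScore(double positiveScore, boolean matchesNegative, double negativeBoost) {
        return matchesNegative ? positiveScore * negativeBoost : positiveScore;
    }

    /** Returns document ids ordered by final score, highest first. */
    static List<String> rank(Map<String, Double> positiveScores, Set<String> negativeIds,
                             double negativeBoost) {
        List<String> ids = new ArrayList<>(positiveScores.keySet());
        ids.sort(Comparator.comparingDouble((String id) ->
                boostedScore(positiveScores.get(id), negativeIds.contains(id), negativeBoost))
                .reversed());
        return ids;
    }

    public static void main(String[] args) {
        // Positive scores; the value for 1000002 is inferred, the other two come from the hits above.
        Map<String, Double> positive = Map.of(
                "1000002", 27.165054,
                "1000003", 27.457375,
                "1000004", 27.677588);
        // Only 1000002 has skuImgUrl "tmall", so only it matches the negative query.
        System.out.println(rank(positive, Set.of("1000002"), 0.5));
        System.out.println(boostedScore(27.165054, true, 0.5));
    }
}
```

With negative_boost set to 1 the ordering would depend on the positive scores alone, and values closer to 0 push the negative matches further down.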

3.3 Boosting query with the Java Client

Now add a boosting query method to the service interface:

    /**
     * Boosting query
     * @param skuImgUrl image url; documents matching it are ranked lower
     * @param skuName sku name
     * @return the matching products
     */
    List<GoodsItemRep> boostingQuery(String skuImgUrl, String skuName);

Implementation:

    @Override
    public List<GoodsItemRep> boostingQuery(String skuImgUrl, String skuName) {
        List<GoodsItemRep> goodsItemReps = new ArrayList<>();

        // Combine the query conditions
        Query query = BoostingQuery.of(b -> b
                .positive(q -> q.match(m -> m.field("skuName").query(skuName)))       // matches rank first
                .negative(q -> q.match(m -> m.field("skuImgUrl").query(skuImgUrl)))   // matches are demoted
                .negativeBoost(0.5))._toQuery();

        List<GoodsItem> result = client.search(index, GoodsItem.class, query);

        // Copy each hit into the response type
        result.forEach(goodsItem -> {
            GoodsItemRep goodsItemRep = new GoodsItemRep();
            BeanUtils.copyProperties(goodsItem, goodsItemRep);
            goodsItemReps.add(goodsItemRep);
        });

        return goodsItemReps;
    }

Then call the endpoint with Postman or a similar tool:

curl --location --request POST 'http://localhost:8089/goods/boostingQuery?skuImgUrl=tmall&skuName=apple%20phone'

Result:

image-20221126201327229.png

Due to space constraints, the remaining three compound queries are demonstrated in the next section.