Elasticsearch: must_not does not work as expected inside a nested query

Background

Why nested fields exist

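Elasticsearch stores ordinary fields in flattened form. For example, when the document below is indexed, the comments field is automatically mapped as type object.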

PUT /my_index/blogpost/1
{
  "title": "Nest eggs",
  "body":    "Making your money work...",
  "tags":    [ "cash", "shares" ],
  "comments": [ 
    {
      "name":        "John Smith",
      "comment": "Great article",
      "age":          28,
      "stars":      4,
      "date":        "2014-09-01"
    },
    {
      "name":        "Alice White",
      "comment": "More like this please",
      "age":          31,
      "stars":      5,
      "date":        "2014-10-22"
    }
  ]
}

Internally, the document is stored as flat JSON key-value pairs:

{
    "title":            [ eggs, nest ],
    "body":             [ making, money, work, your ],
    "tags":             [ cash, shares ],
    "comments.name":    [ alice, john, smith, white ],
    "comments.comment": [ article, great, like, more, please, this ],
    "comments.age":     [ 28, 31 ],
    "comments.stars":   [ 4, 5 ],
    "comments.date":    [ 2014-09-01, 2014-10-22 ]
}

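If we now search for a comment whose name is Alice and whose age is 28, the document above is still hit and returned. Once the data has been flattened, the association between the fields of each comment (name, age, stars and date) is lost. Nested objects exist to solve exactly this problem.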

GET /_search
{
    "query": {
        "bool": {
            "must": [
                { "match": { "comments.name": "Alice" }},
                { "match": { "comments.age":  28      }} // Alice is actually 31, not 28!
            ]
        }
    }
}
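To make the loss of association concrete, here is a minimal Python sketch of the flattening step. It is only a toy model, not how Lucene actually indexes terms; the `flatten` helper and the trimmed-down document are assumptions made for illustration.

```python
# Toy model of how an object (non-nested) array field is flattened.
# Illustration only: real indexing also analyzes text into terms.

def flatten(doc, prefix=""):
    """Collect every value under its dotted field path, ignoring object boundaries."""
    flat = {}
    for key, value in doc.items():
        path = prefix + key
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat.setdefault(sub_path, set()).update(sub_values)
            else:
                flat.setdefault(path, set()).add(item)
    return flat

doc = {
    "title": "Nest eggs",
    "comments": [
        {"name": "John Smith", "age": 28},
        {"name": "Alice White", "age": 31},
    ],
}

flat = flatten(doc)

# "Alice" comes from the second comment and 28 from the first,
# yet both conditions hold on the flattened document.
name_matches = any("alice" in name.lower().split() for name in flat["comments.name"])
age_matches = 28 in flat["comments.age"]
print(name_matches and age_matches)  # True
```

Because the two value sets are kept per field rather than per comment, nothing records which name belongs to which age.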

Nested fields

A field becomes a nested field when its type is set to nested, as shown below:

PUT /my_index
{
    "mappings": {
        "blogpost": {
            "properties": {
                "comments": {
                    "type": "nested",
                    "properties": {
                        "name":    { "type": "string" },
                        "comment": { "type": "string" },
                        "age":     { "type": "short"  },
                        "stars":   { "type": "short"  },
                        "date":    { "type": "date"   }
                    }
                }
            }
        }
    }
}
Each comments object is now indexed as a separate hidden document, so the association between the fields of one object is preserved, as shown below:

{ // first nested document
    "comments.name":    [ john, smith ],
    "comments.comment": [ article, great ],
    "comments.age":     [ 28 ],
    "comments.stars":   [ 4 ],
    "comments.date":    [ 2014-09-01 ]
}
{ // second nested document
    "comments.name":    [ alice, white ],
    "comments.comment": [ like, more, please, this ],
    "comments.age":     [ 31 ],
    "comments.stars":   [ 5 ],
    "comments.date":    [ 2014-10-22 ]
}
{ // root document, also called the parent document
    "title":            [ eggs, nest ],
    "body":             [ making, money, work, your ],
    "tags":             [ cash, shares ]
}
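The effect of keeping each object separate can be sketched in Python as follows. This is a simplified model with a hypothetical `nested_match` helper: both conditions must hold within the same object.

```python
# Toy model: each nested object is checked on its own.
comments = [
    {"name": "John Smith", "age": 28},
    {"name": "Alice White", "age": 31},
]

def nested_match(objects, name_token, age):
    """True if a single object satisfies both conditions at once."""
    return any(
        name_token in obj["name"].lower().split() and obj["age"] == age
        for obj in objects
    )

print(nested_match(comments, "alice", 28))  # False
print(nested_match(comments, "john", 28))   # True
```

Alice with age 28 no longer matches, because no single comment carries both values.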

Nested query

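Because each nested object is stored as a separate hidden document, an ordinary query cannot reach it. A nested query is required, as shown below: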

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "eggs"
          }
        },
        {
          "nested": {
            "path": "comments",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "comments.name": "john"
                    }
                  },
                  {
                    "match": {
                      "comments.age": 28
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

Problem description

Using a must_not clause inside a nested query produces an unexpected result. Take the following data:

PUT my-index
{
  "mappings": {
    "properties": {
      "comments": {
        "type": "nested"
      }
    }
  }
}

PUT my-index/_doc/1?refresh
{
  "comments": [
    {
      "author": "kimchy"
    }
  ]
}

PUT my-index/_doc/2?refresh
{
  "comments": [
    {
      "author": "kimchy"
    },
    {
      "author": "nik9000"
    }
  ]
}

PUT my-index/_doc/3?refresh
{
  "comments": [
    {
      "author": "nik9000"
    }
  ]
}

The query is as follows:

POST my-index/_search
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must_not": [
            {
              "term": {
                "comments.author": "nik9000"
              }
            }
          ]
        }
      }
    }
  }
}

The query hits two documents (1 and 2), although the expected result is a single document (document 1).

{
  ...
  "hits" : {
    ...
    "hits" : [
      {
        "_index" : "my-index",
        "_id" : "1",
        "_score" : 0.0,
        "_source" : {
          "comments" : [
            {
              "author" : "kimchy"
            }
          ]
        }
      },
      {
        "_index" : "my-index",
        "_id" : "2",
        "_score" : 0.0,
        "_source" : {
          "comments" : [
            {
              "author" : "kimchy"              
            },
            {
              "author" : "nik9000"             
            }
          ]
        }
      }
    ]
  }
}

Cause and analysis

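A nested query returns the root document as soon as one or more of its hidden nested documents match the inner query. This matters most with must_not: inside the nested query the negation is evaluated per nested object, so a root document is treated as a hit whenever at least one of its objects does not contain the excluded value. Document 2 has a comment by kimchy, which satisfies "author is not nik9000", so it is returned even though another of its comments is by nik9000.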
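The difference comes down to "any object is not X" versus "no object is X". The following Python sketch models both readings on the three sample documents; it is a simplified simulation of the matching rule, not Elasticsearch code, and the `is_nik` helper is introduced only for illustration.

```python
# The three sample documents, keyed by _id.
docs = {
    1: [{"author": "kimchy"}],
    2: [{"author": "kimchy"}, {"author": "nik9000"}],
    3: [{"author": "nik9000"}],
}

def is_nik(comment):
    return comment["author"] == "nik9000"

# Negation evaluated per nested object: the root document matches
# if ANY of its comments is not by nik9000.
inside = [doc_id for doc_id, comments in docs.items()
          if any(not is_nik(c) for c in comments)]

# Negation applied to the nested match as a whole: the root document
# matches only if NONE of its comments is by nik9000.
outside = [doc_id for doc_id, comments in docs.items()
           if not any(is_nik(c) for c in comments)]

print(inside)   # [1, 2]
print(outside)  # [1]
```

The first list reproduces the surprising result of the query above; the second is the result that was actually intended.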

Solutions

Solution 1: move the must_not clause outside the nested query

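A positive (must-style) condition inside a nested query behaves as expected, so wrapping the nested query in a must_not clause on the outside gives the expected result.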

POST my-index/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "nested": {
            "path": "comments",
            "query": {
              "term": {
                "comments.author": "nik9000"
              }
            }
          }
        }
      ]
    }
  }
}

Result: one document is hit.

{
  ...
  "hits" : {
    ...
    "hits" : [
      {
        "_index" : "my-index",
        "_id" : "1",
        "_score" : 0.0,
        "_source" : {
          "comments" : [
            {
              "author" : "kimchy"
            }
          ]
        }
      }
    ]
  }
}

Solution 2: use a script

GET my-index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "function_score": {
            "query": {
              "bool": {
                "adjust_pure_negative": true,
                "boost": 1
              }
            },
            "functions": [
              {
                "filter": {
                  "match_all": {
                    "boost": 1
                  }
                },
                "script_score": {
                  "script": {
                    "source": """
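// Returns params.target if any comment's author is in the target list, otherwise null.
def comments(def doc, def params) {
  def comments = params["_source"]["comments"];
  if (comments == null) {
    return null;
  }
  for (def comment : comments) {
    if (params.target.contains(comment['author'])) {
      return params.target;
    }
  }
  return null;
}
// Score 1 when no comment is by a target author, 0 otherwise.
return comments(doc, params) == null ? 1 : 0;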
""",
                    "lang": "painless",
                    "params": {
                      "target": [
                        "nik9000"
                      ]
                    }
                  }
                }
              }
            ],
            "score_mode": "multiply",
            "max_boost": 3.4028235e+38,
            "min_score": 1,
            "boost": 1
          }
        }
      ]
    }
  }
}
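The logic of this request can be paraphrased in Python as below. This is a rough sketch under the assumption that the base query scores every document 1, so the final score equals the script result and min_score 1 drops every document scored 0; the `script_score` function is a hypothetical stand-in for the Painless script.

```python
def script_score(source, target):
    """Mirror of the Painless script: 1 if no comment author is in target, else 0."""
    comments = source.get("comments")
    if comments is None:
        return 1
    for comment in comments:
        if comment["author"] in target:
            return 0
    return 1

sources = {
    1: {"comments": [{"author": "kimchy"}]},
    2: {"comments": [{"author": "kimchy"}, {"author": "nik9000"}]},
    3: {"comments": [{"author": "nik9000"}]},
}

MIN_SCORE = 1  # same threshold as min_score in the request
hits = [doc_id for doc_id, source in sources.items()
        if script_score(source, ["nik9000"]) >= MIN_SCORE]
print(hits)  # [1]
```

Note that a document without any comments field also scores 1 and is therefore kept.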

Solution 3: set include_in_parent to true

include_in_parent (default: false): when set to true, all fields of the nested objects are also added to the parent document (the document one level up) as standard flat fields.

{
  "mappings": {
    "properties": {
      "comments": {
        "type": "nested",
        "include_in_parent": true
      }
    }
  }
}

Query:

GET my-index/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "term": {
            "comments.author": "nik9000"
          }
        }
      ]
    }
  }
}