Elasticsearch for Big Data: Full-Text Queries and the intervals Query

The Elasticsearch version used in this article is 7.4.2.

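Full-text queries let you search analyzed text fields, such as the body of an email. The query string is processed with the same analyzer that was applied to the field when the inverted index was built.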
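To make the "same analyzer on both sides" idea concrete, here is a minimal Python sketch. It is not how Elasticsearch is implemented: `analyze` is a toy stand-in (lowercase plus whitespace split) for a real analyzer, and the two email bodies are made-up sample data.

```python
from collections import defaultdict

def analyze(text):
    # Toy analyzer (assumption): lowercase, then split on whitespace.
    return text.lower().split()

def build_index(docs):
    """Map each term to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for position, term in enumerate(analyze(text)):
            index[term].append((doc_id, position))
    return index

# Hypothetical email bodies, used only for illustration.
docs = {
    "a": "Quarterly report attached",
    "b": "Please review the Quarterly Report",
}
index = build_index(docs)

# The query string goes through the same analyze() call as the documents,
# so "REPORT" in the query finds "report" in the index.
query_terms = analyze("quarterly REPORT")
for term in query_terms:
    print(term, index[term])
```

Because positions are stored next to each term, the index can answer not only "which documents contain this word" but also "where", which is what the intervals query relies on.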

intervals query: a full-text query that gives fine-grained control over the order and proximity of the matching terms.

Creating the sample data

POST /intervals_test/_doc/1
{
  "my_text": "my favorite food is cold porridge"
}

POST /intervals_test/_doc/2
{
  "my_text": "when it's cold my favorite food is porridge"
}

When the matching rules must appear in order:
Request body:

POST /intervals_test/_search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "ordered" : true,
          "intervals" : [
            {
              "match" : {
                "query" : "my favorite food",
                "max_gaps" : 0,
                "ordered" : true
              }
            },
            {
              "any_of" : {
                "intervals" : [
                  { "match" : { "query" : "hot water" } },
                  { "match" : { "query" : "cold porridge" } }
                ]
              }
            }
          ]
        }
      }
    }
  }
}

Response body:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.3333333,
    "hits" : [
      {
        "_index" : "intervals_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.3333333,
        "_source" : {
          "my_text" : "my favorite food is cold porridge"
        }
      }
    ]
  }
}

When the matching rules may appear in any order:
Request body:

POST /intervals_test/_search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "ordered" : false,
          "intervals" : [
            {
              "match" : {
                "query" : "my favorite food",
                "max_gaps" : 0,
                "ordered" : true
              }
            },
            {
              "any_of" : {
                "intervals" : [
                  { "match" : { "query" : "hot water" } },
                  { "match" : { "query" : "cold porridge" } }
                ]
              }
            }
          ]
        }
      }
    }
  }
}

Response body:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.3333333,
    "hits" : [
      {
        "_index" : "intervals_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.3333333,
        "_source" : {
          "my_text" : "my favorite food is cold porridge"
        }
      },
      {
        "_index" : "intervals_test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.3333333,
        "_source" : {
          "my_text" : "when it's cold my favorite food is porridge"
        }
      }
    ]
  }
}
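Why does document 2 only show up once "ordered" on all_of is set to false? The Python sketch below replays the two requests by brute force over token positions. It is an approximation written under assumptions, not Lucene's interval algorithm: `tokenize` stands in for the field's analyzer, and `match_spans` and `all_of_matches` are hypothetical helper names.

```python
from itertools import product

def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for the field's analyzer.
    return text.lower().split()

def match_spans(tokens, query, max_gaps=-1, ordered=False):
    """Brute-force all (start, end) spans that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    positions = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    spans = set()
    for combo in product(*positions):
        if len(set(combo)) != len(combo):
            continue  # two query terms cannot share one position
        if ordered and list(combo) != sorted(combo):
            continue  # terms must appear in query order
        start, end = min(combo), max(combo)
        gaps = (end - start + 1) - len(combo)  # unmatched positions inside the span
        if max_gaps < 0 or gaps <= max_gaps:
            spans.add((start, end))
    return spans

def all_of_matches(text, ordered):
    tokens = tokenize(text)
    food = match_spans(tokens, "my favorite food", max_gaps=0, ordered=True)
    # any_of: union of the spans produced by either alternative
    temperature = match_spans(tokens, "hot water") | match_spans(tokens, "cold porridge")
    if ordered:
        # the second rule has to start after the first one ends
        return any(f_end < t_start for _, f_end in food for t_start, _ in temperature)
    return bool(food) and bool(temperature)  # overlapping spans are allowed

docs = {
    "1": "my favorite food is cold porridge",
    "2": "when it's cold my favorite food is porridge",
}
ordered_hits = [doc_id for doc_id, text in docs.items() if all_of_matches(text, True)]
unordered_hits = [doc_id for doc_id, text in docs.items() if all_of_matches(text, False)]
print(ordered_hits)    # ['1']
print(unordered_hits)  # ['1', '2']
```

In document 2 the "cold porridge" span runs from "cold" to "porridge" and wraps around "my favorite food", so it cannot come strictly after it. That is enough to fail the ordered request while still satisfying the unordered one.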

The intervals query supports the following rule types:

  • match
  • prefix
  • wildcard
  • all_of
  • any_of
  • filter
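As a rough guide to how these rule types look inside a request, the Python sketch below builds one minimal body per rule as a plain dict. The query strings are illustrative, `search_body` is a hypothetical helper, and the `prefix` and `pattern` parameter names follow my understanding of the 7.x intervals syntax, so check them against the reference for your version. filter is left out of the dict because it is attached to another rule as a parameter rather than sent on its own.

```python
import json

# One minimal body per rule type; the query strings are illustrative only.
rule_examples = {
    "match": {"match": {"query": "cold porridge"}},
    "prefix": {"prefix": {"prefix": "por"}},
    "wildcard": {"wildcard": {"pattern": "porr*"}},
    "all_of": {
        "all_of": {
            "ordered": True,
            "intervals": [
                {"match": {"query": "favorite"}},
                {"match": {"query": "porridge"}},
            ],
        }
    },
    "any_of": {
        "any_of": {
            "intervals": [
                {"match": {"query": "hot water"}},
                {"match": {"query": "cold porridge"}},
            ]
        }
    },
}

def search_body(rule, field="my_text"):
    """Wrap a single rule in an intervals query on the given field."""
    return {"query": {"intervals": {field: rule}}}

# Print the body you would POST to /intervals_test/_search for the prefix rule.
print(json.dumps(search_body(rule_examples["prefix"]), indent=2))
```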

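The match rule

Returns documents that satisfy the match rule. You can control whether the terms of the query string must appear in order and how far apart they may be.
Parameters of the match rule:

Parameter   Description
query       The text the user is searching for.
max_gaps    Maximum number of positions allowed between the matching terms in the text field; matches that are further apart are not returned. Defaults to -1 (no limit). With 0, the terms must be directly adjacent.
ordered     Whether the terms must appear in the same order as in query. Defaults to false, meaning order is ignored.
analyzer    Analyzer applied to the text in query. Defaults to the search analyzer configured for the field in the mapping.
filter      An optional interval filter attached to the rule. It has its own syntax and is not the same as a bool query filter.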
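The parameters above can be combined in a single match rule. The Python sketch below only assembles the request body; `intervals_match` is a hypothetical helper, and the `not_containing` filter is one of the interval filter options as I understand the 7.x syntax, so treat it as an assumption to verify.

```python
import json

def intervals_match(field, query, max_gaps=-1, ordered=False,
                    analyzer=None, interval_filter=None):
    """Build a search body with one match rule (hypothetical helper)."""
    rule = {"query": query, "max_gaps": max_gaps, "ordered": ordered}
    if analyzer is not None:
        rule["analyzer"] = analyzer          # otherwise the field's search analyzer applies
    if interval_filter is not None:
        rule["filter"] = interval_filter     # interval filter, not a bool filter
    return {"query": {"intervals": {field: {"match": rule}}}}

# "favorite" followed by "porridge" with at most 3 positions in between,
# skipping spans that contain the word "cold".
body = intervals_match(
    "my_text",
    "favorite porridge",
    max_gaps=3,
    ordered=True,
    interval_filter={"not_containing": {"match": {"query": "cold"}}},
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of POST /intervals_test/_search to try different values of max_gaps and ordered against the two sample documents.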

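Where the other rule types accept parameters with the same names, such as max_gaps, ordered and filter on all_of, they work the same way as on match.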