The Elasticsearch version used in this article is 7.4.2.
Test data:
POST /match_phrase_test/_doc/1
{
"my_text": "my favorite dialet is cold porridge"
}
POST /match_phrase_test/_doc/2
{
"my_text": "when it's cold his favorite food is porridge"
}
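The article does not show how the index was created. A minimal sketch, assuming my_text is a plain text field that relies on the default standard analyzer (the index name match_phrase_test is the one used above):
PUT /match_phrase_test
{
  "mappings": {
    "properties": {
      "my_text": { "type": "text" }
    }
  }
}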
match_phrase query
The match_phrase query first analyzes the query text into tokens, then runs a phrase query against those tokens: every token must appear in the field, in the same order and in adjacent positions (unless slop allows otherwise).
Example:
POST /match_phrase_test/_search
{
"query": {
"match_phrase": {
"my_text": {
"query": "my favorite"
}
}
}
}
Analysis:
- my favorite is analyzed into ["my", "favorite"];
- doc1 contains both tokens, and favorite immediately follows my; doc2 only contains favorite, so it does not satisfy the phrase requirement;
- therefore only doc1 is returned.
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.6520334,
"hits" : [
{
"_index" : "match_phrase_test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6520334,
"_source" : {
"my_text" : "my favorite dialet is cold porridge"
}
}
]
}
}
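For contrast, a plain match query on the same text would hit both documents, because match only requires some of the tokens to appear and ignores their order and adjacency. A quick sketch (expected to return doc1 and doc2):
POST /match_phrase_test/_search
{
  "query": {
    "match": {
      "my_text": "my favorite"
    }
  }
}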
The slop parameter sets how many positional moves the tokens are allowed to make when matching, and a transposition of two adjacent tokens costs 2. Suppose the document contains favorite food and the query text is food favorite; making the query order match favorite food takes two moves:
move food to where favorite is, and move favorite to where food is.
Summary: transposing one pair of tokens therefore needs a slop of 2, two pairs need 4, and n pairs need at least 2*n; you can also think of it as (number of out-of-order tokens - 1) * 2.
Example:
Suppose the query text is my dialet favorite is. To match doc1's my favorite dialet is cold porridge, only dialet and favorite are out of order, so a single transposition is enough and the minimum slop is 1*2 = 2. It can also be computed as (number of out-of-order tokens - 1) * 2 ==> (2-1)*2.
POST /match_phrase_test/_search
{
"query": {
"match_phrase": {
"my_text": {
"query": "my dialet favorite is",
"slop": 2
}
}
}
}
Query result:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.9197583,
"hits" : [
{
"_index" : "match_phrase_test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.9197583,
"_source" : {
"my_text" : "my favorite dialet is cold porridge"
}
}
]
}
}
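To verify that the transposition really costs 2, you can lower slop to 1; with only one positional move allowed, doc1 should no longer match and the result is expected to be empty:
POST /match_phrase_test/_search
{
  "query": {
    "match_phrase": {
      "my_text": {
        "query": "my dialet favorite is",
        "slop": 1
      }
    }
  }
}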
You can also use the analyzer parameter to specify which analyzer tokenizes the query text. By default, the query uses the search_analyzer explicitly set in the queried field's mapping, or otherwise the index's default analyzer.
POST /match_phrase_test/_search
{
"query": {
"match_phrase": {
"my_text": {
"query": "favorite Dialet",
"analyzer": "whitespace"
}
}
}
}
Because analyzer is set to whitespace, the query text is split on spaces into ["favorite", "Dialet"].
doc1's my_text was indexed with the standard analyzer (which splits the text into words and lowercases them), producing ["my", "favorite", "dialet", "is", "cold", "porridge"].
Since the term Dialet does not exist in doc1's inverted index, doc1 is not matched and the query result is empty.
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
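If the analyzer parameter is switched to standard (or simply omitted so that the field's own analyzer is used), the query text favorite Dialet is lowercased to ["favorite", "dialet"], which does exist in doc1's inverted index, so doc1 is expected to be returned:
POST /match_phrase_test/_search
{
  "query": {
    "match_phrase": {
      "my_text": {
        "query": "favorite Dialet",
        "analyzer": "standard"
      }
    }
  }
}
You can also compare what the two analyzers produce with the _analyze API, for example:
POST /_analyze
{
  "analyzer": "whitespace",
  "text": "favorite Dialet"
}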