Elasticsearch Advanced Notes, Part 15

Elasticsearch Expert Series (28)

In-Depth Search Techniques: Customizing Relevance Scoring with function_score in Practice

function_score

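With function_score we can define our own scoring function: take the value of a field, combine it with the relevance score that es computes on its own, and let the field we choose boost the final score.

Add a follower count (follower_num) to every article: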

 POST /waws/article/_bulk
 {"update": { "_id": "1"}}
 {"doc" : {"follower_num" : 5}}
 {"update": { "_id": "2"}}
 {"doc" : {"follower_num" : 10}}
 {"update": { "_id": "3"}}
 {"doc" : {"follower_num" : 25}}
 {"update": { "_id": "4"}}
 {"doc" : {"follower_num" : 3}}
 {"update": { "_id": "5"}}
 {"doc" : {"follower_num" : 60}}

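Combine the score each article gets from the search with follower_num, so that follower_num boosts the article's score to a limited degree: the more people follow an article, the higher its score.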

 GET /waws/article/_search
 {
   "query": {
     "function_score": {
       "query": {
         "multi_match": {
           "query": "java spark",
           "fields": ["title", "content"]
         }
       },
       "field_value_factor": {
         "field": "follower_num",
         "modifier": "log1p",
         "factor": 0.5
       },
       "boost_mode": "sum",
       "max_boost": 2
     }
   }
 }
 
 {
   "took": 32,
   "timed_out": false,
   "_shards": {
     "total": 5,
     "successful": 5,
     "failed": 0
   },
   "hits": {
     "total": 2,
     "max_score": 2.1746066,
     "hits": [
       {
         "_index": "waws",
         "_type": "article",
         "_id": "5",
         "_score": 2.1746066,
         "_source": {
           "articleID": "DHJK-B-1395-#Ky5",
           "userID": 3,
           "hidden": false,
           "postDate": "2017-03-01",
           "tag": [
             "elasticsearch"
           ],
           "tag_cnt": 1,
           "view_cnt": 10,
           "title": "this is spark blog",
           "content": "spark is best big data solution based on scala ,an programming language similar to java spark",
           "sub_title": "haha, hello world",
           "author_first_name": "Tonny",
           "author_last_name": "Peter Smith",
           "new_author_last_name": "Peter Smith",
           "new_author_first_name": "Tonny",
           "follower_num": 60
         }
       },
       {
         "_index": "waws",
         "_type": "article",
         "_id": "2",
         "_score": 1.4645591,
         "_source": {
           "articleID": "KDKE-B-9947-#kL5",
           "userID": 1,
           "hidden": false,
           "postDate": "2017-01-02",
           "tag": [
             "java"
           ],
           "tag_cnt": 1,
           "view_cnt": 50,
           "title": "this is java blog",
           "content": "i think java is the best programming language",
           "sub_title": "learned a lot of course",
           "author_first_name": "Smith",
           "author_last_name": "Williams",
           "new_author_last_name": "Williams",
           "new_author_first_name": "Smith",
           "follower_num": 10
         }
       }
     ]
   }
 }

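If only field is specified, each doc's score is multiplied directly by follower_num. A doc whose follower_num is 0 then gets a score of 0, and a doc with a very large follower_num swamps the relevance score, so the results are poor. For this reason a log1p modifier is usually added, which changes the formula to new_score = old_score * log(1 + follower_num), where log is the base-10 logarithm. This dampens large values and gives more reasonable scores. Note that log(1 + 0) is still 0, so under the default multiply mode a doc with 0 followers still scores 0; using boost_mode sum, as in the query above, avoids this.
factor
- Further tunes the influence of the field: new_score = old_score * log(1 + factor * follower_num)
boost_mode
- Decides how the query score and the function value are combined: multiply (the default), sum, min, max, or replace. The query above uses sum, so new_score = old_score + log(1 + 0.5 * follower_num)
max_boost
- Caps the value produced by the function at max_boost. It does not cap the final score, which is why the top hit above can have a _score of 2.1746066 even though max_boost is 2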
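
To make the arithmetic above concrete, here is a minimal Python sketch of how the function value and the final score could be computed. It is not Elasticsearch code: the helper names field_value_factor and combine are made up for illustration, only the none and log1p modifiers are covered, and the query score passed in is an arbitrary example value.

```python
import math


def field_value_factor(value, factor=1.0, modifier="none"):
    """Function value derived from a numeric field (illustrative helper)."""
    scaled = factor * value
    if modifier == "none":
        return scaled
    if modifier == "log1p":
        # Elasticsearch's log1p modifier is the base-10 log of (1 + value).
        return math.log10(1 + scaled)
    raise ValueError(f"modifier not covered by this sketch: {modifier}")


def combine(query_score, func_score, boost_mode="multiply", max_boost=float("inf")):
    """Combine the query score with the function value (illustrative helper)."""
    # max_boost caps the function value, not the final score.
    func_score = min(func_score, max_boost)
    modes = {
        "multiply": query_score * func_score,
        "sum": query_score + func_score,
        "min": min(query_score, func_score),
        "max": max(query_score, func_score),
        "replace": func_score,
    }
    return modes[boost_mode]


# follower_num = 60 with factor 0.5 and log1p, as in the query above.
boost = field_value_factor(60, factor=0.5, modifier="log1p")
# 0.7 is an arbitrary query score chosen for the example.
print(boost, combine(0.7, boost, boost_mode="sum", max_boost=2))
```

Because max_boost is applied before the combination step, a sum can end up above max_boost, which matches the behaviour seen in the response above.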

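Elasticsearch Expert Series (29)

In-Depth Search Techniques: Fuzzy Search for Misspelled Queries in Practice

When searching, the search text that the user types may contain misspellings.

doc1: hello world
doc2: hello java

Search: hallo world

Fuzzy search --> automatically corrects the misspelled search text and then tries to match the corrected text against the data in the index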

 POST /waws_index/waws_type/_bulk
 {"index": { "_id": 1 }}
 {"text": "Surprise me!"}
 {"index": { "_id": 2 }}
 {"text": "That was surprising."}
 {"index": { "_id": 3 }}
 {"text": "I wasn't surprised."}

Search the data

 GET /waws_index/waws_type/_search 
 {
   "query": {
     "fuzzy": {
       "text": {
         "value": "surprize",
         "fuzziness": 2
       }
     }
   }
 }
 
 {
   "took": 47,
   "timed_out": false,
   "_shards": {
     "total": 5,
     "successful": 5,
     "failed": 0
   },
   "hits": {
     "total": 2,
     "max_score": 0.22585157,
     "hits": [
       {
         "_index": "waws_index",
         "_type": "waws_type",
         "_id": "1",
         "_score": 0.22585157,
         "_source": {
           "text": "Surprise me!"
         }
       },
       {
         "_index": "waws_index",
         "_type": "waws_type",
         "_id": "3",
         "_score": 0.1898702,
         "_source": {
           "text": "I wasn't surprised."
         }
       }
     ]
   }
 }

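surprize --> a misspelling of surprise (s typed as z)
- surprize --> surprise: change z to s. One edit is enough to match, which is within the fuzziness of 2
- surprize --> surprised: change z to s and append d. Two edits, so it also matches, still within the fuzziness of 2
- surprize --> surprising: change z to s, change e to i, append n and g. Four edits in total, beyond the fuzziness of 2, so it can never match
A fuzzy search automatically tries to correct your search text and then matches it against the indexed text.
fuzziness is the maximum number of single-character edits allowed when matching your search text against your data. If it is not set, it defaults to AUTO, which allows up to 2 edits for terms longer than 5 characters; 2 is also the largest value allowed.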
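
The edit counts above can be checked with a small Python sketch of the classic Levenshtein distance. This is an illustration only: Elasticsearch's own fuzzy matching also counts a swap of two adjacent letters as a single edit by default, which does not affect these three words. The function name edit_distance is made up for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


for term in ("surprise", "surprised", "surprising"):
    d = edit_distance("surprize", term)
    print(term, d, "match" if d <= 2 else "no match")
```

For surprising the distance is 4 rather than 5, because replacing e with i counts as one substitution instead of a deletion plus an insertion.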

 GET /waws_index/waws_type/_search 
 {
   "query": {
     "match": {
       "text": {
         "query": "SURPIZE ME",
         "fuzziness": "AUTO",
         "operator": "and"
       }
     }
   }
 }
 
 {
   "took": 8,
   "timed_out": false,
   "_shards": {
     "total": 5,
     "successful": 5,
     "failed": 0
   },
   "hits": {
     "total": 1,
     "max_score": 0.44248468,
     "hits": [
       {
         "_index": "waws_index",
         "_type": "waws_type",
         "_id": "1",
         "_score": 0.44248468,
         "_source": {
           "text": "Surprise me!"
         }
       }
     ]
   }
 }
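
Why only doc 1 comes back can be sketched in Python under a few simplifying assumptions: the match query analyzes the search text first (approximated here by lowercasing and splitting on non-word characters), fuzziness AUTO gives each term an edit budget based on its length (0 edits for 1 to 2 characters, 1 for 3 to 5, 2 for longer terms), and operator and requires every query term to match. All helper names are hypothetical, and the real query also has options such as prefix_length and max_expansions that are ignored here.

```python
import re


def auto_fuzziness(term: str) -> int:
    """Edit budget granted by fuzziness AUTO, based on term length."""
    if len(term) <= 2:
        return 0
    return 1 if len(term) <= 5 else 2


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def analyze(text: str) -> list:
    """Crude stand-in for the standard analyzer: lowercase and tokenize."""
    return re.findall(r"[a-z0-9']+", text.lower())


def fuzzy_and_match(query: str, doc_text: str) -> bool:
    """True if every query term is within its AUTO budget of some doc term."""
    doc_terms = analyze(doc_text)
    return all(
        any(edit_distance(q, t) <= auto_fuzziness(q) for t in doc_terms)
        for q in analyze(query)
    )


docs = {1: "Surprise me!", 2: "That was surprising.", 3: "I wasn't surprised."}
matching_ids = [i for i, text in docs.items() if fuzzy_and_match("SURPIZE ME", text)]
print(matching_ids)
```

In this sketch surpize is 2 edits away from surprise (insert r, change z to s), and me must match exactly, so only "Surprise me!" satisfies both terms.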