Elasticsearch Advanced Series (28)
Deep Dive into Search Techniques: Customizing the Relevance Score Algorithm with function_score
function_score
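With function_score we can define our own scoring function: take the value of a field, combine it with the score that Elasticsearch computes internally, and let the field we choose boost the final score.
Add a follower count to every article document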
POST /waws/article/_bulk
{"update": { "_id": "1"}}
{"doc" : {"follower_num" : 5}}
{"update": { "_id": "2"}}
{"doc" : {"follower_num" : 10}}
{"update": { "_id": "3"}}
{"doc" : {"follower_num" : 25}}
{"update": { "_id": "4"}}
{"doc" : {"follower_num" : 3}}
{"update": { "_id": "5"}}
{"doc" : {"follower_num" : 60}}
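With more than a handful of documents, the bulk body above would normally be generated rather than typed by hand. The following is a minimal sketch of that idea in Python. The helper name `build_bulk_update_body` and the `follower_counts` mapping are assumptions made for illustration and are not part of any Elasticsearch client. The sketch only builds the request body and does not send it.

```python
import json


def build_bulk_update_body(follower_counts):
    """Build an NDJSON _bulk body with one partial update per document.

    follower_counts maps a document _id to its follower_num value
    (a hypothetical input format chosen for this sketch).
    """
    lines = []
    for doc_id, follower_num in follower_counts.items():
        # Action line: which document to update.
        lines.append(json.dumps({"update": {"_id": doc_id}}))
        # Source line: the partial document to merge in.
        lines.append(json.dumps({"doc": {"follower_num": follower_num}}))
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


follower_counts = {"1": 5, "2": 10, "3": 25, "4": 3, "5": 60}
body = build_bulk_update_body(follower_counts)
print(body)
```

The resulting string can be sent as the body of POST /waws/article/_bulk with any HTTP client.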
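Combine the score from the article search with follower_num, so that follower_num boosts the article's score to a certain degree: the more followers an article has, the higher its score.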
GET /waws/article/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "java spark",
"fields": ["title", "content"]
}
},
"field_value_factor": {
"field": "follower_num",
"modifier": "log1p",
"factor": 0.5
},
"boost_mode": "sum",
"max_boost": 2
}
}
}
{
"took": 32,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 2.1746066,
"hits": [
{
"_index": "waws",
"_type": "article",
"_id": "5",
"_score": 2.1746066,
"_source": {
"articleID": "DHJK-B-1395-#Ky5",
"userID": 3,
"hidden": false,
"postDate": "2017-03-01",
"tag": [
"elasticsearch"
],
"tag_cnt": 1,
"view_cnt": 10,
"title": "this is spark blog",
"content": "spark is best big data solution based on scala ,an programming language similar to java spark",
"sub_title": "haha, hello world",
"author_first_name": "Tonny",
"author_last_name": "Peter Smith",
"new_author_last_name": "Peter Smith",
"new_author_first_name": "Tonny",
"follower_num": 60
}
},
{
"_index": "waws",
"_type": "article",
"_id": "2",
"_score": 1.4645591,
"_source": {
"articleID": "KDKE-B-9947-#kL5",
"userID": 1,
"hidden": false,
"postDate": "2017-01-02",
"tag": [
"java"
],
"tag_cnt": 1,
"view_cnt": 50,
"title": "this is java blog",
"content": "i think java is the best programming language",
"sub_title": "learned a lot of course",
"author_first_name": "Smith",
"author_last_name": "Williams",
"new_author_last_name": "Williams",
"new_author_first_name": "Smith",
"follower_num": 10
}
}
]
}
}
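- With only field set, every doc's score is multiplied by its raw follower_num (multiply is the default boost_mode). A doc with 0 followers then gets a score of 0, and docs with many followers drown out text relevance, which works badly. So the log1p modifier is usually added, and the formula becomes
new_score = old_score * log(1 + follower_num)
where log is the base-10 logarithm. This gives much more reasonable scores. Note that log(1 + 0) is still 0, so under multiply a doc with no followers still scores 0. The query above avoids this by using boost_mode sum.
- factor
  - Scales the field value before the modifier is applied, to further influence the score:
new_score = old_score * log(1 + factor * follower_num)
- boost_mode
  - Decides how the query score and the function value are combined:
multiply (the default), sum, min, max, replace
- max_boost
  - Caps the value computed by the function at max_boost. The cap is applied before boost_mode combines that value with the query score, so the final _score can still exceed max_boost, as document 5 above shows (2.17 with max_boost set to 2).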
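The arithmetic described by these parameters can be sketched in a few lines of Python. This is an illustration of the formulas, not Elasticsearch's implementation: the function names `field_value_factor` and `combine` are hypothetical, only the `none` and `log1p` modifiers are modeled, and the query score of 0.68 is an assumed value, since the response above reports only the final _score.

```python
import math


def field_value_factor(value, factor=1.0, modifier="none"):
    """Function value for one document: modifier(factor * value)."""
    scaled = factor * value
    if modifier == "none":
        return scaled
    if modifier == "log1p":
        # log1p in Elasticsearch is the base-10 logarithm of (1 + x).
        return math.log10(1 + scaled)
    raise ValueError("modifier not covered by this sketch: " + modifier)


def combine(query_score, function_value, boost_mode="multiply",
            max_boost=float("inf")):
    """Cap the function value at max_boost, then apply boost_mode."""
    capped = min(function_value, max_boost)
    if boost_mode == "multiply":
        return query_score * capped
    if boost_mode == "sum":
        return query_score + capped
    if boost_mode == "min":
        return min(query_score, capped)
    if boost_mode == "max":
        return max(query_score, capped)
    if boost_mode == "replace":
        return capped
    raise ValueError("unknown boost_mode: " + boost_mode)


query_score = 0.68  # assumed text-relevance score, for illustration only
for follower_num in (0, 10, 60):
    value = field_value_factor(follower_num, factor=0.5, modifier="log1p")
    multiplied = combine(query_score, value, "multiply", max_boost=2)
    summed = combine(query_score, value, "sum", max_boost=2)
    print(follower_num, round(value, 4), round(multiplied, 4), round(summed, 4))
```

For follower_num = 60 and factor = 0.5 the function value is log10(31), roughly 1.49, which is below max_boost, so the cap does not come into play in the query above.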
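Elasticsearch Advanced Series (29)
Deep Dive into Search Techniques: Handling Misspelled Queries with fuzzy Search
When searching, the text a user types may contain spelling mistakes
doc1: hello world doc2: hello java
Search: hallo world
Fuzzy search --> tolerates misspelled search text automatically: it treats the text as if it had been corrected and then tries to match it against the data in the index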
POST /waws_index/waws_type/_bulk
{"index": { "_id": 1 }}
{"text": "Surprise me!"}
{"index": { "_id": 2 }}
{"text": "That was surprising."}
{"index": { "_id": 3 }}
{"text": "I wasn't surprised."}
- Search the data
GET /waws_index/waws_type/_search
{
"query": {
"fuzzy": {
"text": {
"value": "surprize",
"fuzziness": 2
}
}
}
}
{
"took": 47,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.22585157,
"hits": [
{
"_index": "waws_index",
"_type": "waws_type",
"_id": "1",
"_score": 0.22585157,
"_source": {
"text": "Surprise me!"
}
},
{
"_index": "waws_index",
"_type": "waws_type",
"_id": "3",
"_score": 0.1898702,
"_source": {
"text": "I wasn't surprised."
}
}
]
}
}
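- surprize --> misspelled --> should be surprise --> z in place of s
  - surprize --> surprise: change z to s. One edit, within the fuzziness of 2, so it matches.
  - surprize --> surprised: change z to s and append d. Two edits, still within the fuzziness of 2, so it matches.
  - surprize --> surprising: change z to s, change e to i, then append n and g. Four edits in total, beyond the fuzziness of 2, so it can never match.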
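The edit counts above can be checked with a small Levenshtein distance function. This is a minimal sketch of the classic dynamic-programming algorithm, written for illustration. Elasticsearch's own fuzzy matching additionally treats a transposition of two adjacent characters as a single edit by default, which makes no difference for these three terms.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


for term in ("surprise", "surprised", "surprising"):
    distance = levenshtein("surprize", term)
    verdict = "matches" if distance <= 2 else "does not match"
    print(term, distance, verdict)
```

With a fuzziness of 2, surprise (distance 1) and surprised (distance 2) match, while surprising (distance 4) does not, which agrees with the two hits in the response above.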
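- A fuzzy search automatically tries to correct your search text and then matches it against the indexed terms.
- fuzziness is the maximum number of single-character edits allowed when matching your search text against the data. The largest value Elasticsearch accepts is 2. If it is not set, the default is AUTO, which allows 2 edits for terms longer than 5 characters and fewer for shorter terms.
fuzziness can also be set on a match query, which is the more common way to use it: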
GET /waws_index/waws_type/_search
{
"query": {
"match": {
"text": {
"query": "SURPIZE ME",
"fuzziness": "AUTO",
"operator": "and"
}
}
}
}
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.44248468,
"hits": [
{
"_index": "waws_index",
"_type": "waws_type",
"_id": "1",
"_score": 0.44248468,
"_source": {
"text": "Surprise me!"
}
}
]
}
}
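To see why only document 1 comes back, it helps to apply the AUTO rule term by term: terms of 1 or 2 characters must match exactly, terms of 3 to 5 characters may differ by one edit, and longer terms by two. The sketch below models this under simplifying assumptions: the documents are tokenized by hand into the lowercase terms the standard analyzer would be expected to produce, plain Levenshtein distance stands in for Elasticsearch's fuzzy matching, and the helper names `auto_fuzziness` and `fuzzy_and_match` are hypothetical.

```python
def levenshtein(a, b):
    """Classic edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Edit distance allowed by fuzziness AUTO for a term of this length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_and_match(query, doc_terms):
    """True if every query term fuzzily matches some term of the document,
    which is what operator "and" requires."""
    for term in query.lower().split():
        allowed = auto_fuzziness(term)
        if not any(levenshtein(term, t) <= allowed for t in doc_terms):
            return False
    return True


# Hand-tokenized documents (an assumption about the analyzer output).
docs = {
    1: ["surprise", "me"],
    2: ["that", "was", "surprising"],
    3: ["i", "wasn't", "surprised"],
}
matches = [doc_id for doc_id, terms in docs.items()
           if fuzzy_and_match("SURPIZE ME", terms)]
print(matches)
```

Here surpize is 7 characters long, so two edits are allowed, and it is exactly two edits away from surprise (insert r, change z to s). The term me is too short for any edits and must appear as is, which only document 1 satisfies.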