Elasticsearch Study Notes, Day 16

Hi, I'm 蛋挞, a back-end developer who is just starting out. I hope we can all work hard and improve together!


  • Start marker -> In-Depth Search (13 lectures): "34 | Term & Phrase Suggester"
  • End marker -> In-Depth Search (13 lectures): "36 | Configuring Cross-Cluster Search"

Term & Phrase Suggester

What Are Search Suggestions

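  • Modern search engines generally offer a suggest-as-you-type feature
  • It auto-completes or corrects the query while the user is still typing. Helping the user enter more precise keywords improves how well documents match in the subsequent search phase
  • When you search on Google, it auto-completes at first. Once the input reaches a certain length and can no longer be completed, for example because of a misspelled word, it starts suggesting similar words or sentences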

Elasticsearch Suggester API

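  • In Elasticsearch, this kind of search-engine feature is implemented through the Suggester API
  • How it works: the input text is broken into tokens, and similar terms are then looked up in the index dictionary and returned
  • Elasticsearch provides 4 types of Suggesters for different use cases
    • Term & Phrase Suggester
    • Completion & Context Suggester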

Term Suggester

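  • A Suggester is a special type of search. "text" holds the text supplied with the call, usually whatever the user typed into the user interface
  • The user's input "lucen" is a misspelling
  • The suggester looks in the specified field "body" and, when the term cannot be found there (missing), returns suggested terms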
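
To make the shape of such a request concrete, here is a minimal Python sketch that only builds the JSON body of a term-suggester search. The helper name `build_term_suggest_body` and the default suggestion name `term-suggestion` are illustrative assumptions; sending the body to a cluster (for example with an HTTP client) is left out.

```python
import json


def build_term_suggest_body(text, field, suggest_mode="missing", name="term-suggestion"):
    """Build the JSON body of a term-suggester search request (hypothetical helper)."""
    if suggest_mode not in ("missing", "popular", "always"):
        raise ValueError(f"unknown suggest_mode: {suggest_mode}")
    return {
        "suggest": {
            name: {
                "text": text,  # what the user typed
                "term": {"field": field, "suggest_mode": suggest_mode},
            }
        }
    }


body = build_term_suggest_body("lucen rock", "body")
print(json.dumps(body, indent=2))
```

The printed body is what would be posted to `/articles/_search`.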

Term Suggester - Missing Mode

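  • Searching for "lucen rock"
    • Each suggestion carries a score. Similarity is computed with the Levenshtein edit distance algorithm, whose core idea is how many characters must change in one word to turn it into another. Many optional parameters control how fuzzy the match may be, for example "max_edits"
  • The suggestion modes
    • Missing - no suggestions are offered if the term already exists in the index
    • Popular - suggests terms that occur more frequently
    • Always - offers suggestions whether or not the term exists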
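
The edit-distance idea and the three modes can be illustrated with a small toy model in Python. This is only a sketch under stated assumptions: the term dictionary `DOC_FREQ` and the `suggest` function are made up for illustration, and the ordering of results here (distance first, then frequency) is a simplification, not how Elasticsearch actually scores suggestions.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, or substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute ca -> cb
        previous = current
    return previous[-1]


def suggest(term, doc_freq, mode="missing", max_edits=2):
    """Toy suggester over a {term: document frequency} dictionary (illustrative only)."""
    own_freq = doc_freq.get(term, 0)
    if mode == "missing" and own_freq > 0:
        return []  # the term exists in the index, so nothing is suggested
    candidates = []
    for other, freq in doc_freq.items():
        if other == term:
            continue
        distance = levenshtein(term, other)
        if distance > max_edits:
            continue
        if mode == "popular" and freq <= own_freq:
            continue  # only terms more frequent than the input qualify
        candidates.append((distance, -freq, other))
    return [other for _, _, other in sorted(candidates)]


# Hypothetical term dictionary with made-up document frequencies.
DOC_FREQ = {"lucene": 2, "rock": 1, "rocks": 2, "elk": 2, "stack": 2}

print(suggest("lucen", DOC_FREQ, mode="missing"))
print(suggest("rock", DOC_FREQ, mode="missing"))
print(suggest("rock", DOC_FREQ, mode="popular"))
```

In this toy data "rock" exists, so the missing mode stays silent for it, while the popular mode can still offer the more frequent "rocks".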

Phrase Suggester

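  • The Phrase Suggester adds some extra logic on top of the Term Suggester
  • Some parameters
    • Suggest Mode: missing, popular, always
    • Max Errors: the maximum number of terms that may be misspelled
    • Confidence: limits which results are returned, default 1. It is a factor applied to the input phrase's score, and only candidates scoring above that threshold are returned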
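
The effect of the confidence parameter can be sketched in a few lines of Python. The function `filter_by_confidence` and all scores below are hypothetical; the sketch only mirrors the thresholding rule (a candidate must score higher than confidence times the input phrase's score), not the real scoring model.

```python
def filter_by_confidence(input_score, candidates, confidence=1.0, size=5):
    """Keep candidate phrases that score above confidence * input_score (illustrative only)."""
    threshold = confidence * input_score
    kept = [(phrase, score) for phrase, score in candidates if score > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)  # best score first
    return [phrase for phrase, _ in kept[:size]]


# Made-up scores for the input phrase and two candidate corrections.
INPUT_SCORE = 0.02
CANDIDATES = [
    ("lucene and elasticsearch rock", 0.05),
    ("lucne and elasticsearch rock", 0.01),
]

print(filter_by_confidence(INPUT_SCORE, CANDIDATES, confidence=1.0))
print(filter_by_confidence(INPUT_SCORE, CANDIDATES, confidence=0.0))
```

With a confidence of 0 every positively scored candidate passes, which is why the demo request uses "confidence": 0 to see as many suggestions as possible.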

Section Review

This section covered the Term-based and Phrase-based Suggester APIs, both of which help give users a better search experience.

CodeDemo

DELETE articles

# Create the index with a completion field
PUT articles
{
  "mappings": {
    "properties": {
      "title_completion": { "type": "completion" }
    }
  }
}

# Bulk bodies need one action line and one document line per entry
POST articles/_bulk
{ "index": {} }
{ "title_completion": "lucene is very cool" }
{ "index": {} }
{ "title_completion": "Elasticsearch builds on top of lucene" }
{ "index": {} }
{ "title_completion": "Elasticsearch rocks" }
{ "index": {} }
{ "title_completion": "elastic is the company behind ELK stack" }
{ "index": {} }
{ "title_completion": "Elk stack rocks" }

POST articles/_search?pretty
{
  "size": 0,
  "suggest": {
    "article-suggester": {
      "prefix": "elk ",
      "completion": { "field": "title_completion" }
    }
  }
}

DELETE articles

POST articles/_bulk
{ "index": {} }
{ "body": "lucene is very cool" }
{ "index": {} }
{ "body": "Elasticsearch builds on top of lucene" }
{ "index": {} }
{ "body": "Elasticsearch rocks" }
{ "index": {} }
{ "body": "elastic is the company behind ELK stack" }
{ "index": {} }
{ "body": "Elk stack rocks" }
{ "index": {} }
{ "body": "elasticsearch is rock solid" }

# Check which terms the standard analyzer produces
POST _analyze
{
  "analyzer": "standard",
  "text": ["Elk stack rocks rock"]
}

# Term suggester, missing mode
POST /articles/_search
{
  "size": 1,
  "query": {
    "match": { "body": "lucen rock" }
  },
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "missing",
        "field": "body"
      }
    }
  }
}

# Term suggester, popular mode
POST /articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "popular",
        "field": "body"
      }
    }
  }
}

# Term suggester, always mode
POST /articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "always",
        "field": "body"
      }
    }
  }
}

# prefix_length 0 lets the first character differ ("hocks" vs "rocks")
POST /articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen hocks",
      "term": {
        "suggest_mode": "always",
        "field": "body",
        "prefix_length": 0,
        "sort": "frequency"
      }
    }
  }
}

# Phrase suggester
POST /articles/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "lucne and elasticsear rock hello world ",
      "phrase": {
        "field": "body",
        "max_errors": 2,
        "confidence": 0,
        "direct_generator": [
          { "field": "body", "suggest_mode": "always" }
        ],
        "highlight": {
          "pre_tag": "",
          "post_tag": ""
        }
      }
    }
  }
}

Auto-Completion and Context-Based Suggestions

The Completion Suggester

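  • The Completion Suggester provides "auto complete" functionality. Every character the user types triggers an immediate query to the backend to look up matches
  • This places strict demands on performance. Elasticsearch uses a different data structure here: instead of the inverted index, the analyzed data is encoded into an FST and stored alongside the index. ES loads the whole FST into memory, so lookups are very fast
  • An FST only supports prefix lookups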
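
The prefix-only behaviour can be illustrated with a small in-memory index in Python. Note the assumption: a sorted list with binary search stands in for the FST here, which is a much simpler structure, and the class name `PrefixIndex` is made up for this sketch. It only shows why a prefix lookup is cheap and why text in the middle of an entry cannot be matched.

```python
from bisect import bisect_left


class PrefixIndex:
    """Sorted-list stand-in for an FST: supports prefix lookups only (illustrative)."""

    def __init__(self, inputs):
        # Store (lowercased key, original text) pairs in sorted order.
        self._keys = sorted((text.lower(), text) for text in inputs)

    def complete(self, prefix, size=5):
        prefix = prefix.lower()
        # Binary search for the first key that is >= the prefix.
        start = bisect_left(self._keys, (prefix,))
        results = []
        for key, original in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys sharing a prefix are contiguous, so we can stop
            results.append(original)
            if len(results) == size:
                break
        return results


index = PrefixIndex([
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
])

print(index.complete("elk"))    # matches the entry that starts with "elk"
print(index.complete("rocks"))  # no entry starts with "rocks", so nothing is returned
```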

Steps for Using the Completion Suggester

  • Define the Mapping, using the "completion" type
  • Index the data
  • Run a "suggest" query to get search suggestions
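
The three steps can be sketched as three request payloads built in Python. The helper names are hypothetical and nothing is sent over the network; the point is the shape of each payload, in particular that a bulk body is newline-delimited JSON with one action line and one document line per entry and must end with a newline.

```python
import json


def completion_mapping(field):
    """Step 1: index mapping with a completion-typed field."""
    return {"mappings": {"properties": {field: {"type": "completion"}}}}


def bulk_index_payload(field, titles):
    """Step 2: newline-delimited bulk body, one action line plus one document line each."""
    lines = []
    for title in titles:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps({field: title}))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


def completion_query(field, prefix, name="article-suggester"):
    """Step 3: suggest query that completes the given prefix."""
    return {
        "size": 0,
        "suggest": {name: {"prefix": prefix, "completion": {"field": field}}},
    }


print(json.dumps(completion_mapping("title_completion")))
print(bulk_index_payload("title_completion", ["Elasticsearch rocks", "Elk stack rocks"]), end="")
print(json.dumps(completion_query("title_completion", "elk ")))
```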

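What Is the Context Suggester

  • An extension of the Completion Suggester
  • More contextual information can be added to the search. For example, when the input is "star"
    • Coffee-related: suggest "Starbucks"
    • Movie-related: "star wars"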

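Implementing the Context Suggester

  • Two types of Context can be defined
    • Category - an arbitrary string
    • Geo - geographic location information
  • Concrete steps for implementing the Context Suggester
    • Define a custom Mapping
    • Index the data, adding Context information to each document
    • Run the Suggestion query together with the Context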
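
A category context behaves like a filter applied on top of prefix completion. The toy class below is an in-memory illustration of that idea in Python, using the "star" example; the class `ContextSuggester` is made up and does not reflect how Elasticsearch stores contexts internally.

```python
class ContextSuggester:
    """Toy prefix suggester whose entries carry a category context (illustrative)."""

    def __init__(self):
        self._entries = []  # (lowercased input text, category) pairs

    def add(self, text, category):
        self._entries.append((text.lower(), category))

    def suggest(self, prefix, category):
        prefix = prefix.lower()
        # Keep only entries in the requested category that start with the prefix.
        return sorted(
            text
            for text, entry_category in self._entries
            if entry_category == category and text.startswith(prefix)
        )


suggester = ContextSuggester()
suggester.add("star wars", "movies")
suggester.add("starbucks", "coffee")

print(suggester.suggest("sta", "coffee"))
print(suggester.suggest("sta", "movies"))
```

The same prefix "sta" yields different suggestions depending on the category passed with the query.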

Precision and Recall

  • Precision
    • Completion > Phrase > Term
  • Recall
    • Term > Phrase > Completion
  • Performance
    • Completion > Phrase > Term

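Section Review

This section covered the Completion Suggester, which performs very well. Using it requires configuring the field in the index Mapping. ES also provides a context-aware Suggester, which can be used wherever suggestions need to take the user's input context into account.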

CodeDemo

DELETE articles

PUT articles
{
  "mappings": {
    "properties": {
      "title_completion": { "type": "completion" }
    }
  }
}

POST articles/_bulk
{ "index": {} }
{ "title_completion": "lucene is very cool" }
{ "index": {} }
{ "title_completion": "Elasticsearch builds on top of lucene" }
{ "index": {} }
{ "title_completion": "Elasticsearch rocks" }
{ "index": {} }
{ "title_completion": "elastic is the company behind ELK stack" }
{ "index": {} }
{ "title_completion": "Elk stack rocks" }

POST articles/_search?pretty
{
  "size": 0,
  "suggest": {
    "article-suggester": {
      "prefix": "elk ",
      "completion": { "field": "title_completion" }
    }
  }
}

DELETE comments

PUT comments

# Completion field with a category context
PUT comments/_mapping
{
  "properties": {
    "comment_autocomplete": {
      "type": "completion",
      "contexts": [
        { "type": "category", "name": "comment_category" }
      ]
    }
  }
}

POST comments/_doc
{
  "comment": "I love the star war movies",
  "comment_autocomplete": {
    "input": ["star wars"],
    "contexts": { "comment_category": "movies" }
  }
}

POST comments/_doc
{
  "comment": "Where can I find a Starbucks",
  "comment_autocomplete": {
    "input": ["starbucks"],
    "contexts": { "comment_category": "coffee" }
  }
}

# Only suggestions in the "coffee" category are returned
POST comments/_search
{
  "suggest": {
    "MY_SUGGESTION": {
      "prefix": "sta",
      "completion": {
        "field": "comment_autocomplete",
        "contexts": { "comment_category": "coffee" }
      }
    }
  }
}

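Configuring Cross-Cluster Search

Pain Points of Horizontal Scaling

  • Single cluster - when scaling horizontally, the number of nodes cannot grow without limit
    • When the cluster's meta information (nodes, indices, cluster state) grows too large, update pressure increases and the single active master becomes a performance bottleneck, leaving the whole cluster unable to work properly
  • In early versions, a Tribe Node could meet the need for multi-cluster access, but it had some problems
  • The Tribe Node joins each cluster as a Client Node. Task changes on a cluster's master node have to wait for the Tribe Node's response before they can proceed
  • The Tribe Node does not persist Cluster State, so initialization is slow after a restart
  • When indices with the same name exist in multiple clusters, only a single Prefer rule can be set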

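Cross-Cluster Search

  • The early Tribe Node approach had some problems and is now deprecated
  • Elasticsearch 5.3 introduced Cross Cluster Search, which is the recommended approach
    • Any node can act as the federated node, proxying search requests in a lightweight way
    • It does not need to join other clusters as a Client Node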
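
A cross-cluster search addresses remote indices as `cluster_alias:index`, combined with local index names in one comma-separated expression. The small Python helpers below build such an expression; the function names are hypothetical, and the cluster aliases are assumed to match the names registered under `cluster.remote` in the settings.

```python
def ccs_index_expression(index, remote_clusters, include_local=True):
    """Build a comma-separated index expression for cross-cluster search (illustrative)."""
    parts = [index] if include_local else []
    # Remote indices are addressed as <cluster alias>:<index name>.
    parts.extend(f"{cluster}:{index}" for cluster in remote_clusters)
    return ",".join(parts)


def ccs_search_path(index, remote_clusters, include_local=True):
    """Return the _search path for the local index plus the same index on each remote cluster."""
    return "/" + ccs_index_expression(index, remote_clusters, include_local) + "/_search"


print(ccs_search_path("users", ["cluster1", "cluster2"]))
```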

Configuration and Queries

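Section Review

This section introduced cross-cluster search. Early versions of ES could meet the need for multi-cluster access through a Tribe Node, but that approach had some problems. Elasticsearch 5.3 introduced Cross Cluster Search, which is the recommended approach.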

CodeDemo

# Start 3 clusters

bin/elasticsearch -E node.name=cluster0node -E cluster.name=cluster0 -E path.data=cluster0_data -E discovery.type=single-node -E http.port=9200 -E transport.port=9300
bin/elasticsearch -E node.name=cluster1node -E cluster.name=cluster1 -E path.data=cluster1_data -E discovery.type=single-node -E http.port=9201 -E transport.port=9301
bin/elasticsearch -E node.name=cluster2node -E cluster.name=cluster2 -E path.data=cluster2_data -E discovery.type=single-node -E http.port=9202 -E transport.port=9302

# Apply these dynamic settings on every cluster
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster0": {
          "seeds": ["127.0.0.1:9300"],
          "transport.ping_schedule": "30s"
        },
        "cluster1": {
          "seeds": ["127.0.0.1:9301"],
          "transport.compress": true,
          "skip_unavailable": true
        },
        "cluster2": {
          "seeds": ["127.0.0.1:9302"]
        }
      }
    }
  }
}

# cURL
curl -XPUT "http://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d' {"persistent":{"cluster":{"remote":{"cluster0":{"seeds":["127.0.0.1:9300"],"transport.ping_schedule":"30s"},"cluster1":{"seeds":["127.0.0.1:9301"],"transport.compress":true,"skip_unavailable":true},"cluster2":{"seeds":["127.0.0.1:9302"]}}}}}'

curl -XPUT "http://localhost:9201/_cluster/settings" -H 'Content-Type: application/json' -d' {"persistent":{"cluster":{"remote":{"cluster0":{"seeds":["127.0.0.1:9300"],"transport.ping_schedule":"30s"},"cluster1":{"seeds":["127.0.0.1:9301"],"transport.compress":true,"skip_unavailable":true},"cluster2":{"seeds":["127.0.0.1:9302"]}}}}}'

curl -XPUT "http://localhost:9202/_cluster/settings" -H 'Content-Type: application/json' -d' {"persistent":{"cluster":{"remote":{"cluster0":{"seeds":["127.0.0.1:9300"],"transport.ping_schedule":"30s"},"cluster1":{"seeds":["127.0.0.1:9301"],"transport.compress":true,"skip_unavailable":true},"cluster2":{"seeds":["127.0.0.1:9302"]}}}}}'

# Create test data
curl -XPOST "http://localhost:9200/users/_doc" -H 'Content-Type: application/json' -d' {"name":"user1","age":10}'

curl -XPOST "http://localhost:9201/users/_doc" -H 'Content-Type: application/json' -d' {"name":"user2","age":20}'

curl -XPOST "http://localhost:9202/users/_doc" -H 'Content-Type: application/json' -d' {"name":"user3","age":30}'

# Query the local index together with the indices on the two remote clusters
GET /users,cluster1:users,cluster2:users/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 20,
        "lte": 40
      }
    }
  }
}


This article is the study note for March, Day 8. The content comes from the GeekTime course《Elasticsearch 核心技术与实战》.