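Preface
Yesterday's article noted that one way to improve the query results of the graph database is fuzzy matching. That still leaves a problem: users do not know how our data is stored. For example, if the database stores microsoft, a user who searches for 微软 (the Chinese name for Microsoft) gets nothing useful back. The search box therefore uses the Suggest feature of ES to offer completions as the user types. Yesterday I deployed ES with Docker and installed the IK analyzer; today I test suggest and write the Python query interface.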
Suggest in ES
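ES implements this feature through the Suggesters API, which offers four kinds of suggester. Pick whichever fits your need.
- Term Suggester: matches against the terms produced by the tokenizer, using edit distance as the criterion. It is typically used for spelling correction, for example pytron -> python.
- Phrase Suggester: completes whole phrases and can also correct them. It selects better candidate phrases based on co-occurrence and frequency.
- Completion Suggester: autocomplete. It supports three kinds of query: prefix, fuzzy, and regex. This is the suggester used in this article. Because it keeps its data in an in-memory FST (finite state transducer), it is fast enough to show completions in a search box in real time.
- Context Suggester: completion that takes context into account, so that suggestions can be filtered or boosted by a context such as a category.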
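To make the edit-distance idea behind the term suggester concrete, here is a small pure-Python sketch of Levenshtein distance. It only illustrates the metric and is not how ES computes term suggestions internally; the function names edit_distance and closest are my own.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest(word, vocabulary, max_distance=2):
    """Return the vocabulary terms within max_distance of word, nearest first."""
    scored = [(edit_distance(word, term), term) for term in vocabulary]
    return [term for dist, term in sorted(scored) if dist <= max_distance]
```

For example, edit_distance("pytron", "python") is 1 (one substitution), which is why a term suggester can offer python as a correction for pytron.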
Completion Suggester
Creating the index
PUT method
http://127.0.0.1:19200/vulnerability?pretty=true
Body:
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "keywords": {
        "type": "completion",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}
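The same index can also be created from Python. The sketch below is one possible way to do it, not part of the original setup: build_completion_index_body and create_index are hypothetical helper names, and the URL and index name are simply the ones used in this article. I pass the mapping with body=, which the 7.x client accepts; as far as I know 8.x clients accept it too, in some releases with a deprecation warning.

```python
def build_completion_index_body(field="keywords",
                                analyzer="ik_max_word",
                                search_analyzer="ik_smart",
                                shards=1):
    """Build the same settings/mappings body as the PUT request above."""
    return {
        "settings": {"number_of_shards": shards},
        "mappings": {
            "properties": {
                field: {
                    "type": "completion",
                    "analyzer": analyzer,
                    "search_analyzer": search_analyzer,
                }
            }
        },
    }


def create_index(url="http://127.0.0.1:19200", index="vulnerability"):
    """Create the index on a running ES node (needs the IK plugin installed)."""
    # Imported lazily so the body builder works without the client installed.
    from elasticsearch import Elasticsearch

    client = Elasticsearch(url)
    return client.indices.create(index=index, body=build_completion_index_body())
```

Keeping the body in its own function makes the mapping easy to inspect or reuse for another field before anything is sent to the cluster.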
Inserting test data
POST method
http://127.0.0.1:19200/vulnerability/_doc/?pretty=true
Body:
{
  "keywords": "microsoft"
}
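Posting documents one at a time gets tedious once there are several test keywords. As a sketch, the helper below builds a request body for the ES bulk API, which takes newline-delimited JSON: an action line followed by a document line for each entry. The function name build_bulk_payload is my own; the payload would be sent with POST to http://127.0.0.1:19200/vulnerability/_bulk using the header Content-Type: application/x-ndjson.

```python
import json


def build_bulk_payload(values, field="keywords"):
    """Return an NDJSON body that indexes one document per value."""
    lines = []
    for value in values:
        lines.append(json.dumps({"index": {}}))   # action line, _id is auto-generated
        lines.append(json.dumps({field: value}))  # document source line
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_payload(["micro1", "micro2", "microsoft"]), end="")
```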
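Testing suggest
GET or POST method
http://127.0.0.1:19200/vulnerability/_doc/_search?pretty=true
(The _doc segment in this search path is a typed path, which ES 7.x deprecates; the typeless form is vulnerability/_search.)
Body: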
{
  "suggest": {
    "suggest": {
      "text": "mic",
      "completion": {
        "field": "keywords",
        "skip_duplicates": true
      }
    }
  }
}
Response:
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "suggest": {
    "suggest": [
      {
        "text": "mic",
        "offset": 0,
        "length": 3,
        "options": [
          {
            "text": "micro1",
            "_index": "vulnerability",
            "_type": "_doc",
            "_id": "O_cCH4QBabWewAf35yzC",
            "_score": 1.0,
            "_source": {
              "keywords": "micro1"
            }
          },
          {
            "text": "micro2",
            "_index": "vulnerability",
            "_type": "_doc",
            "_id": "PPcDH4QBabWewAf3BCzG",
            "_score": 1.0,
            "_source": {
              "keywords": "micro2"
            }
          },
          {
            "text": "microsoft",
            "_index": "vulnerability",
            "_type": "_doc",
            "_id": "PfcDH4QBabWewAf3JSy7",
            "_score": 1.0,
            "_source": {
              "keywords": "microsoft"
            }
          }
        ]
      }
    ]
  }
}
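As you can see, every stored entry that starts with mic is returned (the index also held two other test entries, micro1 and micro2), which is what we expected. When nothing matches, the response looks like this: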
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "suggest": {
    "suggest": [
      {
        "text": "o",
        "offset": 0,
        "length": 1,
        "options": []
      }
    ]
  }
}
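The request above uses plain prefix matching. The other two query modes of the completion suggester, fuzzy and regex, differ only in the request body. The builder below is a sketch under my own naming (completion_body and its mode parameter are hypothetical); it uses the prefix and regex keys and the fuzzy option as I understand them from the ES documentation, and it leaves out skip_duplicates to keep the bodies minimal.

```python
def completion_body(text, field="keywords", mode="prefix", size=5, fuzziness=1):
    """Build a completion-suggester request body for one of three query modes."""
    completion = {"field": field, "size": size}
    if mode == "prefix":
        entry = {"prefix": text, "completion": completion}
    elif mode == "fuzzy":
        # Tolerate typos in the typed prefix, up to `fuzziness` edits.
        completion["fuzzy"] = {"fuzziness": fuzziness}
        entry = {"prefix": text, "completion": completion}
    elif mode == "regex":
        # `text` is interpreted as a regular expression over the suggestions.
        entry = {"regex": text, "completion": completion}
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # The outer "suggest" is the API key; the inner one is the suggester's name.
    return {"suggest": {"suggest": entry}}
```

With fuzzy mode, a mistyped prefix such as mik could still lead to microsoft, at the cost of somewhat noisier suggestions.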
Writing the Python program
es.ini configuration file
[ES]
REMOTE_URL=http://127.0.0.1:19200
ES utility class
from configparser import RawConfigParser

from elasticsearch import Elasticsearch


class esUtils(object):
    def __init__(self):
        self.getMysqlCfg()
        self.client = Elasticsearch(self.REMOTE_URL, request_timeout=3600)

    def getMysqlCfg(self):
        # Despite its name, this loads the ES settings from es.ini.
        config = RawConfigParser()
        # Path to the config file, relative to the working directory
        path = r"../config/es.ini"
        config.read(path, encoding="utf-8")
        self.REMOTE_URL = config.get("ES", "REMOTE_URL")

    def count_index(self, index):
        return self.client.count(index=index)

    def search_index_all(self, index, size=10):
        query = {
            "query": {
                "match_all": {}
            }
        }
        return self.client.search(index=index, body=query, size=size)

    def delete_index(self, index='dx_info'):
        # Pass the index by keyword; newer clients reject positional arguments.
        self.client.indices.delete(index=index)

    def get_spec_id(self, index, id):
        return self.client.get(index=index, id=id)

    def delete_by_id(self, index, id):
        return self.client.delete(index=index, id=id)

    def update_by_id(self, index, id, body):
        # doc_type is omitted: mapping types are deprecated in ES 7 and removed in ES 8.
        self.client.update(index=index, id=id, body=body)

    def suggest(self, index, tag, query, suggest_size=10):
        # tag is the name of the completion field to query.
        body = self.set_suggest_optional(query, tag, suggest_size)
        return self.get_suggest_list(self.client.search(index=index, body=body, size=10))

    def get_suggest_list(self, es_result):
        # Keep only the suggested text of each option.
        result_items = es_result['suggest']['suggest'][0]["options"]
        final_results = []
        for item in result_items:
            final_results.append(item['text'])
        return final_results

    def set_suggest_optional(self, query, tag, suggest_size):
        body = {
            "suggest": {
                "suggest": {
                    "text": query,
                    "completion": {
                        "field": tag,
                        "skip_duplicates": True,
                        "size": suggest_size
                    }
                }
            }
        }
        return body
With this utility class you can query, update, and delete data in ES. It is built on the elasticsearch library, and get_suggest_list() post-processes the data returned by a suggest query.
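get_suggest_list() reads only the first entry of the suggester named suggest and assumes the key is always present. If you would rather get an empty list than a KeyError or IndexError when the response has an unexpected shape, a more defensive variant could look like the sketch below. The function name extract_suggestions is my own, and the sketch assumes the response behaves like a plain dict.

```python
def extract_suggestions(es_result, name="suggest"):
    """Collect option texts from every entry of the named suggester.

    Returns an empty list when the suggester or its options are missing.
    """
    texts = []
    for entry in es_result.get("suggest", {}).get(name, []):
        for option in entry.get("options", []):
            texts.append(option["text"])
    return texts


if __name__ == "__main__":
    # Trimmed-down versions of the two responses shown earlier.
    hit = {"suggest": {"suggest": [{"text": "mic", "options": [
        {"text": "micro1"}, {"text": "micro2"}, {"text": "microsoft"}]}]}}
    miss = {"suggest": {"suggest": [{"text": "o", "options": []}]}}
    print(extract_suggestions(hit))
    print(extract_suggestions(miss))
```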
Installation
pip install elasticsearch
Test:
es = esUtils()
print(es.suggest(index="vulnerability", tag="keywords", query="m", suggest_size=2))
Result:
['micro1', 'micro2']