Full-Text Queries
1. match_all
Returns all documents.
GET /get-together/_search
{
"query": {
"match_all": {}
}
}
2. match
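Queries a single field using the analyzed query text. The query string is tokenized, and by default the resulting terms are combined with OR, so a document matches if it contains at least one of them.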
POST /get-together/_search
{
"query":{
"match":{
"name":"Elasticsearch Denver"
}
}
}
To return only results that contain both Elasticsearch and Denver, set operator to AND:
POST /get-together/_search
{
"query": {
"match": {
"name": {
"query": "Elasticsearch Denver",
"operator": "AND"
}
}
}
}
Other parameters:
minimum_should_match: 2 (the minimum number of query terms a document must contain in order to match)
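The effect of operator and minimum_should_match can be sketched in a few lines of Python. This is only an illustration of the matching rule, not how Elasticsearch is implemented: the `analyze` helper is a hypothetical stand-in that lowercases and splits on whitespace, whereas a real analyzer does considerably more.

```python
def analyze(text):
    """Stand-in analyzer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def match(field_text, query, operator="or", minimum_should_match=None):
    """Return True if field_text satisfies a match-style query."""
    doc_terms = set(analyze(field_text))
    query_terms = analyze(query)
    # Count how many query terms occur in the document field.
    hits = sum(1 for term in query_terms if term in doc_terms)
    if operator.lower() == "and":
        required = len(query_terms)      # every term must be present
    elif minimum_should_match is not None:
        required = minimum_should_match  # at least this many terms
    else:
        required = 1                     # default OR: any single term
    return hits >= required


print(match("Denver Clojure", "Elasticsearch Denver"))                  # True
print(match("Denver Clojure", "Elasticsearch Denver", operator="AND"))  # False
print(match("Denver Clojure", "Elasticsearch Denver",
            minimum_should_match=2))                                    # False
```

With two query terms, minimum_should_match: 2 behaves like operator AND; with longer queries it sits between OR and AND.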
3. match_phrase
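Phrase query. A document matches only if it contains all of the query terms, by default adjacent and in the order given. The slop parameter relaxes this by allowing gaps between term positions.
slop is the maximum number of position moves allowed between the analyzed terms; documents that need more moves than this are not returned. With a large enough slop, even transposed terms can match.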
POST /get-together/_search
{
"query": {
"match_phrase": {
"name": {
"query": "Elasticsearch Denver",
"slop":1
}
}
}
}
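The following Python sketch shows one way to think about slop. It is a simplified model that approximates how positions are compared, assuming pre-tokenized input and no repeated query terms; `phrase_matches` is a hypothetical helper, not an Elasticsearch or Lucene API.

```python
from itertools import product


def phrase_matches(doc_terms, query_terms, slop=0):
    """Simplified sloppy-phrase check over token positions."""
    # Positions at which each query term occurs in the document.
    positions = [
        [i for i, term in enumerate(doc_terms) if term == q]
        for q in query_terms
    ]
    if any(not p for p in positions):
        return False  # every query term must be present
    for combo in product(*positions):
        # Shift each position back by the term's offset in the query;
        # for an exact phrase all shifted values are equal.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False


query = ["elasticsearch", "denver"]
print(phrase_matches("elasticsearch in denver".split(), query, slop=0))  # False
print(phrase_matches("elasticsearch in denver".split(), query, slop=1))  # True
print(phrase_matches("denver elasticsearch".split(), query, slop=2))     # True
```

In this model one intervening word costs a slop of 1, while swapping two adjacent terms costs 2.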
4. match_phrase_prefix
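Analyzes the query text and treats the last term as a prefix only. The max_expansions parameter controls how many indexed terms the last term's prefix is expanded to; the default is 50. The more expansions allowed, the more documents can be found; if too few are allowed, matching documents may be missed.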
POST /get-together/_search
{
"query": {
"match_phrase_prefix": {
"name":{
"query": "Elasticsearch D",
"max_expansions": 5,
"slop":2
}
}
},
"_source": "name"
}
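Why a small max_expansions can miss documents is easier to see with a toy term dictionary. The sketch below assumes that candidate terms are taken in sorted order and cut off at max_expansions; `expand_prefix` and the term list are hypothetical and only illustrate the idea.

```python
def expand_prefix(prefix, indexed_terms, max_expansions=50):
    """Return up to max_expansions indexed terms that start with prefix."""
    # Assumption: candidates are visited in sorted (term dictionary) order.
    candidates = sorted(t for t in set(indexed_terms) if t.startswith(prefix))
    return candidates[:max_expansions]


terms = ["dallas", "data", "denver", "detroit", "docker", "dublin"]
print(expand_prefix("d", terms, max_expansions=2))  # ['dallas', 'data']
print(expand_prefix("d", terms))                    # all six terms
```

With max_expansions set to 2, the prefix "d" never expands to "denver", so a document containing "Elasticsearch Denver" would not be found by the query "Elasticsearch D".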
5. multi_match
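Multi-field matching: runs the same query text against several fields. Wildcards can be used in fields.
It behaves like match and can also be run as a phrase or phrase_prefix query by setting the type parameter.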
POST /get-together/_search
{
"query": {
"multi_match": {
"type": "phrase",
"query": "Elasticsearch Francisco",
"fields": ["name", "description"],
"slop":1
}
},
"_source": ["name", "description"]
}
| type | details |
|---|---|
| best_fields | (default) Finds documents which match any field, but uses the _score from the best field. |
| most_fields | Finds documents which match any field and combines the _score from each field. |
| cross_fields | Treats fields with the same analyzer as though they were one big field. Looks for each word in any field. |
| phrase | Runs a match_phrase query on each field and uses the _score from the best field. |
| phrase_prefix | Runs a match_phrase_prefix query on each field and uses the _score from the best field. |
| bool_prefix | Creates a match_bool_prefix query on each field and combines the _score from each field. |
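Single-String, Multi-Field Queries
Three scenarios
- Best Fields
  - Use when fields compete with each other yet are related, for example title and body. The score comes from the best-matching field.
- Most Fields
  - A common technique for English content: index the main field with the English analyzer, which applies stemming (and, where configured, synonyms) so that more documents match, and index the same text in a subfield with the standard analyzer for more precise matching. The additional fields act as signals that raise the relevance of matching documents; the more fields that match, the better.
- Cross Fields
  - Some entities, such as person names, addresses, and book information, spread their identifying information over several fields, each of which holds only part of the whole. The goal is to find as many of the query terms as possible in any of the listed fields.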
Multi Match Query
Best Fields
POST blogs/_search
{
"query": {
"multi_match": {
"type": "best_fields",
"query": "Quick pets",
"fields": ["title", "body"],
"tie_breaker": 0.2,
"minimum_should_match": "20%"
}
}
}
- best_fields is the default type and does not need to be specified.
- Parameters such as minimum_should_match are passed through to the generated queries.
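The role of tie_breaker in the request above can be shown with a small calculation: the final score is the best field's score plus tie_breaker times the scores of the other matching fields. The sketch below assumes the per-field scores are already known; the numbers are made up for illustration and `best_fields_score` is a hypothetical helper.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """Combine per-field scores the way best_fields with tie_breaker does."""
    matching = [s for s in field_scores.values() if s > 0]
    if not matching:
        return 0.0
    best = max(matching)
    # Best field counts fully; every other matching field is scaled down.
    return best + tie_breaker * (sum(matching) - best)


scores = {"title": 1.5, "body": 0.5}  # hypothetical per-field scores
print(best_fields_score(scores))                   # only the best field counts
print(best_fields_score(scores, tie_breaker=0.2))  # body adds 0.2 * 0.5
```

A tie_breaker of 0 uses the best field alone, while a value of 1 adds all matching field scores together.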
Most Fields
DELETE /titles
PUT /titles
{
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "english",
"fields": {"std": {"type": "text","analyzer": "standard"}}
}
}
}
}
POST titles/_bulk
{ "index": { "_id": 1 }}
{ "title": "My dog barks" }
{ "index": { "_id": 2 }}
{ "title": "I see a lot of barking dogs on the road " }
GET /titles/_search
{
"query": {
"multi_match": {
"query": "barking dogs",
"type": "most_fields",
"fields": [ "title", "title.std" ]
}
}
}
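- The broadly matching title field includes as many documents as possible, which improves recall, while title.std serves as a signal that pushes the more relevant documents to the top of the results.
- Each field's contribution to the final score can be controlled with a custom boost. For example, boosting the title field makes it more important and at the same time reduces the influence of the other signal fields.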
GET /titles/_search
{
"query": {
"multi_match": {
"query": "barking dogs",
"type": "most_fields",
"fields": [ "title^10", "title.std" ]
}
}
}
Cross-Field Search with most_fields
Sample document:
{
"street" : "5 Poland Street",
"city": "London",
"country": "United kingdom",
"postcode": "W1V 3DG"
}
POST address/_search
{
"query":{
"multi_match":{
"query": "Poland Street W1V",
"type": "most_fields",
// "operator": "and",
"fields":["street", "city", "country", "postcode"]
}
}
}
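- operator does not help here: most_fields applies it to each field separately, so with and all of the terms would have to appear in the same field.
- copy_to can solve this by copying the fields into a single field, but it requires extra storage.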
Cross-Field Search with cross_fields
POST address/_search
{
"query":{
"multi_match":{
"query": "Poland Street W1V",
"type": "cross_fields",
"operator": "and", // every term must appear in at least one of the fields below
"fields":["street", "city", "country", "postcode"]
}
}
}
- operator is supported.
- One advantage over copy_to is that individual fields can be boosted at search time.
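The difference between the two address searches comes down to where the and is applied. The Python sketch below contrasts the two rules under simplifying assumptions: every field uses the same whitespace-and-lowercase analyzer, and scoring is ignored. The helper names are hypothetical.

```python
def analyze(text):
    """Stand-in analyzer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def field_centric_and(doc, query, fields):
    """most_fields with operator and: all terms must be in ONE field."""
    terms = analyze(query)
    return any(all(t in analyze(doc[f]) for t in terms) for f in fields)


def term_centric_and(doc, query, fields):
    """cross_fields with operator and: each term in AT LEAST ONE field."""
    terms = analyze(query)
    return all(any(t in analyze(doc[f]) for f in fields) for t in terms)


address = {
    "street": "5 Poland Street",
    "city": "London",
    "country": "United kingdom",
    "postcode": "W1V 3DG",
}
fields = ["street", "city", "country", "postcode"]
print(field_centric_and(address, "Poland Street W1V", fields))  # False
print(term_centric_and(address, "Poland Street W1V", fields))   # True
```

No single field contains "Poland", "Street", and "W1V" together, so the field-centric rule rejects the document, while the term-centric rule finds each term somewhere in the address.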