Elasticsearch (Part 2)


Search

Lightweight search

The following searches use the employee data introduced in the prologue of this series.

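Retrieving all documents

Send a request to the _search endpoint to retrieve the documents in an index. By default only the top 10 hits are returned, which is enough to cover the three sample employees here.

  • Request URL: GET /megacorp/employee/_search

  • Response JSON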

    {
        "took": 18,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 3,
            "max_score": 1.0,
            "hits": [
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "2",
                    "_score": 1.0,
                    "_source": {
                        "first_name": "Jane",
                        "last_name": "Smith",
                        "age": 32,
                        "about": "I like to collect rock albums",
                        "interests": [
                            "music"
                        ]
                    }
                },
                .........
            ]
        }
    }
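Here is a small Python sketch of how such a response might be consumed: it parses the JSON and lists the returned hits. It assumes the older response shape shown above, where hits.total is a plain number (newer Elasticsearch versions return an object with value and relation instead). summarize_hits is a hypothetical helper name. The sample string is a trimmed copy of the response above, with the elided hits left out.

```python
import json

# Trimmed copy of the response above; the elided hits (".........") are left
# out so that the string is valid JSON.
RESPONSE = """
{
  "took": 18,
  "timed_out": false,
  "hits": {
    "total": 3,
    "max_score": 1.0,
    "hits": [
      {"_index": "megacorp", "_type": "employee", "_id": "2", "_score": 1.0,
       "_source": {"first_name": "Jane", "last_name": "Smith", "age": 32,
                   "about": "I like to collect rock albums", "interests": ["music"]}}
    ]
  }
}
"""


def summarize_hits(raw: str) -> tuple:
    """Return hits.total and (id, full name) for each hit on the returned page.

    Assumes hits.total is an integer, as in the older response format above.
    """
    body = json.loads(raw)
    hits = body["hits"]
    names = [
        (h["_id"], f'{h["_source"]["first_name"]} {h["_source"]["last_name"]}')
        for h in hits["hits"]
    ]
    return hits["total"], names


total, names = summarize_hits(RESPONSE)
print(total, names)  # 3 [('2', 'Jane Smith')]
```

hits.total counts every matching document, while hits.hits holds only the current page. That is why the two can differ, as they do in this trimmed sample.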
    

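Query-string search

Building on _search, the search condition is passed as the q parameter in the URL's query string.

Note: this example searches for employees whose last_name is Fir.

  • Request URL: GET /megacorp/employee/_search?q=last_name:Fir

  • Response JSON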

    {
        "took": 2,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 1,
            "max_score": 0.30685282,
            "hits": [
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "3",
                    "_score": 0.30685282,
                    "_source": {
                        "first_name": "Douglas",
                        "last_name": "Fir",
                        "age": 35,
                        "about": "I like to build cabinets",
                        "interests": [
                            "forestry"
                        ]
                    }
                }
            ]
        }
    }
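The same lightweight request can also be assembled in code. This Python sketch only builds the URL and does not send it. The host http://localhost:9200 (the default HTTP port) and the helper name query_string_url are assumptions for illustration.

```python
from urllib.parse import urlencode

# Assumption: a local node listening on the default HTTP port 9200.
BASE = "http://localhost:9200"


def query_string_url(index: str, doc_type: str, q: str) -> str:
    """Build a lightweight-search URL such as .../_search?q=last_name:Fir."""
    # urlencode percent-encodes reserved characters, so ':' becomes %3A;
    # the server decodes it back before parsing the query string.
    return f"{BASE}/{index}/{doc_type}/_search?{urlencode({'q': q})}"


url = query_string_url("megacorp", "employee", "last_name:Fir")
print(url)  # http://localhost:9200/megacorp/employee/_search?q=last_name%3AFir
```

Against a running node, the URL could then be fetched with urllib.request.urlopen(url).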
    

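Query DSL search

Query DSL search uses ES's domain-specific language (DSL) for search, in which queries are written as JSON in the request body. The DSL can express complex and robust queries. Lightweight search does work, but it suits only a very limited set of scenarios.

Usage

To see how it works, we rewrite the earlier lightweight query directly in the DSL.

  • Request URL: GET /megacorp/employee/_search

  • Request body JSON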

    {
        "query" : {
            "match" : {
                "last_name" : "Fir"
            }
        }
    }
    

    Instead of passing the condition as a URL parameter, the request body now specifies the field to match and its value under query.match.

  • Response body JSON

    {
        "took": 2,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 1,
            "max_score": 0.30685282,
            "hits": [
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "3",
                    "_score": 0.30685282,
                    "_source": {
                        "first_name": "Douglas",
                        "last_name": "Fir",
                        "age": 35,
                        "about": "I like to build cabinets",
                        "interests": [
                            "forestry"
                        ]
                    }
                }
            ]
        }
    }
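To make the difference from the lightweight form concrete, here is a Python sketch that prepares the same last_name:Fir search as a Query DSL request body, without sending it. Several details are assumptions of this sketch: the host, the helper names match_query and build_search_request, and the use of POST. The _search endpoint accepts POST as well as GET.

```python
import json
import urllib.request

BASE = "http://localhost:9200"  # assumption: local node on the default port


def match_query(field: str, text: str) -> dict:
    """Query DSL body equivalent to the lightweight search ?q=<field>:<text>."""
    return {"query": {"match": {field: text}}}


def build_search_request(index: str, doc_type: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a _search request that carries a JSON body."""
    return urllib.request.Request(
        f"{BASE}/{index}/{doc_type}/_search",
        data=json.dumps(body).encode("utf-8"),
        # Elasticsearch 6.0+ rejects JSON bodies sent without this header.
        headers={"Content-Type": "application/json"},
        # POST avoids HTTP clients that silently drop GET request bodies.
        method="POST",
    )


req = build_search_request("megacorp", "employee", match_query("last_name", "Fir"))
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing req to urllib.request.urlopen would send it to a running node.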
    

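More complex search

Basic searches barely show what ES can do, so let's try a more complex one: find employees whose last_name is Smith and whose age is greater than 30.

Usage

  • Request URL: GET /megacorp/employee/_search

  • Request JSON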

    {
        "query" : {
            "bool": {
                "must": {
                    "match" : {
                        "last_name" : "smith" 
                    }
                },
                "filter": {
                    "range" : {
                        "age" : { "gt" : 30 } 
                    }
                }
            }
        }
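    }
    

    Compared with the plain match query, the match clause now sits in the must clause of a bool query, and a filter clause is added. The filter uses a range query on the age field, where gt: 30 keeps only documents whose age is greater than 30. Clauses under filter only include or exclude documents and do not contribute to _score.

  • Response JSON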

    {
        "took": 2,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 1,
            "max_score": 0.30685282,
            "hits": [
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "2",
                    "_score": 0.30685282,
                    "_source": {
                        "first_name": "Jane",
                        "last_name": "Smith",
                        "age": 32,
                        "about": "I like to collect rock albums",
                        "interests": [
                            "music"
                        ]
                    }
                }
            ]
        }
    }
    

    As expected, the filter excluded John Smith (age 25), so only Jane Smith (age 32) is returned.
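The bool query above can be generated from code, and its intended semantics can be checked against the three sample employees that appear in these responses. The Python sketch below is a toy. smith_over and toy_evaluate are hypothetical helpers. toy_evaluate approximates match with case-insensitive equality and handles only this exact query shape. It is not how Elasticsearch executes queries.

```python
import json

# The three sample employees from the responses in this post (id -> subset of _source).
EMPLOYEES = {
    "1": {"last_name": "Smith", "age": 25},
    "2": {"last_name": "Smith", "age": 32},
    "3": {"last_name": "Fir", "age": 35},
}


def smith_over(age: int) -> dict:
    """bool query: must match last_name "smith", filter on age > `age`."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"last_name": "smith"}},
                "filter": {"range": {"age": {"gt": age}}},
            }
        }
    }


def toy_evaluate(body: dict, docs: dict) -> list:
    """Toy local imitation of the bool query above (NOT how ES executes it).

    'match' is approximated by case-insensitive equality on a single-term value,
    roughly what lowercasing analysis gives for these one-word names.
    """
    clause = body["query"]["bool"]
    [(field, text)] = clause["must"]["match"].items()
    [(range_field, bounds)] = clause["filter"]["range"].items()
    return [
        doc_id
        for doc_id, src in docs.items()
        if src[field].lower() == text.lower() and src[range_field] > bounds["gt"]
    ]


print(json.dumps(smith_over(30)))
print(toy_evaluate(smith_over(30), EMPLOYEES))  # ['2']
```

Only employee 2 passes both clauses, which agrees with the response above.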

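Full-text search

Filtering by age and searching by name are both things a traditional relational database can do. Next, let's try something a traditional relational database handles poorly.

Usage

Requirement: search the about field for all employees who enjoy rock climbing.

  • Request URL: GET /megacorp/employee/_search

  • Request JSON: a match query request body, as in Query DSL search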

    {
        "query" : {
            "match" : {
                "about" : "rock climbing"
            }
        }
    }
    
  • Response JSON: two employees are returned, although only one of them, John Smith, actually mentions rock climbing in the about field.

    {
        "took": 38,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 2,
            "max_score": 0.16273327,
            "hits": [
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "1",
                    "_score": 0.16273327,
                    "_source": {
                        "first_name": "John",
                        "last_name": "Smith",
                        "age": 25,
                        "about": "I love to go rock climbing",
                        "interests": [
                            "sports",
                            "music"
                        ]
                    }
                },
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "2",
                    "_score": 0.016878016,
                    "_source": {
                        "first_name": "Jane",
                        "last_name": "Smith",
                        "age": 32,
                        "about": "I like to collect rock albums",
                        "interests": [
                            "music"
                        ]
                    }
                }
            ]
        }
    }
    
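    • Note that every hit carries a relevance score in its _score field (for example, "_score": 0.016878016 for the second hit). Hits are sorted by this score in descending order.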


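      The second hit is clearly not a full match: its about field contains "rock" but not "climbing". A relational database query such as LIKE '%rock climbing%' would not return it at all. ES does return it, but with a much lower score than the full match. This relevance scoring is where ES pulls ahead of traditional databases in full-text search, and relevance can be put to many other uses.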
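To illustrate the contrast between LIKE and relevance ranking, this Python sketch compares a substring test with a toy score on the three about texts from this series. The score (the share of query terms present) is a deliberate simplification and NOT Elasticsearch's scoring. Elasticsearch uses TF/IDF in older versions and BM25 since 5.0. The sketch only shows why a match query, whose terms are combined with OR by default, keeps the partial match and ranks it lower. The helper names are hypothetical.

```python
import re

# The "about" texts of the three sample employees from this series.
ABOUT = {
    "1": "I love to go rock climbing",
    "2": "I like to collect rock albums",
    "3": "I like to build cabinets",
}


def tokens(text: str) -> list:
    """Crude stand-in for an analyzer: lowercase, then keep runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def like_match(pattern: str, docs: dict) -> list:
    """SQL-style LIKE '%pattern%': a substring test, each row is simply in or out."""
    return [doc_id for doc_id, text in docs.items() if pattern.lower() in text.lower()]


def toy_rank(query: str, docs: dict) -> list:
    """Toy relevance: fraction of query terms found in each document.

    NOT Elasticsearch's scoring; partial matches are kept but ranked lower.
    """
    q = set(tokens(query))
    scored = [(doc_id, len(q & set(tokens(text))) / len(q)) for doc_id, text in docs.items()]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)


print(like_match("rock climbing", ABOUT))  # ['1']
print(toy_rank("rock climbing", ABOUT))    # [('1', 1.0), ('2', 0.5)]
```

The LIKE-style test keeps only employee 1. The ranked version also returns employee 2 below it, mirroring the order of the hits above.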