Elasticsearch Quick Start: Using Elasticsearch Through the RESTful API

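What is Elasticsearch?

Elasticsearch is an open-source, distributed search and analytics engine for processing and analyzing large datasets. It is built on the Apache Lucene search library and exposes a simple RESTful API.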

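What can Elasticsearch do for us?

High-performance search: fast, accurate full-text search and keyword matching, suitable for building search engines and enterprise search solutions.
Real-time data analysis: processes and analyzes large volumes of structured and unstructured data, delivering analysis and insight in real time.
Distributed storage and processing: a distributed architecture with data sharding provides high availability, fault tolerance, and scalability.
Complex queries and aggregations: supports complex queries such as boolean, fuzzy, range, and geolocation queries, together with powerful aggregations.
Log analysis and monitoring: ingests, indexes, and analyzes large amounts of log data to help monitor system behavior and troubleshoot problems.
Geolocation data processing: supports geolocation search, distance calculation, and geo aggregations, suitable for geographic information systems and location-intelligence applications.

In short, Elasticsearch is a powerful, flexible, and scalable search and analytics engine that fits many scenarios, including search engines, enterprise search, data analysis, log analysis, monitoring, and recommendation systems. It helps us process and understand large-scale data and extract valuable information and insight from it.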

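Installing and running Elasticsearch

The easiest way to understand what Elasticsearch can do for you is to use it. Before installing Elasticsearch, you need a reasonably recent version of Java. This article uses Java 8 as an example.

Download Elasticsearch from elastic.co/downloads/elasticsearch.

After downloading Elasticsearch from the link above, extract the archive. If you are running Elasticsearch on Windows, run `bin\elasticsearch.bat`.

To check that Elasticsearch started successfully, open http://localhost:9200/ in a browser. You should get a response similar to the following (the node name and version fields depend on your installation):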

{
  "name" : "Tom Foster",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.1.0",
    "build_hash" : "72cd1f1a3eee09505e036106146dc1949dc5dc87",
    "build_timestamp" : "2015-11-18T22:40:03Z",
    "build_snapshot" : false,
    "lucene_version" : "5.3.1"
  },
  "tagline" : "You Know, for Search"
}

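This means you now have an Elasticsearch node up and running, and you can start experimenting with it. A single node is one running instance of Elasticsearch. A cluster is a group of nodes with the same cluster.name that work together and share data, providing fault tolerance and scalability.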
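
The same check can be scripted. The following is a minimal sketch, assuming the response has the shape shown above; `describe_node` is a hypothetical helper name, and the sample string is a trimmed copy of that response so the snippet runs without a live node.

```python
import json


def describe_node(body: str) -> str:
    """Summarize the JSON body returned by GET http://localhost:9200/."""
    info = json.loads(body)
    return "{} (cluster {}, version {})".format(
        info["name"], info["cluster_name"], info["version"]["number"]
    )


# Trimmed copy of the sample response shown above.
sample = """{
  "name": "Tom Foster",
  "cluster_name": "elasticsearch",
  "version": {"number": "2.1.0", "lucene_version": "5.3.1"},
  "tagline": "You Know, for Search"
}"""

summary = describe_node(sample)
print(summary)
```

Against a running node, the body could instead be read with `urllib.request.urlopen("http://localhost:9200/").read().decode("utf-8")`.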

Adding data to ES

Use Postman to add three documents to ES, starting with the first:

POST http://localhost:9200/megacorp/employee/1

{
    "first_name" : "三",
    "last_name" :  "张",
    "age" :        25,
    "about" :      "我喜欢去攀岩",
    "interests": [ "运动", "音乐" ]
}

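Note that the path /megacorp/employee/1 contains three pieces of information:

megacorp: the index name
employee: the type name
1: the ID of this particular employee

The request body, a JSON document, contains all the details about this employee: his name is 张三 (Zhang San), he is 25 years old, and he enjoys rock climbing.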
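
As a small illustration of how the three parts combine into a document address, here is a sketch in Python. `document_url` and `BASE_URL` are hypothetical names chosen for this example; each path segment is percent-encoded so that unusual IDs cannot break the path.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:9200"  # node address used throughout this article


def document_url(index: str, doc_type: str, doc_id) -> str:
    """Join index, type, and ID into the address of a single document."""
    parts = [quote(str(part), safe="") for part in (index, doc_type, doc_id)]
    return BASE_URL + "/" + "/".join(parts)


url = document_url("megacorp", "employee", 1)
print(url)
```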

Before moving on, let's add a few more employees to the directory:

POST http://localhost:9200/megacorp/employee/2
{
    "first_name" :  "四",
    "last_name" :   "李",
    "age" :         32,
    "about" :       "我喜欢攀登高山",
    "interests":  [ "音乐" ]
}

POST http://localhost:9200/megacorp/employee/3
{
    "first_name" :  "小粉",
    "last_name" :   "张",
    "age" :         35,
    "about":        "我喜欢制作橱柜",
    "interests":  [ "林业" ]
}
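
If you prefer a script to Postman, the two requests above could be prepared roughly as follows. This is a sketch using only the Python standard library; `index_request` is a hypothetical helper, and the snippet only builds the requests, it does not send them.

```python
import json
from urllib.request import Request

# The two employee documents from the requests above, keyed by ID.
employees = {
    2: {"first_name": "四", "last_name": "李", "age": 32,
        "about": "我喜欢攀登高山", "interests": ["音乐"]},
    3: {"first_name": "小粉", "last_name": "张", "age": 35,
        "about": "我喜欢制作橱柜", "interests": ["林业"]},
}


def index_request(doc_id, doc) -> Request:
    """Build (but do not send) the POST request that indexes one document."""
    body = json.dumps(doc, ensure_ascii=False).encode("utf-8")
    return Request(
        "http://localhost:9200/megacorp/employee/{}".format(doc_id),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


requests_to_send = [index_request(doc_id, doc) for doc_id, doc in employees.items()]
for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

With a node running, each request could then be sent with `urllib.request.urlopen(req)`.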

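Retrieving a document

Simply issue an HTTP GET request and specify the document's address: the index, the type, and the ID. With these three pieces of information, Elasticsearch returns the original JSON document:

GET http://localhost:9200/megacorp/employee/1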

{
    "_index": "megacorp",
    "_type": "employee",
    "_id": "1",
    "_version": 4,
    "_seq_no": 5,
    "_primary_term": 3,
    "found": true,
    "_source": {
        "first_name": "三",
        "last_name": "张",
        "age": 25,
        "about": "我喜欢去攀岩",
        "interests": [
            "运动",
            "音乐"
        ]
    }
}

To retrieve all employees:

GET http://localhost:9200/megacorp/employee/_search

Response:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "first_name": "三",
                    "last_name": "张",
                    "age": 25,
                    "about": "我喜欢去攀岩",
                    "interests": [
                        "运动",
                        "音乐"
                    ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "2",
                "_score": 1.0,
                "_source": {
                    "first_name": "四",
                    "last_name": "李",
                    "age": 32,
                    "about": "我喜欢攀登高山",
                    "interests": [
                        "音乐"
                    ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "3",
                "_score": 1.0,
                "_source": {
                    "first_name": "小粉",
                    "last_name": "张",
                    "age": 35,
                    "about": "我喜欢制作橱柜",
                    "interests": [
                        "林业"
                    ]
                }
            }
        ]
    }
}

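Note: the response tells us not only which documents matched but also includes the whole documents themselves, all the information needed to display search results to the end user.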
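
Because each hit carries its full `_source`, a client can pull the documents straight out of the response. A minimal sketch, assuming the response shape shown above; `sources` is a hypothetical helper, and `response` is a trimmed copy of that response.

```python
def sources(search_response: dict) -> list:
    """Return the original documents contained in a _search response."""
    return [hit["_source"] for hit in search_response["hits"]["hits"]]


# Trimmed copy of the response above: only the fields used here are kept.
response = {
    "hits": {
        "total": {"value": 3, "relation": "eq"},
        "hits": [
            {"_id": "1", "_source": {"first_name": "三", "last_name": "张", "age": 25}},
            {"_id": "2", "_source": {"first_name": "四", "last_name": "李", "age": 32}},
            {"_id": "3", "_source": {"first_name": "小粉", "last_name": "张", "age": 35}},
        ],
    }
}

names = [doc["last_name"] + doc["first_name"] for doc in sources(response)]
print(names)
```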

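Next, use a lightweight query-string search to find employees whose last name is 张 (Zhang):

GET http://localhost:9200/megacorp/employee/_search?q=last_name:张

The response contains every employee with the last name 张: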

{
    "took": 839,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 0.6931471,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "1",
                "_score": 0.6931471,
                "_source": {
                    "first_name": "三",
                    "last_name": "张",
                    "age": 25,
                    "about": "我喜欢去攀岩",
                    "interests": [
                        "运动",
                        "音乐"
                    ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "3",
                "_score": 0.6931471,
                "_source": {
                    "first_name": "小粉",
                    "last_name": "张",
                    "age": 35,
                    "about": "我喜欢制作橱柜",
                    "interests": [
                        "林业"
                    ]
                }
            }
        ]
    }
}
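
Browsers and Postman encode the query string for you, but a script has to percent-encode the non-ASCII value itself. The sketch below shows one way to build the same URL with the standard library; `query_string_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://localhost:9200/megacorp/employee/_search"


def query_string_search_url(field: str, value: str) -> str:
    """Build a query-string search URL with the q parameter percent-encoded."""
    return SEARCH_URL + "?" + urlencode({"q": "{}:{}".format(field, value)})


url = query_string_search_url("last_name", "张")
print(url)
```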

You can also use the query DSL, passing the query as a JSON request body:

GET http://localhost:9200/megacorp/employee/_search
{
    "query" : {
        "match" : {
            "last_name" : "张"
        }
    }
}

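Now let's try a more complex search. We still want employees whose last name is 张, but this time only those older than 30. The query needs a small adjustment: a filter clause, which executes a structured query efficiently and does not contribute to the relevance score.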

GET http://localhost:9200/megacorp/employee/_search
{
    "query" : {
        "bool": {
            "must": {
                "match" : {
                    "last_name" : "张" 
                }
            },
            "filter": {
                "range" : {
                    "age" : { "gt" : 30 } 
                }
            }
        }
    }
}

The result now contains only one employee, 张小粉 (Zhang Xiaofen), aged 35:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.6931471,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "3",
                "_score": 0.6931471,
                "_source": {
                    "first_name": "小粉",
                    "last_name": "张",
                    "age": 35,
                    "about": "我喜欢制作橱柜",
                    "interests": [
                        "林业"
                    ]
                }
            }
        ]
    }
}
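
When the last name and the age threshold come from user input, the request body can be assembled in code instead of written by hand. A sketch, assuming the same bool query structure as above; `last_name_older_than` is a hypothetical helper.

```python
import json


def last_name_older_than(last_name: str, min_age: int) -> dict:
    """Build a bool query: full-text match on last_name, range filter on age."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"last_name": last_name}},
                "filter": {"range": {"age": {"gt": min_age}}},
            }
        }
    }


body = last_name_older_than("张", 30)
print(json.dumps(body, ensure_ascii=False, indent=4))
```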

Full-text search

Search for all employees who enjoy rock climbing (攀岩):

GET http://localhost:9200/megacorp/employee/_search
{
    "query" : {
        "match" : {
            "about" : "攀岩"
        }
    }
}

Two documents match:

{
    "took": 853,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 2.427258,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "1",
                "_score": 2.427258,
                "_source": {
                    "first_name": "三",
                    "last_name": "张",
                    "age": 25,
                    "about": "我喜欢去攀岩",
                    "interests": [
                        "运动",
                        "音乐"
                    ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "2",
                "_score": 0.88553154,
                "_source": {
                    "first_name": "四",
                    "last_name": "李",
                    "age": 32,
                    "about": "我喜欢攀登高山",
                    "interests": [
                        "音乐"
                    ]
                }
            }
        ]
    }
}

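By default, Elasticsearch sorts results by relevance score, that is, by how well each document matches the query. The top-scoring result is obvious: 张三's about field plainly says “攀岩”.

But why was 李四 (Li Si) returned as well? Because his about field mentions “攀”. With the default analyzer, Chinese text is split into single characters, so the query 攀岩 is searched as the two terms 攀 and 岩. Since his document contains only “攀” and not “岩”, his relevance score is lower than 张三's.

This is a good example of how Elasticsearch searches full-text fields and returns the most relevant results first. The concept of relevance is central to Elasticsearch, and it is completely different from traditional relational databases, where a record either matches or it does not.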
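
To make the matching behavior concrete, the sketch below imitates single-character tokenization and counts how many query terms each about field shares with the query. It is a deliberate simplification: it is not Elasticsearch's scoring formula, and `single_char_tokens` is a hypothetical stand-in for the analyzer.

```python
def single_char_tokens(text: str) -> list:
    """Split text into one token per non-space character (simplified analyzer)."""
    return [ch for ch in text if not ch.isspace()]


# The about fields of the three employees indexed earlier.
docs = {
    "张三": "我喜欢去攀岩",
    "李四": "我喜欢攀登高山",
    "张小粉": "我喜欢制作橱柜",
}

query_terms = set(single_char_tokens("攀岩"))

# Number of distinct query terms that appear in each document.
overlap = {
    name: len(query_terms & set(single_char_tokens(about)))
    for name, about in docs.items()
}
print(overlap)
```

A document with an overlap of zero does not match at all, while documents sharing more query terms tend to rank higher, which is consistent with the ordering in the response above.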

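To match an exact sequence of words or a phrase, use a match_phrase query. The following query matches only documents in which 攀 and 岩 appear next to each other as the phrase 攀岩: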

GET http://localhost:9200/megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "攀岩"
        }
    }
}

Unsurprisingly, only 张三's document is returned.

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.5127167,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "1",
                "_score": 1.5127167,
                "_source": {
                    "first_name": "三",
                    "last_name": "张",
                    "age": 25,
                    "about": "我喜欢去攀岩",
                    "interests": [
                        "运动",
                        "音乐"
                    ]
                }
            }
        ]
    }
}
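
The difference from a plain match query is that the terms must appear consecutively and in order. The following sketch illustrates that idea on single-character tokens; it is a simplified model rather than Elasticsearch's implementation, and `contains_phrase` is a hypothetical helper.

```python
def contains_phrase(text: str, phrase: str) -> bool:
    """True if the phrase's tokens occur consecutively, in order, in the text."""
    doc_tokens, phrase_tokens = list(text), list(phrase)
    n = len(phrase_tokens)
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


docs = {
    "张三": "我喜欢去攀岩",
    "李四": "我喜欢攀登高山",
    "张小粉": "我喜欢制作橱柜",
}

phrase_matches = [name for name, about in docs.items() if contains_phrase(about, "攀岩")]
print(phrase_matches)
```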

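Highlighting search results

Many applications highlight snippets of text in each search result so that users can see why the document matched the query. Retrieving highlighted snippets in Elasticsearch is also easy.

Run the previous query again, adding a new highlight parameter: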

GET http://localhost:9200/megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "攀岩"
        }
    },
    "highlight": {
        "fields" : {
            "about" : {}
        }
    }
}

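Running this query returns the same result as before, plus a new section called highlight. This section contains the matching text snippet from the about field, with the matches wrapped in HTML <em></em> tags: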

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.5127167,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "1",
                "_score": 1.5127167,
                "_source": {
                    "first_name": "三",
                    "last_name": "张",
                    "age": 25,
                    "about": "我喜欢去攀岩",
                    "interests": [
                        "运动",
                        "音乐"
                    ]
                },
                "highlight": {
                    "about": [
                        "我喜欢去<em>攀</em><em>岩</em>"
                    ]
                }
            }
        ]
    }
}
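
Note that each matched character is wrapped separately, because each character is its own term. The sketch below reproduces that output shape in plain Python; it only imitates the result for this example and is not how Elasticsearch's highlighter works internally. `highlight` is a hypothetical helper.

```python
def highlight(text: str, query: str, pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap every character of text that also occurs in the query in tags."""
    query_chars = set(query)
    return "".join(
        pre + ch + post if ch in query_chars else ch
        for ch in text
    )


highlighted = highlight("我喜欢去攀岩", "攀岩")
print(highlighted)
```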

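Analytics

Elasticsearch has a feature called aggregations, which lets us generate fine-grained analytics over the data. Aggregations are similar to GROUP BY in SQL, but much more powerful.

For example, let's find the most popular interests among the employees: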

GET http://localhost:9200/megacorp/employee/_search
{
    "aggs": {
        "all_interests": {
            "terms": { "field": "interests.keyword" }
        }
    }
}

The result:

{
 ...
    "aggregations": {
        "all_interests": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "音乐",
                    "doc_count": 2
                },
                {
                    "key": "林业",
                    "doc_count": 1
                },
                {
                    "key": "运动",
                    "doc_count": 1
                }
            ]
        }
    }
}

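We can see that two employees are interested in music (音乐), one in forestry (林业), and one in sports (运动). These aggregation results are not precomputed; they are generated on the fly from the documents that match the current query. To find the most popular interests among employees with the last name 张, simply combine the aggregation with a query: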

GET http://localhost:9200/megacorp/employee/_search
{
  "query": {
    "match": {
      "last_name": "张"
    }
  },
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests.keyword"
      }
    }
  }
}

The all_interests aggregation now covers only the documents that match the query:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 0.4700036,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "3",
                "_score": 0.4700036,
                "_source": {
                    "first_name": "小粉",
                    "last_name": "张",
                    "age": 35,
                    "about": "我喜欢制作橱柜",
                    "interests": [
                        "林业"
                    ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "1",
                "_score": 0.4700036,
                "_source": {
                    "first_name": "三",
                    "last_name": "张",
                    "age": 25,
                    "about": "我喜欢去攀岩",
                    "interests": [
                        "运动",
                        "音乐"
                    ]
                }
            }
        ]
    },
    "aggregations": {
        "all_interests": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "林业",
                    "doc_count": 1
                },
                {
                    "key": "运动",
                    "doc_count": 1
                },
                {
                    "key": "音乐",
                    "doc_count": 1
                }
            ]
        }
    }
}
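
The bucket counts in both responses can be reproduced locally, which may help clarify what a terms aggregation computes: for each distinct value, the number of matching documents that contain it. A sketch over the three sample employees; `interest_counts` is a hypothetical helper, and the last-name comparison stands in for the match query.

```python
from collections import Counter

# The three employees indexed earlier (only the fields needed here).
employees = [
    {"last_name": "张", "age": 25, "interests": ["运动", "音乐"]},
    {"last_name": "李", "age": 32, "interests": ["音乐"]},
    {"last_name": "张", "age": 35, "interests": ["林业"]},
]


def interest_counts(docs) -> Counter:
    """Count, per interest, how many documents list it (one count per document)."""
    return Counter(
        interest for doc in docs for interest in set(doc["interests"])
    )


all_counts = interest_counts(employees)
zhang_counts = interest_counts(e for e in employees if e["last_name"] == "张")
print(all_counts)
print(zhang_counts)
```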

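Aggregations also support hierarchical rollups. For example, to find the average age of the employees who share each interest:

GET http://localhost:9200/megacorp/employee/_search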
{
    "aggs" : {
        "all_interests" : {
            "terms" : { "field" : "interests.keyword" },
            "aggs" : {
                "avg_age" : {
                    "avg" : { "field" : "age" }
                }
            }
        }
    }
}

The resulting aggregations:

{
    ...
    "aggregations": {
        "all_interests": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "音乐",
                    "doc_count": 2,
                    "avg_age": {
                        "value": 28.5
                    }
                },
                {
                    "key": "林业",
                    "doc_count": 1,
                    "avg_age": {
                        "value": 35.0
                    }
                },
                {
                    "key": "运动",
                    "doc_count": 1,
                    "avg_age": {
                        "value": 25.0
                    }
                }
            ]
        }
    }
}

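The output is essentially an enriched version of the first aggregation. There is still a list of interests with their counts, but each interest now has an additional avg_age property: the average age of all employees who share that interest.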
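
As a cross-check of the avg_age values, the same nested grouping can be computed by hand. This is a sketch over the three sample employees, with `average_age_by_interest` as a hypothetical helper; it groups ages by interest first and then averages each group.

```python
from collections import defaultdict

# The three employees indexed earlier (only the fields needed here).
employees = [
    {"age": 25, "interests": ["运动", "音乐"]},
    {"age": 32, "interests": ["音乐"]},
    {"age": 35, "interests": ["林业"]},
]


def average_age_by_interest(docs) -> dict:
    """Group ages by interest (the terms bucket), then average each group."""
    ages = defaultdict(list)
    for doc in docs:
        for interest in doc["interests"]:
            ages[interest].append(doc["age"])
    return {interest: sum(values) / len(values) for interest, values in ages.items()}


avg_age = average_age_by_interest(employees)
print(avg_age)
```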