Elasticsearch Study Notes

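A recent development task at work requires Elasticsearch. As a front-end developer, I started the hard process of learning it from scratch. Below are the notes I took along the way (all concepts and basic usage are based on Elasticsearch 7).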

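Concepts

Node

A node is a single running instance of Elasticsearch. One physical or virtual server can host several nodes, depending on its physical resources (RAM, storage, and processing power).

  1. Running instance: one started service
  2. Physical or virtual server: a real or virtual machine, a docker container, or a pod in k8s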

Cluster

A cluster is a collection of one or more nodes. It provides indexing and search over all of the data, across all of its nodes.

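Index

An index is a collection of documents and their properties. Earlier versions allowed several document types per index; in Elasticsearch 7 mapping types are deprecated and each index effectively has the single type _doc. An index also uses shards to improve performance. For example, one index might hold the documents that make up a social networking application's data.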

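Document

A document is a set of fields defined in JSON format. Every document lives in an index (with the type _doc in Elasticsearch 7) and is identified within that index by a unique identifier, its _id.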

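Shard

An index is subdivided horizontally into shards. Each shard therefore contains all the properties of a document, but holds fewer JSON objects than the whole index. The horizontal split makes each shard an independent unit that can be stored on any node. Primary shards are the original horizontal parts of an index; they are then copied into replica shards.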
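To make the horizontal split concrete, here is a minimal sketch of how a document id can be mapped to a primary shard. The Elasticsearch documentation describes routing, in simplified form, as `hash(_routing) % number_of_primary_shards`. The `zlib.crc32` hash below is only a stand-in assumption (Elasticsearch uses its own Murmur3-based hash), and `pick_shard` is a hypothetical helper, not part of any client.

```python
import zlib


def pick_shard(doc_id: str, number_of_primary_shards: int) -> int:
    """Map a document id to a primary shard number (toy model)."""
    # Stand-in hash: real Elasticsearch uses a Murmur3-based hash, not CRC32.
    routing_hash = zlib.crc32(doc_id.encode("utf-8"))
    return routing_hash % number_of_primary_shards


if __name__ == "__main__":
    for doc_id in ["1", "2", "3", "4"]:
        print(doc_id, "-> shard", pick_shard(doc_id, 3))
```

Because the result depends on the number of primary shards, that number is fixed when the index is created.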

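Replica

Elasticsearch lets users create replicas of their indices and shards. Replication improves data availability when a failure occurs, and it also improves search performance because searches can run in parallel on the replicas.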

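Advantages

  • Elasticsearch is developed in Java, which makes it compatible with almost every platform.
  • Elasticsearch is near real-time: a document becomes searchable about one second after it is added.
  • Elasticsearch is distributed, which makes it easy to scale and to integrate in any large organization.
  • Creating a full backup is straightforward using the gateway concept, which appears throughout Elasticsearch.
  • Compared with Apache Solr, handling multi-tenancy is very easy in Elasticsearch.
  • Elasticsearch responds with JSON objects, so the server can be called from a large number of programming languages.
  • Elasticsearch supports almost every document type, except those that do not support text rendering.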

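Disadvantages

  • Elasticsearch offers less format flexibility for request and response data (in practice only JSON), unlike Apache Solr, which can work with CSV, XML, and JSON.
  • Elasticsearch can sometimes run into split-brain situations.

P.S. The difference between an index, a document, and a shard: TODO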

Basic Usage

All operations below use curl against localhost:9200.

Adding some indices, mappings, and data to Elasticsearch:

  1. Create an index
curl -X PUT localhost:9200/school

{"acknowledged":true,"shards_acknowledged":true,"index":"school"}
  2. Creating it again returns an error: already exists
{"error":{"root_cause":[{"type":"resource_already_exists_exception","reason":"index [school/WC6gYdYtRYKUpBW02pDFMg] already exists","index_uuid":"WC6gYdYtRYKUpBW02pDFMg","index":"school"}],"type":"resource_already_exists_exception","reason":"index [school/WC6gYdYtRYKUpBW02pDFMg] already exists","index_uuid":"WC6gYdYtRYKUpBW02pDFMg","index":"school"},"status":400}
  3. Create another index
curl -X PUT localhost:9200/student

{"acknowledged":true,"shards_acknowledged":true,"index":"student"}
  4. Add data: index a document with id 1
curl -X POST -H 'Content-Type: application/json' localhost:9200/school/_doc/1 -d '
{
   "name":"Saint Paul School", "description":"ICSE Afiliation",
   "street":"Dawarka", "city":"Delhi", "state":"Delhi", "zip":"110075",
   "location":[28.5733056, 77.0122136], "fees":5000,
   "tags":["Good Faculty", "Great Sports"], "rating":"4.5"
}'

{"_index":"school","_type":"_doc","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

================================================================

curl -X POST -H 'Content-Type: application/json' localhost:9200/student/_doc/1 -d '
{
   "name":"test",
   "age":"20"
}'

{"_index":"student","_type":"_doc","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
  5. Query the data
curl localhost:9200/school/_doc/1

{"_index":"school","_type":"_doc","_id":"1","_version":1,"_seq_no":0,     "_primary_term":1,"found":true,"_source":
  {
    "name":"Saint Paul School", "description":"ICSE Afiliation",
    "street":"Dawarka", "city":"Delhi", "state":"Delhi", "zip":"110075",
    "location":[28.5733056, 77.0122136], "fees":5000,
    "tags":["Good Faculty", "Great Sports"], "rating":"4.5"
  }
}
curl -X POST localhost:9200/school/_search

{"took":9,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":1.0,"hits":[{"_index":"school","_type":"_doc","_id":"1","_score":1.0,"_source":
{
   "name":"Saint Paul School", "description":"ICSE Afiliation",
   "street":"Dawarka", "city":"Delhi", "state":"Delhi", "zip":"110075",
   "location":[28.5733056, 77.0122136], "fees":5000,
   "tags":["Good Faculty", "Great Sports"], "rating":"4.5"
}}]}}
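The curl calls above translate directly to any HTTP client. As a rough sketch in Python (standard library only), the hypothetical helper `es_request` below builds the same requests without sending them. The base URL comes from these notes; the helper name and its shape are illustrative assumptions.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # same host and port as the curl examples


def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request for Elasticsearch."""
    data = None
    headers = {}
    if body is not None:
        # A JSON body needs the same Content-Type header curl sends with -H.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


create_index = es_request("PUT", "/school")
add_document = es_request("POST", "/student/_doc/1", {"name": "test", "age": "20"})
get_document = es_request("GET", "/school/_doc/1")

# To actually send one of them against a running node:
#     with urllib.request.urlopen(create_index) as resp:
#         print(json.load(resp))
```

Keeping request construction separate from sending makes it possible to inspect the method, URL, and body before anything touches the cluster.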

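API Conventions

Multiple Indices

A search or other operation can target several indices at once; just separate the index names with commas:

Searching several indices in one request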
curl -X POST -H 'Content-Type: application/json' localhost:9200/school,student/_search -d '
{
   "query":{
      "query_string":{
         "query":"test"
      }
   }
}'

{"took":3,"timed_out":false,"_shards":{"total":2,"successful":2,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":0.2876821,"hits":[{"_index":"student","_type":"_doc","_id":"1","_score":0.2876821,"_source":
{
   "name":"test",
   "age":"20"
}}]}}

All Indices

To search every index, use the keyword _all


curl -X POST -H 'Content-Type: application/json' localhost:9200/_all/_search -d '
{
   "query":{
      "query_string":{
         "query":"test"
      }
   }
}'

The search result is the same as above
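Both targets used above (a comma-separated list and _all) differ only in the path of the request. A minimal sketch, assuming a hypothetical helper `build_search` that returns the path and JSON body for the same query_string search:

```python
import json


def build_search(indices, query_text):
    """Return the (path, body) pair for a query_string search."""
    # An empty list falls back to _all, i.e. every index.
    target = ",".join(indices) if indices else "_all"
    path = "/" + target + "/_search"
    body = {"query": {"query_string": {"query": query_text}}}
    return path, json.dumps(body)


path, body = build_search(["school", "student"], "test")
print(path)  # /school,student/_search
print(body)
```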

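Wildcards (*, +, -)

  • *: matches anything; localhost:9200/sc*/_search searches every index whose name starts with sc
  • +: includes an index
  • -: excludes an index; localhost:9200/sc*,-school/_search searches every index whose name starts with sc, except school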
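The include and exclude rules can be simulated locally to see which indices an expression would select. This is a simplified sketch, not Elasticsearch's own resolution logic: `resolve_indices` is a hypothetical helper, `fnmatchcase` accepts more pattern syntax than Elasticsearch does, and the index name `scores` is made up for the example.

```python
from fnmatch import fnmatchcase


def resolve_indices(expression, existing):
    """Resolve an expression such as 'sc*,-school' against known index names."""
    selected = []
    for part in expression.split(","):
        exclude = part.startswith("-")
        pattern = part.lstrip("+-")
        matches = [name for name in existing if fnmatchcase(name, pattern)]
        if exclude:
            # A leading '-' removes matching names selected so far.
            selected = [name for name in selected if name not in matches]
        else:
            selected += [name for name in matches if name not in selected]
    return selected


indices = ["school", "scores", "student"]
print(resolve_indices("sc*", indices))          # ['school', 'scores']
print(resolve_indices("sc*,-school", indices))  # ['scores']
```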

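Cluster API

The cluster APIs return information about the cluster and its nodes and can make changes to them. Node-level calls need a node to be specified: by node name, by address, or with _local.

Node information (_local returns only the node that handles the request)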

curl localhost:9200/_nodes/_local

{"_nodes":{"total":1,"successful":1,"failed":0},"cluster_name":"Elasticsearch","nodes":{"0uTLUfi0Ts-ZAIBOyAlQag":{"name":"f632e677186b",............

Cluster health

curl localhost:9200/_cluster/health

{"cluster_name":"Elasticsearch","status":"yellow","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":27,"active_shards":27,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":24,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":52.94117647058824}
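The numbers in this response can be checked by hand. The status is yellow because all primary shards are active but the replicas are unassigned, which is expected on a single-node cluster. As a small sketch, assuming the percentage is simply active shards over all shards (active, initializing, and unassigned), the reported value can be reproduced from a trimmed copy of the response:

```python
import json

# Trimmed copy of the health response shown above.
health = json.loads("""
{"status":"yellow","number_of_nodes":1,"active_primary_shards":27,
 "active_shards":27,"relocating_shards":0,"initializing_shards":0,
 "unassigned_shards":24}
""")

# Every shard is either active, initializing, or unassigned.
total = (
    health["active_shards"]
    + health["initializing_shards"]
    + health["unassigned_shards"]
)
percent = 100.0 * health["active_shards"] / total
print(round(percent, 2))  # 52.94
```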

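Cluster state

Unlike the cluster health API, this API returns more complete and detailed state information about the cluster: the version, the master node, the other nodes, the routing table, metadata, and blocks.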

curl localhost:9200/_cluster/state

{"cluster_name":"Elasticsearch","cluster_uuid":"59RT7mIeTZOqd996pVYpcQ","version":1713............

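Cluster stats

Returns cluster-wide statistics: shard counts, store size, memory usage, number of nodes, roles, operating system, and file system.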

curl localhost:9200/_cluster/stats

{"_nodes":{"total":1,"successful":1,"failed":0},"cluster_name":"Elasticsearch"............

Node stats

The response is largely the same as that of the cluster stats API, but broken down per node

curl localhost:9200/_nodes/stats

{"_nodes":{"total":1,"successful":1,"failed":0},"cluster_name":"Elasticsearch","nodes":............

Cat API

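Results from the various Elasticsearch APIs are usually returned as JSON, but JSON is not always easy to read. The cat APIs print results in a tabular text format that is easier to read and understand. They accept several parameters for different purposes; for example, v makes the output verbose by adding a header row.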

Index details

curl 'localhost:9200/_cat/indices?v'

health status index                                 uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   school                                WC6gYdYtRYKUpBW02pDFMg   1   1          1            0      8.8kb          8.8kb
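The tabular output is easy to read but sometimes needs to be processed by a script. A minimal sketch of turning it into dictionaries, assuming every cell is non-empty and contains no spaces (the hypothetical helper `parse_cat_table` would misalign rows otherwise):

```python
CAT_OUTPUT = """\
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   school WC6gYdYtRYKUpBW02pDFMg   1   1          1            0      8.8kb          8.8kb
"""


def parse_cat_table(text):
    """Turn whitespace-aligned _cat output (with the ?v header) into dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


rows = parse_cat_table(CAT_OUTPUT)
print(rows[0]["index"], rows[0]["health"], rows[0]["docs.count"])
```

For anything beyond a quick script, the cat APIs also accept format=json, which avoids text parsing altogether.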

Count

curl 'localhost:9200/_cat/count?v'

epoch      timestamp count
1644994713 06:58:33  545

Index API

Create an index

curl -X PUT -H "Content-Type: application/json" localhost:9200/school -d '
{
  "settings" : {
      "index" : {
         "number_of_shards" : 3,
         "number_of_replicas" : 2
      }
   }
}
'

{"acknowledged":true,"shards_acknowledged":true,"index":"school"}
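The two settings multiply: number_of_replicas is counted per primary shard, so the request above asks for 3 primaries plus 2 copies of each. A short worked example, with `total_shard_copies` as a hypothetical helper:

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Primaries plus all of their replica copies."""
    return number_of_shards * (1 + number_of_replicas)


settings = {"number_of_shards": 3, "number_of_replicas": 2}
print(total_shard_copies(**settings))  # 9
```

With the defaults seen elsewhere in these notes (1 shard, 1 replica) the same formula gives 2, which matches the "_shards":{"total":2,...} in the indexing responses. A replica is never placed on the same node as its primary, so on a single-node cluster the replicas stay unassigned.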

Delete an index

curl -X DELETE localhost:9200/school

Get index information

curl localhost:9200/school

{"school":{"aliases":{},"mappings":{"properties":{"city":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"description":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"fees":{"type":"long"},"location":{"type":"float"},"name":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"rating":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"state":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"street":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"tags":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"zip":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}},"settings":{"index":{"routing":{"allocation":{"include":{"_tier_preference":"data_content"}}},"number_of_shards":"1","provided_name":"school","creation_date":"1644982015360","number_of_replicas":"1","uuid":"WC6gYdYtRYKUpBW02pDFMg","version":{"created":"7160299"}}}}}

Get index settings

curl localhost:9200/school/_settings

{"school":{"settings":{"index":{"routing":{"allocation":{"include":{"_tier_preference":"data_content"}}},"number_of_shards":"1","provided_name":"school","creation_date":"1644982015360","number_of_replicas":"1","uuid":"WC6gYdYtRYKUpBW02pDFMg","version":{"created":"7160299"}}}}}

Check whether an index exists

curl -I localhost:9200/school

HTTP/1.1 200 OK

curl -I localhost:9200/school1

HTTP/1.1 404 Not Found

Index stats

curl localhost:9200/school/_stats

{"_shards":{"total":2,"successful":1,"failed":0},"_all":{"primaries":{"docs":{"count":1,"deleted":0}............

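Flush

Flushing an index ensures that any data currently held only in the transaction log is also stored permanently in Lucene. This shortens recovery time, because the data does not have to be reindexed from the transaction log after the Lucene index is opened.

curl -X POST localhost:9200/school/_flush

{"_shards":{"total":2,"successful":1,"failed":0}}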

Document API

Get API

curl localhost:9200/student/_doc/1

{"_index":"student","_type":"_doc","_id":"1","_version":1,"_seq_no":0,"_primary_term":1,"found":true,"_source":
{
   "name":"test",
   "age":"20"
}}

Update API

curl -X POST -H 'Content-Type: application/json' localhost:9200/student/_update/1 -d '
{
   "doc":{
      "name":"test_update",
      "age":"20"
   }
}'
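The update API expects the changed fields under a doc key (or a script) and merges them into the stored document, so only the fields being changed need to be sent. The toy model below illustrates that merge on plain dictionaries; `merge_partial` is a hypothetical helper and only approximates what the server does.

```python
import copy


def merge_partial(source, partial):
    """Recursively merge `partial` into a copy of `source` (toy model)."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


stored = {"name": "test", "age": "20"}
print(merge_partial(stored, {"name": "test_update"}))
# {'name': 'test_update', 'age': '20'}
```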

Delete API

curl -X DELETE localhost:9200/student/_doc/1

Index API

If the document already exists it is replaced; otherwise a new one is created

curl -X PUT -H 'Content-Type: application/json' localhost:9200/student/_doc/1 -d '
{
   "name":"test_update",
   "age":"20"
}'

curl -X PUT -H 'Content-Type: application/json' localhost:9200/student/_doc/2 -d '
{
   "name":"test2",
   "age":"20"
}'

================================================================

curl -X POST -H 'Content-Type: application/json' localhost:9200/student/_search

{"took":628,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":2,"relation":"eq"},"max_score":1.0,"hits":[{"_index":"student","_type":"_doc","_id":"1","_score":1.0,"_source":
{
   "name":"test_update",
   "age":"20"
}},{"_index":"student","_type":"_doc","_id":"2","_score":1.0,"_source":
{
   "name":"test2",
   "age":"20"
}}]}}
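The create-or-replace behaviour of the index API can be pictured with a small in-memory model. `ToyIndex` is purely illustrative and not a real client; it only mimics the result and _version fields seen in the responses earlier in these notes.

```python
class ToyIndex:
    """In-memory stand-in for one index; not a real client."""

    def __init__(self):
        self.docs = {}  # _id -> (version, source)

    def index(self, doc_id, source):
        """Create the document, or replace it entirely if it exists."""
        if doc_id in self.docs:
            version = self.docs[doc_id][0] + 1
            result = "updated"
        else:
            version = 1
            result = "created"
        self.docs[doc_id] = (version, dict(source))
        return {"_id": doc_id, "_version": version, "result": result}


student = ToyIndex()
print(student.index("1", {"name": "test", "age": "20"}))
print(student.index("1", {"name": "test_update", "age": "20"}))
print(student.index("2", {"name": "test2", "age": "20"}))
```

Unlike the update API, indexing replaces the whole document, so any field left out of the new body is gone afterwards.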

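Search API

Many parameters can be passed to a search operation in the URI:

  1. q: specifies the query string
  2. lenient: when set to true, format-based errors (for example, text supplied for a numeric field) are ignored. Defaults to false.
  3. fields: restricts the response to selected fields (in Elasticsearch 7 URI searches this is done with _source_includes or stored_fields)
  4. sort: sorts the results; possible values are fieldName, fieldName:asc, and fieldName:desc
  5. timeout: limits the search time
  6. terminate_after: the maximum number of documents to collect per shard; when a shard reaches this number, the query on it terminates early. Not set by default.
  7. from: the starting offset of the hits to return. Defaults to 0
  8. size: the number of hits to return. Defaults to 10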
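These parameters are ordinary query-string parameters, so they need URL encoding when combined. A minimal sketch, assuming a hypothetical helper `uri_search`; it uses size for the number of hits, which is the name the Elasticsearch search API uses, and the query name:Paul is only an example value.

```python
from urllib.parse import urlencode


def uri_search(index, **params):
    """Build a URI search path; pass `from_` for the `from` parameter."""
    # `from` is a reserved word in Python, so it is accepted as `from_`.
    query = {
        ("from" if key == "from_" else key): value
        for key, value in params.items()
    }
    return "/" + index + "/_search?" + urlencode(query)


print(uri_search("school", q="name:Paul", sort="fees:desc", from_=0, size=5))
# /school/_search?q=name%3APaul&sort=fees%3Adesc&from=0&size=5
```

When passing the resulting URL to curl, wrap it in single quotes so the shell does not interpret the ? and & characters.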

To be continued

References

elasticsearch-tutorial