Elasticsearch Basics


Installing and Starting as a Cluster

Prerequisite: JDK (Java 12)

elasticsearch-7.6.0

Download and extract Elasticsearch 7.6.0

kibana-7.6.0

Download and extract Kibana 7.6.0

cerebro-0.9.4

Download and extract cerebro-0.9.4

Start two Elasticsearch processes, then Kibana and cerebro (run each command from the corresponding installation directory):

./bin/elasticsearch -E node.name=node1 -E cluster.name=ckes -E path.data=node1_data -d

./bin/elasticsearch -E node.name=node2 -E cluster.name=ckes -E path.data=node2_data -d

nohup ./bin/kibana >> kibana.log 2>&1 &

nohup ./bin/cerebro >> cerebro.log 2>&1 &

Verifying the Startup

Elasticsearch:

image.png

Kibana:

image.png

cerebro:

image.png

The cluster is named ckes and has two nodes, with node2 elected as master. The cluster status is healthy (green).
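The same check can be scripted instead of read off a screenshot. The sketch below is only an illustration: it assumes Elasticsearch listens on the default http://localhost:9200 without security enabled, and the helper names are made up. It calls GET /_cluster/health and prints the cluster name, status, and node count.

```python
import json
from urllib.request import urlopen


def summarize_health(health: dict) -> str:
    """Turn a _cluster/health response into a one-line summary."""
    return "{cluster_name}: status={status}, nodes={number_of_nodes}".format(**health)


def fetch_health(base_url: str = "http://localhost:9200") -> dict:
    """Call GET /_cluster/health; base_url is an assumed default."""
    with urlopen(f"{base_url}/_cluster/health", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        print(summarize_health(fetch_health()))
    except OSError as exc:  # connection refused, timeout, ...
        print(f"Elasticsearch not reachable: {exc}")
```

For the cluster started above, the summary should name ckes, report two nodes, and show a green status.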

The Elasticsearch Data Model

Index

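An index is a collection of documents of the same kind. You can define one or more indices to fit the business scenario; for example, products and merchants can each have their own index. Each index consists of multiple fields, and its structure is defined by the mapping. Besides the structure, an index also defines how its data is distributed, such as the number of shards and the number of replicas.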

Mapping

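A mapping defines which fields the documents in an index have and the type of each field, much like a table schema in a relational database. Besides field names and types, a mapping can also set per-field properties such as the analyzer, whether the field is indexed, and the value to substitute for nulls.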

Here is a typical example of creating an index:

# Create the videos index
PUT videos
{
  "mappings": {
    "properties": {
      "video_id": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      },
      "author": {
        "type": "keyword"
      },
      "intro": {
        "type": "text",
        "analyzer": "standard"
      },
      "url": {
        "type": "keyword"
      }
    }
  },
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

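This creates the videos index, sets the analyzer of the intro field to standard, and in settings sets the number of shards to 3 and the number of replicas to 1. Once the index is created, it shows up in the cerebro overview: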

image.png
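As a quick sanity check of those settings, the arithmetic below shows how many shard copies the cluster has to place. It is only a sketch, and the helper name is made up for illustration.

```python
def shard_copies(number_of_shards: int, number_of_replicas: int) -> dict:
    """Count the shard copies implied by the index settings (illustrative helper)."""
    copies_per_shard = 1 + number_of_replicas  # one primary plus its replicas
    return {
        "primaries": number_of_shards,
        "replicas": number_of_shards * number_of_replicas,
        "total": number_of_shards * copies_per_shard,
        "copies_per_document": copies_per_shard,
    }


if __name__ == "__main__":
    print(shard_copies(number_of_shards=3, number_of_replicas=1))
```

With 3 shards and 1 replica this gives 3 primaries and 3 replicas, 6 shard copies in total. Every document lives in 2 copies, which is why the write responses in this article report "_shards" with "total" : 2.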

For a detailed explanation of mappings, see the official documentation: www.elastic.co/guide/en/el…

Adding Documents

There are two APIs and three ways to add documents to an index:

  1. Index API
PUT /videos/_doc/1
{
  "name": "video 1",
  "author": "ck",
  "intro": "the first video",
  "url": "https://www.qq.com/"
}

Response:
{
  "_index" : "videos",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

// Update (same ID, new content)
PUT /videos/_doc/1
{
  "name": "video 1 new",
  "author": "ck",
  "intro": "the first video",
  "url": "https://www.qq.com/"
}

Response:
{
  "_index" : "videos",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

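The index API takes an explicit ID. If a document with that ID already exists, the old version is deleted, the new document replaces it, and the version number is incremented by 1, as shown in the _version field.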

  2. Create API, PUT form
PUT /videos/_create/2
{
  "name": "video 2",
  "author": "ck",
  "intro": "the second video",
  "url": "https://www.qq.com/"
}

Response:
{
  "_index" : "videos",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

// Running the same request again fails:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "version_conflict_engine_exception",
        "reason" : "[2]: version conflict, document already exists (current version [1])",
        "index_uuid" : "CZzRr7sHSKCN8_GmLxOWuA",
        "shard" : "1",
        "index" : "videos"
      }
    ],
    "type" : "version_conflict_engine_exception",
    "reason" : "[2]: version conflict, document already exists (current version [1])",
    "index_uuid" : "CZzRr7sHSKCN8_GmLxOWuA",
    "shard" : "1",
    "index" : "videos"
  },
  "status" : 409
}

The PUT form of the create API requires an ID. If the ID already exists, the request fails with a "document already exists" error (HTTP 409).

  3. Create API, POST form
POST /videos/_doc
{
  "name": "video 3",
  "author": "ck",
  "intro": "the first video",
  "url": "https://www.qq.com/"
}

Response:
{
  "_index" : "videos",
  "_type" : "_doc",
  "_id" : "riIgIoEBXLUTAD0acEF8",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

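With the POST form the ID can be omitted and Elasticsearch generates one. (Strictly speaking, POST /videos/_doc is the index API called without an ID; because the ID is freshly generated, the request always creates a new document.)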

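To summarize:

To update a document, or for any write that is not the first one, use the index API. To enforce uniqueness on write, use the PUT form of the create API. To let Elasticsearch generate the ID, use the POST form. The POST form is also the fastest of the three, because with an auto-generated ID Elasticsearch does not need to check whether a document with the same ID already exists.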
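The difference between the three ways is easiest to see side by side. The following is a toy in-memory model, not Elasticsearch code: the class and method names are invented for illustration, the generated ID is a plain UUID rather than the ID format Elasticsearch uses, and only the ID and version behaviour is mimicked.

```python
import uuid


class VersionConflict(Exception):
    """Stands in for the 409 version_conflict_engine_exception."""


class ToyIndex:
    """In-memory toy that mimics only the ID and version behaviour."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source):
        """PUT /<index>/_doc/<id>: create, or replace and bump the version."""
        version = self.docs[doc_id][0] + 1 if doc_id in self.docs else 1
        self.docs[doc_id] = (version, dict(source))
        return {"_id": doc_id, "_version": version,
                "result": "created" if version == 1 else "updated"}

    def create(self, doc_id, source):
        """PUT /<index>/_create/<id>: fail if the ID is already taken."""
        if doc_id in self.docs:
            raise VersionConflict(f"[{doc_id}]: document already exists")
        return self.index(doc_id, source)

    def index_auto_id(self, source):
        """POST /<index>/_doc: generate an ID, so no conflict is possible."""
        return self.index(uuid.uuid4().hex, source)


if __name__ == "__main__":
    videos = ToyIndex()
    print(videos.index("1", {"name": "video 1"}))      # created, version 1
    print(videos.index("1", {"name": "video 1 new"}))  # updated, version 2
    print(videos.create("2", {"name": "video 2"}))     # created, version 1
    try:
        videos.create("2", {"name": "video 2"})
    except VersionConflict as exc:
        print("conflict:", exc)
    print(videos.index_auto_id({"name": "video 3"})["result"])  # created
```

The toy mirrors the responses shown above: repeating an index call bumps _version, repeating a create call raises a conflict, and the auto-ID path never collides.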

Retrieving Documents

// Get a single document
GET /videos/_doc/2

Response:
{
  "_index" : "videos",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "video 2",
    "author" : "ck",
    "intro" : "the second video",
    "url" : "https://www.qq.com/"
  }
}

// Get several documents at once
GET /videos/_mget
{
  "ids":["1", "2"]
}

Response:
{
  "docs" : [
    {
      "_index" : "videos",
      "_type" : "_doc",
      "_id" : "1",
      "_version" : 1,
      "_seq_no" : 0,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "name" : "video 1",
        "author" : "ck",
        "intro" : "the first video",
        "url" : "https://www.qq.com/"
      }
    },
    {
      "_index" : "videos",
      "_type" : "_doc",
      "_id" : "2",
      "_version" : 3,
      "_seq_no" : 2,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "name" : "video 2",
        "author" : "checkking",
        "intro" : "the second video",
        "url" : "https://www.qq.com/"
      }
    }
  ]
}

Updating Documents

POST /videos/_doc/2
{
  "name": "video 2",
  "author": "checkking",
  "intro": "the second video",
  "url": "https://www.qq.com/"
}

Fetching the document afterwards (GET /videos/_doc/2) returns:
{
  "_index" : "videos",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 3,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "video 2",
    "author" : "checkking",
    "intro" : "the second video",
    "url" : "https://www.qq.com/"
  }
}

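Note: a request to /videos/_doc/2 replaces the whole document, so the body must contain every field, not only the ones being changed; any field left out is dropped. To change individual fields only, use the update API instead (POST /videos/_update/2 with the changed fields wrapped in a "doc" object).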
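To make the difference concrete: the index API swaps in the new body wholesale, while Elasticsearch also offers an _update endpoint that merges the fields given under "doc" into the stored document. The sketch below models both behaviours for flat documents only and builds the corresponding _update request; the function names are hypothetical and nothing is sent to a server.

```python
import json


def full_replace(stored: dict, new_source: dict) -> dict:
    """Index API semantics: the new body becomes the whole document."""
    return dict(new_source)


def partial_update(stored: dict, changes: dict) -> dict:
    """Update API semantics for flat documents: changed fields are merged in."""
    return {**stored, **changes}


def update_request(index: str, doc_id: str, changes: dict):
    """Build the method, path and body of an _update call."""
    return "POST", f"/{index}/_update/{doc_id}", json.dumps({"doc": changes})


if __name__ == "__main__":
    stored = {"name": "video 2", "author": "ck",
              "intro": "the second video", "url": "https://www.qq.com/"}
    changes = {"author": "checkking"}
    print(full_replace(stored, changes))    # only "author" is left
    print(partial_update(stored, changes))  # all four fields, new author
    print(update_request("videos", "2", changes))
```

Sending only {"author": "checkking"} through the index API would therefore leave a document with a single field, whereas the same change sent through _update keeps name, intro, and url intact.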

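Bulk Operations

Bulk API

When several write operations need to be issued together, the bulk API improves performance by sending them in a single request. It supports index, create, update, and delete actions in the same request. The format is: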

POST /_bulk
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n

For example:
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }

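The response reports the result of every operation individually; if one operation fails, the others are not affected. For example, creating two documents in videos: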

POST /_bulk
{"create": {"_index": "videos", "_id": "3"}}
{"name": "video 3","author": "checkking2","intro": "the thrid video","url": "https://www.qq.com/"}
{"create": {"_index": "videos", "_id": "4"}}
{"name": "video 4","author": "checkking4","intro": "the forth video","url": "https://www.qq.com/"}

Response:
{
  "took" : 153,
  "errors" : false,
  "items" : [
    {
      "create" : {
        "_index" : "videos",
        "_type" : "_doc",
        "_id" : "3",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 2,
          "failed" : 0
        },
        "_seq_no" : 3,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "create" : {
        "_index" : "videos",
        "_type" : "_doc",
        "_id" : "4",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 2,
          "failed" : 0
        },
        "_seq_no" : 4,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}
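Because the bulk body is newline-delimited JSON rather than a single JSON document, it is usually assembled by code. The sketch below shows one way to build such a body and to pick the failed items out of a response shaped like the one above; the function names and the tuple layout are assumptions made for this example.

```python
import json


def bulk_body(operations) -> str:
    """Serialize (action, metadata, source) tuples into a _bulk request body.

    source is None for actions that carry no document, such as delete.
    """
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            lines.append(json.dumps(source))
    # Every line, including the last one, must be terminated by a newline.
    return "\n".join(lines) + "\n"


def failed_items(response: dict) -> list:
    """Return the per-item results whose HTTP status signals a failure."""
    if not response.get("errors"):
        return []
    results = (next(iter(item.values())) for item in response["items"])
    return [r for r in results if r["status"] >= 300]


if __name__ == "__main__":
    body = bulk_body([
        ("create", {"_index": "videos", "_id": "3"}, {"name": "video 3"}),
        ("delete", {"_index": "videos", "_id": "2"}, None),
    ])
    print(body, end="")
```

The body is sent to POST /_bulk with the Content-Type header set to application/x-ndjson. Checking the top-level errors flag first avoids scanning items when every operation succeeded.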

Batch Reads with _mget

GET /_mget
{
  "docs": [
    {"_index": "videos", "_id": "3"},
    {"_index": "videos", "_id": "1"}
  ]
}

Response:
{
  "docs" : [
    {
      "_index" : "videos",
      "_type" : "_doc",
      "_id" : "3",
      "_version" : 1,
      "_seq_no" : 3,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "name" : "video 3",
        "author" : "checkking2",
        "intro" : "the thrid video",
        "url" : "https://www.qq.com/"
      }
    },
    {
      "_index" : "videos",
      "_type" : "_doc",
      "_id" : "1",
      "_version" : 1,
      "_seq_no" : 0,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "name" : "video 1",
        "author" : "ck",
        "intro" : "the first video",
        "url" : "https://www.qq.com/"
      }
    }
  ]
}

_mget can fetch documents from multiple indices in a single request.

Batch Searches with _msearch

GET /videos/_msearch
{}
{"query": {"match_all":{}}}
{"index": "videos"}
{"query": {"term":{"author": { "value": "checkking2"}}}}

The request format is:

GET /{target}/_msearch

header\n
body\n
header\n
body\n

As with the bulk API, the lines are separated by \n.
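The header/body pairs can be generated the same way as a bulk body. A minimal sketch, with a made-up function name, that reproduces the two searches shown above:

```python
import json


def msearch_body(searches) -> str:
    """Serialize (header, query_body) pairs into an _msearch request body."""
    lines = []
    for header, body in searches:
        lines.append(json.dumps(header))  # {} means "use the index from the URL"
        lines.append(json.dumps(body))
    # The last line must be terminated by a newline as well.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(msearch_body([
        ({}, {"query": {"match_all": {}}}),
        ({"index": "videos"},
         {"query": {"term": {"author": {"value": "checkking2"}}}}),
    ]), end="")
```

The response contains a responses array with one entry per search, in the same order as the request.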

Elasticsearch Queries

To be continued...