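1. Optimistic locking

Use case: business requirements where writes to a document must be applied in order, such as product inventory, where the data must stay accurate under high concurrency. Optimistic locking is the usual choice.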

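1.1 Optimistic locking with version

Newer versions no longer support using the internal version for optimistic locking; a request that tries to do so returns an error: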

PUT /goods/_doc/1?version=3
{
    "name" : "zhonghua yagao",
    "desc" :  "gaoxiao meibai",
    "price" :  18,
    "producer" :"zhonghua producer",
    "tags": [ "meibai", "fangzhu" ] 
}
Output:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "action_request_validation_exception",
        "reason" : "Validation Failed: 1: internal versioning can not be used for optimistic concurrency control. Please use `if_seq_no` and `if_primary_term` instead;"
      }
    ],
    "type" : "action_request_validation_exception",
    "reason" : "Validation Failed: 1: internal versioning can not be used for optimistic concurrency control. Please use `if_seq_no` and `if_primary_term` instead;"
  },
  "status" : 400
}

The request is accepted only when version_type=external is specified:

PUT /goods/_doc/1?version=7&version_type=external
{
    "name" : "zhonghua yagao",
    "desc" :  "gaoxiao meibai",
    "price" :  18,
    "producer" :"zhonghua producer",
    "tags": [ "meibai", "fangzhu" ] 
}
Output:
{
  "_index" : "goods",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 7,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 20,
  "_primary_term" : 8
}

With version_type=external, the operation succeeds only if the supplied version is greater than the document's current version; otherwise it returns a 409 conflict.
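The rule for external versions can be sketched in a few lines of Python. This is only an in-memory model of the comparison, not Elasticsearch client code: `index_external`, `VersionConflict`, and the `store` dict are hypothetical names introduced for illustration.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 response in this sketch."""


def index_external(store, doc_id, source, version):
    """Write `source` only if `version` is greater than the stored version."""
    current = store.get(doc_id)
    if current is not None and version <= current["_version"]:
        raise VersionConflict(
            f"supplied version [{version}] is not greater than "
            f"current version [{current['_version']}]"
        )
    # The supplied version becomes the document's version.
    store[doc_id] = {"_version": version, "_source": source}
    return "created" if current is None else "updated"


store = {}
print(index_external(store, "1", {"price": 18}, version=7))  # created
print(index_external(store, "1", {"price": 20}, version=8))  # updated
try:
    index_external(store, "1", {"price": 22}, version=8)  # not greater than 8
except VersionConflict as exc:
    print("409:", exc)
```

The design point is that the caller, not the store, owns the version number, which suits cases where the version comes from another system of record.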

1.2 Optimistic locking with seq_no and primary_term

PUT /goods/_doc/1?if_seq_no=18&if_primary_term=8
{
    "name" : "zhonghua yagao",
    "desc" :  "gaoxiao meibai",
    "price" :  18,
    "producer" :"zhonghua producer",
    "tags": [ "meibai", "fangzhu" ] 
}
Output:
{
  "_index" : "goods",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 6,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 19,
  "_primary_term" : 8
}

When the supplied seq_no and primary_term do not match the document's current values, the request fails with:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "version_conflict_engine_exception",
        "reason" : "[1]: version conflict, required seqNo [18], primary term [8]. current document has seqNo [19] and primary term [8]",
        "index_uuid" : "_f8Bi42fTnymwg0vMsy4Kg",
        "shard" : "0",
        "index" : "goods"
      }
    ],
    "type" : "version_conflict_engine_exception",
    "reason" : "[1]: version conflict, required seqNo [18], primary term [8]. current document has seqNo [19] and primary term [8]",
    "index_uuid" : "_f8Bi42fTnymwg0vMsy4Kg",
    "shard" : "0",
    "index" : "goods"
  },
  "status" : 409
}
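A typical way to use this check is a read-modify-write loop that retries on conflict. The sketch below models it in memory under stated assumptions: `FakeIndex` is a hypothetical stand-in for a single shard (it is not an Elasticsearch client), and the `stock` field is an invented example of an inventory counter.

```python
import copy


class VersionConflict(Exception):
    """Stands in for version_conflict_engine_exception (HTTP 409)."""


class FakeIndex:
    """In-memory stand-in for one shard; not an Elasticsearch client."""

    def __init__(self):
        self.docs = {}
        self.next_seq_no = 0
        self.primary_term = 1

    def get(self, doc_id):
        return copy.deepcopy(self.docs[doc_id])

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self.docs.get(doc_id)
        if if_seq_no is not None:
            expected = (if_seq_no, if_primary_term)
            actual = (current["_seq_no"], current["_primary_term"]) if current else None
            if actual != expected:
                raise VersionConflict(f"required {expected}, current {actual}")
        self.docs[doc_id] = {
            "_seq_no": self.next_seq_no,
            "_primary_term": self.primary_term,
            "_source": copy.deepcopy(source),
        }
        self.next_seq_no += 1
        return self.docs[doc_id]["_seq_no"]


def decrement_stock(index, doc_id, amount, max_retries=3):
    """Read-modify-write that retries when another writer got in first."""
    for _ in range(max_retries):
        doc = index.get(doc_id)
        source = doc["_source"]
        source["stock"] -= amount
        try:
            return index.index(
                doc_id,
                source,
                if_seq_no=doc["_seq_no"],
                if_primary_term=doc["_primary_term"],
            )
        except VersionConflict:
            continue  # re-read the document and try again
    raise VersionConflict(f"gave up after {max_retries} attempts")
```

A write that carries a stale seq_no is rejected instead of silently overwriting the newer value, which is what keeps the counter accurate.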

2. Using the built-in scripting language (Groovy in older versions)

Add test data:

PUT /test_index/_doc/11
{
  "num": 100,
  "tags": []
}

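2.1 Inline scripts

Write the script directly in the request body. Note that Groovy was removed in Elasticsearch 6.0; on newer versions, inline scripts like the ones here run as Painless, the default scripting language.

2.1.1 Decrementing a field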

POST /test_index/_update/11
{
   "script" : "ctx._source.num-=1"
}
Output:
{
  "_index" : "test_index",
  "_type" : "_doc",
  "_id" : "11",
  "_version" : 5,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 5,
  "_primary_term" : 1
}

ctx._source.num refers to the value of the num field in the document.

2.1.2 Deleting when a condition is met

POST /test_index/_update/11
{
   "script" : "ctx.op = ctx._source.num == 0 ? 'delete' : 'none'"
}
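How the script's assignment to ctx.op decides the outcome can be modelled in plain Python. This is a sketch under assumptions, not the update API itself: `scripted_update` and `delete_if_zero` are hypothetical names, the script is passed as a Python callable, and the return labels are the sketch's own.

```python
def scripted_update(store, doc_id, script):
    """Run `script` against a ctx dict and act on the ctx["op"] it leaves behind."""
    # Assumption: the operation defaults to "index" unless the script changes it.
    ctx = {"op": "index", "_source": dict(store[doc_id])}
    script(ctx)
    if ctx["op"] == "delete":
        del store[doc_id]
        return "deleted"
    if ctx["op"] == "none":
        return "noop"  # the document is left untouched
    store[doc_id] = ctx["_source"]
    return "updated"


def delete_if_zero(ctx):
    # Python equivalent of: ctx.op = ctx._source.num == 0 ? 'delete' : 'none'
    ctx["op"] = "delete" if ctx["_source"]["num"] == 0 else "none"


store = {"11": {"num": 0, "tags": []}}
print(scripted_update(store, "11", delete_if_zero))  # deleted
print("11" in store)  # False
```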

2.1.3 Upsert

If the specified document does not exist, the content of upsert is inserted as the new document; if it does exist, the partial update specified by doc or script is applied.

POST /test_index/_update/11
{
   "script" : "ctx._source.num+=1",
   "upsert": {
       "num": 0,
       "tags": []
   }
}
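The two branches of an upsert can be sketched as follows. This is an in-memory illustration only: `update_with_upsert` and `increment` are hypothetical names, and the script is modelled as a Python callable that mutates the document source.

```python
import copy


def update_with_upsert(store, doc_id, script, upsert):
    """Insert `upsert` if the document is missing, otherwise run `script` on it."""
    if doc_id not in store:
        # The script is not run on this branch; the upsert body is stored as-is.
        store[doc_id] = copy.deepcopy(upsert)
        return "created"
    script(store[doc_id])
    return "updated"


def increment(source):
    # Python equivalent of: ctx._source.num += 1
    source["num"] += 1


store = {}
initial = {"num": 0, "tags": []}
print(update_with_upsert(store, "11", increment, initial), store["11"]["num"])  # created 0
print(update_with_upsert(store, "11", increment, initial), store["11"]["num"])  # updated 1
```

Note that the first call leaves num at 0, because the script is skipped when the document has to be created.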

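2.2 External scripts (`not supported in newer versions`)

Newer versions do not support reading scripts directly from external files.

In older versions, you can create a file named test-scripts.groovy in the config\scripts folder and put the Groovy script in it, so that ES reads the script from the external file.

2.2.1 Decrementing a field

Add the script to the test-scripts.groovy file: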

ctx._source.num-=add_num

POST /test_index/_update/11
{
  "script": {
    "lang": "groovy", 
    "file": "test-scripts",
    "params": {
      "add_num": 1
    }
  }
}

Newer versions return an error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "x_content_parse_exception",
        "reason" : "[4:5] [script] unknown field [file]"
      }
    ],
    "type" : "x_content_parse_exception",
    "reason" : "[4:13] [UpdateRequest] failed to parse field [script]",
    "caused_by" : {
      "type" : "x_content_parse_exception",
      "reason" : "[4:5] [script] unknown field [file]"
    }
  },
  "status" : 400
}

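3. Batch operations

3.1 Batch retrieval (mget)

Some business scenarios fetch data N times in a loop, sending N network requests, which is a significant overhead.

With a batch query, the N lookups are combined into a single request, so only one network request is sent and the network round-trip overhead drops by roughly a factor of N.

3.1.1 Fetching multiple ids

Syntax 1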

GET /_mget
{
  "docs":[
    {
      "_index":"goods",
      "_id":1
    },
    {
      "_index":"goods",
      "_id":2
    }
  ]
}

Syntax 2

GET /goods/_mget
{
  "ids":[1,2]
}

3.1.2 Fetching from multiple indices

GET /_mget
{
  "docs":[
    {
      "_index":"goods",
      "_id":1
    },
    {
      "_index":"goods",
      "_id":2
    },
    {
      "_index":"test_index",
      "_id":1
    },
    {
      "_index":"test_index",
      "_id":2
    }
  ]
}
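When the list of documents comes from application code, the mget body can be assembled from (index, id) pairs instead of being written by hand. A minimal sketch, where `build_mget_body` is a hypothetical helper and sending the request is left out:

```python
import json


def build_mget_body(targets):
    """Build the body for GET /_mget from an iterable of (index, id) pairs."""
    return {"docs": [{"_index": index, "_id": doc_id} for index, doc_id in targets]}


targets = [("goods", "1"), ("goods", "2"), ("test_index", "1"), ("test_index", "2")]
body = build_mget_body(targets)
print(json.dumps(body, indent=2))  # one request body instead of four separate GETs
```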

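3.2 Batch create, update, and delete (bulk)

3.2.1 About bulk

Notes on bulk:

Each bulk action must be expressed as JSON on a single line, not as standard multi-line (pretty-printed) JSON;
The data for an action must also be passed on its own single line: the action on one line, the data on the next
If any one operation fails, the other operations are not affected; the failure is reported in the corresponding item of the response
A bulk request is loaded into memory, so performance actually drops if it is too large. Experiment to find the best bulk size: a common starting point is 1000~5000 documents, increasing gradually. In terms of size, 5~15MB is a good target.
Each operation in a bulk request may be forwarded to a shard on a different node

Incorrect example: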

POST /_bulk
{
  "delete":{
    "_index":"test_index",
    "_id":2
  }
}
Error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "json_e_o_f_exception",
        "reason" : "Unexpected end-of-input: expected close marker for Object (start marker at [Source: (org.elasticsearch.transport.netty4.ByteBufStreamInput); line: 1, column: 1])\n at [Source: (org.elasticsearch.transport.netty4.ByteBufStreamInput); line: 1, column: 2]"
      }
    ],
    "type" : "json_e_o_f_exception",
    "reason" : "Unexpected end-of-input: expected close marker for Object (start marker at [Source: (org.elasticsearch.transport.netty4.ByteBufStreamInput); line: 1, column: 1])\n at [Source: (org.elasticsearch.transport.netty4.ByteBufStreamInput); line: 1, column: 2]"
  },
  "status" : 400
}
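The sizing advice in the notes above (start at a few thousand documents and keep each request within a size budget) can be applied with a small batching helper. This is a sketch, assuming each payload is an already-serialized action (its action line plus optional data line, each ending in a newline); `chunk_bulk` and its default limits are illustrative choices, not Elasticsearch settings.

```python
def chunk_bulk(payloads, max_actions=1000, max_bytes=5 * 1024 * 1024):
    """Yield bulk request bodies that respect both an action count and a byte budget."""
    chunk, size = [], 0
    for payload in payloads:
        payload_size = len(payload.encode("utf-8"))
        # Close the current chunk if adding this payload would break a limit.
        if chunk and (len(chunk) >= max_actions or size + payload_size > max_bytes):
            yield "".join(chunk)
            chunk, size = [], 0
        chunk.append(payload)
        size += payload_size
    if chunk:
        yield "".join(chunk)


payloads = [
    '{"index":{"_index":"test_index","_id":"%d"}}\n{"num":%d}\n' % (i, i)
    for i in range(5)
]
for body in chunk_bulk(payloads, max_actions=2):
    print(body.count("\n") // 2, "actions")
```

A payload larger than the byte budget is still sent, alone in its own chunk, so nothing is dropped.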

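3.2.2 bulk actions

delete: delete a document -- one action JSON line
create: force-create a document (same effect as PUT /index/_create/id, or PUT /index/type/id/_create in older versions) -- one action JSON line, one data JSON line
index: an ordinary put, which either creates the document or fully replaces it -- one action JSON line, one data JSON line
update: a partial update -- one action JSON line, one data JSON line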

POST /_bulk
{"delete":{"_index":"test_index","_id":"3"}} 
{"create":{"_index":"test_index","_id":"12" }}
{"test_field":"test12"}
{"index":{"_index":"test_index","_id":"2"}}
{"test_field":"replaced test2"}
{"update":{"_index":"test_index","_id":"1","retry_on_conflict":3}}
{"doc" : {"test_field2":"bulk test1"}}

Output:
{
  "took" : 100,
  "errors" : false,
  "items" : [
    {
      "delete" : {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "3",
        "_version" : 1,
        "result" : "not_found",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 7,
        "_primary_term" : 2,
        "status" : 404
      }
    },
    {
      "create" : {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "12",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 8,
        "_primary_term" : 2,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "2",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 9,
        "_primary_term" : 2,
        "status" : 201
      }
    },
    {
      "update" : {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 2,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 10,
        "_primary_term" : 2,
        "status" : 200
      }
    }
  ]
}
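A request body like the one above can also be generated rather than typed. The sketch below serializes each action and its data onto single lines and ends the body with a newline; `build_bulk_body` is a hypothetical helper, and sending the request is out of scope.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, meta, data) triples as newline-delimited JSON.

    `data` is None for actions that take no data line, such as delete.
    """
    lines = []
    for action, meta, data in operations:
        # Compact separators guarantee one action per line, with no embedded newlines.
        lines.append(json.dumps({action: meta}, separators=(",", ":")))
        if data is not None:
            lines.append(json.dumps(data, separators=(",", ":")))
    return "\n".join(lines) + "\n"  # the body must end with a newline


body = build_bulk_body([
    ("delete", {"_index": "test_index", "_id": "3"}, None),
    ("create", {"_index": "test_index", "_id": "12"}, {"test_field": "test12"}),
    ("update", {"_index": "test_index", "_id": "1"}, {"doc": {"test_field2": "bulk test1"}}),
])
print(body, end="")
```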

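3.2.3 Why bulk does not use a JSON array

The format bulk actually uses:

{"action": {"meta"}}
{"data"}
{"action": {"meta"}}
{"data"}

Suppose bulk instead accepted a more readable JSON array. Processing would then differ from the line-oriented format above and would go through these steps: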

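Parse the JSON array into a JSONArray object. At this point a second full copy of the data sits in memory: one copy is the raw JSON text and the other is the JSONArray object

Parse each JSON object in the array and route the document in each request

Create a request array for the requests routed to the same shard

Serialize that request array

Send the serialized request array to the corresponding node

Drawback: more memory consumption and more JVM GC overhead

An extreme bulk example:

As noted above, the recommended bulk size is a few thousand documents, around 10MB. Suppose 100 bulk requests arrive at one node and each occupies 10MB of memory: 100 requests is about 1GB. If the JSON of every request is copied into a JSONArray object, memory usage doubles to 2GB. It may be even more, because building the JSONArray can create additional data structures, pushing usage above 2GB.

Machine memory is finite. If array-style bulk processing took up too much memory, important search and analytics requests would not get enough, and performance would drop sharply. The JVM would also garbage-collect more often, with more garbage objects to reclaim each time and more time spent doing so, which means the ES JVM would pause its worker threads for longer.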

3.2.4 The unusual bulk format

{"action": {"meta"}}\n
{"data"}\n
{"action": {"meta"}}\n
{"data"}\n

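Advantages:

The JSON is split on newline characters, which still keeps it reasonably readable
This format is not converted into JSON objects, so no duplicate copy of the data is made and performance is higher
The corresponding JSON is sent directly to the node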
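To make the advantage concrete, here is a sketch of splitting a bulk body on newlines and grouping lines per shard while parsing only the small action lines. It is an illustration under assumptions, not Elasticsearch's implementation: `route_bulk` is a hypothetical function, and `zlib.crc32` on the `_id` is a stand-in for the real routing hash.

```python
import json
import zlib
from collections import defaultdict


def route_bulk(body, num_shards):
    """Group bulk lines by target shard, parsing only the action lines."""
    per_shard = defaultdict(list)
    lines = iter(body.split("\n"))
    for action_line in lines:
        if not action_line:
            continue  # the trailing newline leaves one empty string
        action, meta = next(iter(json.loads(action_line).items()))
        # Stand-in routing: hash of the id modulo the number of shards.
        shard = zlib.crc32(meta["_id"].encode("utf-8")) % num_shards
        per_shard[shard].append(action_line)
        if action != "delete":
            # The data line is forwarded as raw text and never parsed here.
            per_shard[shard].append(next(lines))
    return dict(per_shard)


body = (
    '{"delete":{"_index":"test_index","_id":"3"}}\n'
    '{"index":{"_index":"test_index","_id":"2"}}\n'
    '{"test_field":"replaced test2"}\n'
)
for shard, shard_lines in sorted(route_bulk(body, num_shards=2).items()):
    print(shard, shard_lines)
```

Because the data lines stay as strings, the potentially large document bodies are never materialized as a second object graph.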

4. multi-index search: searching several indices in one request

A multi-index search can target multiple indices.

Search all indices

GET /_search

Search one index

GET /index1/_search

Search two indices

GET /index1,index2/_search

Match indices by wildcard

GET /good*,test*/_search

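How it works:

The client connects to an arbitrary ES node, which acts as the coordinating node for that request;
The coordinating node forwards the request to one copy of every shard, either the primary or one of its replicas. This does not mean that every primary and replica on every machine receives the request. For example, if shard A has two replicas, the coordinating node picks one of the three copies (shard A and its two replicas) according to its load-balancing strategy;
The coordinating node collects the results returned by all of those shard copies, sorts them, and returns them to the client.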
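The pick-one-copy and merge steps can be sketched as below. This is a simplified model: `pick_copies` and `merge_hits` are hypothetical names, plain round-robin stands in for whatever load-balancing strategy the cluster actually uses, and each shard is assumed to return its hits already sorted by descending score.

```python
import heapq
import itertools


def pick_copies(shard_copies, counter):
    """Pick one copy (primary or replica) per shard, round-robin."""
    n = next(counter)
    return {shard: copies[n % len(copies)] for shard, copies in shard_copies.items()}


def merge_hits(per_shard_hits, size):
    """Merge per-shard hit lists (sorted by descending _score) into a global top `size`."""
    merged = heapq.merge(*per_shard_hits, key=lambda hit: -hit["_score"])
    return list(itertools.islice(merged, size))


# Shard 0 and shard 1 each have three copies spread over three nodes.
shard_copies = {0: ["node1", "node2", "node3"], 1: ["node2", "node3", "node1"]}
counter = itertools.count()
print(pick_copies(shard_copies, counter))  # {0: 'node1', 1: 'node2'}
print(pick_copies(shard_copies, counter))  # {0: 'node2', 1: 'node3'}

hits = merge_hits(
    [
        [{"_id": "a", "_score": 3.0}, {"_id": "b", "_score": 1.0}],
        [{"_id": "c", "_score": 2.0}],
    ],
    size=2,
)
print([hit["_id"] for hit in hits])  # ['a', 'c']
```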
