Elasticsearch Basic Syntax You Should Know

Keep creating, grow faster! This is day 2 of my participation in the Juejin Daily New Plan June writing challenge.

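As a company's business grows, MySQL can no longer keep up with large-data-volume scenarios, and many companies migrate their data to Elasticsearch. This post summarizes the basic syntax we use most often in day-to-day projects, mainly the create, delete, update, and query operations on data, collected and organized for easy reference.

1. Creating an index

1.1 Create the index index_learn_v1

In MySQL we all know the concept of creating a table. ES needs the same thing, it just goes by a different name: an index. So in ES everything starts with creating an index and defining your fields. (If you don't define them, fields are generated automatically, but a dynamically mapped string field defaults to text with a keyword sub-field, which may not fit the business requirements later. It is best to fix the field types when you first create the index.)

Create an index named index_learn_v1 with the alias index_learn, 1 primary shard, and 1 replica. The index has the fields name, age, and birth.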

curl -XPUT "http://localhost:9200/index_learn_v1" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "name":{
        "type": "text"
      },
      "age":{
        "type": "keyword"
      },
      "birth":{
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  },
  "aliases": {
    "index_learn": {}
  }
}'

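number_of_shards: number of primary shards (cannot be changed once the index is created, so size it sensibly for your data volume)
number_of_replicas: number of replicas (can be adjusted later)
aliases: alias (much like a person who also goes by another name)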
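When the request is sent from application code rather than curl, building the body as a data structure and serializing it avoids hand-written JSON mistakes such as a missing closing brace. The sketch below only assembles and prints the body from section 1.1; the function name build_index_body is hypothetical and no request is actually sent.

```python
import json


def build_index_body(shards: int, replicas: int, alias: str) -> dict:
    """Assemble the create-index request body used in section 1.1."""
    return {
        "settings": {
            "number_of_shards": shards,      # fixed once the index exists
            "number_of_replicas": replicas,  # can be adjusted later
        },
        "mappings": {
            "properties": {
                "name": {"type": "text"},
                "age": {"type": "keyword"},
                "birth": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
            }
        },
        "aliases": {alias: {}},
    }


body = build_index_body(shards=1, replicas=1, alias="index_learn")
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the body of PUT index_learn_v1.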

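1.2 Adding a field

Scenario: the index is already running in production and a new requirement arrives: the data needs a status flag, enabled or disabled. There are two options:

Rebuild the index and migrate all the data (which is really unnecessary)
Add an enabled field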

curl -XPUT "http://localhost:9200/index_learn_v1/_mapping" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "enabled": {
      "type": "keyword"
    }
  }
}'

Note:

   Changing the type of an existing field requires rebuilding the index and migrating the data with reindex.

2. Adding data

2.1 Index a single document

curl -XPUT "http://localhost:9200/index_learn/_doc/1" -H 'Content-Type: application/json' -d'
{
  "name": "huang96",
  "age": 18,
  "birth": "1996-01-01 01:01:01",
  "enabled": "1"
}'

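If you do not specify an id (POST to index_learn/_doc), ES generates one automatically. If the given id does not exist, a new document is created; otherwise the existing document is deleted and a new one is created in its place, with the version number incremented by 1. If a document contains a field that was not defined when the index was created and dynamic is set to true, the field is created automatically; for a string value its type is text, with a keyword sub-field.

Creating a new document with _create is used less often; it is enough to know it exists.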

curl -XPUT "http://localhost:9200/index_learn/_create/1" -H 'Content-Type: application/json' -d'
{
  "age":"11"
}'

If the document already exists, the request fails with a conflict.
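The difference between indexing into _doc and using _create can be modelled with a few lines of Python. This is a toy in-memory model meant only to illustrate the replace-and-bump-version versus fail-on-conflict behaviour described above; the class ToyIndex is hypothetical and says nothing about how Elasticsearch stores data internally.

```python
class ToyIndex:
    """In-memory stand-in for one index; NOT how Elasticsearch stores data."""

    def __init__(self):
        self.docs = {}  # _id -> (version, source)

    def index(self, doc_id, source):
        """Like PUT <index>/_doc/<id>: create, or replace the whole document."""
        version = self.docs[doc_id][0] + 1 if doc_id in self.docs else 1
        self.docs[doc_id] = (version, dict(source))
        result = "created" if version == 1 else "updated"
        return {"result": result, "_version": version}

    def create(self, doc_id, source):
        """Like PUT <index>/_create/<id>: refuse to touch an existing id."""
        if doc_id in self.docs:
            return {"status": 409, "error": "version_conflict_engine_exception"}
        return self.index(doc_id, source)


idx = ToyIndex()
first = idx.index("1", {"name": "huang96", "age": 18})
second = idx.index("1", {"age": 19})   # replaces the document, name is gone
conflict = idx.create("1", {"age": "11"})
print(first, second, conflict)
```

Note that the second index call replaces the whole source, so the name field no longer exists afterwards.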

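2.2 Bulk writes with the Bulk API

The Bulk API supports four action types: index, create, delete, and update
A single failed action does not affect the other actions
The response contains the result of every individual action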

POST _bulk
{"index":{"_index":"index_learn","_id":"3"}}
{"age":"15","name":"qianxu","birth":"2020-01-10 10:10:09"}
{"create":{"_index":"index_learn","_id":"4"}}
{"age":"15","name":"qianxu","birth":"2020-01-10 10:10:09"}
{"update":{"_index":"index_learn","_id":"4"}}
{"doc":{"age":"18"}}
{"delete":{"_index":"index_learn","_id":"2"}}
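Since a _bulk body is newline-delimited JSON and each action reports its own result, two small helpers are handy in client code: one to serialize the actions and one to pick out the failures. The following is a minimal sketch; the function names are hypothetical, and sample_response is hand-written to resemble a bulk response, not captured from a real cluster.

```python
import json


def to_bulk_ndjson(actions):
    """Serialize (action, metadata, payload) triples into a _bulk body.

    payload is None for delete, which has no source line.
    """
    lines = []
    for action, meta, payload in actions:
        lines.append(json.dumps({action: meta}))
        if payload is not None:
            lines.append(json.dumps(payload))
    return "\n".join(lines) + "\n"  # the body must end with a newline


def failed_items(bulk_response):
    """Return (action, _id, status) for every item with an error status."""
    failures = []
    for item in bulk_response.get("items", []):
        for action, result in item.items():
            if result.get("status", 200) >= 400:
                failures.append((action, result.get("_id"), result["status"]))
    return failures


body = to_bulk_ndjson([
    ("index", {"_index": "index_learn", "_id": "3"}, {"age": "15", "name": "qianxu"}),
    ("update", {"_index": "index_learn", "_id": "4"}, {"doc": {"age": "18"}}),
    ("delete", {"_index": "index_learn", "_id": "2"}, None),
])

# Hand-written sample shaped like a _bulk response, for illustration only.
sample_response = {
    "errors": True,
    "items": [
        {"index": {"_id": "3", "status": 201}},
        {"update": {"_id": "4", "status": 404}},
        {"delete": {"_id": "2", "status": 200}},
    ],
}
print(body, end="")
print(failed_items(sample_response))
```

Checking each item matters because the HTTP request as a whole can succeed even when individual actions fail.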

3. Updating data

3.1 Update a field's value with _update

Set age to 18 for the document with id 1

curl -XPOST "http://localhost:9200/index_learn/_update/1" -H 'Content-Type: application/json' -d'
{
  "doc": {
    "age": "18"
  }
}'

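This is a partial update: only the fields under doc are merged into the existing document, instead of replacing the whole document as an index request does. (Internally, Elasticsearch still re-indexes the merged document.)

3.2 Bulk-update a field's value with _update_by_query

Using a search condition, set age to 16 for every user in index_learn whose name is huang96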

curl -XPOST "http://localhost:9200/index_learn/_update_by_query" -H 'Content-Type: application/json' -d'
{
  "script": {
    "source": "ctx._source.age = \"16\"",
    "lang": "painless"
  },
  "query": {
    "term": {
      "name": "huang96"
    }
  }
}'

A second example converts catId from a single string into a list across the cms index; conflicts=proceed keeps the job running past version conflicts:

curl -XPOST "http://192.168.0.170:9200/cms/_update_by_query?conflicts=proceed" -H 'Content-Type: application/json' -d'
{
  "script": {
    "source": "if (ctx._source.catId instanceof String) { String catId = ctx._source.catId; ArrayList list = new ArrayList(); list.add(catId); ctx._source.catId = list; }",
    "lang": "painless"
  }
}'

Bulk updates like this are especially useful for data-cleanup jobs.
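Quoting inside a script string is easy to get wrong when the JSON is written by hand. One way around it, sketched below, is to pass the value through the script's params object and let a JSON library do the escaping. The function name set_age_by_name is hypothetical; the body is only built and printed here, not sent.

```python
import json


def set_age_by_name(name: str, age: str) -> str:
    """Build an _update_by_query body that sets age on documents matching name."""
    body = {
        "script": {
            # The value travels in params, so the script needs no inner quotes.
            "source": "ctx._source.age = params.age",
            "lang": "painless",
            "params": {"age": age},
        },
        "query": {"term": {"name": name}},
    }
    return json.dumps(body)


payload = set_age_by_name("huang96", "16")
print(payload)
```

The resulting string can be posted to index_learn/_update_by_query.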

4. Deleting data

4.1 Delete a single document

Delete the document with id 2

curl -XDELETE "http://localhost:9200/index_learn/_doc/2"

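4.2 Bulk delete with _delete_by_query

Delete all documents that match a condition. Matches are processed in scroll batches, 1000 documents per batch by default.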

curl -XPOST "http://localhost:9200/index_learn/_delete_by_query" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "age":"15"
    }
  }
}'

scroll_size sets the batch size of each scroll request, not a cap on how many documents get deleted

POST index_learn/_delete_by_query?scroll_size=5000
{
  "query": {
    "term": {
      "age":"15"
    }
  }
}

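Note: to delete everything, simply delete the index itself (DELETE index name). Avoid _delete_by_query where you can, because it performs poorly: a delete only marks the document as deleted and does not remove it right away.

Use rarely, and with caution: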

curl -XPOST "http://localhost:9200/index_learn/_delete_by_query" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_all": {}
  }
}'

5. Querying data

5.1 Get a document by id

curl -XGET "http://localhost:9200/index_learn/_doc/1"

5.2 Equality query (term query)

Find documents where age equals 18

curl -XPOST "http://localhost:9200/index_learn/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "age": {
        "value": "18"
      }
    }
  }
}'

5.3 Wildcard query (fuzzy matching)

Find users whose name contains huang

curl -XPOST "http://localhost:9200/index_learn/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "wildcard": {
      "name": {
        "value": "*huang*"
      }
    }
  }
}'

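Note: avoid wildcard queries where you can. With a leading wildcard on a large data set the query has to scan every term, much like a full table scan, and performance is bad enough to drag down the whole application. One crawler hitting that endpoint and CPU usage spikes. If you really need fuzzy matching, use a text field and query it through analysis (tokenization).

That said, version 7.9 introduced a dedicated wildcard field type that is optimized for this kind of query and uses ngrams internally.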
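To see why ngrams help with a pattern like *huang*, here is a toy sketch of the general idea: index every 3-character substring, use those to narrow down candidates, then verify each candidate. It illustrates the principle only and is not a description of Elasticsearch's actual implementation; all function names are hypothetical.

```python
def ngrams(text: str, n: int = 3) -> set:
    """All contiguous substrings of length n."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_ngram_index(values, n=3):
    """Map each n-gram to the set of values that contain it."""
    index = {}
    for value in values:
        for gram in ngrams(value, n):
            index.setdefault(gram, set()).add(value)
    return index


def contains_search(index, values, needle, n=3):
    """Answer *needle* via n-gram lookup, then verify the candidates."""
    grams = ngrams(needle, n)
    if not grams:  # needle shorter than n: fall back to a full scan
        candidates = set(values)
    else:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    return sorted(v for v in candidates if needle in v)


names = ["huang96", "qianxu", "xiaohuang", "hua"]
idx = build_ngram_index(names)
print(contains_search(idx, names, "huang"))  # ['huang96', 'xiaohuang']
```

The lookup touches only values sharing all of the needle's ngrams, rather than every value in the index.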

5.4 Exists query

Find users that have a status set, that is, documents where the enabled field exists

curl -XPOST "http://localhost:9200/index_learn/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "exists": {
      "field": "enabled"
    }
  }
}'

5.5 Range query

Find children born from 1995 up to, but not including, 2000

curl -XPOST "http://localhost:9200/index_learn/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "birth": {
        "gte": "1995-01-01 00:00:00",
        "lte": "1999-12-31 23:59:59"
      }
    }
  }
}'
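In application code the range bounds usually start life as date objects, and they must be rendered in the same format the birth field was mapped with (yyyy-MM-dd HH:mm:ss). A minimal sketch, assuming Python's datetime on the client side and a hypothetical helper named birth_range_query:

```python
import json
from datetime import datetime

# Python strftime equivalent of the mapping format yyyy-MM-dd HH:mm:ss
ES_FORMAT = "%Y-%m-%d %H:%M:%S"


def birth_range_query(start: datetime, end: datetime) -> dict:
    """Build a range query on birth with inclusive bounds."""
    return {
        "query": {
            "range": {
                "birth": {
                    "gte": start.strftime(ES_FORMAT),
                    "lte": end.strftime(ES_FORMAT),
                }
            }
        }
    }


query = birth_range_query(datetime(1995, 1, 1), datetime(1999, 12, 31, 23, 59, 59))
print(json.dumps(query))
```

Keeping the format string in one constant makes it harder for the query and the mapping to drift apart.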

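5.6 Compound query (bool query), the most commonly used

A bool query combines multiple conditions. It is the query type you will use most day to day, so it is worth understanding well.

5.6.1 AND query (must)

Find children whose name is huang96 and whose age is 18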

curl -XPOST "http://localhost:9200/index_learn/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "age": {
              "value": "18"
            }
          }
        },
        {
          "term": {
            "name": {
              "value": "huang96"
            }
          }
        }
      ]
    }
  }
}'

5.6.2 OR query (should)

Find children whose name is huang96 or whose age is 18

curl -XPOST "http://localhost:9200/index_learn/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "age": {
              "value": "18"
            }
          }
        },
        {
          "term": {
            "name": {
              "value": "huang96"
            }
          }
        }
      ]
    }
  }
}'

5.6.3 Not equal (!) with must_not

Find children whose age is not 18

curl -XPOST "http://localhost:9200/index_learn/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must_not": [
        {
          "term": {
            "age": {
              "value": "18"
            }
          }
        }
        
      ]
    }
  }
}'

5.6.4 Equality query with filter (no scoring)

Find children whose age is 18

curl -XPOST "http://localhost:9200/index_learn/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "age": {
              "value": "18"
            }
          }
        }
        
      ]
    }
  }
}'

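5.6.5 Combined query

Find people whose name is kimchy and whose tag is production, whose age is not between 30 and 40 (inclusive), and who have football or basketball as a hobby.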

curl -X POST "localhost:9200/demo/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "name" : "kimchy" }
      },
      "filter": {
        "term" : { "tags" : "production" }
      },
      "must_not" : {
        "range" : {
          "age" : { "gte" : 30, "lte" : 40 }
        }
      },
      "should" : [
        { "term" : { "hobby" : "football" } },
        { "term" : { "hobby" : "basketball" } }
      ],
      "minimum_should_match" : 1,
      "boost" : 1.0
    }
  }
}'

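filter does not compute a score and its results can be cached, so it is efficient. If the business does not need relevance scoring, just use filter.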
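The way the four bool clauses combine can be sketched as a small in-memory matcher. This is a toy model that assumes only term clauses in the shorthand form {"term": {field: value}}, ignores scoring and analysis entirely, and uses hypothetical function names; it is meant to show the AND / NOT / OR logic, not to replicate Elasticsearch.

```python
def term_matches(doc, clause):
    """Evaluate {"term": {field: value}} by exact equality (list-aware)."""
    field, expected = next(iter(clause["term"].items()))
    actual = doc.get(field)
    return expected in actual if isinstance(actual, list) else actual == expected


def bool_matches(doc, bool_query):
    """Toy bool semantics: must/filter are AND, must_not is NOT, should is OR."""
    def clauses(key):
        value = bool_query.get(key, [])
        return value if isinstance(value, list) else [value]

    required = clauses("must") + clauses("filter")
    if not all(term_matches(doc, c) for c in required):
        return False
    if any(term_matches(doc, c) for c in clauses("must_not")):
        return False
    should = clauses("should")
    # Default: 1 if there are should clauses but no must/filter, otherwise 0.
    default_min = 1 if should and not required else 0
    minimum = bool_query.get("minimum_should_match", default_min)
    return sum(term_matches(doc, c) for c in should) >= minimum


query = {
    "must": {"term": {"name": "kimchy"}},
    "filter": {"term": {"tags": "production"}},
    "must_not": {"term": {"age": 35}},
    "should": [{"term": {"hobby": "football"}}, {"term": {"hobby": "basketball"}}],
    "minimum_should_match": 1,
}
docs = [
    {"name": "kimchy", "tags": ["production"], "age": 28, "hobby": "football"},
    {"name": "kimchy", "tags": ["production"], "age": 35, "hobby": "football"},
    {"name": "kimchy", "tags": ["production"], "age": 28, "hobby": "chess"},
]
print([bool_matches(d, query) for d in docs])  # [True, False, False]
```

The second document is rejected by must_not and the third fails minimum_should_match, even though both satisfy must and filter.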

5.7 Full-text query (match query)

Find documents whose name contains "好奇心"

curl -XPOST "http://localhost:9200/index_learn/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "name": "好奇心"
    }
  }
}'

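Note:

Plain match is not used that much in production, because the content is usually tokenized and the results are not precise. In that case we use the phrase match below.

5.8 Phrase match (match_phrase)

Find documents whose name field contains "若善用好奇心善"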

curl -XPOST "http://localhost:9200/index_learn/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_phrase": {
      "name": "若善用好奇心善"
    }
  }
}'
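The difference between match and match_phrase comes down to token positions. The toy sketch below assumes an analyzer that emits one token per character (roughly what happens to Chinese text without a dedicated Chinese analyzer) and the default OR operator for match; the function names are hypothetical and scoring is ignored.

```python
def tokenize(text: str) -> list:
    """Toy analyzer: one token per character, ignoring whitespace."""
    return [ch for ch in text if not ch.isspace()]


def match(field_text: str, query: str) -> bool:
    """match with the default OR operator: any shared token is a hit."""
    return bool(set(tokenize(field_text)) & set(tokenize(query)))


def match_phrase(field_text: str, query: str) -> bool:
    """match_phrase: query tokens must appear consecutively, in order."""
    doc_tokens, query_tokens = tokenize(field_text), tokenize(query)
    width = len(query_tokens)
    return any(
        doc_tokens[i:i + width] == query_tokens
        for i in range(len(doc_tokens) - width + 1)
    )


text = "若善用好奇心善"
print(match(text, "心好奇"))         # True: the tokens occur, order ignored
print(match_phrase(text, "心好奇"))  # False: not in this order
print(match_phrase(text, "好奇心"))  # True: consecutive and in order
```

This is why a plain match can return documents that merely share characters with the query, while match_phrase does not.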

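5.9 Multi-field query (multi_match query)

Multi-field queries are quite common: a page often has a single search box, but the input has to be matched against several fields.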

curl -XPOST "http://localhost:9200/credit_legal/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match": {
      "query": "无极限公司",
      "type": "best_fields",
      "operator": "and",
      "fields": [
        "zzmc",
        "tyshxydm",
        "zch",
        "zzjgdm"
      ]
    }
  }
}'

5.10 Batch reads with mget

Batching reduces the overhead of network round trips and improves performance

curl -XGET "http://localhost:9200/_mget" -H 'Content-Type: application/json' -d'
{
  "docs": [
    {
      "_index": "index_learn",
      "_id": "1"
    },
    {
      "_index": "test",
      "_id": "1"
    }
  ]
}'

5.11 Batch search with _msearch

Run several searches, possibly against different indices, in a single request

curl -XPOST "http://localhost:9200/index_learn/_msearch" -H 'Content-Type: application/json' -d'
{}
{"query":{"match_all":{}},"from":0,"size":10}
{"index":"testindex"}
{"query":{"match_all":{}}}
'
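An _msearch body alternates a header line and a query line, one pair per search. A minimal sketch for generating it from code follows; the function name to_msearch_ndjson is hypothetical, and the output is only printed rather than sent.

```python
import json


def to_msearch_ndjson(searches):
    """Serialize (header, body) pairs into an _msearch request body."""
    lines = []
    for header, body in searches:
        lines.append(json.dumps(header))  # {} means: use the index from the URL
        lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"  # the body must end with a newline


payload = to_msearch_ndjson([
    ({}, {"query": {"match_all": {}}, "from": 0, "size": 10}),
    ({"index": "testindex"}, {"query": {"match_all": {}}}),
])
print(payload, end="")
```

Each search in the response comes back in the same order as the pairs in the request.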

6. Data migration

6.1 _reindex: migrate from cms_v1 to cms_v2

curl -XPOST "http://localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "cms_v1"
  },
  "dest": {
    "index": "cms_v2"
  }
}'

Copy only the documents in my-index-000001 whose user.id is kimchy into my-new-index-000001

POST _reindex
{
  "source": {
    "index": "my-index-000001",
    "query": {
      "term": {
        "user.id": "kimchy"
      }
    }
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}

reindex API

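Conclusion

These are the basic ES syntax elements I use most often at work. There are of course also aggregations, backup and restore, cross-cluster search, and more, which I won't list here; later posts will cover the details of individual queries. With these basics in hand, you are ready to start your Elasticsearch journey.

For more of the ES DSL, head over to the official site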
