Elasticsearch: From Installation to a Quick Start

1. Pull the image

docker pull docker.elastic.co/elasticsearch/elasticsearch:7.3.2

2. Run

Run a single-node instance for development:

docker run -it -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.3.2

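Kibana, the official UI for Elasticsearch, is the recommended tool for working with it, and it can be installed with Docker as well. In the command below, d69781008e33 is the ID of the ES container; replace it with your own.

docker run -it --link d69781008e33:elasticsearch -p 5601:5601 kibana:7.3.2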

Once it is running, open localhost:5601 in a browser.

3. Working with data

PUT: create
Example: create an index named twitter and add a document with id=1 to it. Format: PUT INDEX/_doc/ID

PUT twitter/_doc/1  
{
  "user": "GB",
  "uid": 1,
  "city": "Beijing",
  "province": "Beijing",
  "country": "China"
}
Response
{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

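Fields in the response

Field	Description
_index	The index, roughly comparable to a database
_type	The type, roughly comparable to a table (mapping types are deprecated in 7.x, so this is normally _doc)
_id	The document ID, comparable to a primary key
_version	The document version, incremented by 1 on every update
result	The result of the operation
_shards	Shard information for the operation
_seq_no	The sequence number assigned to this write operation
_primary_term	The primary term assigned to this write operation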

GET twitter   
Returns information about the index, including its document mappings and shard settings.

GET: query

Fetch the document with id=1 from the twitter index:

GET twitter/_doc/1/
Response
{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "user" : "GB",
    "uid" : 1,
    "city" : "Beijing",
    "province" : "Beijing",
    "country" : "China"
  }
}

Field	Description
_source	The source, that is, the document data itself

Fetch only the user field from _source:

GET twitter/_doc/1?_source=user
Response
{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "user" : "GB"
  }
}

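Fetching multiple documents
A multi-get request names the index and ID of each document, and it can also restrict which _source fields are returned.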

GET _mget
{
  "docs":[
      {
        "_index":"twitter",
        "_id":1
      },
      {
        "_index":"twitter",
        "_id":2,
        "_source":["user","city"]
      }
    ]
}
Response
{
  "docs" : [
    {
      "_index" : "twitter",
      "_type" : "_doc",
      "_id" : "1",
      "_version" : 1,
      "_seq_no" : 0,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "user" : "GB",
        "uid" : 1,
        "city" : "Beijing",
        "province" : "Beijing",
        "country" : "China"
      }
    },
    {
      "_index" : "twitter",
      "_type" : "_doc",
      "_id" : "2",
      "_version" : 1,
      "_seq_no" : 1,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "city" : "Beijing",
        "user" : "GB"
      }
    }
  ]
}

An alternative form of mget

Query a list of IDs within one index:
GET twitter/_mget
{
  "ids":["1","2","3"]
}

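POST: create with a generated ID, and update
When adding data we usually use PUT with an explicit ID. To have Elasticsearch generate the ID instead, use POST. The generated ID is a unique string, not an auto-incrementing number.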

POST twitter/_doc
{
  "user": "GB",
  "uid": 1,
  "city": "Beijing",
  "province": "Beijing",
  "country": "China"
}

Response
{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "ranetnYB1rIShBts5iqO",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 3,
  "_primary_term" : 1
}

For a partial update, use the _update endpoint. Format: POST INDEX/_update/ID

POST twitter/_update/1
{
  "doc": {
    "city":"成都"
  }
}
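Response. A result of noop means the document already contained this value, for example because the same request had been run before; an update that changes the document returns updated.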
{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "noop",
  "_shards" : {
    "total" : 0,
    "successful" : 0,
    "failed" : 0
  }
}

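upsert = insert or update: update the document if it exists, insert it if it does not. To enable this, add doc_as_upsert to the request body.

No document with id=5 exists at this point: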
POST twitter/_update/5
{
  "doc":{
    "user": "GB",
    "uid": 1,
    "city": "Beijing",
    "province": "Beijing",
    "country": "China"
  },
  "doc_as_upsert": true
}
Response
{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "5",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 8,
  "_primary_term" : 1
}
In other words, a new document was inserted.
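
The three outcomes seen so far (updated, noop, created) can be mimicked with a small in-memory model. This is only a sketch of the semantics, not how Elasticsearch is implemented: the update function and the store dict are hypothetical, the merge is shallow (Elasticsearch merges nested objects recursively), and a KeyError stands in for the error returned when the document is missing.

```python
def update(store, doc_id, doc, doc_as_upsert=False):
    """Mimic POST INDEX/_update/ID on an in-memory dict (illustration only)."""
    if doc_id in store:
        merged = {**store[doc_id], **doc}  # shallow merge of the partial doc
        if merged == store[doc_id]:
            return "noop"  # nothing changed, so nothing is written
        store[doc_id] = merged
        return "updated"
    if doc_as_upsert:
        store[doc_id] = dict(doc)  # the partial doc becomes the new document
        return "created"
    raise KeyError(f"document {doc_id} is missing")

store = {"1": {"user": "GB", "city": "Beijing"}}
print(update(store, "1", {"city": "成都"}))                     # first run changes the document
print(update(store, "1", {"city": "成都"}))                     # same request again changes nothing
print(update(store, "5", {"user": "GB"}, doc_as_upsert=True))  # missing document is inserted
```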

HEAD: check whether a document exists

HEAD twitter/_doc/1
200 - OK

DELETE: delete a document. Format: DELETE INDEX/_doc/ID

DELETE twitter/_doc/5
Response
{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "5",
  "_version" : 2,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 9,
  "_primary_term" : 1
}

To delete every document that matches a query, use POST INDEX/_delete_by_query

POST  twitter/_delete_by_query
{
  "query":{
    "match":{
      "city":"Changsha"
    }
  }
}

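PATCH: partial update
Elasticsearch does not use the PATCH verb for documents; partial updates go through POST INDEX/_update/ID, as described in the POST section.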

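Batch processing with _bulk: many operations can be packed into a single request, which improves throughput. Each operation is an action line followed by a document line. Keep the payload from growing too large; roughly 5 MB to 15 MB per request is a reasonable range.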

POST _bulk
{ "index" : { "_index" : "twitter", "_id": 1} }
{"user":"双榆树-张三","message":"今儿天气不错啊，出去转转去","uid":2,"age":20,"city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}}
{ "index" : { "_index" : "twitter", "_id": 2 }}
{"user":"东城区-老刘","message":"出发，下一站云南！","uid":3,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}}
{ "index" : { "_index" : "twitter", "_id": 3} }
{"user":"东城区-李四","message":"happy birthday!","uid":4,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}}
{ "index" : { "_index" : "twitter", "_id": 4} }
{"user":"朝阳区-老贾","message":"123,gogogo","uid":5,"age":35,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区建国门","location":{"lat":"39.718256","lon":"116.367910"}}
{ "index" : { "_index" : "twitter", "_id": 5} }
{"user":"朝阳区-老王","message":"Happy BirthDay My Friend!","uid":6,"age":50,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}}
{ "index" : { "_index" : "twitter", "_id": 6} }
{"user":"虹桥-老吴","message":"好友来了都今天我生日，好友来了,什么 birthday happy 就成!","uid":7,"age":90,"city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}}
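
A _bulk body like the one above is newline-delimited JSON: one action line, then one document line, and the body has to end with a newline. A minimal Python sketch of assembling such a payload is shown here; build_bulk_body is a hypothetical helper name, and it only builds the string without sending it anywhere.

```python
import json

def build_bulk_body(index, docs):
    """Build an NDJSON body for POST _bulk from a {doc_id: source} mapping."""
    lines = []
    for doc_id, source in docs.items():
        # Action line: index the document on the next line under this ID.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Document line; ensure_ascii=False keeps Chinese text readable.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"

body = build_bulk_body(
    "twitter",
    {1: {"user": "张三", "age": 20}, 2: {"user": "老刘", "age": 22}},
)
print(body, end="")
```

Building the lines with json.dumps rather than by hand avoids quoting mistakes, and keeping each document on a single line is what the NDJSON format expects.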

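Other commands

Open/close index: opens or closes an index. An open index consumes cluster resources; closing it blocks reads and writes on that index.
Freeze/unfreeze index: freezes or unfreezes an index. Freezing blocks writes to the index.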

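Query types

ES has two kinds of search: query and aggregation. A query performs full-text search; an aggregation computes statistics and analytics. A single request can run a query and an aggregation at the same time.

query

_search

Search an index with GET INDEX/_search
hits contains the matching results. hits.total.value is the number of matches, and relation tells whether that number is exact (eq) or a lower bound (gte).
max_score is the highest relevance score among the hits; the better a document matches the query, the higher its score.

GET twitter/_search
Response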
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "user" : "GB",
          "uid" : 1,
          "city" : "Beijing",
          "province" : "Beijing",
          "country" : "China"
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "CRx_x3YBqDTqoEh67ELY",
        "_score" : 1.0,
        "_source" : {
          "user" : "GB",
          "uid" : 1,
          "city" : "Beijing",
          "province" : "Beijing",
          "country" : "China"
        }
      }
    ]
  }
}

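Pagination is supported as well. Format: GET INDEX/_search?size=PAGE_SIZE&from=OFFSET, where from is the number of hits to skip (starting at 0), not a page number.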

GET twitter/_search?size=2&from=1
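
Because from counts hits to skip, a page number has to be converted into an offset before it goes into the request. A small sketch of that conversion, assuming 1-based page numbers; page_params is a hypothetical helper, not part of any Elasticsearch client:

```python
def page_params(page, page_size):
    """Translate a 1-based page number into the from/size parameters of _search."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    # from is the number of hits to skip, so page 1 starts at offset 0.
    return {"from": (page - 1) * page_size, "size": page_size}

params = page_params(page=2, page_size=2)
print(f"GET twitter/_search?size={params['size']}&from={params['from']}")
```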

Source filtering
The returned fields can be restricted. For example, to return only the user field of each document:

GET twitter/_search
{
  "_source": ["user"],
  "query": {
    "match_all": {}
  }
}

Fields can also be excluded from the result: includes lists the fields to return, excludes lists the fields to leave out.

GET twitter/_search
{
  "_source": {
    "includes": [
      "user*",
      "location*"
    ],
    "excludes": [
      "*.lat"
    ]
  },
  "query": {
    "match_all": {}
  }
}

_count
Use _count to count the documents that match a query:

GET twitter/_count
{
  "query": {
    "match": {
      "user": "GB"
    }
  }
}
Response
{
  "count" : 7,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}

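The following query types are also available in ES. They are not covered in detail here; see the official documentation and the official blog if you need them.

1. match query: full-text match on a field

2. ids query: look up documents by ID

3. multi_match: match against multiple fields

4. prefix query: match on the prefix of a field

5. term query: exact match on a field

compound query: combines several of the queries above into one query

Geo queries: location-based queries on map coordinates, for example a fuzzy search within a range around a point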

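Wildcards

Wildcards supported by ES

Character	Description	Example
*	Matches any sequence of characters	'*海' matches any value that ends with '海', whatever comes before it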

SQL support

ES accepts SQL queries, which makes it easier to move from MySQL to ES for storing and querying data. Format: GET /_sql {"query": "SQL statement"}

GET /_sql
{
  "query":"select * from twitter where user='GB'"
}

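aggregation

In production we often do not need the individual documents but an overall dashboard or statistical summary, typically for a BI team to analyze and base decisions on. This is the job of the aggregations framework, which produces aggregated data based on a search query. Multiple aggregations can be combined.

Bucketing: a family of aggregations that build buckets, each associated with a document criterion. When the aggregation runs, every document in the context is checked against the criteria and falls into the buckets it matches. The result is a list of buckets, each with the set of documents that belong to it. Bucket aggregations can hold sub-aggregations, so aggregations can be nested.

Metric: aggregations that track and compute metrics over a set of documents.

Matrix: a family of aggregations that operate on multiple fields and produce a matrix result from the values extracted from the requested document fields.

Pipeline: aggregations that aggregate the output of other aggregations and their associated metrics.

Aggregation operations

An aggregation request generally has this shape: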

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "file_name": {
      "aggs_type": {
        <aggs_body>
      }
    }
  }
}

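Field	Name	Description
size	Result size	Set to 0 when only the aggregation result matters and the search hits are not needed
aggs	Aggregations	aggs is short for aggregations
file_name	Aggregation name	A user-defined name for the aggregation result
aggs_type	Aggregation type	Common types include range, max, min, avg, and so on
aggs_body	Aggregation parameters	Each aggregation type takes its own parameters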
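
To make the template concrete, here is a small Python sketch that fills in the placeholders and prints the resulting request body. agg_request is a hypothetical helper used only for illustration; it builds the JSON and does not talk to Elasticsearch.

```python
import json

def agg_request(name, agg_type, agg_body, size=0):
    """Fill the aggregation template: name -> file_name, agg_type -> aggs_type."""
    return {"size": size, "aggs": {name: {agg_type: agg_body}}}

# A max aggregation on the age field, stored under the name max_age.
request = agg_request("max_age", "max", {"field": "age"})
print("GET twitter/_search")
print(json.dumps(request, indent=2))
```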

Preparing the data

DELETE twitter
 
PUT twitter
{
  "mappings": {
    "properties": {
      "DOB": {
        "type": "date"
      },
      "address": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "age": {
        "type": "long"
      },
      "city": {
        "type": "keyword"
      },
      "country": {
        "type": "keyword"
      },
      "location": {
        "type": "geo_point"
      },
      "message": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "province": {
        "type": "keyword"
      },
      "uid": {
        "type": "long"
      },
      "user": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}




POST _bulk
{"index":{"_index":"twitter","_id":1}}
{"user":"张三","message":"今儿天气不错啊，出去转转去","uid":2,"age":20,"city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}, "DOB": "1999-04-01"}
{"index":{"_index":"twitter","_id":2}}
{"user":"老刘","message":"出发，下一站云南！","uid":3,"age":22,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}, "DOB": "1997-04-01"}
{"index":{"_index":"twitter","_id":3}}
{"user":"李四","message":"happy birthday!","uid":4,"age":25,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}, "DOB": "1994-04-01"}
{"index":{"_index":"twitter","_id":4}}
{"user":"老贾","message":"123,gogogo","uid":5,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区建国门","location":{"lat":"39.718256","lon":"116.367910"}, "DOB": "1989-04-01"}
{"index":{"_index":"twitter","_id":5}}
{"user":"老王","message":"Happy BirthDay My Friend!","uid":6,"age":26,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}, "DOB": "1993-04-01"}
{"index":{"_index":"twitter","_id":6}}
{"user":"老吴","message":"好友来了都今天我生日，好友来了,什么 birthday happy 就成!","uid":7,"age":28,"city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}, "DOB": "1991-04-01"}

Some common aggregation types are introduced below.

Range aggregation

Example: count the documents in the age ranges 20~22, 22~25, and 25~30
GET twitter/_search
{
  "size": 0,
  "aggs": {
    "ageGroup": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 22
          },
          {
            "from": 22,
            "to": 25
          },{
            "from": 25,
            "to": 30
          }
          
        ]
      }
    }
  }
}

Response:
{
  "aggregations" : {
    "ageGroup" : {
      "buckets" : [
        {
          "key" : "20.0-22.0",
          "from" : 20.0,
          "to" : 22.0,
          "doc_count" : 1
        },
        {
          "key" : "22.0-25.0",
          "from" : 22.0,
          "to" : 25.0,
          "doc_count" : 1
        },
        {
          "key" : "25.0-30.0",
          "from" : 25.0,
          "to" : 30.0,
          "doc_count" : 3
        }
      ]
    }
  }
}

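The response contains several buckets, which illustrates the bucket concept defined earlier. Each range includes its from value and excludes its to value, so the document with age 30 is not counted in 25~30.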
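
The bucket counts can be reproduced by hand. The sketch below applies the same half-open ranges (from included, to excluded) to the six ages from the sample data; range_buckets is a hypothetical function that imitates the aggregation in plain Python and is not an Elasticsearch API.

```python
ages = [20, 22, 25, 30, 26, 28]          # ages from the sample data above
ranges = [(20, 22), (22, 25), (25, 30)]  # same ranges as in the request

def range_buckets(values, ranges):
    """Count the values that fall into each [from, to) range."""
    buckets = []
    for lo, hi in ranges:
        count = sum(1 for v in values if lo <= v < hi)
        buckets.append({"key": f"{float(lo)}-{float(hi)}", "doc_count": count})
    return buckets

for bucket in range_buckets(ages, ranges):
    print(bucket)
```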

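Max, Min, Avg: maximum, minimum, and average
As mentioned above, buckets can be nested, that is, an aggregation can run inside another aggregation. After bucketing by range, we can compute the maximum, minimum, and average within each range.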

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 22
          },
          {
            "from": 22,
            "to": 25
          },
          {
            "from": 25,
            "to": 30
          }
        ]
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        },
        "min_age": {
          "min": {
            "field": "age"
          }
        },
        "max_age":{
          "max": {
            "field": "age"
          }
        }
      }
    }
  }
}
Response
{
  "aggregations" : {
    "age" : {
      "buckets" : [
        {
          "key" : "20.0-22.0",
          "from" : 20.0,
          "to" : 22.0,
          "doc_count" : 1,
          "max_age" : {
            "value" : 20.0
          },
          "avg_age" : {
            "value" : 20.0
          },
          "min_age" : {
            "value" : 20.0
          }
        },
        {
          "key" : "22.0-25.0",
          "from" : 22.0,
          "to" : 25.0,
          "doc_count" : 1,
          "max_age" : {
            "value" : 22.0
          },
          "avg_age" : {
            "value" : 22.0
          },
          "min_age" : {
            "value" : 22.0
          }
        },
        {
          "key" : "25.0-30.0",
          "from" : 25.0,
          "to" : 30.0,
          "doc_count" : 3,
          "max_age" : {
            "value" : 28.0
          },
          "avg_age" : {
            "value" : 26.333333333333332
          },
          "min_age" : {
            "value" : 25.0
          }
        }
      ]
    }
  }
}

In this request the top-level aggregation, named age, is a range aggregation. The sub-aggregations then run on each of its buckets: avg_age uses the avg type, min_age uses min, and max_age uses max.
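
Nesting means the metric is computed once per bucket rather than once over all documents. A plain-Python imitation of that, using the sample ages; range_with_metrics is a hypothetical name and the code is only a sketch of the idea:

```python
ages = [20, 22, 25, 30, 26, 28]          # ages from the sample data
ranges = [(20, 22), (22, 25), (25, 30)]  # same ranges as in the request

def range_with_metrics(values, ranges):
    """Bucket values into [from, to) ranges, then compute avg/min/max per bucket."""
    result = []
    for lo, hi in ranges:
        members = [v for v in values if lo <= v < hi]
        bucket = {"key": f"{float(lo)}-{float(hi)}", "doc_count": len(members)}
        if members:  # metrics are only defined for non-empty buckets
            bucket["avg_age"] = sum(members) / len(members)
            bucket["min_age"] = min(members)
            bucket["max_age"] = max(members)
        result.append(bucket)
    return result

for bucket in range_with_metrics(ages, ranges):
    print(bucket)
```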

Filters aggregation

Each bucket is associated with a filter, and a bucket collects the documents that match its filter.

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "by_cities": {
      "filters": {
        "filters": {
          "beijing": {
            "match": {
              "city": "北京"
            }
          },
          "shanghai":{
            "match":{
              "city":"上海"
            }
          }
        }
      }
    }
  }
}
Response
{
  "aggregations" : {
    "by_cities" : {
      "buckets" : {
        "beijing" : {
          "doc_count" : 5
        },
        "shanghai" : {
          "doc_count" : 1
        }
      }
    }
  }
}

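The request above defines two filters, one named beijing and one named shanghai.

filter aggregation

An aggregation with a single filter; it can be seen as a special case of filters.

Average age of the users in Beijing: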
GET twitter/_search
{
  "size": 0,
  "aggs":{
    "beijing":{
      "filter": {
        "match":{
          "city":"北京"
        }
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

date_range aggregation

range works on numeric values; date_range does the same for dates.

Count the people born between January 1989 and January 1990:
GET twitter/_search
{
  "size": 0,
  "aggs": {
    "birth_range": {
      "date_range": {
        "field": "DOB",
        "format": "yyyy-MM", 
        "ranges": [
          {
            "from": "1989-01",
            "to": "1990-01"
          }
        ]
      }
    }
  }
}

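terms aggregation

A terms aggregation groups documents by the values of a field and counts how often each value occurs.

Count, per city, the documents whose message matches happy birthday: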
GET twitter/_search
{
  "query": {
    "match": {
      "message": "happy birthday"
    }
  },
  "size": 0,
  "aggs": {
    "city": {
      "terms": {
        "field": "city",
        "size": 10
      }
    }
  }
}
Response
{
  "aggregations" : {
    "city" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "北京",
          "doc_count" : 2
        },
        {
          "key" : "上海",
          "doc_count" : 1
        }
      ]
    }
  }
}

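In the terms aggregation body, size=10 means that at most the top 10 buckets are returned, not that only values occurring 10 times are returned. The buckets can also be sorted with order.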
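
The query-then-aggregate flow can be imitated in a few lines of Python. This is a rough sketch under simplifying assumptions: matches is a crude stand-in for the match query (it only checks whether any query word appears in the message), terms_agg is a hypothetical function, and the documents are reduced to the two fields that matter here.

```python
import re
from collections import Counter

docs = [
    {"city": "北京", "message": "今儿天气不错啊，出去转转去"},
    {"city": "北京", "message": "出发，下一站云南！"},
    {"city": "北京", "message": "happy birthday!"},
    {"city": "北京", "message": "123,gogogo"},
    {"city": "北京", "message": "Happy BirthDay My Friend!"},
    {"city": "上海", "message": "好友来了都今天我生日，好友来了,什么 birthday happy 就成!"},
]

def matches(message, query):
    """Crude stand-in for a match query: true if any query word occurs in the message."""
    words = set(re.findall(r"[a-z0-9]+", message.lower()))
    return any(term in words for term in query.lower().split())

def terms_agg(docs, field, size):
    """Count documents per field value and keep the `size` largest buckets."""
    counts = Counter(d[field] for d in docs)
    return [{"key": k, "doc_count": n} for k, n in counts.most_common(size)]

hits = [d for d in docs if matches(d["message"], "happy birthday")]
print(terms_agg(hits, "city", size=10))
```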

Histogram aggregation

As the name suggests, this aggregation is made for histograms: it splits the values of a numeric field into fixed-width intervals.

GET twitter/_search
{
  "size": 0, 
  "aggs": {
    "age_histogram": {
      "histogram": {
        "field": "age",
        "interval": 2
      }
    }
  }
}
Response
{
  "aggregations" : {
    "age_histogram" : {
      "buckets" : [
        {
          "key" : 20.0,
          "doc_count" : 1
        },
        {
          "key" : 22.0,
          "doc_count" : 1
        },
        {
          "key" : 24.0,
          "doc_count" : 1
        },
        {
          "key" : 26.0,
          "doc_count" : 1
        },
        {
          "key" : 28.0,
          "doc_count" : 1
        },
        {
          "key" : 30.0,
          "doc_count" : 1
        }
      ]
    }
  }
}

interval is the bucket width; the histogram above groups the ages in steps of 2 years.
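
Each value lands in the bucket whose key is the value rounded down to a multiple of the interval, which is why age 25 is counted under key 24. A short sketch of that rule on the sample ages; histogram is a hypothetical function, and unlike Elasticsearch it does not emit empty buckets for gaps:

```python
import math
from collections import Counter

ages = [20, 22, 25, 30, 26, 28]  # ages from the sample data

def histogram(values, interval):
    """Group values into buckets keyed by floor(value / interval) * interval."""
    counts = Counter(math.floor(v / interval) * interval for v in values)
    return [{"key": float(k), "doc_count": counts[k]} for k in sorted(counts)]

for bucket in histogram(ages, interval=2):
    print(bucket)
```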

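date_histogram aggregation

Builds a histogram over a date or date-range field.

Aggregate by year. Since 7.2, interval is deprecated for date_histogram in favor of calendar_interval and fixed_interval, so calendar_interval is used here:
GET twitter/_search
{
  "size": 0,
  "aggs": {
    "age_aggs": {
      "date_histogram": {
        "field": "DOB",
        "calendar_interval": "year"
      }
    }
  }
}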

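cardinality aggregation

Returns the (approximate) number of distinct values of a field. For example, the city field has only two values, 北京 and 上海.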

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "city_num": {
      "cardinality": {
        "field": "city"
      }
    }
  }
}
Response
{
  "aggregations" : {
    "city_num" : {
      "value" : 2
    }
  }
}

stats aggregation

Get the overall statistics for the age field:

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "city_num": {
      "stats": {
        "field": "age"
      }
    }
  }
}

Response
{
  "aggregations" : {
    "city_num" : {
      "count" : 6,
      "min" : 20.0,
      "max" : 30.0,
      "avg" : 25.166666666666668,
      "sum" : 151.0
    }
  }
}

The result includes count, min, max, avg, and sum. extended_stats goes further and also returns the sum of squares, variance, standard deviation, and standard deviation bounds.

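Percentile aggregation

A percentiles aggregation computes one or more percentiles over a field of the documents. Percentiles are often used to find outliers.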

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "percentiles": {
        "field": "age",
        "percents": [
          25,
          50,
          75,
          99
        ]
      }
    }
  }
}

Response
{
  "aggregations" : {
    "NAME" : {
      "values" : {
        "25.0" : 22.0,
        "50.0" : 25.5,
        "75.0" : 28.0,
        "99.0" : 30.0
      }
    }
  }
}

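The result shows that 25% of the people are aged 22 or below, 50% are 25.5 or below, 75% are 28 or below, and 99% are 30 or below.
Sometimes we need the opposite: the share of documents at or below a given value. That is what the percentile_ranks aggregation is for.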

GET twitter/_search
{
  "size": 0,
  "aggs": {
    "age_40_percentage": {
      "percentile_ranks": {
        "field": "age",
        "values": [
          24
        ]
      }
    }
  }
}

Response
{
"aggregations" : {
    "age_40_percentage" : {
      "values" : {
        "24.0" : 37.5
      }
    }
  }
}

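This asks what share of the people are aged 24 or below; the estimate is 37.5%.

Missing aggregation

ES documents are not bound to a fixed set of fields the way rows in a relational database are. If a field was added later and we want to find the documents that do not have it, the missing aggregation does that.

Analyzer

When a new document is stored in ES, its text passes through an analyzer, which has three stages. First, char filters clean up the raw characters, for example by stripping HTML tags. Next, the tokenizer splits the string into tokens; it may split per character, on whitespace, by English or Chinese rules, and so on. Finally, token filters normalize, modify, or remove tokens.

The _analyze API shows how an analyzer processes a piece of text: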

GET twitter/_analyze
{
  "text": ["Happy Birthday"],
  "analyzer": "standard"
}

Response
{
  "tokens" : [
    {
      "token" : "happy",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "birthday",
      "start_offset" : 6,
      "end_offset" : 14,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

Running “Happy Birthday” through the standard analyzer splits it into Happy and Birthday and lowercases both; these tokens are what gets indexed.
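
The three analyzer stages can be pictured as three small functions chained together. The sketch below is a rough approximation written for illustration: the regexes are simplifications, the function names are hypothetical, and the real standard analyzer follows Unicode text segmentation rules rather than a \w+ pattern. Offsets here refer to the text after the char filter has run.

```python
import re

def char_filter(text):
    # Char filter stage: strip HTML-like tags (crude regex, illustration only).
    return re.sub(r"<[^>]+>", "", text)

def tokenize(text):
    # Tokenizer stage: split into word tokens and record their offsets.
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]

def lowercase_filter(tokens):
    # Token filter stage: normalize every token to lower case.
    return [(token.lower(), start, end) for token, start, end in tokens]

def analyze(text):
    """Run the three stages in order: char filter, tokenizer, token filter."""
    return lowercase_filter(tokenize(char_filter(text)))

print(analyze("<b>Happy</b> Birthday"))
```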

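Commonly used analyzers:

Analyzer	Description	Example
standard	Standard analyzer, the default
english	English analyzer	Produces the two English stems happi and birthdai
whitespace	Whitespace analyzer	Produces Happy and Birthday
simple	Simple analyzer	Splits on non-letter characters such as '.'
keyword	Keyword analyzer	Keeps the whole text as a single token
Custom analyzers can be defined as well.

Reference: Elastic China community official blog, 《Elastic：菜鸟上手指南》 elasticstack.blog.csdn.net/article/det…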