elasticsearch入门看这篇就够了基本介绍 mysql用作持久化存储，elasticsearch用作检索，相较于

基本介绍

mysql用作持久化存储，elasticsearch用作检索，相较于mysql中的数据库、表和记录，elasticsearch的概念为：index库>type表>document文档

index索引

动词：相当于mysql的insert
名词：相当于mysql的db

Type类型

在index中，可以定义一个或多个类型，类似于mysql的table，每一种类型的数据放在一起

Document文档

保存在某个index下，某种type的一个数据document，文档是json格式的，document就像是mysql中的某个table里面的内容。每一行对应的列叫属性。

为什么elasticsearch搜索快？倒排索引

简单举个列子，例如我们想要保存如下的记录

红海行动
探索红海行动
红海特别行动
红海记录片
特工红海特别探索

将内容分词就记录到索引中

检索：

（1）、红海特工行动？查出后计算相关性得分：3号记录命中了2次，且3号本身才有3个单词，2/3，所以3号最匹配

（2）、红海行动？

关系型数据库中两个数据表示是独立的，即使他们里面有相同名称的列也不影响使用，但ES中不是这样的。elasticsearch是基于Lucene开发的搜索引擎，而ES中不同type下名称相同的field最终在Lucene中的处理方式是一样的。
两个不同type下的两个user_name，在ES同一个索引下其实被认为是同一个field，你必须在两个不同的type中定义相同的field映射。否则，不同type中的相同字段名称就会在处理中出现冲突的情况，导致Lucene处理效率下降。
去掉type就是为了提高ES处理数据的效率。
Elasticsearch 7.xURL中的type参数为可选。比如，索引一个文档不再要求提供文档类型。
Elasticsearch 8.x不再支持URL中的type参数。解决：将索引从多类型迁移到单类型，每种类型文档一个独立索引

安装elasticsearch

首先拉取elasticsearch镜像和kibana镜像(elasticsearch数据可视化工具，类似于navicat)

docker pull elasticsearch:7.4.2
docker pull kibana:7.4.2
版本要统一

拉取elasticsearch镜像

拉取kibana镜像

创建elasticsearch的docker实例需要挂载的主机目录

# 将docker里的目录挂载到linux的/mydata目录中
# 修改/mydata就可以改掉docker里的
mkdir -p /mydata/elasticsearch/config
mkdir -p /mydata/elasticsearch/data

配置elasticsearch可以被远程任何机器访问

# es可以被远程任何机器访问
echo "http.host: 0.0.0.0" >/mydata/elasticsearch/config/elasticsearch.yml

# 递归更改权限，es需要访问
chmod -R 777 /mydata/elasticsearch/

启动elasticsearch

# 9200是用户交互端口 9300是集群心跳端口
# -e指定是单阶段运行
# -e指定占用的内存大小，生产时可以设置32G
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e  "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v  /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:7.4.2 


# 设置开机启动elasticsearch
docker update elasticsearch --restart=always

-name：指定运行起来的elasticsearch实例名称
-p 9200:9200 -p 9300:9300：9200是用户交互端口 9300是集群心跳端口
-e "discovery.type=single-node"：指定是单阶段运行
-e ES_JAVA_OPTS="-Xms64m -Xmx512m：指定占用的内存大小，生产时可以设置32G
-v：指定docker容器实例挂载到linux主机的路径
-d：指定运行docker容器实例使用的镜像版本

测试elasticsearch容器运行是否成功，访问：http:192.168.56.10:9200

查看elasticsearch的部署节点信息，访问：http://192.168.56.10:9200/_cat/nodes

接着安装kibana，这样我们的elasticsearch就会有可视化界面

#ELASTICSEARCH_HOSTS指定elasticsearch的访问地址
# kibana指定了了ES交互端口9200  # 5600为kibana主页端口
docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.56.10:9200 -p 5601:5601 -d kibana:7.4.2


# 设置开机启动kibana
docker update kibana  --restart=always

测试访问kibana：http://192.168.56.10:5601/app/kibana

初步检索

GET /_cat/nodes：查看所有节点

GET /_cat/health：查看es健康状况

注：green表示健康值正常

GET /_cat/master：查看主节点

GET/_cat/indicies：查看所有索引，等价于mysql数据库的show databases;

这3个索引是kibana创建的

新增文档

保存一个数据，保存在哪个索引的哪个类型下（哪张数据库哪张表下），保存时用唯一标识指定

一、PUT方式 PUT可以新增也可以修改。PUT必须指定id；由于PUT需要指定id，我们一般用来做修改操作，不指定id会报错。

必须指定id
版本号总会增加

我们想要在customer索引下的external类型下新增id为1的数据

当我们再执行上图的操作时，发现此时会变成修改操作，_version会自增1

seq_no和version的区别：

每个文档的版本号"_version" 起始值都为1 每次对当前文档成功操作后都加1 而序列号"_seq_no"则可以看做是索引的信息在第一次为索引插入数据时为0，每对索引内数据操作成功一次_seq_no加1，并且文档会记录是第几次操作使它成为现在的情况的

二、Post方式

POST新增。如果不指定id，会自动生成id。指定id就会修改这个数据，并新增版本号；

可以不指定id，不指定id时永远为创建
指定不存在的id为创建
指定存在的id为更新，而版本号会根据内容变没变而觉得版本号递增与否

当我们不指定id时，为新增操作，id会自动生成

当我们指定不存在的id时，此时为新增操作

当我们指定存在的id时，此时为更新操作

查看文档

查询customer索引下的external类型下新增id为1的数据

{
    "_index": "customer",
    "_type": "external",
    "_id": "1",
    "_version": 2,
    "_seq_no": 5,//并发控制字段，每次更新都会+1，用来做乐观锁
    "_primary_term": 1,//同上，主分片重新分配，如重启，就会变化
    "found": true,
    "_source": {
        "name": "John Doe"
    }
}

乐观锁用法：通过“if_seq_no=1&if_primary_term=1”，当序列号匹配的时候，才进行修改，否则不修改。

客户端A想要将id为1的数据修改成张三，但是有个前提条件是修改之前的_seq_no=5&_primary_term=1

客户端B想要将id为1的数据修改成李四，但是有个前提条件也是修改之前的_seq_no=5&_primary_term=1

更新文档

一、通过post方法带了_update方式

我们再次执行上面的更新操作，并且请求的数据一样时，es就不会进行任何操作。

通过post方法带了_update方式进行更新操作时

请求的数据，即想要更新的数据必须放在doc下面
首先es会对比此时想要更新的数据是否和已经存在的数据是一样的，如果是一样的话就不会进行任何操作，即noop，否则才会进行更新操作

二、通过post方法不带_update方式

我们再次执行上面的更新操作，并且请求的数据一样时，es还是会进行更新操作。

请求的数据，即想要更新的数据不用放在doc下面
es不会对比想要更新的数据是否和已经存在的数据是否一样，都会直接执行更新操作，此时的_version和_seq_no都会改变

三、通过put方法

我们再次执行上面的更新操作，并且请求的数据一样时，es还是会进行更新操作。

请求的数据，即想要更新的数据不用放在doc下面
es不会对比想要更新的数据是否和已经存在的数据是否一样，都会直接执行更新操作，此时的_version和_seq_no都会改变

删除文档

删除customer索引下的external类型下新增id为1的数据

删除之后，我们再次执行上面的删除操作，result返回not found

此时我们再查询customer索引下的external类型下新增id为1的数据，found返回false。

我们想要删除整个索引呢，比如删除customer索引（相当于删除整个数据库），我们先来看一下当前es中的所有索引数据，如下图所示

接着我们执行删除customer索引操作

删除后，我们再查询所有的索引，发现此时的customer索引已经不存在了。

注意不能直接删除类型操作，es没有提供直接删除类型的操作，想要删除类型，则需要删除完该类型下面所有的文档。

批量操作

首先打开kibanna的dev tools

例如：我们想要在customer索引下的external类型下批量新增数据

两行为一个整体 ，第一个两行表示插入id为1的文档数据，数据为name=a
{"index":{"_id":"1"}} 
{"name":"a"} 
{"index":{"_id":"2"}} 
{"name":"b"} 
注意格式json和text均不可，要去kibana里Dev Tools

语法格式：

{action:{metadata}}\n
{request body  }\n

{action:{metadata}}\n
{request body  }\n

#! Deprecation: [types removal] Specifying types in bulk requests is deprecated.
{
  "took" : 318,  花费了多少ms
  "errors" : false, 没有发生任何错误
  "items" : [ 每个数据的结果
    {
      "index" : { 保存
        "_index" : "customer", 索引
        "_type" : "external", 类型
        "_id" : "1", 文档
        "_version" : 1, 版本
        "result" : "created", 创建
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201 新建完成
      }
    },
    {
      "index" : { 第二条记录
        "_index" : "customer",
        "_type" : "external",
        "_id" : "2",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}

注意：这里的批量操作，当发生某一条执行发生失败时，其他的数据仍然能够接着执行，也就是说彼此之间是独立的。bulk api以此按顺序执行所有的action（动作）。如果一个单个的动作因任何原因失败，它将继续处理它后面剩余的动作。当bulk api返回时，它将提供每个动作的状态（与发送的顺序相同），所以您可以检查是否一个指定的动作是否失败了。

接下来我们对于整个索引执行批量操作

POST /_bulk
//删除website索引下类型为blog，id为123的文档数据
{"delete":{"_index":"website","_type":"blog","_id":"123"}}
//创建website索引下类型为blog，id为123的文档数据
{"create":{"_index":"website","_type":"blog","_id":"123"}}
//上面创建id为123文档数据如下：
{"title":"my first blog post"}
//保存website索引下类型为blog的文档数据
{"index":{"_index":"website","_type":"blog"}}
//上面保存操作的文档数据如下：
{"title":"my second blog post"}
//更新website索引下类型为blog，id为123的文档数据
{"update":{"_index":"website","_type":"blog","_id":"123"}}
//上面更新操作的数据如下：
{"doc":{"title":"my updated blog post"}}

#! Deprecation: [types removal] Specifying types in bulk requests is deprecated.
{
  "took" : 304,
  "errors" : false,
  "items" : [
    {
      "delete" : { 删除
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 1,
        "result" : "not_found", 没有该记录
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 404 没有该
      }
    },
    {
      "create" : {  创建
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 2,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {  保存
        "_index" : "website",
        "_type" : "blog",
        "_id" : "5sKNvncBKdY1wAQmeQNo",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 2,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "update" : { 更新
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 3,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 3,
        "_primary_term" : 1,
        "status" : 200
      }
    }
  ]
}

最后我们为了后面测试方便，导入多条测试样式数据，数据地址为：github.com/elastic/ela…

POST bank/account/_bulk
上面的数据

进阶检索

Search API

elasticsearch支持两种基本方式检索；

通过REST request uri 发送搜索参数（uri +检索参数）
通过REST request body 来发送它们（uri+请求体）

例子：我们想要检索bank索引下的全部数据，按照文档中的account_number字段升序排序返回。

我们首先使用REST request uri 发送搜索参数（uri +检索参数）方式

请求参数方式检索
GET bank/_search?q=*&sort=account_number:asc
说明：
q=* # 查询所有
sort # 排序字段
asc #升序
## 检索了1000条数据，但是根据相关性算法，只返回10条

检索bank下所有信息，包括type和docs
GET bank/_search

返回的内容：

took：花费多少ms搜索
timed_out：是否超时
_shards：多少分片被搜索了，以及多少成功/失败的搜索分片
max_score：文档相关性最高得分
hits.total.value：多少匹配文档被找到
hits.sort：结果的排序key（列），没有的话按照score排序
hits._score：相关得分 (not applicable when using match_all)

我们接着使用uri+请求体进行检索方式查询，这种方式也叫做DSL，Elasticsearch提供了一个可以执行查询的Json风格的DSL(domain-specific language领域特定语言)。这个被称为Query DSL，该查询语言非常全面。

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}

Query DSL

Elasticsearch提供了一个可以执行查询的Json风格的DSL(domain-specific language领域特定语言)。这个被称为Query DSL，该查询语言非常全面。

基本语法格式如下：

#如果针对于某个字段，那么它的结构如下：
{
  QUERY_NAME:{   # 使用的功能
     FIELD_NAME:{  #  功能参数
       ARGUMENT:VALUE,
       ARGUMENT:VALUE,...
      }   
   }
}

示例如下：

GET bank/_search
{
  "query": {  #  查询的字段
    "match_all": {}
  },
  "from": 0,  # 从第几条文档开始查
  "size": 5,  #每页的记录数
  "_source":["balance","firstname"], #只想返回balance和firstname字段
  "sort": [
    {
      "account_number": {  # 返回结果按哪个列排序
        "order": "desc"  # 降序
      }
    }
  ]
}

上面的例子返回的信息如下：

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    # hits里面只有条数据
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "999",
        "_score" : null,
        "_source" : {
          "firstname" : "Dorothy",
          "balance" : 6087
        },
        "sort" : [
          999
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "998",
        "_score" : null,
        "_source" : {
          "firstname" : "Letha",
          "balance" : 16869
        },
        "sort" : [
          998
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "997",
        "_score" : null,
        "_source" : {
          "firstname" : "Combs",
          "balance" : 25311
        },
        "sort" : [
          997
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "996",
        "_score" : null,
        "_source" : {
          "firstname" : "Andrews",
          "balance" : 17541
        },
        "sort" : [
          996
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "995",
        "_score" : null,
        "_source" : {
          "firstname" : "Phelps",
          "balance" : 21153
        },
        "sort" : [
          995
        ]
      }
    ]
  }
}

query/match匹配查询

如果是非字符串，会进行精确匹配。如果是字符串，会进行全文检索

一、基本类型（非字符串），精确控制

GET bank/_search
{
  "query": {
    "match": {
      "account_number": "20"
    }
  }
}

match返回account_number=20的数据。

{
  "took" : 24,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "20",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 20,
          "balance" : 16418,
          "firstname" : "Elinor",
          "lastname" : "Ratliff",
          "age" : 36,
          "gender" : "M",
          "address" : "282 Kings Place",
          "employer" : "Scentric",
          "email" : "elinorratliff@scentric.com",
          "city" : "Ribera",
          "state" : "WA"
        }
      }
    ]
  }
}

二、字符串，全文检索：最终会按照评分进行排序，会对检索条件进行分词匹配。

GET bank/_search
{
  "query": {
    "match": {
      "address": "kings"
    }
  }
}

返回的数据如下：

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 5.9908285,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "20",
        "_score" : 5.9908285,
        "_source" : {
          "account_number" : 20,
          "balance" : 16418,
          "firstname" : "Elinor",
          "lastname" : "Ratliff",
          "age" : 36,
          "gender" : "M",
          "address" : "282 Kings Place",
          "employer" : "Scentric",
          "email" : "elinorratliff@scentric.com",
          "city" : "Ribera",
          "state" : "WA"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "722",
        "_score" : 5.9908285,
        "_source" : {
          "account_number" : 722,
          "balance" : 27256,
          "firstname" : "Roberts",
          "lastname" : "Beasley",
          "age" : 34,
          "gender" : "F",
          "address" : "305 Kings Hwy",
          "employer" : "Quintity",
          "email" : "robertsbeasley@quintity.com",
          "city" : "Hayden",
          "state" : "PA"
        }
      }
    ]
  }
}

`query/match_phrase` 【不拆分匹配】

将需要匹配的值当成一整个单词（不分词）进行检索

match_phrase：不拆分字符串进行检索
字段.keyword：必须全匹配上才检索成功

一、match_phrase：不拆分字符串进行检索

GET bank/_search
{
  "query": {
    "match_phrase": {
      "address": "mill road"   #  就是说不要匹配只有mill或只有road的，要匹配mill road一整个子串
    }
  }
}

查出address中包含mill road的所有记录，并给出相关性得分

{
  "took" : 23,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 8.926605,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 8.926605,
        "_source" : {
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,
          "gender" : "M",
          "address" : "990 Mill Road",
          "employer" : "Pheast",
          "email" : "forbeswallace@pheast.com",
          "city" : "Lopezo",
          "state" : "AK"
        }
      }
    ]
  }
}

二、字段.keyword：必须全匹配上才检索成功

GET bank/_search
{
  "query": {
    "match": {
      "address.keyword": "990 Mill"  # 字段后面加上 .keyword
    }
  }
}

查询结果，一条也未匹配到

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0, # 因为要求完全equal，所以匹配不到
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

修改匹配条件为990 Mill Road

GET bank/_search
{
  "query": {
    "match": {
      "address.keyword": "990 Mill Road"  # 正好有这条文档，所以能匹配到
    }
  }
}

查询出一条数据

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1, # 1
      "relation" : "eq"
    },
    "max_score" : 6.5032897,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 6.5032897,
        "_source" : {
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,
          "gender" : "M",
          "address" : "990 Mill Road",  # equal
          "employer" : "Pheast",
          "email" : "forbeswallace@pheast.com",
          "city" : "Lopezo",
          "state" : "AK"
        }
      }
    ]
  }
}

文本字段的匹配，使用keyword，匹配的条件就是要显示字段的全部值，要进行精确匹配的。
match_phrase是做短语匹配，只要文本中包含匹配条件，就能匹配到。

query/multi_math【多字段匹配】

例子：state或者address中包含mill，并且在查询过程中，会对于查询条件进行分词。

GET bank/_search
{
  "query": {
    "multi_match": {  # 前面的match仅指定了一个字段。
      "query": "mill",
      "fields": [ # state和address有mill子串  不要求都有
        "state",
        "address"
      ]
    }
  }
}

查询结果：

{
  "took" : 28,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 5.4032025,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 5.4032025,
        "_source" : {
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,
          "gender" : "M",
          "address" : "990 Mill Road",  # 有mill
          "employer" : "Pheast",
          "email" : "forbeswallace@pheast.com",
          "city" : "Lopezo",
          "state" : "AK"  # 没有mill
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "136",
        "_score" : 5.4032025,
        "_source" : {
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M",
          "address" : "198 Mill Lane", # mill
          "employer" : "Neteria",
          "email" : "winnieholland@neteria.com",
          "city" : "Urie",
          "state" : "IL"  # 没有mill
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "345",
        "_score" : 5.4032025,
        "_source" : {
          "account_number" : 345,
          "balance" : 9812,
          "firstname" : "Parker",
          "lastname" : "Hines",
          "age" : 38,
          "gender" : "M",
          "address" : "715 Mill Avenue",  # 有mill
          "employer" : "Baluba",
          "email" : "parkerhines@baluba.com",
          "city" : "Blackgum",
          "state" : "KY"  # 没有mill
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "472",
        "_score" : 5.4032025,
        "_source" : {
          "account_number" : 472,
          "balance" : 25571,
          "firstname" : "Lee",
          "lastname" : "Long",
          "age" : 32,
          "gender" : "F",
          "address" : "288 Mill Street", #有mill
          "employer" : "Comverges",
          "email" : "leelong@comverges.com",
          "city" : "Movico",
          "state" : "MT" # 没有mill
        }
      }
    ]
  }
}

query/bool/must复合查询

复合语句可以合并，任何其他查询语句，包括符合语句。这也就意味着，复合语句之间可以互相嵌套，可以表达非常复杂的逻辑。

must：必须达到must所列举的所有条件
must_not：必须不匹配must_not所列举的所有条件。
should：应该满足should所列举的条件。满足条件最好，不满足也可以，满足得分更高

实例：查询gender=m，并且address=mill的数据

GET bank/_search
{
   "query":{
        "bool":{  # 
             "must":[ # 必须有这些字段
              {"match":{"address":"mill"}},
              {"match":{"gender":"M"}}
             ]
         }
    }
}

查询结果：

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 6.0824604,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 6.0824604,
        "_source" : {
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,
          "gender" : "M",  # M
          "address" : "990 Mill Road", # mill
          "employer" : "Pheast",
          "email" : "forbeswallace@pheast.com",
          "city" : "Lopezo",
          "state" : "AK"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "136",
        "_score" : 6.0824604,
        "_source" : {
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M", # M
          "address" : "198 Mill Lane", # Mill
          "employer" : "Neteria",
          "email" : "winnieholland@neteria.com",
          "city" : "Urie",
          "state" : "IL"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "345",
        "_score" : 6.0824604,
        "_source" : {
          "account_number" : 345,
          "balance" : 9812,
          "firstname" : "Parker",
          "lastname" : "Hines",
          "age" : 38,
          "gender" : "M",  # M
          "address" : "715 Mill Avenue",  # Mill
          "employer" : "Baluba",
          "email" : "parkerhines@baluba.com",
          "city" : "Blackgum",
          "state" : "KY"
        }
      }
    ]
  }
}

二、must_not：必须不是指定的情况

例子：查询gender=m，并且address=mill的数据，但是age不等于38的

GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "gender": "M" }},
        { "match": {"address": "mill"}}
      ],
      "must_not": [  # 不可以是指定值
        { "match": { "age": "38" }}
      ]
   }
}

查询结果：

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 6.0824604,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 6.0824604,
        "_source" : {
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28, # 不是38
          "gender" : "M", #
          "address" : "990 Mill Road", #
          "employer" : "Pheast",
          "email" : "forbeswallace@pheast.com",
          "city" : "Lopezo",
          "state" : "AK" 
        }
      }
    ]
  }
}

三、should：应该达到should列举的条件，如果到达会增加相关文档的评分，并不会改变查询的结果。

实例：匹配lastName应该等于Wallace的数据

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 12.585751,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 12.585751,
        "_source" : {
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",  # 因为匹配了should，所以得分第一
          "age" : 28, # 不是18
          "gender" : "M",  # 
          "address" : "990 Mill Road",  # 
          "employer" : "Pheast",
          "email" : "forbeswallace@pheast.com",
          "city" : "Lopezo",
          "state" : "AK"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "136",
        "_score" : 6.0824604,
        "_source" : {
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M",
          "address" : "198 Mill Lane",
          "employer" : "Neteria",
          "email" : "winnieholland@neteria.com",
          "city" : "Urie",
          "state" : "IL"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "345",
        "_score" : 6.0824604,
        "_source" : {
          "account_number" : 345,
          "balance" : 9812,
          "firstname" : "Parker",
          "lastname" : "Hines",
          "age" : 38,
          "gender" : "M",
          "address" : "715 Mill Avenue",
          "employer" : "Baluba",
          "email" : "parkerhines@baluba.com",
          "city" : "Blackgum",
          "state" : "KY"
        }
      }
    ]
  }
}

query/filter【结果过滤】

must 贡献得分
should 贡献得分
must_not 不贡献得分
filter 不贡献得分

例子：查询年龄在40到50之间的数据

GET bank/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "age": {
            "gte": 40,
            "lte": 50
          }
        }
      }
    }
  }
}

query/term

和match一样。匹配某个属性的值。

全文检索字段用match（es一般将字符串进行分词，全文检索的话可以查询出包含分词的数据）
其他非text字段匹配用term，比如数值，这些是做精准匹配的

不要使用term来进行文本字段查询，因为文本字段es都是做分词处理的，如果使用term的话，会使得es分词查询变得困难。es默认存储text值时用分词分析，所以要搜索text值，使用match

使用term匹配查询

GET bank/_search
{
  "query": {
    "term": {
      "balance": 19955
    }
  }
}

查询结果如下：

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "291",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 291,
          "balance" : 19955,
          "firstname" : "Lynn",
          "lastname" : "Pollard",
          "age" : 40,
          "gender" : "F",
          "address" : "685 Pierrepont Street",
          "employer" : "Slambda",
          "email" : "lynnpollard@slambda.com",
          "city" : "Mappsville",
          "state" : "ID"
        }
      }
    ]
  }
}

aggs/agg1（聚合)

聚合提供了从数据中分组和提取数据的能力。最简单的聚合方法大致等于SQL Group by和SQL聚合函数。

在elasticsearch中，执行搜索返回this（命中结果），并且同时返回聚合结果，把响应中的所有hits（命中结果）分隔开的能力。这是非常强大且有效的，你可以执行查询和多个聚合，并且在一次使用中得到各自的（任何一个的）返回结果，使用一次简洁和简化的API啦避免网络往返。

aggs：执行聚合。聚合语法如下：

"aggs":{ # 聚合
    "aggs_name":{ # 这次聚合的名字，方便展示在结果集中
        "AGG_TYPE":{} # 聚合的类型(avg,term,terms)
     }
}

terms：看值的可能性分布，会合并锁查字段，给出计数即可
avg：看值的分布平均

例：搜索address中包含mill的所有人的年龄分布以及平均年龄

# 分别为包含mill、，平均年龄、
GET bank/_search
{
  "query": { # 查询出包含mill的
    "match": {
      "address": "Mill"
    }
  },
  "aggs": { #基于查询聚合
    "ageAgg": {  # 聚合的名字，随便起
      "terms": { # 看值的可能性分布
        "field": "age",
        "size": 10
      }
    },
    "ageAvg": { 
      "avg": { # 看age值的平均
        "field": "age"
      }
    }
  },
  "size": 0  # 不看详情，即不看hits中具体数据
}

查询返回如下：

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4, // 命中4条
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAgg" : { // 第一个聚合的结果
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 38,  # age为38的有2条
          "doc_count" : 2
        },
        {
          "key" : 28,
          "doc_count" : 1
        },
        {
          "key" : 32,
          "doc_count" : 1
        }
      ]
    },
    "ageAvg" : { // 第二个聚合的结果
      "value" : 34.0  # age字段的平均值是34
    }
  }
}

复杂子聚合：查出所有年龄分布，并且这些年龄段中M的平均薪资和F的平均薪资以及这个年龄段的总体平均薪资

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAgg": {
      "terms": {  #  看age分布
        "field": "age",
        "size": 100
      },
      "aggs": { # 子聚合
        "genderAgg": {
          "terms": { # 看gender分布
            "field": "gender.keyword" # 注意这里，文本字段应该用.keyword
          },
          "aggs": { # 子聚合
            "balanceAvg": {
              "avg": { # 男女的平均薪资
                "field": "balance"
              }
            }
          }
        },
        "ageBalanceAvg": {
          "avg": { #age分布的平均（男女一起得）
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}

查询返回如下：

{
  "took" : 119,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 31,
          "doc_count" : 61,
          "genderAgg" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "M",
                "doc_count" : 35,
                "balanceAvg" : {
                  "value" : 29565.628571428573
                }
              },
              {
                "key" : "F",
                "doc_count" : 26,
                "balanceAvg" : {
                  "value" : 26626.576923076922
                }
              }
            ]
          },
          "ageBalanceAvg" : {
            "value" : 28312.918032786885
          }
        }
      ]
        .......//省略其他
    }
  }
}

Mapping字段映射

映射定义文档如何被存储和检索的，即定义文档中的字段类型

（1）字符串

text ⽤于全⽂索引，搜索时会自动使用分词器进⾏分词再匹配
keyword 不分词，搜索时需要匹配完整的值

（2）数值型

整型： byte，short，integer，long
浮点型： float, half_float, scaled_float，double

（3）日期类型：date

（4）范围型

integer_range， long_range， float_range，double_range，date_range

gt是大于，lt是小于，e是equals等于。

age_limit的区间包含了此值的文档都算是匹配。

（5）布尔

boolean

（6）对象

object一个对象中可以嵌套对象。

（7）数组

Array

（8）嵌套类型

nested 用于json对象数组

Mapping(映射)是用来定义一个文档（document），以及它所包含的属性（field）是如何存储和索引的。比如：使用maping来定义：

哪些字符串属性应该被看做全文本属性（full text fields）；
哪些属性包含数字，日期或地理位置；
文档中的所有属性是否都嫩被索引（all 配置）；
日期的格式；
自定义映射规则来执行动态添加属性；
查看mapping信息：GET bank/_mapping

  {
    "bank" : {
      "mappings" : {
        "properties" : {
          "account_number" : {
            "type" : "long" # long类型
          },
          "address" : {
            "type" : "text", # 文本类型，会进行全文检索，进行分词
            "fields" : {
              "keyword" : { # addrss.keyword
                "type" : "keyword",  # 该字段必须全部匹配到
                "ignore_above" : 256
              }
            }
          },
          "age" : {
            "type" : "long"
          },
          "balance" : {
            "type" : "long"
          },
          "city" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "email" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "employer" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "firstname" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "gender" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "lastname" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "state" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }

ElasticSearch7-去掉type概念

关系型数据库中两个数据表示是独立的，即使他们里面有相同名称的列也不影响使用，但ES中不是这样的。elasticsearch是基于Lucene开发的搜索引擎，而ES中不同type下名称相同的filed最终在Lucene中的处理方式是一样的。

两个不同type下的两个user_name，在ES同一个索引下其实被认为是同一个field。否则，不同type中的相同字段名称就会在处理中出现冲突的情况，导致Lucene处理效率下降。去掉type就是为了提高ES处理数据的效率。

Elasticsearch 7.x URL中的type参数为可选。比如，索引一个文档不再要求提供文档类型。Elasticsearch 8.x 不再支持URL中的type参数。

解决：

将索引从多类型迁移到单类型，每种类型文档一个独立索引
将已存在的索引下的类型数据，全部迁移到指定位置即可。详见数据迁移

创建映射`PUT /my_index`

第一次存储数据的时候es就猜出了映射

第一次存储数据前可以指定映射

PUT /my_index
{
  "mappings": {
    "properties": {
      "age": {
        "type": "integer"
      },
      "email": {
        "type": "keyword" # 指定为keyword
      },
      "name": {
        "type": "text" # 全文检索。保存时候分词，检索时候进行分词匹配
      }
    }
  }
}

输出：

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "my_index"
}

查看映射`GET /my_index`

GET /my_index

输出结果

{
  "my_index" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "integer"
        },
        "email" : {
          "type" : "keyword"
        },
        "employee-id" : {
          "type" : "keyword",
          "index" : false
        },
        "name" : {
          "type" : "text"
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1588410780774",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "ua0lXhtkQCOmn7Kh3iUu0w",
        "version" : {
          "created" : "7060299"
        },
        "provided_name" : "my_index"
      }
    }
  }
}

添加映射字段

添加新的字段映射PUT /my_index/_mapping

PUT /my_index/_mapping
{
  "properties": {
    "employee-id": {
      "type": "keyword",
      "index": false # 字段不能被检索。检索
    }
  }
}

这里的 “index”: false，表明新增的字段不能被检索，只是一个冗余字段。

更新映射

对于已经存在的字段映射，我们不能更新。更新必须创建新的索引，进行数据迁移。

GET /bank/_search
查出
"age":{"type":"long"}

想要将年龄修改为integer，我们首先先创建新的索引

PUT /newbank
{
  "mappings": {
    "properties": {
      "account_number": {
        "type": "long"
      },
      "address": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      },
      "balance": {
        "type": "long"
      },
      "city": {
        "type": "keyword"
      },
      "email": {
        "type": "keyword"
      },
      "employer": {
        "type": "keyword"
      },
      "firstname": {
        "type": "text"
      },
      "gender": {
        "type": "keyword"
      },
      "lastname": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "state": {
        "type": "keyword"
      }
    }
  }
}

接着我们查看“newbank”的映射：GET /newbank/_mapping

能够看到age的映射类型被修改为了integer.
"age":{"type":"integer"}

将bank中的数据迁移到newbank中

POST _reindex
{
  "source": {
    "index": "bank",
    "type": "account"
  },
  "dest": {
    "index": "newbank"
  }
}

运行输出：

#! Deprecation: [types removal] Specifying types in reindex requests is deprecated.
{
  "took" : 768,
  "timed_out" : false,
  "total" : 1000,
  "updated" : 0,
  "created" : 1000,
  "deleted" : 0,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

查看newbank中的数据

GET /newbank/_search

输出
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "newbank",
        "_type" : "_doc", # 没有了类型

分词

一个tokenizer（分词器）接收一个字符流，将之分割为独立的tokens（词元，通常是独立的单词），然后输出tokens流。

例如：whitespace tokenizer遇到空白字符时分割文本。它会将文本"Quick brown fox!"分割为[Quick,brown,fox!].该tokenizer（分词器）还负责记录各个terms(词条)的顺序或position位置（用于phrase短语和word proximity词近邻查询），以及term（词条）所代表的原始word（单词）的start（起始）和end（结束）的character offsets（字符串偏移量）（用于高亮显示搜索的内容）。

elasticsearch提供了很多内置的分词器（标准分词器），可以用来构建custom analyzers（自定义分词器）。所有的语言分词，默认使用的都是“Standard Analyzer”，但是这些分词器针对于中文的分词，并不友好。为此需要安装中文的分词器。

例子“使用默认的分词器

POST _analyze
{
  "analyzer": "standard",
  "text": "The 2 Brown-Foxes bone."
}

执行结果：

{
  "tokens" : [
    {
      "token" : "the",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "2",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<NUM>",
      "position" : 1
    },
    {
      "token" : "brown",
      "start_offset" : 6,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "foxes",
      "start_offset" : 12,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "bone",
      "start_offset" : 18,
      "end_offset" : 22,
      "type" : "<ALPHANUM>",
      "position" : 4
    }
  ]
}

对于中文，我们需要安装额外的分词器

注意：不能用默认elasticsearch-plugin install xxx.zip进行自动安装

安装`ik`分词器

在前面安装的elasticsearch时，我们已经将elasticsearch容器的“/usr/share/elasticsearch/plugins”目录，映射到宿主机的“/mydata/elasticsearch/plugins”目录下，所以比较方便的做法就是下载“/elasticsearch-analysis-ik-7.4.2.zip”文件，然后解压到该文件夹下即可。安装完毕后，需要重启elasticsearch容器。

首先我们将我们的虚拟机vagrant修改成可以通过密码登录的方式。切换到root用户

vi /etc/ssh/sshd_config

修改PasswordAuthentication yes

然后重启一下sshd服务：service sshd restart，就可以通过其他客户端工具比如xshell连接vagrant虚拟机了。

接着我们进入es容器内部plugin目录

docker exec -it 容器id /bin/bash

当然我们之前安装es的时候，已经在虚拟机外面挂载了该目录，目录在

下载ik分词器地址：github.com/medcl/elast…

接着将下载的ik7.4.2进行解压，上传到目录中

接着修改文件权限，重启es

chmod -R 777 plugins/elasticsearch-analysis-ik-7.4.2

docker restart elasticsearch

测试分词器

使用默认分词器

GET _analyze
{
   "text":"我是中国人"
}

请观察执行结果：

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "中",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "国",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "人",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    }
  ]
}

使用ik_smart分词器

GET _analyze
{
   "analyzer": "ik_smart", 
   "text":"我是中国人"
}

输出结果：

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "中国人",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}

使用ik_max_word分词器

GET _analyze
{
   "analyzer": "ik_max_word", 
   "text":"我是中国人"
}

输出结果如下：

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "中国人",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "中国",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "国人",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}

自定义词库

解决vagrant虚拟机网络问题，首先进入到cd /etc/sysconfig/network-scripts/

修改ifcfg-eth1 然后重启network服务

接着测试ping baidu.com，发现网络可用

接着我们安装新的yum源，用于下载基本工具，比如wget和unzip（原先默认的yum源地址是国外的，比较慢）

#使用新yum源
curl -o/etc/yum.repos.d/CentOS-Base.repo http://mirrors.163.com/.help/CentOS7-Base-163.repo

生成缓存
yum makecache

接着安装wget，直接yum install -y wget；接着安装unzip，直接yum install -y unzip

回归正题，比如我们要把尚硅谷算作一个词。

我们需要一个远程词库！！！对于远程词库，我们这里不用自己搭建项目返回分词结果，我们这里将分词库放到nginx服务器中来达到同样的效果，那么首先我们就需要安装nginx。

我们先来看一下我们vagrant虚拟机的内存仅剩下155M，不够用

我们将vagrant虚拟机停掉，修改内存

然后我们修改一下之前es启动的内存设置，之前内存设置太小了。

# 9200是用户交互端口 9300是集群心跳端口
# -e指定是单阶段运行
# -e指定占用的内存大小，生产时可以设置32G
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e  "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v  /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:7.4.2 


# 设置开机启动elasticsearch
docker update elasticsearch --restart=always

启动完es之后，我们接着安装nginx

一、首先随便启动一个nginx实例，只是为了复制出配置

docker run -p 80:80 --name nginx -d nginx:1.10

二、将nginx容器内/etc/nginx配置文件拷贝到当前目录，

#将nginx容器中的/etc/nginx配置文件拷贝到当前目录
docker container cp nginx:/etc/nginx .

三、修改文件名称：mv nginx conf 把这个conf移动到/mydata/nginx下

#将nginx目录重新命名为conf
mv nginx conf
#将conf目录下移动到nginx目录下
mv conf nginx/

四、终止原容器：docker stop nginx，并执行命令删除原容器:docker rm $ContainerId

docker stop nginx
docker rm nginx

五、创建新的nginx，执行如下命令：

docker run -p 80:80 --name nginx \
-v /mydata/nginx/html:/usr/share/nginx/html \
-v /mydata/nginx/logs:/var/log/nginx \
-v /mydata/nginx/conf:/etc/nginx \
-d nginx:1.10

为了简单测试nginx，我们在/mydata/nginx/html目录下创建一个index/html

接下来我们在/mydata/nginx/html目录下创建一个es目录，然后在这个目录下创建一个fenci.txt用于存储分词数据，然后可以这个fenci.txt文件中添加新增的分词，每个分词使用换行，例如：

echo "樱桃萨其马，带你甜蜜入夏" > /mydata/nginx/html/fenci.txt

然后我们修改es配置

修改/usr/share/elasticsearch/plugins/ik/config中的IKAnalyzer.cfg.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer 扩展配置</comment>
	<!--用户可以在这里配置自己的扩展字典 -->
	<entry key="ext_dict"></entry>
	 <!--用户可以在这里配置自己的扩展停止词字典-->
	<entry key="ext_stopwords"></entry>
	<!--用户可以在这里配置远程扩展字典，可以是自己搭建的项目，返回对应的分词结果，也可以是nginx搭建的词库 -->
	<entry key="remote_ext_dict">http://192.168.56.10/es/fenci.txt</entry> 
	<!--用户可以在这里配置远程扩展停止词字典-->
	<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

修改完成后，需要重启elasticsearch容器，否则修改不生效。docker restart elasticsearch，对于新增的分词，就可以走我们的自定义的词库了。

GET _analyze
{
   "analyzer": "ik_max_word", 
   "text":"樱桃萨其马，带你甜蜜入夏"
}

输出结果：

{
  "tokens" : [
    {
      "token" : "樱桃",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "萨其马",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "带你",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "甜蜜",
      "start_offset" : 8,
      "end_offset" : 10,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "入夏",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}

更新完成后，es只会对于新增的数据用更新分词。历史数据是不会重新分词的。如果想要历史数据重新分词，需要执行：

POST my_index/_update_by_query?conflicts=proceed

SpringBoot整合ElasticSearch

创建一个新的module，如下：

引入依赖：

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.3.5.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.hs.gulimall</groupId>
    <artifactId>gulimall-search</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>gulimall-search</name>
    <description>es搜索服务</description>
    <properties>
        <java.version>1.8</java.version>
        <elasticsearch.version>7.4.2</elasticsearch.version>
    </properties>
    <dependencies>
        <!--引入es依赖-->
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>7.4.2</version>
        </dependency>
        <dependency>
            <groupId>com.hs.gulimall</groupId>
            <artifactId>gulimall-common</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>RELEASE</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

编写es配置信息GuliESConfig，请求测试项，比如es添加了安全访问规则，访问es需要添加一个安全头，就可以通过requestOptions设置，官方建议把requestOptions创建成单实例

/**
 * @Auther: huangshuai
 * @Date: 2023/11/4 23:36
 * @Description： 请求测试项，比如es添加了安全访问规则，访问es需要添加一个安全头，就可以通过requestOptions设置
 *官方建议把requestOptions创建成单实例
 * @Version：
 */
@Configuration
public class GuliESConfig {

    public static final RequestOptions COMMON_OPTIONS;

    static {
        RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();

        COMMON_OPTIONS = builder.build();
    }

    @Bean
    public RestHighLevelClient esRestClient() {
        RestClientBuilder builder = null;
        // 可以指定多个es
        builder = RestClient.builder(new HttpHost("192.168.56.10", 9200, "http"));
        RestHighLevelClient client = new RestHighLevelClient(builder);
        return client;
    }
}

非必须项：将服务注册到nacos，application.properties配置文件如下：

spring.cloud.nacos.discovery.server-addr=127.0.0.1:8848
spring.application.name=gulimall-search

主启动类加上服务发现注解

保存数据

@RunWith(SpringRunner.class)
@SpringBootTest
class GulimallSearchApplicationTests {

    @Autowired
    private RestHighLevelClient client;

    /**
     * es保存数据
     */
    @Test
    void indexData() throws IOException {
        // 设置索引名称
        IndexRequest indexRequest = new IndexRequest ("users");
        //设置要存储的文档id
        indexRequest.id("1");
        User user = new User();
        user.setUserName("张三");
        user.setAge(20);
        user.setGender("男");
        //转成json类型数据
        String jsonString = JSON.toJSONString(user);
        //设置要保存的内容，指定数据和类型
        indexRequest.source(jsonString, XContentType.JSON);
        //执行创建索引和保存数据
        IndexResponse index = client.index(indexRequest, GuliESConfig.COMMON_OPTIONS);
        System.out.println(index);
    }
}
@Data
class User {
    private String userName;
    private Integer age;
    private String gender;

}

测试结果如下：

查询数据

例子：我们想要查询address中包含mill的数据

GET /bank/_search
{
  "query": {
    "match": {
      "address": "mill"
    }
  }
}

我们使用RestHighLevelClient查询代码如下：

@Test
public void find() throws IOException {
    // 1 创建检索请求
    SearchRequest searchRequest = new SearchRequest();
    //指定索引
    searchRequest.indices("bank");
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    // 构造检索条件
    //sourceBuilder.query();
    //设置分页
//  sourceBuilder.from();
//  sourceBuilder.size();
    //设置聚合
//  sourceBuilder.aggregation();
    sourceBuilder.query(QueryBuilders.matchQuery("address","mill"));
    //打印检索条件
    System.out.println(sourceBuilder.toString());
    searchRequest.source(sourceBuilder);
    // 2 执行检索
    SearchResponse response = client.search(searchRequest, GuliESConfig.COMMON_OPTIONS);
    // 3 分析响应结果
    System.out.println(response.toString());
}

聚合查询：

# 分别为包含mill、，平均年龄、
GET bank/_search
{
  "query": { # 查询出包含mill的
    "match": {
      "address": "Mill"
    }
  },
  "aggs": { #基于查询聚合
    "ageAgg": {  # 聚合的名字，随便起
      "terms": { # 看值的可能性分布
        "field": "age",
        "size": 10
      }
    },
    "balanceAvg": { 
      "avg": { # 看balance值的平均
        "field": "balance"
      }
    }
  }
}

我们使用RestHighLevelClient查询代码如下：

 @Test
public void find() throws IOException {
    // 1 创建检索请求
    SearchRequest searchRequest = new SearchRequest();
    searchRequest.indices("bank");
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    // 构造检索条件
    //sourceBuilder.query();
    //设置分页
//  sourceBuilder.from();
//  sourceBuilder.size();
    //设置聚合
//  sourceBuilder.aggregation();
    sourceBuilder.query(QueryBuilders.matchQuery("address","mill"));
    //AggregationBuilders工具类构建AggregationBuilder
    // 构建第一个聚合条件:按照年龄的值分布
    TermsAggregationBuilder agg1 = AggregationBuilders.terms("ageAgg").field("age").size(10);// 聚合名称
// 参数为AggregationBuilder
    sourceBuilder.aggregation(agg1);
    // 构建第二个聚合条件:平均薪资
    AvgAggregationBuilder agg2 = AggregationBuilders.avg("balanceAvg").field("balance");
    sourceBuilder.aggregation(agg2);
    System.out.println("检索条件"+sourceBuilder.toString());
    searchRequest.source(sourceBuilder);
    // 2 执行检索
    SearchResponse response = client.search(searchRequest, GuliESConfig.COMMON_OPTIONS);
    // 3 分析响应结果
    System.out.println(response.toString());
}

我们如果想要将查询es的数据转成JavaBean对象，即将 _source中的数据转成JavaBean对象

可以这样编写

// 3.1 获取java bean
SearchHits hits = response.getHits();
SearchHit[] hits1 = hits.getHits();
for (SearchHit hit : hits1) {
    hit.getId();
    hit.getIndex();
    String sourceAsString = hit.getSourceAsString();
    Account account = JSON.parseObject(sourceAsString, Account.class);
    System.out.println(account);
}

打印控制台如下：

Account(accountNumber=970, balance=19648, firstname=Forbes, lastname=Wallace, age=28, gender=M, address=990 Mill Road, employer=Pheast, email=forbeswallace@pheast.com, city=Lopezo, state=AK)
Account(accountNumber=136, balance=45801, firstname=Winnie, lastname=Holland, age=38, gender=M, address=198 Mill Lane, employer=Neteria, email=winnieholland@neteria.com, city=Urie, state=IL)
Account(accountNumber=345, balance=9812, firstname=Parker, lastname=Hines, age=38, gender=M, address=715 Mill Avenue, employer=Baluba, email=parkerhines@baluba.com, city=Blackgum, state=KY)
Account(accountNumber=472, balance=25571, firstname=Lee, lastname=Long, age=32, gender=F, address=288 Mill Street, employer=Comverges, email=leelong@comverges.com, city=Movico, state=MT)

Buckets分析信息

// 3.2 获取检索到的分析信息
Aggregations aggregations = response.getAggregations();
Terms agg21 = aggregations.get("agg2");
for (Terms.Bucket bucket : agg21.getBuckets()) {
    String keyAsString = bucket.getKeyAsString();
    System.out.println(keyAsString);
}

elasticsearch入门看这篇就够了

基本介绍

index索引

Type类型

Document文档

安装elasticsearch

初步检索

新增文档

查看文档

更新文档

删除文档

批量操作

进阶检索

Search API

Query DSL

query/match匹配查询

query/match_phrase 【不拆分匹配】

query/multi_math【多字段匹配】

query/bool/must复合查询

query/filter【结果过滤】

query/term

aggs/agg1（聚合)

Mapping字段映射

创建映射PUT /my_index

查看映射GET /my_index

添加映射字段

更新映射

分词

安装ik分词器

测试分词器

自定义词库

SpringBoot整合ElasticSearch

保存数据

查询数据

`query/match_phrase` 【不拆分匹配】

创建映射`PUT /my_index`

查看映射`GET /my_index`

安装`ik`分词器