"Tech Stacks Commonly Asked About in Big-Tech Interviews" series: Elasticsearch from scratch, covering local deployment, startup, and usage

"The beginning of wisdom is admitting your own ignorance." (Socrates)

Follow me so you don't miss the next post.

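Hi everyone, I'm 侠风 (Xiafeng). I will keep sharing hands-on technical knowledge from my work at large tech companies, and I would appreciate a like, a save, and a share.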

Background

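Elasticsearch (ES from here on) is one of the first big-data components I ever worked with, and almost every project at every company I have worked for, large or small, has used it.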

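When we learn a technical component, we need to know exactly which scenarios it is meant for. With so many open-source components available, why choose ES rather than MySQL, Doris, ClickHouse, and so on? Their features overlap, but each one is good at different things, so picking the right technology for the business scenario is a very important skill for an engineer.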

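What ES is very good at is search. Some readers may object that MySQL, the database we use most, can search too, for example to look up the names of pretty girls: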

SELECT beautiful_girl_name FROM human_table WHERE face LIKE '%非常漂亮%';  -- 非常漂亮 means "very pretty"

But this approach has two big drawbacks.

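First drawback: as soon as the description of her face is reworded as "她的眼睛非常好看，整体五官巨漂亮，赛过刘亦菲" ("her eyes are very lovely, her features are gorgeous overall, prettier than Liu Yifei"), the query finds nothing, because the text no longer contains the exact substring 非常漂亮. You might miss the picture of a truly gorgeous girl.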

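Second drawback: when the data volume is large, a leading-wildcard LIKE '%...%' cannot use a regular B-tree index and has to scan the whole table, and a single relational database such as MySQL struggles to keep up with both the writes and the queries, so it cannot meet the business requirements.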
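The first drawback is easy to reproduce. The sketch below uses Python's built-in SQLite as a stand-in for MySQL (an assumption made only so the example runs without a server; LIKE with a leading wildcard is a plain substring match in both). The names girl_a and girl_b are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the table mirrors the query in the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE human_table (beautiful_girl_name TEXT, face TEXT)")
conn.executemany(
    "INSERT INTO human_table VALUES (?, ?)",
    [
        # Hypothetical row whose description contains the exact phrase.
        ("girl_a", "非常漂亮"),
        # The reworded description quoted in the text.
        ("girl_b", "她的眼睛非常好看，整体五官巨漂亮，赛过刘亦菲"),
    ],
)

rows = conn.execute(
    "SELECT beautiful_girl_name FROM human_table WHERE face LIKE ?",
    ("%非常漂亮%",),
).fetchall()
matched_names = [name for (name,) in rows]
print(matched_names)  # only the row with the exact substring is returned
```

The reworded row contains both 非常 and 漂亮, but never next to each other, so the substring match skips it.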

So why use ES?

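Query efficiency: MySQL's regular indexes are B+ trees keyed on whole column values, while ES builds an inverted index that maps each term to the documents containing it, so full-text lookups are very efficient.
Query performance: both systems persist data on disk, but ES searches immutable Lucene segments that the operating system's filesystem cache keeps largely in memory, so frequently run queries are often served without touching the disk.
Scalability: MySQL is organized around tables on a single server, while an ES index is split into shards spread across nodes, so it is easy to scale horizontally by adding nodes.
Flexibility: MySQL is queried with SQL, while ES stores JSON documents and accepts a JSON query DSL over HTTP, which makes it easy to compose flexible queries.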
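To make the inverted-index idea concrete, here is a minimal toy sketch. It is not how Lucene is implemented; the whitespace tokenizer, the sample documents, and the function names are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term of the query."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical sample documents.
docs = {
    1: "elasticsearch is a search engine",
    2: "mysql is a relational database",
    3: "kibana visualizes elasticsearch data",
}
index = build_inverted_index(docs)
print(sorted(search(index, "elasticsearch")))
print(sorted(search(index, "is a")))
```

A lookup only touches the posting sets of the query terms, so its cost depends on how many documents contain those terms rather than on the total number of documents scanned.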

The ES ecosystem:

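Today the ES ecosystem forms the Elastic Stack, which covers nearly every aspect of big-data processing: data collection, ingestion, search, monitoring, processing, analysis, and security.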

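Elasticsearch, Kibana, Beats, and Logstash are the four pillars, each responsible for a different area. I will cover them one by one when I get the chance (follow me for updates).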

Deploying a single-node ES environment locally

1. Download from: www.elastic.co/cn/download…

2. Extract the downloaded file:

# Extract the archive
tar -zxvf elasticsearch-7.10.0-darwin-x86_64.tar.gz
# Edit the configuration file
vim elasticsearch-7.10.0/config/elasticsearch.yml

3. Edit the configuration file so that it looks like this:

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: localhost
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
node.master: true
node.data: true
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /Users/zengxuefeng/elasticsearch-7.10.0/data
#
# Path to log files:
#
path.logs: /Users/zengxuefeng/elasticsearch-7.10.0/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 127.0.0.1 # IP of this node
#
# Set a custom port for HTTP:
#
http.port: 8080
transport.tcp.port: 8081
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["localhost"] # [{IP of this node}]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["localhost"] # {initial master-eligible nodes, listed by node.name}
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

Starting the cluster

Run this command from the elasticsearch-7.10.0 directory:

./bin/elasticsearch

Once startup has finished, open http://localhost:8080 in a browser to see the information ES returns.
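The same check can be scripted. The sketch below assumes the node listens on http://localhost:8080 as configured above; the function names fetch_root_info and summarize are hypothetical helpers, and only the Python standard library is used.

```python
import json
import urllib.request


def fetch_root_info(base_url="http://localhost:8080"):
    """GET the ES root endpoint and return the decoded JSON body."""
    with urllib.request.urlopen(base_url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(info):
    """Pick the fields that are most useful for a quick sanity check."""
    return {
        "node": info.get("name"),
        "cluster": info.get("cluster_name"),
        "version": info.get("version", {}).get("number"),
    }


# Usage (requires the node to be running):
# print(summarize(fetch_root_info()))
```

With the configuration shown earlier, the summary should report the node name localhost and the cluster name my-application.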

Using the cluster

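We now have an ES cluster up and running and can test it. First we create an index (similar to a table in MySQL, but only loosely), then write data into it and query the data.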

1. Create an index

curl -H "Content-Type: application/json" -X PUT 'http://127.0.0.1:8080/beautifule_people?pretty' -d '
{
  "mappings": {
    "_source": {
      "enabled": true
    },
    "properties": {
      "id": {
        "type": "integer"
      },
      "address": {
        "type": "keyword"
      },
      "sex": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      },
      "age":{
        "type":"integer"
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": "2",
      "number_of_replicas": "1"
    }
  }
}'

2. List the index we just created

curl 'http://127.0.0.1:8080/_cat/indices?v'

The output shows that the index was created successfully.

3. Add data

curl -H "Content-Type: application/json" -X POST  '127.0.0.1:8080/beautifule_people/_doc/1000' -d '{"id":1000,"address":"胜辛路426号2层222-3号商铺","sex":"male","name":"张三","age":20}'
curl  -H "Content-Type: application/json" -X POST  '127.0.0.1:8080/beautifule_people/_doc/1001' -d '{"id":1001,"address":"浦江镇召楼路1976号","sex":"male","name":"李四","age":21}'
curl  -H "Content-Type: application/json" -X POST  '127.0.0.1:8080/beautifule_people/_doc/1002' -d '{"id":1002,"address":"长宁区北新泾四村5号","sex":"male","name":"王五","age":22}'
curl  -H "Content-Type: application/json" -X POST  '127.0.0.1:8080/beautifule_people/_doc/1003' -d '{"id":1003,"address":"铁路上海虹桥站出发层2F-26","sex":"male","name":"赵六","age":23}'
curl  -H "Content-Type: application/json" -X POST  '127.0.0.1:8080/beautifule_people/_doc/1004' -d '{"id":1004,"address":"金沙江路956号","sex":"female","name":"孙七真漂亮","age":24}'
curl  -H "Content-Type: application/json" -X POST  '127.0.0.1:8080/beautifule_people/_doc/1005' -d '{"id":1005,"address":"川沙路4839号","sex":"male","name":"周八非常好看真美丽","age":25}'
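Sending one request per document is fine for six rows. For larger batches the _bulk API is the usual alternative; it is not what this post uses. The sketch below only builds the request body for two of the documents above, and build_bulk_body is a hypothetical helper name.

```python
import json


def build_bulk_body(index_name, docs):
    """Return an NDJSON _bulk body: an action line followed by a source line per document."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index_name, "_id": str(doc["id"])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"id": 1004, "address": "金沙江路956号", "sex": "female", "name": "孙七真漂亮", "age": 24},
    {"id": 1005, "address": "川沙路4839号", "sex": "male", "name": "周八非常好看真美丽", "age": 25},
]
bulk_body = build_bulk_body("beautifule_people", docs)
print(bulk_body, end="")
```

The resulting text would be POSTed to the _bulk endpoint with the header Content-Type: application/x-ndjson.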

4. Query data

1. Query all documents


curl -H "Content-Type: application/json" -X GET '127.0.0.1:8080/beautifule_people/_search' -d '
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ],
      "must_not": [],
      "should": [],
      "filter": []
    }
  },
  "from": 0,
  "size": 10,
  "sort": [],
  "profile": false
}'

2. Query with a condition, the counterpart of name like "非常美丽" ("very beautiful")

curl -H "Content-Type: application/json" -X GET '127.0.0.1:8080/beautifule_people/_search' -d '
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "非常美丽"
          }
        }
      ],
      "must_not": [],
      "should": [],
      "filter": []
    }
  },
  "from": 0,
  "size": 10,
  "sort": [],
  "profile": false
}'
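Why does this query find 周八非常好看真美丽 even though the name does not contain the exact phrase 非常美丽? The index was created without a custom analyzer, so the name field uses the default standard analyzer, which emits each Chinese character as its own token, and a match query by default returns documents that share at least one token with the query. The sketch below imitates that behaviour in plain Python; it is a simplification, not the real analyzer or scoring, and the helper names are hypothetical.

```python
def char_tokens(text):
    """Split text into single-character tokens, dropping whitespace."""
    return {ch for ch in text if not ch.isspace()}


def match_any(query, field_value):
    """True if the field shares at least one token with the query (OR semantics)."""
    return bool(char_tokens(query) & char_tokens(field_value))


# The name values written in step 3, keyed by document id.
names = {
    1000: "张三",
    1001: "李四",
    1002: "王五",
    1003: "赵六",
    1004: "孙七真漂亮",
    1005: "周八非常好看真美丽",
}
hits = [doc_id for doc_id, name in names.items() if match_any("非常美丽", name)]
print(hits)
```

Document 1005 contains all four characters of the query, while 1004 shares none of them, so only 1005 is a hit. A SQL LIKE '%非常美丽%' would return nothing for the same data.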

As the returned result shows, we did not miss this beautiful girl, haha. I hope you won't miss the girl you like in real life either.

Summary

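This post gave a brief introduction to deploying an ES cluster locally and using it: creating an index, listing indices, writing data into an index, and querying that data. It should give you a concrete first impression of ES.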

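There is much more to cover: Lucene, the ES write path, the query path, shards, replica replication, the causes of slow queries and how to fix them, performance tuning, common query syntax, and so on. These will follow in later posts.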

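If you have read this far, you must really love learning (a true workhorse). Likes, shares, and follows are all welcome and encourage me to keep writing. You are also welcome to follow my WeChat official account if you want to chat privately.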