A record of setting up Elasticsearch, from installation to first use

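Background

I have been building our company website recently, and it needs a search feature. I had never used Elasticsearch (ES) in earlier projects, so after reading a range of articles and getting some help from AI, I decided to write up the whole process from installation to use, partly so that I can refer back to it next time.

I am new to ES, so I will correct this article whenever I find mistakes. For now I have only looked into how to use it.

The company server is a Windows virtual machine, so this article only covers Windows.

The website project stores its data in MySQL, so the MySQL data has to be synchronized to ES.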

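Elasticsearch environment setup

Download

Elasticsearch itself:

Download the version you need from the official website. ES ships with its own bundled JDK, so whichever version you pick will run without a separate Java install. I downloaded 7.17.28.

Kibana:

Download it from the official website. The version must match the ES version.

Logstash

Download it from the official website. The version must also match ES.

Python

Python is used mainly to write the script that generates the Logstash sync configuration file.

Download Miniconda from its official website. I chose it because it does not bundle a large set of Python libraries; you install only the ones you need.

Environment variables

Add the system variables ES_HOME, ES_JAVA_HOME, LS_HOME and LS_JAVA_HOME.

For reference: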

ES_HOME        D:\elasticsearch\elasticsearch-7.17.28
ES_JAVA_HOME   D:\elasticsearch\elasticsearch-7.17.28\jdk
LS_HOME        D:\elasticsearch\logstash-7.17.28
LS_JAVA_HOME   D:\elasticsearch\logstash-7.17.28\jdk
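A quick way to confirm the four variables are visible to new processes is a small check like the one below. It is only a sketch: the variable names come from the list above, while the helper `find_env_problems` is a hypothetical name introduced here for illustration.

```python
import os
from pathlib import Path

# Variable names taken from the list above; the helper itself is hypothetical.
REQUIRED_VARS = ("ES_HOME", "ES_JAVA_HOME", "LS_HOME", "LS_JAVA_HOME")


def find_env_problems(env, names=REQUIRED_VARS):
    """Return a dict mapping each problematic variable to a short description."""
    problems = {}
    for name in names:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not Path(value).is_dir():
            problems[name] = f"directory not found: {value}"
    return problems


# Report anything missing in the current process environment.
for name, problem in find_env_problems(os.environ).items():
    print(f"{name}: {problem}")
```

Run it from a newly opened CMD window, since windows that were already open do not see system variables added afterwards.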

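Startup

Starting Elasticsearch and installing plugins

Go to the directory where Elasticsearch was extracted and edit config/elasticsearch.yml, adding:

  # Allow remote access
  network.host: 0.0.0.0
  # Single-node mode, recommended for beginners
  discovery.type: single-node

Edit config/jvm.options to set the heap size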
```
  -Xms4g
  -Xmx4g
```
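Double-click elasticsearch.bat in the bin directory to start it.

Once startup finishes, open localhost:9200. If it returns a JSON response, ES started successfully. On Linux, ES refuses to run as root, so you need to set up a separate user to start it.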
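The same check can be scripted. The sketch below assumes ES is listening on the default http://localhost:9200 without security enabled; `describe_node` and `fetch_root` are hypothetical helper names, and only the `name` and `version.number` fields of the root response are read.

```python
import json
import urllib.request


def describe_node(body: str) -> str:
    """Summarise the JSON document Elasticsearch returns at its root URL."""
    info = json.loads(body)
    version = info.get("version", {}).get("number", "unknown")
    return f"node={info.get('name', 'unknown')} version={version}"


def fetch_root(url: str = "http://localhost:9200", timeout: float = 3.0) -> str:
    """Fetch the root endpoint; raises URLError if nothing is listening."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `print(describe_node(fetch_root()))` while ES is running should show the node name and the version you installed; if ES is not up yet, `fetch_root` raises an error instead.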

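Installing analysis plugins (optional)

Open CMD in the bin directory and run the commands below. Restart Elasticsearch after installing or removing a plugin so that the change takes effect.

# List installed plugins
elasticsearch-plugin list

# Install a plugin
elasticsearch-plugin install analysis-icu

# Install the IK analyzer
elasticsearch-plugin install https://get.infini.cloud/elasticsearch/analysis-ik/7.17.28

# Remove a plugin
elasticsearch-plugin remove analysis-icu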
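After installing the IK plugin and restarting ES, the `_analyze` API is a convenient way to see how a piece of text is tokenized. The sketch below only builds the request and parses a response; it assumes ES at http://localhost:9200, and the function names are hypothetical.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer="ik_max_word",
                          base_url="http://localhost:9200"):
    """Build (but do not send) a POST /_analyze request."""
    payload = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_analyze",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_from_response(body):
    """Extract the token strings from an _analyze response body."""
    return [item["token"] for item in json.loads(body)["tokens"]]
```

To send it, pass the request to `urllib.request.urlopen` and feed the decoded body to `tokens_from_response`. Comparing `ik_max_word` with `ik_smart` on the same text shows the difference between the two analyzers.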

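Starting Kibana

Edit config/kibana.yml and add the following settings

  server.port: 5601   # Port the Kibana server listens on
  server.host: "localhost"  # Host address the Kibana server binds to
  elasticsearch.hosts: ["http://localhost:9200"]  # Address of the Elasticsearch instance Kibana connects to
  i18n.locale: "zh-CN"   # Sets the Kibana interface language to Simplified Chinese

Start Kibana with kibana.bat in its bin directory, wait for startup to finish, then open Kibana

http://localhost:5601/app/dev_tools#/console

Create an index template. Any index whose name matches the pattern is created from the template automatically. This template makes every index whose name starts with t_ use the IK analyzer, and maps the two status fields used for filtering.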

PUT /_index_template/t_logstash_template
{
  "index_patterns": ["t_*"],
  "template": {
    "settings": {
      "analysis": {
        "analyzer": {
          "default": {
            "type": "ik_max_word"
          }
        }
      }
    },
    "mappings": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        },
        "status": {
          "type": "keyword" 
        },
        "del_flag": {
          "type": "keyword" 
        }
      }
    }
  }
}
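To see which index names the `t_*` pattern in the template above would cover, a rough local check can help. This is only an approximation: it uses Python's `fnmatchcase`, which agrees with ES for a simple trailing `*` but is not the ES matching code, and the table names in the usage note are made up.

```python
from fnmatch import fnmatchcase

# index_patterns value from the template above
TEMPLATE_PATTERNS = ["t_*"]


def template_applies(index_name, patterns=TEMPLATE_PATTERNS):
    """Return True if any pattern matches the index name."""
    return any(fnmatchcase(index_name, pattern) for pattern in patterns)
```

For example, `template_applies("t_news")` is true while `template_applies("sys_user")` is false, so a table without the t_ prefix would get default mappings instead of the IK analyzer.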

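Writing the configuration file with Python

Required Python library, install it beforehand: mysql-connector-python (os is part of the standard library and needs no install).

The script is below. Adjust the database connection, the MySQL driver path and the last_run path.

After the configuration file is generated, remove the entries for any tables you do not need.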

import mysql.connector
import os

# Database settings; change these to match your environment
db_config = {
    'user': 'your_db_user',
    'password': 'your_db_password',
    'host': 'localhost',
    'database': 'your_db_name',
    'port': 3306
}

# Path to the JDBC driver
jdbc_driver_library = r'D:\elasticsearch\mysql-connector-j-9.2.0.jar'

# Connect to the database
conn = mysql.connector.connect(**db_config)

# Clear any unread results
while conn.unread_result:
    conn.get_rows()  # read and discard unread results

cursor = conn.cursor(buffered=True)  # buffered mode

# Directory for the last_run metadata files
last_run_base_dir = r"D:\elasticsearch\logstash-7.17.28\last_run"
os.makedirs(last_run_base_dir, exist_ok=True)  # make sure the directory exists

# Fetch all table names
cursor.execute("SHOW TABLES")
tables = [table[0] for table in cursor.fetchall()]

# Generate the Logstash configuration file
with open('generated.conf', 'w', encoding='utf-8') as f:
    # input section: one jdbc block per table
    f.write("input {\n")
    for table in tables:
        # Look up the primary key column
        cursor.execute(f"""
            SELECT column_name 
            FROM information_schema.key_column_usage 
            WHERE table_schema = '{db_config['database']}' 
            AND table_name = '{table}' 
            AND constraint_name = 'PRIMARY'
        """)
        pk_result = cursor.fetchone()
        if not pk_result:
            print(f"Table {table} has no primary key, skipping...")
            continue
        pk_column = pk_result[0]  # assumes a single-column primary key

        # Each table keeps its own sync position in a separate file
        metadata_path = os.path.join(last_run_base_dir, f"{table}_metadata")

        # Write the jdbc input block for this table
        f.write(f"""
          jdbc {{
            jdbc_driver_library => "{jdbc_driver_library}"
            jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
            jdbc_connection_string => "jdbc:mysql://{db_config['host']}:{db_config['port']}/{db_config['database']}?serverTimezone=Asia/Shanghai"
            jdbc_user => "{db_config['user']}"
            jdbc_password => "{db_config['password']}"
            schedule => "* * * * *"
            statement => "SELECT a.{pk_column} AS id, a.*, '{table}' AS table_name FROM {table} a WHERE update_time > :sql_last_value"
            use_column_value => true
            tracking_column => "update_time"
            tracking_column_type => "timestamp"
            last_run_metadata_path => "{metadata_path}"
          }}\n""")

    # Close the input section once, after every table has been written
    # (closing it inside the loop would produce an invalid config)
    f.write("}\n\n")

    # filter section, written once
    f.write("""
            filter {
              date {
                match => ["update_time", "ISO8601"]
                timezone => "Asia/Shanghai"
              }
              mutate {
                add_field => { "[@metadata][target_index]" => "%{table_name}" }
              }
            }\n\n""")

    # output section, written once
    f.write("""
            output {
              elasticsearch {
                hosts => ["localhost:9200"]
                index => "%{[@metadata][target_index]}"
                document_id => "%{id}"
              }
            }\n""")

cursor.close()
conn.close()

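The generated conf file is passed to Logstash at startup. I moved it into Logstash's config directory.

Starting Logstash

MySQL driver package

Download mysql-connector-j from the Maven repository. I used 9.2.0, the latest driver at the time.

Put it in a directory of your choice. The path must match the one written in the Python script.

Start Logstash

Open cmd in Logstash's bin directory

Run the following command to start it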

logstash -f "D:\elasticsearch\logstash-7.17.28\config\generated.conf"

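Once the command is running, the data is synchronized to ES. From then on, the schedule in the configuration file syncs the data once a minute.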
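The incremental behaviour comes from the `update_time > :sql_last_value` condition in the generated statement together with `tracking_column`. The simulation below is a simplified sketch of that idea, not Logstash's implementation: the rows are made up, and it assumes the stored marker starts at the Unix epoch on the first run.

```python
from datetime import datetime


def rows_to_sync(rows, last_value):
    """Return rows changed after last_value, plus the new marker to store."""
    changed = [row for row in rows if row["update_time"] > last_value]
    new_last = max((row["update_time"] for row in changed), default=last_value)
    return changed, new_last


rows = [
    {"id": 1, "title": "first", "update_time": datetime(2025, 1, 1, 9, 0)},
    {"id": 2, "title": "second", "update_time": datetime(2025, 1, 1, 9, 30)},
]

# First run: the marker starts at the epoch, so every row is picked up.
first_batch, marker = rows_to_sync(rows, datetime(1970, 1, 1))

# Row 1 is edited later; the next scheduled run only picks up that row.
rows[0]["update_time"] = datetime(2025, 1, 1, 10, 0)
second_batch, marker = rows_to_sync(rows, marker)
```

This is also why every synchronized table needs an `update_time` column that changes on each write: a row whose `update_time` stays the same is never selected again.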

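Kibana settings

Click the three-line icon at the top left to open the menu, then click Stack Management at the very bottom to open the management page.

Under Kibana, click Index Patterns, then click Create index pattern. The list on the right shows the synchronized tables; each table is one index.

t_* is the index pattern I created. Queries against it match every index that fits the pattern.

Query test:

Querying ES from Spring Boot

Adding the dependencies to the pom

Mind the Spring Boot version. Mine is 2.5.x, which works with 7.17.28. For details, check the compatibility table on the official site.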

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.17.28</version>
</dependency>

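yml configuration

Note that the spring.elasticsearch.uris prefix shown here is the one used from Spring Boot 2.6 onward. On 2.5.x the same settings are named spring.elasticsearch.rest.uris and spring.elasticsearch.rest.connection-timeout.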

spring:
    elasticsearch:
        uris: http://localhost:9200
        connection-timeout: 3s

Creating the entity

import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;

@Data
@Document(indexName = "t_*")
public class TJnswTitleElastic {

    @Id
    private Long id;

    private String title;

    private String index;
}

Query

Partial code:

@Autowired
private ElasticsearchRestTemplate elasticsearchRestTemplate;

@Override
public List<TJnswTitleElastic> selectTJnswTitleElasticByPkTitle(String title) {

    // Build the bool query
    BoolQueryBuilder boolQuery = new BoolQueryBuilder();

    // Add the match query (analyzed with ik_max_word)
    MatchQueryBuilder titleQuery = new MatchQueryBuilder("title", title)
            .analyzer("ik_max_word");
    boolQuery.must(titleQuery);

    // Add the term queries as filters (no relevance scoring, so they are cheaper)
    boolQuery.filter(new TermQueryBuilder("status", "1"));  // filter instead of must
    boolQuery.filter(new TermQueryBuilder("del_flag", false)); // filter instead of must

    // Build the search query
    NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withQuery(boolQuery)
            .withSourceFilter(new FetchSourceFilter(new String[]{"title"}, null))
            .withPageable(PageRequest.of(0, 10))
            .build();

    // Run the query
    SearchHits<TJnswTitleElastic> searchHits = elasticsearchRestTemplate.search(searchQuery, TJnswTitleElastic.class);

    // Extract the results and copy _index into each entity
    return searchHits.stream()
            .map(hit -> {
                TJnswTitleElastic entity = hit.getContent();
                entity.setIndex(hit.getIndex()); // set the _index field
                return entity;
            })
            .collect(Collectors.toList());
}

}

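Query verification

Postscript

I found that deleted rows are not synchronized. Any table that allows deletion therefore has to use logical (soft) deletion, and the flag column must have the same name in every table.

Articles on the website can be published or unpublished, and unpublished ones must not be shown. ES receives all rows regardless, so searches have to filter on this.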
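In practice, both notes come down to adding filters to every search. The sketch below builds the JSON body for such a search as a Python dict. It mirrors the Java query shown earlier and assumes, as that query does, that `status` equal to "1" means published and `del_flag` equal to false means not deleted; `build_search_body` is a hypothetical helper name.

```python
import json


def build_search_body(title, size=10):
    """Build a bool query: full-text match on title, exact filters on flags."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": title}}],
                "filter": [
                    {"term": {"status": "1"}},       # published only (assumed meaning)
                    {"term": {"del_flag": False}},   # skip soft-deleted rows
                ],
            }
        },
        "_source": ["title"],
        "size": size,
    }


# The printed JSON can be pasted into Kibana Dev Tools after a "GET /t_*/_search" line.
print(json.dumps(build_search_body("news"), indent=2))
```

Putting the two flags under `filter` rather than `must` keeps them out of relevance scoring, which matches the intent of the Java version.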

There is still a long way to go. With ES I am only getting started, so let's keep learning together.

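A problem I ran into

After the setup above, I found a serious problem: the first query is far too slow. The more MySQL tables there are, the more indices there are, and therefore the more shards. When nothing is cached, a search has to scan every shard. With 30-odd tables, a search took 6 to 7 seconds, which is far too long.

So after a full day of investigation, and after first trying without success to make the ES cache last longer, I decided to have Logstash sync all tables into a single index and keep only the fields I need, leaving the rest out. The first query now takes about 2 seconds.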
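Merging tables into one index raises one practical question: primary keys from different tables can collide. The sketch below illustrates the approach used in the revised Logstash config further down, where the document id is built as table name plus primary key and only the needed columns are kept. The table names, column names and the helper `to_unified_doc` are hypothetical.

```python
# Columns copied into the shared index; everything else is dropped.
KEPT_FIELDS = ("title", "status", "del_flag", "update_time")


def to_unified_doc(table, pk_column, row):
    """Map one table row to (document_id, document) for the shared index."""
    doc = {field: row[field] for field in KEPT_FIELDS if field in row}
    doc["pk"] = row[pk_column]
    doc["table_name"] = table
    return f"{table}_{row[pk_column]}", doc


news_id, news_doc = to_unified_doc(
    "t_news", "news_id",
    {"news_id": 7, "title": "Launch", "status": "1", "del_flag": False, "body": "..."},
)
notice_id, _ = to_unified_doc(
    "t_notice", "notice_id", {"notice_id": 7, "title": "Holiday"}
)
```

Both rows have primary key 7, yet they end up with the distinct ids `t_news_7` and `t_notice_7`, so neither overwrites the other in the shared index.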

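So, on to the changes.

Python script

The changes are mainly in filter and output. In input, I changed the query statement so that it selects only the required fields.

In my case, the only field that takes part in search is title.

I made these changes directly in the generated.conf file produced earlier, so this script itself has not been tested. In theory it should work (tongue in cheek).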

import mysql.connector
import os

# Database settings; change these to match your environment
db_config = {
    'user': 'root',
    'password': 'your_db_password',
    'host': 'localhost',
    'database': 'jnsw_official_website',
    'port': 3306
}

# Path to the JDBC driver
jdbc_driver_library = r'D:\elasticsearch\mysql-connector-j-9.2.0.jar'

# Connect to the database
conn = mysql.connector.connect(**db_config)

# Clear any unread results
while conn.unread_result:
    conn.get_rows()  # read and discard unread results

cursor = conn.cursor(buffered=True)  # buffered mode

# Directory for the last_run metadata files
last_run_base_dir = r"D:\elasticsearch\logstash-7.17.28\last_run"
os.makedirs(last_run_base_dir, exist_ok=True)  # make sure the directory exists

# Fetch all table names
cursor.execute("SHOW TABLES")
tables = [table[0] for table in cursor.fetchall()]

# Generate the Logstash configuration file
with open('generated.conf', 'w', encoding='utf-8') as f:
    # input section: one jdbc block per table
    f.write("input {\n")
    for table in tables:
        # Look up the primary key column
        cursor.execute(f"""
            SELECT column_name 
            FROM information_schema.key_column_usage 
            WHERE table_schema = '{db_config['database']}' 
            AND table_name = '{table}' 
            AND constraint_name = 'PRIMARY'
        """)
        pk_result = cursor.fetchone()
        if not pk_result:
            print(f"Table {table} has no primary key, skipping...")
            continue
        pk_column = pk_result[0]  # assumes a single-column primary key

        # Each table keeps its own sync position in a separate file
        metadata_path = os.path.join(last_run_base_dir, f"{table}_metadata")

        # Write the jdbc input block, selecting only the fields needed for search
        f.write(f"""
          jdbc {{
            jdbc_driver_library => "{jdbc_driver_library}"
            jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
            jdbc_connection_string => "jdbc:mysql://{db_config['host']}:{db_config['port']}/{db_config['database']}?serverTimezone=Asia/Shanghai"
            jdbc_user => "{db_config['user']}"
            jdbc_password => "{db_config['password']}"
            schedule => "* * * * *"
            statement => "SELECT a.{pk_column} AS pk,'{pk_column}' as keyId, a.title,a.status,a.del_flag,a.update_time, '{table}' AS table_name FROM {table} a WHERE update_time > :sql_last_value"
            use_column_value => true
            tracking_column => "update_time"
            tracking_column_type => "timestamp"
            last_run_metadata_path => "{metadata_path}"
          }}\n""")

    # Close the input section once, after every table has been written
    f.write("}\n\n")

    # filter section, written once
    f.write("""
            filter {
              date {
                match => ["update_time", "ISO8601"]
                timezone => "Asia/Shanghai"
                target => "@timestamp"
              }

              # Build a document ID that is unique across tables
              mutate {
                add_field => {
                  "[@metadata][document_id]" => "%{table_name}_%{pk}"
                  "doc_type" => "%{table_name}"
                }
              }
            }\n\n""")

    # output section, written once; the elasticsearch block must sit inside output { }
    f.write("""
            output {
              elasticsearch {
                hosts => ["localhost:9200"]
                index => "jnsw_unified_data"
                document_id => "%{[@metadata][document_id]}"
                template => "D:/elasticsearch/logstash-7.17.28/config/jnsw_template.json"
                template_name => "jnsw_template"
                template_overwrite => true
              }
            }\n""")

cursor.close()
conn.close()

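Kibana steps

In Dev Tools, run a delete request to remove the old indices. You can also keep them.

DELETE /t_*

Add the index template JSON file and save it at the path given in the output section. Because the mapping uses "dynamic": "strict", it has to declare every field the pipeline sends, including pk, keyid, table_name and the @timestamp and @version fields that Logstash adds. Otherwise ES rejects the documents.

{
  "index_patterns": ["jnsw_unified_data"],
  "settings": {
    "number_of_shards": 3,
    "analysis": {
      "analyzer": {
        "default": {
          "type": "ik_max_word"
        }
      }
    }
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "doc_type": {
        "type": "keyword"
      },
      "table_name": {
        "type": "keyword"
      },
      "pk": {
        "type": "keyword"
      },
      "keyid": {
        "type": "keyword"
      },
      "title": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "status": {
        "type": "keyword"
      },
      "del_flag": {
        "type": "boolean"
      },
      "update_time": {
        "type": "date"
      },
      "@timestamp": {
        "type": "date"
      },
      "@version": {
        "type": "keyword"
      },
      "id": {
        "type": "keyword"
      }
    }
  }
}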

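Starting Logstash

First delete every file in the last_run directory, so that stale sync positions do not cause problems.
Then start it: logstash -f "D:\elasticsearch\logstash-7.17.28\config\generated.conf"

Using it from Java

Entity changes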

    package com.jnsw.official.domain;

    import lombok.Data;
    import org.springframework.data.elasticsearch.annotations.Document;

    @Data
    @Document(indexName = "jnsw_unified_data")
    public class TJnswTitleElastic {

        private String id;

        private Long pk;

        private String title;

        private String keyid;

        private String table_name;

        private String index;
    }

Query changes

// Build the bool query
BoolQueryBuilder boolQuery = new BoolQueryBuilder();

// Add the match query (analyzed with ik_max_word)
MatchQueryBuilder titleQuery = new MatchQueryBuilder("title", title)
        .analyzer("ik_max_word");
boolQuery.must(titleQuery);

// Add the term queries as filters (no relevance scoring, so they are cheaper)
boolQuery.filter(new TermQueryBuilder("status", "1"));  // filter instead of must
boolQuery.filter(new TermQueryBuilder("del_flag", false)); // filter instead of must

// Fields to return
String[] includes = {"pk","title", "keyid", "table_name"}; // fields to include
String[] excludes = {}; // nothing to exclude

// Build the search query
NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
        .withQuery(boolQuery)
        .withSourceFilter(new FetchSourceFilter(includes, excludes))
        .withPageable(PageRequest.of(0, 10))
        .build();

// Run the query
SearchHits<TJnswTitleElastic> searchHits = elasticsearchRestTemplate.search(searchQuery, TJnswTitleElastic.class);

// Extract the results and copy _index into each entity
return searchHits.stream()
        .map(hit -> {
            TJnswTitleElastic entity = hit.getContent();
            entity.setIndex(hit.getIndex()); // set the _index field
            return entity;
        })
        .collect(Collectors.toList());

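On queries being slow because the ES cache is frequently invalidated

The root cause is the Logstash sync. Even when the data may not actually have changed, each sync still triggers segment merges, which invalidate the cache.

The simple fix is to trade data freshness for query speed by lengthening the sync interval. Search does not need especially fresh data, so an hourly interval is enough, and warm-up queries can be run in between to compensate.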
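As a rough sketch of those two pieces: the Logstash `schedule` option takes a cron-style expression, so `"0 * * * *"` runs the sync at minute 0 of every hour instead of every minute. The warm-up itself can be any code that replays a few common searches. Below, `warm_up` is a hypothetical helper, and `search` stands for whatever function performs the real query, for example a call to the site's own search endpoint.

```python
import time

# Cron-style value for the Logstash "schedule" option: minute 0 of every hour.
HOURLY_SCHEDULE = "0 * * * *"


def warm_up(search, terms, pause_seconds=0.0):
    """Run each term through `search` and return elapsed seconds per term."""
    timings = {}
    for term in terms:
        started = time.perf_counter()
        search(term)  # the result is discarded; the point is to populate caches
        timings[term] = time.perf_counter() - started
        if pause_seconds:
            time.sleep(pause_seconds)
    return timings
```

Running `warm_up` shortly after each scheduled sync, with a handful of frequent search terms, means the slow first query is paid by the warm-up job rather than by a visitor.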

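Given this project and its data volume, syncing with Logstash is the most reasonable and cost-effective approach.

There is another way to solve this problem: have the application push data to ES itself. This kind of incremental update avoids the Logstash issue, and the cache stays valid for a long time.

With large data volumes you have to think about coupling and dual writes, push the data asynchronously, bring in a message queue, handle retries on failure, and so on. The technical bar is higher, so consider this option only when the data is large.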
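To give a feel for the retry part only, here is a minimal sketch with exponential backoff. It is not a full solution: there is no queue, no persistence and no dual-write handling. `send` stands for whatever function actually indexes one document into ES, and all names here are hypothetical.

```python
import time


def push_with_retry(send, document, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send(document); on failure wait and retry, doubling the delay each time."""
    for attempt in range(1, attempts + 1):
        try:
            return send(document)
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: let the caller log or queue the document
            sleep(base_delay * 2 ** (attempt - 1))
```

With the defaults, a document that fails twice is retried after 0.5 and then 1.0 seconds. A document that still fails on the last attempt raises, which is the point where a real system would hand it to a message queue or a dead-letter store.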

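Summary

That wraps up this round of work with ES. I am fairly happy with the current result. Further optimization should focus on this index and on memory usage, and I will update the article once that is done.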