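ES Query Performance Optimization

1. Index Design

1.1 Shard Strategy

Set the shard count deliberately: too many shards add per-shard overhead (heap, file handles, cluster state), while too few limit how far a query can be parallelized across nodes.
Control shard size: aim to keep each shard between 10 GB and 50 GB.
Replica strategy: adjust the replica count to match query load and fault-tolerance requirements; extra replicas add read capacity but also cost disk space and indexing work.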

{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "refresh_interval": "30s",
    "index.codec": "best_compression"
  }
}
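
The shard-size guideline can be turned into a rough starting point for number_of_shards. The sketch below is only an estimate under stated assumptions: the function name and the 30 GB default target (a value picked from the middle of the 10-50 GB range) are illustrative and not part of Elasticsearch.

```python
import math


def suggest_primary_shards(expected_index_gb: float, target_shard_gb: float = 30.0) -> int:
    """Return a primary shard count that keeps each shard near target_shard_gb.

    Both arguments are assumptions supplied by the caller: the expected size of
    the index's primary data and the shard size to aim for.
    """
    if expected_index_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    # Round up so that no shard exceeds the target, and never go below one shard.
    return max(1, math.ceil(expected_index_gb / target_shard_gb))
```

For an index expected to hold about 150 GB of primary data, this suggests 5 primary shards, which matches the settings example above. Treat the result as a first guess to validate against real query and indexing load.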

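1.2 Field Mapping

Disable unnecessary field storage: store and index only the fields you actually need.
Use appropriate data types: do not map a field as text if it never needs analysis; use keyword for exact-match values such as IDs and status codes.
Keep doc values for sorted and aggregated fields: doc_values is enabled by default on keyword, numeric, and date fields, so setting it explicitly only documents the intent. It can be disabled on fields that are never sorted or aggregated on to save disk space.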

{
  "mappings": {
    "properties": {
      "user_id": {
        "type": "keyword",
        "doc_values": true
      },
      "timestamp": {
        "type": "date",
        "doc_values": true
      },
      "content": {
        "type": "text",
        "analyzer": "standard"
      }
    }
  }
}
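
When reviewing an existing mapping, it helps to list every field of a given type, for example all text fields, and then check whether each one really needs analysis. The helper below is a minimal sketch; its name is hypothetical and it only walks plain "properties" nesting, not multi-fields or runtime fields.

```python
from typing import List


def find_fields_by_type(mapping: dict, wanted_type: str, prefix: str = "") -> List[str]:
    """Return dotted paths of all fields in an index mapping that have wanted_type."""
    found = []
    # Accept either a full body ({"mappings": {...}}) or a bare mapping/object field.
    properties = mapping.get("mappings", mapping).get("properties", {})
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == wanted_type:
            found.append(path)
        # Object fields keep their children under their own "properties" key.
        if "properties" in spec:
            found.extend(find_fields_by_type(spec, wanted_type, prefix=f"{path}."))
    return found
```

Called with the mapping above and "text", it returns only the content field, which is the one field that is meant to be analyzed.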

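2. Query Optimization

2.1 Choosing Query Types

Term query: exact match against an unanalyzed value; one of the cheapest queries to run.
Range query: range matching on numeric and date fields, which are served by BKD-tree point indexes rather than the inverted index.
Bool query: combines clauses. Which context a clause runs in (filter or must_not versus must or should) generally matters more than the order in which the clauses are written.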

{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "status.keyword": "published"
          }
        },
        {
          "range": {
            "created_at": {
              "gte": "2023-01-01"
            }
          }
        }
      ],
      "should": [
        {
          "match": {
            "title": "elasticsearch"
          }
        }
      ]
    }
  }
}

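2.2 Using Filter Context

Move conditions that do not need relevance scoring into filter.
Filter clauses skip scoring, and frequently used ones can be cached, which speeds up repeated queries.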
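
As an illustration of moving non-scoring conditions into filter, the sketch below rewrites a bool query body. It assumes the caller does not want term, terms, range, or exists clauses to contribute to the score; the function name and the set of query types are illustrative choices, not an Elasticsearch feature.

```python
import copy

# Query types treated as pure yes/no conditions in this sketch (an assumption).
NON_SCORING_QUERIES = {"term", "terms", "range", "exists"}


def move_exact_clauses_to_filter(bool_query: dict) -> dict:
    """Return a copy of a bool query with non-scoring must clauses moved to filter."""
    result = copy.deepcopy(bool_query)
    body = result["bool"]
    must = body.get("must", [])
    if isinstance(must, dict):  # a single clause may be given without a list
        must = [must]
    keep, moved = [], []
    for clause in must:
        kind = next(iter(clause))  # each clause has one top-level key: its query type
        (moved if kind in NON_SCORING_QUERIES else keep).append(clause)
    if moved:
        existing = body.get("filter", [])
        if isinstance(existing, dict):
            existing = [existing]
        body["filter"] = existing + moved
    if keep:
        body["must"] = keep
    else:
        body.pop("must", None)
    return result
```

The matching documents stay the same after the rewrite; only the scores change, because the moved clauses no longer contribute to them.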

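2.3 Avoiding Deep Pagination

Use search_after instead of from/size, which by default cannot page past 10,000 hits (index.max_result_window) and gets more expensive with every page.
Or use the scroll API to process large volumes of data; in recent versions (7.10 and later), search_after combined with a point in time (PIT) is the recommended replacement for scroll.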

{
  "query": {
    "match_all": {}
  },
  "search_after": [1541498400000, "id123"],
  "sort": [
    {"timestamp": "asc"},
    {"_id": "asc"}
  ]
}
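
The paging loop behind search_after can be sketched as follows. The search argument is a hypothetical callable that takes a request body and returns the parsed search response (for instance a thin wrapper around whatever client is in use); it is passed in so the sketch does not assume a particular client library.

```python
from typing import Callable, Iterator


def iterate_hits(
    search: Callable[[dict], dict],
    query: dict,
    sort: list,
    page_size: int = 1000,
) -> Iterator[dict]:
    """Yield every hit, feeding the last hit's sort values back as search_after."""
    body = {"size": page_size, "query": query, "sort": sort}
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Each hit carries its sort values under "sort"; the last one marks
        # where the next page starts.
        body = {**body, "search_after": hits[-1]["sort"]}
```

The sort must include a tiebreaker that is unique per document, as in the request above, otherwise hits that share the same sort values can be skipped or repeated across pages.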

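3. Aggregation Optimization

3.1 Reducing the Number of Buckets

Set a reasonable size parameter.
Use the composite aggregation to page through large result sets.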

{
  "aggs": {
    "sales_per_day": {
      "composite": {
        "size": 1000,
        "sources": [
          {
            "date": {
              "date_histogram": {
                "field": "timestamp",
                "calendar_interval": "1d"
              }
            }
          }
        ]
      }
    }
  }
}
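
A composite aggregation returns at most size buckets per request, together with an after_key that is sent back as after to fetch the next page. The loop below is a minimal sketch of that paging; search is again a hypothetical callable that takes a request body and returns the parsed response.

```python
from typing import Callable, Iterator


def iterate_composite_buckets(
    search: Callable[[dict], dict],
    agg_name: str,
    sources: list,
    page_size: int = 1000,
) -> Iterator[dict]:
    """Yield all buckets of a composite aggregation, one page at a time."""
    composite = {"size": page_size, "sources": sources}
    while True:
        # size 0: only the aggregation is needed, not the matching hits.
        body = {"size": 0, "aggs": {agg_name: {"composite": composite}}}
        result = search(body)["aggregations"][agg_name]
        yield from result["buckets"]
        after_key = result.get("after_key")
        if after_key is None or not result["buckets"]:
            return
        composite = {**composite, "after": after_key}
```

Because each page is a separate small request, memory use on the coordinating node stays bounded no matter how many buckets exist in total.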

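3.2 Pre-aggregation

Pre-aggregate historical data with rollups or downsampling. Index lifecycle management (ILM) by itself only moves indices through lifecycle phases; in recent versions it can trigger downsampling for time series data streams.
Consider using the transform API to precompute aggregation results.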

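4. Hardware and Cluster Optimization

4.1 JVM Tuning

Set the heap to no more than 50% of physical memory, and keep it below the compressed-oops threshold of roughly 32 GB, whichever is smaller. The remaining memory is left to the operating system's filesystem cache.
Use the G1GC garbage collector, which is already the default with the JDK bundled in recent versions.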
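
The heap rule is simple arithmetic, sketched below. The 31 GB cap is an assumption: a conservative value chosen to stay under the roughly 32 GB compressed-oops threshold, which varies by JVM. Both function names are illustrative.

```python
from typing import List


def recommended_heap_gb(physical_ram_gb: float, compressed_oops_limit_gb: float = 31.0) -> float:
    """Return half of physical RAM, capped below the compressed-oops threshold."""
    if physical_ram_gb <= 0:
        raise ValueError("physical_ram_gb must be positive")
    return min(physical_ram_gb / 2, compressed_oops_limit_gb)


def jvm_heap_options(heap_gb: float) -> List[str]:
    """Return matching -Xms/-Xmx flags so the heap is never resized at runtime."""
    heap_mb = int(heap_gb * 1024)
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]
```

A 16 GB machine gets an 8 GB heap under this rule, while a 128 GB machine is capped at 31 GB and leaves the rest to the filesystem cache.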

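4.2 Disk I/O

Use SSD storage.
Tune the translog settings. Note that sync_interval reduces fsync work only when index.translog.durability is set to async, which risks losing the most recent operations if a node crashes.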

{
  "settings": {
    "translog.flush_threshold_size": "1gb",
    "index.translog.sync_interval": "30s"
  }
}

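5. Caching

5.1 Query Cache

Put frequently repeated conditions in filter context so they are eligible for the node query cache.
Size the cache with indices.queries.cache.size, a node-level setting that defaults to 10% of the heap.

5.2 Request Cache

Enable the shard request cache for requests that repeat the same query. By default it caches only size=0 requests (aggregations and counts), and a refresh that changes the shard invalidates its entries.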

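6. Monitoring and Diagnostics

6.1 Performance Analysis

Use the Profile API (set "profile": true in the search request body) to find query bottlenecks.
Monitor the search slow log.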

{
  "profile": true,
  "query": {
    "match": {
      "title": "elasticsearch"
    }
  }
}
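
A profiled response nests timings per shard, per search, and per query node, which is tedious to read by hand. The sketch below flattens that tree and returns the slowest nodes; the function name is hypothetical, and it reads only the profile.shards[].searches[].query[] part of the response (type, description, time_in_nanos, children).

```python
from typing import List, Tuple


def slowest_query_components(response: dict, top: int = 5) -> List[Tuple[int, str, str]]:
    """Return (time_in_nanos, type, description) for the slowest profiled query nodes."""
    timings = []

    def walk(node: dict) -> None:
        timings.append((node["time_in_nanos"], node["type"], node["description"]))
        for child in node.get("children", []):
            walk(child)

    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for node in search.get("query", []):
                walk(node)
    # Largest time first; nodes from all shards are ranked together.
    return sorted(timings, reverse=True)[:top]
```

A parent node's time includes the time of its children, so a slow BooleanQuery usually points to one of its child clauses rather than to the bool query itself.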

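6.2 Using the Profile API

Identify which parts of a query take the most time.
Optimize the slow queries it surfaces.

7. Bulk Operations

7.1 Bulk Requests

Keep each bulk request at roughly 5-15 MB.
Limit the number of concurrent bulk requests.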

{
  "settings": {
    "refresh_interval": "-1",
    "number_of_replicas": 0
  }
}
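
To keep bulk requests within a byte budget, documents can be grouped by their serialized size rather than by count. The generator below is a minimal sketch: the function name and the 10 MB default (a value inside the 5-15 MB range) are assumptions, and it emits only plain index actions without explicit IDs.

```python
import json
from typing import Iterable, Iterator


def bulk_bodies(index: str, docs: Iterable[dict], max_bytes: int = 10 * 1024 * 1024) -> Iterator[str]:
    """Yield NDJSON _bulk bodies of at most max_bytes each.

    A single document larger than max_bytes is still sent, alone in its own body.
    """
    lines, size = [], 0
    action = json.dumps({"index": {"_index": index}})
    for doc in docs:
        # One action line plus one source line, each terminated by a newline.
        entry = f"{action}\n{json.dumps(doc)}\n"
        entry_bytes = len(entry.encode("utf-8"))
        if lines and size + entry_bytes > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(entry)
        size += entry_bytes
    if lines:
        yield "".join(lines)
```

Each yielded string can be sent as the body of one _bulk request. How many of them to send in parallel is a separate limit that depends on the cluster.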

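7.2 Refresh Strategy

Increase refresh_interval (or disable refresh with -1) while a large indexing job is running.
Restore the previous value once indexing finishes, along with number_of_replicas if it was lowered.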
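
Forgetting to restore the settings after a load is an easy mistake, so the change and the restore can be paired in one place. The context manager below is a sketch under assumptions: put_settings is a hypothetical callable that applies an index settings body, and the restore values default to "1s" and 1 replica, which should be replaced with whatever the index used before.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def bulk_load_settings(
    put_settings: Callable[[dict], None],
    refresh_interval: str = "1s",
    number_of_replicas: int = 1,
) -> Iterator[None]:
    """Disable refresh and replicas for the duration of a load, then restore them."""
    put_settings({"index": {"refresh_interval": "-1", "number_of_replicas": 0}})
    try:
        yield
    finally:
        # Runs even if the load raises, so the index is not left without replicas.
        put_settings(
            {"index": {"refresh_interval": refresh_interval, "number_of_replicas": number_of_replicas}}
        )
```

Usage is a with block around the indexing code; the restore step runs whether the load succeeds or fails.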

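8. Advanced Techniques

8.1 Pre-filtering

Filter in the query before aggregating to reduce the number of documents the aggregation has to process.
Use post_filter to narrow the returned hits after the aggregations have been computed; it does not change the aggregation results.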
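
The two kinds of filter sit in different parts of the request body. The sketch below builds a faceted search in which the query filter limits what the aggregation sees and post_filter narrows only the hits; the function and parameter names are illustrative.

```python
from typing import Optional


def faceted_search(
    base_filters: list,
    facet_field: str,
    selected_value: Optional[str] = None,
    facet_size: int = 10,
) -> dict:
    """Build a search body whose facet counts ignore the user's facet selection."""
    body = {
        # Applied before aggregation: shrinks the document set the aggregation processes.
        "query": {"bool": {"filter": base_filters}},
        "aggs": {f"by_{facet_field}": {"terms": {"field": facet_field, "size": facet_size}}},
    }
    if selected_value is not None:
        # Applied after aggregation: narrows the hits, leaves bucket counts untouched.
        body["post_filter"] = {"term": {facet_field: selected_value}}
    return body
```

This shape suits faceted navigation, where the counts for every facet value should stay visible after the user picks one of them.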

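8.2 Index Templates

Use index templates to keep mappings consistent across indices.
Choose analyzers that suit the data.

8.3 Field Aliases

Use field aliases to give queries a flexible interface.
Avoid changing the mapping structure frequently.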
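
Index templates and field aliases can be combined: the template carries the shared mapping, and aliases expose stable query names on top of it. The sketch below builds the body of a composable index template; the function name, the priority default, and the example field names in the usage note are assumptions.

```python
from typing import Dict, Optional


def index_template(
    pattern: str,
    properties: Dict[str, dict],
    aliases: Optional[Dict[str, str]] = None,
    priority: int = 100,
) -> dict:
    """Build a composable index template body with optional field aliases."""
    fields = dict(properties)
    for alias_name, target in (aliases or {}).items():
        if target not in properties:
            raise ValueError(f"alias {alias_name!r} points to unknown field {target!r}")
        # An alias field only stores the path of the concrete field it refers to.
        fields[alias_name] = {"type": "alias", "path": target}
    return {
        "index_patterns": [pattern],
        "priority": priority,
        "template": {"mappings": {"properties": fields}},
    }
```

For example, a template for logs-* could map user_id as keyword and expose it under the alias uid, so queries written against uid keep working if the concrete field is renamed in a later index.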

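Apply these strategies according to the specific business scenario; continuous monitoring and tuning are the key to keeping ES performance healthy.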