Problems Encountered While Using ElasticSearch



Here is a summary of the problems I've run into while using ElasticSearch, recorded so the same pits don't have to be stepped in twice!

1. Connection timeout

org.springframework.dao.DataAccessResourceFailureException: 30,000 milliseconds timeout on connection

The default request connection timeout is 3 s; this parameter can be customized:

# Connection timeout
spring.elasticsearch.rest.connection-timeout=100000
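Note that the 30,000 ms in the stack trace above is usually the socket (read) timeout rather than the connect timeout, so it can help to raise both. A sketch, assuming the Spring Boot 2.x spring.elasticsearch.rest.* property family (newer Spring Boot versions rename these properties):

```properties
# Time allowed to establish the connection
spring.elasticsearch.rest.connection-timeout=100000
# Time to wait for a response on an established connection
# (this is typically the 30,000 ms seen in the error above)
spring.elasticsearch.rest.read-timeout=60000
```

Bare numbers in these Duration-typed properties are interpreted as milliseconds.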

2. The number of nested documents has exceeded the allowed limit of [1000].

My index mapping uses the Nested type, and I had not set a limit on the number of objects in the nested field, so writes eventually exceeded the configured limit. To keep nested fields from bloating the index, combine the two settings below and configure them sensibly on the index or its template. Update the settings like so:

PUT search_test/_settings
{
  "index.mapping.nested_fields.limit": 10,
  "index.mapping.nested_objects.limit": 5000000
}

index.mapping.nested_fields.limit: the number of distinct nested-type fields allowed in a single index (default 50). index.mapping.nested_objects.limit: the maximum number of nested objects a single document may contain across all nested fields (default 10000); letting this grow too large risks out-of-memory errors.
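To confirm which limits are actually in effect for an index (including the defaults), you can query its settings; assuming the index name search_test from above:

```
GET search_test/_settings?include_defaults=true&filter_path=**.nested_fields.limit,**.nested_objects.limit
```

include_defaults=true makes ES report default values alongside explicit ones, and filter_path trims the response down to just the two limits.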

3. Deep pagination

The query:

GET search_test/_search
{
  "from": 10000,
  "size": 20
}

The error returned:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Result window is too large, from + size must be less than or equal to: [10000] but was [10020]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "search_test",
        "node" : "H6jqhoJlQ9qv-ozWy4ZWRA",
        "reason" : {
          "type" : "illegal_argument_exception",
          "reason" : "Result window is too large, from + size must be less than or equal to: [10000] but was [10020]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
        }
      }
    ],
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "Result window is too large, from + size must be less than or equal to: [10000] but was [10020]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.",
      "caused_by" : {
        "type" : "illegal_argument_exception",
        "reason" : "Result window is too large, from + size must be less than or equal to: [10000] but was [10020]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
      }
    }
  },
  "status" : 400
}

Every sorted query first runs independently on each shard, and the coordinating node then re-sorts the shard results; that merge happens in heap memory. So the larger a single query's window, the more data is gathered on the heap and the greater the memory pressure. Crucially, the window size depends on how deep the requested hits sit, not on how many you ask for: to return hits 10001–10100, ES must fetch the top 10100 hits from every shard and re-sort all of them. The deeper the page, the easier it is to trigger OOM (Out Of Memory), and frequent deep-paging queries lead to frequent full GCs.

The cost of sorting the results climbs steeply with the depth of the page.

To protect users who are unaware of these internals from misusing the API, ES enforces a threshold, max_result_window, which defaults to 10000; its purpose is to keep careless queries from blowing up the heap.
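The scatter-gather cost described above can be sketched in a few lines. This is an illustrative simulation, not ES code: each "shard" ships its own top from + size scores, and the "coordinating node" merges them all in memory before discarding everything but the requested page.

```python
# Illustrative sketch: why "from": 10000, "size": 20 is expensive.
import heapq

def search_page(shards, frm, size):
    window = frm + size
    # each shard sorts locally and ships its top `window` scores
    per_shard = [sorted(shard, reverse=True)[:window] for shard in shards]
    buffered = sum(len(p) for p in per_shard)  # held on the coordinating node's heap
    # merge the descending per-shard lists, keep the requested slice only
    merged = list(heapq.merge(*per_shard, reverse=True))
    return merged[frm:frm + size], buffered

# 3 shards, 20000 docs each (scores are just integers here)
shards = [list(range(i, 60000, 3)) for i in range(3)]
page, buffered = search_page(shards, frm=10000, size=20)
print(len(page), buffered)  # 20 hits returned, but 3 * 10020 = 30060 buffered
```

Only 20 hits reach the client, yet 30060 candidates were buffered and sorted to produce them; with real documents instead of integers, that buffer is what exhausts the heap.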

If you really must raise the threshold, it can be changed dynamically (note that PUT _settings with no index name applies the change to every index; scoping it to a single index is safer):

PUT _settings
{
  "index": {
    "max_result_window": "10000000"
  }
}
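Raising the window only postpones the problem, since a single request may then buffer far more data. For genuinely deep traversal, the scroll API that the error message points to, or search_after, scales better. A sketch using search_after, assuming the documents carry a sortable id field:

```
GET search_test/_search
{
  "size": 20,
  "sort": [{ "id": "asc" }],
  "search_after": [10019]
}
```

Each subsequent page repeats the query with search_after set to the sort values of the last hit of the previous page, so memory use stays flat no matter how deep you go.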