ES "Trying to create too many scroll contexts": troubleshooting notes


ElasticSearch scroll

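The problem

ES raises the exception "Trying to create too many scroll contexts. Must be less than or equal to: [500]".

Cause

Under high concurrency, many requests that use scroll fetch data from ES at the same time. By default at most 500 scroll contexts may be open (the search.max_open_scroll_context setting; loosely, think of it as a maximum number of connections), and the keep-alive after which ES releases a scroll context automatically is commonly set to 5m (5 minutes). Contexts that are not released explicitly pile up until the limit is reached, so new scroll requests cannot obtain a context and the error is raised.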
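To get a feel for how quickly the limit can be reached, here is a rough back-of-the-envelope sketch based on Little's law (open contexts ≈ arrival rate × lifetime). It assumes contexts are never cleared explicitly and live for about one keep-alive period. Real contexts live longer, because every scroll call renews the keep-alive, so the result is an optimistic upper bound. The function name and the `contexts_per_search` parameter are made up for illustration. The parameter reflects the assumption that one scroll search may hold several contexts on a node, roughly one per shard it touches there.

```python
def max_sustainable_scroll_rate(max_contexts: int, keep_alive_s: float,
                                contexts_per_search: int = 1) -> float:
    """Highest steady rate of new scroll searches (per second) that stays under the limit.

    Little's law: open contexts ~= arrival rate * lifetime * contexts per search.
    """
    return max_contexts / (keep_alive_s * contexts_per_search)


# Default limit of 500 and a 5m keep-alive, one context per search (assumption).
print(f"{max_sustainable_scroll_rate(500, 5 * 60):.2f} searches/s")     # 1.67 searches/s
# Hypothetical: each search holds 5 contexts on the node (e.g. 5 local shards).
print(f"{max_sustainable_scroll_rate(500, 5 * 60, 5):.2f} searches/s")  # 0.33 searches/s
```

Under these assumptions, fewer than two new scroll searches per second are enough to exhaust the default limit if nothing is cleared by hand.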

What is scroll

While a search request returns a single “page” of results, the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database. Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of one data stream or index into a new data stream or index with a different configuration.

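When a search has to return a large amount of data, scroll lets you fetch it in batches. The result set is a snapshot taken when the initial search runs, so documents modified while the batches are being read do not show up in the results. That is why scroll is not suitable for real-time requests.

How to use scroll

Fetch data from ES in a loop. Each response carries a scroll_id (a unique value that may be the same as the previous one or may change); pass the most recent scroll_id back to ES to get the next batch.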

Note

The initial search request and each subsequent scroll request each return a _scroll_id. While the _scroll_id may change between requests, it doesn’t always change — in any case, only the most recently received _scroll_id should be used.

The scroll_id does not necessarily change between requests; simply use the most recent scroll_id that ES returned.
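The loop described above can be sketched as a small generator. This is a minimal illustration, not the official client helper: `iter_scroll_hits` is a made-up name, and `client` is assumed to be an `elasticsearch.Elasticsearch` instance (or anything with compatible `search`, `scroll` and `clear_scroll` methods). The `body=` argument follows the 7.x-style client; newer client versions prefer passing `query=` directly.

```python
from typing import Any, Dict, Iterator


def iter_scroll_hits(client: Any, index: str, query: Dict[str, Any],
                     keep_alive: str = "1m",
                     batch_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield every hit of a scrolling search, one batch at a time."""
    # The initial search opens the scroll context and returns the first batch.
    res = client.search(index=index, body={"query": query},
                        scroll=keep_alive, size=batch_size)
    scroll_id = res["_scroll_id"]
    try:
        while res["hits"]["hits"]:
            yield from res["hits"]["hits"]
            # Always send the most recently received scroll ID.
            res = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = res["_scroll_id"]
    finally:
        # Release the context instead of waiting for the keep-alive to expire.
        client.clear_scroll(scroll_id=scroll_id)
```

An empty batch signals that the scroll is exhausted. The `finally` clause also runs when the consumer stops iterating early and the generator is closed.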

scroll API

Scroll API (fetch the next batch)

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}
  • scroll: the keep-alive after which the scroll context is released automatically; 1m = 1 minute
  • scroll_id: the ID of the scrolling search whose next batch of results should be retrieved
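Python implementation (pseudocode):
from elasticsearch import Elasticsearch

ES_CLIENT = Elasticsearch([{"host": "192.168.1.1", "port": 9200}])
# scroll_id is the _scroll_id taken from the previous search or scroll response
res = ES_CLIENT.scroll(scroll_id=scroll_id, scroll="5m", request_timeout=100)
# res holds the batch of data returned by this single request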

Clear scroll API

DELETE /_search/scroll
{
  "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}

Clears the search context and results for a scrolling search.

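Python implementation (pseudocode):
from elasticsearch import Elasticsearch

ES_CLIENT = Elasticsearch([{"host": "192.168.1.1", "port": 9200}])
# scroll_id is the most recent _scroll_id of the scroll being released
res = ES_CLIENT.clear_scroll(scroll_id=scroll_id)
# res = {"succeeded": True, "num_freed": 3}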

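Note: after using a scroll cursor, remember to clear it, especially under high concurrency, because by default only 500 scroll contexts can be open at the same time.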
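One way to make the cleanup hard to forget is to tie the scroll's lifetime to a `with` block, so that `clear_scroll` runs even when processing fails halfway. The sketch below is only an illustration under assumptions: `ScrollSession` and `scroll_session` are hypothetical names, not part of the Elasticsearch client, and `client` is assumed to expose `search`, `scroll` and `clear_scroll` as in the snippets above.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List


class ScrollSession:
    """Remembers the most recent scroll ID of one scrolling search."""

    def __init__(self, client: Any, first_response: Dict[str, Any],
                 keep_alive: str) -> None:
        self.client = client
        self.keep_alive = keep_alive
        self.response = first_response
        self.scroll_id = first_response["_scroll_id"]

    def next_batch(self) -> List[Dict[str, Any]]:
        """Fetch the next batch and remember the scroll ID that came with it."""
        self.response = self.client.scroll(scroll_id=self.scroll_id,
                                           scroll=self.keep_alive)
        self.scroll_id = self.response["_scroll_id"]
        return self.response["hits"]["hits"]


@contextmanager
def scroll_session(client: Any, keep_alive: str = "1m",
                   **search_kwargs: Any) -> Iterator[ScrollSession]:
    """Open a scroll and guarantee that its context is cleared on exit."""
    first = client.search(scroll=keep_alive, **search_kwargs)
    session = ScrollSession(client, first, keep_alive)
    try:
        yield session
    finally:
        # Runs on normal exit and on exceptions alike.
        client.clear_scroll(scroll_id=session.scroll_id)
```

Inside the `with scroll_session(...) as s:` block, the first batch is in `s.response["hits"]["hits"]`, and the caller keeps calling `s.next_batch()` until it returns an empty list.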

Checking scroll status

GET _nodes/stats/indices/search

# Increase the maximum number of open scroll contexts
curl -x "" -X PUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d'{
    "persistent" : {
        "search.max_open_scroll_context": 1024
    },
    "transient": {
        "search.max_open_scroll_context": 1024
    }
}'
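The response of GET _nodes/stats/indices/search reports, for every node, search counters such as open_contexts and scroll_current. As a rough sketch, the helper below pulls the number of currently open scrolls per node out of such a response. The function name is hypothetical and the sample response is made-up, heavily trimmed data. With the Python client the real response could presumably be obtained via `client.nodes.stats(metric="indices", index_metric="search")`.

```python
from typing import Any, Dict


def open_scrolls_per_node(nodes_stats: Dict[str, Any]) -> Dict[str, int]:
    """Map each node name to its `scroll_current` counter (0 if missing)."""
    result = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        search = node.get("indices", {}).get("search", {})
        result[node.get("name", node_id)] = search.get("scroll_current", 0)
    return result


# Made-up, trimmed example of a nodes stats response.
sample = {
    "nodes": {
        "node-id-a": {"name": "node-1",
                      "indices": {"search": {"open_contexts": 3,
                                             "scroll_current": 2,
                                             "scroll_total": 40}}},
        "node-id-b": {"name": "node-2",
                      "indices": {"search": {"open_contexts": 0}}},
    }
}
print(open_scrolls_per_node(sample))  # {'node-1': 2, 'node-2': 0}
```

Watching this counter approach the configured limit is a hint that some caller is not clearing its scrolls.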

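The best way to handle scroll cursors is still to destroy them explicitly once you are done with them, just as memory, HTTP requests and other resources need to be released. Make it a habit.

References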

www.elastic.co/guide/en/el…