Elasticsearch exception: I/O reactor status: STOPPED


A colleague hit the following error in production when calling Elasticsearch through the client's performRequest method:

java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
	at org.apache.http.util.Asserts.check(Asserts.java:46)
	at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
	at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
	at org.elasticsearch.client.RestClient.performRequestAsync(RestClient.java:524)
	at org.elasticsearch.client.RestClient.performRequestAsyncNoCatch(RestClient.java:501)
	at org.elasticsearch.client.RestClient.performRequestAsync(RestClient.java:232)

Troubleshooting approach: 1. Locate the code from the error message:

CloseableHttpAsyncClientBase
protected void ensureRunning() {
    final Status currentStatus = this.status.get();
    Asserts.check(currentStatus == Status.ACTIVE, "Request cannot be executed; " +
            "I/O reactor status: %s", currentStatus);
}

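This shows that by the time the client issued the request, its status was already STOPPED, so the request was rejected.

2. Under what circumstances is the client status changed to STOPPED?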

CloseableHttpAsyncClientBase
public void close() {
    if (this.status.compareAndSet(Status.ACTIVE, Status.STOPPED)) {
        if (this.reactorThread != null) {
            try {
                this.connmgr.shutdown();
            } catch (final IOException ex) {
                this.log.error("I/O error shutting down connection manager", ex);
            }
            try {
                this.reactorThread.join();
            } catch (final InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
    }
}

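Judging from this method in CloseableHttpAsyncClientBase, the status is set to STOPPED only when the client is closed. We still need to check whether subclasses (CloseableHttpAsyncClientBase is an abstract class) contain other operations that change it.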
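The state transitions seen so far can be modeled in a few lines of plain Java. This is a simplified sketch and not the HttpAsyncClient source: the class ClientLifecycleSketch and its method names are hypothetical, and only the compareAndSet transitions and the status check mirror the excerpts above.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the client lifecycle (not the real library class).
class ClientLifecycleSketch {
    enum Status { INACTIVE, ACTIVE, STOPPED }

    private final AtomicReference<Status> status =
            new AtomicReference<>(Status.INACTIVE);

    void start() {
        // Only an INACTIVE client can become ACTIVE.
        status.compareAndSet(Status.INACTIVE, Status.ACTIVE);
    }

    void close() {
        // Only an ACTIVE client moves to STOPPED; nothing moves it back.
        status.compareAndSet(Status.ACTIVE, Status.STOPPED);
    }

    String execute(String request) {
        // Same idea as ensureRunning(): reject anything that is not ACTIVE.
        Status current = status.get();
        if (current != Status.ACTIVE) {
            throw new IllegalStateException(
                    "Request cannot be executed; I/O reactor status: " + current);
        }
        return "sent " + request;
    }

    Status status() {
        return status.get();
    }
}
```

In this model a STOPPED client stays STOPPED even if start() is called again, which is why the error keeps recurring on every request once it has appeared.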

Next, check the project code for anything that could move the client status to STOPPED.

First, how the client is built:

public static ZSearchRestClient builder(String url, String userName, String passwd) {
    return new ZSearchRestClient.Builder(HttpHost.create(url)).setMaxRetryTimeoutMillis(30 * 1000)
            .setHttpClientConfigCallback(httpClientBuilder -> {
                httpClientBuilder.setMaxConnTotal(5);
                httpClientBuilder.setMaxConnPerRoute(5);
                CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
                credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(userName, passwd));
                httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
                return httpClientBuilder;
            }).setRequestConfigCallback(requestConfigBuilder -> {
                requestConfigBuilder.setConnectTimeout(1000);
                requestConfigBuilder.setSocketTimeout(30000);
                return requestConfigBuilder;
            })
            .build();
}

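The project uses an internally wrapped client. When the EsClient object is created, an InternalHttpAsyncClient is used as the CloseableHttpAsyncClient implementation. It is wrapped in a RestClient, which is in turn wrapped in the EsClient.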

[Figure: InternalHttpAsyncClient inheritance hierarchy]

Construction of the ES client:

1. CloseableHttpAsyncClient httpClient = AccessController.doPrivileged(new PrivilegedAction<CloseableHttpAsyncClient>() {
2.     @Override
3.     public CloseableHttpAsyncClient run() {
4.         return createHttpClient();
5.     }
6. });
7. RestClient restClient = new RestClient(httpClient, maxRetryTimeout, defaultHeaders, hosts, pathPrefix, hostsRearrangeTime, reachableTimeOut, nearbyFlag, failureListener);
8. httpClient.start();
9. return new ZSearchRestClient(restClient);

The createHttpClient method on line 4 of the block above ultimately instantiates an InternalHttpAsyncClient.

Creating an InternalHttpAsyncClient instance first invokes the CloseableHttpAsyncClientBase constructor:

1. public InternalHttpAsyncClient(
2.         final NHttpClientConnectionManager connmgr,
3.         final ConnectionReuseStrategy connReuseStrategy,
4.         final ConnectionKeepAliveStrategy keepaliveStrategy,
5.         final ThreadFactory threadFactory,
6.         final NHttpClientEventHandler handler,
7.         final InternalClientExec exec,
8.         final Lookup<CookieSpecProvider> cookieSpecRegistry,
9.         final Lookup<AuthSchemeProvider> authSchemeRegistry,
10.         final CookieStore cookieStore,
11.         final CredentialsProvider credentialsProvider,
12.         final RequestConfig defaultConfig) {
13.     super(connmgr, threadFactory, handler);
14.     this.connmgr = connmgr;
15.     this.connReuseStrategy = connReuseStrategy;
16.     this.keepaliveStrategy = keepaliveStrategy;
17.     this.exec = exec;
18.     this.cookieSpecRegistry = cookieSpecRegistry;
19.     this.authSchemeRegistry = authSchemeRegistry;
20.     this.cookieStore = cookieStore;
21.     this.credentialsProvider = credentialsProvider;
22.     this.defaultConfig = defaultConfig;
23. }

Step into the super call on line 13:

CloseableHttpAsyncClientBase
1. public CloseableHttpAsyncClientBase(
2.         final NHttpClientConnectionManager connmgr,
3.         final ThreadFactory threadFactory,
4.         final NHttpClientEventHandler handler) {
5.     super();
6.     this.connmgr = connmgr;
7.     if (threadFactory != null && handler != null) {
8.         this.reactorThread = threadFactory.newThread(new Runnable() {
9. 
10.             @Override
11.             public void run() {
12.                 try {
13.                     final IOEventDispatch ioEventDispatch = new InternalIODispatch(handler);
14.                     connmgr.execute(ioEventDispatch);
15.                 } catch (final Exception ex) {
16.                     log.error("I/O reactor terminated abnormally", ex);
17.                 } finally {
18.                     status.set(Status.STOPPED);
19.                 }
20.             }
21. 
22.         });
23.     } else {
24.         this.reactorThread = null;
25.     }
26.     this.status = new AtomicReference<Status>(Status.INACTIVE);
27. }
28. 
29. @Override
30. public void start() {
31.     if (this.status.compareAndSet(Status.INACTIVE, Status.ACTIVE)) {
32.         if (this.reactorThread != null) {
33.             this.reactorThread.start();
34.         }
35.     }
36. }

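Here we can see that instantiating InternalHttpAsyncClient builds a Runnable whose body is fully prepared but not yet running. It only starts executing when start() is called.

Back in the ES client construction code, line 8 starts the reactorThread in CloseableHttpAsyncClientBase. So for an initialized client, if connmgr.execute(ioEventDispatch) throws or returns, the finally block sets the client status to STOPPED.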
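To make the effect of that finally block concrete, here is a minimal sketch in plain Java. It is an illustration under assumptions, not library code: ReactorThreadSketch is a hypothetical class, and the eventLoop parameter stands in for connmgr.execute(ioEventDispatch).

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the reactor thread set up in the base-class constructor.
class ReactorThreadSketch {
    enum Status { INACTIVE, ACTIVE, STOPPED }

    final AtomicReference<Status> status =
            new AtomicReference<>(Status.INACTIVE);
    private final Thread reactorThread;

    // eventLoop stands in for connmgr.execute(ioEventDispatch).
    ReactorThreadSketch(Runnable eventLoop) {
        this.reactorThread = new Thread(() -> {
            try {
                eventLoop.run();
            } catch (Exception ex) {
                System.err.println("I/O reactor terminated abnormally: " + ex);
            } finally {
                // Reached whether the loop returned normally or threw.
                status.set(Status.STOPPED);
            }
        });
    }

    void start() {
        if (status.compareAndSet(Status.INACTIVE, Status.ACTIVE)) {
            reactorThread.start();
        }
    }

    // Waits for the reactor thread to finish; used only to observe the final status.
    void awaitReactorExit() {
        try {
            reactorThread.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Whether the event loop throws or simply returns, the status ends up STOPPED, so the question becomes what can make the loop exit.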

Follow connmgr.execute on line 14. Note that PoolingNHttpClientConnectionManager delegates execute to its internal ioreactor:

PoolingNHttpClientConnectionManager
@Override
public void execute(final IOEventDispatch eventDispatch) throws IOException {
    this.ioreactor.execute(eventDispatch);
}

Follow this.ioreactor.execute:

AbstractMultiworkerIOReactor
1. /**
2.  * Activates the main I/O reactor as well as all worker I/O reactors.
3.  * The I/O main reactor will start reacting to I/O events and triggering
4.  * notification methods. The worker I/O reactor in their turn will start
5.  * reacting to I/O events and dispatch I/O event notifications to the given
6.  * {@link IOEventDispatch} interface.
7.  * <p>
8.  * This method will enter the infinite I/O select loop on
9.  * the {@link Selector} instance associated with this I/O reactor and used
10.  * to manage creation of new I/O channels. Once a new I/O channel has been
11.  * created the processing of I/O events on that channel will be delegated
12.  * to one of the worker I/O reactors.
13.  * <p>
14.  * The method will remain blocked unto the I/O reactor is shut down or the
15.  * execution thread is interrupted.
16.  *
17.  * @see #processEvents(int)
18.  * @see #cancelRequests()
19.  *
20.  * @throws InterruptedIOException if the dispatch thread is interrupted.
21.  * @throws IOReactorException in case if a non-recoverable I/O error.
22.  */
23. @Override
24. public void execute(
25.         final IOEventDispatch eventDispatch) throws InterruptedIOException, IOReactorException {
26.     Args.notNull(eventDispatch, "Event dispatcher");
27.     synchronized (this.statusLock) {
28.         if (this.status.compareTo(IOReactorStatus.SHUTDOWN_REQUEST) >= 0) {
29.             this.status = IOReactorStatus.SHUT_DOWN;
30.             this.statusLock.notifyAll();
31.             return;
32.         }
33.         Asserts.check(this.status.compareTo(IOReactorStatus.INACTIVE) == 0,
34.                 "Illegal state %s", this.status);
35.         this.status = IOReactorStatus.ACTIVE;
36.         // Start I/O dispatchers
37.         for (int i = 0; i < this.dispatchers.length; i++) {
38.             final BaseIOReactor dispatcher = new BaseIOReactor(this.selectTimeout, this.interestOpsQueueing);
39.             dispatcher.setExceptionHandler(exceptionHandler);
40.             this.dispatchers[i] = dispatcher;
41.         }
42.         for (int i = 0; i < this.workerCount; i++) {
43.             final BaseIOReactor dispatcher = this.dispatchers[i];
44.             this.workers[i] = new Worker(dispatcher, eventDispatch);
45.             this.threads[i] = this.threadFactory.newThread(this.workers[i]);
46.         }
47.     }
48.     try {
49. 
50.         for (int i = 0; i < this.workerCount; i++) {
51.             if (this.status != IOReactorStatus.ACTIVE) {
52.                 return;
53.             }
54.             this.threads[i].start();
55.         }
56. 
57.         for (;;) {
58.             final int readyCount;
59.             try {
60.                 readyCount = this.selector.select(this.selectTimeout);
61.             } catch (final InterruptedIOException ex) {
62.                 throw ex;
63.             } catch (final IOException ex) {
64.                 throw new IOReactorException("Unexpected selector failure", ex);
65.             }
66. 
67.             if (this.status.compareTo(IOReactorStatus.ACTIVE) == 0) {
68.                 processEvents(readyCount);
69.             }
70. 
71.             // Verify I/O dispatchers
72.             for (int i = 0; i < this.workerCount; i++) {
73.                 final Worker worker = this.workers[i];
74.                 final Throwable ex = worker.getThrowable();
75.                 if (ex != null) {
76.                     throw new IOReactorException(
77.                             "I/O dispatch worker terminated abnormally", ex);
78.                 }
79.             }
80. 
81.             if (this.status.compareTo(IOReactorStatus.ACTIVE) > 0) {
82.                 break;
83.             }
84.         }
85. 
86.     } catch (final ClosedSelectorException ex) {
87.         addExceptionEvent(ex);
88.     } catch (final IOReactorException ex) {
89.         if (ex.getCause() != null) {
90.             addExceptionEvent(ex.getCause());
91.         }
92.         throw ex;
93.     } finally {
94.         doShutdown();
95.         synchronized (this.statusLock) {
96.             this.status = IOReactorStatus.SHUT_DOWN;
97.             this.statusLock.notifyAll();
98.         }
99.     }
100. }

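The code above returns on lines 31 and 52, throws on lines 62, 64, 76, and 92, and leaves the infinite loop on lines 82 and 87 so that the method returns.

1. The method first takes the status lock. If the status does not meet the requirement (a shutdown has already been requested), it sets the status to SHUT_DOWN and returns (line 31).

2. If the status is valid, it is set to ACTIVE. Each dispatcher slot gets a BaseIOReactor carrying the exceptionHandler, each BaseIOReactor is wrapped in a Worker, and each Worker is wrapped in a thread in threads. The threads are then started one by one, with a status check before each start; if the check fails, the method returns (line 52).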

static class Worker implements Runnable {

    final BaseIOReactor dispatcher;
    final IOEventDispatch eventDispatch;

    private volatile Throwable exception;

    public Worker(final BaseIOReactor dispatcher, final IOEventDispatch eventDispatch) {
        super();
        this.dispatcher = dispatcher;
        this.eventDispatch = eventDispatch;
    }

    @Override
    public void run() {
        try {
            this.dispatcher.execute(this.eventDispatch);
        } catch (final Error ex) {
            this.exception = ex;
            throw ex;
        } catch (final Exception ex) {
            this.exception = ex;
        }
    }

    public Throwable getThrowable() {
        return this.exception;
    }

}

Follow the dispatcher's execute method, which ends up in execute in AbstractIOReactor. It contains an infinite loop, meaning a started worker keeps processing events:


1. protected void execute() throws InterruptedIOException, IOReactorException {
2.     this.status = IOReactorStatus.ACTIVE;
3. 
4.     try {
5.         for (;;) {
6. 
7.             final int readyCount;
8.             try {
9.                 readyCount = this.selector.select(this.selectTimeout);
10.             } catch (final InterruptedIOException ex) {
11.                 throw ex;
12.             } catch (final IOException ex) {
13.                 throw new IOReactorException("Unexpected selector failure", ex);
14.             }
15. 
16.             ...
17. 
18.             // Process selected I/O events
19.             if (readyCount > 0) {
20.                 processEvents(this.selector.selectedKeys());
21.             }
22. 
23.            ...
24. 
25.     } catch (final ClosedSelectorException ignore) {
26.     } finally {
27.         hardShutdown();
28.         synchronized (this.statusMutex) {
29.             this.statusMutex.notifyAll();
30.         }
31.     }
32. }
33. 
34. private void processEvents(final Set<SelectionKey> selectedKeys) {
35.     for (final SelectionKey key : selectedKeys) {
36. 
37.         processEvent(key);
38. 
39.     }
40.     selectedKeys.clear();
41. }
42. 
43. 
44. protected void processEvent(final SelectionKey key) {
45.     final IOSessionImpl session = (IOSessionImpl) key.attachment();
46.     try {
47.         if (key.isAcceptable()) {
48.             acceptable(key);
49.         }
50.         if (key.isConnectable()) {
51.             connectable(key);
52.         }
53.         if (key.isReadable()) {
54.             session.resetLastRead();
55.             readable(key);
56.         }
57.         if (key.isWritable()) {
58.             session.resetLastWrite();
59.             writable(key);
60.         }
61.     } catch (final CancelledKeyException ex) {
62.         queueClosedSession(session);
63.         key.attach(null);
64.     }
65. }

Take a readable event as an example:

BaseIOReactor
@Override
protected void readable(final SelectionKey key) {
    final IOSession session = getSession(key);
    try {
        // Try to gently feed more data to the event dispatcher
        // if the session input buffer has not been fully exhausted
        // (the choice of 5 iterations is purely arbitrary)
        for (int i = 0; i < 5; i++) {
            this.eventDispatch.inputReady(session);
            if (!session.hasBufferedInput()
                    || (session.getEventMask() & SelectionKey.OP_READ) == 0) {
                break;
            }
        }
        if (session.hasBufferedInput()) {
            this.bufferingSessions.add(session);
        }
    } catch (final CancelledKeyException ex) {
        throw ex;
    } catch (final RuntimeException ex) {
        handleRuntimeException(ex);
    }
}

protected void handleRuntimeException(final RuntimeException ex) {
    if (this.exceptionHandler == null || !this.exceptionHandler.handle(ex)) {
        throw ex;
    }
}

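So when event handling throws, the outcome depends on the exceptionHandler in BaseIOReactor: if it is null or its handle method returns false, the exception is rethrown.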
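The handler contract can be exercised in isolation. The sketch below is a stdlib-only model: RuntimeExceptionHandler is a hypothetical stand-in for the RuntimeException side of IOReactorExceptionHandler, and dispatch is an invented helper that plays the role of readable(key).

```java
// Hypothetical model of the decision made in handleRuntimeException.
class ExceptionHandlerSketch {
    // Stand-in for the RuntimeException side of IOReactorExceptionHandler.
    interface RuntimeExceptionHandler {
        boolean handle(RuntimeException ex);
    }

    private final RuntimeExceptionHandler exceptionHandler; // may be null

    ExceptionHandlerSketch(RuntimeExceptionHandler exceptionHandler) {
        this.exceptionHandler = exceptionHandler;
    }

    // Same condition as the excerpt: rethrow unless a handler accepts the exception.
    void handleRuntimeException(RuntimeException ex) {
        if (exceptionHandler == null || !exceptionHandler.handle(ex)) {
            throw ex;
        }
    }

    // Runs one event callback; returns true if the event loop may keep going.
    boolean dispatch(Runnable eventCallback) {
        try {
            eventCallback.run();
        } catch (RuntimeException ex) {
            handleRuntimeException(ex);
        }
        return true;
    }
}
```

With a null handler, or one that returns false, a single failing callback propagates out of dispatch. With a handler that returns true, the exception is absorbed and the loop continues.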

Why not suspect that this.selector.select on line 9 throws? See the code below:

/**
 * Selects a set of keys whose corresponding channels are ready for I/O
 * operations.
 *
 * <p> This method performs a blocking <a href="#selop">selection
 * operation</a>.  It returns only after at least one channel is selected,
 * this selector's {@link #wakeup wakeup} method is invoked, the current
 * thread is interrupted, or the given timeout period expires, whichever
 * comes first.
 *
 * <p> This method does not offer real-time guarantees: It schedules the
 * timeout as if by invoking the {@link Object#wait(long)} method. </p>
 *
 * @param  timeout  If positive, block for up to <tt>timeout</tt>
 *                  milliseconds, more or less, while waiting for a
 *                  channel to become ready; if zero, block indefinitely;
 *                  must not be negative
 *
 * @return  The number of keys, possibly zero,
 *          whose ready-operation sets were updated
 *
 * @throws  IOException
 *          If an I/O error occurs
 *
 * @throws  ClosedSelectorException
 *          If this selector is closed
 *
 * @throws  IllegalArgumentException
 *          If the value of the timeout argument is negative
 */
public abstract int select(long timeout)
    throws IOException;
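The Javadoc above can be checked directly with the JDK. The following sketch uses only java.nio; the class and method names are made up for illustration. It shows that a timed select on an idle selector returns 0 and that select on a closed selector throws ClosedSelectorException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.Selector;

// Illustration only: probes Selector.select(long) behaviour described in the Javadoc.
class SelectTimeoutSketch {
    // Ready count of a timed select on a selector with no registered channels.
    static int selectOnIdleSelector(long timeoutMillis) {
        try (Selector selector = Selector.open()) {
            return selector.select(timeoutMillis);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // True if select on an already closed selector throws ClosedSelectorException.
    static boolean selectOnClosedSelectorThrows() {
        try {
            Selector selector = Selector.open();
            selector.close();
            selector.select(10);
            return false;
        } catch (ClosedSelectorException ex) {
            return true;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

A plain timeout therefore cannot be what terminates the reactor loop; only an I/O failure or a closed selector leaves it through this call.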

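3. Next comes an infinite loop. Line 60 involves I/O reactor concepts; put simply, it returns how many I/O events occurred within the given time. The Javadoc shows that if no event occurs within selectTimeout, it returns 0 rather than throwing. Lines 62 and 64 are reached only when an I/O exception occurs, and line 87 is reached once the selector has been closed (the selector.select call here is the same method that the worker's infinite loop above ends up calling).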

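4. Line 68 processes the received I/O events. In practice only connectable events are handled here; if the sessionRequest attached to the connection has not yet completed, the channel is handed to a worker for processing.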

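5. Starting at line 72, each worker is checked for an exception thrown during this round. If a worker thread threw, the exception is wrapped in an IOReactorException and thrown at this level. In other words, if an exception in a worker is not handled by the IOReactor's exceptionHandler, AbstractMultiworkerIOReactor notices it, leaves the I/O event loop, and the client ends up STOPPED.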
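The supervision pattern in step 5 can be sketched with plain threads. This is a simplified model under assumptions: WorkerSupervisionSketch and superviseUntilExit are hypothetical names, thread.join(10) stands in for the timed select, and IllegalStateException stands in for IOReactorException.

```java
// Hypothetical model of the "Verify I/O dispatchers" step in the main reactor loop.
class WorkerSupervisionSketch {
    static class Worker implements Runnable {
        private final Runnable task;
        private volatile Throwable exception; // read by the supervising thread

        Worker(Runnable task) {
            this.task = task;
        }

        @Override
        public void run() {
            try {
                task.run();
            } catch (Exception ex) {
                // Record the failure instead of letting it vanish with the thread.
                this.exception = ex;
            }
        }

        Throwable getThrowable() {
            return exception;
        }
    }

    // Polls the worker until it exits; throws if the worker recorded a failure.
    static void superviseUntilExit(Runnable task) {
        Worker worker = new Worker(task);
        Thread thread = new Thread(worker);
        thread.start();
        boolean alive = true;
        while (alive) {
            try {
                thread.join(10); // stands in for selector.select(selectTimeout)
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Supervisor interrupted", ex);
            }
            // Read liveness before the failure so a late failure is never missed.
            alive = thread.isAlive();
            Throwable failure = worker.getThrowable();
            if (failure != null) {
                throw new IllegalStateException(
                        "I/O dispatch worker terminated abnormally", failure);
            }
        }
    }
}
```

The key point is that the worker never reports its failure directly; the supervising loop discovers it on its next pass and then tears everything down.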

How do we set an exceptionHandler on the IOReactor?

public CloseableHttpAsyncClient build() {

    ...

    NHttpClientConnectionManager connManager = this.connManager;
    if (connManager == null) {
        ...
        final ConnectingIOReactor ioreactor = IOReactorUtils.create(
            defaultIOReactorConfig != null ? defaultIOReactorConfig : IOReactorConfig.DEFAULT, threadFactory);
        final PoolingNHttpClientConnectionManager poolingmgr = new PoolingNHttpClientConnectionManager(
                ioreactor,
                RegistryBuilder.<SchemeIOSessionStrategy>create()
                    .register("http", NoopIOSessionStrategy.INSTANCE)
                    .register("https", sslStrategy)
                    .build());
        ...
        connManager = poolingmgr;
    }
    
    ...
    
    return new InternalHttpAsyncClient(
        connManager,
        reuseStrategy,
        keepAliveStrategy,
        threadFactory,
        eventHandler,
        exec,
        cookieSpecRegistry,
        authSchemeRegistry,
        defaultCookieStore,
        defaultCredentialsProvider,
        defaultRequestConfig);
}

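Recall that during ES client initialization, PoolingNHttpClientConnectionManager delegates execute to its internal ioreactor. It turns out that when InternalHttpAsyncClient is instantiated, the IOReactor can be supplied by setting connManager; otherwise a default ioreactor is used. Here is the default implementation: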

final class IOReactorUtils {

    private IOReactorUtils() {
    }

    public static ConnectingIOReactor create(final IOReactorConfig config, final ThreadFactory threadFactory) {
        try {
            return new DefaultConnectingIOReactor(config, threadFactory);
        } catch (final IOReactorException ex) {
            throw new IllegalStateException(ex);
        }
    }

}
public DefaultConnectingIOReactor(
        final IOReactorConfig config,
        final ThreadFactory threadFactory) throws IOReactorException {
    super(config, threadFactory);
    this.requestQueue = new ConcurrentLinkedQueue<SessionRequestImpl>();
    this.lastTimeoutCheck = System.currentTimeMillis();
}

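Clearly the exceptionHandler in the default ioreactor implementation, DefaultConnectingIOReactor, is null. When a RuntimeException occurs while processing I/O events, it is rethrown directly, which ultimately leaves the client unusable.

To fix the exception at the top of this article, you can explicitly supply an IOReactor with exception handling, along these lines: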

public ESHighLevelClient build() throws IOException {
        RestClientBuilder builder = RestClient.builder(new HttpHost(host, port, "http"))
                .setRequestConfigCallback(
                        config -> config.setConnectTimeout(connectTimeout)
                                .setConnectionRequestTimeout(connectionRequestTimeout)
                                .setSocketTimeout(socketTimeout))
                .setHttpClientConfigCallback(
                        httpClientBuilder -> {
                            final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
                            if (ESHighLevelClient.this.certification) {
                                credentialsProvider.setCredentials(AuthScope.ANY,
                                        new UsernamePasswordCredentials(username, password));
                                httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
                            }
                            httpClientBuilder.setKeepAliveStrategy(CustomConnectionKeepAliveStrategy.INSTANCE);

                            try {
                                DefaultConnectingIOReactor ioReactor = new DefaultConnectingIOReactor();
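                                ioReactor.setExceptionHandler(new IOReactorExceptionHandler() {
                                    @Override
                                    public boolean handle(IOException e) {
                                        // true: treat the exception as handled so the reactor keeps running
                                        return true;
                                    }

                                    @Override
                                    public boolean handle(RuntimeException e) {
                                        // true: handleRuntimeException will not rethrow, so the I/O loop survives
                                        return true;
                                    }
                                });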
                                httpClientBuilder.setConnectionManager(new PoolingNHttpClientConnectionManager(ioReactor));
                            } catch (IOReactorException e) {
                                throw new RuntimeException(e);
                            }
                            return httpClientBuilder;
                        }
                );
        return new RestHighLevelClient(builder);
        
    }

That covers most of the source code involved in starting the ES client.

To summarize:

[Figure: summary diagram]

Some useful references I found:

[1] Elasticsearch "Request cannot be executed; I/O reactor status:": how to handle it? - Tencent Cloud Community (tencent.com)

[2] Analysis of intermittent I/O reactor errors in the elasticsearch client - Zhihu (zhihu.com)

[3] Catch Exceptions thrown in LLRC callbacks to protect the client IO Reactor Thread · Issue #45115 · elastic/elasticsearch · GitHub