故事是这样的,我在1个公共库中添加了如下依赖来引入es。
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
然后,这个版本进入联调阶段后我也都把服务都部署到开发环境了。
结果,钉钉上prometheus的告警群一直在报警,我开始没当回事。
但同时发现nacos服务部署出问题,如下是平时正常1个服务1个实例, 而现在会有2个实例且端口都是10开头的。 这可如何是好! 老的服务没有被替换还在运行中,这样负载均衡会请求到老服务。
由于我是个菜鸡,只能暂时先下线老服务了。 同时我本地启动时也发现一直有以下报错:
logger:org.springframework.boot.actuate.elasticsearch.ElasticsearchRestHealthIndicator message:Elasticsearch health check failed
java.net.ConnectException: Timeout connecting to [localhost/127.0.0.1:9200]
at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:943) ~[elasticsearch-rest-client-6.4.3.jar:6.4.3]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:227) ~[elasticsearch-rest-client-6.4.3.jar:6.4.3]
at org.springframework.boot.actuate.elasticsearch.ElasticsearchRestHealthIndicator.doHealthCheck(ElasticsearchRestHealthIndicator.java:61) ~[spring-boot-actuator-2.1.4.RELEASE.jar:2.1.4.RELEASE]
at org.springframework.boot.actuate.health.AbstractHealthIndicator.health(AbstractHealthIndicator.java:84) ~[spring-boot-actuator-2.1.4.RELEASE.jar:2.1.4.RELEASE]
at org.springframework.boot.actuate.health.CompositeHealthIndicator.health(CompositeHealthIndicator.java:98) ~[spring-boot-actuator-2.1.4.RELEASE.jar:2.1.4.RELEASE]
at org.springframework.boot.actuate.health.HealthEndpoint.health(HealthEndpoint.java:50) ~[spring-boot-actuator-2.1.4.RELEASE.jar:2.1.4.RELEASE]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_301]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_301]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_301]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_301]
at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:282) ~[spring-core-5.1.6.RELEASE.jar:5.1.6.RELEASE]
at org.springframework.boot.actuate.endpoint.invoke.reflect.ReflectiveOperationInvoker.invoke(ReflectiveOperationInvoker.java:76) ~[spring-boot-actuator-2.1.4.RELEASE.jar:2.1.4.RELEASE]
at org.springframework.boot.actuate.endpoint.annotation.AbstractDiscoveredOperation.invoke(AbstractDiscoveredOperation.java:61) ~[spring-boot-actuator-2.1.4.RELEASE.jar:2.1.4.RELEASE]
at org.springframework.boot.actuate.endpoint.jmx.EndpointMBean.invoke(EndpointMBean.java:126) ~[spring-boot-actuator-2.1.4.RELEASE.jar:2.1.4.RELEASE]
at org.springframework.boot.actuate.endpoint.jmx.EndpointMBean.invoke(EndpointMBean.java:99) ~[spring-boot-actuator-2.1.4.RELEASE.jar:2.1.4.RELEASE]
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) ~[?:1.8.0_301]
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) ~[?:1.8.0_301]
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468) ~[?:1.8.0_301]
at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76) ~[?:1.8.0_301]
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309) ~[?:1.8.0_301]
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401) ~[?:1.8.0_301]
at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829) ~[?:1.8.0_301]
at sun.reflect.GeneratedMethodAccessor138.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_301]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_301]
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357) ~[?:1.8.0_301]
at sun.rmi.transport.Transport$1.run(Transport.java:200) ~[?:1.8.0_301]
at sun.rmi.transport.Transport$1.run(Transport.java:197) ~[?:1.8.0_301]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_301]
at sun.rmi.transport.Transport.serviceCall(Transport.java:196) ~[?:1.8.0_301]
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573) ~[?:1.8.0_301]
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834) ~[?:1.8.0_301]
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688) ~[?:1.8.0_301]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_301]
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687) ~[?:1.8.0_301]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_301]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_301]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_301]
Caused by: java.net.ConnectException: Timeout connecting to [localhost/127.0.0.1:9200]
at org.apache.http.nio.pool.RouteSpecificPool.timeout(RouteSpecificPool.java:169) ~[httpcore-nio-4.4.11.jar:4.4.11]
at org.apache.http.nio.pool.AbstractNIOConnPool.requestTimeout(AbstractNIOConnPool.java:628) ~[httpcore-nio-4.4.11.jar:4.4.11]
at org.apache.http.nio.pool.AbstractNIOConnPool$InternalSessionRequestCallback.timeout(AbstractNIOConnPool.java:894) ~[httpcore-nio-4.4.11.jar:4.4.11]
at org.apache.http.impl.nio.reactor.SessionRequestImpl.timeout(SessionRequestImpl.java:183) ~[httpcore-nio-4.4.11.jar:4.4.11]
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processTimeouts(DefaultConnectingIOReactor.java:210) ~[httpcore-nio-4.4.11.jar:4.4.11]
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:155) ~[httpcore-nio-4.4.11.jar:4.4.11]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351) ~[httpcore-nio-4.4.11.jar:4.4.11]
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221) ~[httpasyncclient-4.1.4.jar:4.1.4]
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) ~[httpasyncclient-4.1.4.jar:4.1.4]
... 1 more
但是很奇怪的是我这个服务根本没有引入es的配置! 查看s环境日志也是如此:
开发
带着这些疑惑我继续进行分析。
排查
问题汇总
- 告警报错
- k8s部署出问题
- 服务一直报错es健康检查失败
但这个时候发现好几个服务都出现了这种情况,但不是所有这次部署的服务出现这种现象,
于是观察这几个出问题的服务的共同点。发现它们的pom都重新加载了1个公共模块。 而这个公共模块中最近被我添加了1个新依赖:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
我引入这个es依赖的原因是有个实体类 需要用到 es的@Doucment这个类。
于是发现问题不能引入starter,改为 @Doucment类直接依赖的
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-elasticsearch</artifactId>
<version>3.1.6.RELEASE</version>
</dependency>
但发现还是会检查,这个就得看源码为什么会这样了。
解决
最终按照同事的建议,设置这个依赖的optional元素为true,如下:
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-elasticsearch</artifactId>
<version>3.1.6.RELEASE</version>
<optional>true</optional>
</dependency>
经过测试模拟,部署到开发环境后也正常。问题解决。 K8S也不报警了。 最后的报警也是解决