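Problem scenario
In a Spring application, a task annotated with @Scheduled publishes a message to an MQTT broker once per second. After a network problem dropped the connection between the MQTT client and the broker (see the connection-lost stack trace below), the task stopped running and reported no exception. A thread dump taken later showed that the thread executing the task was blocked (see the blocked-thread stack trace below).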
Connection-lost stack trace (the localized message 已断开连接 means "connection lost")
org.eclipse.paho.client.mqttv3.MqttException: 已断开连接
at org.eclipse.paho.client.mqttv3.internal.CommsSender.handleRunException(CommsSender.java:194)
at org.eclipse.paho.client.mqttv3.internal.CommsSender.run(CommsSender.java:171)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.net.SocketException: Connection reset by peer
at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:420)
at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:440)
at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:826)
at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1035)
at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
at org.eclipse.paho.client.mqttv3.internal.wire.MqttOutputStream.flush(MqttOutputStream.java:49)
at org.eclipse.paho.client.mqttv3.internal.CommsSender.run(CommsSender.java:149)
... 1 common frames omitted
Blocked-thread stack trace
Stack trace:
java.base@17.0.8/java.lang.Object.wait(Native Method)
java.base@17.0.8/java.lang.Object.wait(Object.java:338)
org.eclipse.paho.client.mqttv3.internal.Token.waitForResponse(Token.java:143)
org.eclipse.paho.client.mqttv3.internal.Token.waitForCompletion(Token.java:108)
org.eclipse.paho.client.mqttv3.MqttToken.waitForCompletion(MqttToken.java:67)
org.eclipse.paho.client.mqttv3.MqttClient.publish(MqttClient.java:570)
...... (the application's own call frames follow)
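Problem analysis
The blocked-thread stack trace contains these frames:
org.eclipse.paho.client.mqttv3.internal.Token.waitForResponse(Token.java:143)
org.eclipse.paho.client.mqttv3.internal.Token.waitForCompletion(Token.java:108)
org.eclipse.paho.client.mqttv3.MqttToken.waitForCompletion(MqttToken.java:67)
org.eclipse.paho.client.mqttv3.MqttClient.publish(MqttClient.java:570)
Read together with the Paho source, they show that a message published at the moment the connection dropped was still waiting for its response. Such a wait would normally be bounded by a timeout, but MqttClient's default timeout is -1, which means wait indefinitely (see the MqttClient source below). Because the next run of a scheduled task does not start until the previous run has returned, a publish call that never returns also explains why the task stopped without any error.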
MqttClient source
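In the MqttClient source, publish goes on to call the waitForCompletion method and passes it a wait time, which is MqttClient's default of -1. The waitForCompletion method is the next thing to look at.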
This call chain matches the frames in the blocked-thread stack trace exactly.
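The waiting behaviour described here can be reproduced without Paho. The sketch below is a simplified model, not Paho's actual source: ResponseToken, markComplete and staysBlocked are hypothetical names, and the only assumption carried over from the analysis is that a non-positive timeout means "wait with no deadline". It shows that a caller stays parked in Object.wait when no response arrives, while a positive timeout hands control back.

```java
import java.util.concurrent.TimeoutException;

// Simplified model of a "wait for the broker's response" token.
// NOT Paho's source code; all names here are hypothetical.
class WaitDemo {
    static final class ResponseToken {
        private final Object lock = new Object();
        private boolean completed = false;

        // Would be called by the network thread when an acknowledgement arrives.
        void markComplete() {
            synchronized (lock) {
                completed = true;
                lock.notifyAll();
            }
        }

        // Assumption: timeoutMillis <= 0 means "no deadline", mirroring the -1 default.
        void waitForCompletion(long timeoutMillis)
                throws InterruptedException, TimeoutException {
            synchronized (lock) {
                if (timeoutMillis <= 0) {
                    while (!completed) {
                        lock.wait(); // parks until notified or interrupted
                    }
                    return;
                }
                long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
                while (!completed) {
                    long remainingNanos = deadline - System.nanoTime();
                    if (remainingNanos <= 0) {
                        throw new TimeoutException(
                                "no response within " + timeoutMillis + " ms");
                    }
                    // wait(0) would mean "forever", so wait at least 1 ms.
                    lock.wait(Math.max(1, remainingNanos / 1_000_000L));
                }
            }
        }
    }

    // Returns true if a caller is still blocked after observeMillis
    // when no response ever arrives (markComplete is never called).
    static boolean staysBlocked(long timeoutMillis, long observeMillis)
            throws InterruptedException {
        ResponseToken token = new ResponseToken();
        Thread caller = new Thread(() -> {
            try {
                token.waitForCompletion(timeoutMillis);
            } catch (InterruptedException | TimeoutException e) {
                // The caller got control back; nothing more to do in this demo.
            }
        });
        caller.setDaemon(true);
        caller.start();
        caller.join(observeMillis);   // observe the caller for a while
        boolean blocked = caller.isAlive();
        caller.interrupt();           // release the thread if it is still waiting
        return blocked;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("timeout -1  -> still blocked: " + staysBlocked(-1, 500));
        System.out.println("timeout 100 -> still blocked: " + staysBlocked(100, 500));
    }
}
```

Running main is expected to report that the caller with timeout -1 is still blocked after the observation window, while the caller with a 100 ms timeout has already returned. The first case is the situation the scheduled task ended up in.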
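Solution
Give MqttClient's timeToWait a finite wait time, for example mqttClient.setTimeToWait(1000). The value is in milliseconds, so a blocking call such as publish then waits at most about one second before returning control to the caller.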
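A minimal sketch of how the fix might be wired in, assuming the Paho mqttv3 client library is on the classpath. BoundedPublisher and tryPublish are hypothetical names, the broker URI and client ID are supplied by the caller, and the 1000 ms value is only the example figure used above. A publish that times out or fails is expected to surface as an MqttException, so the sketch catches it and lets the scheduled task carry on with its next run.

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;

// Hypothetical wrapper: publishes with a bounded wait instead of the default -1.
class BoundedPublisher {
    private final MqttClient client;

    BoundedPublisher(String brokerUri, String clientId) throws MqttException {
        client = new MqttClient(brokerUri, clientId, new MemoryPersistence());
        // Bound every blocking call (connect, publish, ...) to about one second.
        client.setTimeToWait(1000);
        client.connect();
    }

    // Returns true if the publish completed, false if it timed out or failed.
    boolean tryPublish(String topic, byte[] payload) {
        try {
            client.publish(topic, new MqttMessage(payload));
            return true;
        } catch (MqttException e) {
            // Reached on timeout or when the client is not connected;
            // the reason code helps to tell the cases apart in logs.
            System.err.println("publish failed, reason code " + e.getReasonCode());
            return false;
        }
    }
}
```

With a wrapper like this, the @Scheduled method can call tryPublish and return whether or not the broker answered, so a lost connection costs at most the configured wait per run instead of blocking the scheduler thread indefinitely. Reconnecting after a lost connection is a separate concern that this sketch does not address.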