MQTT connection loss causes a thread to block


Problem scenario

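In a Spring application, a task annotated with @Scheduled publishes a message to an MQTT broker once per second. After a network problem dropped the connection between the MQTT client and the broker (see the connection-loss stack trace below), the scheduled task stopped running and no exception was reported. Inspecting the threads with a diagnostic tool later showed that the thread executing the task was blocked (see the blocked-thread stack trace below).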

Connection-loss stack trace (the exception message 已断开连接 means "connection lost")

org.eclipse.paho.client.mqttv3.MqttException: 已断开连接
	at org.eclipse.paho.client.mqttv3.internal.CommsSender.handleRunException(CommsSender.java:194)
	at org.eclipse.paho.client.mqttv3.internal.CommsSender.run(CommsSender.java:171)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.net.SocketException: Connection reset by peer
	at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:420)
	at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:440)
	at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:826)
	at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1035)
	at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
	at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
	at org.eclipse.paho.client.mqttv3.internal.wire.MqttOutputStream.flush(MqttOutputStream.java:49)
	at org.eclipse.paho.client.mqttv3.internal.CommsSender.run(CommsSender.java:149)
	... 1 common frames omitted

Blocked-thread stack trace

Stack trace:
java.base@17.0.8/java.lang.Object.wait(Native Method)
java.base@17.0.8/java.lang.Object.wait(Object.java:338)
org.eclipse.paho.client.mqttv3.internal.Token.waitForResponse(Token.java:143)
org.eclipse.paho.client.mqttv3.internal.Token.waitForCompletion(Token.java:108)
org.eclipse.paho.client.mqttv3.MqttToken.waitForCompletion(MqttToken.java:67)
org.eclipse.paho.client.mqttv3.MqttClient.publish(MqttClient.java:570)
...... followed by the application's own call frames
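
The effect of one run that never returns can be reproduced without Spring or MQTT. The sketch below is only a stand-in for the real setup: it assumes the scheduled task runs on a single scheduler thread, uses a plain ScheduledExecutorService in place of @Scheduled, and uses a latch that is never released in place of the publish call that never gets a response. The class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BlockedSchedulerDemo {

    /**
     * Schedules a fixed-rate task whose third run blocks, then reports how many
     * runs have started after waiting for many more periods.
     */
    static int countRunsWhenThirdRunBlocks() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch neverReleased = new CountDownLatch(1);   // stand-in for the missing broker response
        CountDownLatch thirdRunStarted = new CountDownLatch(1);

        scheduler.scheduleAtFixedRate(() -> {
            if (runs.incrementAndGet() == 3) {
                thirdRunStarted.countDown();
                try {
                    neverReleased.await();                      // stand-in for a publish that waits forever
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, 0, 20, TimeUnit.MILLISECONDS);

        try {
            thirdRunStarted.await();
            Thread.sleep(300);                                  // about 15 periods pass
            return runs.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        } finally {
            scheduler.shutdownNow();                            // interrupts the blocked run so the JVM can exit
        }
    }

    public static void main(String[] args) {
        System.out.println("runs started: " + countRunsWhenThirdRunBlocks()); // prints "runs started: 3"
    }
}
```

No exception is thrown anywhere: the task simply never finishes its third run, and a fixed-rate task is never executed concurrently with itself, so no fourth run starts.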

Analysis

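The relevant frames of the blocked-thread stack trace are:

org.eclipse.paho.client.mqttv3.internal.Token.waitForResponse(Token.java:143)
org.eclipse.paho.client.mqttv3.internal.Token.waitForCompletion(Token.java:108)
org.eclipse.paho.client.mqttv3.MqttToken.waitForCompletion(MqttToken.java:67)
org.eclipse.paho.client.mqttv3.MqttClient.publish(MqttClient.java:570)

Reading these frames alongside the Paho source shows that the message published at the moment the connection dropped is still waiting for a response. A wait like this would normally be bounded by a timeout, but the default timeout of MqttClient is -1, which means wait indefinitely (see the MqttClient source below).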

MqttClient source code

image.png

In the screenshot above, publish goes on to call waitForCompletion and passes it a wait time, which is the MqttClient default of -1. Next, look at the waitForCompletion method.

image.png

image.png

image.png

The code above matches the frames in the blocked-thread stack trace exactly.
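
The waiting behaviour described here can be modelled in a few lines of plain Java. This is a simplified sketch and not Paho's actual source: the class ResponseWaitModel and its method markComplete are hypothetical, and only the rule "a timeout of 0 or less means wait with no time limit" is taken from the analysis above.

```java
import java.util.concurrent.TimeUnit;

public class ResponseWaitModel {
    private final Object responseLock = new Object();
    private boolean completed = false;

    /** Called when the response arrives; in the scenario above it never does. */
    public void markComplete() {
        synchronized (responseLock) {
            completed = true;
            responseLock.notifyAll();
        }
    }

    /**
     * Waits for completion. A timeout of 0 or less waits with no time limit;
     * a positive timeout gives up after that many milliseconds.
     * Returns true if completed, false if timed out or interrupted.
     */
    public boolean waitForResponse(long timeoutMillis) {
        synchronized (responseLock) {
            try {
                if (timeoutMillis <= 0) {
                    while (!completed) {
                        responseLock.wait();            // unbounded, like the Object.wait frame in the trace
                    }
                    return true;
                }
                long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
                while (!completed) {
                    long remainingNanos = deadline - System.nanoTime();
                    if (remainingNanos <= 0) {
                        return false;                   // bounded wait expired
                    }
                    TimeUnit.NANOSECONDS.timedWait(responseLock, remainingNanos);
                }
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        ResponseWaitModel pending = new ResponseWaitModel();
        // prints "completed within 100 ms: false"
        System.out.println("completed within 100 ms: " + pending.waitForResponse(100));
        // pending.waitForResponse(-1) would never return here, because nothing calls markComplete().
    }
}
```

With a positive timeout the caller gets control back and can decide what to do. With -1 the caller's thread stays parked until something completes the wait.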

Solution

Set timeToWait on the MqttClient to a valid positive wait time, for example mqttClient.setTimeToWait(1000), which limits each wait to 1000 ms.
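
A minimal sketch of how this might look around the publish call, assuming the Paho Java client (org.eclipse.paho.client.mqttv3) is on the classpath and an already connected MqttClient is passed in. The wrapper class BoundedPublisher and its tryPublish method are hypothetical names used for illustration. Error handling is reduced to a log line here, and real code may also want to reconnect.

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;

public class BoundedPublisher {
    private final MqttClient mqttClient;

    public BoundedPublisher(MqttClient mqttClient) {
        this.mqttClient = mqttClient;
        // Limit blocking calls on this client to 1000 ms instead of the default -1.
        this.mqttClient.setTimeToWait(1000);
    }

    /** Returns true if the publish completed, false if it failed or timed out. */
    public boolean tryPublish(String topic, byte[] payload) {
        try {
            mqttClient.publish(topic, new MqttMessage(payload));
            return true;
        } catch (MqttException e) {
            // With a positive timeToWait, a publish that gets no response is expected
            // to end up here rather than parking the scheduler thread indefinitely.
            System.err.println("publish failed, reason code " + e.getReasonCode());
            return false;
        }
    }
}
```

Returning false instead of rethrowing keeps the scheduled method simple: the failed run ends, and the next run one second later gets a fresh attempt.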