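Requirements
While the system is running, the Dubbo thread pool can be in one of three states:
1. Normal operation
The number of active threads in the Dubbo thread pool is small.
2. The Dubbo thread pool exceeds a threshold
When usage approaches the threshold we want an alert, for two reasons: it tells us ahead of time that the pool is close to full, and it gives us data for deciding whether to enlarge the pool.
3. The Dubbo thread pool is full
When the pool is full, Dubbo throws an exception.
This article addresses the second case: monitor the Dubbo thread pool at runtime and raise an alert when the number of active threads exceeds a threshold.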
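How it works
1. Use the utility class that ships with Dubbo
It can read the thread pool data.
2. Compare the data against a threshold
Dubbo's built-in utility class
ThreadPoolStatusChecker reads the thread pool data: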
/**
* ThreadPoolStatusChecker
*/
@Activate
public class ThreadPoolStatusChecker implements StatusChecker {
@Override
public Status check() {
DataStore dataStore = ExtensionLoader.getExtensionLoader(DataStore.class).getDefaultExtension();
Map<String, Object> executors = dataStore.get(Constants.EXECUTOR_SERVICE_COMPONENT_KEY);
StringBuilder msg = new StringBuilder();
Status.Level level = Status.Level.OK;
for (Map.Entry<String, Object> entry : executors.entrySet()) {
String port = entry.getKey();
ExecutorService executor = (ExecutorService) entry.getValue();
if (executor != null && executor instanceof ThreadPoolExecutor) { // make sure it really is a thread pool
ThreadPoolExecutor tp = (ThreadPoolExecutor) executor;
boolean ok = tp.getActiveCount() < tp.getMaximumPoolSize() - 1;
Status.Level lvl = Status.Level.OK;
if (!ok) {
level = Status.Level.WARN;
lvl = Status.Level.WARN;
}
if (msg.length() > 0) {
msg.append(";");
}
msg.append("Pool status:" + lvl
+ ", max:" + tp.getMaximumPoolSize()
+ ", core:" + tp.getCorePoolSize()
+ ", largest:" + tp.getLargestPoolSize()
+ ", active:" + tp.getActiveCount()
+ ", task:" + tp.getTaskCount()
+ ", service port: " + port);
}
}
return msg.length() == 0 ? new Status(Status.Level.UNKNOWN) : new Status(level, msg.toString());
}
}
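The rule the checker applies (`active < max - 1`) can be tried out without Dubbo. The following is a minimal sketch on a plain JDK `ThreadPoolExecutor`; the class `PoolStatusSketch` and its method names are made up for illustration and are not part of Dubbo.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PoolStatusSketch {

    /** The checker's rule: OK only while active < max - 1. */
    static boolean isOk(int active, int max) {
        return active < max - 1;
    }

    /** Occupies `busy` threads of a pool of size `max` (busy <= max) and applies the rule. */
    static boolean okWithBusyThreads(int max, int busy) {
        ThreadPoolExecutor tp = new ThreadPoolExecutor(
                max, max, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch started = new CountDownLatch(busy);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < busy; i++) {
                tp.execute(() -> {
                    started.countDown();
                    awaitQuietly(release); // hold the thread so it counts as active
                });
            }
            awaitQuietly(started); // every submitted task is now running
            return isOk(tp.getActiveCount(), tp.getMaximumPoolSize());
        } finally {
            release.countDown();
            tp.shutdown();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(okWithBusyThreads(3, 1)); // true:  1 < 3 - 1
        System.out.println(okWithBusyThreads(3, 2)); // false: 2 < 3 - 1 does not hold
    }
}
```

In other words, the built-in checker only reports WARN when at most one thread is left free, which is why a percentage-based threshold is more useful for early warning.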
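Comparing against a threshold
The code above is only the utility class that Dubbo provides. How do we actually use it?
Steps:
1. Write a custom Dubbo filter (interceptor)
2. Copy the code out of the Dubbo utility class
Then add a small amount of threshold-comparison code on top of the copied code.
Source code
Custom Dubbo filter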
@Activate(group = {Constants.PROVIDER, Constants.CONSUMER})
public class CatTransaction implements Filter {
public Result invoke(Invoker<?> invoker, Invocation invocation) throws RpcException {
...
CatExecutor.catDubboThreadPool(); // the actual monitoring class
}
}
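For Dubbo to load a custom filter, the class normally also has to be registered through Dubbo's SPI extension file. A minimal sketch, assuming the `com.alibaba.dubbo` 2.x package layout used elsewhere in this article; the key `catTransaction` and the package `com.example.filter` are placeholders, not names from the original project:

```properties
# src/main/resources/META-INF/dubbo/com.alibaba.dubbo.rpc.Filter
catTransaction=com.example.filter.CatTransaction
```

Because the class carries `@Activate` with both the provider and consumer groups, registering it this way should be enough for it to take part in the filter chain on both sides.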
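The monitoring class: it alerts once the number of active threads reaches 50% of the maximum pool size. The thresholds can of course be customized.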
public class CatExecutor {
public static final String DUBBO_THREADPOOL = "DUBBO.ThreadPoolExecutor";
private static int[] thresholdList = new int[] {50, 55, 60, 65, 70, 75, 80, 85, 90, 95}; // thresholds in percent: alerting starts at 50%
public static void catDubboThreadPool() {
try {
DataStore dataStore = ExtensionLoader.getExtensionLoader(DataStore.class).getDefaultExtension();
Map<String, Object> executors = dataStore.get(Constants.EXECUTOR_SERVICE_COMPONENT_KEY);
for (Map.Entry<String, Object> entry : executors.entrySet()) {
ExecutorService executor = (ExecutorService) entry.getValue();
if (executor != null && executor instanceof ThreadPoolExecutor) {
ThreadPoolExecutor tp = (ThreadPoolExecutor) executor;
StringBuffer sb = new StringBuffer();
sb.append("activePoolSize:").append(tp.getActiveCount()).append("&");
sb.append("maxPoolSize:").append(tp.getMaximumPoolSize()).append("&");
sb.append("corePoolSize:").append(tp.getCorePoolSize()).append("&");
sb.append("completedTask:").append(tp.getCompletedTaskCount()).append("&");
sb.append("totalTaskCount:").append(tp.getTaskCount());
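// current usage in percent (active / max * 100), rounded to two decimals
double usagePercent = new BigDecimal((float)tp.getActiveCount()/tp.getMaximumPoolSize() * 100).setScale(2, BigDecimal.ROUND_HALF_UP).doubleValue();
for (int i=0;i<thresholdList.length;i++) {
// compare with each threshold; one event is logged for every threshold at or below the current usage
if (usagePercent >= thresholdList[i]) {
Cat.logEvent(DUBBO_THREADPOOL, "EXCEED_"+thresholdList[i]+"%", Event.SUCCESS, sb.toString());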
}
}
}
}
} catch(Exception e) {
Cat.logError(e);
}
}
}
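The loop above logs one event for every threshold that the current usage has reached, so a pool at 96% produces ten events per call. That may be intentional. If only the highest reached threshold is wanted, the comparison could be written as in the following sketch; `ThresholdBucketSketch`, `usagePercent` and `highestExceeded` are hypothetical names, and the CAT call is left out so that the example runs on the JDK alone.

```java
final class ThresholdBucketSketch {

    // Same percentages as thresholdList in CatExecutor.
    private static final int[] THRESHOLDS = {50, 55, 60, 65, 70, 75, 80, 85, 90, 95};

    /** Pool usage in percent: active threads relative to the maximum pool size. */
    static double usagePercent(int active, int max) {
        return 100.0 * active / max;
    }

    /** Returns the highest threshold that the usage has reached, or -1 if it is below all of them. */
    static int highestExceeded(double usagePercent) {
        int hit = -1;
        for (int t : THRESHOLDS) { // thresholds are sorted in ascending order
            if (usagePercent >= t) {
                hit = t;
            }
        }
        return hit;
    }

    public static void main(String[] args) {
        System.out.println(highestExceeded(usagePercent(100, 500))); // -1 (20%)
        System.out.println(highestExceeded(usagePercent(330, 500))); // 65 (66%)
        System.out.println(highestExceeded(usagePercent(500, 500))); // 95 (100%)
    }
}
```

With this variant the event name would be built from the single returned value, for example `"EXCEED_" + hit + "%"`, and nothing would be logged when the result is -1.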
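Logging only
If you do not need monitoring and alerting, that is, no threshold comparison, then simply logging the data is enough.
How it works
1. Get an instance of the utility class that Dubbo provides
2. Read the data
A custom Dubbo filter is still required. Inside it, obtain the Dubbo utility object, call its method that reads the data, and print the thread pool data. This is almost the same as the monitoring and alerting approach above; the only difference is that the data is written to the log and no alert is raised.
Source code
// Dubbo thread pool monitoring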
Class<?> clazz = Class.forName("com.alibaba.dubbo.rpc.protocol.dubbo.status.ThreadPoolStatusChecker");
Method check = clazz.getMethod("check");
Object result = check.invoke(clazz.newInstance());
logger.info(JSONObject.toJSONString(result)); // write the status to the log
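Thread pool field reference
Sample data
2020-07-09 17:27:02.893|INFO |dlct2FhXhVFR-39-81|xxx.common.filter.dubbo.AccessLogExtFilter.invoke:175||
{"level":"OK",
"message":"Pool status:OK, // pool status: normal
max:500, // maximum pool size
core:500, // core pool size
largest:51, // peak pool size: the largest number of threads that have ever been in the pool at the same time
active:1, // threads currently executing tasks; changes constantly
task:51, // total tasks = completed + currently running + queued
service port: 12029"}
Dubbo source: ThreadPoolStatusChecker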
msg.append("Pool status:" + lvl
+ ", max:" + tp.getMaximumPoolSize()
+ ", core:" + tp.getCorePoolSize()
+ ", largest:" + tp.getLargestPoolSize()
+ ", active:" + tp.getActiveCount()
+ ", task:" + tp.getTaskCount()
+ ", service port: " + port);
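JDK source: ThreadPoolExecutor
1. getLargestPoolSize
- largestPoolSize is the historical peak size of the worker set; it only ever grows. It records the largest number of threads that were in the pool at the same time, not the configured capacity of the pool;
- largestPoolSize <= maximumPoolSize, as long as maximumPoolSize is not lowered at runtime.
// Read path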
/**
* Returns the largest number of threads that have ever
* simultaneously been in the pool.
*
* @return the number of threads
*/
public int getLargestPoolSize() {
final ReentrantLock mainLock = this.mainLock;
mainLock.lock();
try {
return largestPoolSize;
} finally {
mainLock.unlock();
}
}
// Write path
/**
* Checks if a new worker can be added with respect to current
* pool state and the given bound (either core or maximum). If so,
* the worker count is adjusted accordingly, and, if possible, a
* new worker is created and started, running firstTask as its
* first task. This method returns false if the pool is stopped or
* eligible to shut down. It also returns false if the thread
* factory fails to create a thread when asked. If the thread
* creation fails, either due to the thread factory returning
* null, or due to an exception (typically OutOfMemoryError in
* Thread#start), we roll back cleanly.
*
* @param firstTask the task the new thread should run first (or
* null if none). Workers are created with an initial first task
* (in method execute()) to bypass queuing when there are fewer
* than corePoolSize threads (in which case we always start one),
* or when the queue is full (in which case we must bypass queue).
* Initially idle threads are usually created via
* prestartCoreThread or to replace other dying workers.
*
* @param core if true use corePoolSize as bound, else
* maximumPoolSize. (A boolean indicator is used here rather than a
* value to ensure reads of fresh values after checking other pool
* state).
* @return true if successful
*/
private boolean addWorker(Runnable firstTask, boolean core) {
retry:
for (;;) {
int c = ctl.get();
int rs = runStateOf(c);
// Check if queue empty only if necessary.
if (rs >= SHUTDOWN &&
! (rs == SHUTDOWN &&
firstTask == null &&
! workQueue.isEmpty()))
return false;
for (;;) {
int wc = workerCountOf(c);
if (wc >= CAPACITY ||
wc >= (core ? corePoolSize : maximumPoolSize))
return false;
if (compareAndIncrementWorkerCount(c))
break retry;
c = ctl.get(); // Re-read ctl
if (runStateOf(c) != rs)
continue retry;
// else CAS failed due to workerCount change; retry inner loop
}
}
boolean workerStarted = false;
boolean workerAdded = false;
Worker w = null;
try {
final ReentrantLock mainLock = this.mainLock;
w = new Worker(firstTask);
final Thread t = w.thread;
if (t != null) {
mainLock.lock();
try {
// Recheck while holding lock.
// Back out on ThreadFactory failure or if
// shut down before lock acquired.
int c = ctl.get();
int rs = runStateOf(c);
if (rs < SHUTDOWN ||
(rs == SHUTDOWN && firstTask == null)) {
if (t.isAlive()) // precheck that t is startable
throw new IllegalThreadStateException();
workers.add(w);
int s = workers.size();
if (s > largestPoolSize)
largestPoolSize = s; // the only place the value is written
workerAdded = true;
}
} finally {
mainLock.unlock();
}
if (workerAdded) {
t.start();
workerStarted = true;
}
}
} finally {
if (! workerStarted)
addWorkerFailed(w);
}
return workerStarted;
}
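That largestPoolSize never shrinks can be observed directly. The sketch below is an illustration on a plain JDK pool, not Dubbo code; `LargestPoolSizeSketch` and `observe` are hypothetical names. It uses a core size of 0 and a short keep-alive so that all workers exit once they become idle.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class LargestPoolSizeSketch {

    /** Returns {largest while busy, pool size after the idle timeout, largest after the idle timeout}. */
    static int[] observe(int tasks) {
        // core = 0 and a 50 ms keep-alive: every worker exits once it has been idle long enough.
        ThreadPoolExecutor tp = new ThreadPoolExecutor(
                0, tasks, 50L, TimeUnit.MILLISECONDS, new SynchronousQueue<>());
        CountDownLatch started = new CountDownLatch(tasks);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                tp.execute(() -> {
                    started.countDown();
                    awaitQuietly(release);
                });
            }
            awaitQuietly(started); // all tasks run concurrently, one thread each
            int largestWhileBusy = tp.getLargestPoolSize();

            release.countDown();
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (tp.getPoolSize() > 0 && System.nanoTime() < deadline) {
                Thread.sleep(10); // wait for idle workers to time out
            }
            return new int[] {largestWhileBusy, tp.getPoolSize(), tp.getLargestPoolSize()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            tp.shutdown();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[] r = observe(3);
        System.out.println("largest while busy: " + r[0]);
        System.out.println("pool size after idle timeout: " + r[1]);
        System.out.println("largest after idle timeout: " + r[2]);
    }
}
```

The pool size drops back to 0 while the largest value stays at the peak, which matches the sample data earlier where `largest` is far above `active`.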
2. getTaskCount: total tasks = completed tasks + tasks currently running + tasks waiting in the queue
/**
* Counter for completed tasks. Updated only on termination of
* worker threads. Accessed only under mainLock.
*/
private long completedTaskCount; // completed tasks of workers that have already exited
/**
* Returns the approximate total number of tasks that have ever been
* scheduled for execution. Because the states of tasks and
* threads may change dynamically during computation, the returned
* value is only an approximation.
*
* @return the number of tasks
*/
public long getTaskCount() {
final ReentrantLock mainLock = this.mainLock;
mainLock.lock();
try {
long n = completedTaskCount;
for (Worker w : workers) {
n += w.completedTasks;
if (w.isLocked())
++n;
}
return n + workQueue.size(); // completed + running (both counted above) + tasks still waiting in the queue
} finally {
mainLock.unlock();
}
}
Official API
long getTaskCount()
Returns the approximate total number of tasks that have ever been scheduled for execution.
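The breakdown of getTaskCount into completed, running and queued tasks can be checked on a single-threaded pool. This is a minimal sketch on the JDK alone; `TaskCountSketch` and `observe` are hypothetical names. One task is held in the running state while two more wait in the queue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class TaskCountSketch {

    /**
     * Returns {taskCount, completed, active, queued} sampled while the first task is running,
     * followed by {taskCount, completed} sampled after the pool has terminated.
     */
    static long[] observe() {
        ThreadPoolExecutor tp = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            tp.execute(() -> {            // occupies the only thread
                started.countDown();
                awaitQuietly(release);
            });
            tp.execute(() -> { });        // waits in the queue
            tp.execute(() -> { });        // waits in the queue
            awaitQuietly(started);

            long total = tp.getTaskCount();
            long completed = tp.getCompletedTaskCount();
            long active = tp.getActiveCount();
            long queued = tp.getQueue().size();

            release.countDown();
            tp.shutdown();
            tp.awaitTermination(5, TimeUnit.SECONDS);
            return new long[] {total, completed, active, queued,
                    tp.getTaskCount(), tp.getCompletedTaskCount()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            tp.shutdown();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long[] r = observe();
        System.out.println("while running: task=" + r[0] + " completed=" + r[1]
                + " active=" + r[2] + " queued=" + r[3]);
        System.out.println("after shutdown: task=" + r[4] + " completed=" + r[5]);
    }
}
```

While the first task is held, the total equals completed + active + queued; after shutdown every task has moved into the completed count and the total is unchanged.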