Dubbo thread pool monitoring


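Requirements

At runtime, the system can be in one of three states:
1. Normal operation
The number of active threads in the Dubbo thread pool is small.

2. The Dubbo thread pool exceeds a threshold
When usage approaches the threshold we want an alert, for two reasons: it gives early warning that the pool is about to fill up, and it provides data for deciding whether to increase the pool size.

3. The Dubbo thread pool is full
When the pool is full, Dubbo throws an exception.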
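What "pool full" means can be reproduced with a plain JDK pool. The sketch below assumes a pool with no task queue and an abort-style rejection policy; that is an assumption about the Dubbo configuration in use, and Dubbo's own rejection handler is not involved. The class name `PoolFullDemo` and the pool size are illustrative.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PoolFullDemo {

    /** Returns true if submitting one task more than the pool can hold is rejected. */
    static boolean isRejectedWhenFull(int maxThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>(),          // no queue capacity (assumption)
                new ThreadPoolExecutor.AbortPolicy());     // throws when saturated
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await();                           // keep the thread busy
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            for (int i = 0; i < maxThreads; i++) {
                pool.execute(blocker);                     // each call starts a new thread
            }
            pool.execute(blocker);                         // no free thread and no queue slot
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected when full: " + isRejectedWhenFull(2));
    }
}
```

Monitoring the second state is what lets you react before this rejection starts to happen.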


The goal here is to handle the second case: monitor the Dubbo thread pool and raise an alert when the number of active threads exceeds a threshold.

How it works

1. Use Dubbo's built-in utility class
It can read the thread pool's statistics.

2. Compare against a threshold
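The two steps can be sketched with nothing but the JDK's `ThreadPoolExecutor`. This is a minimal illustration, not the Dubbo integration itself: the class `PoolUsageCheck` and its method names are hypothetical, and the pool is created locally instead of being looked up from Dubbo.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PoolUsageCheck {

    /** Active threads as a percentage of the maximum pool size. */
    static double usagePercent(int activeCount, int maximumPoolSize) {
        if (maximumPoolSize <= 0) {
            return 0.0; // defensive guard; a real ThreadPoolExecutor always has max > 0
        }
        return activeCount * 100.0 / maximumPoolSize;
    }

    /** True when the pool's current usage has reached the given percentage. */
    static boolean exceeds(ThreadPoolExecutor pool, int thresholdPercent) {
        return usagePercent(pool.getActiveCount(), pool.getMaximumPoolSize()) >= thresholdPercent;
    }

    public static void main(String[] args) {
        // Step 1: read the pool's statistics. Step 2: compare with a threshold.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        System.out.println("usage=" + usagePercent(pool.getActiveCount(), pool.getMaximumPoolSize())
                + "%, alert=" + exceeds(pool, 50));
        pool.shutdown();
    }
}
```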

Dubbo's built-in utility class

It reads the thread pool's statistics.

/**
 * ThreadPoolStatusChecker
 */
@Activate
public class ThreadPoolStatusChecker implements StatusChecker {

    @Override
    public Status check() {
        DataStore dataStore = ExtensionLoader.getExtensionLoader(DataStore.class).getDefaultExtension();
        Map<String, Object> executors = dataStore.get(Constants.EXECUTOR_SERVICE_COMPONENT_KEY);

        StringBuilder msg = new StringBuilder();
        Status.Level level = Status.Level.OK;
        for (Map.Entry<String, Object> entry : executors.entrySet()) {
            String port = entry.getKey();
            ExecutorService executor = (ExecutorService) entry.getValue();

            if (executor != null && executor instanceof ThreadPoolExecutor) { // only handle executors that are thread pools
                ThreadPoolExecutor tp = (ThreadPoolExecutor) executor;
                boolean ok = tp.getActiveCount() < tp.getMaximumPoolSize() - 1;
                Status.Level lvl = Status.Level.OK;
                if (!ok) {
                    level = Status.Level.WARN;
                    lvl = Status.Level.WARN;
                }

                if (msg.length() > 0) {
                    msg.append(";");
                }
                msg.append("Pool status:" + lvl
                        + ", max:" + tp.getMaximumPoolSize()
                        + ", core:" + tp.getCorePoolSize()
                        + ", largest:" + tp.getLargestPoolSize()
                        + ", active:" + tp.getActiveCount()
                        + ", task:" + tp.getTaskCount()
                        + ", service port: " + port);
            }
        }
        return msg.length() == 0 ? new Status(Status.Level.UNKNOWN) : new Status(level, msg.toString());
    }

}

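Comparing against a threshold

The class above is only the utility that Dubbo provides. How do we actually use it?
Steps:
1. Write a custom Dubbo filter (interceptor).
2. Copy the code out of the Dubbo utility class,
then add a small amount of threshold-comparison code on top of it.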


Source code

Custom Dubbo filter

@Activate(group = {Constants.PROVIDER, Constants.CONSUMER})
public class CatTransaction implements Filter {

public Result invoke(Invoker<?> invoker, Invocation invocation) throws RpcException {
   ...
   CatExecutor.catDubboThreadPool(); // the monitoring class shown below
}

}

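The monitoring class: an alert is raised once the number of active threads reaches 50% of the maximum pool size. The threshold values can of course be customized.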

public class CatExecutor {

    public static final String DUBBO_THREADPOOL = "DUBBO.ThreadPoolExecutor";

    private static int[] thresholdList = new int[] {50, 55, 60, 65, 70, 75, 80, 85, 90, 95}; // thresholds in percent: alerting starts at 50%

    public static void catDubboThreadPool() {
        try {
            DataStore dataStore = ExtensionLoader.getExtensionLoader(DataStore.class).getDefaultExtension();
            Map<String, Object> executors = dataStore.get(Constants.EXECUTOR_SERVICE_COMPONENT_KEY);

            for (Map.Entry<String, Object> entry : executors.entrySet()) {
                ExecutorService executor = (ExecutorService) entry.getValue();

                if (executor != null && executor instanceof ThreadPoolExecutor) {
                    ThreadPoolExecutor tp = (ThreadPoolExecutor) executor;
                    StringBuffer sb = new StringBuffer();
                    sb.append("activePoolSize:").append(tp.getActiveCount()).append("&");
                    sb.append("maxPoolSize:").append(tp.getMaximumPoolSize()).append("&");
                    sb.append("corePoolSize:").append(tp.getCorePoolSize()).append("&");
                    sb.append("completedTask:").append(tp.getCompletedTaskCount()).append("&");
                    sb.append("totalTaskCount:").append(tp.getTaskCount());

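                    // Active threads as a percentage of the maximum pool size, rounded to two decimals
                    double usagePercent = new BigDecimal((float) tp.getActiveCount() / tp.getMaximumPoolSize() * 100).setScale(2, BigDecimal.ROUND_HALF_UP).doubleValue();
                    for (int i = 0; i < thresholdList.length; i++) {
                        // Compare against each threshold; one event is logged for every threshold reached
                        if (usagePercent >= thresholdList[i]) {
                            Cat.logEvent(DUBBO_THREADPOOL, "EXCEED_" + thresholdList[i] + "%", Event.SUCCESS, sb.toString());
                        }
                    }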
                }
            }
        } catch(Exception e) {
            Cat.logError(e);
        }
    }
}
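Note that the loop in `CatExecutor` logs one event for every threshold that has been reached, so at 72% usage it emits `EXCEED_50%` through `EXCEED_70%`. If only the highest band is wanted, a variant could look like the sketch below. The class `ThresholdBands` and the method `highestReached` are hypothetical names, and the code assumes the thresholds are sorted in ascending order, as they are in `thresholdList`.

```java
final class ThresholdBands {

    /**
     * Returns the highest threshold that usagePercent has reached,
     * or -1 if it is below all of them. Assumes ascending order.
     */
    static int highestReached(double usagePercent, int[] ascendingThresholds) {
        int reached = -1;
        for (int threshold : ascendingThresholds) {
            if (usagePercent >= threshold) {
                reached = threshold;
            } else {
                break; // all remaining thresholds are larger
            }
        }
        return reached;
    }

    public static void main(String[] args) {
        int[] thresholds = {50, 55, 60, 65, 70, 75, 80, 85, 90, 95};
        int band = highestReached(72.5, thresholds);
        // Event name follows the "EXCEED_<n>%" pattern used by CatExecutor.
        System.out.println(band < 0 ? "below all thresholds" : "EXCEED_" + band + "%");
    }
}
```

Whether to log every band or only the highest one depends on how the alert rules are configured in the monitoring system.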

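If you only need simple logging

If monitoring and alerting are not required, that is, no threshold comparison is needed, simply logging the statistics is enough.


How it works

1. Obtain an instance of the utility class that Dubbo provides.

2. Read the data.
A custom Dubbo filter is still needed. Inside it, obtain the Dubbo utility object, call its method that reads the statistics, and log the thread pool data. This is almost the same as the monitoring and alerting approach above; the only difference is that the data is written to the log and no alert is raised.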


Source code

// Dubbo thread pool monitoring: call ThreadPoolStatusChecker.check() via reflection
Class<?> clazz = Class.forName("com.alibaba.dubbo.rpc.protocol.dubbo.status.ThreadPoolStatusChecker");
Method check = clazz.getMethod("check");
Object result = check.invoke(clazz.newInstance());
logger.info(JSONObject.toJSONString(result)); // write the result to the log

Thread pool field reference

Sample data

2020-07-09 17:27:02.893|INFO |dlct2FhXhVFR-39-81|xxx.common.filter.dubbo.AccessLogExtFilter.invoke:175||
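{"level":"OK",
"message":"Pool status:OK, // pool status: normal
max:500, // maximum pool size
core:500, // core pool size
largest:51, // peak thread count: the largest number of threads the pool has ever held at once
active:1, // active threads; changes constantly
task:51, // total tasks = completed + not yet completed (running or queued)
service port: 12029"}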

Dubbo source: ThreadPoolStatusChecker

msg.append("Pool status:" + lvl
                        + ", max:" + tp.getMaximumPoolSize()
                        + ", core:" + tp.getCorePoolSize()
                        + ", largest:" + tp.getLargestPoolSize()
                        + ", active:" + tp.getActiveCount()
                        + ", task:" + tp.getTaskCount()
                        + ", service port: " + port);

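JDK source: ThreadPoolExecutor

1. getLargestPoolSize

  • largestPoolSize is the historical maximum size of the worker set and only ever increases. It records the largest number of threads that have been in the pool at the same time, and is unrelated to the pool's configured capacity;
  • largestPoolSize <= maximumPoolSize, as long as maximumPoolSize is not lowered at runtime.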
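The "only increases" behavior can be observed directly. The following sketch is a hypothetical demo class (`LargestPoolSizeDemo`), not part of Dubbo or the JDK: it runs a few tasks, lets the idle threads time out, and then compares the current pool size with the recorded peak.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class LargestPoolSizeDemo {

    /** Returns {pool size after idle threads have timed out, largest pool size}. */
    static int[] run(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                tasks, tasks, 50L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        pool.allowCoreThreadTimeOut(true); // let idle core threads exit so the pool can shrink
        CountDownLatch done = new CountDownLatch(tasks);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(done::countDown); // below corePoolSize, each call adds a worker
            }
            done.await();
            // Wait (bounded) for the idle threads to time out.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (pool.getPoolSize() > 0 && System.nanoTime() < deadline) {
                Thread.sleep(20);
            }
            return new int[] {pool.getPoolSize(), pool.getLargestPoolSize()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] r = run(3);
        System.out.println("poolSize=" + r[0] + ", largestPoolSize=" + r[1]);
    }
}
```

The current pool size shrinks once threads exit, while the peak stays at the value reached during the burst, which is why `largest` in the log is useful for capacity planning.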

// Read side: the getter

/**
     * Returns the largest number of threads that have ever
     * simultaneously been in the pool.
     *
     * @return the number of threads
     */
    public int getLargestPoolSize() {
        final ReentrantLock mainLock = this.mainLock;
        mainLock.lock();
        try {
            return largestPoolSize;
        } finally {
            mainLock.unlock();
        }
    }

// Write side: largestPoolSize is updated in addWorker

/**
     * Checks if a new worker can be added with respect to current
     * pool state and the given bound (either core or maximum). If so,
     * the worker count is adjusted accordingly, and, if possible, a
     * new worker is created and started, running firstTask as its
     * first task. This method returns false if the pool is stopped or
     * eligible to shut down. It also returns false if the thread
     * factory fails to create a thread when asked.  If the thread
     * creation fails, either due to the thread factory returning
     * null, or due to an exception (typically OutOfMemoryError in
     * Thread#start), we roll back cleanly.
     *
     * @param firstTask the task the new thread should run first (or
     * null if none). Workers are created with an initial first task
     * (in method execute()) to bypass queuing when there are fewer
     * than corePoolSize threads (in which case we always start one),
     * or when the queue is full (in which case we must bypass queue).
     * Initially idle threads are usually created via
     * prestartCoreThread or to replace other dying workers.
     *
     * @param core if true use corePoolSize as bound, else
     * maximumPoolSize. (A boolean indicator is used here rather than a
     * value to ensure reads of fresh values after checking other pool
     * state).
     * @return true if successful
     */
    private boolean addWorker(Runnable firstTask, boolean core) {
        retry:
        for (;;) {
            int c = ctl.get();
            int rs = runStateOf(c);

            // Check if queue empty only if necessary.
            if (rs >= SHUTDOWN &&
                ! (rs == SHUTDOWN &&
                   firstTask == null &&
                   ! workQueue.isEmpty()))
                return false;

            for (;;) {
                int wc = workerCountOf(c);
                if (wc >= CAPACITY ||
                    wc >= (core ? corePoolSize : maximumPoolSize))
                    return false;
                if (compareAndIncrementWorkerCount(c))
                    break retry;
                c = ctl.get();  // Re-read ctl
                if (runStateOf(c) != rs)
                    continue retry;
                // else CAS failed due to workerCount change; retry inner loop
            }
        }

        boolean workerStarted = false;
        boolean workerAdded = false;
        Worker w = null;
        try {
            final ReentrantLock mainLock = this.mainLock;
            w = new Worker(firstTask);
            final Thread t = w.thread;
            if (t != null) {
                mainLock.lock();
                try {
                    // Recheck while holding lock.
                    // Back out on ThreadFactory failure or if
                    // shut down before lock acquired.
                    int c = ctl.get();
                    int rs = runStateOf(c);

                    if (rs < SHUTDOWN ||
                        (rs == SHUTDOWN && firstTask == null)) {
                        if (t.isAlive()) // precheck that t is startable
                            throw new IllegalThreadStateException();
                        workers.add(w);
                        int s = workers.size();
                        if (s > largestPoolSize)
                            largestPoolSize = s; // the write: record the new peak
                        workerAdded = true;
                    }
                } finally {
                    mainLock.unlock();
                }
                if (workerAdded) {
                    t.start();
                    workerStarted = true;
                }
            }
        } finally {
            if (! workerStarted)
                addWorkerFailed(w);
        }
        return workerStarted;
    }

www.jianshu.com/p/d05c488b8…

2. getTaskCount: total tasks = completed tasks + tasks currently running + tasks still waiting in the queue

/**
     * Counter for completed tasks. Updated only on termination of
     * worker threads. Accessed only under mainLock.
     */
    private long completedTaskCount; // tasks completed by worker threads that have already terminated
    
/**
     * Returns the approximate total number of tasks that have ever been
     * scheduled for execution. Because the states of tasks and
     * threads may change dynamically during computation, the returned
     * value is only an approximation.
     *
     * @return the number of tasks
     */
    public long getTaskCount() {
        final ReentrantLock mainLock = this.mainLock;
        mainLock.lock();
        try {
            long n = completedTaskCount;
            for (Worker w : workers) {
                n += w.completedTasks;
                if (w.isLocked())
                    ++n;
            }
            return n + workQueue.size(); // completed + currently running (both counted above) + tasks still waiting in the queue
        } finally {
            mainLock.unlock();
        }
    }
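The composition of `getTaskCount()` can be checked with a small experiment. The sketch below uses a hypothetical class `TaskCountDemo` with a single-thread pool: one task finishes, one is held running, and two wait in the queue, and the counters are sampled at that moment.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class TaskCountDemo {

    /** Returns {completed, active, queued, taskCount}, sampled while one task is blocked. */
    static long[] sample() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        CountDownLatch blockerStarted = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            pool.execute(() -> { });               // finishes before the blocker starts
            pool.execute(() -> {                   // stays running until released
                blockerStarted.countDown();
                awaitQuietly(release);
            });
            pool.execute(() -> { });               // waits in the queue
            pool.execute(() -> { });               // waits in the queue
            awaitQuietly(blockerStarted);          // sample only once the blocker is running
            return new long[] {
                pool.getCompletedTaskCount(),
                pool.getActiveCount(),
                pool.getQueue().size(),
                pool.getTaskCount()
            };
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long[] s = sample();
        System.out.println("completed=" + s[0] + ", active=" + s[1]
                + ", queued=" + s[2] + ", taskCount=" + s[3]);
    }
}
```

In a busy pool the values are only approximate, as the Javadoc says, because tasks keep changing state while the counters are being read.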

Official API

long	getTaskCount()
          Returns the approximate total number of tasks that have ever been scheduled for execution.

doc.yonyoucloud.com/doc/jdk6-ap…