一、问题
最近线上有一个故障,原因竟然是new JedisPool太慢。有多慢呢?长达分钟级别。主要原因是:commons pool默认开启了JMX(Java Management Extensions)功能,而JMX会为每个管理的bean创建一个名字,为了防止名字重复,为每个名字加了后缀编号,每次从编号0开始尝试这个名字有没有被用过,如果用过的话,递增编号。
update20210530这个服务(记为服务A)比较特殊,会调用很多的下游服务,该服务A本身很少发生变更,但是依赖的下游服务都是实例比较多的服务而且经常因版本迭代而上下线。某次,一个大的下游服务上下线时,导致服务A,耗时增长,线程耗尽。
排查方法:既然线程耗尽,可以当抛出Reject异常时,将线程堆栈打印出来(即,执行jstack命令)。
二、源码分析
1. GenericObjectPool
看下super构造方法
问题的入口点就是这里。默认JMX是enable的,需要进行JMX注册。
2. JMX注册
/**
* Registers the pool with the platform MBean server.
* The registered name will be
* <code>jmxNameBase + jmxNamePrefix + i</code> where i is the least
* integer greater than or equal to 1 such that the name is not already
* registered. Swallows MBeanRegistrationException, NotCompliantMBeanException
* returning null.
*
* @param config Pool configuration
* @param jmxNameBase default base JMX name for this pool
* @param jmxNamePrefix name prefix
* @return registered ObjectName, null if registration fails
*/
private ObjectName jmxRegister(BaseObjectPoolConfig config,
String jmxNameBase, String jmxNamePrefix) {
ObjectName objectName = null;
// 单例的MBeanServer,所有的被管理的对象都会注册到里面
MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
int i = 1;
boolean registered = false;
String base = config.getJmxNameBase();
if (base == null) {
base = jmxNameBase;
}
// 不停的尝试,直到获得未被使用过的名字
while (!registered) {
try {
ObjectName objName;
// Skip the numeric suffix for the first pool in case there is
// only one so the names are cleaner.
if (i == 1) {
objName = new ObjectName(base + jmxNamePrefix);
} else {
objName = new ObjectName(base + jmxNamePrefix + i);
}
// 这里进行具体的名字校验逻辑,如果被使用过抛出InstanceAlreadyExistsException
mbs.registerMBean(this, objName);
objectName = objName;
registered = true;
} catch (MalformedObjectNameException e) {
if (BaseObjectPoolConfig.DEFAULT_JMX_NAME_PREFIX.equals(
jmxNamePrefix) && jmxNameBase.equals(base)) {
// Shouldn't happen. Skip registration if it does.
registered = true;
} else {
// Must be an invalid name. Use the defaults instead.
jmxNamePrefix =
BaseObjectPoolConfig.DEFAULT_JMX_NAME_PREFIX;
base = jmxNameBase;
}
} catch (InstanceAlreadyExistsException e) {
// Increment the index and try again
i++;
} catch (MBeanRegistrationException e) {
// Shouldn't happen. Skip registration if it does.
registered = true;
} catch (NotCompliantMBeanException e) {
// Shouldn't happen. Skip registration if it does.
registered = true;
}
}
return objectName;
}
代码看到这里基本上就可以发现问题了。可以看到,mBean的name编码是从0开始尝试。如果JVM已经有很多的此类mBean了,将会可能很慢。
3. mbs.registerMBean
需要注意的是:上面代码创建的MBeanServer是单例的。
mbs.registerMBean(该mbs实现是JmxMBeanServer)会调用到DefaultMBeanServerInterceptor的registerMBean方法。最终来到DefaultMBeanServerInterceptor的registerObject方法:
private ObjectInstance registerObject(String classname,
Object object, ObjectName name)
throws InstanceAlreadyExistsException,
MBeanRegistrationException,
NotCompliantMBeanException {
if (object == null) {
final RuntimeException wrapped =
new IllegalArgumentException("Cannot add null object");
throw new RuntimeOperationsException(wrapped,
"Exception occurred trying to register the MBean");
}
// 这里会将object封装成一个MBean对象。针对GenericObjectPool会被封装成MXBeanSupport
DynamicMBean mbean = Introspector.makeDynamicMBean(object);
// 进行具体的注册逻辑
return registerDynamicMBean(classname, mbean, name);
}
这个方法有两个核心逻辑:
-
将被注册的对象封装成MBean对象。生成的MXBeanSupport实现了DynamicMBean2, MBeanRegistration接口。注意,由于存在名字冲突的可能,同一个被注册的对象会被封装成多个MBean对象,在后面的分析中可以发现,这些MBean对象都没有被释放,造成内存泄漏。
-
进行具体的注册逻辑,详见下面的分析
4. registerDynamicMBean
private ObjectInstance registerDynamicMBean(String classname,
DynamicMBean mbean,
ObjectName name)
throws InstanceAlreadyExistsException {
// 删除非关键逻辑
try {
if (mbean instanceof DynamicMBean2) {
try {
// 此处缓存MBean到name的映射
((DynamicMBean2) mbean).preRegister2(server, logicalName);
registerFailed = true; // until we succeed
} catch (Exception e) {
// 异常处理
}
}
final Object resource = getResource(mbean);
// Register the MBean with the repository.
// Returns the resource context that was used.
// The returned context does nothing for regular MBeans.
// For ClassLoader MBeans the context makes it possible to register these
// objects with the appropriate framework artifacts, such as
// the CLR, from within the repository lock.
// In case of success, we also need to call context.done() at the
// end of this method.
//
// 此处缓存name到MBean的映射
context = registerWithRepository(resource, mbean, logicalName);
registerFailed = false;
registered = true;
} finally {
try {
// 这里很关键!!
postRegister(logicalName, mbean, registered, registerFailed);
} finally {
if (registered && context!=null) context.done();
}
}
return new ObjectInstance(logicalName, classname);
}
这个方法有两个关键逻辑:
- 缓存MBean到name的映射
这里会将每次封装的MXBean对象与本次的name进行一次映射。没有判断名字是否被使用。
- 缓存name到MBean的映射
这里会对name进行唯一性校验。会抛出InstanceAlreadyExistsException异常。
下面分别看这两个方法的具体实现。
5. preRegister2
该方法缓存MBean到name的映射。具体实现是MXBeanSupport的register方法。
public void register(MBeanServer server, ObjectName name)
throws InstanceAlreadyExistsException {
if (name == null)
throw new IllegalArgumentException("Null object name");
// eventually we could have some logic to supply a default name
synchronized (lock) {
// 每个MBeanServer有唯一一个mxbeanLookup
this.mxbeanLookup = MXBeanLookup.lookupFor(server);
// 此场景getResource()返回的就是GenericObjectPool,与其name进行映射
this.mxbeanLookup.addReference(name, getResource());
this.objectName = name;
}
}
mxbeanLookup.addReference
synchronized void addReference(ObjectName name, Object mxbean)
throws InstanceAlreadyExistsException {
ObjectName existing = mxbeanToObjectName.get(mxbean);
if (existing != null) {
String multiname = AccessController.doPrivileged(
new GetPropertyAction("jmx.mxbean.multiname"));
if (!"true".equalsIgnoreCase(multiname)) {
throw new InstanceAlreadyExistsException(
"MXBean already registered with name " + existing);
}
}
mxbeanToObjectName.put(mxbean, name);
}
注意这里还有个小问题:如果mbean到name的映射没有问题,但是name到mbean的映射有问题,进行下次重试循环,岂不是mbean到name的映射会很多?
很妙的一个地方是:
// 此处缓存MBean到name的映射
((DynamicMBean2) mbean).preRegister2(server, logicalName);
registerFailed = true; // until we succeed
这个地方竟然赋值registerFailed为true,表示注册失败了。在registerDynamicMBean方法中有个finally的postRegister方法。
private static void postRegister(
ObjectName logicalName, DynamicMBean mbean,
boolean registrationDone, boolean registerFailed) {
if (registerFailed && mbean instanceof DynamicMBean2)
// 此处进行一下mbean到name的移除
((DynamicMBean2) mbean).registerFailed();
try {
if (mbean instanceof MBeanRegistration)
((MBeanRegistration) mbean).postRegister(registrationDone);
} catch (RuntimeException e) {
//
} catch (Error er) {
//
}
}
@Override
public void unregister() {
synchronized (lock) {
if (mxbeanLookup != null) {
if (mxbeanLookup.removeReference(objectName, getResource()))
objectName = null;
}
}
}
6. registerWithRepository
该方法缓存name到MBean的映射。
private ResourceContext registerWithRepository(
final Object resource,
final DynamicMBean object,
final ObjectName logicalName)
throws InstanceAlreadyExistsException,
MBeanRegistrationException {
// 删除部分逻辑
repository.addMBean(object, logicalName, context);
// May throw InstanceAlreadyExistsException
// 删除部分逻辑
return context;
}
repository.addMBean
public void addMBean(final DynamicMBean object, ObjectName name,
final RegistrationContext context)
throws InstanceAlreadyExistsException {
// 删除部分逻辑
lock.writeLock().lock();
try {
// If domain does not already exist, add it to the hash table
final Map<String,NamedObject> moiTb = domainTb.get(dom);
if (moiTb == null) {
addNewDomMoi(object, dom, name, context);
return;
} else {
// 此处是关键
// Add instance if not already present
String cstr = name.getCanonicalKeyPropertyListString();
NamedObject elmt= moiTb.get(cstr);
if (elmt != null) {
throw new InstanceAlreadyExistsException(name.toString());
} else {
nbElements++;
addMoiToTb(object,name,cstr,moiTb,context);
}
}
} finally {
lock.writeLock().unlock();
}
}
从上面的代码可以看到如果name已经映射了mBean就会抛出异常InstanceAlreadyExistsException。
至此问题的根本原因已经找到。这面进行问题复现。
三、问题复现
看了commons pool的源码之后,理论分析出问题的原因后,复现下问题。
复现思路:这个问题出现在JedisPool的构建时期,可以多线程并发构建JedisPool看下效果。
代码依赖如下:
<dependency>
<groupId>redis.clients</groupId>
<artifactId>jedis</artifactId>
<version>2.9.0</version>
</dependency>
代码如下:
package org.example.pool;
import redis.clients.jedis.JedisPool;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
public class PoolTest {
private static volatile long time = System.currentTimeMillis();
private static AtomicInteger count = new AtomicInteger(0);
public static void main(String[] args) {
Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(new Runnable() {
@Override
public void run() {
time = System.currentTimeMillis();
}
}, 1, 1, TimeUnit.SECONDS);
int size = 1000;
ExecutorService executorService = Executors.newFixedThreadPool(size);
for (int i = 0; i < size; i++) {
executorService.execute(new Runnable() {
@Override
public void run() {
while (true) {
count.incrementAndGet();
long startTime = time;
JedisPool jedisPool = new JedisPool();
long cost = time - startTime;
if (cost > 2 * 60 * 1000) {
System.out.println("timeout, count: " + count.get());
break;
}
if (count.get() % 500 == 0) {
System.out.println("count: " + count.get() + " cost: " + cost);
}
}
}
});
}
try {
System.in.read();
} catch (IOException e) {
e.printStackTrace();
}
}
}
执行结果如下:
可以发现,当创建2000个对象的时候,在多并发情况下就会耗时长达15s;当创建9000个对象的时候,耗时已经耗费2分钟。
四、JMX验证
既然打开了JMX,那可以用jconsole进行分析一下。
1. 概览
内存在持续增长。
2. MBean
可以看到GenericObjectPool多达1W个。
3. 内存
内存持续增长。
4. 线程
可以看到线程的堆栈,分析线程都卡在了什么地方。
五、JMAP验证
使用jmap dump一下堆内存,看下内存对象是否符合预期。
jmap -dump:format=b,file=heapdump.phrof pid
维护name到mxBean的映射。
再看下这个HashMap中的value字段。是个MXBeanSupport。这个对象也很重。维护者mxBean到name的映射。
堆dump中没有太多的JedisPool,都被GC了。不是很多的GenericObjectPool对象都被引用着吗?为什么会被GC了呢?不是GC只会收集未被引用的对象?主要是因为JedisPool并不是继承自GenericObjectPool。代码如下:
GenericObjectPool很多,而且里面有3个长度为100的AtomicLong数据组。
每个GenericObjectPool都有3个长度为100的AtomicLong数据组。这个也很坑。在工程场景下,如果用很多JedisPool进行连接池管理,会很耗费内存。
六、问题解决
经过分析要解决这个问题就很简单了。只需要在pool config的时候关闭JMX功能就行了。
JedisPoolConfig jedisPoolConfig = new JedisPoolConfig();
jedisPoolConfig.setJmxEnabled(false);
JedisPool jedisPool = new JedisPool(jedisPoolConfig, "localhost");
七、进一步思考
在RPC场景下,使用JedisPool作为连接池,由于会有频繁的实例上下线,实例上线的话自然会创建连接,无话可说。如果是实例下线了,自然会有实例下线事件被感知,负载均衡策略不会再发现这个实例了,但是依然需要将该实例的连接池进行销毁。
jedisPool.destroy(); 这个方法最终会将上面的两个映射缓存移除。具体实现在DefaultMBeanServerInterceptor#exclusiveUnregisterMBean方法中:
private void exclusiveUnregisterMBean(ObjectName name)
throws InstanceNotFoundException, MBeanRegistrationException {
DynamicMBean instance = getMBean(name);
// may throw InstanceNotFoundException
checkMBeanPermission(instance, null, name, "unregisterMBean");
if (instance instanceof MBeanRegistration)
preDeregisterInvoke((MBeanRegistration) instance);
final Object resource = getResource(instance);
// Unregisters the MBean from the repository.
// Returns the resource context that was used.
// The returned context does nothing for regular MBeans.
// For ClassLoader MBeans and JMXNamespace (and JMXDomain)
// MBeans - the context makes it possible to unregister these
// objects from the appropriate framework artifacts, such as
// the CLR or the dispatcher, from within the repository lock.
// In case of success, we also need to call context.done() at the
// end of this method.
//
// 取消name到mBean的映射
final ResourceContext context =
unregisterFromRepository(resource, instance, name);
try {
// 取消mBean到name的映射
if (instance instanceof MBeanRegistration)
postDeregisterInvoke(name,(MBeanRegistration) instance);
} finally {
context.done();
}
}
硬广告
欢迎关注公众号:double6