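Today I want to share a jank problem I ran into a while ago when investigating ANRs. It was long enough ago that I can no longer find the stack trace. All I remember is that the main thread was blocked for a long time inside QueuedWork.waitToFinish, and that the business-side trigger was SharedPreferences.apply.
Doesn't SharedPreferences.apply do its work on a background thread? Why would it still stall the main thread? Let's start from the apply flow.
Problem analysis
When sp commits, it runs commitToMemory and enqueueDiskWrite directly on the calling thread: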
// SharedPreferencesImpl.java
@Override
public boolean commit() {
    MemoryCommitResult mcr = commitToMemory();
    // A null postWriteRunnable marks this as a synchronous commit.
    SharedPreferencesImpl.this.enqueueDiskWrite(mcr, null);
    try {
        // Block the calling thread until the write has finished.
        mcr.writtenToDiskLatch.await();
    } catch (InterruptedException e) {
        return false;
    }
    notifyListeners(mcr);
    return mcr.writeToDiskResult;
}
The apply logic is as follows:
@Override
public void apply() {
    final MemoryCommitResult mcr = commitToMemory();
    // Finisher: blocks whoever runs it until this write has reached disk.
    final Runnable awaitCommit = new Runnable() {
        @Override
        public void run() {
            try {
                mcr.writtenToDiskLatch.await();
            } catch (InterruptedException ignored) {
            }
        }
    };
    QueuedWork.addFinisher(awaitCommit);
    // Runs right after the disk write and unregisters the finisher.
    Runnable postWriteRunnable = new Runnable() {
        @Override
        public void run() {
            awaitCommit.run();
            QueuedWork.removeFinisher(awaitCommit);
        }
    };
    SharedPreferencesImpl.this.enqueueDiskWrite(mcr, postWriteRunnable);
    notifyListeners(mcr);
}
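To make the difference between the in-memory commit and the queued disk write concrete, here is a minimal plain-Java sketch of the apply flow. It is not the AOSP code: ApplyFlowSketch, diskGate and the two maps are hypothetical stand-ins, and a single-thread executor plays the role of the QueuedWork handler thread.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for SharedPreferencesImpl; not the AOSP code. */
class ApplyFlowSketch {
    final Map<String, String> memory = new ConcurrentHashMap<>();
    final Map<String, String> disk = new ConcurrentHashMap<>();
    // Plays the role of QueuedWork.sFinishers.
    final Queue<Runnable> finishers = new ConcurrentLinkedQueue<>();
    // Lets the caller decide when the simulated disk is allowed to finish.
    final CountDownLatch diskGate = new CountDownLatch(1);
    // Plays the role of the QueuedWork handler thread.
    private final ExecutorService writer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "queued-work-looper");
        t.setDaemon(true);
        return t;
    });

    /** Mirrors apply(): memory is updated now, the disk write is only queued. */
    void apply(String key, String value) {
        memory.put(key, value);                        // commitToMemory()
        CountDownLatch writtenToDisk = new CountDownLatch(1);
        Runnable awaitCommit = () -> awaitQuietly(writtenToDisk);
        finishers.add(awaitCommit);                    // QueuedWork.addFinisher()
        writer.execute(() -> {                         // enqueueDiskWrite()
            awaitQuietly(diskGate);                    // pretend the disk is slow
            disk.put(key, value);                      // writeToFile()
            writtenToDisk.countDown();
            finishers.remove(awaitCommit);             // QueuedWork.removeFinisher()
        });
    }

    /** Mirrors the finisher loop of waitToFinish(): blocks the calling thread. */
    void waitToFinish() {
        Runnable finisher;
        while ((finisher = finishers.poll()) != null) {
            finisher.run();
        }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ApplyFlowSketch prefs = new ApplyFlowSketch();
        prefs.apply("token", "abc");
        // The value is readable from memory while the disk write is still pending.
        System.out.println("memory=" + prefs.memory + " disk=" + prefs.disk);
        prefs.diskGate.countDown();    // let the simulated disk write finish
        prefs.waitToFinish();          // returns only after the write has landed
        System.out.println("disk=" + prefs.disk);
    }
}
```

The point to notice is that apply returns as soon as memory is updated, but it leaves a finisher behind. Any thread that later drains the finishers pays for the pending write.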
There are two Runnables here:
- The one added to QueuedWork
public static void addFinisher(Runnable finisher) {
    synchronized (sLock) {
        sFinishers.add(finisher);
    }
}
This simply stores the Runnable in a list.
- The Runnable passed to enqueueDiskWrite
private void enqueueDiskWrite(final MemoryCommitResult mcr, final Runnable postWriteRunnable) {
    // commit() passes null here, apply() passes a Runnable.
    final boolean isFromSyncCommit = (postWriteRunnable == null);
    final Runnable writeToDiskRunnable = new Runnable() {
        @Override
        public void run() {
            synchronized (mWritingToDiskLock) {
                writeToFile(mcr, isFromSyncCommit);
            }
            synchronized (mLock) {
                mDiskWritesInFlight--;
            }
            if (postWriteRunnable != null) {
                postWriteRunnable.run();
            }
        }
    };
    if (isFromSyncCommit) {
        // Run the runnable on the current thread
        // ... omitted
        return;
    }
    QueuedWork.queue(writeToDiskRunnable, !isFromSyncCommit);
}
As you can see, the task that writes the file is also submitted to QueuedWork:
public static void queue(Runnable work, boolean shouldDelay) {
    Handler handler = getHandler();
    synchronized (sLock) {
        sWork.add(work);
        if (shouldDelay && sCanDelay) {
            handler.sendEmptyMessageDelayed(QueuedWorkHandler.MSG_RUN, DELAY);
        } else {
            handler.sendEmptyMessage(QueuedWorkHandler.MSG_RUN);
        }
    }
}
QueuedWork keeps the tasks in sWork and runs them on a HandlerThread when the message arrives:
// QueuedWorkHandler
public void handleMessage(Message msg) {
    if (msg.what == MSG_RUN) {
        processPendingWork();
    }
}
private static void processPendingWork() {
    synchronized (sProcessingWork) {
        LinkedList<Runnable> work;
        synchronized (sLock) {
            work = sWork;
            sWork = new LinkedList<>();
            // All pending work is processed now, so drop the queued messages.
            getHandler().removeMessages(QueuedWorkHandler.MSG_RUN);
        }
        if (work.size() > 0) {
            for (Runnable w : work) {
                w.run();
            }
        }
    }
}
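The drain pattern in processPendingWork (swap the list under a short lock, then run the work outside that lock) can be tried out in isolation. The following is a minimal sketch with hypothetical names (DrainSketch, pendingCount, log); it leaves out the Handler and only models the two locks and the list swap.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical stand-in for the queue half of QueuedWork; not the AOSP code. */
class DrainSketch {
    private final Object lock = new Object();            // like sLock
    private final Object processingLock = new Object();  // like sProcessingWork
    private LinkedList<Runnable> work = new LinkedList<>();
    final List<String> log = new ArrayList<>();

    void queue(Runnable r) {
        synchronized (lock) {
            work.add(r);
        }
    }

    int pendingCount() {
        synchronized (lock) {
            return work.size();
        }
    }

    void processPendingWork() {
        synchronized (processingLock) {        // only one drainer at a time
            LinkedList<Runnable> batch;
            synchronized (lock) {              // short critical section: swap only
                batch = work;
                work = new LinkedList<>();
            }
            for (Runnable r : batch) {         // the slow part runs without `lock`
                r.run();
            }
        }
    }

    public static void main(String[] args) {
        DrainSketch q = new DrainSketch();
        q.queue(() -> {
            q.log.add("a");
            q.queue(() -> q.log.add("c"));     // queued while a drain is running
        });
        q.queue(() -> q.log.add("b"));
        q.processPendingWork();
        System.out.println(q.log + " pending=" + q.pendingCount());
        q.processPendingWork();
        System.out.println(q.log + " pending=" + q.pendingCount());
    }
}
```

Because the list is swapped before anything runs, work that is queued during a drain lands in the fresh list and waits for the next pass. It also means that whichever thread calls processPendingWork is the thread that does the work.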
In ActivityThread, waitToFinish is also called when handling Activity pause and stop:
// ActivityThread
@Override
public void handleStopActivity(
        ActivityClientRecord r,
        int configChanges,
        PendingTransactionActions pendingActions,
        boolean finalStateRequest,
        String reason
) {
    // ...
    if (!r.isPreHoneycomb()) {
        QueuedWork.waitToFinish();
    }
    // ...
}
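waitToFinish first runs the pending work itself and then runs the Runnables in sFinishers:
public static void waitToFinish() {
    // ...
    // Pending disk writes are executed on the calling thread.
    processPendingWork();
    while (true) {
        Runnable finisher;
        synchronized (sLock) {
            finisher = sFinishers.poll();
        }
        if (finisher == null) {
            break;
        }
        // Blocks until the corresponding write has finished.
        finisher.run();
    }
    // ...
}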
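To see what this costs the caller, here is a minimal sketch that models only what waitToFinish touches. WaitToFinishSketch, queueWrite and ioThreads are hypothetical names, the slow disk is faked with a sleep, and the sketch assumes the handler thread has not yet picked up the queued writes when the Activity stops.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for QueuedWork; only models what waitToFinish() touches. */
class WaitToFinishSketch {
    private final Object lock = new Object();
    private LinkedList<Runnable> work = new LinkedList<>();
    private final LinkedList<Runnable> finishers = new LinkedList<>();
    /** Threads that ended up doing the simulated disk IO. */
    final List<Thread> ioThreads = new ArrayList<>();

    /** Models apply(): registers a finisher and queues a write that is still pending. */
    void queueWrite(long fakeIoMillis) {
        CountDownLatch written = new CountDownLatch(1);
        synchronized (lock) {
            finishers.add(() -> awaitQuietly(written));
            work.add(() -> {
                sleepQuietly(fakeIoMillis);            // pretend writeToFile() is slow
                ioThreads.add(Thread.currentThread());
                written.countDown();
            });
        }
    }

    /** Same two steps as the excerpt; returns how long the caller was held up (ms). */
    long waitToFinish() {
        long start = System.nanoTime();
        LinkedList<Runnable> pending;
        synchronized (lock) {                          // processPendingWork()
            pending = work;
            work = new LinkedList<>();
        }
        for (Runnable w : pending) {
            w.run();
        }
        while (true) {                                 // finisher loop
            Runnable finisher;
            synchronized (lock) {
                finisher = finishers.poll();
            }
            if (finisher == null) {
                break;
            }
            finisher.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        WaitToFinishSketch q = new WaitToFinishSketch();
        q.queueWrite(50);
        q.queueWrite(50);
        long blockedMs = q.waitToFinish();             // called from the "main" thread
        System.out.println("caller blocked ~" + blockedMs + " ms, IO ran on " + q.ioThreads);
    }
}
```

In this model every queued write runs on the thread that called waitToFinish, and the time the caller is held up grows with the number and size of pending writes.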
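Putting the apply flow together: apply calls commitToMemory, registers awaitCommit through QueuedWork.addFinisher, and queues writeToDiskRunnable through QueuedWork.queue, which the QueuedWork handler thread later runs in processPendingWork. When an Activity stops, the main thread calls waitToFinish, which runs processPendingWork itself and then runs every finisher.
Two problems stand out here:
- Before the page exits, the main thread blocks waiting for sp to finish, which stalls it and can even cause an ANR.
- Before the page exits, the main thread also runs the pending sp writes itself (waitToFinish calls processPendingWork), so the disk IO happens on the main thread and the ANR risk is even higher.
My guess at why sp was designed this way:
- To guarantee that sp writes are finished before the page closes
- Running the writes on the main thread before the page closes effectively raises their priority, so they are more likely to complete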
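Solutions
The straightforward approach
From the flow above we know that the ANR comes from Activity stop being blocked for too long, and at root that is because the sp operation is too slow, most likely because you are writing too much data. The straightforward fix is to simplify what you store. If you have relatively complex cached data, consider putting it in a database instead of dumping everything into sp.
The tampering approach
In practice sp is read and written in many places, and locally stored config and flags are not the focus of the business. Spending a lot of time simplifying and cleaning them up is not worth it and can easily introduce new problems. So we can consider tampering with QueuedWork through reflection instead. The idea:
- Drop the blocking wait. Honestly, there is no strong requirement that sp writes must be finished before the page ends. (Replace the sFinishers field so that every poll sees an empty list.)
- The main thread calls processPendingWork, which iterates over and runs the Runnables in sWork. (Replace the sWork field so that each run is started on a background thread instead.)
How do we modify them?
- Replace sFinishers with our own LinkedList and override poll to return null: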
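class ProxyFinishList(private val finishs: LinkedList<Runnable>) : LinkedList<Runnable>() {
    // waitToFinish polls until it gets null, so it never finds a finisher to wait on.
    override fun poll(): Runnable? {
        return null
    }
    override fun add(element: Runnable): Boolean {
        return finishs.add(element)
    }
    // removeFinisher calls remove(finisher), so forward that overload to avoid leaking entries.
    override fun remove(element: Runnable): Boolean {
        return finishs.remove(element)
    }
    override fun isEmpty(): Boolean {
        return true
    }
}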
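The replacement itself is a plain reflective field swap. Here is a minimal sketch in plain Java that can be run off-device. FakeQueuedWork is a hypothetical stand-in for android.app.QueuedWork, NoWaitFinisherList is a Java rendering of the ProxyFinishList idea, and the field is declared non-final so that standard JVM reflection can replace it. On a real device the field belongs to a hidden class, so this part is an assumption, not a recipe.

```java
import java.lang.reflect.Field;
import java.util.LinkedList;

/** Hypothetical stand-in for android.app.QueuedWork (finisher half only). */
class FakeQueuedWork {
    // Non-final so plain JVM reflection can swap it; an assumption of this sketch.
    private static LinkedList<Runnable> sFinishers = new LinkedList<>();

    static void addFinisher(Runnable r) { sFinishers.add(r); }
    static void removeFinisher(Runnable r) { sFinishers.remove(r); }

    /** Same loop shape as waitToFinish(); returns how many finishers it ran. */
    static int waitToFinish() {
        int ran = 0;
        while (true) {
            Runnable finisher = sFinishers.poll();
            if (finisher == null) {
                break;
            }
            finisher.run();
            ran++;
        }
        return ran;
    }
}

/** Stores finishers in a delegate list but never hands one out through poll(). */
class NoWaitFinisherList extends LinkedList<Runnable> {
    private final LinkedList<Runnable> delegate;

    NoWaitFinisherList(LinkedList<Runnable> delegate) { this.delegate = delegate; }

    @Override public Runnable poll() { return null; }
    @Override public boolean add(Runnable r) { return delegate.add(r); }
    @Override public boolean remove(Object o) { return delegate.remove(o); }
    @Override public boolean isEmpty() { return true; }

    int tracked() { return delegate.size(); }
}

class FinisherHookSketch {
    /** Swaps the static field for the proxy; a second call returns the existing proxy. */
    @SuppressWarnings("unchecked")
    static NoWaitFinisherList install() {
        try {
            Field f = FakeQueuedWork.class.getDeclaredField("sFinishers");
            f.setAccessible(true);
            LinkedList<Runnable> current = (LinkedList<Runnable>) f.get(null);
            if (current instanceof NoWaitFinisherList) {
                return (NoWaitFinisherList) current;
            }
            NoWaitFinisherList proxy = new NoWaitFinisherList(current);
            f.set(null, proxy);
            return proxy;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("hook failed", e);
        }
    }

    public static void main(String[] args) {
        NoWaitFinisherList proxy = install();
        Runnable slow = () -> { throw new AssertionError("finisher must not run"); };
        FakeQueuedWork.addFinisher(slow);
        System.out.println("ran=" + FakeQueuedWork.waitToFinish() + " tracked=" + proxy.tracked());
        FakeQueuedWork.removeFinisher(slow);
        System.out.println("tracked=" + proxy.tracked());
    }
}
```

Forwarding remove(Object) matters in this model: without it, every apply would add an entry to the delegate list that is never removed.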
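- sWork is a hidden API, so you need a framework that supports reflection on hidden APIs. Replace it with our own LinkedList as well. There is one catch: before Android 12, processPendingWork takes its working list by cloning sWork and then clears sWork.
From Android 12 on, it takes sWork itself as the working list and assigns a brand-new LinkedList to the field.
So the hook has to differ by version. Before Android 12 we start the tasks when clone() is called. On Android 12 and above we start them when size() is called, and re-hook at that point because sWork has just been replaced: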
class WorkProxyList(
    private val wrapper: LinkedList<Runnable>,
    private val handler: Handler,
    private val reHook: () -> Unit
) : LinkedList<Runnable>() {
    override fun isEmpty(): Boolean {
        return wrapper.isEmpty()
    }
    override fun add(element: Runnable): Boolean {
        return wrapper.add(WorkRunnableProxy(handler, element))
    }
    override fun remove(element: Runnable): Boolean {
        return wrapper.remove(element)
    }
    // Android 12+: processPendingWork takes this list itself and then calls size().
    override val size: Int
        get() {
            if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.S) {
                runWorks()
                reHook() // sWork was just replaced with a plain LinkedList
                return 0
            } else {
                return wrapper.size
            }
        }
    // Android 8 to 11: processPendingWork calls clone() on sWork.
    override fun clone(): Any {
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O && Build.VERSION.SDK_INT < Build.VERSION_CODES.S) {
            runWorks()
            return WorkProxyList(LinkedList(), handler, reHook)
        } else {
            return wrapper.clone()
        }
    }
    // Hands the pending work to the handler instead of running it on the caller.
    private fun runWorks() {
        if (wrapper.isEmpty()) {
            return
        }
        val works = LinkedList(wrapper)
        wrapper.clear()
        handler.post {
            works.forEach { it.run() }
        }
    }
}
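The two drain styles and the matching hook points can be exercised off-device with a small model. Everything below is a hypothetical stand-in: FakeWorkQueue imitates the two shapes of processPendingWork described above, DeferringWorkList is a simplified Java rendering of the WorkProxyList idea without the version checks, and a java.util.concurrent.Executor plays the role of the Handler. The sketch also ignores locking.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.Executor;

/** Hypothetical stand-in for the sWork half of QueuedWork, with both drain styles. */
class FakeWorkQueue {
    LinkedList<Runnable> sWork = new LinkedList<>();

    @SuppressWarnings("unchecked")
    void drainByClone() {                       // shape described for Android 8 to 11
        LinkedList<Runnable> work = (LinkedList<Runnable>) sWork.clone();
        sWork.clear();
        runAll(work);
    }

    void drainBySwap() {                        // shape described for Android 12+
        LinkedList<Runnable> work = sWork;
        sWork = new LinkedList<>();
        runAll(work);
    }

    private static void runAll(LinkedList<Runnable> work) {
        if (work.size() > 0) {
            for (Runnable w : work) {
                w.run();
            }
        }
    }
}

/** Collects work in a side list and pushes it to a background executor when drained. */
class DeferringWorkList extends LinkedList<Runnable> {
    private final LinkedList<Runnable> pending = new LinkedList<>();
    private final Executor background;
    private final Runnable reHook;

    DeferringWorkList(Executor background, Runnable reHook) {
        this.background = background;
        this.reHook = reHook;
    }

    @Override public boolean add(Runnable r) { return pending.add(r); }

    @Override public Object clone() {           // reached by drainByClone()
        dispatch();
        return new LinkedList<Runnable>();      // nothing left for the caller to run
    }

    @Override public int size() {               // reached by drainBySwap()
        dispatch();
        reHook.run();                           // the field now holds a plain list again
        return 0;                               // makes the caller skip its run loop
    }

    private void dispatch() {
        if (pending.isEmpty()) {
            return;
        }
        List<Runnable> batch = new ArrayList<>(pending);
        pending.clear();
        background.execute(() -> batch.forEach(Runnable::run));
    }
}

class WorkHookSketch {
    static void hook(FakeWorkQueue q, Executor background) {
        q.sWork = new DeferringWorkList(background, () -> hook(q, background));
    }

    public static void main(String[] args) {
        List<Runnable> posted = new ArrayList<>();   // collects what a Handler would receive
        FakeWorkQueue q = new FakeWorkQueue();
        hook(q, posted::add);
        q.sWork.add(() -> System.out.println("write ran on " + Thread.currentThread().getName()));
        q.drainBySwap();                             // the caller runs nothing itself
        System.out.println("posted=" + posted.size());
        new Thread(posted.get(0), "background").start();
    }
}
```

In the swap style the proxy is detached from the field after every drain, which is why size() has to re-install it. In the clone style the proxy stays in place and only hands back an empty list.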