Tracking Down a TooManyRequestsException Crash

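Background

Starting in April, several of our company's Android apps suddenly began reporting many TooManyRequestsException crashes in production on Huawei HarmonyOS devices. Scattered reports of the same exception on Huawei HarmonyOS devices also appear in the community, but none of them comes with a clear solution.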

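Investigation

1. The crash stack trace shows that the problem occurs inside the WorkManager library, in the code that registers a network callback.

2. We immediately checked where the project uses WorkManager and found that the splash-screen ads use it to download images. That code had not been changed for a long time, but the business team confirmed that splash ads for several cities had gone live shortly before the crashes began. We asked them to temporarily take the splash ads for some cities offline, and the crash rate dropped somewhat.

3. Looking further at the WorkManager usage, we confirmed that this logic was not a recently launched feature and had never caused a similar problem before. We also reviewed every place in the app that registers a network callback, because the stack trace indicates that too many network callbacks had been registered and the system limit had been hit. The third-party libraries do contain code that registers network callbacks, but those call sites were already wrapped in try-catch blocks.

For example, the registration logic in the Glide image-loading library.

4. To address the problem, we first reworked the business logic that uses WorkManager and wrapped every call to registerDefaultNetworkCallback in the project in a try-catch. This temporarily relieved the crashes, but it may affect features that rely on network callbacks, because a callback whose registration fails never fires. We are continuing to analyze and optimize this so that those features keep working.

We then analyzed the source code behind registerDefaultNetworkCallback: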

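Source Code Analysis

To address the problem more thoroughly, we examined the source of registerDefaultNetworkCallback in different Android versions. There are some differences between versions below Android 12 and Android 12 and above, but the core logic has not changed significantly. The analysis below covers Android 11 and Android 14.

Android 11

Source

registerDefaultNetworkCallback is a cross-process operation. It starts in ConnectivityManager.registerDefaultNetworkCallback, which calls sendRequestForNetwork, which in turn makes a cross-process call to ConnectivityService.requestNetwork.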

ConnectivityManager

ConnectivityManager cm = (ConnectivityManager) this.getSystemService(Context.CONNECTIVITY_SERVICE);
cm.registerDefaultNetworkCallback(new ConnectivityManager.NetworkCallback(){
});
@RequiresPermission(android.Manifest.permission.ACCESS_NETWORK_STATE)
public void registerDefaultNetworkCallback(@NonNull NetworkCallback networkCallback,
        @NonNull Handler handler) {
    // This works because if the NetworkCapabilities are null,
    // ConnectivityService takes them from the default request.
    //
    // Since the capabilities are exactly the same as the default request's
    // capabilities, this request is guaranteed, at all times, to be
    // satisfied by the same network, if any, that satisfies the default
    // request, i.e., the system default network.
    CallbackHandler cbHandler = new CallbackHandler(handler);
    sendRequestForNetwork(null /* NetworkCapabilities need */, networkCallback, 0,
            REQUEST, TYPE_NONE, cbHandler);
}
private NetworkRequest sendRequestForNetwork(NetworkCapabilities need, NetworkCallback callback,
        int timeoutMs, int action, int legacyType, CallbackHandler handler) {
    printStackTrace();
    checkCallbackNotNull(callback);
    Preconditions.checkArgument(action == REQUEST || need != null, "null NetworkCapabilities");
    final NetworkRequest request;
    final String callingPackageName = mContext.getOpPackageName();
    try {
        synchronized(sCallbacks) {
            ...
            if (action == LISTEN) {
                request = mService.listenForNetwork(
                        need, messenger, binder, callingPackageName);
            } else {
                request = mService.requestNetwork(
                        need, messenger, timeoutMs, binder, legacyType, callingPackageName);
            }
            if (request != null) {
                sCallbacks.put(request, callback);
            }
            callback.networkRequest = request;
        }
    } catch (RemoteException e) {
        throw e.rethrowFromSystemServer();
    } catch (ServiceSpecificException e) {
        throw convertServiceException(e);
    }
    return request;
}

ConnectivityService

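ConnectivityService.requestNetwork creates a NetworkRequestInfo object, whose constructor calls enforceRequestCountLimit to check the per-UID cap. If the number of requests registered by the calling UID would reach 100 (MAX_NETWORK_REQUESTS_PER_UID), it throws a ServiceSpecificException carrying the TOO_MANY_REQUESTS error code. ConnectivityManager.sendRequestForNetwork catches that exception and rethrows it through convertServiceException, which is how it surfaces in the app as TooManyRequestsException.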

@Override
public NetworkRequest requestNetwork(NetworkCapabilities networkCapabilities,
        Messenger messenger, int timeoutMs, IBinder binder, int legacyType,
        @NonNull String callingPackageName) {
    ...
    
    NetworkRequest networkRequest = new NetworkRequest(networkCapabilities, legacyType,
            nextNetworkRequestId(), type);
    NetworkRequestInfo nri = new NetworkRequestInfo(messenger, networkRequest, binder);
    if (DBG) log("requestNetwork for " + nri);

    mHandler.sendMessage(mHandler.obtainMessage(EVENT_REGISTER_NETWORK_REQUEST, nri));
    if (timeoutMs > 0) {
        mHandler.sendMessageDelayed(mHandler.obtainMessage(EVENT_TIMEOUT_NETWORK_REQUEST,
                nri), timeoutMs);
    }
    return networkRequest;
}

NetworkRequestInfo

NetworkRequestInfo(Messenger m, NetworkRequest r, IBinder binder) {
    super();
    ...
    enforceRequestCountLimit();

    try {
        mBinder.linkToDeath(this, 0);
    } catch (RemoteException e) {
        binderDied();
    }
}
private void enforceRequestCountLimit() {
    synchronized (mUidToNetworkRequestCount) {
        int networkRequests = mUidToNetworkRequestCount.get(mUid, 0) + 1;
        if (networkRequests >= MAX_NETWORK_REQUESTS_PER_UID) {
            throw new ServiceSpecificException(
                    ConnectivityManager.Errors.TOO_MANY_REQUESTS);
        }
        mUidToNetworkRequestCount.put(mUid, networkRequests);
    }
}
Sequence diagram

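Android 14

Source

www.aospxref.com/android-14.…

In Android 14 the call path in ConnectivityManager is much the same as in Android 11. The only addition is the registerDefaultNetworkCallbackForUid method, and calls from the app layer always pass a fixed uid, Process.INVALID_UID.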

ConnectivityManager

ConnectivityManager cm = (ConnectivityManager) this.getSystemService(Context.CONNECTIVITY_SERVICE);
cm.registerDefaultNetworkCallback(new ConnectivityManager.NetworkCallback(){
});
@RequiresPermission(android.Manifest.permission.ACCESS_NETWORK_STATE)
public void registerDefaultNetworkCallback(@NonNull NetworkCallback networkCallback,
        @NonNull Handler handler) {
    registerDefaultNetworkCallbackForUid(Process.INVALID_UID, networkCallback, handler);
}
public void registerDefaultNetworkCallbackForUid(int uid,
        @NonNull NetworkCallback networkCallback, @NonNull Handler handler) {
    CallbackHandler cbHandler = new CallbackHandler(handler);
    sendRequestForNetwork(uid, null /* need */, networkCallback, 0 /* timeoutMs */,
            TRACK_DEFAULT, TYPE_NONE, cbHandler);
}
private NetworkRequest sendRequestForNetwork(int asUid, NetworkCapabilities need,
        NetworkCallback callback, int timeoutMs, NetworkRequest.Type reqType, int legacyType,
        CallbackHandler handler) {
    ....
    try {
        synchronized(sCallbacks) {
            ......
            if (reqType == LISTEN) {
                request = mService.listenForNetwork(
                        need, messenger, binder, callbackFlags, callingPackageName,
                        getAttributionTag());
            } else {
                // Crosses into ConnectivityService
                request = mService.requestNetwork(
                        asUid, need, reqType.ordinal(), messenger, timeoutMs, binder,
                        legacyType, callbackFlags, callingPackageName, getAttributionTag());
            }
            if (request != null) {
                sCallbacks.put(request, callback);
            }
            callback.networkRequest = request;
        }
    } catch (RemoteException e) {
        throw e.rethrowFromSystemServer();
    } catch (ServiceSpecificException e) {
        // This is where the exception is thrown to the caller
        throw convertServiceException(e);
    }
    return request;
}

ConnectivityService

In Android 14 the source location of ConnectivityService has changed; it is now at www.aospxref.com/android-14.…

 @Override
    public NetworkRequest requestNetwork(int asUid, NetworkCapabilities networkCapabilities,
            int reqTypeInt, Messenger messenger, int timeoutMs, final IBinder binder,
            int legacyType, int callbackFlags, @NonNull String callingPackageName,
            @Nullable String callingAttributionTag) {
        ...

        final NetworkRequest networkRequest = new NetworkRequest(networkCapabilities, legacyType,
                nextNetworkRequestId(), reqType);
        final NetworkRequestInfo nri = getNriToRegister(
                asUid, networkRequest, messenger, binder, callbackFlags,
                callingAttributionTag);
        if (DBG) log("requestNetwork for " + nri);
        trackUidAndRegisterNetworkRequest(EVENT_REGISTER_NETWORK_REQUEST, nri);
        if (timeoutMs > 0) {
            mHandler.sendMessageDelayed(mHandler.obtainMessage(EVENT_TIMEOUT_NETWORK_REQUEST,
                    nri), timeoutMs);
        }
        return networkRequest;
    }
 private NetworkRequestInfo getNriToRegister(final int asUid, @NonNull final NetworkRequest nr,
            @Nullable final Messenger msgr, @Nullable final IBinder binder,
            @NetworkCallback.Flag int callbackFlags,
            @Nullable String callingAttributionTag) {
        ....
        return new NetworkRequestInfo(
                asUid, requests, nr, msgr, binder, callbackFlags, callingAttributionTag);
    }
 NetworkRequestInfo(int asUid, @NonNull final List<NetworkRequest> r,
                @NonNull final NetworkRequest requestForCallback, @Nullable final Messenger m,
                @Nullable final IBinder binder,
                @NetworkCallback.Flag int callbackFlags,
                @Nullable String callingAttributionTag) {
            super();
            ....
            mPerUidCounter = getRequestCounter(this);
            mPerUidCounter.incrementCountOrThrow(mUid);
            ....
        }
private RequestInfoPerUidCounter getRequestCounter(NetworkRequestInfo nri) {
        return hasAnyPermissionOf(mContext,
                nri.mPid, nri.mUid, NetworkStack.PERMISSION_MAINLINE_NETWORK_STACK)
                ? mSystemNetworkRequestCounter : mNetworkRequestCounter;
    }

public static class RequestInfoPerUidCounter extends PerUidCounter {
        RequestInfoPerUidCounter(int maxCountPerUid) {
            super(maxCountPerUid);
        }

        @Override
        public synchronized void incrementCountOrThrow(int uid) {
            try {
                super.incrementCountOrThrow(uid);
            } catch (IllegalStateException e) {
                throw new ServiceSpecificException(
                        ConnectivityManager.Errors.TOO_MANY_REQUESTS,
                        "Uid " + uid + " exceeded its allotted requests limit");
            }
        }

        @Override
        public synchronized void decrementCountOrThrow(int uid) {
            throw new UnsupportedOperationException("Use decrementCount instead.");
        }

        public synchronized void decrementCount(int uid) {
            try {
                super.decrementCountOrThrow(uid);
            } catch (IllegalStateException e) {
                logwtf("Exception when decrement per uid request count: ", e);
            }
        }
    }
 // Parent class PerUidCounter: the actual limit check
 public synchronized void incrementCountOrThrow(final int uid) {
        final long newCount = ((long) mUidToCount.get(uid, 0)) + 1;
        if (newCount > mMaxCountPerUid) {
            throw new IllegalStateException("Uid " + uid + " exceeded its allowed limit");
        }
        // Since the count cannot be greater than Integer.MAX_VALUE here since mMaxCountPerUid
        // is an integer, it is safe to cast to int.
        mUidToCount.put(uid, (int) newCount);
    }
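
To make the two limit checks above concrete, here is a minimal plain-Java model of the counting logic. It is a sketch, not the AOSP classes: the name `UidRequestCounter` and the `inclusive` flag are assumptions made for illustration. The excerpt does not show which limit value Android 14 passes to its counter, so the model makes no claim about the effective cap on a real device.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the per-UID request counting shown in the excerpts. */
class UidRequestCounter {
    private final Map<Integer, Integer> counts = new HashMap<>();
    private final int limit;
    // true  -> reject when newCount >= limit (shape of the Android 11 excerpt)
    // false -> reject when newCount >  limit (shape of the Android 14 excerpt)
    private final boolean inclusive;

    UidRequestCounter(int limit, boolean inclusive) {
        this.limit = limit;
        this.inclusive = inclusive;
    }

    /** Counts one more request for uid, or throws if the limit check fails. */
    synchronized void increment(int uid) {
        int newCount = counts.getOrDefault(uid, 0) + 1;
        boolean over = inclusive ? newCount >= limit : newCount > limit;
        if (over) {
            throw new IllegalStateException("Uid " + uid + " exceeded its allowed limit");
        }
        counts.put(uid, newCount);
    }

    /** Releases one request for uid (assumed here to model an unregister call). */
    synchronized void decrement(int uid) {
        int current = counts.getOrDefault(uid, 0);
        if (current > 0) {
            counts.put(uid, current - 1);
        }
    }

    /** Returns the number of requests currently counted for uid. */
    synchronized int count(int uid) {
        return counts.getOrDefault(uid, 0);
    }
}
```

In this model, a limit of 100 accepts 99 requests per UID with the `>=` form and 100 with the `>` form. Either way, a caller that keeps registering without ever releasing eventually hits the limit, which is the accumulation pattern suspected in this crash.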
Sequence diagram

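Based on the source analysis, avoiding the problem requires the following measures:

  1. Pair registerDefaultNetworkCallback with unregisterNetworkCallback: make sure every call to registerDefaultNetworkCallback has a matching unregisterNetworkCallback call, so that the request is released and requests do not accumulate.
  2. Limit the number of calls or guard them with try-catch: at each registerDefaultNetworkCallback call site, either wrap the call in a try-catch that handles the possible exception, or control the number of calls in code so that it stays within the system limit.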
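
The two measures can be combined in one small helper. The sketch below is plain Java so that it stays self-contained: the `register` and `unregister` parameters are stand-ins for `cm::registerDefaultNetworkCallback` and `cm::unregisterNetworkCallback`, and the class name `GuardedRegistration` is hypothetical. It catches a failed registration instead of crashing and makes sure a successful one is released exactly once.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Hypothetical helper that pairs one register call with exactly one unregister call. */
final class GuardedRegistration<T> implements AutoCloseable {
    private final T callback;
    private final Consumer<T> unregister;
    private final AtomicBoolean released = new AtomicBoolean(false);

    private GuardedRegistration(T callback, Consumer<T> unregister) {
        this.callback = callback;
        this.unregister = unregister;
    }

    /**
     * Runs the register action. Returns a handle on success, or null if the
     * action threw (for example because the per-UID request limit was hit).
     */
    static <T> GuardedRegistration<T> tryRegister(
            T callback, Consumer<T> register, Consumer<T> unregister) {
        try {
            register.accept(callback);
        } catch (RuntimeException e) {
            // Registration failed: nothing was registered, so there is nothing to release.
            return null;
        }
        return new GuardedRegistration<>(callback, unregister);
    }

    /** Releases the registration; repeated calls are ignored. */
    @Override
    public void close() {
        if (released.compareAndSet(false, true)) {
            unregister.accept(callback);
        }
    }
}
```

On Android the call would look roughly like `GuardedRegistration.tryRegister(callback, cm::registerDefaultNetworkCallback, cm::unregisterNetworkCallback)`, with `close()` invoked from the matching teardown point such as `onDestroy`. A `null` result tells the caller to fall back to some other way of checking connectivity.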

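For calls made by our own code, we can strictly control and manage how they are made. For third-party SDKs, however, we have no direct control over the internal implementation. To solve the problem thoroughly, we therefore used bytecode instrumentation to funnel every call to registerDefaultNetworkCallback and unregisterNetworkCallback into one place and handle them uniformly. This lets us monitor and manage the calls and lowers the risk of hitting the limit.

Instrumentation Solution

With instrumentation we can inject code that monitors and manages calls to registerDefaultNetworkCallback and unregisterNetworkCallback. Tools such as ASM or AspectJ can modify the code at the bytecode level.

Approach

To reduce how often the app actually calls registerDefaultNetworkCallback, we use instrumentation to route every call through a single manager class. The manager registers one callback with the system and forwards its events to all callers, so the app holds a single request against the per-UID limit no matter how many callers there are. Centralizing the calls also simplifies the code and makes later maintenance and troubleshooting easier.

Implementation

The manager class that centralizes registerDefaultNetworkCallback and unregisterNetworkCallback: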

package com.example.gradledemo;

import android.content.Context;
import android.net.ConnectivityManager;
import android.net.LinkProperties;
import android.net.Network;
import android.net.NetworkCapabilities;
import android.os.Build;
import android.util.Log;

import androidx.annotation.NonNull;

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class MyConnectivityManager {

    private static final String TAG = "MyConnectivityManager";

    // Callers add and remove callbacks on their own threads while the system dispatches
    // events on another thread, so the list must tolerate modification during iteration.
    public static final List<ConnectivityManager.NetworkCallback> callbacks =
            new CopyOnWriteArrayList<>();

    // Downgrade switch: when true, each callback is registered with the system directly.
    public static volatile boolean downgrading = false;

    // The single callback registered with the system; it forwards every event to `callbacks`.
    private static final ConnectivityManager.NetworkCallback DISPATCHER =
            new ConnectivityManager.NetworkCallback() {
        @Override
        public void onAvailable(@NonNull Network network) {
            super.onAvailable(network);
            Log.d(TAG, "dispatch onAvailable " + callbacks.size());
            for (ConnectivityManager.NetworkCallback cb : callbacks) {
                cb.onAvailable(network);
            }
        }

        @Override
        public void onLosing(@NonNull Network network, int maxMsToLive) {
            super.onLosing(network, maxMsToLive);
            Log.d(TAG, "dispatch onLosing " + callbacks.size());
            for (ConnectivityManager.NetworkCallback cb : callbacks) {
                cb.onLosing(network, maxMsToLive);
            }
        }

        @Override
        public void onLost(@NonNull Network network) {
            super.onLost(network);
            Log.d(TAG, "dispatch onLost " + callbacks.size());
            for (ConnectivityManager.NetworkCallback cb : callbacks) {
                cb.onLost(network);
            }
        }

        @Override
        public void onUnavailable() {
            super.onUnavailable();
            Log.d(TAG, "dispatch onUnavailable " + callbacks.size());
            if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
                for (ConnectivityManager.NetworkCallback cb : callbacks) {
                    cb.onUnavailable();
                }
            }
        }

        @Override
        public void onCapabilitiesChanged(@NonNull Network network, @NonNull NetworkCapabilities networkCapabilities) {
            super.onCapabilitiesChanged(network, networkCapabilities);
            Log.d(TAG, "dispatch onCapabilitiesChanged " + callbacks.size());
            for (ConnectivityManager.NetworkCallback cb : callbacks) {
                cb.onCapabilitiesChanged(network, networkCapabilities);
            }
        }

        @Override
        public void onLinkPropertiesChanged(@NonNull Network network, @NonNull LinkProperties linkProperties) {
            super.onLinkPropertiesChanged(network, linkProperties);
            Log.d(TAG, "dispatch onLinkPropertiesChanged " + callbacks.size());
            for (ConnectivityManager.NetworkCallback cb : callbacks) {
                cb.onLinkPropertiesChanged(network, linkProperties);
            }
        }

        @Override
        public void onBlockedStatusChanged(@NonNull Network network, boolean blocked) {
            super.onBlockedStatusChanged(network, blocked);
            Log.d(TAG, "dispatch onBlockedStatusChanged " + callbacks.size());
            if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.Q) {
                for (ConnectivityManager.NetworkCallback cb : callbacks) {
                    cb.onBlockedStatusChanged(network, blocked);
                }
            }
        }
    };

    public static void initConnectivityManager(Context context) {
        Log.d(TAG, "initConnectivityManager");

        ConnectivityManager cm = (ConnectivityManager) context.getSystemService(Context.CONNECTIVITY_SERVICE);
        if (cm != null && Build.VERSION.SDK_INT >= Build.VERSION_CODES.N) {
            try {
                cm.registerDefaultNetworkCallback(DISPATCHER);
            } catch (RuntimeException e) {
                // TooManyRequestsException extends RuntimeException; log instead of crashing.
                Log.w(TAG, "registerDefaultNetworkCallback failed", e);
            }
        }
    }

    public static void registerDefaultNetworkCallback(ConnectivityManager cm, ConnectivityManager.NetworkCallback callback) {
        Log.d(TAG, "registerDefaultNetworkCallback:" + callback);

        if (downgrading) {
            if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.N) {
                try {
                    cm.registerDefaultNetworkCallback(callback);
                } catch (RuntimeException e) {
                    Log.w(TAG, "registerDefaultNetworkCallback failed", e);
                }
            }
        } else {
            // Note: unlike a direct registration, a callback added here does not get an
            // immediate onAvailable for the network that is already the default.
            callbacks.add(callback);
        }
    }

    public static void unregisterNetworkCallback(ConnectivityManager cm, ConnectivityManager.NetworkCallback callback) {
        Log.d(TAG, "unregisterNetworkCallback");

        // Callbacks held in the shared list were never registered with the system.
        if (callbacks.remove(callback)) {
            return;
        }
        // Everything else (downgrade mode, or callbacks registered through other APIs such
        // as registerNetworkCallback or requestNetwork) must still be released by the system.
        try {
            cm.unregisterNetworkCallback(callback);
        } catch (IllegalArgumentException e) {
            // Thrown when the callback was never registered, e.g. after a failed registration.
            Log.w(TAG, "unregisterNetworkCallback failed", e);
        }
    }

}

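The core of the instrumentation code is shown below. The transform must skip the com/example/gradledemo/MyConnectivityManager class itself; otherwise the manager's own calls into ConnectivityManager would be rewritten to call the manager again and recurse endlessly.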

if (methodInsnNode.owner.equals("android/net/ConnectivityManager")&&methodInsnNode.name.equals("registerDefaultNetworkCallback") && "(Landroid/net/ConnectivityManager$NetworkCallback;)V".equals(methodInsnNode.desc)){
    methodInsnNode.owner = "com/example/gradledemo/MyConnectivityManager";
    methodInsnNode.desc = "(Landroid/net/ConnectivityManager;Landroid/net/ConnectivityManager$NetworkCallback;)V";
    methodInsnNode.name = "registerDefaultNetworkCallback";
    methodInsnNode.setOpcode(INVOKESTATIC);
}
if (methodInsnNode.owner.equals("android/net/ConnectivityManager")&&methodInsnNode.name.equals("unregisterNetworkCallback") && "(Landroid/net/ConnectivityManager$NetworkCallback;)V".equals(methodInsnNode.desc)){
    methodInsnNode.owner = "com/example/gradledemo/MyConnectivityManager";
    methodInsnNode.desc = "(Landroid/net/ConnectivityManager;Landroid/net/ConnectivityManager$NetworkCallback;)V";
    methodInsnNode.name = "unregisterNetworkCallback";
    methodInsnNode.setOpcode(INVOKESTATIC);
}
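
The exclusion of the manager class is not shown in the snippet above. One way to express it, sketched here in plain Java without the ASM dependency, is a predicate that the transform consults before rewriting an instruction. `CallRedirectRule` and its method names are hypothetical; the owner, method names, and descriptor strings are the ones used in the snippet.

```java
/** Hypothetical rule deciding whether a call instruction should be redirected. */
final class CallRedirectRule {
    static final String MANAGER = "com/example/gradledemo/MyConnectivityManager";
    static final String TARGET_OWNER = "android/net/ConnectivityManager";
    static final String TARGET_DESC = "(Landroid/net/ConnectivityManager$NetworkCallback;)V";

    private CallRedirectRule() {}

    /**
     * @param visitingClass internal name of the class currently being transformed
     * @param owner         owner of the invoked method (MethodInsnNode.owner)
     * @param name          name of the invoked method (MethodInsnNode.name)
     * @param desc          descriptor of the invoked method (MethodInsnNode.desc)
     */
    static boolean shouldRedirect(String visitingClass, String owner, String name, String desc) {
        // Never rewrite the manager itself (including its anonymous inner classes),
        // otherwise its pass-through calls would be redirected back to itself.
        if (visitingClass.equals(MANAGER) || visitingClass.startsWith(MANAGER + "$")) {
            return false;
        }
        boolean targetMethod = name.equals("registerDefaultNetworkCallback")
                || name.equals("unregisterNetworkCallback");
        return TARGET_OWNER.equals(owner) && targetMethod && TARGET_DESC.equals(desc);
    }

    /** Descriptor of the static replacement: the receiver becomes the first argument. */
    static String redirectedDesc() {
        return "(L" + TARGET_OWNER + ";" + TARGET_DESC.substring(1);
    }
}
```

Switching the opcode from INVOKEVIRTUAL to INVOKESTATIC needs no extra stack manipulation, because the receiver that was pushed for the virtual call is consumed as the first argument of the static method.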

The end result is that cm.registerDefaultNetworkCallback is rewritten to MyConnectivityManager.registerDefaultNetworkCallback.

Before instrumentation

ConnectivityManager cm = (ConnectivityManager) this.getSystemService(Context.CONNECTIVITY_SERVICE);
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.N) {
    cm.registerDefaultNetworkCallback(new ConnectivityManager.NetworkCallback(){
        @Override
        public void onAvailable(@NonNull Network network) {
            super.onAvailable(network);
            Log.d("MyConnectivityManager","onAvailable");
        }

        @Override
        public void onCapabilitiesChanged(@NonNull Network network, @NonNull NetworkCapabilities networkCapabilities) {
            super.onCapabilitiesChanged(network, networkCapabilities);
            Log.d("MyConnectivityManager","onCapabilitiesChanged");
        }

        @Override
        public void onUnavailable() {
            super.onUnavailable();
            Log.d("MyConnectivityManager","onUnavailable");
        }
    });
}

After instrumentation

if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.N) {
    MyConnectivityManager.registerDefaultNetworkCallback(cm,new ConnectivityManager.NetworkCallback(){
        @Override
        public void onAvailable(@NonNull Network network) {
            super.onAvailable(network);
            Log.d("MyConnectivityManager","onAvailable");
        }

        @Override
        public void onCapabilitiesChanged(@NonNull Network network, @NonNull NetworkCapabilities networkCapabilities) {
            super.onCapabilitiesChanged(network, networkCapabilities);
            Log.d("MyConnectivityManager","onCapabilitiesChanged");
        }

        @Override
        public void onUnavailable() {
            super.onUnavailable();
            Log.d("MyConnectivityManager","onUnavailable");
        }
    });
}

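Monitoring

Because we cannot tell whether Huawei has modified the source code, for now we can only speculate that the problem is tied to Huawei devices. Possible causes include:

  1. Huawei modified the source: Huawei may have changed the Android source in a way that affects how registerDefaultNetworkCallback behaves.
  2. A vendor push from Huawei: Huawei may have changed the behavior of system services through a system update or another vendor push, causing the method to fail on Huawei devices.

We therefore need to monitor registerDefaultNetworkCallback usage in production: record how many times it is called and, with all call sites now funneled through one place, find the ones that call it repeatedly so that their logic can be optimized.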
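
As a starting point for this monitoring, the sketch below counts registrations per call site and tracks how many are still outstanding. It is plain Java with hypothetical names (`RegistrationMonitor`, `onRegister`, `onUnregister`). How a call site is identified (here a caller-supplied string) and how the numbers are reported to a backend are left open.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory monitor for register/unregister calls. */
final class RegistrationMonitor {
    private final Map<String, AtomicInteger> registersBySite = new ConcurrentHashMap<>();
    private final AtomicInteger outstanding = new AtomicInteger();

    /** Records one register call from the given call site; returns outstanding registrations. */
    int onRegister(String callSite) {
        registersBySite.computeIfAbsent(callSite, k -> new AtomicInteger()).incrementAndGet();
        return outstanding.incrementAndGet();
    }

    /** Records one unregister call; the outstanding count never drops below zero. */
    int onUnregister() {
        return outstanding.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    /** Returns how many register calls were recorded for the given call site. */
    int registerCount(String callSite) {
        AtomicInteger n = registersBySite.get(callSite);
        return n == null ? 0 : n.get();
    }

    /** Returns the number of registrations that have not been released yet. */
    int outstanding() {
        return outstanding.get();
    }

    /** Returns the call site with the most register calls, or null if none were recorded. */
    String busiestCallSite() {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, AtomicInteger> e : registersBySite.entrySet()) {
            int c = e.getValue().get();
            if (c > bestCount) {
                best = e.getKey();
                bestCount = c;
            }
        }
        return best;
    }
}
```

Calling `onRegister` and `onUnregister` from the two static methods of `MyConnectivityManager` would cover every instrumented call site. A steadily growing `outstanding()` value points to a register call without a matching unregister.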