What happened
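Our project is an image-browsing app. To give users a smooth experience, it periodically downloads wallpaper data in the background and pre-downloads the corresponding images to local storage, so that when the user opens the app the images appear immediately with no waiting. During development and testing, however, image downloads would sometimes fail with a timeout error, and while working on the most recent feature the failures became much more frequent, so I decided to dig into the Glide source code.
Download timeout log: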
GlideHelper: img download error. url = https://xxxx/wbp,q_auto,h_2276,w_1080,c_fill/https%3A%2F%2Fcdn.xxx.com%2Fstatic%2Fwallpaper%2F3518313389605031946_GETTY_996206592.jpg%3Fut%3D1624552936
java.util.concurrent.ExecutionException: java.net.SocketTimeoutException : Read timed out
at com.bumptech.glide.request.RequestFutureTarget.doGet(RequestFutureTarget.java:189)
at com.bumptech.glide.request.RequestFutureTarget.get(RequestFutureTarget.java:100)
at com.xxx.common.glide.GlideHelper.download(GlideHelper.java:61)
at com.xxx.manager.KSyncStrategyManager.downloadWallpapers(KSyncStrategyManager.java:239)
at com.xxx.manager.KSyncStrategyManager.startDownloadWallpaperJobs(KSyncStrategyManager.java:60)
at com.xxx.manager.KSyncStrategyManager.startSyncWallpaperListByUserLikeTag(KSyncStrategyManager.java:117)
...
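Source code analysis
I'll start with the logic diagram that summarizes the whole source code analysis. What follows is the analysis itself, which may be a bit long-winded; if you only want to know how to set the timeout, skip straight to the Solution section.
Because we upgraded Glide, I analyzed both the old version (3.7.0) and the new version (4.13.0). The two do not differ much, so the analysis below is based mainly on 4.13.0, and the differences are marked in the logic diagram.
Our project calls Glide's downloadOnly() to download images: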
public static File download(Context context, String url, int timeout) {
...
File file = null;
try {
file = Glide.with(context)
.load(glideUrl)
.downloadOnly(Target.SIZE_ORIGINAL, Target.SIZE_ORIGINAL)
.get();
} catch (Exception e) {
LogUtils.e(TAG, "img download error. url = " + url, e);
}
return file;
}
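My approach was to start from Glide.xxx.get(). That call lands in doGet(xxx) of RequestFutureTarget, which only waits for the resource and returns it, or rethrows a failure that was recorded elsewhere. Nothing in it produces a SocketTimeoutException, so we have to trace one level further up.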
public class RequestFutureTarget<R> implements FutureTarget<R>, RequestListener<R> {
...
@Override
public R get() throws InterruptedException, ExecutionException {
try {
return doGet(null);
} catch (TimeoutException e) {
throw new AssertionError(e);
}
}
...
private synchronized R doGet(Long timeoutMillis) throws ExecutionException, InterruptedException, TimeoutException {
if (assertBackgroundThread && !isDone()) {
Util.assertBackgroundThread();
}
if (isCancelled) {
throw new CancellationException();
} else if (loadFailed) {
throw new ExecutionException(exception);
} else if (resultReceived) {
return resource;
}
// Exception checks omitted
...
return resource;
}
}
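The log at the top of this article shows this wrapping in action: get() reports the failed load as an ExecutionException whose cause is the original SocketTimeoutException. Below is a minimal, stdlib-only sketch of how a caller might tell a timeout apart from other failures. The names FutureTimeoutCheck and isSocketTimeout are hypothetical, and a CompletableFuture stands in for Glide's RequestFutureTarget.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

final class FutureTimeoutCheck {

    /** Returns true if the future failed because of a socket timeout. */
    static boolean isSocketTimeout(Future<?> future) {
        try {
            future.get();
            return false; // completed normally
        } catch (ExecutionException e) {
            // get() wraps the real failure; the cause carries the original exception.
            return e.getCause() instanceof SocketTimeoutException;
        } catch (InterruptedException e) {
            // Restore the interrupt flag and treat it as "not a timeout".
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for a load that failed the same way as in the log.
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new SocketTimeoutException("Read timed out"));
        System.out.println(isSocketTimeout(failed));
    }
}
```

A check like this could be used to decide whether a retry is worthwhile, since a timeout is often transient while other failures may not be.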
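Next comes submit(xxx). It calls submit(xxx) in RequestBuilder, which ends up in the four-parameter into(xxx). Stepping through it, into(xxx) calls RequestManager.track(xxx), which in turn calls RequestTracker.runRequest(xxx); judging by the name, this looks like where the request actually starts executing.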
public class RequestBuilder<TranscodeType> extends BaseRequestOptions<RequestBuilder<TranscodeType>> implements Cloneable, ModelTypes<RequestBuilder<TranscodeType>> {
...
public FutureTarget<TranscodeType> submit(int width, int height) {
final RequestFutureTarget<TranscodeType> target = new RequestFutureTarget<>(width, height);
return into(target, target, Executors.directExecutor());
}
private <Y extends Target<TranscodeType>> Y into(
@NonNull Y target,
@Nullable RequestListener<TranscodeType> targetListener,
BaseRequestOptions<?> options,
Executor callbackExecutor) {
...
Request request = buildRequest(target, targetListener, options, callbackExecutor);
...
requestManager.clear(target);
target.setRequest(request);
requestManager.track(target, request);
return target;
}
}
public class RequestManager implements ComponentCallbacks2, LifecycleListener, ModelTypes<RequestBuilder<Drawable>> {
...
synchronized void track(@NonNull Target<?> target, @NonNull Request request) {
targetTracker.track(target);
requestTracker.runRequest(request);
}
}
That leads to request.begin(), so we keep following into begin().
public class RequestTracker {
...
public void runRequest(@NonNull Request request) {
requests.add(request);
if (!isPaused) {
request.begin();
} else {
request.clear();
if (Log.isLoggable(TAG, Log.VERBOSE)) {
Log.v(TAG, "Paused, delaying request");
}
pendingRequests.add(request);
}
}
}
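The run-or-queue behavior of runRequest(xxx) matters for background downloads: while the tracker is paused, a request is only queued and does not start. The following is a simplified sketch of that pattern using plain Runnables. PausableTracker and its methods are hypothetical names, and the real RequestTracker does more bookkeeping than this.

```java
import java.util.ArrayList;
import java.util.List;

final class PausableTracker {
    private final List<Runnable> pending = new ArrayList<>();
    private boolean paused;

    /** Runs the request now, or queues it if the tracker is paused. */
    void runRequest(Runnable request) {
        if (!paused) {
            request.run();
        } else {
            pending.add(request);
        }
    }

    void pause() {
        paused = true;
    }

    /** Resumes and starts everything that was queued while paused. */
    void resume() {
        paused = false;
        List<Runnable> toRun = new ArrayList<>(pending);
        pending.clear();
        for (Runnable request : toRun) {
            request.run();
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```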
Request turns out to be an interface, so we need to go back and find which implementation class the Request object actually is.
public interface Request {
/** Starts an asynchronous load. */
void begin();
...
}
In RequestBuilder, buildRequest() creates the Request object. It first calls buildRequestRecursive(xxx), which produces mainRequest and errorRequest; from the names, mainRequest looks like the Request for the normal success path. Following that leads to buildThumbnailRequestRecursive(), which calls obtainRequest() and finally SingleRequest.obtain(xxx). That returns a SingleRequest, which is the Request implementation we were looking for.
public class RequestBuilder<TranscodeType> extends BaseRequestOptions<RequestBuilder<TranscodeType>> implements Cloneable, ModelTypes<RequestBuilder<TranscodeType>> {
...
private Request buildRequest(...) {
return buildRequestRecursive(...);
}
private Request buildRequestRecursive(...) {
...
// Success request
Request mainRequest =
buildThumbnailRequestRecursive(
requestLock,
target,
targetListener,
parentCoordinator,
transitionOptions,
priority,
overrideWidth,
overrideHeight,
requestOptions,
callbackExecutor);
if (errorRequestCoordinator == null) {
return mainRequest;
}
...
// Error request
Request errorRequest =
errorBuilder.buildRequestRecursive(
requestLock,
target,
targetListener,
errorRequestCoordinator,
errorBuilder.transitionOptions,
errorBuilder.getPriority(),
errorOverrideWidth,
errorOverrideHeight,
errorBuilder,
callbackExecutor);
errorRequestCoordinator.setRequests(mainRequest, errorRequest);
return errorRequestCoordinator;
}
private Request buildThumbnailRequestRecursive(...) {
if (thumbnailBuilder != null) {
...
ThumbnailRequestCoordinator coordinator = new ThumbnailRequestCoordinator(requestLock, parentCoordinator);
Request fullRequest = obtainRequest(...);
...
return coordinator;
} else if (thumbSizeMultiplier != null) {
// Base case: thumbnail multiplier generates a thumbnail request, but cannot recurse.
ThumbnailRequestCoordinator coordinator =
new ThumbnailRequestCoordinator(requestLock, parentCoordinator);
Request fullRequest = obtainRequest(...);
...
Request thumbnailRequest = obtainRequest(xxx);
coordinator.setRequests(fullRequest, thumbnailRequest);
return coordinator;
} else {
// Base case: no thumbnail.
return obtainRequest(...);
}
}
private Request obtainRequest(...) {
return SingleRequest.obtain(...);
}
}
Now let's see what begin() does in SingleRequest. It calls onSizeReady(xxx), which in turn calls Engine.load(xxx); judging by the name, we are getting closer to the real request.
public final class SingleRequest<R> implements Request, SizeReadyCallback, ResourceCallback {
...
@Override
public void begin() {
...
if (Util.isValidDimensions(overrideWidth, overrideHeight)) {
onSizeReady(overrideWidth, overrideHeight);
} else {
target.getSize(this);
}
...
}
@Override
public void onSizeReady(int width, int height) {
...
loadStatus =
engine.load(
glideContext,
model,
requestOptions.getSignature(),
this.width,
this.height,
requestOptions.getResourceClass(),
transcodeClass,
priority,
requestOptions.getDiskCacheStrategy(),
requestOptions.getTransformations(),
requestOptions.isTransformationRequired(),
requestOptions.isScaleOnlyOrNoTransform(),
requestOptions.getOptions(),
requestOptions.isMemoryCacheable(),
requestOptions.getUseUnlimitedSourceGeneratorsPool(),
requestOptions.getUseAnimationPool(),
requestOptions.getOnlyRetrieveFromCache(),
this,
callbackExecutor);
...
}
}
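The logic shows that if memoryResource == null (that is, nothing was found in the memory cache), the load goes on to fetch the data, which in our case means a network request. Execution moves into waitForExistingOrStartNewJob(xxx), where the key line is engineJob.start(decodeJob), and that takes us to EngineJob.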
public class Engine implements EngineJobListener, MemoryCache.ResourceRemovedListener, EngineResource.ResourceListener {
...
public <R> LoadStatus load(...) {
...
synchronized (this) {
memoryResource = loadFromMemory(key, isMemoryCacheable, startTime);
if (memoryResource == null) {
// Memory cache miss: start the load that leads to the network request
return waitForExistingOrStartNewJob(
...
);
}
}
...
}
private <R> LoadStatus waitForExistingOrStartNewJob(...) {
...
EngineJob<R> engineJob =
engineJobFactory.build(
...
);
DecodeJob<R> decodeJob =
decodeJobFactory.build(
...
);
jobs.put(key, engineJob);
engineJob.addCallback(cb, callbackExecutor);
engineJob.start(decodeJob);
...
}
}
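As the code below shows, EngineJob.start(xxx) does very little: it hands the job to a thread pool via ExecutorService.execute(xxx) so that it runs asynchronously. DecodeJob must therefore implement Runnable, so the next step is to read the logic of run() in DecodeJob.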
class EngineJob<R> implements DecodeJob.Callback<R>, Poolable {
...
public synchronized void start(DecodeJob<R> decodeJob) {
this.decodeJob = decodeJob;
GlideExecutor executor = decodeJob.willDecodeFromCache() ? diskCacheExecutor : getActiveSourceExecutor();
executor.execute(decodeJob);
}
...
}
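After sorting through DecodeJob, its core turns out to be runWrapped(), which branches on runReason. Checking each branch confirms that runGenerators() holds the main logic. The body of its while loop only reassigns fields, so the data must be fetched somewhere else, and that place is currentGenerator.startNext() in the loop condition. currentGenerator is of type DataFetcherGenerator, a name that strongly suggests a class that fetches data, so we need to find its implementation. currentGenerator is returned by getNextGenerator(), and of the cases there, SourceGenerator(xxx) looks like the one related to network requests.
class DecodeJob<R> implements DataFetcherGenerator.FetcherReadyCallback, Runnable, Comparable<DecodeJob<?>>, Poolable {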
public void run() {
...
try {
if (isCancelled) {
notifyFailed();
return;
}
runWrapped();
} catch (CallbackException e) {
}
...
}
private void runWrapped() {
switch (runReason) {
case INITIALIZE:
stage = getNextStage(Stage.INITIALIZE);
currentGenerator = getNextGenerator();
runGenerators();
break;
case SWITCH_TO_SOURCE_SERVICE:
runGenerators();
break;
case DECODE_DATA:
decodeFromRetrievedData();
break;
default:
throw new IllegalStateException("Unrecognized run reason: " + runReason);
}
}
private void runGenerators() {
...
while (!isCancelled
&& currentGenerator != null
&& !(isStarted = currentGenerator.startNext())) {
// Only reassigns the stage and the generator
stage = getNextStage(stage);
currentGenerator = getNextGenerator();
...
}
...
}
private DataFetcherGenerator getNextGenerator() {
switch (stage) {
case RESOURCE_CACHE:
return new ResourceCacheGenerator(decodeHelper, this);
case DATA_CACHE:
return new DataCacheGenerator(decodeHelper, this);
case SOURCE:
return new SourceGenerator(decodeHelper, this);
case FINISHED:
return null;
default:
throw new IllegalStateException("Unrecognized stage: " + stage);
}
}
}
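The loop in runGenerators() can be read as "try each stage in order until one reports that it started a load". Here is a simplified sketch of that fallthrough, with each stage reduced to a BooleanSupplier that plays the role of startNext(). GeneratorChain is a hypothetical name, and the real loop also handles cancellation and thread switching, which are left out here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

final class GeneratorChain {

    /**
     * Tries each stage in order and returns the index of the first one whose
     * startNext() stand-in reports that it started a load, or -1 if none did.
     */
    static int runGenerators(List<BooleanSupplier> stages) {
        for (int i = 0; i < stages.size(); i++) {
            if (stages.get(i).getAsBoolean()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<BooleanSupplier> stages = Arrays.asList(
                () -> false, // RESOURCE_CACHE: miss
                () -> false, // DATA_CACHE: miss
                () -> true); // SOURCE: starts a fetch
        System.out.println(runGenerators(stages)); // prints 2
    }
}
```

For a first-time download nothing is cached yet, so both cache stages miss and the SOURCE stage is the one that runs, which is why SourceGenerator is the class to follow.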
Inside SourceGenerator, startNext() eventually calls loadData.fetcher.loadData(xxx), which looks very much like a network request.
class SourceGenerator implements DataFetcherGenerator, DataFetcherGenerator.FetcherReadyCallback {
...
public boolean startNext() {
...
while (!started && hasNextModelLoader()) {
loadData = helper.getLoadData().get(loadDataListIndex++);
if (loadData != null
&& (helper.getDiskCacheStrategy().isDataCacheable(loadData.fetcher.getDataSource())
|| helper.hasLoadPath(loadData.fetcher.getDataClass()))) {
started = true;
startNextLoad(loadData);
}
}
return started;
}
private void startNextLoad(final LoadData<?> toStart) {
// Request the data
loadData.fetcher.loadData(
helper.getPriority(),
new DataCallback<Object>() {
@Override
public void onDataReady(@Nullable Object data) {
if (isCurrentRequest(toStart)) {
onDataReadyInternal(toStart, data);
}
}
@Override
public void onLoadFailed(@NonNull Exception e) {
if (isCurrentRequest(toStart)) {
onLoadFailedInternal(toStart, e);
}
}
});
}
}
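The code shows that loadData.fetcher is of the interface type DataFetcher. The usual approach would be to go back to loadData and see how it is built, and if there is no concrete implementation there, keep going back level by level. I tried this and found it very easy to get stuck in an "endless loop". From experience, I recommend two approaches:
1. Look at the DataFetcher interface directly. If it has few implementations, you can identify the right one quickly. Here the IDE shortcut for listing implementations shows many implementing classes, so it comes down to experience: HttpUrlFetcher looks most like an ordinary HTTP request.
2. [Strongly recommended] Use the debugger. We are already very close to the "truth", so set a breakpoint and it is easy to see that the implementation class is HttpUrlFetcher.
Below is the core code in HttpUrlFetcher that sets the request timeout. The implementation differs slightly between versions, but in both the timeout defaults to 2500 ms, and in the old version it is hard-coded and cannot be changed.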
// version = 3.7.0
public class HttpUrlFetcher implements DataFetcher<InputStream> {
...
private InputStream loadDataWithRedirects(
URL url, int redirects, URL lastUrl, Map<String, String> headers) throws HttpException {
...
urlConnection.setConnectTimeout(2500);
urlConnection.setReadTimeout(2500);
urlConnection.setUseCaches(false);
urlConnection.setDoInput(true);
...
}
}
// version = 4.13.0
public class HttpUrlFetcher implements DataFetcher<InputStream> {
...
private HttpURLConnection buildAndConfigureConnection(URL url, Map<String, String> headers) throws HttpException {
HttpURLConnection urlConnection;
...
urlConnection.setConnectTimeout(timeout);
urlConnection.setReadTimeout(timeout);
urlConnection.setUseCaches(false);
urlConnection.setDoInput(true);
...
return urlConnection;
}
}
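To see how setReadTimeout(xxx) turns a slow server into the SocketTimeoutException from the log, the effect can be reproduced with the JDK alone. The sketch below is not Glide code: ReadTimeoutDemo and timesOut are hypothetical names, and the "server" is a local ServerSocket that is never accepted, so the connection is established but no response ever arrives.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URL;

final class ReadTimeoutDemo {

    /** Returns true if reading from a silent local server hits the read timeout. */
    static boolean timesOut(int timeoutMs) {
        // The socket is bound but never accept()ed: the OS completes the TCP
        // handshake from the backlog, yet nothing is ever written back.
        try (ServerSocket silentServer = new ServerSocket(0)) {
            URL url = new URL("http://127.0.0.1:" + silentServer.getLocalPort() + "/");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            // Same two calls that HttpUrlFetcher makes.
            conn.setConnectTimeout(timeoutMs);
            conn.setReadTimeout(timeoutMs);
            try (InputStream in = conn.getInputStream()) {
                return false; // a response arrived in time
            } catch (SocketTimeoutException e) {
                return true; // no response within timeoutMs
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(timesOut(200));
    }
}
```

With a 2500 ms limit, any image whose first bytes take longer than that to arrive fails in the same way, which matches the large wallpaper downloads in our case.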
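Solution
Because our image library version was fairly old, we first upgraded it from 3.7.0 to 4.13.0. Our project already uses OkHttp as its networking library, so we switched Glide's underlying network stack to OkHttp as well.
See the official documentation: muyangmin.github.io/glide-docs-…
Add a dependency on the OkHttp integration library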
implementation "com.github.bumptech.glide:okhttp3-integration:4.13.0"
Replace the underlying network library
@GlideModule
public class EmagGlideModule extends AppGlideModule {
...
@Override
public void registerComponents(@NonNull Context context, @NonNull Glide glide, @NonNull Registry registry) {
registry.replace(GlideUrl.class, InputStream.class, new OkHttpUrlLoader.Factory());
}
}
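You may be wondering: is that it? No timeout was set anywhere. Bear with me and let's keep going. OkHttpUrlLoader shows that you can pass in a custom OkHttpClient or use the default one, new OkHttpClient(), so next we look at how the default OkHttpClient sets its timeouts.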
public class OkHttpUrlLoader implements ModelLoader<GlideUrl, InputStream> {
...
@SuppressWarnings("WeakerAccess")
public static class Factory implements ModelLoaderFactory<GlideUrl, InputStream> {
private static volatile Call.Factory internalClient;
private final Call.Factory client;
private static Call.Factory getInternalClient() {
if (internalClient == null) {
synchronized (Factory.class) {
if (internalClient == null) {
internalClient = new OkHttpClient();
}
}
}
return internalClient;
}
// Uses internalClient by default
public Factory() {
this(getInternalClient());
}
// Custom OkHttpClient
public Factory(@NonNull Call.Factory client) {
this.client = client;
}
@NonNull
@Override
public ModelLoader<GlideUrl, InputStream> build(MultiModelLoaderFactory multiFactory) {
return new OkHttpUrlLoader(client);
}
...
}
}
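OkHttpClient is implemented in Kotlin, and its default implementation sets connectTimeout to 10 s (readTimeout and writeTimeout are 10 s as well). I consider that reasonable for our use case, so I did not build a custom OkHttpClient to set the timeouts; if your use case needs different values, you can supply your own.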
open class OkHttpClient internal constructor(builder: Builder) : Cloneable, Call.Factory, WebSocket.Factory {
...
// Default implementation
constructor() : this(Builder())
class Builder constructor() {
...
internal var callTimeout = 0
internal var connectTimeout = 10_000
internal var readTimeout = 10_000
internal var writeTimeout = 10_000
...
}
}
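If the 10 s defaults do not suit your use case, one option is to build an OkHttpClient with explicit timeouts and pass it to OkHttpUrlLoader.Factory in the Glide module. This is a configuration sketch rather than our production code: the class name TimeoutGlideModule and the 15-second values are placeholders chosen for illustration, and it assumes the okhttp3-integration dependency from this article is on the classpath of an Android project.

```java
import android.content.Context;
import androidx.annotation.NonNull;
import com.bumptech.glide.Glide;
import com.bumptech.glide.Registry;
import com.bumptech.glide.annotation.GlideModule;
import com.bumptech.glide.integration.okhttp3.OkHttpUrlLoader;
import com.bumptech.glide.load.model.GlideUrl;
import com.bumptech.glide.module.AppGlideModule;
import java.io.InputStream;
import java.util.concurrent.TimeUnit;
import okhttp3.OkHttpClient;

// Hypothetical module name; an app should have only one AppGlideModule.
@GlideModule
public class TimeoutGlideModule extends AppGlideModule {

    @Override
    public void registerComponents(
            @NonNull Context context, @NonNull Glide glide, @NonNull Registry registry) {
        // Illustrative values; tune them to your image sizes and network conditions.
        OkHttpClient client = new OkHttpClient.Builder()
                .connectTimeout(15, TimeUnit.SECONDS)
                .readTimeout(15, TimeUnit.SECONDS)
                .writeTimeout(15, TimeUnit.SECONDS)
                .build();
        // Same replace(...) call as before, but with the custom client.
        registry.replace(GlideUrl.class, InputStream.class, new OkHttpUrlLoader.Factory(client));
    }
}
```

Passing the client explicitly keeps the timeout policy in one place, and the same OkHttpClient instance could be shared with the rest of the app's networking code.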