Digging into the Glide Source to Fix Image Loading Timeouts

What happened

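Our project is an image browsing app. To give users a smooth experience, it periodically downloads wallpaper data in the background and prefetches the corresponding images to local storage, so that when the user opens the app the images appear immediately instead of waiting for a download. During development and testing, however, image downloads sometimes failed with a timeout error, and after our most recent feature release the failure rate rose sharply, so I decided to dig into the Glide source.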

Download timeout log:

 GlideHelper: img download error. url = https://xxxx/wbp,q_auto,h_2276,w_1080,c_fill/https%3A%2F%2Fcdn.xxx.com%2Fstatic%2Fwallpaper%2F3518313389605031946_GETTY_996206592.jpg%3Fut%3D1624552936 
    java.util.concurrent.ExecutionException: java.net.SocketTimeoutException : Read timed out
        at com.bumptech.glide.request.RequestFutureTarget.doGet(RequestFutureTarget.java:189)
        at com.bumptech.glide.request.RequestFutureTarget.get(RequestFutureTarget.java:100)
        at com.xxx.common.glide.GlideHelper.download(GlideHelper.java:61)
        at com.xxx.manager.KSyncStrategyManager.downloadWallpapers(KSyncStrategyManager.java:239)
        at com.xxx.manager.KSyncStrategyManager.startDownloadWallpaperJobs(KSyncStrategyManager.java:60)
        at com.xxx.manager.KSyncStrategyManager.startSyncWallpaperListByUserLikeTag(KSyncStrategyManager.java:117)
        ...

Source analysis

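Here is the flowchart that summarizes my whole analysis. The walkthrough that follows is fairly long; if you only want to know how to set the timeout, skip ahead to the Solution section.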

[Figure: flowchart of the source analysis]

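Because we upgraded Glide, I analyzed both the old version (3.7.0) and the new one (4.13.0). The two do not differ much, so the analysis below is based on 4.13.0, and the places where they differ are marked in the flowchart.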

Our project calls Glide's downloadOnly() to download images:

public static File download(Context context, String url, int timeout) {
    ...

    File file = null;

    try {
        file = Glide.with(context)
                .load(glideUrl)
                .downloadOnly(Target.SIZE_ORIGINAL, Target.SIZE_ORIGINAL)
                .get();

    } catch (Exception e) {
        LogUtils.e(TAG, "img download error. url = " + url, e);
    }
    return file;
}

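I started from the Glide.xxx.get() call. It leads into doGet(xxx) in RequestFutureTarget, which only waits for the resource and returns it, or rethrows a failure that was recorded elsewhere. Nothing in it raises the SocketTimeoutException itself, so we have to trace one level back up the call chain.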

public class RequestFutureTarget<R> implements FutureTarget<R>, RequestListener<R> {
  ...

  @Override
  public R get() throws InterruptedException, ExecutionException {
    try {
      return doGet(null);
    } catch (TimeoutException e) {
      throw new AssertionError(e);
    }
  }

  ...

  private synchronized R doGet(Long timeoutMillis) throws ExecutionException, InterruptedException, TimeoutException {

    if (assertBackgroundThread && !isDone()) {
      Util.assertBackgroundThread();
    }

    if (isCancelled) {
      throw new CancellationException();
    } else if (loadFailed) {
      throw new ExecutionException(exception);
    } else if (resultReceived) {
      return resource;
    }

    // Exception checks omitted
    ...

    return resource;
  }
}
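
To make the shape of the log easier to read, here is a small JDK-only sketch of the same pattern: a worker thread fails, and the caller blocked in get() sees the failure wrapped in an ExecutionException. It does not use Glide at all; FutureFailureDemo and causeSeenByCaller are names made up for this illustration, and the thrown SocketTimeoutException is only a stand-in for the real network read.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** JDK-only illustration; none of these names come from Glide. */
final class FutureFailureDemo {

    /** Runs a task that fails like a timed-out fetch and returns the cause seen by get(). */
    static Throwable causeSeenByCaller() throws InterruptedException {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for the network read that times out on the worker thread.
            Callable<String> task = () -> {
                throw new SocketTimeoutException("Read timed out");
            };
            Future<String> future = worker.submit(task);
            try {
                future.get(); // blocks the caller, similar in spirit to RequestFutureTarget.get()
                return null;  // only reached if the task succeeds
            } catch (ExecutionException e) {
                return e.getCause(); // the original failure from the worker thread
            }
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(causeSeenByCaller());
    }
}
```

This is why the top frames of the stack trace point at RequestFutureTarget even though the timeout happened somewhere else: the frames show where the failure was rethrown, and the cause shows what actually went wrong.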

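Next comes submit(xxx), which downloadOnly() delegates to. RequestBuilder.submit(xxx) eventually reaches the four-argument into(xxx). Stepping through it, into(xxx) calls RequestManager.track(xxx), which in turn calls RequestTracker.runRequest(xxx). Judging by the method name, this looks like the place where the request starts executing.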

public class RequestBuilder<TranscodeType> extends BaseRequestOptions<RequestBuilder<TranscodeType>> implements Cloneable, ModelTypes<RequestBuilder<TranscodeType>> {
    ...
    
    public FutureTarget<TranscodeType> submit(int width, int height) {
        final RequestFutureTarget<TranscodeType> target = new RequestFutureTarget<>(width, height);
        return into(target, target, Executors.directExecutor());
    }

    private <Y extends Target<TranscodeType>> Y into(
            @NonNull Y target,
            @Nullable RequestListener<TranscodeType> targetListener,
            BaseRequestOptions<?> options,
            Executor callbackExecutor) {
        ...

        Request request = buildRequest(target, targetListener, options, callbackExecutor);
        ...

        requestManager.clear(target);
        target.setRequest(request);
        requestManager.track(target, request);

        return target;
    }
}

public class RequestManager implements ComponentCallbacks2, LifecycleListener, ModelTypes<RequestBuilder<Drawable>> {
      ...
  
      synchronized void track(@NonNull Target<?> target, @NonNull Request request) {
        targetTracker.track(target);
        requestTracker.runRequest(request);
      }
}

That brings us to request.begin(), so let's follow begin():

public class RequestTracker {
      ...

      public void runRequest(@NonNull Request request) {
        requests.add(request);
        if (!isPaused) {
          request.begin();
        } else {
          request.clear();
          if (Log.isLoggable(TAG, Log.VERBOSE)) {
            Log.v(TAG, "Paused, delaying request");
          }
          pendingRequests.add(request);
        }
      }
}

Request is an interface, so we need to go back and find out which implementation the Request object actually is.

public interface Request {
  /** Starts an asynchronous load. */
  void begin();

  ...
}

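The Request object is created by buildRequest() in RequestBuilder. It first calls buildRequestRecursive(xxx), which produces a mainRequest and an errorRequest. From the naming, mainRequest is the Request for the normal success path. Following it further leads to buildThumbnailRequestRecursive(), which calls obtainRequest(), which finally calls SingleRequest.obtain(xxx). The return value is a SingleRequest, and that is the Request implementation we were looking for.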

public class RequestBuilder<TranscodeType> extends BaseRequestOptions<RequestBuilder<TranscodeType>> implements Cloneable, ModelTypes<RequestBuilder<TranscodeType>> {
    ...

    private Request buildRequest(...) {
        return buildRequestRecursive(...);
    }
    
    private Request buildRequestRecursive(...) {
        ...
        
        // Success request
        Request mainRequest =
            buildThumbnailRequestRecursive(
            requestLock,
            target,
            targetListener,
            parentCoordinator,
            transitionOptions,
            priority,
            overrideWidth,
            overrideHeight,
            requestOptions,
            callbackExecutor);

        if (errorRequestCoordinator == null) {
          return mainRequest;
        }

        ...

        // Error request
        Request errorRequest =
            errorBuilder.buildRequestRecursive(
                requestLock,
                target,
                targetListener,
                errorRequestCoordinator,
                errorBuilder.transitionOptions,
                errorBuilder.getPriority(),
                errorOverrideWidth,
                errorOverrideHeight,
                errorBuilder,
                callbackExecutor);

        errorRequestCoordinator.setRequests(mainRequest, errorRequest);
        return errorRequestCoordinator;
    }

    private Request buildThumbnailRequestRecursive(...) {
    
        if (thumbnailBuilder != null) {
          ...

          ThumbnailRequestCoordinator coordinator = new ThumbnailRequestCoordinator(requestLock, parentCoordinator);
          Request fullRequest = obtainRequest(...);

          ...
          return coordinator;
        } else if (thumbSizeMultiplier != null) {
          // Base case: thumbnail multiplier generates a thumbnail request, but cannot recurse.
          ThumbnailRequestCoordinator coordinator =
              new ThumbnailRequestCoordinator(requestLock, parentCoordinator);
          Request fullRequest = obtainRequest(...);

          ...

          Request thumbnailRequest = obtainRequest(xxx);
          coordinator.setRequests(fullRequest, thumbnailRequest);
          
          return coordinator;
        } else {
          // Base case: no thumbnail.
          return obtainRequest(...);
        }
    }
    
    private Request obtainRequest(...) {
      return SingleRequest.obtain(...);
    }
}

Now let's see what begin() does in SingleRequest. It calls onSizeReady(xxx), which in turn calls Engine.load(xxx). Judging by the name, we are getting closer to the real request.

public final class SingleRequest<R> implements Request, SizeReadyCallback, ResourceCallback {
    ...

    @Override
    public void begin() {   
        ...

        if (Util.isValidDimensions(overrideWidth, overrideHeight)) {
          onSizeReady(overrideWidth, overrideHeight);
        } else {
          target.getSize(this);
        }
        
        ...
    }
    
    @Override
    public void onSizeReady(int width, int height) {
        ...
        
        loadStatus =
            engine.load(
                glideContext,
                model,
                requestOptions.getSignature(),
                this.width,
                this.height,
                requestOptions.getResourceClass(),
                transcodeClass,
                priority,
                requestOptions.getDiskCacheStrategy(),
                requestOptions.getTransformations(),
                requestOptions.isTransformationRequired(),
                requestOptions.isScaleOnlyOrNoTransform(),
                requestOptions.getOptions(),
                requestOptions.isMemoryCacheable(),
                requestOptions.getUseUnlimitedSourceGeneratorsPool(),
                requestOptions.getUseAnimationPool(),
                requestOptions.getOnlyRetrieveFromCache(),
                this,
                callbackExecutor);
        ...
    }
}

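The logic here is: if memoryResource == null (nothing in the memory cache), Engine starts a load job, which is what eventually reaches the network when the disk cache cannot serve the data either. The call goes into waitForExistingOrStartNewJob(xxx), whose key line is engineJob.start(decodeJob). That takes us into EngineJob.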

public class Engine implements EngineJobListener, MemoryCache.ResourceRemovedListener, EngineResource.ResourceListener {
    ...

    public <R> LoadStatus load(...) {
        ...
        synchronized (this) {
            memoryResource = loadFromMemory(key, isMemoryCacheable, startTime);

            if (memoryResource == null) {
                // Memory cache miss: start a new load job or join an existing one
                return waitForExistingOrStartNewJob(
                    ...
                );
            }
        }
        ...
    }


    private <R> LoadStatus waitForExistingOrStartNewJob(...) {
        ...

        EngineJob<R> engineJob =
        engineJobFactory.build(
            ...
        );

        DecodeJob<R> decodeJob =
        decodeJobFactory.build(
            ...
        );

        jobs.put(key, engineJob);

        engineJob.addCallback(cb, callbackExecutor);
        engineJob.start(decodeJob);

        ...
    }
}

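As the code below shows, EngineJob.start(xxx) does very little: it picks a GlideExecutor (an ExecutorService) and calls execute(xxx) to run the task asynchronously on a thread pool. DecodeJob therefore must implement Runnable, so the next step is to read the run() logic in DecodeJob.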

class EngineJob<R> implements DecodeJob.Callback<R>, Poolable {
    ...

    public synchronized void start(DecodeJob<R> decodeJob) {
        this.decodeJob = decodeJob;
        GlideExecutor executor = decodeJob.willDecodeFromCache() ? diskCacheExecutor : getActiveSourceExecutor();
        executor.execute(decodeJob);
    }
    ...
}

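After sorting through DecodeJob, the core of run() is just runWrapped(), and runWrapped() branches on runReason. Checking each branch confirms that runGenerators() carries the main logic. The body of its while loop only reassigns stage and currentGenerator, so the data must be fetched somewhere else, and that place turns out to be currentGenerator.startNext() in the loop condition. currentGenerator has the type DataFetcherGenerator, which by its name looks very much like something that fetches data, so we need its implementation. currentGenerator is returned by getNextGenerator(), and judging by the case names, SourceGenerator(xxx) is the one most likely tied to the network request.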

class DecodeJob<R> implements DataFetcherGenerator.FetcherReadyCallback, Runnable, Comparable<DecodeJob<?>>, Poolable {

    public void run() {
        ...

        try {
            if (isCancelled) {
                notifyFailed();
                return;
            }
            runWrapped();
        } catch (CallbackException e) {
        }
        ...
    }

    private void runWrapped() {
        switch (runReason) {
            case INITIALIZE:
            stage = getNextStage(Stage.INITIALIZE);
            currentGenerator = getNextGenerator();
            runGenerators();
            break;
            case SWITCH_TO_SOURCE_SERVICE:
            runGenerators();
            break;
            case DECODE_DATA:
            decodeFromRetrievedData();
            break;
            default:
            throw new IllegalStateException("Unrecognized run reason: " + runReason);
        }
    }

    private void runGenerators() {
        ...
        while (!isCancelled
            && currentGenerator != null
            && !(isStarted = currentGenerator.startNext())) {
            // Advance to the next stage and its generator
            stage = getNextStage(stage);
            currentGenerator = getNextGenerator();
            ...
        }
        ...
    }

    private DataFetcherGenerator getNextGenerator() {
        switch (stage) {
            case RESOURCE_CACHE:
            return new ResourceCacheGenerator(decodeHelper, this);
            case DATA_CACHE:
            return new DataCacheGenerator(decodeHelper, this);
            case SOURCE:
            return new SourceGenerator(decodeHelper, this);
            case FINISHED:
            return null;
            default:
            throw new IllegalStateException("Unrecognized stage: " + stage);
        }
    }
}
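
The loop in runGenerators() is easier to follow in isolation. Below is a deliberately simplified, JDK-only model of it: try each stage in order and stop at the first generator whose startNext() reports that it started a load. All types here (StageLoopSketch, Generator, the boolean cache-hit flags) are hypothetical stand-ins, and the model leaves out things the real DecodeJob does, such as consulting the disk cache strategy when choosing the next stage and rescheduling onto another executor before the SOURCE stage.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of DecodeJob's stage loop; these types are hypothetical, not Glide's. */
final class StageLoopSketch {

    enum Stage { INITIALIZE, RESOURCE_CACHE, DATA_CACHE, SOURCE, FINISHED }

    /** Stand-in for DataFetcherGenerator: returns true once it has started a load. */
    interface Generator {
        boolean startNext();
    }

    /** Fixed stage order; the real getNextStage() also consults the disk cache strategy. */
    static Stage nextStage(Stage current) {
        switch (current) {
            case INITIALIZE: return Stage.RESOURCE_CACHE;
            case RESOURCE_CACHE: return Stage.DATA_CACHE;
            case DATA_CACHE: return Stage.SOURCE;
            default: return Stage.FINISHED;
        }
    }

    /** Picks the generator for a stage; cache hits are simulated with plain booleans. */
    static Generator generatorFor(Stage stage, boolean resourceCacheHit, boolean dataCacheHit) {
        switch (stage) {
            case RESOURCE_CACHE: return () -> resourceCacheHit;
            case DATA_CACHE: return () -> dataCacheHit;
            default: return () -> true; // SOURCE: assume a fetcher is always available
        }
    }

    /** Walks the stages in order and returns the first one whose generator starts a load. */
    static Stage run(boolean resourceCacheHit, boolean dataCacheHit, List<Stage> visited) {
        Stage stage = nextStage(Stage.INITIALIZE);
        while (stage != Stage.FINISHED) {
            visited.add(stage);
            if (generatorFor(stage, resourceCacheHit, dataCacheHit).startNext()) {
                return stage;
            }
            stage = nextStage(stage);
        }
        return Stage.FINISHED;
    }

    public static void main(String[] args) {
        List<Stage> visited = new ArrayList<>();
        System.out.println(run(false, false, visited) + " after visiting " + visited);
    }
}
```

In this model a request with nothing cached falls through both cache stages and ends at SOURCE, which matches our first-time background download and is why SourceGenerator is the branch worth following.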

Moving into SourceGenerator and looking at startNext(), we find loadData.fetcher.loadData(xxx), which looks very much like the network request.

class SourceGenerator implements DataFetcherGenerator, DataFetcherGenerator.FetcherReadyCallback {
    ...

    public boolean startNext() {
      ...
        while (!started && hasNextModelLoader()) {
            loadData = helper.getLoadData().get(loadDataListIndex++);
            if (loadData != null
                    && (helper.getDiskCacheStrategy().isDataCacheable(loadData.fetcher.getDataSource())
                    || helper.hasLoadPath(loadData.fetcher.getDataClass()))) {
                started = true;
                startNextLoad(loadData);
            }
        }
        return started;
    }

    private void startNextLoad(final LoadData<?> toStart) {
        // Fetch the data
        loadData.fetcher.loadData(
                helper.getPriority(),
                new DataCallback<Object>() {
                    @Override
                    public void onDataReady(@Nullable Object data) {
                        if (isCurrentRequest(toStart)) {
                            onDataReadyInternal(toStart, data);
                        }
                    }

                    @Override
                    public void onLoadFailed(@NonNull Exception e) {
                        if (isCurrentRequest(toStart)) {
                            onLoadFailedInternal(toStart, e);
                        }
                    }
                });
    }
}

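The code shows that loadData.fetcher has the interface type DataFetcher. The usual approach would be to go back to where loadData is built and read that logic, and if there is no concrete implementation there, keep backtracking level by level. I tried this and found it very easy to get stuck going in circles. From experience, I recommend two approaches instead: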

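  1. Look at the implementations of the DataFetcher interface directly. If there are only a few, you can identify the right one quickly. Here the IDE's "find implementations" shortcut shows that many classes implement it, so it comes down to experience: HttpUrlFetcher looks like the ordinary HTTP request path.

  2. (Strongly recommended) Use the debugger. We are already very close to the answer, so set a breakpoint and the implementation class is immediately visible: HttpUrlFetcher.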

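Below is the core code in HttpUrlFetcher that sets the timeouts. The implementation differs slightly between versions, but in both the timeout defaults to 2500 ms. In 3.7.0 the value is hard-coded and cannot be changed; in 4.13.0 it is read from a field, so it can be changed, for example through the timeout(int) request option.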

// version = 3.7.0
public class HttpUrlFetcher implements DataFetcher<InputStream> {
    ...

    private InputStream loadDataWithRedirects(
            URL url, int redirects, URL lastUrl, Map<String, String> headers) throws HttpException {
        ...
        urlConnection.setConnectTimeout(2500);
        urlConnection.setReadTimeout(2500);
        urlConnection.setUseCaches(false);
        urlConnection.setDoInput(true);
        
        ...
    }
}


// version = 4.13.0
public class HttpUrlFetcher implements DataFetcher<InputStream> {
    ...

    private HttpURLConnection buildAndConfigureConnection(URL url, Map<String, String> headers) throws HttpException {

        HttpURLConnection urlConnection;
          ...

        urlConnection.setConnectTimeout(timeout);
        urlConnection.setReadTimeout(timeout);
        urlConnection.setUseCaches(false);
        urlConnection.setDoInput(true);
          
          ...
        return urlConnection;
    }
}
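
To see what these two setters do in practice, here is a JDK-only sketch that reproduces a read timeout without Glide. It opens a local ServerSocket that never accepts or answers, so the TCP connection succeeds but no response bytes ever arrive, and then configures an HttpURLConnection the same way HttpUrlFetcher does. ReadTimeoutDemo, timesOut, and the /image.jpg path are made up for this illustration, and the sketch assumes loopback networking is available.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URL;

/** JDK-only illustration of a read timeout; not Glide code. */
final class ReadTimeoutDemo {

    /** Returns true if reading from a server that never answers hits the read timeout. */
    static boolean timesOut(int timeoutMs) throws IOException {
        // The OS completes the TCP handshake, but nothing ever accepts or replies.
        try (ServerSocket silentServer = new ServerSocket(0)) {
            URL url = new URL("http://127.0.0.1:" + silentServer.getLocalPort() + "/image.jpg");
            HttpURLConnection connection = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            connection.setConnectTimeout(timeoutMs); // limit for establishing the connection
            connection.setReadTimeout(timeoutMs);    // limit for each blocking read
            connection.setUseCaches(false);
            connection.setDoInput(true);
            try (InputStream in = connection.getInputStream()) {
                return false; // only reached if a response actually arrives
            } catch (SocketTimeoutException e) {
                return true;  // the read timeout expired while waiting for the response
            } finally {
                connection.disconnect();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + timesOut(300));
    }
}
```

With a 2500 ms limit on every read, a slow CDN response for a large original-size wallpaper is enough to trigger the same exception that shows up in our log.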

Solution

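Since our image library version was fairly old, we first upgraded Glide from 3.7.0 to 4.13.0. Our project already uses OkHttp as its networking library, so we switched Glide's underlying network stack to OkHttp as well.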

See the official documentation: muyangmin.github.io/glide-docs-…

Add a dependency on the OkHttp integration library:

implementation "com.github.bumptech.glide:okhttp3-integration:4.13.0"

Replace the underlying network stack:

@GlideModule
public class EmagGlideModule extends AppGlideModule {
    ...

    @Override
    public void registerComponents(@NonNull Context context, @NonNull Glide glide, @NonNull Registry registry) {
        registry.replace(GlideUrl.class, InputStream.class, new OkHttpUrlLoader.Factory());
    }
}

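Some readers may wonder: is that it? No timeout has been set anywhere. Bear with me, the analysis continues. OkHttpUrlLoader.Factory accepts a custom OkHttpClient, or falls back to a default one created with new OkHttpClient(). So the next question is which timeouts the default OkHttpClient uses.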

public class OkHttpUrlLoader implements ModelLoader<GlideUrl, InputStream> {
  ...

    @SuppressWarnings("WeakerAccess")
    public static class Factory implements ModelLoaderFactory<GlideUrl, InputStream> {
        private static volatile Call.Factory internalClient;
        private final Call.Factory client;

        private static Call.Factory getInternalClient() {
            if (internalClient == null) {
                synchronized (Factory.class) {
                    if (internalClient == null) {
                        internalClient = new OkHttpClient();
                    }
                }
            }
            return internalClient;
        }

        // Uses internalClient by default
        public Factory() {
            this(getInternalClient());
        }

        // Accepts a custom OkHttpClient
        public Factory(@NonNull Call.Factory client) {
            this.client = client;
        }

        @NonNull
        @Override
        public ModelLoader<GlideUrl, InputStream> build(MultiModelLoaderFactory multiFactory) {
            return new OkHttpUrlLoader(client);
        }
    
    ...
    }
}

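OkHttpClient is implemented in Kotlin. Its default Builder sets connectTimeout, readTimeout, and writeTimeout to 10 s each (callTimeout defaults to 0, meaning no overall limit), compared with 2.5 s in Glide's default HttpUrlFetcher. I consider 10 s reasonable for our use case, so I did not build a custom OkHttpClient with different timeouts. If your use case needs different values, you can supply your own client.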

open class OkHttpClient internal constructor(builder: Builder) : Cloneable, Call.Factory, WebSocket.Factory {
    ...

    // Default implementation
    constructor() : this(Builder())

    class Builder constructor() {
        ...
        internal var callTimeout = 0
        internal var connectTimeout = 10_000
        internal var readTimeout = 10_000
        internal var writeTimeout = 10_000
        ...
    }
}
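
If the 10 s defaults do not fit your case, a custom client can be passed to the Factory constructor shown earlier. The following is a minimal sketch, not what our project ships: the class name TimeoutGlideModule and the 15 s / 30 s values are assumptions for illustration. An app can only have one AppGlideModule, so in practice the body of registerComponents() would go into your existing module.

```java
import android.content.Context;
import androidx.annotation.NonNull;
import com.bumptech.glide.Glide;
import com.bumptech.glide.Registry;
import com.bumptech.glide.annotation.GlideModule;
import com.bumptech.glide.integration.okhttp3.OkHttpUrlLoader;
import com.bumptech.glide.load.model.GlideUrl;
import com.bumptech.glide.module.AppGlideModule;
import java.io.InputStream;
import java.util.concurrent.TimeUnit;
import okhttp3.OkHttpClient;

@GlideModule
public class TimeoutGlideModule extends AppGlideModule { // hypothetical class name

    @Override
    public void registerComponents(@NonNull Context context, @NonNull Glide glide, @NonNull Registry registry) {
        // Timeout values are assumptions; choose them based on your image sizes and network conditions.
        OkHttpClient client = new OkHttpClient.Builder()
                .connectTimeout(15, TimeUnit.SECONDS)
                .readTimeout(30, TimeUnit.SECONDS)
                .build();

        // Same replace() call as before, but with the custom client instead of the default one.
        registry.replace(GlideUrl.class, InputStream.class, new OkHttpUrlLoader.Factory(client));
    }
}
```

Reusing the OkHttpClient your app already has (or a client derived from it with newBuilder()) is usually preferable to creating a separate one, since the clients can then share a connection pool.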