Background

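While rolling out a new business solution, we noticed a sizable discrepancy in our data and traced it to differences in how WKWebView was being used. Manual testing and application-level code could loosely explain some of the behavior, but proving it conclusively required reading the WebKit source. Doing so lets us decide accurately whether to align these scenarios or change strategy, and may also surface technical optimizations that feed back into the business.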

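Two questions need answering:

1. What is WebKit's regular history stack (back/forward) cache policy?
2. How does that policy change in scenarios such as cross-site navigation and redirects?

The second question is the odd one: without reading the WebKit source, it is hard to find a pattern or reach a firm conclusion.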

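WebKit concepts involved

While a WKWebView runs inside an app, three kinds of processes cooperate: the UIProcess, the WebContent process, and the Networking process.

WebContent process

The process that hosts the page's DOM and JavaScript. There may be more than one, depending on policy details.

When this process initializes, it creates the single WebProcess instance, which acts as the IPC::Connection client, that is, the proxy for communicating with other processes. The in-memory structures worth noting are:

- m_frameMap (WebFrame) (a tree structure, created when the WebPage is created)
- m_pageMap (WebPage) (created on an IPC notification sent when the UIProcess creates a WebPageProxy)
    - m_mainFrame (WebFrame)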

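UIProcess

The process of the application itself.

When a WKWebView is initialized, the in-memory structures worth noting are:

- _processPool (WebProcessPool)
    - m_processes (array of WebProcessProxy)
- _page (WebPageProxy, created through a WebProcessProxy instance; exactly one per WKWebView instance)
    - m_process (WebProcessProxy, switched dynamically)

After initialization, the WebPageProxy acts as an IPC::Connection client, the proxy for communicating with other processes.

WebPageProxy and WebProcessProxy correspond to WebPage and WebProcess in the WebContent process, respectively.

WebProcessPool (tied to the WKProcessPool object of WKWebViewConfiguration) abstracts the pool of WebContent processes, which means a single WKWebView can map to multiple WebContent processes.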

Networking process

Handles networking. Only one such process exists even when multiple WKWebViews are created. This article does not cover it.

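Overview of the history stack cache policy

WKWebView navigates the history stack through the goBack/goForward APIs. A caching policy applies to these navigations, and a cache hit saves the time of a network request.

In the WebContent process, BackForwardCache is a singleton that manages the history stack cache.

In the UIProcess, WebProcessPool abstracts the pool of WebContent processes. Each WebProcessPool owns exactly one WebBackForwardCache representing the history stack cache, which corresponds to the BackForwardCache singletons of the WebContent processes in its pool.

BackForwardCache stores cached entries in an ordered hash set and defines a maximum number of entries: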

ListHashSet<RefPtr<HistoryItem>> m_items;
unsigned m_maxSize {0};

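Cache eviction policy

BackForwardCache and WebBackForwardCache follow essentially the same policy, so BackForwardCache is used as the example here.

When the WebContent process navigates away from a page, it adds the current page to the cache via BackForwardCache::singleton().addIfCacheable(...):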

bool BackForwardCache::addIfCacheable(HistoryItem& item, Page* page) {
    ...
    item.setCachedPage(makeUnique<CachedPage>(*page));
    item.m_pruningReason = PruningReason::None;
    m_items.add(&item);
    ...
}

The maximum number of cached entries is computed as follows:

namespace WebKit {
    void calculateMemoryCacheSizes(...) {
        uint64_t memorySize = ramSize() / MB; 
        ...
        // back/forward cache capacity (in pages)
        if (memorySize >= 512)
            backForwardCacheCapacity = 2;
        else if (memorySize >= 256)
            backForwardCacheCapacity = 1;
        else
            backForwardCacheCapacity = 0;
        ...
    }
...

So on an iPhone, a WebContent process can be assumed to hold at most two history stack cache entries.
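To make the thresholds easier to check, here is a minimal sketch that isolates the branch above into a standalone function. The name backForwardCacheCapacityForRAM and its parameter are hypothetical and not part of WebKit; only the thresholds are taken from the excerpt.

```cpp
#include <cstdint>

// Hypothetical helper, not WebKit API: mirrors only the thresholds shown in
// the calculateMemoryCacheSizes excerpt. ramSizeInMB is device RAM in MB.
unsigned backForwardCacheCapacityForRAM(uint64_t ramSizeInMB)
{
    if (ramSizeInMB >= 512)
        return 2; // two cached pages
    if (ramSizeInMB >= 256)
        return 1; // one cached page
    return 0;     // caching effectively disabled
}
```

Any device with at least 512 MB of RAM lands in the first branch, which is why the capacity is treated as two throughout the rest of this article.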

Wherever the history stack cache changes, a pruning routine runs:

void BackForwardCache::prune(PruningReason pruningReason) {
    while (pageCount() > maxSize()) {
        auto oldestItem = m_items.takeFirst();
        oldestItem->setCachedPage(nullptr);
        oldestItem->m_pruningReason = pruningReason;
    }
}

This amounts to a simple LRU eviction policy: whenever the cache exceeds its limit, the oldest entries are dropped first.
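The pruning loop above can be reproduced in a small standalone model. This is a toy sketch, not WebKit code: ToyBackForwardCache is a hypothetical class, integer IDs stand in for HistoryItem objects, and moving an already-present item to the newest position in add() is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>

// Toy model of the pruning behaviour, not WebKit code.
// The front of the list is the oldest entry, the back is the newest.
class ToyBackForwardCache {
public:
    explicit ToyBackForwardCache(std::size_t maxSize) : m_maxSize(maxSize) { }

    // Adds an item as the newest entry (assumption: an item that is already
    // cached is moved to the newest position), then prunes to the limit.
    void add(int itemID)
    {
        m_items.remove(itemID);
        m_items.push_back(itemID);
        prune();
    }

    bool contains(int itemID) const
    {
        return std::find(m_items.begin(), m_items.end(), itemID) != m_items.end();
    }

    std::size_t size() const { return m_items.size(); }

private:
    // Same shape as BackForwardCache::prune: drop the oldest entries
    // until the count fits within the limit.
    void prune()
    {
        while (m_items.size() > m_maxSize)
            m_items.pop_front();
    }

    std::list<int> m_items;
    std::size_t m_maxSize;
};
```

With a limit of two, adding items 1, 2 and 3 in that order leaves only 2 and 3 cached, which matches the behaviour described for a WebContent process.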

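Maximum cache size

As noted above, a WebContent process holds at most two history stack cache entries, but this number is actually decided by the UIProcess. There, WebProcessPool sets the maximum when it initializes its WebBackForwardCache, and when a WebProcessProxy is created, it notifies the corresponding WebContent process over IPC to set m_maxSize on its BackForwardCache.

The WebBackForwardCache of a WebProcessPool corresponds to the BackForwardCache singleton of every WebContent process in the pool, a one-to-many relationship. When WebBackForwardCache prunes an entry, the entry's destructor automatically sends an IPC message telling the WebContent process to clear the matching cached page: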

WebBackForwardCacheEntry::~WebBackForwardCacheEntry() {
    if (m_backForwardItemID && !m_suspendedPage) {
        auto& process = this->process();
        process.sendWithAsyncReply(Messages::WebProcess::ClearCachedPage(m_backForwardItemID), [] { });
    }
}

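The effective cache limit therefore depends on the number of WebProcessPools: each WebProcessPool holds at most two history stack cache entries, no matter how many WebContent processes its pool contains.

State synchronization

Whenever the state of the history stack cache changes, the WebContent process calls notifyChanged(), which notifies the corresponding WebBackForwardCache in the UIProcess over IPC so that it can sync its state:

// notifyChanged() eventually reaches: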
static void WK2NotifyHistoryItemChanged(HistoryItem& item) {
    WebProcess::singleton().parentProcessConnection()->send(Messages::WebProcessProxy::UpdateBackForwardItem(toBackForwardListItemState(item)), 0);
}

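Analysis of redirect and cross-site scenarios

Decision phase before data is requested

When WKWebView navigates to a page, it makes several policy decisions before actually issuing a network request or using the cache. The familiar WKNavigationDelegate method -webView:decidePolicyForNavigationAction:decisionHandler: is part of this flow: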

void WebPageProxy::decidePolicyForNavigationAction(...) {
    ...
    auto listener = ... {
        ...
        receivedNavigationPolicyDecision(policyAction, navigation.get(), processSwapRequestedByClient, frame, WTFMove(policies), WTFMove(sender));
        ...
    }
    ...
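    // m_navigationClient is tied to the WKNavigationDelegate set by the application, so this ends up calling -webView:decidePolicyForNavigationAction:decisionHandler:.
    // Once the application calls decisionHandler(WKNavigationActionPolicyAllow), the closure behind the listener above runs and the rest of the logic continues.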
    m_navigationClient->decidePolicyForNavigationAction(*this, WTFMove(navigationAction), WTFMove(listener), process->transformHandlesToObjects(userData.object()).get());
    ...
}

The method to focus on is the one called next:

void WebPageProxy::receivedNavigationPolicyDecision(...) {
    ...
    // Note: this line is adapted from the original source
    Ref<WebProcessProxy>&& processForNavigation = process().processPool().processForNavigation(...);
    ...
    bool shouldProcessSwap = processForNavigation.ptr() != sourceProcess.ptr();
    if (shouldProcessSwap) {
        ...
        continueNavigationInNewProcess(...);
    } 
    ...
}

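The key step here is obtaining a WebProcessProxy and checking whether it is the same as sourceProcess. If it differs, a different WebProcessProxy handles the navigation. When such a switch happens, continueNavigationInNewProcess creates a ProvisionalPageProxy and stores it in the m_provisionalPage member of WebPageProxy, marking that a WebProcessProxy switch is in progress.

Internally, processForNavigation decides whether to reuse the WebProcessProxy. The key code is: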

void WebProcessPool::processForNavigationInternal(...) {
    ...
    if (!sourceURL.isValid() || !targetURL.isValid() || sourceURL.isEmpty() || sourceURL.protocolIsAbout() || targetRegistrableDomain.matches(sourceURL)) {
        // Same registrable domain: return the original WebProcessProxy
        return completionHandler(WTFMove(sourceProcess), nullptr, "Navigation is same-site"_s);
    }
    ...
    // Different registrable domain: create and return a new WebProcessProxy
    String reason = "Navigation is cross-site"_s;
    return completionHandler(createNewProcess(), nullptr, reason);
}

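targetRegistrableDomain is the top-level plus second-level domain of targetURL. In other words, the target and source URLs may differ in their third-level subdomain and still reuse the process, for example m.sogou.com and www.sogou.com. This check runs before any network request is issued, so it cannot know whether targetURL will redirect; the decision depends only on whether the navigation is cross-site.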
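The same-site comparison described above can be sketched as follows. This is a deliberate simplification and not how WebKit computes a registrable domain: the helper names are hypothetical, and the sketch just takes the last two dot-separated labels of a host, so it gives wrong answers for multi-label suffixes such as "co.uk".

```cpp
#include <string>

// Simplified stand-in for a registrable domain: the last two dot-separated
// labels of the host. Assumption for illustration only; this is wrong for
// multi-label suffixes such as "co.uk".
std::string toyRegistrableDomain(const std::string& host)
{
    auto lastDot = host.rfind('.');
    if (lastDot == std::string::npos || lastDot == 0)
        return host; // no dot, or only a leading dot: nothing to strip
    auto secondLastDot = host.rfind('.', lastDot - 1);
    if (secondLastDot == std::string::npos)
        return host; // already just two labels
    return host.substr(secondLastDot + 1);
}

// Two hosts count as same-site when their simplified registrable domains match.
bool toyIsSameSite(const std::string& sourceHost, const std::string& targetHost)
{
    return toyRegistrableDomain(sourceHost) == toyRegistrableDomain(targetHost);
}
```

Under this model m.sogou.com and www.sogou.com are same-site and would keep the original process, while www.a.com and www.b.com are cross-site and would get a new one.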

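As for how a WebProcessProxy in the UIProcess maps to a WebContent process: setting aside WebContent process reuse, one WebProcessProxy can be treated as one process. If two consecutive pages live in two different WebContent processes and no redirect is involved, goBack/goForward still switches between them smoothly, and each page is restored from the history stack cache of its own WebContent process.

Page data return phase

As mentioned above, if a navigation switches the WebProcessProxy, WebPageProxy creates a ProvisionalPageProxy. Once the navigation has fetched data from the network or read it from the cache, the provisional page is committed: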

void WebPageProxy::commitProvisionalPage(...) {
    ...
    // Try to cache the current page
    bool didSuspendPreviousPage = navigation ? suspendCurrentPageIfPossible(...) : false;
    // Clean up the current page; m_process is the current WebProcessProxy
    m_process->removeWebPage(...);
    // Switch the page state over to the new m_provisionalPage,
    // e.g. set m_process, which identifies the WebPageProxy's current WebProcessProxy, to provisionalPage->process()
    swapToProvisionalPage(std::exchange(m_provisionalPage, nullptr));
    ...
}

suspendCurrentPageIfPossible tries to cache the current page:

bool WebPageProxy::suspendCurrentPageIfPossible(...) {
    ...
    // If the source and the destination back / forward list items are the same, then this is a client-side redirect. In this case,
    // there is no need to suspend the previous page as there will be no way to get back to it.
    if (fromItem && fromItem == m_backForwardList->currentItem()) {
        RELEASE_LOG_IF_ALLOWED(ProcessSwapping, "suspendCurrentPageIfPossible: Not suspending current page for process pid %i because this is a client-side redirect", m_process->processIdentifier());
        return false;
    }
    ...
    // Create the SuspendedPageProxy; m_suspendedPageCount is incremented at this point
    auto suspendedPage = makeUnique<SuspendedPageProxy>(*this, m_process.copyRef(), *mainFrameID, shouldDelayClosingUntilFirstLayerFlush);
    m_lastSuspendedPage = makeWeakPtr(*suspendedPage);
    ...
    // Add it to the history stack cache
    backForwardCache().addEntry(*fromItem, WTFMove(suspendedPage));
    ...
}

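As the comment in the source says, on a client-side redirect the function returns immediately and never reaches the code that adds the page to the history stack cache. A server-side redirect, by contrast, is handled in the Networking process and is not visible here, so the page is added to the history stack cache just as in a regular navigation.

Reading further, if execution never reaches the later part of this method that increments m_suspendedPageCount, then m_process->removeWebPage(...) in commitProvisionalPage ends up calling the following: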

void WebProcessProxy::shutDown() {
    ...
    // m_processPool is the WebProcessPool that holds the set of WebProcessProxy objects
    m_processPool->disconnectProcess(this);
    ...
}
void WebProcessPool::disconnectProcess(WebProcessProxy* process) {
    ...
    // This clears the history stack cache entries in m_backForwardCache that belong to this process
    // m_backForwardCache is a WebBackForwardCache; there is exactly one per WebProcessPool
    m_backForwardCache->removeEntriesForProcess(*process);
    ...
}

This clears every history stack cache entry belonging to the current WebProcessProxy, without affecting other WebProcessPools or WebProcessProxy instances.
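The effect of removeEntriesForProcess can be illustrated with a small model. This is a hypothetical sketch rather than the WebKit implementation: ToyWebBackForwardCache and ToyCacheEntry are made-up types, integer IDs stand in for history items and processes, and the size limit is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy entry: which history item is cached, and which process holds it.
struct ToyCacheEntry {
    int itemID;
    int processID;
};

// Toy model of a pool-wide cache shared by several processes, not WebKit code.
class ToyWebBackForwardCache {
public:
    void addEntry(int itemID, int processID)
    {
        m_entries.push_back({ itemID, processID });
    }

    // Drops every entry owned by the given process and leaves the rest alone.
    void removeEntriesForProcess(int processID)
    {
        m_entries.erase(
            std::remove_if(m_entries.begin(), m_entries.end(),
                [processID](const ToyCacheEntry& entry) { return entry.processID == processID; }),
            m_entries.end());
    }

    std::size_t countForProcess(int processID) const
    {
        return static_cast<std::size_t>(std::count_if(m_entries.begin(), m_entries.end(),
            [processID](const ToyCacheEntry& entry) { return entry.processID == processID; }));
    }

    std::size_t size() const { return m_entries.size(); }

private:
    std::vector<ToyCacheEntry> m_entries;
};
```

If process 100 and process 200 each own one cached entry, disconnecting process 100 leaves only the entry of process 200 in the pool-wide cache.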

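What does `client-side redirect` mean here?

The check itself is simple:

fromItem && fromItem == m_backForwardList->currentItem()

This code is only reached when the navigation switched the WebProcessProxy, which means the target URL must be cross-site, for example going from www.a.com to www.b.com. At this point the state looks like this: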

fromItem : www.a.com
currentItem : www.b.com

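So when can the two be equal? My guess was that fromItem gets forcibly changed, and the replace() function of the JavaScript window.location object was the prime suspect. Running window.location.replace('www.b.com') on the www.a.com page did indeed reproduce the case where the two are equal.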
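A toy model helps show why replace() makes the two items equal while an ordinary navigation does not. Everything here is hypothetical and not WebKit code: ToyBackForwardList keeps entries by numeric ID, and the assumption that replace() rewrites the current entry in place (rather than creating a new one) is inferred from the observed behaviour, not confirmed from the source.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy history entry: the ID is its identity, independent of the URL.
struct ToyItem {
    int id;
    std::string url;
};

// Toy back/forward list, not WebKit code. Forward entries are not modelled.
class ToyBackForwardList {
public:
    // Ordinary navigation: append a new entry and make it current.
    void navigate(const std::string& url)
    {
        m_items.push_back({ m_nextID++, url });
        m_current = m_items.size() - 1;
    }

    // location.replace(): assumed to keep the current entry and only swap its URL.
    void replace(const std::string& url)
    {
        if (m_items.empty()) {
            navigate(url);
            return;
        }
        m_items[m_current].url = url;
    }

    // Precondition: at least one navigation has happened.
    int currentItemID() const { return m_items[m_current].id; }
    std::size_t size() const { return m_items.size(); }

private:
    std::vector<ToyItem> m_items;
    std::size_t m_current { 0 };
    int m_nextID { 1 };
};

// Mirrors the condition quoted above: same entry before and after the navigation.
bool looksLikeClientSideRedirect(int fromItemID, int currentItemID)
{
    return fromItemID == currentItemID;
}
```

In this model, navigating from www.a.com to www.b.com yields two different entries, so the check is false; replacing www.a.com with www.b.com leaves a single entry, so fromItem and currentItem are the same and the check is true.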

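Seen this way, WebKit's handling looks reasonable, since the page before replace() can no longer be returned to. What is unclear is why it so bluntly discards every history stack cache entry tied to the WebProcessProxy of the pre-replace() page. This part of WebKit may have room for optimization, and I will revisit it when time allows.

Conclusion

The questions from the beginning of the article can now be answered.

What is WebKit's regular history stack cache policy?

An LRU eviction algorithm with a maximum of two cached entries.

How does that policy change in scenarios such as cross-site navigation and redirects?

When a WKWebView navigation is both cross-site and a client-side redirect, every history stack cache entry tied to the current WebProcessProxy is cleared, and any later navigation to those history entries has to request the network again.

In this scenario, going back or forward re-fetches from the network and typically stalls for several seconds, so in principle it should be avoided, making the most of WebKit's cache to improve the user experience. The advice for web developers is to avoid window.location.replace() for cross-site redirects where possible, and to use a server-side redirect, or alternatives such as side-channel reporting from the preceding page, instead.

Note also that once this scenario is triggered, the number of requests caused by history navigation goes up, so it is an important variable to watch when analyzing service traffic metrics.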
