Browser extension implementation: capturing the DOM, capturing browser responses, and handling streaming responses

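I. Background

Build an agent similar to Monica. Inside the injected page, when the user clicks a specified area, capture the DOM content, send it to the agent, and stream the answer back.

II. Implementation

1. Capturing DOM elements

The initial idea was to use a click event and read the clicked DOM element directly. This ran into the first problem: the area the user clicks is an iframe. For an iframe you have to consider whether it is same-origin; if it is, its document can be reached through contentWindow?.document.

Here is the code: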

// Runs inside a React component; assumes `import { useEffect } from "react"`.
// Marker flag on the iframe so the click listener is bound only once.
type CustomHTMLIFrameElement = HTMLIFrameElement & {
  _clickEventBound?: boolean;
};

useEffect(() => {
  const observer = new MutationObserver(() => {
    const iframe = document.querySelector(
      "iframe#amzv-web"
    ) as CustomHTMLIFrameElement | null;

    if (iframe) {
      // Wait for the iframe to finish loading
      iframe.onload = () => {
        const iframeDoc = iframe.contentWindow?.document;

        if (iframeDoc) {
          // Bind the event listener only the first time
          if (!iframe._clickEventBound) {
            iframeDoc.addEventListener(
              "click",
              (event) => {
                const target = event.target as HTMLElement;
                console.log("Target:", target);

                const elements = iframeDoc.querySelectorAll(
                  ".right-box>.item>.label"
                ) as NodeListOf<HTMLElement>;

                console.log("Elements:", elements);

                // "工单编号" is the label text ("work order number") in the target page
                const matchedElement = Array.from(elements).find(
                  (el) => el.textContent?.trim() === "工单编号"
                );

                if (matchedElement) {
                  const nextSibling =
                    matchedElement.nextElementSibling as HTMLElement | null;
                  const value =
                    nextSibling
                      ?.querySelector(".gb-text-ellipsis")
                      ?.textContent?.trim() || "";
                  console.log("Work Order Value:", value);
                }
              },
              true // listen in the capture phase
            );

            // Mark the event as bound
            iframe._clickEventBound = true;
          }
        }
      };
    }
  });

  observer.observe(document.body, { childList: true, subtree: true });

  return () => observer.disconnect(); // Clean up the observer on unmount
}, []);
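The same-origin check mentioned above can be isolated into a small helper. This is only a sketch under assumptions: `getIframeDocument` and `FrameLike` are hypothetical names that do not appear in the original code, and the parameter is typed structurally so the helper can also be exercised outside a browser. It relies on the fact that reading `contentWindow.document` on a cross-origin iframe throws a SecurityError.

```typescript
// Hypothetical helper (not from the original code): returns the iframe's
// document when it is same-origin, or null when access is blocked.
type FrameLike<D> = { contentWindow: { readonly document: D } | null };

function getIframeDocument<D>(iframe: FrameLike<D>): D | null {
  try {
    // For a cross-origin frame, reading `.document` throws a SecurityError.
    return iframe.contentWindow?.document ?? null;
  } catch {
    return null;
  }
}
```

In the effect above, `getIframeDocument(iframe)` could stand in for `iframe.contentWindow?.document`, so a cross-origin iframe leads to a quiet early exit instead of an exception inside the onload handler.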

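There is currently one problem: the first click after the DOM becomes available does not return any data. I work around it by polling with a timer. The query below is specific to my current DOM structure.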

export function getValueByLabelFromIframe(
    iframeDoc: Document,
    labelText: string,
    timeout = 1500
): Promise<string | null> {
    return new Promise((resolve) => {
        const tryFind = () => {
            const labelElements = iframeDoc.querySelectorAll(".right-box > .item > .label");
            const matched = Array.from(labelElements).find(
                (el) => el.textContent?.trim() === labelText
            );

            if (matched) {
                const value =
                    matched.nextElementSibling
                        ?.querySelector(".gb-text-ellipsis")
                        ?.textContent?.trim() || null;
                resolve(value);
                return true;
            }
            return false;
        };

        if (tryFind()) return;

        const interval = setInterval(() => {
            if (tryFind()) clearInterval(interval);
        }, 100);

        setTimeout(() => {
            clearInterval(interval);
            resolve(null);
        }, timeout);
    });
}
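The polling pattern above could also be factored into a reusable helper. The sketch below is an assumption, not the original implementation: `pollUntil` and `probe` are hypothetical names. The only behavioral difference is that it clears both the interval and the timeout as soon as the promise settles.

```typescript
// Hypothetical generic version of the polling above: calls `probe` every
// `intervalMs` until it returns a non-null value or `timeoutMs` elapses.
function pollUntil<T>(
  probe: () => T | null,
  intervalMs = 100,
  timeoutMs = 1500
): Promise<T | null> {
  return new Promise((resolve) => {
    // Try once immediately, as the original function does.
    const first = probe();
    if (first !== null) {
      resolve(first);
      return;
    }

    const interval = setInterval(() => {
      const value = probe();
      if (value !== null) finish(value);
    }, intervalMs);

    const timer = setTimeout(() => finish(null), timeoutMs);

    // Clears both timers so nothing keeps running after the promise settles.
    function finish(value: T | null) {
      clearInterval(interval);
      clearTimeout(timer);
      resolve(value);
    }
  });
}
```

With this helper, the body of `tryFind` would be passed as `probe`, returning the found string or null.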

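2. Capturing API response data

Another approach is to capture API data: I noticed that the data I need is available in the response of one of the page's API calls.

Chrome's built-in webRequest API exposes only limited information and does not include the response body. Of course, if it is enough for your needs, using it alone is the best option.

Here I used dynamic script injection. world: 'MAIN' is a key option: with it, the injected script shares the same JavaScript context as the host page, which makes it possible to patch XHR.

The code is as follows: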

//background.ts
chrome.action.onClicked.addListener(function (tab) {
  chrome.scripting.executeScript({
    target: { tabId: tab.id as number },
    files: ["inject.js"],
    world: 'MAIN'
  });
});

// inject.js (placed in the public directory)
(function (xhr) {
  if (XMLHttpRequest.prototype.sayMyName) return;
  console.log("%c>>>>> replace XMLHttpRequest", "color:yellow;background:red");

  var XHR = XMLHttpRequest.prototype;
  XHR.sayMyName = "aqinogbei";

  // Keep references to the original open and send methods
  var open = XHR.open;
  var send = XHR.send;

  XHR.open = function (method, url) {
    this._method = method; // Record the method and url
    this._url = url;
    return open.apply(this, arguments);
  };

  XHR.send = function () {
    console.log("send", this._method, this._url);
    this.addEventListener("load", function (event) {
      var req = event.target;
      // responseText throws unless responseType is "" or "text"
      if (req.responseType === "" || req.responseType === "text") {
        console.log("XHR response received:", req.responseText); // Capture the response text
      }
    });
    return send.apply(this, arguments);
  };
})(XMLHttpRequest);

// Capture all fetch requests
(function () {
  const originalFetch = window.fetch;
  window.fetch = function (url, options) {
    console.log("fetch request:", url, options);
    return originalFetch(url, options)
      .then(response => {
        // Read a clone so the page can still consume the original body
        response.clone().text().then(body => {
          console.log('fetch response:', body); // Print the response body
        });
        return response;
      });
  };
})();

// manifest.json configuration
"web_accessible_resources": [
  {
    "resources": ["icons/*", "iconfont*", "inject.js"],
    "matches": ["<all_urls>"]
  }
],

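This looks solved, but there is still a problem: the script is injected into the top-level page, not into the iframe, so requests made inside the iframe still cannot be captured.

At this point I went back to the DOM-capturing approach. That said, continuing to investigate in this direction should eventually lead to a solution.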
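One direction that might be worth trying, untested against this page: the `target` of `chrome.scripting.executeScript` accepts `allFrames: true`, which asks Chrome to inject into every frame of the tab rather than only the top frame. The sketch below only builds the injection object. `buildInjection` and `FileInjection` are hypothetical names, and the interface is a local stand-in for the real chrome typings.

```typescript
// Minimal local shape of the object passed to chrome.scripting.executeScript,
// declared here only so the sketch type-checks without the chrome typings.
interface FileInjection {
  target: { tabId: number; allFrames?: boolean };
  files: string[];
  world: "MAIN" | "ISOLATED";
}

// Hypothetical helper: the same injection as in background.ts, plus allFrames.
function buildInjection(tabId: number): FileInjection {
  return {
    target: { tabId, allFrames: true }, // also inject into iframes in the tab
    files: ["inject.js"],
    world: "MAIN",
  };
}
```

In background.ts this would be used as `chrome.scripting.executeScript(buildInjection(tab.id as number))`. Two caveats apply: the extension needs host permission for the iframe's origin, and frames created after the call are not covered, so the injection may have to be repeated once the iframe appears.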

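Reference: segmentfault.com/a/119000004…

3. API requests

For API requests, see juejin.cn/post/739693…

Our API requests are always cross-origin, so every fetch call has to be handled in background.ts, which then hands the response over to content_script.js.

Update, April 30

What remains is routine streaming data handling, with nothing particularly difficult.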
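As an illustration of that routine handling, here is a minimal sketch of an incremental parser. It assumes the agent API answers in SSE format, with events separated by a blank line and payloads carried on `data:` lines. `SseParser` is a hypothetical name, and the parser is deliberately simplified: it ignores the `event:`, `id:` and `retry:` fields and drops events whose data is empty.

```typescript
// Hypothetical incremental parser for an SSE-style text stream.
// Assumes events are separated by a blank line and carry `data:` lines.
class SseParser {
  private buffer = "";

  // Feed one decoded text chunk; returns the data payloads of every event
  // completed by this chunk. An incomplete trailing event stays buffered.
  push(chunk: string): string[] {
    // Normalize on the whole buffer so a "\r\n" split across chunks is handled.
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, "\n");
    const parts = this.buffer.split("\n\n");
    this.buffer = parts.pop() ?? "";

    const events: string[] = [];
    for (const part of parts) {
      const data = part
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, "")) // drop "data:" and one optional space
        .join("\n");
      if (data !== "") events.push(data);
    }
    return events;
  }
}
```

Because network chunks can end in the middle of an event, the parser keeps the unfinished tail in `buffer` and only emits an event once its terminating blank line has arrived.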

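The requirement is to abort the current stream when the user clicks to switch to another card; AbortController handles this directly. I also wrapped the request logic and turned it into a singleton.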
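A minimal sketch of what such a singleton could look like. The original wrapper is not shown, so `StreamController`, `begin` and `cancel` are hypothetical names. The idea is that starting a new stream always aborts the previous one.

```typescript
// Hypothetical singleton that owns the AbortController of the active stream.
class StreamController {
  private static instance: StreamController | null = null;
  private current: AbortController | null = null;

  private constructor() {}

  static get(): StreamController {
    if (!StreamController.instance) {
      StreamController.instance = new StreamController();
    }
    return StreamController.instance;
  }

  // Aborts the stream in flight (if any) and returns a fresh signal
  // to pass to the next fetch call.
  begin(): AbortSignal {
    this.current?.abort();
    this.current = new AbortController();
    return this.current.signal;
  }

  // Aborts the active stream without starting a new one.
  cancel(): void {
    this.current?.abort();
    this.current = null;
  }
}
```

When a card is clicked, the request would be issued with `fetch(url, { signal: StreamController.get().begin() })`, where `url` stands for the agent endpoint. The previous fetch then rejects with an AbortError, which the caller should treat as a normal cancellation and not as a failure.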

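The typing effect is still implemented by updating the DOM on every chunk; I have not come up with a better optimization yet.

Update, May 13

For data handling, I have now dropped the sendMessage approach. For SSE, a port is more convenient and more stable.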

// portClient.ts
let portInstance: chrome.runtime.Port | null = null;

export const getPort = (portName = "stream-channel") => {
  if (!portInstance) {
    portInstance = chrome.runtime.connect({ name: portName });
  }
  return portInstance;
};

Sending a message through the port:

port.postMessage(message);
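The background side of the port is not shown in this post. The sketch below is one possible shape for it, under stated assumptions: `servePort`, `PortLike` and `StreamSource` are hypothetical names, and so are the `prompt` field and the `chunk` / `done` / `error` message types. `PortLike` is a local stand-in for the parts of chrome.runtime.Port that are used, and `source` stands for whatever performs the fetch and yields decoded text.

```typescript
// Minimal local shape of chrome.runtime.Port, declared so the sketch is
// self-contained; the real type has more members.
interface PortLike {
  postMessage(message: unknown): void;
  onMessage: { addListener(cb: (message: any) => void): void };
  onDisconnect: { addListener(cb: () => void): void };
}

// Hypothetical: produces the text chunks of one streamed answer.
type StreamSource = (prompt: string, signal: AbortSignal) => AsyncIterable<string>;

function servePort(port: PortLike, source: StreamSource): void {
  let controller: AbortController | null = null;

  port.onMessage.addListener(async (message) => {
    controller?.abort(); // a new request cancels the previous stream
    const own = new AbortController();
    controller = own;
    try {
      for await (const text of source(message.prompt, own.signal)) {
        if (own.signal.aborted) break;
        port.postMessage({ type: "chunk", text });
      }
      if (!own.signal.aborted) port.postMessage({ type: "done" });
    } catch (err) {
      // Errors caused by our own abort are expected and not reported.
      if (!own.signal.aborted) {
        port.postMessage({ type: "error", message: String(err) });
      }
    }
  });

  // Stop the upstream request when the content script goes away.
  port.onDisconnect.addListener(() => controller?.abort());
}
```

In background.ts this would be wired up inside `chrome.runtime.onConnect.addListener((port) => { ... })`, checking `port.name` against the name used in portClient.ts before calling `servePort`.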

sendMessage usage example:

(async () => {
  // Send a message from the content script with sendMessage
  const response = await chrome.runtime.sendMessage({greeting: "hello"});
  console.log(response);

  // Receive messages in the content script with onMessage.addListener
  chrome.runtime.onMessage.addListener(function(request, sender, sendResponse) {
    console.log(sender.tab ? "from a content script:" + sender.tab.url : "from the extension");
    if (request.greeting === "hello") sendResponse({farewell: "goodbye"});
  });


  // Send and receive messages from the content script with connect
  var port = chrome.runtime.connect({name: "knockknock"});
  port.postMessage({joke: "Knock knock"});
  port.onMessage.addListener(function(msg) {
    if (msg.question === "Who's there?")
      port.postMessage({answer: "Madame"});
    else if (msg.question === "Madame who?")
      port.postMessage({answer: "Madame... Bovary"});
  });
})();

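Calling the sendResponse callback is crucial: if it is omitted, an error is raised and the channel is closed, which causes the component to reload.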
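A related detail, stated here as a general property of chrome.runtime.onMessage and not something taken from the original code: when the reply is produced asynchronously, the listener has to return true, otherwise the channel is closed before sendResponse runs. A minimal sketch, where `makeAsyncListener` and `work` are hypothetical names:

```typescript
type SendResponse = (response: unknown) => void;

// Hypothetical wrapper: builds an onMessage listener that always answers,
// even when `work` fails, and keeps the channel open while it runs.
function makeAsyncListener(work: (request: unknown) => Promise<unknown>) {
  return (request: unknown, _sender: unknown, sendResponse: SendResponse): true => {
    work(request)
      .then((result) => sendResponse({ ok: true, result }))
      .catch((err) => sendResponse({ ok: false, error: String(err) }));
    return true; // tells Chrome the response will arrive asynchronously
  };
}
```

It would be registered with `chrome.runtime.onMessage.addListener(makeAsyncListener(handler))`, where `handler` is whatever async function produces the reply. Because both the success and the failure path call sendResponse, the sender's promise settles in every case.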