Turning Free Web Tools into API Endpoints with Browser Automation


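Background

Integrate the functionality of open tool websites into your own workflow and call them like an API, which gives roughly the same result as having a free API endpoint.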

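Approach

API analysis

Analyze the site's JS files and capture its network requests. This takes considerable effort, the endpoint authentication is unpredictable, and the result does not carry over to other sites.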

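Browser automation

Use a script to drive the browser and simulate manual operation. With an AI that reads the screen and acts on natural-language instructions, this is simple to implement and general: for a different page, only the natural-language description needs to change.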
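
To make the "only the description changes" point concrete, here is a minimal sketch of a task-prompt template. The helper name `build_task` and its parameters are hypothetical and not part of any library; it only assembles a string that could be handed to a browser agent.

```python
def build_task(url: str, goal: str, steps: list[str], timeout_s: int = 120) -> str:
    """Compose a natural-language task description (hypothetical helper)."""
    header = (
        f"Task goal: {goal} "
        "If an error occurs, return the error reason. "
        f"If the task is not completed within {timeout_s} seconds, "
        "fail with the reason 'timeout'. "
        "If ads appear, close them."
    )
    # Step 1 is always "open the page"; the caller supplies the page-specific steps.
    numbered = [f"1. Open the website {url} with a browser."]
    numbered += [f"{i}. {step}" for i, step in enumerate(steps, start=2)]
    return "\n".join([header, "Task steps:", *numbered])


if __name__ == "__main__":
    print(build_task(
        "https://example.com",
        "Generate a picture from text and return its download address.",
        ["Enter the prompt in the description box.", "Click the Generate button."],
    ))
```

Switching to another tool site then means passing a different URL and step list, while the surrounding script stays the same.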

Case study

Text-to-image

Implementation steps

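  1. Find a resource
    1. An open tool page based on Flux.1-Dev: raphael.app/
    2. No login required and simple to use
    3. Has human verification (CAPTCHA)
    4. Shows ads
    5. Page screenshot: image.png
  2. Open-source browser automation tool
    1. A browser automation framework implemented in Python: github.com/browser-use…

    2. Deploy locally following the official documentation; the test environment here is macOS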

      # Install uv
      brew install uv
      
      # Create and activate the Python environment
      uv venv --python 3.11
      source .venv/bin/activate
      # Install dependencies
      uv pip install browser-use
      playwright install
      
      
    3. Test

      1. Create a .env file and set OPENAI_API_KEY=xxx
      2. The test code below ran successfully (reaching OpenAI required a proxy to change the IP)
      
      from langchain_openai import ChatOpenAI
      from browser_use import Agent
      from dotenv import load_dotenv
      load_dotenv()
      
      import asyncio
      
      llm = ChatOpenAI(model="gpt-4o")
      
      async def main():
          agent = Agent(
              task="Compare the price of gpt-4o and DeepSeek-V3",
              llm=llm,
          )
          result = await agent.run()
          print(result)
      
      asyncio.run(main())
      
      
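    4. Automation script

      1. Describe the task to the AI in natural language and let it execute

      2. Unsurprisingly, the run failed at the human verification step

      3. CAPTCHA handling

        1. dashboard.capsolver.com/dashboard/o…
        2. Register, request an API key, and buy credit (paid by scanning an Alipay QR code)
        3. Install the CapSolver extension into Chromium from the extension store. The extension can also be loaded from code through the browser_use API, but for convenience it was installed directly in Chromium here, the API key was set in the extension, and the browser path was then specified manually.
        4. Browser path: ~/Library/Caches/ms-playwright/chromium-1155/chrome-mac/Chromium.app/Contents/MacOS/Chromium
        5. Operate the browser manually to verify that the extension works

        image.png

        6. Modify the script to specify the browser path; otherwise the default launch does not load the extension
        7. The run succeeded and output 4 directly downloadable image URLs

        image.png

        8. Full script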


import asyncio
import os

from browser_use import Agent, BrowserConfig
from browser_use.browser.browser import Browser
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

load_dotenv()

llm = ChatOpenAI(model="gpt-4o")

def prompt_text():
    # Adjacent string literals are joined with no separator,
    # so each piece ends with "\n" to keep the steps on separate lines.
    return (
        "Task goal: Use the specified website to generate pictures with text, and return the complete download address of the picture. If an error occurs, the error reason is returned. If it is not completed for more than 2 minutes, the task fails and the failure reason is timeout. If human-machine verification occurs, wait for 20 seconds. If there is no message from human-machine verification after 20 seconds, the task fails and the error reason returned is that human-machine verification failed. If ads appear, please close them.\n"
        "Task steps:\n"
        "1. Open the website https://raphael.app/en with a browser.\n"
        "2. Enter the red cat prompt word in the description prompt on the website.\n"
        "3. Click the Generate button to generate the image and wait for the image generation to be completed. If it takes more than 2 minutes, a failure will be returned. The failure reason is that the image generation timeout occurs.\n"
        "4. After the image generation is completed, the complete download addresses of all generated images will be returned."
    )

async def main():
    config = BrowserConfig(
        headless=False,
        disable_security=True,
        # Expand "~" explicitly so the path also works when it is not passed through a shell
        chrome_instance_path=os.path.expanduser(
            '~/Library/Caches/ms-playwright/chromium-1155/chrome-mac/Chromium.app/Contents/MacOS/Chromium'
        ),
    )

    # Launch the Chromium build that already has the CapSolver extension installed
    browser = Browser(config=config)

    agent = Agent(
        task=prompt_text(),
        llm=llm,
        browser=browser
    )
    result = await agent.run()
    print(result)

    await browser.close()

asyncio.run(main())
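
The script prints the agent's result as a whole. If only the image addresses are needed downstream, they can be pulled out of the final text with a small helper. This is a minimal sketch that assumes the final answer is available as a plain string (how to get that string from the object returned by `agent.run()` depends on the browser-use version). `extract_urls` is a hypothetical name, not part of browser-use.

```python
import re

# Match http(s) URLs up to the next whitespace, quote, or closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)\]]+")


def extract_urls(text: str) -> list[str]:
    """Return the unique URLs found in text, in order of first appearance."""
    urls: list[str] = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;")  # drop trailing sentence punctuation
        if url not in urls:
            urls.append(url)
    return urls


if __name__ == "__main__":
    sample = "Images: https://cdn.example.com/a.png, https://cdn.example.com/b.png."
    print(extract_urls(sample))
```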


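# DeepSeek can be used instead of OpenAI.
# To run this variant, call asyncio.run(main_deepseek()) in place of asyncio.run(main()).
import os

from pydantic import SecretStr

async def main_deepseek():
    api_key = os.getenv('DEEPSEEK_API_KEY', '')
    if not api_key:
        raise ValueError('DEEPSEEK_API_KEY is not set')

    # DeepSeek is reached through the OpenAI-compatible client by changing base_url
    llm = ChatOpenAI(
        model='deepseek-chat',
        api_key=SecretStr(api_key),
        base_url='https://api.deepseek.com/v1'
    )

    # Same browser settings as main(), so the Chromium build with the CapSolver extension is used
    browser = Browser(config=BrowserConfig(
        headless=False,
        disable_security=True,
        chrome_instance_path=os.path.expanduser(
            '~/Library/Caches/ms-playwright/chromium-1155/chrome-mac/Chromium.app/Contents/MacOS/Chromium'
        ),
    ))

    agent = Agent(
        task=prompt_text(),
        llm=llm,
        browser=browser,
        use_vision=False,
    )
    result = await agent.run()
    print(result)

    await browser.close()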
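
The prompt asks the model to give up after 2 minutes, but that limit is only enforced by the model itself. A hard limit in code can back it up. The following is a minimal sketch using `asyncio.wait_for` from the standard library. `run_with_timeout` and the shape of the returned dict are assumptions made for illustration.

```python
import asyncio


async def run_with_timeout(make_coro, timeout_s: float = 120.0) -> dict:
    """Await make_coro() and report success, timeout, or error as a plain dict."""
    try:
        result = await asyncio.wait_for(make_coro(), timeout=timeout_s)
        return {"status": "ok", "result": result}
    except asyncio.TimeoutError:
        # wait_for cancels the inner coroutine before raising
        return {"status": "failed", "reason": "timeout"}
    except Exception as exc:
        return {"status": "failed", "reason": str(exc)}


async def _fake_agent_run():
    # Stand-in for a real agent run, used only to demonstrate the wrapper.
    await asyncio.sleep(0.01)
    return "finished"


if __name__ == "__main__":
    print(asyncio.run(run_with_timeout(_fake_agent_run, timeout_s=1)))
```

With a real agent, the call would look like `await run_with_timeout(agent.run)`, since the wrapper expects a callable that returns a coroutine.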

  3. API
    1. The script run can be wrapped in a task queue and connected to an API's input and output, which automates the whole flow.
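
One way this could look is an in-memory queue with a single worker, so only one browser session runs at a time. The sketch below uses only the standard library. `TaskQueue`, `fake_runner`, and the job dictionary layout are hypothetical names chosen for illustration, and `fake_runner` stands in for the real browser agent.

```python
import asyncio
import uuid


class TaskQueue:
    """In-memory job queue: submit() is the API input, status() is the API output."""

    def __init__(self, runner):
        self._runner = runner          # async callable: prompt -> result
        self._queue = asyncio.Queue()
        self.jobs = {}                 # job_id -> {"status": ..., "result": ...}

    def submit(self, prompt: str) -> str:
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = {"status": "queued", "result": None}
        self._queue.put_nowait((job_id, prompt))
        return job_id

    def status(self, job_id: str) -> dict:
        return self.jobs[job_id]

    async def worker(self):
        """Process jobs one at a time until the task is cancelled."""
        while True:
            job_id, prompt = await self._queue.get()
            self.jobs[job_id]["status"] = "running"
            try:
                self.jobs[job_id]["result"] = await self._runner(prompt)
                self.jobs[job_id]["status"] = "done"
            except Exception as exc:
                self.jobs[job_id].update(status="failed", result=str(exc))
            finally:
                self._queue.task_done()

    async def join(self):
        await self._queue.join()


async def fake_runner(prompt: str) -> list:
    # Stand-in for the browser agent; a real runner would drive the browser here.
    await asyncio.sleep(0)
    return [f"https://example.com/{prompt.replace(' ', '-')}.png"]


async def demo() -> dict:
    tq = TaskQueue(fake_runner)
    worker = asyncio.create_task(tq.worker())
    job_id = tq.submit("red cat")
    await tq.join()        # wait until the queued job has been processed
    worker.cancel()
    return tq.status(job_id)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

An HTTP layer could then map a POST request to `submit()` and a GET request to `status()`. Which web framework to use is left open here.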