Silky smooth: using Trae to track which executive orders Trump has signed


I'm taking part in the Trae "Super Experience Officer" creative practice writing contest. The free Trae download link used in this article: www.trae.ai/?utm_source…

A new broom sweeps clean, and the new US president Trump is no exception: every day the White House website (built on WordPress) publishes a batch of newly signed executive orders. On January 22 alone it posted 8 of them, rivaling the pace of his tweets.

Today we'll use ByteDance's newly released Trae IDE to fetch all the signed executive orders in one go, whether to follow current events or just to satisfy our curiosity.

Download and installation

Download the Trae IDE installer via the link at the top of this article. It currently supports macOS only, and you'll need to sort out network access yourself: add *.trae.ai to your proxy's PAC user rules, and then you can log in to the IDE normally.
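For reference, in clients that take GFWList-style user rules this is typically a one-line addition. The file name and exact syntax below are an assumption (they vary by proxy client), not something from the Trae docs:

```
! user-rule.txt (assumed GFWList-style syntax; adjust for your proxy client)
*.trae.ai
```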

Implementing the requirement

One caveat up front: I'm a backend developer, so my goal here is simply to see whether Trae can help me code more efficiently and genuinely speed up real project development.

Creating the project and a Python virtual environment

This is really just a matter of opening a new folder; VS Code users will feel right at home.

Then open the terminal and create a Python virtual environment with uv:

uv init .
uv venv --python 3.12.8

PS: It's fine if none of this makes sense to you; just ask in the Chat panel on the right.

Writing the requirement prompt

My prompt went roughly like this:

I need to fetch the full article list from this page www.whitehouse.gov/presidentia… , with the titles translated into Chinese.

The HTML for one of the articles is:

<div class="wp-block-group wp-block-whitehouse-post-template__content has-global-padding is-layout-constrained wp-container-core-group-is-layout-6 wp-block-group-is-layout-constrained">

<h2 class="wp-block-post-title has-heading-4-font-size"><a href="https://www.whitehouse.gov/presidential-actions/2025/01/executive-grant-of-clemency-for-terence-sutton/" target="_self">Executive Grant of Clemency for Terence Sutton</a></h2>


<div class="wp-block-group wp-block-whitehouse-post-template__meta is-nowrap is-layout-flex wp-container-core-group-is-layout-5 wp-block-group-is-layout-flex"><div class="taxonomy-category wp-block-post-terms"><a href="https://www.whitehouse.gov/presidential-actions/" rel="tag">Presidential Actions</a></div>

<div class="wp-block-post-date"><time datetime="2025-01-22T18:19:33-05:00">January 22, 2025</time></div></div>
</div>

Remember to paginate through the whole article list; this is the pagination HTML:

<nav class="wp-block-query-pagination is-layout-flex wp-block-query-pagination-is-layout-flex" aria-label="Pagination">


<div class="wp-block-query-pagination-numbers"><span data-wp-key="index-0" aria-current="page" class="page-numbers current">1</span>
<a data-wp-key="index-1" data-wp-on--click="core/query::actions.navigate" class="page-numbers" href="https://www.whitehouse.gov/presidential-actions/page/2/">2</a>
<a data-wp-key="index-2" data-wp-on--click="core/query::actions.navigate" class="page-numbers" href="https://www.whitehouse.gov/presidential-actions/page/3/">3</a>
<span data-wp-key="index-3" class="page-numbers dots">…</span>
<a data-wp-key="index-4" data-wp-on--click="core/query::actions.navigate" class="page-numbers" href="https://www.whitehouse.gov/presidential-actions/page/6/">6</a></div>

<a data-wp-key="query-pagination-next" data-wp-on--click="core/query::actions.navigate" data-wp-on-async--mouseenter="core/query::actions.prefetch" data-wp-watch="core/query::callbacks.prefetch" href="https://www.whitehouse.gov/presidential-actions/page/2/" class="wp-block-query-pagination-next">Next</a>
</nav>

Note that you still have to grab and copy these HTML snippets from the site yourself; the clearer the requirement, the faster the result.
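As a sanity check on the prompt, the article snippet above can already be parsed with BeautifulSoup by hand. A minimal sketch, using a trimmed copy of the HTML shown earlier (the class names come straight from that snippet):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the article snippet from the White House list page
html = '''
<div class="wp-block-group wp-block-whitehouse-post-template__content">
<h2 class="wp-block-post-title has-heading-4-font-size"><a href="https://www.whitehouse.gov/presidential-actions/2025/01/executive-grant-of-clemency-for-terence-sutton/">Executive Grant of Clemency for Terence Sutton</a></h2>
<div class="wp-block-post-date"><time datetime="2025-01-22T18:19:33-05:00">January 22, 2025</time></div>
</div>
'''

soup = BeautifulSoup(html, "html.parser")
# class_ matches even when the element carries several classes
article = soup.find("div", class_="wp-block-whitehouse-post-template__content")
link = article.find("h2", class_="wp-block-post-title").find("a")
title = link.text.strip()
url = link["href"]
date = article.find("time")["datetime"]
print(title, url, date)
```

If this extracts the title, link, and datetime correctly on one snippet, the same selectors should work across the whole list page.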

Installing dependencies and applying the generated code

Clicking Run installs the dependencies, and clicking Apply creates the actual runnable code file.

Then you can run the script: python whitehouse_scraper.py

When it runs, it prints logs along the way. Since I never specified where to store the data, it was automatically saved to a txt file.

Here's the full code. It could of course be optimized further, for example by storing the results in a database:

from datetime import datetime
import time

import requests
from bs4 import BeautifulSoup
from translate import Translator

def scrape_whitehouse_actions():
    try:
        # Initialize the translator and the article list
        translator = Translator(to_lang='zh')
        news_items = []
        
        # Set request headers
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
        }
        
        # Fetch articles page by page
        page = 1
        while True:
            # Build the paginated URL
            if page == 1:
                url = "https://www.whitehouse.gov/presidential-actions/"
            else:
                url = f"https://www.whitehouse.gov/presidential-actions/page/{page}/"
            
            print(f"Fetching page {page}...")
            response = requests.get(url, headers=headers, timeout=30)
            
            # A 404 means we have gone past the last page
            if response.status_code == 404:
                print("Reached the last page")
                break
                
            response.raise_for_status()
            soup = BeautifulSoup(response.text, 'html.parser')
            
            # Collect all articles on the current page
            articles = soup.find_all('div', class_='wp-block-whitehouse-post-template__content')
            if not articles:
                print("No articles found on this page; probably past the last page")
                break
                
            print(f"Found {len(articles)} articles on page {page}")
            
            # Process the articles on this page
            for article in articles:
                try:
                    # Extract the title and link
                    title_element = article.find('h2', class_='wp-block-post-title')
                    if title_element:
                        link_element = title_element.find('a')
                        title = link_element.text.strip() if link_element else "No title"
                        try:
                            translated_title = translator.translate(title)
                        except Exception as e:
                            print(f"Translation error: {str(e)}")
                            translated_title = title
                        
                        link = link_element.get('href') if link_element else "#"
                        
                        date_element = article.find('time')
                        date = date_element.get('datetime') if date_element else "No date"
                        
                        print(f"Found article: {title[:50]}...")
                        
                        news_items.append({
                            'title': title,
                            'translated_title': translated_title,
                            'link': link,
                            'date': date
                        })
                except Exception as e:
                    print(f"Error while processing an article: {str(e)}")
                    continue
            
            # Check whether there is a next page
            next_link = soup.find('a', class_='wp-block-query-pagination-next')
            if not next_link:
                print("No next page")
                break
                
            # Pause between requests to avoid hitting the site too fast
            time.sleep(2)
            page += 1
        
        # Save all articles to a file
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        filename = f'whitehouse_actions_{timestamp}.txt'
        
        with open(filename, 'w', encoding='utf-8') as f:
            for item in news_items:
                f.write(f"English title: {item['title']}\n")
                f.write(f"Chinese title: {item['translated_title']}\n")
                f.write(f"Link: {item['link']}\n")
                f.write(f"Date: {item['date']}\n")
                f.write('-' * 80 + '\n')
        
        print(f"Scraped {len(news_items)} items in total; saved to {filename}")
        return news_items
        
    except Exception as e:
        print(f"Error during scraping: {str(e)}")
        return None

if __name__ == "__main__":
    scrape_whitehouse_actions()

Summary

Overall, compared with the earlier MarsCode plugin and the like, the experience is noticeably smoother.

The prerequisite is a clear requirement; you get the best out of Trae when you combine product thinking with technical detail in the prompt.

Trae has also polished the terminal experience: you can send error messages straight to the Chat window, which makes debugging much faster.


That's it for today. I'll keep testing Chat mode in real projects, and I'll try Builder mode on new ones. Feel free to discuss in the comments, and here's hoping Trae and other homegrown IDEs keep getting better.