I'm taking part in Trae's "Super Experience Officer" creative practice essay contest. The free Trae download link used in this article: www.trae.ai/?utm_source…
A new broom sweeps clean, and the new US president Donald Trump is no exception: every day the White House website, built on WordPress, publishes a batch of freshly signed executive orders. On January 22 alone it posted 8, rivaling his tweet output.
Today we'll use ByteDance's newly released Trae IDE to pull down every signed executive order, whether to keep up with Trump's latest moves or just to satisfy our curiosity.
Download and Install
Download the Trae IDE installer via the link at the top of this article. It currently supports macOS only, and you will have to sort out network access yourself: add *.trae.ai to your PAC user-rule list, after which you can log in to the IDE normally.
Implementing the Requirement
One caveat up front: I'm a backend developer, so what I wanted to test first was whether Trae could help me program more efficiently and genuinely speed up real project development.
Create the Project and a Python Virtual Environment
This really just means opening a new folder; VS Code users will feel right at home.
Then open the terminal and create a Python virtual environment with uv:
uv init .
uv venv --python 3.12.8
PS: It's fine if none of this is familiar; just ask the Chat panel on the right.
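For reference, the full terminal setup looks roughly like this (a sketch assuming uv is already installed; the package names are my guess at what the final script imports, and the IDE can install them for you anyway):

```shell
# Initialize the project and create a Python 3.12 virtual environment
uv init .
uv venv --python 3.12.8

# Activate it and install the libraries the scraper will need
source .venv/bin/activate
uv pip install requests beautifulsoup4 translate
```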
Writing the Requirement Prompt
My prompt went roughly like this:
I need the full list of articles from this page www.whitehouse.gov/presidentia… , with every title translated into Chinese.
The HTML for one of the articles is:
<div class="wp-block-group wp-block-whitehouse-post-template__content has-global-padding is-layout-constrained wp-container-core-group-is-layout-6 wp-block-group-is-layout-constrained">
<h2 class="wp-block-post-title has-heading-4-font-size"><a href="https://www.whitehouse.gov/presidential-actions/2025/01/executive-grant-of-clemency-for-terence-sutton/" target="_self">Executive Grant of Clemency for Terence Sutton</a></h2>
<div class="wp-block-group wp-block-whitehouse-post-template__meta is-nowrap is-layout-flex wp-container-core-group-is-layout-5 wp-block-group-is-layout-flex"><div class="taxonomy-category wp-block-post-terms"><a href="https://www.whitehouse.gov/presidential-actions/" rel="tag">Presidential Actions</a></div>
<div class="wp-block-post-date"><time datetime="2025-01-22T18:19:33-05:00">January 22, 2025</time></div></div>
</div>
Remember to page through and fetch the complete article list. Here is the pagination HTML:
<nav class="wp-block-query-pagination is-layout-flex wp-block-query-pagination-is-layout-flex" aria-label="Pagination">
<div class="wp-block-query-pagination-numbers"><span data-wp-key="index-0" aria-current="page" class="page-numbers current">1</span>
<a data-wp-key="index-1" data-wp-on--click="core/query::actions.navigate" class="page-numbers" href="https://www.whitehouse.gov/presidential-actions/page/2/">2</a>
<a data-wp-key="index-2" data-wp-on--click="core/query::actions.navigate" class="page-numbers" href="https://www.whitehouse.gov/presidential-actions/page/3/">3</a>
<span data-wp-key="index-3" class="page-numbers dots">…</span>
<a data-wp-key="index-4" data-wp-on--click="core/query::actions.navigate" class="page-numbers" href="https://www.whitehouse.gov/presidential-actions/page/6/">6</a></div>
<a data-wp-key="query-pagination-next" data-wp-on--click="core/query::actions.navigate" data-wp-on-async--mouseenter="core/query::actions.prefetch" data-wp-watch="core/query::callbacks.prefetch" href="https://www.whitehouse.gov/presidential-actions/page/2/" class="wp-block-query-pagination-next">Next</a>
</nav>
Note that you still have to copy these HTML snippets from the site yourself; the more explicit the requirement, the more efficient the result.
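As a quick sanity check on those selectors (my own sketch built from the snippets above, not Trae's output), a few lines of BeautifulSoup are enough to pull out the title, link, date, and next-page URL:

```python
from bs4 import BeautifulSoup

# Trimmed-down copies of the two HTML snippets shown above
article_html = '''<div class="wp-block-whitehouse-post-template__content">
<h2 class="wp-block-post-title"><a href="https://www.whitehouse.gov/presidential-actions/2025/01/executive-grant-of-clemency-for-terence-sutton/">Executive Grant of Clemency for Terence Sutton</a></h2>
<div class="wp-block-post-date"><time datetime="2025-01-22T18:19:33-05:00">January 22, 2025</time></div>
</div>'''

pagination_html = '''<nav class="wp-block-query-pagination">
<a href="https://www.whitehouse.gov/presidential-actions/page/2/" class="wp-block-query-pagination-next">Next</a>
</nav>'''

# Article fields: the h2 holds the link and title, the <time> tag the ISO date
soup = BeautifulSoup(article_html, 'html.parser')
link = soup.find('h2', class_='wp-block-post-title').find('a')
title = link.text.strip()
href = link.get('href')
date = soup.find('time').get('datetime')

# Pagination: if the "next" anchor exists, its href is the next page to fetch
nav = BeautifulSoup(pagination_html, 'html.parser')
next_link = nav.find('a', class_='wp-block-query-pagination-next')
next_url = next_link.get('href') if next_link else None

print(title)     # Executive Grant of Clemency for Terence Sutton
print(date)      # 2025-01-22T18:19:33-05:00
print(next_url)  # https://www.whitehouse.gov/presidential-actions/page/2/
```

Confirming the selectors by hand like this before prompting is cheap insurance that the generated scraper targets the right elements.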
Install Dependencies and Apply the Generated Code
As shown in the screenshot:
Clicking Run installs the dependencies.
Clicking Apply creates the actual executable code file.
Then run the script:
python whitehouse_scraper.py
It ran as expected, printing logs along the way.
Since I hadn't specified where to store the data, it was automatically saved to a txt file, as shown in the screenshot.
Here is the complete code; it could of course be optimized further, for example by storing the results in a database:
from datetime import datetime
import time

import requests
from bs4 import BeautifulSoup
from translate import Translator
def scrape_whitehouse_actions():
    try:
        # Initialize the translator and the article list
        translator = Translator(to_lang='zh')
        news_items = []

        # Request headers
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
        }

        # Fetch articles from every page
        page = 1
        while True:
            # Build the paginated URL
            if page == 1:
                url = "https://www.whitehouse.gov/presidential-actions/"
            else:
                url = f"https://www.whitehouse.gov/presidential-actions/page/{page}/"

            print(f"Fetching page {page}...")
            response = requests.get(url, headers=headers, timeout=30)

            # A 404 means we have run past the last page
            if response.status_code == 404:
                print("Reached the last page")
                break
            response.raise_for_status()

            soup = BeautifulSoup(response.text, 'html.parser')

            # Collect all articles on the current page
            articles = soup.find_all('div', class_='wp-block-whitehouse-post-template__content')
            if not articles:
                print("No articles found on this page; probably reached the end")
                break

            print(f"Found {len(articles)} articles on page {page}")

            # Process the current page's articles
            for article in articles:
                try:
                    # Extract title and link
                    title_element = article.find('h2', class_='wp-block-post-title')
                    if title_element:
                        link_element = title_element.find('a')
                        title = link_element.text.strip() if link_element else "Untitled"
                        try:
                            translated_title = translator.translate(title)
                        except Exception as e:
                            print(f"Translation error: {str(e)}")
                            translated_title = title
                        link = link_element.get('href') if link_element else "#"
                        date_element = article.find('time')
                        date = date_element.get('datetime') if date_element else "No date"
                        print(f"Found article: {title[:50]}...")
                        news_items.append({
                            'title': title,
                            'translated_title': translated_title,
                            'link': link,
                            'date': date
                        })
                except Exception as e:
                    print(f"Error while processing an article: {str(e)}")
                    continue

            # Check whether there is a next page
            next_link = soup.find('a', class_='wp-block-query-pagination-next')
            if not next_link:
                print("No more pages")
                break

            # Pause between requests to avoid hitting the server too fast
            time.sleep(2)
            page += 1

        # Save all articles to a file
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        filename = f'whitehouse_actions_{timestamp}.txt'
        with open(filename, 'w', encoding='utf-8') as f:
            for item in news_items:
                f.write(f"English title: {item['title']}\n")
                f.write(f"Chinese title: {item['translated_title']}\n")
                f.write(f"Link: {item['link']}\n")
                f.write(f"Date: {item['date']}\n")
                f.write('-' * 80 + '\n')

        print(f"Scraped {len(news_items)} items in total, saved to {filename}")
        return news_items

    except Exception as e:
        print(f"Error during scraping: {str(e)}")
        return None
    # The finally block was removed since there is no longer a driver to close

if __name__ == "__main__":
    scrape_whitehouse_actions()
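As for the database idea mentioned above, a minimal SQLite sketch could look like this (the table name, columns, and `save_items` helper are my own illustrative choices, not part of the generated code):

```python
import sqlite3

def save_items(news_items, db_path='whitehouse_actions.db'):
    """Persist scraped items into SQLite; returns the number of rows newly inserted.

    Hypothetical schema: table and column names are illustrative choices.
    """
    conn = sqlite3.connect(db_path)
    conn.execute('''CREATE TABLE IF NOT EXISTS actions (
                        link TEXT PRIMARY KEY,
                        title TEXT,
                        translated_title TEXT,
                        date TEXT)''')
    # INSERT OR IGNORE makes reruns idempotent: the article link acts as a natural key,
    # so already-seen executive orders are skipped instead of duplicated
    conn.executemany(
        'INSERT OR IGNORE INTO actions (link, title, translated_title, date) '
        'VALUES (:link, :title, :translated_title, :date)',
        news_items)
    conn.commit()
    inserted = conn.total_changes
    conn.close()
    return inserted

# Usage: save_items(scrape_whitehouse_actions() or [])
```

Using the link as the primary key means the scraper can be rerun daily and only new orders land in the table.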
Summary
Overall, the experience is noticeably smoother than the earlier MarsCode plugin and friends.
The prerequisite is a clear requirement; prompts that combine product thinking with the technical approach get the best out of Trae.
Trae has also polished the terminal experience: you can send error output straight to the Chat window, which makes debugging blazingly fast.
That's it for today's share. I'll keep trying Chat mode in real projects and Builder mode on new ones. Feel free to discuss in the comments, and here's hoping Trae and other home-grown IDEs keep getting better.