22. Python Automation and Scripting: Practical Techniques for Boosting Productivity

I. Crawler Basics: Scraping Data from Static Web Pages

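How It Works
A crawler sends HTTP requests the way a browser would, retrieves the page's HTML, and then parses the structured data out of it. The requests library handles the network requests, and BeautifulSoup acts as the HTML parser, extracting target data by tag name, attributes, and similar criteria.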

Hands-On: Scraping News Headlines

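import requests
from bs4 import BeautifulSoup

def fetch_headlines(url):
    headers = {'User-Agent': 'Mozilla/5.0'}  # present a browser-like request header
    response = requests.get(url, headers=headers, timeout=10)  # do not hang forever
    response.raise_for_status()  # fail early on 4xx/5xx responses
    soup = BeautifulSoup(response.text, 'html.parser')
    headlines = soup.find_all('h2', class_='headline')  # adjust to the page structure
    return [h.get_text(strip=True) for h in headlines[:5]]

# Example: scrape the top headlines from a news site (placeholder URL)
print(fetch_headlines("https://news.test.com"))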

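Key points:

Adding a User-Agent header gets past the most basic anti-scraping checks;
find_all() locates elements by tag name and attributes, so inspect the page with the browser's developer tools to confirm the target element's tag and class (or its CSS selector, if you use select());
If the site loads its content dynamically with JavaScript, switch to Selenium (see Part III).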
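
To make the selector point concrete, here is a minimal sketch that runs on an inline HTML string instead of a live site. The markup, the `headline` class, and the variable names are all made up for illustration; a real page will need its own tag and class names, confirmed in the developer tools.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page
SAMPLE_HTML = """
<div id="news">
  <h2 class="headline"> First story </h2>
  <h2 class="headline">Second story</h2>
  <h2 class="sidebar">Not a headline</h2>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all() filters by tag name and attributes
by_find_all = [h.get_text(strip=True) for h in soup.find_all("h2", class_="headline")]

# select() takes a CSS selector, the form you can copy from developer tools
by_select = [h.get_text(strip=True) for h in soup.select("div#news h2.headline")]

print(by_find_all)
print(by_select)
```

Both calls should return the same two headlines here. Which one to use is mostly a matter of taste: find_all() reads well for a single tag and class, while select() is handier when the target is only identifiable through its ancestors.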

II. Batch File Processing: Automatic Renaming and Format Management ⭐️

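How It Works
The os module provides the operating system interface (path manipulation, directory listing), and shutil handles copying, moving, and deleting files. The core of the automation is:

Walk the directory (os.listdir());
Parse the file name and extension (os.path.splitext());
Perform the batch operation (shutil.move(), os.rename()).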
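
The three steps above can be sketched as a dry run that only computes where each file would go, which is a cheap way to check the logic before moving anything. The function name `plan_moves` and the demo file names are assumptions for illustration, and files without an extension are simply skipped in this sketch.

```python
import os
import tempfile

def plan_moves(folder_path):
    """Return (source, destination) pairs without touching any file."""
    plan = []
    for filename in sorted(os.listdir(folder_path)):  # step 1: walk the directory
        source = os.path.join(folder_path, filename)
        if not os.path.isfile(source):
            continue
        # step 2: split off the extension and normalize it
        ext = os.path.splitext(filename)[1].lstrip(".").lower()
        if not ext:
            continue  # this sketch skips files without an extension
        # step 3 would move the file here; we only record the target
        destination = os.path.join(folder_path, ext + "_Files", filename)
        plan.append((source, destination))
    return plan

# Demo in a throwaway directory so no real files are involved
with tempfile.TemporaryDirectory() as tmp:
    for name in ("report.PDF", "photo.jpg", "README"):
        open(os.path.join(tmp, name), "w").close()
    demo_plan = [
        (os.path.basename(src), os.path.relpath(dst, tmp))
        for src, dst in plan_moves(tmp)
    ]
    for src, dst in demo_plan:
        print(src, "->", dst)
```

Once the printed plan looks right, each pair can be handed to os.makedirs() and shutil.move().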

Hands-On 1: Organizing the Downloads Folder by File Type

import os
import shutil

def organize_downloads(folder_path):
    for filename in os.listdir(folder_path):
        file_path = os.path.join(folder_path, filename)
        if os.path.isfile(file_path):
            # splitext copes with names that have no dot; those go to "no_ext_Files"
            ext = os.path.splitext(filename)[1].lstrip('.').lower() or "no_ext"
            target_dir = os.path.join(folder_path, ext + "_Files")
            os.makedirs(target_dir, exist_ok=True)  # create the directory if it does not exist
            shutil.move(file_path, os.path.join(target_dir, filename))

# Example: organize the Downloads folder
organize_downloads("C:/Users/kwz/Downloads")
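
One thing the script above does not handle is a name collision: if the target folder already holds a file with the same name, shutil.move() may overwrite it or raise an error, depending on the platform. Below is a minimal sketch of one way around that, picking a free name before moving. The helper name `unique_destination` and the " (1)" suffix style are assumptions, not part of the original script.

```python
import os
import tempfile

def unique_destination(target_dir, filename):
    """Return a path inside target_dir that does not exist yet."""
    stem, ext = os.path.splitext(filename)
    candidate = os.path.join(target_dir, filename)
    counter = 1
    while os.path.exists(candidate):
        # Try "name (1).ext", "name (2).ext", ... until one is free
        candidate = os.path.join(target_dir, f"{stem} ({counter}){ext}")
        counter += 1
    return candidate

# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    first = unique_destination(tmp, "notes.txt")
    open(first, "w").close()
    second = unique_destination(tmp, "notes.txt")
    open(second, "w").close()
    third = unique_destination(tmp, "notes.txt")
    demo_names = [os.path.basename(p) for p in (first, second, third)]
    print(demo_names)
```

To use it, pass the result as the destination, for example shutil.move(file_path, unique_destination(target_dir, filename)).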

Hands-On 2: Batch-Renaming Images

import os

def rename_images(folder_path, prefix):
    # Filter and sort first so the numbering has no gaps and is repeatable
    images = sorted(
        f for f in os.listdir(folder_path)
        if f.lower().endswith(('.png', '.jpg'))
    )
    for idx, filename in enumerate(images, start=1):
        ext = os.path.splitext(filename)[1]
        new_name = f"{prefix}_{idx}{ext}"
        os.rename(
            os.path.join(folder_path, filename),
            os.path.join(folder_path, new_name)
        )

# Example: rename images to vacation_1.jpg, vacation_2.png...
rename_images("C:/VacationPhotos", "vacation")
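
With more than nine images, names like vacation_10.jpg sort before vacation_2.jpg in most file managers. A small variation, sketched here as a pure function that only builds the old-to-new mapping, zero-pads the index so that name order matches numeric order. The name `build_rename_plan` and the sample file names are assumptions for illustration.

```python
import os

def build_rename_plan(filenames, prefix, extensions=(".png", ".jpg")):
    """Map each image name to a zero-padded new name; nothing is renamed."""
    images = sorted(f for f in filenames if f.lower().endswith(extensions))
    width = len(str(len(images)))  # digits needed for the largest index
    plan = {}
    for idx, old_name in enumerate(images, start=1):
        ext = os.path.splitext(old_name)[1]
        plan[old_name] = f"{prefix}_{idx:0{width}d}{ext}"
    return plan

# Ten images plus one non-image file that should be ignored
names = ["b.jpg", "notes.txt", "a.PNG"] + [f"img{i}.jpg" for i in range(8)]
plan = build_rename_plan(names, "vacation")
for old, new in plan.items():
    print(old, "->", new)
```

Reviewing the mapping first and then calling os.rename() for each pair keeps an accidental mass rename easy to catch.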

III. Web Automation in Practice: Scraping Dynamic Pages with Selenium and Sending Email Notifications

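How It Works
Dynamic pages load their data with JavaScript, so a traditional crawler that only downloads the raw HTML never sees the rendered content. Selenium drives a real browser (such as Chrome) to simulate clicks, scrolling, and other actions, and returns the fully rendered page. Combined with smtplib, the results can be sent out as email notifications.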

Hands-On: Monitoring a Dynamic Price and Sending an Email Alert

import smtplib
from email.mime.text import MIMEText

from bs4 import BeautifulSoup
from selenium import webdriver

def track_price(url, target_price):
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        soup = BeautifulSoup(driver.page_source, 'html.parser')
    finally:
        driver.quit()  # always close the browser, even if loading fails

    price_element = soup.find("span", class_="product-price")
    if price_element is None:
        raise ValueError("price element not found; check the selector")
    # Drop the currency symbol and thousands separators before converting
    price_text = price_element.get_text(strip=True)
    current_price = float(price_text.lstrip('$').replace(',', ''))

    if current_price < target_price:
        send_email_alert(current_price)

def send_email_alert(price):
    sender = "zhangsan@gmail.com"
    receiver = "lisi@example.com"
    password = "app_password"  # placeholder: Gmail requires a generated app password

    msg = MIMEText(f"Price dropped to ${price}! Check now.")
    msg['Subject'] = "PRICE ALERT"
    msg['From'] = sender
    msg['To'] = receiver

    with smtplib.SMTP('smtp.gmail.com', 587) as server:
        server.starttls()  # upgrade the connection to TLS before logging in
        server.login(sender, password)
        server.send_message(msg)

# Example: monitor a product price
track_price("https://test.com/product", 99.99)
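
The script above reads page_source right after get(), which can be too early on a page that fills in the price with JavaScript. As a rough sketch of a more patient version, the helper below waits for the element with WebDriverWait and parses the text with a regular expression. The names `parse_price` and `read_price`, the default selector, and the ten-second timeout are assumptions for illustration.

```python
import re

def parse_price(text):
    """Extract the first number from text such as '$1,299.00'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return float(match.group().replace(",", ""))

def read_price(driver, selector="span.product-price", timeout=10):
    """Wait until the price element is visible, then return its value."""
    # Imported here so parse_price() stays usable without Selenium installed
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    element = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, selector))
    )
    return parse_price(element.text)

# The parsing step can be checked without a browser
print(parse_price("$1,299.00"))
print(parse_price(" Now only $89.5 "))
```

If the element never shows up within the timeout, WebDriverWait raises a TimeoutException, which is usually a clearer failure than a None from soup.find().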

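Pitfalls to avoid:

Install the matching browser driver (such as ChromeDriver) and add it to the system PATH; Selenium 4.6 and later can also fetch a matching driver automatically through Selenium Manager;
Dynamically loaded elements need an explicit wait (WebDriverWait) before they are read;
Do not hard-code the email password; store it in an environment variable instead.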
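
For the last point, a minimal sketch of reading the password from the environment and building the message separately from sending it might look like the following. The variable name SMTP_APP_PASSWORD and the helper names are assumptions; any name works as long as the script and the shell agree on it.

```python
import os
from email.mime.text import MIMEText

def load_password(env_name="SMTP_APP_PASSWORD"):
    """Read the SMTP password from the environment, failing loudly if unset."""
    password = os.environ.get(env_name)
    if not password:
        raise RuntimeError(f"set the {env_name} environment variable first")
    return password

def build_alert(price, sender, receiver):
    """Build the alert message; sending it is left to smtplib."""
    msg = MIMEText(f"Price dropped to ${price:.2f}! Check now.")
    msg["Subject"] = "PRICE ALERT"
    msg["From"] = sender
    msg["To"] = receiver
    return msg

# Demo only: normally the variable is set in the shell, never in code
os.environ["SMTP_APP_PASSWORD"] = "demo-only-value"

alert = build_alert(89.5, "zhangsan@gmail.com", "lisi@example.com")
print(alert["Subject"], "|", alert.get_payload())
print("password loaded:", bool(load_password()))
```

Keeping message construction in its own function also makes the alert text testable without connecting to a mail server.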

Next up: 23. Python Project Collaboration and Deployment

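Virtual environment management: isolating dependencies with venv and pipenv
Collaboration toolchain: Git version control plus dependency pinning with requirements.txt
Deployment strategies: Docker containerization vs. serverless architecture
CI/CD in practice: automated testing and releases with GitHub Actions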

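After working through this chapter, you have three core skills for simplifying repetitive tasks with Python. Remember: automation does not replace thinking; it shifts your effort from mechanical work to creative work.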

For more technical content, follow the WeChat official account “科威舟的AI笔记”.

Reprint notice: when reprinting, please credit the original source and the author.