[Python Tutorial Series-12] Automation Script Development: Practical Skills for Boosting Productivity

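Introduction

In day-to-day work and development we keep running into repetitive tasks: file processing, data cleanup, system monitoring, batch operations, and so on. These tasks are time-consuming and error-prone. An automation script hands the repetitive work to the computer, which raises productivity, reduces human error, and frees up time for creative work.

Python is concise, readable, and powerful, which makes it a good fit for automation scripts. Its rich standard library and third-party ecosystem keep such scripts simple to write. Whether you need to work with files and directories, manipulate Excel spreadsheets, send email, or talk to web services, Python has a solution.

Earlier chapters covered Python's basic syntax, data structures, object-oriented programming, file operations, the standard library, environment management, regular expressions, and unit testing. This chapter combines that knowledge to build practical automation scripts you can use in your own work.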

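Learning Objectives

After completing this chapter, you will be able to:

Understand the basic concepts and use cases of automation scripts
Automate file and directory operations
Handle command-line arguments with the argparse module
Read and manage configuration files
Apply logging best practices
Implement timed and scheduled tasks
Develop practical system administration scripts
Automate data processing and report generation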

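Core Concepts

1. Overview of Automation Scripts

An automation script is a program that performs a specific task automatically, usually written to replace repetitive manual work. A good automation script has the following characteristics:

Characteristics of automation scripts:

Reliability: runs the task stably and reduces human error
Repeatability: produces consistent results on every run
Configurability: accepts parameters so it can adapt to different needs
Maintainability: has a clear code structure that is easy to modify and extend
Observability: reports its execution status and keeps logs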

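2. Command-Line Argument Handling

Command-line arguments are the main way a user interacts with an automation script. They let the user control the script's behavior without editing the code.

The argparse module:

ArgumentParser: the parser class
add_argument(): defines an argument
parse_args(): parses the command line
Argument types: positional arguments, optional arguments, flag arguments, and so on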
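
The three argument types listed above can be sketched in a few lines. This is a minimal illustration, not part of the chapter's scripts; the argument names `source`, `--limit`, and `--verbose` are hypothetical and chosen only for the demo.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one positional, one optional, and one flag argument."""
    parser = argparse.ArgumentParser(description="Argument type demo")
    # Positional argument: required, identified by its position.
    parser.add_argument("source", help="directory to read from")
    # Optional argument: has a default value and a type conversion.
    parser.add_argument("-n", "--limit", type=int, default=10,
                        help="maximum number of files to process")
    # Flag argument: False unless the switch is present.
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print extra details")
    return parser


# Passing a list to parse_args() makes the parser easy to try out and test;
# called with no argument it reads sys.argv[1:] instead.
args = build_parser().parse_args(["./data", "--limit", "3", "-v"])
print(args.source, args.limit, args.verbose)
```

Keeping the parser in its own function means tests can call it without touching `sys.argv`.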

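3. Configuration File Management

Configuration files store a script's settings outside the code, which makes the script more flexible and easier to maintain.

Configuration formats:

INI: simple key-value pairs grouped into sections
JSON: structured configuration data
YAML: a human-friendly data serialization standard
Environment variables: system-level configuration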
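
One possible way to combine these sources is to layer them, with later sources overriding earlier ones. The sketch below uses only the standard library, so YAML is left out (it needs a third-party parser). The `[app]` section name and the `DEMO_APP_` prefix are assumptions made for this example.

```python
import configparser
import json
import os


def load_settings(ini_text: str, json_text: str, env=None) -> dict:
    """Merge settings: INI values, then JSON values, then environment overrides."""
    env = os.environ if env is None else env
    settings = {}

    # INI: sections of key = value pairs; every value is read as a string.
    ini = configparser.ConfigParser()
    ini.read_string(ini_text)
    if ini.has_section("app"):
        settings.update(ini["app"])

    # JSON: keeps types (numbers, booleans, nested objects).
    settings.update(json.loads(json_text))

    # Environment variables: highest priority, a common place for secrets.
    # DEMO_APP_ is a hypothetical prefix used only in this sketch.
    prefix = "DEMO_APP_"
    for key, value in env.items():
        if key.startswith(prefix):
            settings[key[len(prefix):].lower()] = value
    return settings


settings = load_settings(
    ini_text="[app]\nname = reporter\ninterval = 60\n",
    json_text='{"interval": 30, "debug": true}',
    env={"DEMO_APP_PASSWORD": "from-env"},
)
print(settings)
```

Reading secrets such as passwords from the environment keeps them out of files that may end up in version control.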

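4. Logging

Logging is an essential part of an automation script. It lets you trace what the script did, diagnose problems, and keep an audit trail.

The logging module:

Log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
Log format: customizable output format
Log handlers: control where log output goes (file, console, and so on)
Log rotation: manages log file size automatically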
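
The four points above fit together roughly as follows. This is a sketch using the standard library's `RotatingFileHandler`; the size limit, backup count, and the helper name `make_logger` are illustrative choices rather than recommendations from the chapter.

```python
import logging
import os
import sys
import tempfile
from logging.handlers import RotatingFileHandler


def make_logger(name: str, log_path: str) -> logging.Logger:
    """Return a logger that writes to the console and to a size-limited file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    if logger.handlers:  # already configured: avoid duplicate handlers
        return logger

    formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")

    # Console handler: INFO and above.
    console = logging.StreamHandler(sys.stdout)
    console.setLevel(logging.INFO)
    console.setFormatter(formatter)

    # File handler: everything, rotated at about 1 MB, keeping 3 old files.
    file_handler = RotatingFileHandler(
        log_path, maxBytes=1_000_000, backupCount=3, encoding="utf-8"
    )
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(formatter)

    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger


log = make_logger("demo", os.path.join(tempfile.gettempdir(), "demo_script.log"))
log.debug("written to the file only")
log.warning("written to both the console and the file")
```

Giving each handler its own level keeps the console quiet while the file retains the detail needed for troubleshooting.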

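5. Automating File and Directory Operations

File and directory operations are among the most common tasks in automation scripts: creating, reading, writing, copying, moving, and deleting files.

Common operations:

Path handling: the os.path and pathlib modules
File operations: reading and writing files, batch processing
Directory traversal: walking a directory tree recursively
File monitoring: watching for file changes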
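
Recursive traversal and change detection can be combined in a simple polling approach: take a snapshot of modification times, take another one later, and compare. This sketch uses only `pathlib`; it is not how event-based libraries work, and the function names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def snapshot(directory: Path, pattern: str = "*") -> dict:
    """Map each file under `directory` (recursively) to its modification time."""
    return {
        path: path.stat().st_mtime_ns
        for path in directory.rglob(pattern)
        if path.is_file()
    }


def diff_snapshots(before: dict, after: dict) -> dict:
    """Compare two snapshots and report created, deleted, and modified files."""
    return {
        "created": sorted(after.keys() - before.keys()),
        "deleted": sorted(before.keys() - after.keys()),
        "modified": sorted(
            p for p in before.keys() & after.keys() if before[p] != after[p]
        ),
    }


# Demo in a throwaway directory so nothing on disk is left behind.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("hello", encoding="utf-8")
    before = snapshot(root)
    (root / "b.txt").write_text("new", encoding="utf-8")
    changes = diff_snapshots(before, snapshot(root))
    print([p.name for p in changes["created"]])
```

Polling is easy to reason about but only notices changes between two snapshots; for immediate notifications an event-based library is usually a better fit.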

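6. Timed and Scheduled Tasks

Scheduled tasks let a script run automatically at specific times, which is useful for periodic data processing, system monitoring, and similar jobs.

Ways to implement them:

cron: the scheduled-task tool on Linux/Unix systems
Windows Task Scheduler: the task scheduling tool on Windows
APScheduler: an advanced scheduling library for Python
Celery: a distributed task queue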
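
To show the idea behind all of these tools without installing anything, here is a standard-library sketch that runs a job once a day at a fixed time. It assumes naive local time and ignores daylight-saving shifts; the crontab line in the comment uses a hypothetical path.

```python
import time
from datetime import datetime, timedelta

# The cron equivalent of "every day at 02:00" would be a crontab line such as
# (hypothetical path):
#   0 2 * * * /usr/bin/python3 /path/to/backup_script.py


def next_daily_run(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Return the next occurrence of hour:minute strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def run_daily(job, hour: int, minute: int = 0, max_runs=None) -> None:
    """Call `job` every day at hour:minute; stop after `max_runs` runs if given."""
    runs = 0
    while max_runs is None or runs < max_runs:
        target = next_daily_run(datetime.now(), hour, minute)
        wait = (target - datetime.now()).total_seconds()
        time.sleep(max(wait, 0))
        job()
        runs += 1


print(next_daily_run(datetime(2024, 1, 1, 1, 30), hour=2))
```

A loop like this stops when the process stops. cron, Windows Task Scheduler, and APScheduler add what it lacks, such as surviving reboots, running several jobs, and handling missed runs.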

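7. System Administration and Monitoring

System administration scripts automate routine tasks such as process management, service monitoring, and disk space checks.

Common tasks:

Process management: starting, stopping, and monitoring processes
Service management: starting and stopping system services and checking their status
Resource monitoring: CPU, memory, and disk usage
Network monitoring: connection status and performance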
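
Resource monitoring usually comes down to two steps: collect a metric, then compare it with a threshold. The sketch below covers disk usage only, using the standard library's `shutil.disk_usage`; the metric names and message format are assumptions for illustration.

```python
import shutil


def disk_usage_percent(path: str = ".") -> float:
    """Return the used share, in percent, of the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def check_thresholds(metrics: dict, thresholds: dict) -> list:
    """Return one alert message per metric that exceeds its threshold."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            alerts.append(f"{name} is {value:.1f}% (threshold {limit}%)")
    return alerts


metrics = {"disk": disk_usage_percent(".")}
for alert in check_thresholds(metrics, {"disk": 90}):
    print(alert)
```

Separating collection from checking makes the threshold logic testable with made-up numbers, and new metrics only need a new collector.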

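8. Data Processing and Report Generation

Automation scripts often need to process data and produce reports, which involves reading, cleaning, analyzing, and visualizing the data.

Processing pipeline:

Data loading: fetch data from files, databases, APIs, and other sources
Data cleaning: handle missing values, outliers, and similar problems
Data analysis: statistics, aggregation, and so on
Report generation: produce reports as text, HTML, PDF, or other formats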
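
The four stages above can be sketched end to end with the standard library alone. The column names `product` and `amount` and the sample rows are made up for this example.

```python
import csv
import io
from collections import defaultdict


def summarize_sales(csv_text: str) -> dict:
    """Read rows, skip incomplete or invalid ones, and total amounts per product."""
    totals = defaultdict(float)
    skipped = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            amount = float(row["amount"])       # cleaning: reject non-numeric values
            product = row["product"].strip()
            if not product:                     # cleaning: reject missing names
                raise ValueError("missing product")
        except (KeyError, ValueError, TypeError, AttributeError):
            skipped += 1
            continue
        totals[product] += amount               # analysis: aggregate per product
    return {"totals": dict(totals), "skipped": skipped}


def render_report(summary: dict) -> str:
    """Render the summary as plain text, largest total first."""
    lines = ["Sales report", "============"]
    for product, total in sorted(summary["totals"].items(), key=lambda kv: -kv[1]):
        lines.append(f"{product}: {total:.2f}")
    lines.append(f"Skipped rows: {summary['skipped']}")
    return "\n".join(lines)


SAMPLE = "product,amount\nkeyboard,120.50\nmouse,abc\nkeyboard,79.50\n,10\nmonitor,899\n"
print(render_report(summarize_sales(SAMPLE)))
```

Counting skipped rows instead of dropping them silently makes data quality problems visible in the report itself.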

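Code Examples and Practice

Practice 1: Batch File Processing Script

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Batch file processing script
Supports bulk renaming, encoding conversion, backup, and empty-directory cleanup
"""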

import os
import sys
import argparse
import logging
import shutil
from pathlib import Path
from datetime import datetime

class FileProcessor:
    """文件处理器"""
    
    def __init__(self, config):
        self.config = config
        self.setup_logging()
    
    def setup_logging(self):
        """设置日志"""
        log_format = '%(asctime)s - %(levelname)s - %(message)s'
        logging.basicConfig(
            level=logging.INFO,
            format=log_format,
            handlers=[
                logging.FileHandler('file_processor.log'),
                logging.StreamHandler(sys.stdout)
            ]
        )
        self.logger = logging.getLogger(__name__)
    
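    def rename_files(self, directory, pattern, replacement):
        """Rename files in bulk by replacing `pattern` in each file name"""
        directory = Path(directory)
        if not directory.exists():
            self.logger.error(f"Directory does not exist: {directory}")
            return
        
        renamed_count = 0
        # Take a snapshot first so freshly renamed files are not visited again
        for file_path in list(directory.iterdir()):
            if file_path.is_file():
                old_name = file_path.name
                new_name = old_name.replace(pattern, replacement)
                
                if new_name != old_name:
                    new_path = file_path.parent / new_name
                    # On POSIX systems rename() silently overwrites an existing file
                    if new_path.exists():
                        self.logger.warning(f"Skipped {old_name}: {new_name} already exists")
                        continue
                    try:
                        file_path.rename(new_path)
                        self.logger.info(f"Renamed: {old_name} -> {new_name}")
                        renamed_count += 1
                    except Exception as e:
                        self.logger.error(f"Rename failed {old_name}: {e}")
        
        self.logger.info(f"Renamed {renamed_count} file(s)")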
    
    def convert_encoding(self, directory, from_encoding, to_encoding):
        """Convert the encoding of .txt and .csv files in bulk"""
        directory = Path(directory)
        if not directory.exists():
            self.logger.error(f"Directory does not exist: {directory}")
            return
        
        converted_count = 0
        for file_path in directory.iterdir():
            if file_path.is_file() and file_path.suffix in ['.txt', '.csv']:
                try:
                    # Read with the source encoding; newline='' keeps line endings unchanged
                    with open(file_path, 'r', encoding=from_encoding, newline='') as f:
                        content = f.read()
                    
                    # Encode in memory before touching the file, so a failed
                    # conversion cannot leave it truncated
                    data = content.encode(to_encoding)
                    file_path.write_bytes(data)
                    
                    self.logger.info(f"Converted encoding: {file_path.name}")
                    converted_count += 1
                except Exception as e:
                    self.logger.error(f"Encoding conversion failed {file_path.name}: {e}")
        
        self.logger.info(f"Converted {converted_count} file(s)")
    
    def backup_files(self, source_dir, backup_dir, file_extension=None):
        """备份文件"""
        source_dir = Path(source_dir)
        backup_dir = Path(backup_dir)
        
        if not source_dir.exists():
            self.logger.error(f"源目录不存在: {source_dir}")
            return
        
        # 创建备份目录
        backup_dir.mkdir(parents=True, exist_ok=True)
        
        backed_up_count = 0
        for file_path in source_dir.rglob('*'):
            if file_path.is_file():
                # 如果指定了文件扩展名，只备份匹配的文件
                if file_extension and file_path.suffix != file_extension:
                    continue
                
                # 计算相对路径
                relative_path = file_path.relative_to(source_dir)
                backup_path = backup_dir / relative_path
                
                # 创建备份目录结构
                backup_path.parent.mkdir(parents=True, exist_ok=True)
                
                try:
                    shutil.copy2(file_path, backup_path)
                    self.logger.info(f"备份文件: {relative_path}")
                    backed_up_count += 1
                except Exception as e:
                    self.logger.error(f"备份失败 {file_path}: {e}")
        
        self.logger.info(f"共备份 {backed_up_count} 个文件")
    
    def clean_empty_dirs(self, directory):
        """清理空目录"""
        directory = Path(directory)
        if not directory.exists():
            self.logger.error(f"目录不存在: {directory}")
            return
        
        removed_count = 0
        for dir_path in sorted(directory.rglob('*'), reverse=True):
            if dir_path.is_dir() and not any(dir_path.iterdir()):
                try:
                    dir_path.rmdir()
                    self.logger.info(f"删除空目录: {dir_path}")
                    removed_count += 1
                except Exception as e:
                    self.logger.error(f"删除目录失败 {dir_path}: {e}")
        
        self.logger.info(f"共删除 {removed_count} 个空目录")

def main():
    """主函数"""
    parser = argparse.ArgumentParser(description='文件批量处理工具')
    parser.add_argument('action', choices=['rename', 'convert', 'backup', 'clean'],
                       help='操作类型')
    parser.add_argument('-d', '--directory', required=True,
                       help='目标目录路径')
    parser.add_argument('-p', '--pattern',
                       help='重命名模式')
    parser.add_argument('-r', '--replacement',
                       help='替换字符串')
    parser.add_argument('--from-encoding',
                       help='源编码')
    parser.add_argument('--to-encoding',
                       help='目标编码')
    parser.add_argument('--backup-dir',
                       help='备份目录')
    parser.add_argument('--extension',
                       help='文件扩展名')
    
    args = parser.parse_args()
    
    # 创建处理器实例
    processor = FileProcessor(args)
    
    # 执行相应操作
    if args.action == 'rename':
        # An empty replacement is valid (it deletes the pattern), so test for None
        if not args.pattern or args.replacement is None:
            parser.error("The rename action requires --pattern and --replacement")
        processor.rename_files(args.directory, args.pattern, args.replacement)
    
    elif args.action == 'convert':
        if not args.from_encoding or not args.to_encoding:
            parser.error("编码转换需要 --from-encoding 和 --to-encoding 参数")
        processor.convert_encoding(args.directory, args.from_encoding, args.to_encoding)
    
    elif args.action == 'backup':
        if not args.backup_dir:
            parser.error("备份操作需要 --backup-dir 参数")
        processor.backup_files(args.directory, args.backup_dir, args.extension)
    
    elif args.action == 'clean':
        processor.clean_empty_dirs(args.directory)

if __name__ == '__main__':
    main()

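Practice 2: System Monitoring and Alerting Script

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
System monitoring script
Monitors system resource usage and sends an alert when something is abnormal
"""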

import psutil
import time
import smtplib
import logging
import argparse
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from datetime import datetime
import json
import os

class SystemMonitor:
    """系统监控器"""
    
    def __init__(self, config_file='monitor_config.json'):
        self.config = self.load_config(config_file)
        self.setup_logging()
    
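    def load_config(self, config_file):
        """Load the configuration file, using defaults for anything it omits"""
        default_config = {
            "thresholds": {
                "cpu_percent": 80,
                "memory_percent": 80,
                "disk_percent": 90
            },
            "email": {
                "smtp_server": "smtp.gmail.com",
                "smtp_port": 587,
                "sender_email": "your_email@gmail.com",
                "sender_password": "your_password",
                "recipient_email": "admin@example.com"
            },
            "check_interval": 60
        }
        
        if os.path.exists(config_file):
            with open(config_file, 'r', encoding='utf-8') as f:
                config = json.load(f)
            # Merge user settings over the defaults one nested level deep, so a
            # partial "thresholds" or "email" section does not cause a KeyError later
            merged = dict(config)
            for key, default_value in default_config.items():
                user_value = config.get(key, default_value)
                if isinstance(default_value, dict) and isinstance(user_value, dict):
                    merged[key] = {**default_value, **user_value}
                else:
                    merged[key] = user_value
            return merged
        else:
            # Create a default configuration file
            with open(config_file, 'w', encoding='utf-8') as f:
                json.dump(default_config, f, indent=4)
            return default_config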
    
    def setup_logging(self):
        """设置日志"""
        log_format = '%(asctime)s - %(levelname)s - %(message)s'
        logging.basicConfig(
            level=logging.INFO,
            format=log_format,
            handlers=[
                logging.FileHandler('system_monitor.log'),
                logging.StreamHandler()
            ]
        )
        self.logger = logging.getLogger(__name__)
    
    def get_system_info(self):
        """获取系统信息"""
        info = {
            'timestamp': datetime.now().isoformat(),
            'cpu_percent': psutil.cpu_percent(interval=1),
            'memory': psutil.virtual_memory()._asdict(),
            'disk': psutil.disk_usage('/')._asdict(),
            'network': psutil.net_io_counters()._asdict()
        }
        return info
    
    def check_thresholds(self, system_info):
        """检查阈值"""
        alerts = []
        
        # CPU使用率检查
        cpu_percent = system_info['cpu_percent']
        if cpu_percent > self.config['thresholds']['cpu_percent']:
            alerts.append(f"CPU使用率过高: {cpu_percent}%")
        
        # 内存使用率检查
        memory_percent = system_info['memory']['percent']
        if memory_percent > self.config['thresholds']['memory_percent']:
            alerts.append(f"内存使用率过高: {memory_percent}%")
        
        # 磁盘使用率检查
        disk_percent = (system_info['disk']['used'] / system_info['disk']['total']) * 100
        if disk_percent > self.config['thresholds']['disk_percent']:
            alerts.append(f"磁盘使用率过高: {disk_percent:.2f}%")
        
        return alerts
    
    def send_alert_email(self, alerts, system_info):
        """发送告警邮件"""
        try:
            # 创建邮件内容
            msg = MIMEMultipart()
            msg['From'] = self.config['email']['sender_email']
            msg['To'] = self.config['email']['recipient_email']
            msg['Subject'] = f"系统告警 - {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}"
            
            # 邮件正文
            body = f"""
系统监控告警

时间: {system_info['timestamp']}

告警信息:
{chr(10).join(alerts)}

系统状态:
CPU使用率: {system_info['cpu_percent']}%
内存使用率: {system_info['memory']['percent']}%
磁盘使用率: {(system_info['disk']['used'] / system_info['disk']['total']) * 100:.2f}%

请及时处理!
            """
            
            msg.attach(MIMEText(body, 'plain', 'utf-8'))
            
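            # Send the email; the context manager closes the connection even
            # if login or sending fails, and the timeout avoids hanging forever
            with smtplib.SMTP(self.config['email']['smtp_server'],
                              self.config['email']['smtp_port'],
                              timeout=30) as server:
                server.starttls()
                server.login(self.config['email']['sender_email'],
                             self.config['email']['sender_password'])
                server.send_message(msg)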
            
            self.logger.info("告警邮件发送成功")
        except Exception as e:
            self.logger.error(f"发送告警邮件失败: {e}")
    
    def generate_report(self, system_info):
        """生成系统报告"""
        report = f"""
系统状态报告
================
时间: {system_info['timestamp']}

CPU:
  使用率: {system_info['cpu_percent']}%

内存:
  总量: {system_info['memory']['total'] / (1024**3):.2f} GB
  已用: {system_info['memory']['used'] / (1024**3):.2f} GB
  可用: {system_info['memory']['available'] / (1024**3):.2f} GB
  使用率: {system_info['memory']['percent']}%

磁盘:
  总量: {system_info['disk']['total'] / (1024**3):.2f} GB
  已用: {system_info['disk']['used'] / (1024**3):.2f} GB
  可用: {system_info['disk']['free'] / (1024**3):.2f} GB
  使用率: {(system_info['disk']['used'] / system_info['disk']['total']) * 100:.2f}%

网络:
  发送字节: {system_info['network']['bytes_sent'] / (1024**2):.2f} MB
  接收字节: {system_info['network']['bytes_recv'] / (1024**2):.2f} MB
        """
        
        # 保存报告到文件
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        report_file = f"system_report_{timestamp}.txt"
        with open(report_file, 'w', encoding='utf-8') as f:
            f.write(report)
        
        self.logger.info(f"系统报告已保存: {report_file}")
        return report_file
    
    def monitor_loop(self):
        """监控循环"""
        self.logger.info("系统监控启动")
        
        while True:
            try:
                # 获取系统信息
                system_info = self.get_system_info()
                
                # 检查阈值
                alerts = self.check_thresholds(system_info)
                
                # 如果有告警，发送邮件
                if alerts:
                    self.logger.warning("检测到系统异常")
                    for alert in alerts:
                        self.logger.warning(alert)
                    self.send_alert_email(alerts, system_info)
                
                # 记录系统状态
                self.logger.info(
                    f"系统状态 - CPU: {system_info['cpu_percent']}%, "
                    f"内存: {system_info['memory']['percent']}%, "
                    f"磁盘: {(system_info['disk']['used'] / system_info['disk']['total']) * 100:.2f}%"
                )
                
                # 等待下次检查
                time.sleep(self.config['check_interval'])
                
            except KeyboardInterrupt:
                self.logger.info("监控程序被用户中断")
                break
            except Exception as e:
                self.logger.error(f"监控过程中发生错误: {e}")
                time.sleep(self.config['check_interval'])

def main():
    """主函数"""
    parser = argparse.ArgumentParser(description='系统监控工具')
    parser.add_argument('--config', default='monitor_config.json',
                       help='配置文件路径')
    parser.add_argument('--report', action='store_true',
                       help='生成系统报告')
    parser.add_argument('--once', action='store_true',
                       help='只执行一次监控')
    
    args = parser.parse_args()
    
    # 创建监控器实例
    monitor = SystemMonitor(args.config)
    
    if args.report:
        # 生成报告
        system_info = monitor.get_system_info()
        report_file = monitor.generate_report(system_info)
        print(f"报告已生成: {report_file}")
    elif args.once:
        # 执行一次监控
        system_info = monitor.get_system_info()
        alerts = monitor.check_thresholds(system_info)
        if alerts:
            print("检测到告警:")
            for alert in alerts:
                print(f"  - {alert}")
        else:
            print("系统状态正常")
    else:
        # 启动持续监控
        monitor.monitor_loop()

if __name__ == '__main__':
    main()

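Practice 3: Data Processing and Report Generation Script

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Data processing and report generation script
Processes CSV data and generates an HTML report
"""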

import csv
import json
import argparse
import logging
from datetime import datetime
from pathlib import Path
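import matplotlib
matplotlib.use('Agg')  # Non-interactive backend: charts are only saved to files, so no display is needed
import matplotlib.pyplot as plt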
import pandas as pd

class DataProcessor:
    """数据处理器"""
    
    def __init__(self, config):
        self.config = config
        self.setup_logging()
    
    def setup_logging(self):
        """设置日志"""
        log_format = '%(asctime)s - %(levelname)s - %(message)s'
        logging.basicConfig(
            level=logging.INFO,
            format=log_format,
            handlers=[
                logging.FileHandler('data_processor.log'),
                logging.StreamHandler()
            ]
        )
        self.logger = logging.getLogger(__name__)
    
    def load_csv_data(self, file_path):
        """加载CSV数据"""
        try:
            df = pd.read_csv(file_path)
            self.logger.info(f"成功加载数据: {file_path}")
            self.logger.info(f"数据形状: {df.shape}")
            return df
        except Exception as e:
            self.logger.error(f"加载CSV数据失败: {e}")
            return None
    
    def process_sales_data(self, df):
        """处理销售数据"""
        try:
            # 数据清洗
            df = df.dropna()  # 删除空值
            df['date'] = pd.to_datetime(df['date'])  # 转换日期格式
            
            # 计算统计数据
            total_sales = df['amount'].sum()
            avg_sales = df['amount'].mean()
            max_sales = df['amount'].max()
            min_sales = df['amount'].min()
            
            # 按月份统计
            df['month'] = df['date'].dt.to_period('M')
            monthly_sales = df.groupby('month')['amount'].sum()
            
            # 按产品统计
            product_sales = df.groupby('product')['amount'].sum().sort_values(ascending=False)
            
            stats = {
                'total_sales': total_sales,
                'avg_sales': avg_sales,
                'max_sales': max_sales,
                'min_sales': min_sales,
                'monthly_sales': monthly_sales,
                'product_sales': product_sales,
                'record_count': len(df)
            }
            
            self.logger.info("销售数据处理完成")
            return stats
        except Exception as e:
            self.logger.error(f"处理销售数据失败: {e}")
            return None
    
    def generate_charts(self, stats, output_dir):
        """Generate the charts"""
        try:
            output_dir = Path(output_dir)
            output_dir.mkdir(parents=True, exist_ok=True)
            
            # Monthly sales bar chart
            # (labels are in English; CJK text needs a font with CJK glyphs configured)
            plt.figure(figsize=(12, 6))
            stats['monthly_sales'].plot(kind='bar')
            plt.title('Monthly Sales')
            plt.xlabel('Month')
            plt.ylabel('Sales')
            plt.xticks(rotation=45)
            plt.tight_layout()
            monthly_chart = output_dir / 'monthly_sales.png'
            plt.savefig(monthly_chart)
            plt.close()
            
            # Product sales pie chart
            plt.figure(figsize=(10, 8))
            stats['product_sales'].head(10).plot(kind='pie', autopct='%1.1f%%')
            plt.title('Share of Sales by Product (Top 10)')
            plt.ylabel('')
            plt.tight_layout()
            product_chart = output_dir / 'product_sales.png'
            plt.savefig(product_chart)
            plt.close()
            
            self.logger.info("Charts generated")
            return str(monthly_chart), str(product_chart)
        except Exception as e:
            self.logger.error(f"Chart generation failed: {e}")
            return None, None
    
    def generate_html_report(self, stats, charts, output_file):
        """生成HTML报告"""
        try:
            html_template = """
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>销售数据分析报告</title>
    <style>
        body {{ font-family: Arial, sans-serif; margin: 20px; }}
        h1, h2 {{ color: #333; }}
        .stats {{ background-color: #f5f5f5; padding: 15px; border-radius: 5px; margin: 20px 0; }}
        .chart {{ margin: 20px 0; text-align: center; }}
        table {{ border-collapse: collapse; width: 100%; margin: 20px 0; }}
        th, td {{ border: 1px solid #ddd; padding: 8px; text-align: left; }}
        th {{ background-color: #f2f2f2; }}
    </style>
</head>
<body>
    <h1>销售数据分析报告</h1>
    <p>生成时间: {timestamp}</p>
    
    <div class="stats">
        <h2>总体统计</h2>
        <p><strong>总销售额:</strong> ¥{total_sales:,.2f}</p>
        <p><strong>平均销售额:</strong> ¥{avg_sales:,.2f}</p>
        <p><strong>最高单笔销售:</strong> ¥{max_sales:,.2f}</p>
        <p><strong>最低单笔销售:</strong> ¥{min_sales:,.2f}</p>
        <p><strong>记录总数:</strong> {record_count:,}</p>
    </div>
    
    <div class="chart">
        <h2>月度销售额</h2>
        <img src="{monthly_chart}" alt="月度销售额图表" style="max-width: 100%;">
    </div>
    
    <div class="chart">
        <h2>产品销售额占比</h2>
        <img src="{product_chart}" alt="产品销售额占比图表" style="max-width: 100%;">
    </div>
    
    <h2>产品销售排名</h2>
    <table>
        <thead>
            <tr>
                <th>产品名称</th>
                <th>销售额</th>
                <th>占比</th>
            </tr>
        </thead>
        <tbody>
            {product_rows}
        </tbody>
    </table>
</body>
</html>
            """
            
            # 生成产品表格行
            total_sales = stats['total_sales']
            product_rows = ""
            for product, sales in stats['product_sales'].head(10).items():
                percentage = (sales / total_sales) * 100
                product_rows += f"""
            <tr>
                <td>{product}</td>
                <td>¥{sales:,.2f}</td>
                <td>{percentage:.1f}%</td>
            </tr>
                """
            
            # 填充模板
            html_content = html_template.format(
                timestamp=datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                total_sales=stats['total_sales'],
                avg_sales=stats['avg_sales'],
                max_sales=stats['max_sales'],
                min_sales=stats['min_sales'],
                record_count=stats['record_count'],
                monthly_chart=charts[0],
                product_chart=charts[1],
                product_rows=product_rows
            )
            
            # 保存HTML文件
            with open(output_file, 'w', encoding='utf-8') as f:
                f.write(html_content)
            
            self.logger.info(f"HTML报告生成完成: {output_file}")
            return output_file
        except Exception as e:
            self.logger.error(f"生成HTML报告失败: {e}")
            return None

def create_sample_data(file_path):
    """Create a sample data file"""
    import random
    from datetime import datetime, timedelta
    
    products = ['Laptop', 'Smartphone', 'Tablet', 'Smartwatch', 'Headphones', 'Keyboard', 'Mouse', 'Monitor']
    start_date = datetime(2023, 1, 1)
    
    with open(file_path, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['date', 'product', 'amount', 'quantity'])
        
        for i in range(1000):
            # 2023 has 365 days, so offsets 0..364 keep every date inside 2023
            date = start_date + timedelta(days=random.randint(0, 364))
            product = random.choice(products)
            amount = random.uniform(100, 5000)
            quantity = random.randint(1, 10)
            writer.writerow([date.strftime('%Y-%m-%d'), product, f"{amount:.2f}", quantity])
    
    print(f"Sample data created: {file_path}")

def main():
    """主函数"""
    parser = argparse.ArgumentParser(description='数据处理和报告生成工具')
    parser.add_argument('input_file', help='输入CSV文件路径')
    parser.add_argument('-o', '--output', default='report.html',
                       help='输出HTML报告文件路径')
    parser.add_argument('-c', '--charts-dir', default='charts',
                       help='图表输出目录')
    parser.add_argument('--create-sample', action='store_true',
                       help='创建示例数据文件')
    
    args = parser.parse_args()
    
    # 创建示例数据
    if args.create_sample:
        create_sample_data(args.input_file)
        return
    
    # 创建数据处理器实例
    processor = DataProcessor(args)
    
    # 加载数据
    df = processor.load_csv_data(args.input_file)
    if df is None:
        return
    
    # 处理数据
    stats = processor.process_sales_data(df)
    if stats is None:
        return
    
    # 生成图表
    monthly_chart, product_chart = processor.generate_charts(stats, args.charts_dir)
    if monthly_chart is None or product_chart is None:
        return
    
    # 生成报告
    report_file = processor.generate_html_report(
        stats, 
        (monthly_chart, product_chart), 
        args.output
    )
    
    if report_file:
        print(f"报告生成成功: {report_file}")
        print(f"月度图表: {monthly_chart}")
        print(f"产品图表: {product_chart}")

if __name__ == '__main__':
    main()

Practice 4: Scheduled Task Management Script

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Scheduled task management script
Manages scheduled tasks with APScheduler
"""

import argparse
import logging
import json
import os
from datetime import datetime
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.cron import CronTrigger
import subprocess
import smtplib
from email.mime.text import MIMEText

class TaskManager:
    """任务管理器"""
    
    def __init__(self, config_file='tasks.json'):
        self.config_file = config_file
        self.scheduler = BlockingScheduler()
        self.tasks = self.load_tasks()
        self.setup_logging()
    
    def setup_logging(self):
        """设置日志"""
        log_format = '%(asctime)s - %(levelname)s - %(message)s'
        logging.basicConfig(
            level=logging.INFO,
            format=log_format,
            handlers=[
                logging.FileHandler('task_manager.log'),
                logging.StreamHandler()
            ]
        )
        self.logger = logging.getLogger(__name__)
    
    def load_tasks(self):
        """Load the task configuration"""
        default_tasks = {
            "tasks": [
                {
                    "name": "System backup",
                    "command": "python backup_script.py",
                    "schedule": "0 2 * * *",  # Every day at 02:00
                    "enabled": True,
                    "description": "Daily system backup task"
                },
                {
                    "name": "Data cleanup",
                    "command": "python cleanup_script.py",
                    # Every Sunday at 03:00. The weekday is spelled out because
                    # APScheduler 3.x numbers weekdays from Monday = 0, so a
                    # cron-style "0" would not mean Sunday there.
                    "schedule": "0 3 * * sun",
                    "enabled": True,
                    "description": "Weekly data cleanup task"
                }
            ]
        }
        
        if os.path.exists(self.config_file):
            with open(self.config_file, 'r', encoding='utf-8') as f:
                return json.load(f)
        else:
            with open(self.config_file, 'w', encoding='utf-8') as f:
                json.dump(default_tasks, f, indent=4)
            return default_tasks
    
    def save_tasks(self):
        """保存任务配置"""
        with open(self.config_file, 'w') as f:
            json.dump(self.tasks, f, indent=4)
    
    def add_task(self, name, command, schedule, description="", enabled=True):
        """添加任务"""
        task = {
            "name": name,
            "command": command,
            "schedule": schedule,
            "enabled": enabled,
            "description": description
        }
        
        self.tasks["tasks"].append(task)
        self.save_tasks()
        self.logger.info(f"任务已添加: {name}")
    
    def remove_task(self, name):
        """删除任务"""
        self.tasks["tasks"] = [task for task in self.tasks["tasks"] if task["name"] != name]
        self.save_tasks()
        self.logger.info(f"任务已删除: {name}")
    
    def list_tasks(self):
        """列出所有任务"""
        print("定时任务列表:")
        print("-" * 80)
        for task in self.tasks["tasks"]:
            status = "启用" if task["enabled"] else "禁用"
            print(f"名称: {task['name']}")
            print(f"  命令: {task['command']}")
            print(f"  调度: {task['schedule']}")
            print(f"  状态: {status}")
            print(f"  描述: {task['description']}")
            print("-" * 80)
    
    def execute_task(self, command):
        """执行任务"""
        try:
            self.logger.info(f"开始执行任务: {command}")
            result = subprocess.run(command, shell=True, capture_output=True, text=True)
            
            if result.returncode == 0:
                self.logger.info(f"任务执行成功: {command}")
                self.logger.debug(f"输出: {result.stdout}")
            else:
                self.logger.error(f"任务执行失败: {command}")
                self.logger.error(f"错误: {result.stderr}")
                self.send_error_notification(command, result.stderr)
                
        except Exception as e:
            self.logger.error(f"执行任务时发生异常: {command}")
            self.logger.error(f"异常信息: {e}")
            self.send_error_notification(command, str(e))
    
    def send_error_notification(self, command, error_message):
        """发送错误通知"""
        # 这里可以实现邮件通知或其他通知方式
        self.logger.warning(f"任务执行失败，需要人工处理: {command}")
    
    def schedule_tasks(self):
        """调度任务"""
        for task in self.tasks["tasks"]:
            if task["enabled"]:
                try:
                    trigger = CronTrigger.from_crontab(task["schedule"])
                    self.scheduler.add_job(
                        self.execute_task,
                        trigger,
                        args=[task["command"]],
                        name=task["name"],
                        id=task["name"]
                    )
                    self.logger.info(f"任务已调度: {task['name']} ({task['schedule']})")
                except Exception as e:
                    self.logger.error(f"调度任务失败 {task['name']}: {e}")
    
    def start_scheduler(self):
        """启动调度器"""
        self.schedule_tasks()
        self.logger.info("定时任务调度器启动")
        print("定时任务调度器已启动，按 Ctrl+C 停止")
        
        try:
            self.scheduler.start()
        except KeyboardInterrupt:
            self.logger.info("调度器被用户中断")
            self.scheduler.shutdown()

def main():
    """主函数"""
    parser = argparse.ArgumentParser(description='定时任务管理器')
    parser.add_argument('action', choices=['start', 'list', 'add', 'remove'],
                       help='操作类型')
    parser.add_argument('--name', help='任务名称')
    parser.add_argument('--command', help='执行命令')
    parser.add_argument('--schedule', help='调度时间 (cron格式)')
    parser.add_argument('--description', help='任务描述')
    parser.add_argument('--config', default='tasks.json',
                       help='配置文件路径')
    
    args = parser.parse_args()
    
    # 创建任务管理器实例
    manager = TaskManager(args.config)
    
    if args.action == 'start':
        manager.start_scheduler()
    
    elif args.action == 'list':
        manager.list_tasks()
    
    elif args.action == 'add':
        if not all([args.name, args.command, args.schedule]):
            parser.error("添加任务需要 --name, --command, --schedule 参数")
        manager.add_task(
            args.name, 
            args.command, 
            args.schedule, 
            args.description or ""
        )
        print(f"任务已添加: {args.name}")
    
    elif args.action == 'remove':
        if not args.name:
            parser.error("删除任务需要 --name 参数")
        manager.remove_task(args.name)
        print(f"任务已删除: {args.name}")

if __name__ == '__main__':
    main()

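Summary and Review

This chapter covered the core knowledge and practice of automation script development:

Automation script concepts: the value and characteristics of automation scripts and their role in improving productivity.
Command-line argument handling: using the argparse module to make scripts more flexible and user-friendly.
Configuration file management: using different configuration formats to manage settings and keep scripts maintainable.
Logging: using the logging module and applying logging best practices.
File and directory operations: batch file processing, including renaming, encoding conversion, and backup.
System monitoring: building a monitoring script that tracks system resources and sends alerts when something is abnormal.
Data processing and report generation: processing data and generating reports with charts.
Scheduled task management: managing scheduled tasks with APScheduler.

With the material and practice scripts in this chapter, you should be able to develop practical automation scripts and apply them in your own work. Automation scripting is a highly practical skill: it takes repetitive tasks off your hands and frees up time for creative work.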

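Exercises and Challenges

Basic Exercises

Develop the following practical scripts:
- File sync script: periodically synchronize the contents of two directories
- Log analysis script: analyze web server logs and generate a statistics report
- Database backup script: back up a database automatically and upload it to cloud storage
- Website monitoring script: periodically check website availability and send a status report
Improve the existing scripts:
- Add more file operations to the file processing script
- Add more metrics to the system monitoring script
- Add more visualizations to the data processing script
Learn to use the following libraries:
- schedule: a simple Python job scheduling library
- watchdog: a library for monitoring file system events
- fabric: a remote execution and deployment tool

Advanced Challenges

Develop a complete automated operations platform:
- Support multiple types of automation tasks
- Provide a web interface for task management and monitoring
- Keep task execution history for auditing
- Integrate notifications and alerting
Implement distributed task scheduling:
- Use Celery to implement a distributed task queue
- Support load balancing and failover for tasks
- Provide real-time monitoring of task execution status
Create an intelligent automation system:
- Machine-learning-based task priority scheduling
- Automatic optimization of task execution strategies
- Predictive maintenance and failure prevention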

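Project

Develop an "enterprise automated operations platform" that integrates the following features:

Task scheduling and management: support timed tasks, triggered tasks, and other scheduling modes
System monitoring: monitor server resource usage in real time
Log management: collect and analyze system logs centrally
Configuration management: manage server configuration files in one place
Deployment management: automate application deployment and updates
Alert notifications: multi-channel alerts (email, SMS, WeChat, and so on)
Reporting: generate operations reports and statistical analyses
Access control: role-based access control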

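Further Reading

APScheduler official documentation: apscheduler.readthedocs.io/
- Official documentation for the advanced Python scheduling library, with a detailed user guide
《Python自动化运维：技术与最佳实践》 (Python Automated Operations: Techniques and Best Practices):
- A book devoted to Python in operations automation and the related best practices
Fabric official documentation: www.fabfile.org/
- Official documentation for the Python remote execution and deployment tool
Schedule library: github.com/dbader/sche…
- A simple, easy-to-use Python job scheduling library
Watchdog library: github.com/gorakhargos…
- A Python library for monitoring file system events
《Ansible: Up and Running》 by Lorin Hochstein:
- A book introducing the Ansible automation tool
Celery official documentation: docs.celeryproject.org/
- Official documentation for the distributed task queue system
《Site Reliability Engineering》 by Google:
- Google's SRE practices, with extensive experience in operations automation

These resources go deeper into automation script development and cover more advanced usage and best practices.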