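Overview
asyncio is the lightweight concurrency framework in Python's standard library. It is pleasant to use and lighter than multiprocessing or multithreading: tasks are coroutines that take turns on a single thread inside one event loop, so no extra process or OS thread is created per task. That makes it a good fit for I/O-bound work such as downloads. I originally planned to use gevent, but its official documentation says that on Windows it aims only for general compatibility, not production-grade performance, and third-party libraries have to be monkey-patched to work with it. So I dropped it and went with the standard library's asyncio.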
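To see what coroutine concurrency buys, here is a minimal sketch that needs no network. `fake_download` is a hypothetical stand-in (not part of the article's code): `asyncio.sleep` plays the role of waiting on the network, and five such waits run through `asyncio.gather` finish in about the time of one.

```python
import asyncio
import time


async def fake_download(task_id: int, seconds: float) -> int:
    # Hypothetical stand-in for a download: it only waits, as a task waiting on the network would
    await asyncio.sleep(seconds)  # control returns to the event loop during the wait
    return task_id


async def run_all(n: int = 5, seconds: float = 0.5) -> tuple[list[int], float]:
    start = time.perf_counter()
    # gather() schedules all coroutines at once and returns their results in argument order
    results = await asyncio.gather(*(fake_download(i, seconds) for i in range(1, n + 1)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(run_all())
    # The elapsed time should be close to 0.5 s, not 5 x 0.5 s
    print(results, f"{elapsed:.2f} s")
```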
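Requirement
Download files with multiple concurrent tasks.
Code
Dependencies
First, add the following two lines to requirements.txt in the project root: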
aiohttp
aiofiles
Main code
# Concurrent tasks with the asyncio module; requires Python 3.9+
import asyncio
import os
import random
import time
from datetime import datetime
import logging

import aiofiles
import aiohttp

# Logging configuration
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')
# Seed the random generator (optional). Seeding with the current time does NOT make runs
# reproducible; use a fixed value such as random.seed(42) if repeatable params are needed.
random.seed(time.time())


async def download_file(session, url, filename):
    try:
        async with session.get(url) as response:
            if response.status == 200:
                # Reads the whole body into memory; for very large files, iterating over
                # response.content.iter_chunked(...) keeps memory use bounded
                content = await response.read()
                # Make sure the target directory exists
                os.makedirs(os.path.dirname(filename), exist_ok=True)
                async with aiofiles.open(filename, 'wb') as f:
                    await f.write(content)
                return True
            else:
                logging.error(f"Error occurred while downloading {url} to file: {filename}, status code: {response.status}")
                return False
    except Exception as e:
        # Put the exception into the message itself: passing it as an extra argument
        # without a %s placeholder makes logging fail with a formatting error
        logging.error(f"Error occurred while downloading {url} to file: {filename}: {e!r}")
        return False


# A coroutine function representing one concurrent task
async def task(task_id, args, session):
    start = datetime.now()
    logging.info(f"Task {task_id} started with args {args}")
    url = 'https://dldir1.qq.com/qqfile/qq/QQNT/Windows/QQ_9.9.17_250110_x64_01.exe'
    filename = f"tmp/{task_id}_qq.exe"
    if await download_file(session, url, filename):
        logging.info(f"Task {task_id} finished, cost time: {datetime.now() - start}, args {args}")
    else:
        logging.info(f"Task {task_id} failed, cost time {datetime.now() - start}, args {args}")


# Main coroutine: starts five download tasks; the random "param" is only passed along and logged
async def main():
    try:
        logging.info("Starting")
        # total=120 caps each request, including reading the body, at 120 s; in the run below
        # the slowest download took about 114 s, so a slower network may hit this limit
        timeout = aiohttp.ClientTimeout(total=120)
        async with aiohttp.ClientSession(timeout=timeout) as session:
            # asyncio.gather() runs the coroutines concurrently and waits for all of them
            tasks = [task(i, {"param": random.random()}, session) for i in range(1, 6)]
            await asyncio.gather(*tasks)
        logging.info("Done")
    except asyncio.CancelledError:
        logging.warning("Task was cancelled")
        raise  # re-raise so the cancellation propagates as asyncio expects
    except Exception as e:
        logging.error(f"An error occurred: {e!r}")


if __name__ == "__main__":
    logging.debug("Starting main")
    try:
        # Start the main coroutine; asyncio.run() is the recommended entry point since Python 3.7
        asyncio.run(main())
    except KeyboardInterrupt:
        # Ctrl+C normally surfaces here, out of asyncio.run(), rather than inside main()
        logging.warning("Program interrupted by user")
    logging.debug("Finished main")
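The code is simple: it starts several tasks, and each task performs one download. Wherever an operation would block on I/O (the HTTP request via aiohttp, the file write via aiofiles), the task awaits it instead, which hands control back to the event loop so other tasks can run while this one waits. That way a single slow download does not hold up the overall progress.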
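The point about awaiting instead of blocking can be checked without any network. In this sketch (all function names are hypothetical), `time.sleep` stands in for any blocking call: called directly inside a coroutine it freezes the event loop, so a background ticker task barely gets to run; handed to `asyncio.to_thread` (Python 3.9+), the ticker keeps running. aiofiles works on a similar principle for file operations, delegating them to a thread pool.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def ticker(stop: asyncio.Event) -> int:
    # Counts how many times it gets scheduled; a blocked event loop starves it
    ticks = 0
    while not stop.is_set():
        ticks += 1
        await asyncio.sleep(0.05)
    return ticks


async def blocking_work() -> None:
    time.sleep(0.5)  # blocks the entire event loop; no other task runs meanwhile


async def offloaded_work() -> None:
    # Same blocking call, run in a worker thread so the event loop stays responsive
    await asyncio.to_thread(time.sleep, 0.5)


async def measure(work: Callable[[], Awaitable[None]]) -> int:
    stop = asyncio.Event()
    tick_task = asyncio.create_task(ticker(stop))
    await asyncio.sleep(0)  # give the ticker its first turn
    await work()
    stop.set()
    return await tick_task


async def main() -> None:
    print("ticks while blocked:  ", await measure(blocking_work))
    print("ticks while offloaded:", await measure(offloaded_work))


if __name__ == "__main__":
    asyncio.run(main())
```

The blocked run should report only about one tick, while the offloaded run should report many more, because the ticker keeps getting turns during the 0.5 s wait.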
Here is the output from running the script (saved as parallel_job_2.py):
E:\code\PycharmProjects\test-any\venv\Scripts\python.exe E:\code\PycharmProjects\test-any\parallel_job_2.py
2025-02-20 12:20:51,203 - DEBUG - Starting main
2025-02-20 12:20:51,204 - DEBUG - Using proactor: IocpProactor
2025-02-20 12:20:51,205 - INFO - Starting
2025-02-20 12:20:51,205 - INFO - Task 1 started with args {'param': 0.9537630237890041}
2025-02-20 12:20:51,209 - INFO - Task 2 started with args {'param': 0.09408088357590583}
2025-02-20 12:20:51,209 - INFO - Task 3 started with args {'param': 0.05718354901791489}
2025-02-20 12:20:51,209 - INFO - Task 4 started with args {'param': 0.21859518871851447}
2025-02-20 12:20:51,209 - INFO - Task 5 started with args {'param': 0.9989906549911304}
2025-02-20 12:22:18,922 - INFO - Task 5 finished, cost time: 0:01:27.712564, args {'param': 0.9989906549911304}
2025-02-20 12:22:28,717 - INFO - Task 1 finished, cost time: 0:01:37.511788, args {'param': 0.9537630237890041}
2025-02-20 12:22:39,031 - INFO - Task 3 finished, cost time: 0:01:47.822317, args {'param': 0.05718354901791489}
2025-02-20 12:22:40,442 - INFO - Task 2 finished, cost time: 0:01:49.233613, args {'param': 0.09408088357590583}
2025-02-20 12:22:44,941 - INFO - Task 4 finished, cost time: 0:01:53.732244, args {'param': 0.21859518871851447}
2025-02-20 12:22:44,942 - INFO - Done
2025-02-20 12:22:44,944 - DEBUG - Finished main
Process finished with exit code 0
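Looks pretty good, doesn't it? All five tasks start within 4 ms of each other, and the whole run, from Starting (12:20:51,205) to Done (12:22:44,942), takes about 1 min 54 s, roughly as long as the slowest single task (Task 4, 0:01:53.73). The downloads overlapped instead of running one after another.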
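The log lines are enough to put a number on that overlap. Summing the five "cost time" values and dividing by the wall-clock span between the Starting and Done lines gives the average number of downloads in flight. The figures below are copied from this single run only.

```python
from datetime import datetime, timedelta

# "cost time" values copied from the log above (Tasks 5, 1, 3, 2, 4)
durations = [
    timedelta(minutes=1, seconds=27.712564),
    timedelta(minutes=1, seconds=37.511788),
    timedelta(minutes=1, seconds=47.822317),
    timedelta(minutes=1, seconds=49.233613),
    timedelta(minutes=1, seconds=53.732244),
]

# Wall-clock span between the "Starting" and "Done" log lines
fmt = "%H:%M:%S,%f"
wall = datetime.strptime("12:22:44,942", fmt) - datetime.strptime("12:20:51,205", fmt)

busy = sum(durations, timedelta())  # total download time across all tasks
print(f"sum of task durations: {busy.total_seconds():.1f} s")
print(f"slowest task:          {max(durations).total_seconds():.1f} s")
print(f"wall-clock time:       {wall.total_seconds():.1f} s")
# timedelta / timedelta gives a float: the average number of tasks in flight
print(f"average overlap:       {busy / wall:.2f} tasks in flight")
```

For this run that comes out at about 516 s of download time packed into about 114 s of wall-clock time, an average of roughly 4.5 downloads in flight.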
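Afterword
If I need something similar later, I only have to replace the task function download_file with a different one and adjust the call arguments; the overall structure stays largely the same.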
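One way to make that swap explicit is a small generic runner that takes the job function as a parameter. This is a sketch, not the article's code: `run_jobs` and `square_later` are hypothetical names, and the `limit` parameter (an `asyncio.Semaphore` capping how many jobs run at once) is an optional addition that becomes useful once the number of downloads grows.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_jobs(
    job: Callable[[int, dict], Awaitable[Any]],
    params: list[dict],
    limit: int = 5,
) -> list[Any]:
    """Run job(task_id, args) for every args dict, at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)

    async def guarded(task_id: int, args: dict) -> Any:
        async with sem:  # waits here while `limit` jobs are already running
            return await job(task_id, args)

    # return_exceptions=True: a failing job shows up as an exception object in the
    # results list instead of being raised out of gather()
    return await asyncio.gather(
        *(guarded(i, args) for i, args in enumerate(params, start=1)),
        return_exceptions=True,
    )


async def square_later(task_id: int, args: dict) -> int:
    # Hypothetical replacement job standing in for download_file
    await asyncio.sleep(0.1)
    return args["x"] ** 2


if __name__ == "__main__":
    print(asyncio.run(run_jobs(square_later, [{"x": n} for n in range(6)], limit=2)))
```

To plug in the article's `task(task_id, args, session)`, one option is to call `run_jobs(lambda i, a: task(i, a, session), params)` from inside the `ClientSession` block so the session is bound in.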