Concurrent Tasks with asyncio from the Python Standard Library

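Overview

asyncio is a lightweight concurrency framework in the Python standard library. It is built on coroutines, which makes it lighter than multiprocessing or multithreading and easier to work with. I originally planned to use gevent, but its official documentation says that Windows support targets general compatibility rather than production-grade performance, and third-party libraries need monkey patching on top of that. So I dropped it and went with asyncio from the standard library.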

Requirement

Download files with multiple concurrent tasks.

Code

Dependencies

First, add the following two lines to requirements.txt in the project root:

aiohttp
aiofiles

Main code

# Run concurrent tasks with the asyncio module; requires Python 3.9+

import asyncio
import os
import random
import time
from datetime import datetime
import logging

import aiofiles
import aiohttp

# Configure logging
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')

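# Seed the random generator from the current time (optional).
# Note: a time-based seed differs on every run; pass a fixed value for reproducible results.
random.seed(time.time())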

async def download_file(session, url, filename):
    try:
        async with session.get(url) as response:
            if response.status == 200:
                content = await response.read()
                # Make sure the target directory exists
                os.makedirs(os.path.dirname(filename), exist_ok=True)
                async with aiofiles.open(filename, 'wb') as f:
                    await f.write(content)
                return True
            else:
                logging.error(f"Error occurred while downloading {url} to file: {filename}, status code: {response.status}")
                return False
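    except Exception as e:
        # Put the exception inside the message; passing it as an extra positional
        # argument without a % placeholder makes logging fail to format the record
        logging.error(f"Error occurred while downloading {url} to file: {filename}: {e}")
        return False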

# Coroutine for one concurrent task: download the file and log how long it took
async def task(task_id, args, session):
    start = datetime.now()
    logging.info(f"Task {task_id} started with args {args}")
    url = 'https://dldir1.qq.com/qqfile/qq/QQNT/Windows/QQ_9.9.17_250110_x64_01.exe'
    filename = f"tmp/{task_id}_qq.exe"
    if await download_file(session, url, filename):
        logging.info(f"Task {task_id} finished, cost time: {datetime.now() - start}, args {args}")
    else:
        logging.error(f"Task {task_id} failed, cost time: {datetime.now() - start}, args {args}")

# Main coroutine: start five download tasks and wait for all of them to finish
async def main():
    try:
        logging.info("Starting")
        timeout = aiohttp.ClientTimeout(total=120)
        async with aiohttp.ClientSession(timeout=timeout) as session:
            # Use asyncio.gather() to run the coroutines concurrently
            tasks = [
                task(1, {"param": random.random()}, session),
                task(2, {"param": random.random()}, session),
                task(3, {"param": random.random()}, session),
                task(4, {"param": random.random()}, session),
                task(5, {"param": random.random()}, session),
            ]
            await asyncio.gather(*tasks)
        logging.info("Done")
    except KeyboardInterrupt:
        logging.warning("Program interrupted by user")
    except asyncio.CancelledError:
        logging.warning("Task was cancelled")
    except Exception as e:
        logging.error(f"An error occurred: {e}")

if __name__ == "__main__":
    logging.debug("Starting main")
    # Start the main coroutine; asyncio.run() is the recommended entry point on Python 3.7+
    asyncio.run(main())
    logging.debug("Finished main")

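The code is simple: it starts several tasks, and each task performs one download. Wherever an operation would block, use aiohttp or aiofiles and await the call. Awaiting yields control to the other tasks, so one task's I/O wait does not stall overall progress.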
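
To see that yielding behaviour in isolation, here is a minimal sketch that uses only the standard library. It assumes asyncio.sleep() as a stand-in for the network and disk waits in the downloader; fake_download, run_demo, and the delay values are hypothetical names and numbers chosen for illustration, not part of the script above.

```python
import asyncio
import time


async def fake_download(task_id, delay):
    # asyncio.sleep() stands in for a network or disk wait (an assumption for
    # this sketch); awaiting it hands control back to the event loop.
    await asyncio.sleep(delay)
    return task_id


async def run_demo():
    start = time.perf_counter()
    # gather() runs the three coroutines concurrently and returns their
    # results in the order the coroutines were passed in.
    results = await asyncio.gather(
        fake_download(1, 0.3),
        fake_download(2, 0.2),
        fake_download(3, 0.1),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(run_demo())
    print(results, f"{elapsed:.2f}s")
```

Because the waits overlap, the elapsed time should be close to the longest single delay (0.3 s) rather than the sum of all three (0.6 s).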

Here is the output from running the code:

E:\code\PycharmProjects\test-any\venv\Scripts\python.exe E:\code\PycharmProjects\test-any\parallel_job_2.py 
2025-02-20 12:20:51,203 - DEBUG - Starting main
2025-02-20 12:20:51,204 - DEBUG - Using proactor: IocpProactor
2025-02-20 12:20:51,205 - INFO - Starting
2025-02-20 12:20:51,205 - INFO - Task 1 started with args {'param': 0.9537630237890041}
2025-02-20 12:20:51,209 - INFO - Task 2 started with args {'param': 0.09408088357590583}
2025-02-20 12:20:51,209 - INFO - Task 3 started with args {'param': 0.05718354901791489}
2025-02-20 12:20:51,209 - INFO - Task 4 started with args {'param': 0.21859518871851447}
2025-02-20 12:20:51,209 - INFO - Task 5 started with args {'param': 0.9989906549911304}
2025-02-20 12:22:18,922 - INFO - Task 5 finished, cost time: 0:01:27.712564, args {'param': 0.9989906549911304}
2025-02-20 12:22:28,717 - INFO - Task 1 finished, cost time: 0:01:37.511788, args {'param': 0.9537630237890041}
2025-02-20 12:22:39,031 - INFO - Task 3 finished, cost time: 0:01:47.822317, args {'param': 0.05718354901791489}
2025-02-20 12:22:40,442 - INFO - Task 2 finished, cost time: 0:01:49.233613, args {'param': 0.09408088357590583}
2025-02-20 12:22:44,941 - INFO - Task 4 finished, cost time: 0:01:53.732244, args {'param': 0.21859518871851447}
2025-02-20 12:22:44,942 - INFO - Done
2025-02-20 12:22:44,944 - DEBUG - Finished main

Process finished with exit code 0

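Looks pretty good, doesn't it? All five downloads started within a few milliseconds of each other, and the whole batch finished in under two minutes.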

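Afterword

For similar jobs in the future, just swap the download_file task function for another one and adjust the call arguments slightly. The overall structure barely changes.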
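
As a rough sketch of that reuse, the gather-based structure can be wrapped in a helper that accepts any coroutine function. The names run_all, worker, limit, and square are hypothetical and do not appear in the script above. The asyncio.Semaphore cap on concurrency and the choice to return exceptions instead of raising them are optional additions, not something the original code does.

```python
import asyncio


async def run_all(worker, items, limit=5):
    """Run worker(item) for every item, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:
            try:
                return await worker(item)
            except Exception as exc:
                # Return the exception so one failure does not hide other results.
                return exc

    # Results come back in the same order as `items`.
    return await asyncio.gather(*(guarded(item) for item in items))


async def square(n):
    # Placeholder worker (hypothetical); swap in a download or any other coroutine.
    await asyncio.sleep(0.01)
    if n < 0:
        raise ValueError("negative input")
    return n * n


if __name__ == "__main__":
    print(asyncio.run(run_all(square, [1, 2, -3, 4], limit=2)))
```

Swapping square for a coroutine that wraps download_file would keep the rest of the helper unchanged.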