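Python's requests module does not support asynchronous requests. requests.get() blocks the calling thread until the response arrives, and it cannot be awaited.
The image-crawling example at the end of "Python Crawler (14): Implementing an Asynchronous Crawler with Coroutines" illustrates this.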
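Here is a minimal sketch of why this matters. It uses time.sleep() as a stand-in for requests.get() so that it runs without a network, and the helper names blocking_download and naive_request are hypothetical. The blocking call never hands control back to the event loop, so the three coroutines run one after another even though they are gathered together.

```python
import asyncio
import time

def blocking_download(seconds):
    # Hypothetical stand-in for requests.get(url): blocks the calling thread
    time.sleep(seconds)
    return seconds

async def naive_request(seconds):
    # Calling a blocking function directly inside a coroutine never yields
    # to the event loop, so no other task can run while it sleeps
    return blocking_download(seconds)

async def main():
    start = time.perf_counter()
    await asyncio.gather(*(naive_request(0.2) for _ in range(3)))
    return time.perf_counter() - start

elapsed = asyncio.run(main())
# Expect at least ~0.6 s: the three 0.2 s calls ran sequentially
print(f"elapsed: {elapsed:.2f}s")
```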
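How can we make asynchronous requests with what we have learned so far?
The first approach that came to mind was a thread pool. Here is an attempt: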
#!/usr/bin/python3
# -*- coding: utf-8 -*-
# @Time : 2022/3/24 19:11
# @Author : camellia
# @Email : 805795955@qq.com
# @File : asyncPaChong.py
# @Software: PyCharm
import asyncio
import os
import random
import string
import requests
async def request(url):
    print("Start downloading:", url)
    loop = asyncio.get_running_loop()
    # awaitable loop.run_in_executor(executor, func, *args)
    # executor: a ThreadPoolExecutor or ProcessPoolExecutor; if None, the loop's default thread pool is used
    # requests cannot be awaited, so the blocking requests.get() call runs in a worker thread instead
    # response = requests.get(url)  # calling it directly would block the event loop
    future = loop.run_in_executor(None, requests.get, url)
    response = await future
    print("Download complete")
    # Save the image locally under a random 8-character file name
    file_name = './img/' + ''.join(random.sample(string.ascii_letters + string.digits, 8)) + '.jpg'
    with open(file_name, mode='wb') as file_object:
        file_object.write(response.content)
# Using tasks
async def mains():
    os.makedirs('./img', exist_ok=True)  # create the output directory if it does not exist
    task = [
        asyncio.create_task(request("https://resource.guanchao.site/uploads/sowing/welcome-image3.jpg")),
        asyncio.create_task(request("https://resource.guanchao.site/uploads/sowing/welcome-image5.jpg")),
        asyncio.create_task(request("https://resource.guanchao.site/uploads/sowing/welcome-image7.jpg"))
    ]
    print(task)  # the tasks are still pending here
    done, pending = await asyncio.wait(task)
    print(task)  # every task has finished here
asyncio.run(mains())
Running the code above prints:
[<Task pending coro=<request() running at F:\camellia\python\testProject\asyncPaChong.py:13>>, <Task pending coro=<request() running at F:\camellia\python\testProject\asyncPaChong.py:13>>, <Task pending coro=<request() running at F:\camellia\python\testProject\asyncPaChong.py:13>>]
Start downloading: https://resource.guanchao.site/uploads/sowing/welcome-image3.jpg
Start downloading: https://resource.guanchao.site/uploads/sowing/welcome-image5.jpg
Start downloading: https://resource.guanchao.site/uploads/sowing/welcome-image7.jpg
Download complete
Download complete
Download complete
[<Task finished coro=<request() done, defined at F:\camellia\python\testProject\asyncPaChong.py:13> result=None>, <Task finished coro=<request() done, defined at F:\camellia\python\testProject\asyncPaChong.py:13> result=None>, <Task finished coro=<request() done, defined at F:\camellia\python\testProject\asyncPaChong.py:13> result=None>]
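The output shows that the images were fetched asynchronously. All three download-start messages are printed before any download-complete message, so the three downloads overlapped instead of running one after another.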
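You can also pass an explicit pool to run_in_executor() in place of the default one (None). The sketch below again uses a hypothetical blocking_download() helper in place of requests.get(). It hands a ThreadPoolExecutor to run_in_executor(). The value max_workers=3 is an illustrative choice that caps how many blocking calls run at once. Because the calls run in parallel threads, three 0.2 s calls should finish in roughly 0.2 s in total rather than 0.6 s.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_download(seconds):
    # Hypothetical stand-in for requests.get(url): blocks its worker thread
    time.sleep(seconds)
    return seconds

async def request(executor, seconds):
    loop = asyncio.get_running_loop()
    # Run the blocking call in the given pool; awaiting the future
    # lets the event loop switch to other tasks in the meantime
    return await loop.run_in_executor(executor, blocking_download, seconds)

async def main():
    start = time.perf_counter()
    # Explicit pool: max_workers limits how many calls run concurrently
    with ThreadPoolExecutor(max_workers=3) as executor:
        tasks = [asyncio.create_task(request(executor, 0.2)) for _ in range(3)]
        done, pending = await asyncio.wait(tasks)
    elapsed = time.perf_counter() - start
    return [t.result() for t in tasks], len(pending), elapsed

results, pending_count, elapsed = asyncio.run(main())
print(results, pending_count, f"{elapsed:.2f}s")
```

On Python 3.9 and later, `await asyncio.to_thread(blocking_download, seconds)` is a shorter way to run a blocking call in the default thread pool.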
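The thread pool is not the main subject here, though. Instead of handing blocking calls to threads, we can use aiohttp, an HTTP client built on asyncio.
First, install the aiohttp module: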
pip install aiohttp
Then modify the code above as follows:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
# @Time : 2022/3/24 19:11
# @Author : camellia
# @Email : 805795955@qq.com
# @File : asyncPaChong.py
# @Software: PyCharm
import asyncio
import os
import random
import string
import aiohttp
async def request(url):
    print("Start downloading:", url)
    async with aiohttp.ClientSession() as session:
        # ssl=False skips certificate verification (it replaces the deprecated verify_ssl=False)
        async with session.get(url, ssl=False) as response:
            content = await response.read()  # read the whole response body as bytes
            print("Download complete")
    # Save the image locally under a random 8-character file name
    file_name = './img/' + ''.join(random.sample(string.ascii_letters + string.digits, 8)) + '.jpg'
    with open(file_name, mode='wb') as file_object:
        file_object.write(content)
# Using tasks
async def mains():
    os.makedirs('./img', exist_ok=True)  # create the output directory if it does not exist
    task = [
        asyncio.create_task(request("https://resource.guanchao.site/uploads/sowing/welcome-image3.jpg")),
        asyncio.create_task(request("https://resource.guanchao.site/uploads/sowing/welcome-image5.jpg")),
        asyncio.create_task(request("https://resource.guanchao.site/uploads/sowing/welcome-image7.jpg"))
    ]
    print(task)  # the tasks are still pending here
    done, pending = await asyncio.wait(task)
    print(task)  # every task has finished here
asyncio.run(mains())
Running the code above prints:
[<Task pending coro=<request() running at F:\camellia\python\testProject\asyncPaChong.py:16>>, <Task pending coro=<request() running at F:\camellia\python\testProject\asyncPaChong.py:16>>, <Task pending coro=<request() running at F:\camellia\python\testProject\asyncPaChong.py:16>>]
Start downloading: https://resource.guanchao.site/uploads/sowing/welcome-image3.jpg
Start downloading: https://resource.guanchao.site/uploads/sowing/welcome-image5.jpg
Start downloading: https://resource.guanchao.site/uploads/sowing/welcome-image7.jpg
Download complete
Download complete
Download complete
[<Task finished coro=<request() done, defined at F:\camellia\python\testProject\asyncPaChong.py:16> result=None>, <Task finished coro=<request() done, defined at F:\camellia\python\testProject\asyncPaChong.py:16> result=None>, <Task finished coro=<request() done, defined at F:\camellia\python\testProject\asyncPaChong.py:16> result=None>]
The output follows the same pattern as the thread-pool version. All three downloads start before any of them finishes.
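The code above opens a new ClientSession for every image. The aiohttp documentation recommends reusing one session, and its connection pool, across many requests. The sketch below applies that idea. The names fetch_image() and download_all() are hypothetical. The file names are uuid-based in place of the random 8-character ones. It uses asyncio.gather() instead of asyncio.wait(), so the saved paths come back in the same order as the URLs.

```python
import asyncio
import os
import uuid

import aiohttp

async def fetch_image(session, url, save_dir):
    # One GET on the shared session; the response is released when the block exits
    async with session.get(url) as response:
        response.raise_for_status()      # raise on 4xx/5xx instead of saving an error page
        content = await response.read()  # whole body as bytes
    # Hypothetical naming scheme: uuid4 hex makes file-name collisions very unlikely
    path = os.path.join(save_dir, uuid.uuid4().hex + ".jpg")
    with open(path, "wb") as file_object:
        file_object.write(content)
    return path

async def download_all(urls, save_dir="./img"):
    os.makedirs(save_dir, exist_ok=True)
    # A single session (one connection pool) shared by every request
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch_image(session, url, save_dir) for url in urls))

if __name__ == "__main__":
    urls = [
        "https://resource.guanchao.site/uploads/sowing/welcome-image3.jpg",
        "https://resource.guanchao.site/uploads/sowing/welcome-image5.jpg",
        "https://resource.guanchao.site/uploads/sowing/welcome-image7.jpg",
    ]
    print(asyncio.run(download_all(urls)))
```

To crawl many URLs at once, you can still cap the number of simultaneous connections through the session's connector instead of opening more sessions.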
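This has been only a brief introduction to using aiohttp in a crawler.
For more usage, see the official documentation: docs.aiohttp.org/en/stable/
If you have any suggestions, please leave a comment below.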