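Background
Because of business needs, I recently had to look into techniques for concurrent crawling. As a newcomer, I settled on aiohttp running across multiple servers and will refine the approach later. This post records what I have learned about aiohttp.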
1. What aiohttp can do
aiohttp is an asynchronous HTTP networking library built on asyncio. It provides both an HTTP server and an HTTP client.
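Because aiohttp covers both sides, one script can start a small server and query it with the client. The sketch below is only an illustration: the `/hello` route and the `hello` handler are hypothetical, and the server binds to port 0 on 127.0.0.1 so the operating system picks a free port.

```python
import asyncio

import aiohttp
from aiohttp import web


# Server side (hypothetical handler): answer GET /hello with a plain-text greeting
async def hello(request: web.Request) -> web.Response:
    name = request.query.get("name", "world")
    return web.Response(text=f"hello, {name}")


async def main() -> str:
    app = web.Application()
    app.router.add_get("/hello", hello)

    # Start the server on a free localhost port (port 0 lets the OS choose)
    runner = web.AppRunner(app)
    await runner.setup()
    site = web.TCPSite(runner, "127.0.0.1", 0)
    await site.start()
    port = runner.addresses[0][1]

    try:
        # Client side: request the server that was just started
        async with aiohttp.ClientSession() as session:
            async with session.get(f"http://127.0.0.1:{port}/hello",
                                   params={"name": "aiohttp"}) as resp:
                return await resp.text()
    finally:
        # Stop the server and release the port
        await runner.cleanup()


if __name__ == "__main__":
    print(asyncio.run(main()))
```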
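2. Installing aiohttp
pip install aiohttp
asyncio does not need to be installed: it has been part of the Python standard library since Python 3.4. The package named asyncio on PyPI is an old backport for Python 3.3 and is not needed on modern Python.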
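A quick way to confirm the setup, sketched under the assumption that a hypothetical helper named `environment_info` is acceptable for a one-off check: it reports the Python version, the installed aiohttp version, and the file asyncio was loaded from, which should sit in the standard-library directory rather than in site-packages.

```python
import asyncio  # standard library: no pip install needed
import sys

import aiohttp  # installed with: pip install aiohttp


def environment_info() -> dict:
    """Collect version information to confirm the environment is ready."""
    return {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "aiohttp": aiohttp.__version__,
        # Where asyncio was imported from (expected: the stdlib directory)
        "asyncio_path": asyncio.__file__,
    }


if __name__ == "__main__":
    for key, value in environment_info().items():
        print(f"{key}: {value}")
```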
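3. Common aiohttp settings
import asyncio
import aiohttp

# Timeout: give up on a request if the whole operation takes longer than 60 seconds
timeout = aiohttp.ClientTimeout(total=60)
# Concurrency: allow at most 2 requests to run at the same time
semaphore = asyncio.Semaphore(2)

async def fetch(session, url):
    # Wait for a free slot, then send the request with the timeout applied
    async with semaphore:
        async with session.get(url, timeout=timeout) as response:
            return await response.text(), response.status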
4. aiohttp example 1
import asyncio

import aiohttp


# GET request
async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text(), response.status


# GET with query parameters (sent as ?name=germey&age=25)
async def fetch_with_params(session, url):
    params = {'name': 'germey', 'age': 25}
    async with session.get(url, params=params) as response:
        return await response.text(), response.status


# POST with a form payload (application/x-www-form-urlencoded)
async def post_form(session, url):
    payload = {'name': 'germey', 'age': 25}
    async with session.post(url, data=payload) as response:
        return await response.text(), response.status


# POST with a JSON body (application/json)
async def post_json(session, url):
    data = {'name': 'germey', 'age': 25}
    async with session.post(url, json=data) as response:
        return await response.text(), response.status


async def main():
    # Reuse one ClientSession for all requests instead of opening one per request
    async with aiohttp.ClientSession() as session:
        html, status = await fetch(session, 'https://baidu.com')
        print(f'html: {html[:100]}...')
        print(f'status: {status}')


if __name__ == '__main__':
    # asyncio.run() creates the event loop, runs main(), and closes the loop
    asyncio.run(main())
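The request variants above differ only in how the same dict is sent: `params=` goes into the query string, `data=` becomes a form body, and `json=` becomes a JSON body. The sketch below makes that visible with a local server and a hypothetical `/echo` endpoint that reports what it received; the route and the `echo` handler are assumptions for the demo, used instead of baidu.com, which does not echo requests.

```python
import asyncio

import aiohttp
from aiohttp import web


async def echo(request: web.Request) -> web.Response:
    """Hypothetical echo endpoint: report what the client actually sent."""
    body = {
        "method": request.method,
        "query": dict(request.query),
        "content_type": request.content_type,
    }
    if request.content_type == "application/json":
        body["json"] = await request.json()
    elif request.content_type == "application/x-www-form-urlencoded":
        body["form"] = dict(await request.post())
    return web.json_response(body)


async def main():
    app = web.Application()
    app.router.add_route("*", "/echo", echo)  # "*" accepts any HTTP method
    runner = web.AppRunner(app)
    await runner.setup()
    site = web.TCPSite(runner, "127.0.0.1", 0)
    await site.start()
    url = f"http://127.0.0.1:{runner.addresses[0][1]}/echo"
    payload = {"name": "germey", "age": 25}

    try:
        async with aiohttp.ClientSession() as session:
            results = {}
            # params=: encoded into the URL query string
            async with session.get(url, params=payload) as resp:
                results["params"] = await resp.json()
            # data= with a dict: sent as a URL-encoded form body
            async with session.post(url, data=payload) as resp:
                results["data"] = await resp.json()
            # json=: serialized into a JSON request body
            async with session.post(url, json=payload) as resp:
                results["json"] = await resp.json()
            return results
    finally:
        await runner.cleanup()


if __name__ == "__main__":
    for kind, received in asyncio.run(main()).items():
        print(kind, received)
```

Note that query-string and form values arrive at the server as strings ("25"), while the JSON body keeps the integer 25.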