[Python Web Scraping] Learning aiohttp


Background

Because of business needs, I recently had to look into techniques for concurrent scraping. As a beginner, I settled on an aiohttp + multiple-server approach for now and will refine it later. Here I record my notes on learning aiohttp.

1. What aiohttp can do

aiohttp is an asynchronous HTTP networking library built on asyncio; it provides both a server side and a client side.
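The rest of this post only uses the client, but as a rough illustration of the server side, here is a minimal aiohttp.web sketch; the route, port, and response text are just placeholder choices:

from aiohttp import web

async def hello(request):
    # Handler coroutine: return a plain-text response
    return web.Response(text='hello, aiohttp')

app = web.Application()
app.add_routes([web.get('/', hello)])

if __name__ == '__main__':
    # Serves on 0.0.0.0:8080 until interrupted
    web.run_app(app, port=8080)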

2. Installing aiohttp

pip install aiohttp
# asyncio ships with the Python standard library (3.4+), so it does not need to be installed separately
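To confirm the install worked, a quick sanity check in the interpreter:

import aiohttp
print(aiohttp.__version__)   # prints the installed version, confirming the package imports cleanly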

3. Common aiohttp settings

# Timeout setting: total budget (in seconds) for the whole request
timeout = aiohttp.ClientTimeout(total=60)
# Concurrency setting: allow at most 2 requests in flight at once
semaphore = asyncio.Semaphore(2)
async with semaphore:
    async with session.get(url, timeout=timeout) as response:
        return await response.text(), response.status
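The snippet above is only a fragment (session and url come from the surrounding code). A minimal sketch of how the timeout and semaphore plug into a complete script, with example.com standing in for a real target URL:

import asyncio
import aiohttp

async def fetch(session, semaphore, url):
    timeout = aiohttp.ClientTimeout(total=60)      # 60-second budget for the whole request
    async with semaphore:                          # wait for a free slot before requesting
        async with session.get(url, timeout=timeout) as response:
            return await response.text(), response.status

async def main():
    semaphore = asyncio.Semaphore(2)               # at most 2 requests in flight at once
    async with aiohttp.ClientSession() as session:
        html, status = await fetch(session, semaphore, 'https://example.com')  # placeholder URL
        print(status)

if __name__ == '__main__':
    asyncio.run(main())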

4. aiohttp example 1

import aiohttp
import asyncio

# GET request
async def fetch_get(session, url):
    async with session.get(url) as response:
        return await response.text(), response.status
        
# GET with query parameters
async def fetch_get_params(session, url):
    params = {'name': 'germey', 'age': 25}
    async with session.get(url, params=params) as response:
        return await response.text(), response.status
        
# POST with form data (payload)
async def fetch_post_form(session, url):
    payload = {'name': 'germey', 'age': 25}
    async with session.post(url, data=payload) as response:
        return await response.text(), response.status
        
# POST with a JSON body
async def fetch_post_json(session, url):
    data = {'name': 'germey', 'age': 25}
    async with session.post(url, json=data) as response:
        return await response.text(), response.status

async def main():
    async with aiohttp.ClientSession() as session:
        # Each helper above shows one request style; here we just run the plain GET
        html, status = await fetch_get(session, 'https://baidu.com')
        print(f'html: {html[:100]}...')
        print(f'status: {status}')

if __name__ == '__main__':
    # asyncio.run creates and closes the event loop for us (Python 3.7+)
    asyncio.run(main())
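Since the original goal is concurrent scraping, the same example extends naturally with asyncio.gather. This is only a sketch: the URL list and the semaphore limit of 5 are placeholder assumptions to tune for the real target site.

import asyncio
import aiohttp

async def fetch(session, semaphore, url):
    async with semaphore:                          # cap how many requests run at once
        async with session.get(url) as response:
            return await response.text(), response.status

async def main():
    urls = [f'https://example.com/page/{i}' for i in range(10)]   # placeholder URLs
    semaphore = asyncio.Semaphore(5)               # assumed limit; tune for the target site
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*(fetch(session, semaphore, url) for url in urls))
        for url, (_, status) in zip(urls, results):
            print(url, status)

if __name__ == '__main__':
    asyncio.run(main())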

Reference: docs.aiohttp.org/en/stable/