What is asyncio
A library for writing concurrent code, introduced in Python 3.4; it is one way of implementing coroutines.
Differences from multithreading and multiprocessing
- Processes and threads have high resource overhead (CPU context switches, switching between kernel mode and user mode)
- The developer cannot control when a switch happens
- Only a relatively small number of processes/threads can be created, commonly around CPU count * 2
- The GIL means only one thread can execute Python code at a time
asyncio is well suited to I/O-bound work such as handling large numbers of concurrent network requests or database queries.
Coroutine
Concept
A coroutine function is a function that can be started, paused, and resumed.
Creation
A coroutine function is defined with async def.
Execution
A coroutine must be run by an event loop.
Before Python 3.7 you first obtain the event loop with asyncio.get_event_loop() and then run the coroutine with run_until_complete.
import asyncio

async def main():
    await asyncio.sleep(1)
    print('hello')

if __name__ == '__main__':
    # before Python 3.7
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    # since Python 3.7
    # asyncio.run(main())
Since Python 3.7 you can simply execute it with asyncio.run().
event loop
The event loop is the core of every asyncio application: it runs asynchronous tasks and callbacks, performs network I/O operations, and runs subprocesses.
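Besides driving coroutines, the loop can also schedule plain callbacks. A minimal sketch (the tick callback is made up for illustration):
import asyncio

def tick(label):
    # an ordinary (non-async) callback executed by the event loop
    print('tick', label)

async def main():
    loop = asyncio.get_running_loop()
    loop.call_soon(tick, 'now')          # scheduled for the next loop iteration
    loop.call_later(0.5, tick, 'later')  # scheduled to run after ~0.5 s
    await asyncio.sleep(1)               # keep the loop alive so the callbacks fire

asyncio.run(main())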
await
The await keyword tells Python that the coroutine may pause here so that other work can run; it can only be used inside a coroutine function.
Only awaitable objects may follow await:
- coroutine
- task
A task is a wrapper around a coroutine. It is created with asyncio.create_task(coroutine) (before Python 3.7, with [asyncio.ensure_future()](https://docs.python.org/zh-cn/3/library/asyncio-future.html#asyncio.ensure_future)). The event loop schedules work in units of tasks: when a task is waiting for a result, i.e. at an await, the event loop suspends it and switches to another task.
Tasks also offer other operations, such as cancellation and adding/removing done callbacks (cancel(), add_done_callback(), remove_done_callback()); a short callback sketch follows the cancellation example below.
For example, cancelling a task:
import asyncio

async def cancel_me():
    print('cancel_me(): sleep')
    try:
        # Wait for 1 hour
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        print('cancel_me(): cancel sleep')
        raise
    finally:
        print('cancel_me(): after sleep')

async def main():
    print('main(): running')
    # Create a "cancel_me" Task
    task = asyncio.create_task(cancel_me())
    # Wait for 5 seconds
    print('main(): sleep')
    await asyncio.sleep(5)
    print('main(): call cancel')
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        print('main(): cancel_me is cancelled now')

asyncio.run(main())
The output is:
main(): running
main(): sleep
cancel_me(): sleep
main(): call cancel
cancel_me(): cancel sleep
cancel_me(): after sleep
main(): cancel_me is cancelled now
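For the callback methods mentioned above, here is a minimal add_done_callback() sketch (the work coroutine and on_done callback are made-up names for illustration):
import asyncio

def on_done(task):
    # invoked when the task finishes; it receives the finished Task object
    print('callback got:', task.result())

async def work():
    await asyncio.sleep(1)
    return 42

async def main():
    task = asyncio.create_task(work())
    task.add_done_callback(on_done)
    await task

asyncio.run(main())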
- future
Task inherits from Future, so a future is a lower-level awaitable object that represents the eventual result of an asynchronous operation.
Unlike a task, a future does not wrap a coroutine; it is the result of an asynchronous operation, so it has a set_result() method that writes the result directly and marks the future as done.
import asyncio

async def do_async_job(fut):
    await asyncio.sleep(2)
    fut.set_result('Hello future')

async def main():
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    loop.create_task(do_async_job(future))
    # Wait until future has a result
    await future
    print(future.result())

asyncio.run(main())
asyncio.gather() and asyncio.wait()
asyncio.gather() has the signature gather(*aws, return_exceptions=False); you typically hand several coroutines, Tasks, or Futures that do the same kind of work to asyncio.gather() to run them together.
return_exceptions defaults to False: the first raised exception is immediately propagated to the caller awaiting gather() (the other awaitables are not cancelled and keep running). When it is True, exceptions are treated the same as successful results and aggregated into the result list (see the sketch after the gather example below).
Without gather:
import asyncio
import threading
from datetime import datetime

async def do_async_job():
    await asyncio.sleep(2)
    print(datetime.now().isoformat(), 'thread id', threading.current_thread().ident)

async def main():
    await do_async_job()
    await do_async_job()
    await do_async_job()

asyncio.run(main())
The output:
2022-02-27T21:02:40.175619 thread id 4342875520
2022-02-27T21:02:42.176905 thread id 4342875520
2022-02-27T21:02:44.178145 thread id 4342875520
The three do_async_job calls still run one after another.
With gather:
import asyncio
import threading
from datetime import datetime

async def do_async_job():
    await asyncio.sleep(2)
    print(datetime.now().isoformat(), 'thread id', threading.current_thread().ident)

async def main():
    job1 = do_async_job()
    job2 = do_async_job()
    job3 = do_async_job()
    await asyncio.gather(job1, job2, job3)

asyncio.run(main())
The output:
2022-02-27T21:03:15.813865 thread id 4309861760
2022-02-27T21:03:15.814161 thread id 4309861760
2022-02-27T21:03:15.814209 thread id 4309861760
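The three jobs now run concurrently and finish at almost the same time. As for the return_exceptions flag described above, a minimal sketch (ok_job and failing_job are made-up names for illustration):
import asyncio

async def ok_job():
    await asyncio.sleep(1)
    return 'ok'

async def failing_job():
    await asyncio.sleep(1)
    raise ValueError('boom')

async def main():
    # return_exceptions=True: the exception is collected into the result list
    results = await asyncio.gather(ok_job(), failing_job(), return_exceptions=True)
    print(results)  # ['ok', ValueError('boom')]

    # return_exceptions=False (default): the exception propagates to the caller
    try:
        await asyncio.gather(ok_job(), failing_job())
    except ValueError as e:
        print('raised:', e)

asyncio.run(main())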
asyncio.wait can achieve the same thing as gather but is a lower-level primitive; it takes (aws, timeout=None, return_when=ALL_COMPLETED).
For wait, aws should preferably be a list or set of Tasks/Futures.
timeout specifies the maximum number of seconds to wait.
return_when specifies when the function should return; the default is ALL_COMPLETED.
| Constant | Description |
|---|---|
| FIRST_COMPLETED | The function returns when any awaitable finishes or is cancelled. |
| FIRST_EXCEPTION | The function returns when any awaitable finishes by raising an exception. If no exception is raised it is equivalent to ALL_COMPLETED. |
| ALL_COMPLETED | The function returns when all awaitables finish or are cancelled. |
import asyncio
import random

async def coro(tag):
    print(">", tag)
    await asyncio.sleep(random.uniform(0.5, 5))
    print("<", tag)
    return tag

loop = asyncio.get_event_loop()
# wrap the coroutines in Tasks, as recommended above
tasks = [asyncio.ensure_future(coro(i)) for i in range(1, 11)]

print("Get first result:")
finished, unfinished = loop.run_until_complete(
    asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED))
for task in finished:
    print(task.result())
print("unfinished:", len(unfinished))

print("Get more results in 2 seconds:")
finished2, unfinished2 = loop.run_until_complete(
    asyncio.wait(unfinished, timeout=2))
for task in finished2:
    print(task.result())
print("unfinished2:", len(unfinished2))

print("Get all other results:")
finished3, unfinished3 = loop.run_until_complete(asyncio.wait(unfinished2))
for task in finished3:
    print(task.result())

loop.close()
Timeouts and limiting concurrency
Timeout: asyncio.wait_for
import asyncio

async def do_async_job():
    await asyncio.sleep(2)
    print('never print')

async def main():
    try:
        await asyncio.wait_for(do_async_job(), timeout=1)
    except asyncio.TimeoutError:
        print('timeout!')

asyncio.run(main())
Limiting concurrency: asyncio.Semaphore(value=1)
A Semaphore limits how many asynchronous tasks may run a given section of code at the same time.
The recommended way to use a Semaphore is the [async with](https://docs.python.org/zh-cn/3/reference/compound_stmts.html#async-with) statement; a runnable sketch follows the snippet below.
sem = asyncio.Semaphore(10)

# ... later
async with sem:
    # work with shared resource
    ...
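A minimal runnable sketch (the job coroutine and the limit of 3 are illustrative values), letting at most three of ten jobs run at once:
import asyncio
from datetime import datetime

async def job(n, sem):
    async with sem:  # at most 3 jobs hold the semaphore at a time
        print(datetime.now().isoformat(), f'job {n} start')
        await asyncio.sleep(1)

async def main():
    sem = asyncio.Semaphore(3)
    await asyncio.gather(*(job(n, sem) for n in range(10)))

asyncio.run(main())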
Running in threads
The event loop is in charge of running asynchronous work. To get the most out of it, every coroutine should contain an await, or its long-running parts should be turned into something the loop can switch away from, so the event loop gets a chance to run other work. If the loop hits long-running code with no await point where it could switch, it blocks, as in the following example:
import asyncio
import threading
from time import sleep

def hard_work():
    print('thread id:', threading.get_ident())
    sleep(10)

async def do_async_job():
    hard_work()
    await asyncio.sleep(1)
    print('job done!')

async def main():
    task1 = asyncio.create_task(do_async_job())
    task2 = asyncio.create_task(do_async_job())
    task3 = asyncio.create_task(do_async_job())
    await asyncio.gather(task1, task2, task3)

asyncio.run(main())
To stop long-running code from blocking the event loop, Python 3.9 added asyncio.to_thread(), which hands the expensive part off to a thread outside the event loop:
import asyncio
import threading
from time import sleep

def hard_work():
    print('thread id:', threading.get_ident())
    sleep(10)

async def do_async_job():
    await asyncio.to_thread(hard_work)
    await asyncio.sleep(1)
    print('job done!')

async def main():
    task1 = asyncio.create_task(do_async_job())
    task2 = asyncio.create_task(do_async_job())
    task3 = asyncio.create_task(do_async_job())
    await asyncio.gather(task1, task2, task3)

asyncio.run(main())
Python 3.8 and earlier do not have asyncio.to_thread(), but the same effect can be achieved with the event loop method loop.run_in_executor.
loop.run_in_executor works together with concurrent.futures to hand work to other threads or processes. It returns a Future, so you await it to tell the event loop to wait for its result.
The asyncio.to_thread() example above can be rewritten as:
import asyncio
import concurrent.futures
import threading
from time import sleep

def hard_work():
    print('thread id:', threading.get_ident())
    sleep(10)

async def do_async_job(loop, pool):
    await loop.run_in_executor(pool, hard_work)
    await asyncio.sleep(1)
    print('job done!')

async def main():
    loop = asyncio.get_running_loop()
    with concurrent.futures.ThreadPoolExecutor() as pool:
        task1 = asyncio.create_task(do_async_job(loop, pool))
        task2 = asyncio.create_task(do_async_job(loop, pool))
        task3 = asyncio.create_task(do_async_job(loop, pool))
        await asyncio.gather(task1, task2, task3)

asyncio.run(main())
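run_in_executor also accepts a process pool, so the same pattern can offload CPU-bound work to other processes. A minimal sketch (cpu_work and its argument are made up for illustration; the function must be defined at module level so it can be pickled):
import asyncio
import concurrent.futures

def cpu_work(n):
    # CPU-bound work executed in a separate process
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    with concurrent.futures.ProcessPoolExecutor() as pool:
        result = await loop.run_in_executor(pool, cpu_work, 10_000_000)
        print(result)

if __name__ == '__main__':
    asyncio.run(main())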
asyncio demo
The demo below fetches NetEase Cloud Music comments concurrently with aiohttp, limits concurrency with a Semaphore, and stores the results in Redis via aioredis.
import json
import random
import time
import requests
from Crypto.Cipher import AES
import base64
from urllib3 import disable_warnings
from faker import Faker
import asyncio
import aiohttp
from loguru import logger
import aioredis
disable_warnings()
class WyyHack:
    def __init__(self, url):
        self.emj = {
"色": "00e0b",
"流感": "509f6",
"这边": "259df",
"弱": "8642d",
"嘴唇": "bc356",
"亲": "62901",
"开心": "477df",
"呲牙": "22677",
"憨笑": "ec152",
"猫": "b5ff6",
"皱眉": "8ace6",
"幽灵": "15bb7",
"蛋糕": "b7251",
"发怒": "52b3a",
"大哭": "b17a8",
"兔子": "76aea",
"星星": "8a5aa",
"钟情": "76d2e",
"牵手": "41762",
"公鸡": "9ec4e",
"爱意": "e341f",
"禁止": "56135",
"狗": "fccf6",
"亲亲": "95280",
"叉": "104e0",
"礼物": "312ec",
"晕": "bda92",
"呆": "557c9",
"生病": "38701",
"钻石": "14af6",
"拜": "c9d05",
"怒": "c4f7f",
"示爱": "0c368",
"汗": "5b7a4",
"小鸡": "6bee2",
"痛苦": "55932",
"撇嘴": "575cc",
"惶恐": "e10b4",
"口罩": "24d81",
"吐舌": "3cfe4",
"心碎": "875d3",
"生气": "e8204",
"可爱": "7b97d",
"鬼脸": "def52",
"跳舞": "741d5",
"男孩": "46b8e",
"奸笑": "289dc",
"猪": "6935b",
"圈": "3ece0",
"便便": "462db",
"外星": "0a22b",
"圣诞": "8e7",
"流泪": "01000",
"强": "1",
"爱心": "0CoJU",
"女孩": "m6Qyw",
"惊恐": "8W8ju",
"大笑": "d"
        }
        self.md = [
"色",
"流感",
"这边",
"弱",
"嘴唇",
"亲",
"开心",
"呲牙",
"憨笑",
"猫",
"皱眉",
"幽灵",
"蛋糕",
"发怒",
"大哭",
"兔子",
"星星",
"钟情",
"牵手",
"公鸡",
"爱意",
"禁止",
"狗",
"亲亲",
"叉",
"礼物",
"晕",
"呆",
"生病",
"钻石",
"拜",
"怒",
"示爱",
"汗",
"小鸡",
"痛苦",
"撇嘴",
"惶恐",
"口罩",
"吐舌",
"心碎",
"生气",
"可爱",
"鬼脸",
"跳舞",
"男孩",
"奸笑",
"猪",
"圈",
"便便",
"外星",
"圣诞"
        ]
        self.p1 = self.mapping(["流泪", "强"])
        self.p2 = self.mapping(self.md)
        self.p3 = self.mapping(["爱心", "女孩", "惊恐", "大笑"])
        self.vi = b"0102030405060708"
        self.base_str = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
        self.session = requests.session()
        self.music_id = url.split('=')[1]
        self.api_url = "https://music.163.com/weapi/comment/resource/comments/get?csrf_token="
        self.headers = {
            "User-Agent": str(Faker().user_agent()),
            "Referer": url,
            "Content-Type": "application/x-www-form-urlencoded",
            "Origin": "http://music.163.com",
            "Host": "music.163.com"
        }
    def mapping(self, list_):
        # look up and join the mapped values for the given keys
        result = [self.emj[item] for item in list_]
        return ''.join(result)

    def gen_random_str(self, nums):
        # generate a random string of nums characters (16 in practice)
        c = ""
        for d in range(nums):
            e = int(random.random() * len(self.base_str))
            c += self.base_str[e]
        return c
    def rsa(self, a, b, c):
        # RSA encryption
        num = pow(int(a[::-1].encode().hex(), 16), int(b, 16), int(c, 16))
        result = format(num, 'x')
        return result

    def aes(self, text, key, iv):
        # AES-CBC encryption with PKCS#7-style padding
        pad = 16 - len(text) % 16
        text = (text + pad * chr(pad)).encode("utf-8")
        encryptor = AES.new(key.encode('utf-8'), AES.MODE_CBC, iv)
        encrypt_text = encryptor.encrypt(text)
        encrypt_text = base64.b64encode(encrypt_text)
        return encrypt_text.decode('utf-8')

    def get_param(self, p4):
        # build the encrypted request parameters
        rand_str = self.gen_random_str(16)
        encText = self.aes(self.aes(p4, self.p3, self.vi), rand_str, self.vi)
        encSecKey = self.rsa(rand_str, self.p1, self.p2)
        return {"params": encText, "encSecKey": encSecKey}
    async def save2redis(self, redis_conn, key, value):
        await redis_conn.hset(f'music_comment:{self.music_id}', key=key, value=value)

    async def async_start(self, page, session, semaphore, redis_conn):
        async with semaphore:
            # logger.info(f'======================== fetching comments, page {page + 1} ========================')
            p4 = json.dumps(
                {"csrf_token": "990d60a1fa116061d77c53a71445b821", "cursor": int(time.time() * 1000),
                 "offset": str(page * 20), "orderType": "2",
                 "pageNo": str(page + 1), "pageSize": "20", "rid": f"R_SO_4_{self.music_id}",
                 "threadId": f"R_SO_4_{self.music_id}"})
            param = self.get_param(p4)
            try:
                async with session.post(self.api_url, headers=self.headers, data=param, ssl=False) as response:
                    data = await response.json()
                    mapping = []
                    for comment in data.get("data", {}).get("comments", {}):
                        nickname = comment.get("user", {}).get("nickname", "")
                        content = comment.get("content", "")
                        logger.info(f'user: {nickname}\tcomment: {content}')
                        mapping.append([nickname, content])
                        # await redis_conn.hset(f'music_comment:{self.music_id}', key=nickname, value=content)
                    tasks = [asyncio.create_task(self.save2redis(redis_conn, item[0], item[1])) for item in mapping]
                    await asyncio.wait(tasks)
            except Exception as e:
                logger.error(e)
    async def start(self):
        redis_conn = await aioredis.from_url('redis://localhost:6379/0')
        semaphore = asyncio.Semaphore(10)
        async with aiohttp.ClientSession() as session:
            tasks = [asyncio.create_task(self.async_start(page, session, semaphore, redis_conn)) for page in range(10)]
            await asyncio.wait(tasks)

if __name__ == '__main__':
    url = "https://music.163.com/#/song?id=1359645265"
    wyy = WyyHack(url)
    asyncio.run(wyy.start())