Python Asynchronous Programming with asyncio

What is asyncio

asyncio is a library for writing concurrent code, introduced in Python 3.4. It is one way to implement coroutines.

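How it differs from multithreading and multiprocessing

  1. Processes and threads are resource-heavy (switching between them involves CPU context switches and transitions between kernel mode and user mode).
  2. The developer cannot control when a switch happens.
  3. Only a relatively small number of processes or threads can usefully be created, typically around CPU count * 2.
  4. The GIL allows only one thread to execute Python code at a time.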

asyncio is a good fit for I/O-bound work, such as handling large numbers of network requests or database queries at the same time.
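
As a rough illustration of the I/O-bound case, the sketch below runs 1000 simulated requests concurrently on a single thread. The fake_request function is a hypothetical stand-in that only sleeps; a real program would call a network or database client there.

```python
import asyncio
import time

async def fake_request(i):
    # Hypothetical stand-in for a network or database call: waiting costs no CPU
    await asyncio.sleep(0.1)
    return i

async def main():
    start = time.perf_counter()
    # All 1000 waits overlap on one thread instead of running back to back
    results = await asyncio.gather(*(fake_request(i) for i in range(1000)))
    elapsed = time.perf_counter() - start
    return len(results), elapsed

if __name__ == "__main__":
    count, elapsed = asyncio.run(main())
    print(f"{count} requests finished in {elapsed:.2f}s")
```

Run one after another, the same waits would take about 100 seconds; run concurrently, the total should stay close to the length of a single wait.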

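Coroutines

Concept

A coroutine function is a function that can be started, paused, and resumed.

Creation

Coroutine functions are defined with async def.

Execution

A coroutine has to be run by an event loop.

Before Python 3.7, you first had to get the event loop with asyncio.get_event_loop() and then run the coroutine with run_until_complete.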

import asyncio

async def main():
    await asyncio.sleep(1)
    print('hello')

if __name__ == '__main__':
    # Before Python 3.7:
    # loop = asyncio.get_event_loop()
    # loop.run_until_complete(main())

    # Python 3.7 and later:
    asyncio.run(main())

Since Python 3.7, you can run it directly with asyncio.run().
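
One detail that is easy to miss: calling a coroutine function does not run its body, it only returns a coroutine object. The following minimal sketch shows this; greet and demo are hypothetical names used only for illustration.

```python
import asyncio
import inspect

async def greet(name):
    # Hypothetical coroutine function used only for this illustration
    return f"hello {name}"

def demo():
    coro = greet("asyncio")
    # Calling the function only builds a coroutine object; the body has not run yet
    is_coroutine = inspect.iscoroutine(coro)
    # The event loop started by asyncio.run() is what actually executes it
    result = asyncio.run(coro)
    return is_coroutine, result

if __name__ == "__main__":
    print(demo())  # (True, 'hello asyncio')
```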

event loop

The event loop is the core of every asyncio application. It runs asynchronous tasks and callbacks, performs network I/O, and runs subprocesses.
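
To make the "callbacks" part concrete, here is a small sketch that schedules two plain functions on the running loop with call_soon and call_later. The order list is just an illustrative way to record when each callback fires.

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    order = []
    # Callbacks are queued on the loop; neither runs at the moment it is scheduled
    loop.call_later(0.05, order.append, "later")
    loop.call_soon(order.append, "soon")
    order.append("scheduled")
    # Yield to the loop long enough for both callbacks to fire
    await asyncio.sleep(0.1)
    return order

if __name__ == "__main__":
    print(asyncio.run(main()))  # ['scheduled', 'soon', 'later']
```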

await

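The await keyword tells Python that the coroutine can be paused at this point so that other work can run. It can only be used inside a coroutine function.

await can only be followed by an awaitable object: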
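
The pause-and-switch behaviour can be observed with a small sketch. The worker coroutine and the shared log list are hypothetical; asyncio.sleep(0) is used only as the simplest way to yield control once.

```python
import asyncio

async def worker(name, log):
    log.append(f"{name} start")
    # Execution pauses here, and the event loop is free to run the other worker
    await asyncio.sleep(0)
    log.append(f"{name} end")

async def main():
    log = []
    await asyncio.gather(worker("A", log), worker("B", log))
    return log

if __name__ == "__main__":
    print(asyncio.run(main()))  # ['A start', 'B start', 'A end', 'B end']
```

Worker B starts before worker A finishes, because A gave up control at its await.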

  1. coroutine

  2. task

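    A task is a wrapper around a coroutine. Create one with asyncio.create_task(coroutine); before Python 3.7, use [asyncio.ensure_future()](<https://docs.python.org/zh-cn/3/library/asyncio-future.html#asyncio.ensure_future>).

    In the event loop, work is scheduled in units of tasks. When a task is waiting for a result, that is, at an await, the event loop suspends it and switches to another task.

    Tasks also provide other operations, such as cancelling the task and adding or removing callbacks (cancel(), add_done_callback(), remove_done_callback()).

    For example, cancelling a task: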

    import asyncio
    
    async def cancel_me():
        print('cancel_me(): sleep')
        try:
            # Wait for 1 hour
            await asyncio.sleep(3600)
        except asyncio.CancelledError:
            print('cancel_me(): cancel sleep')
            raise
        finally:
            print('cancel_me(): after sleep')
    
    async def main():
        print('main(): running')
        # Create a "cancel_me" Task
        task = asyncio.create_task(cancel_me())
    
        # Wait for 5 seconds
        print('main(): sleep')
        await asyncio.sleep(5)
    
        print('main(): call cancel')
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            print('main(): cancel_me is cancelled now')
    
    asyncio.run(main())
    

    The output of the above is:

    main(): running
    main(): sleep
    cancel_me(): sleep
    main(): call cancel
    cancel_me(): cancel sleep
    cancel_me(): after sleep
    main(): cancel_me is cancelled now
    
  3. future

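    Task inherits from Future, so a future is a lower-level awaitable object that represents the eventual result of an asynchronous operation.

    Unlike a task, a future does not wrap a coroutine; it only holds the result of an asynchronous operation. It therefore has a set_result() method that writes the result directly and marks the future as done.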

    import asyncio
    
    async def do_async_job(fut):
        await asyncio.sleep(2)
        fut.set_result('Hello future')
    
    async def main():
        loop = asyncio.get_running_loop()
    
        future = loop.create_future()
        loop.create_task(do_async_job(future))
    
        # Wait until future has a result
        await future
    
        print(future.result())
    
    asyncio.run(main())
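
The task callbacks mentioned above (add_done_callback()) can be sketched in the same style. The compute coroutine and the seen list are hypothetical names used for illustration.

```python
import asyncio

async def compute():
    await asyncio.sleep(0.01)
    return 42

async def main():
    seen = []
    task = asyncio.create_task(compute())
    # The callback receives the finished task as its only argument
    task.add_done_callback(lambda t: seen.append(t.result()))
    await task
    # Done callbacks are scheduled on the loop, so yield once more to be safe
    await asyncio.sleep(0)
    return seen

if __name__ == "__main__":
    print(asyncio.run(main()))  # [42]
```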
    

asyncio.gather() and asyncio.wait()

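asyncio.gather() takes (*aws, return_exceptions=False). It is typically used to hand a group of coroutines, Tasks, or Futures that do the same kind of work to the event loop together.

return_exceptions defaults to False: the first exception raised is propagated immediately to the code awaiting gather(), while the other awaitables are not cancelled and keep running. When it is True, exceptions are treated like successful results and collected into the result list.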
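
A minimal sketch of return_exceptions=True follows. The ok and boom coroutines are hypothetical; boom always fails so that the exception appears in the result list.

```python
import asyncio

async def ok(n):
    await asyncio.sleep(0.01)
    return n

async def boom():
    await asyncio.sleep(0.01)
    raise ValueError("boom")

async def main():
    # The failure is returned in place instead of being raised at the await
    return await asyncio.gather(ok(1), boom(), ok(3), return_exceptions=True)

if __name__ == "__main__":
    for item in asyncio.run(main()):
        print(repr(item))
```

The results keep the order of the arguments, so the caller can tell which awaitable failed by checking each item with isinstance(item, Exception).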

Code without gather:

import asyncio
import threading
from datetime import datetime

async def do_async_job():
    await asyncio.sleep(2)
    print(datetime.now().isoformat(), 'thread id', threading.current_thread().ident)

async def main():
    await do_async_job()
    await do_async_job()
    await do_async_job()

asyncio.run(main())

Output:

2022-02-27T21:02:40.175619 thread id 4342875520
2022-02-27T21:02:42.176905 thread id 4342875520
2022-02-27T21:02:44.178145 thread id 4342875520

The three do_async_job calls still run one after another.

Using gather:

import asyncio
import threading
from datetime import datetime

async def do_async_job():
    await asyncio.sleep(2)
    print(datetime.now().isoformat(), 'thread id', threading.current_thread().ident)

async def main():
    job1 = do_async_job()
    job2 = do_async_job()
    job3 = do_async_job()
    await asyncio.gather(job1, job2, job3)

asyncio.run(main())

Output:

2022-02-27T21:03:15.813865 thread id 4309861760
2022-02-27T21:03:15.814161 thread id 4309861760
2022-02-27T21:03:15.814209 thread id 4309861760

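asyncio.wait can also do what gather does, at a lower level. Its signature is (aws, *, timeout=None, return_when=ALL_COMPLETED).

For aws, pass a list or set of tasks/futures. Passing bare coroutine objects was deprecated in Python 3.8 and is rejected since Python 3.11.

timeout sets the maximum number of seconds to wait. wait() does not raise on timeout; awaitables that are still pending are returned in the second set.

return_when specifies when the function returns. The default is ALL_COMPLETED.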

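| Constant | Description |
| --- | --- |
| FIRST_COMPLETED | The function returns when any awaitable finishes or is cancelled. |
| FIRST_EXCEPTION | The function returns when any awaitable finishes by raising an exception. If no awaitable raises, it is equivalent to ALL_COMPLETED. |
| ALL_COMPLETED | The function returns when all awaitables finish or are cancelled. |
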
import asyncio
import random

async def coro(tag):
    print(">", tag)
    await asyncio.sleep(random.uniform(0.5, 5))
    print("<", tag)
    return tag

async def main():
    # asyncio.wait() needs Tasks or Futures; bare coroutines are rejected since Python 3.11
    tasks = [asyncio.create_task(coro(i)) for i in range(1, 11)]

    print("Get first result:")
    finished, unfinished = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in finished:
        print(task.result())
    print("unfinished:", len(unfinished))

    print("Get more results in 2 seconds:")
    finished2, unfinished2 = await asyncio.wait(unfinished, timeout=2)
    for task in finished2:
        print(task.result())
    print("unfinished2:", len(unfinished2))

    print("Get all other results:")
    # asyncio.wait() raises ValueError on an empty set, so only wait if something is left
    if unfinished2:
        finished3, _ = await asyncio.wait(unfinished2)
        for task in finished3:
            print(task.result())

asyncio.run(main())

Timeouts and limiting concurrency

Timeouts: asyncio.wait_for

import asyncio

async def do_async_job():
    await asyncio.sleep(2)
    print('never print')

async def main():
    try:
        await asyncio.wait_for(do_async_job(), timeout=1)
    except asyncio.TimeoutError:
        print('timeout!')

asyncio.run(main())
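
On timeout, wait_for also cancels the awaitable it was waiting on. The sketch below makes that visible by recording events in a list; slow and events are hypothetical names, and the timings are shortened so the example finishes quickly.

```python
import asyncio

async def slow(events):
    try:
        await asyncio.sleep(1)
    except asyncio.CancelledError:
        # wait_for cancels the inner work when the timeout expires
        events.append("cancelled")
        raise

async def main():
    events = []
    try:
        await asyncio.wait_for(slow(events), timeout=0.05)
    except asyncio.TimeoutError:
        events.append("timeout")
    return events

if __name__ == "__main__":
    print(asyncio.run(main()))  # ['cancelled', 'timeout']
```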

Limiting concurrency: asyncio.Semaphore(value=1)

A Semaphore limits how many asynchronous tasks can run at the same time.

The recommended way to use a Semaphore is with the [async with](<https://docs.python.org/zh-cn/3/reference/compound_stmts.html#async-with>) statement.

sem = asyncio.Semaphore(10)

# ... later
async with sem:
    # work with shared resource
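
A runnable sketch of the same idea, assuming a limit of 3 and ten jobs. The job coroutine and the state dictionary are hypothetical; they only exist to measure how many jobs are inside the semaphore at once.

```python
import asyncio

async def job(sem, state):
    async with sem:
        # Only `limit` jobs can be inside this block at the same time
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
        await asyncio.sleep(0.01)
        state["running"] -= 1

async def main(limit=3, jobs=10):
    sem = asyncio.Semaphore(limit)
    state = {"running": 0, "peak": 0}
    await asyncio.gather(*(job(sem, state) for _ in range(jobs)))
    return state["peak"]

if __name__ == "__main__":
    print(asyncio.run(main()))  # 3
```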

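Running in threads

Because the event loop is responsible for running asynchronous work, getting the most out of it requires that every coroutine either contains an await or has its long-running parts turned into coroutines, so that the event loop gets a chance to switch to other work. Otherwise, when the event loop hits long-running code with no await that would let it switch, the event loop is blocked. For example: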

import asyncio
import threading

from time import sleep

def hard_work():
    print('thread id:', threading.get_ident())
    sleep(10)

async def do_async_job():
    hard_work()
    await asyncio.sleep(1)
    print('job done!')

async def main():
    task1 = asyncio.create_task(do_async_job())
    task2 = asyncio.create_task(do_async_job())
    task3 = asyncio.create_task(do_async_job())
    await asyncio.gather(task1, task2, task3)

asyncio.run(main())

To keep long-running code from blocking the event loop, Python 3.9 added asyncio.to_thread(), which hands the slow part to a separate thread outside the event loop:

import asyncio
import threading

from time import sleep

def hard_work():
    print('thread id:', threading.get_ident())
    sleep(10)

async def do_async_job():
    await asyncio.to_thread(hard_work)
    await asyncio.sleep(1)
    print('job done!')

async def main():
    task1 = asyncio.create_task(do_async_job())
    task2 = asyncio.create_task(do_async_job())
    task3 = asyncio.create_task(do_async_job())
    await asyncio.gather(task1, task2, task3)

asyncio.run(main())

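Python 3.8 and earlier do not have asyncio.to_thread(), but the same effect can be achieved with the event loop's loop.run_in_executor method.

loop.run_in_executor works with concurrent.futures to hand work off to other threads or processes. It returns a Future object, so it must be awaited to tell the event loop to wait for the result.

The asyncio.to_thread() example above can be rewritten as: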

import asyncio
import concurrent.futures
import threading

from time import sleep

def hard_work():
    print('thread id:', threading.get_ident())
    sleep(10)

async def do_async_job(loop, pool):
    await loop.run_in_executor(pool, hard_work)
    await asyncio.sleep(1)
    print('job done!')

async def main():
    loop = asyncio.get_running_loop()
    with concurrent.futures.ThreadPoolExecutor() as pool:
        task1 = asyncio.create_task(do_async_job(loop, pool))
        task2 = asyncio.create_task(do_async_job(loop, pool))
        task3 = asyncio.create_task(do_async_job(loop, pool))
        await asyncio.gather(task1, task2, task3)

asyncio.run(main())
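
loop.run_in_executor only forwards positional arguments to the function. One way around this, sketched below, is to bind keyword arguments with functools.partial. The blocking_io function and its label parameter are hypothetical, and passing None as the executor selects the loop's default thread pool.

```python
import asyncio
import functools
import time

def blocking_io(delay, *, label="job"):
    # Hypothetical blocking function standing in for slow synchronous work
    time.sleep(delay)
    return f"{label} done"

async def main():
    loop = asyncio.get_running_loop()
    # Keyword arguments are bound up front because run_in_executor takes only *args
    call = functools.partial(blocking_io, 0.05, label="report")
    # None means "use the loop's default ThreadPoolExecutor"
    return await loop.run_in_executor(None, call)

if __name__ == "__main__":
    print(asyncio.run(main()))  # report done
```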

asyncio demo

import json
import random
import time
import requests
from Crypto.Cipher import AES
import base64
from urllib3 import disable_warnings
from faker import Faker
import asyncio
import aiohttp
from loguru import logger
import aioredis

disable_warnings()

class WyyHack:
    def __init__(self, url):
        self.emj = {
            "色": "00e0b",
            "流感": "509f6",
            "这边": "259df",
            "弱": "8642d",
            "嘴唇": "bc356",
            "亲": "62901",
            "开心": "477df",
            "呲牙": "22677",
            "憨笑": "ec152",
            "猫": "b5ff6",
            "皱眉": "8ace6",
            "幽灵": "15bb7",
            "蛋糕": "b7251",
            "发怒": "52b3a",
            "大哭": "b17a8",
            "兔子": "76aea",
            "星星": "8a5aa",
            "钟情": "76d2e",
            "牵手": "41762",
            "公鸡": "9ec4e",
            "爱意": "e341f",
            "禁止": "56135",
            "狗": "fccf6",
            "亲亲": "95280",
            "叉": "104e0",
            "礼物": "312ec",
            "晕": "bda92",
            "呆": "557c9",
            "生病": "38701",
            "钻石": "14af6",
            "拜": "c9d05",
            "怒": "c4f7f",
            "示爱": "0c368",
            "汗": "5b7a4",
            "小鸡": "6bee2",
            "痛苦": "55932",
            "撇嘴": "575cc",
            "惶恐": "e10b4",
            "口罩": "24d81",
            "吐舌": "3cfe4",
            "心碎": "875d3",
            "生气": "e8204",
            "可爱": "7b97d",
            "鬼脸": "def52",
            "跳舞": "741d5",
            "男孩": "46b8e",
            "奸笑": "289dc",
            "猪": "6935b",
            "圈": "3ece0",
            "便便": "462db",
            "外星": "0a22b",
            "圣诞": "8e7",
            "流泪": "01000",
            "强": "1",
            "爱心": "0CoJU",
            "女孩": "m6Qyw",
            "惊恐": "8W8ju",
            "大笑": "d"
        }
        self.md = [
            "色",
            "流感",
            "这边",
            "弱",
            "嘴唇",
            "亲",
            "开心",
            "呲牙",
            "憨笑",
            "猫",
            "皱眉",
            "幽灵",
            "蛋糕",
            "发怒",
            "大哭",
            "兔子",
            "星星",
            "钟情",
            "牵手",
            "公鸡",
            "爱意",
            "禁止",
            "狗",
            "亲亲",
            "叉",
            "礼物",
            "晕",
            "呆",
            "生病",
            "钻石",
            "拜",
            "怒",
            "示爱",
            "汗",
            "小鸡",
            "痛苦",
            "撇嘴",
            "惶恐",
            "口罩",
            "吐舌",
            "心碎",
            "生气",
            "可爱",
            "鬼脸",
            "跳舞",
            "男孩",
            "奸笑",
            "猪",
            "圈",
            "便便",
            "外星",
            "圣诞"
        ]
        self.p1 = self.mapping(["流泪", "强"])
        self.p2 = self.mapping(self.md)
        self.p3 = self.mapping(["爱心", "女孩", "惊恐", "大笑"])
        self.vi = b"0102030405060708"
        self.base_str = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
        self.session = requests.session()
        self.music_id = url.split('=')[1]
        self.api_url = "https://music.163.com/weapi/comment/resource/comments/get?csrf_token="
        self.headers = {
            "User-Agent": str(Faker().user_agent()),
            "Referer": url,
            "Content-Type": "application/x-www-form-urlencoded",
            "Origin": "http://music.163.com",
            "Host": "music.163.com"
        }

    def mapping(self, list_):
        # Look up the mapped value for each name and join them
        result = [self.emj[item] for item in list_]
        return ''.join(result)

    def gen_random_str(self, nums):
        # Generate a random string of nums characters
        c = ""
        for d in range(nums):
            e = int(random.random() * len(self.base_str))
            c += self.base_str[e]
        return c

    def rsa(self, a, b, c):
        # RSA encryption
        num = pow(int(a[::-1].encode().hex(), 16), int(b, 16), int(c, 16))
        result = format(num, 'x')
        return result

    def aes(self, text, key, iv):
        # AES encryption
        pad = 16 - len(text) % 16
        text = (text + pad * chr(pad)).encode("utf-8")
        encryptor = AES.new(key.encode('utf-8'), AES.MODE_CBC, iv)
        encrypt_text = encryptor.encrypt(text)
        encrypt_text = base64.b64encode(encrypt_text)
        return encrypt_text.decode('utf-8')

    def get_param(self, p4):
        # Build the encrypted request parameters
        rand_str = self.gen_random_str(16)
        encText = self.aes(self.aes(p4, self.p3, self.vi), rand_str, self.vi)
        encSecKey = self.rsa(rand_str, self.p1, self.p2)
        return {"params": encText, "encSecKey": encSecKey}

    async def save2redis(self, redis_conn, key, value):
        await redis_conn.hset(f'music_comment:{self.music_id}', key=key, value=value)

    async def async_start(self, page, session, semaphore, redis_conn):
        async with semaphore:
            # logger.info(f'========================Fetching comments page {page + 1}========================')
            p4 = json.dumps(
                {"csrf_token": "990d60a1fa116061d77c53a71445b821", "cursor": int(time.time() * 1000),
                 "offset": str(page * 20), "orderType": "2",
                 "pageNo": str(page + 1), "pageSize": "20", "rid": f"R_SO_4_{self.music_id}",
                 "threadId": f"R_SO_4_{self.music_id}"})
            param = self.get_param(p4)
            try:
                async with session.post(self.api_url, headers=self.headers, data=param, ssl=False) as response:
                    data = await response.json()
                    mapping = []
                    for comment in data.get("data", {}).get("comments", {}):
                        nickname = comment.get("user", {}).get("nickname", "")
                        content = comment.get("content", "")
                        logger.info(f'user: {nickname}\tcomment: {content}')
                        mapping.append([nickname, content])
                        # await redis_conn.hset(f'music_comment:{self.music_id}', key=nickname, value=content)
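                    tasks = [asyncio.create_task(self.save2redis(redis_conn, item[0], item[1])) for item in mapping]
                    # asyncio.wait() raises ValueError on an empty collection, so skip pages without comments
                    if tasks:
                        await asyncio.wait(tasks)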
            except Exception as e:
                logger.error(e)

    async def start(self):
        redis_conn = await aioredis.from_url('redis://localhost:6379/0')
        semaphore = asyncio.Semaphore(10)
        async with aiohttp.ClientSession() as session:
            tasks = [asyncio.create_task(self.async_start(page, session, semaphore, redis_conn)) for page in range(10)]
            await asyncio.wait(tasks)

if __name__ == '__main__':
    url = "https://music.163.com/#/song?id=1359645265"
    wyy = WyyHack(url)
    asyncio.run(wyy.start())