Python Async Programming in Practice: Making Your Code Run Faster Than AI


This is a technical article from the 船长Talk WeChat account, cross-posted to CSDN and Juejin.

Preface

Recently I was working on a data-collection project. With the traditional synchronous approach, it had been running for 3 hours and still wasn't done.

After switching to async coroutines, the same task took only 18 minutes.

Today let's talk about Python async programming, so you too can experience what "flying" feels like.

1. What Is Async Programming?

First, a quick primer on the concepts.

Synchronous programming: like queueing for tickets. One person at a time, and everyone behind has to wait until the person in front is done.

Asynchronous programming: like ordering takeout. Place the order, go do other things, and pick it up when it arrives.

Synchronous code: 1s + 2s + 3s = 6s
Asynchronous code: max(1s, 2s, 3s) = 3s

That's the appeal of async: IO-bound tasks (network requests, file reads and writes) run concurrently.
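You can watch that arithmetic happen with a toy example, using `asyncio.sleep` to stand in for real IO (a minimal sketch, not from a real project):

```python
import asyncio
import time

async def wait(seconds: float) -> float:
    # asyncio.sleep yields to the event loop, simulating an IO wait
    await asyncio.sleep(seconds)
    return seconds

async def main() -> float:
    start = time.perf_counter()
    # The three waits overlap, so the total is about max(0.1, 0.2, 0.3)
    await asyncio.gather(wait(0.1), wait(0.2), wait(0.3))
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(f"Elapsed: {elapsed:.2f}s")  # about 0.3s, not 0.6s
```

Run it and the elapsed time tracks the slowest wait, not the sum.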

2. Sync vs. Async Code Compared

Synchronous version

import requests
import time

def fetch_data(url):
    """Synchronous request"""
    response = requests.get(url)
    return response.json()

def main():
    urls = [
        "https://api.example.com/data1",
        "https://api.example.com/data2",
        "https://api.example.com/data3",
    ]
    
    start = time.time()
    results = []
    for url in urls:
        result = fetch_data(url)  # one at a time
        results.append(result)
    
    print(f"Elapsed: {time.time() - start:.2f}s")
    return results

# If each request takes 2 seconds, 3 requests = 6 seconds

Asynchronous version

import asyncio
import aiohttp  # async HTTP client
import time

async def fetch_data(session, url):
    """Asynchronous request"""
    async with session.get(url) as response:
        return await response.json()

async def main():
    urls = [
        "https://api.example.com/data1",
        "https://api.example.com/data2",
        "https://api.example.com/data3",
    ]
    
    start = time.time()
    
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_data(session, url) for url in urls]
        results = await asyncio.gather(*tasks)  # run concurrently
    
    print(f"Elapsed: {time.time() - start:.2f}s")
    return results

# 3 requests in flight at once = 2 seconds (the slowest one)

# How to run it
asyncio.run(main())

3. Hands-On: Batch Data Collection

import asyncio
import aiohttp
import pandas as pd
from typing import List, Dict

class AsyncDataCollector:
    """Async data collector"""
    
    def __init__(self, max_concurrent: int = 10):
        self.max_concurrent = max_concurrent  # cap on concurrency
        self.semaphore = None
    
    async def fetch_stock_price(self, session, stock_code: str) -> Dict:
        """Fetch the price of a single stock"""
        url = f"https://api.example.com/stock/{stock_code}"
        try:
            async with self.semaphore:  # throttle
                timeout = aiohttp.ClientTimeout(total=10)
                async with session.get(url, timeout=timeout) as response:
                    if response.status == 200:
                        data = await response.json()
                        return {
                            "code": stock_code,
                            "price": data.get("price", 0),
                            "change": data.get("change", 0)
                        }
        except Exception as e:
            print(f"Failed to fetch {stock_code}: {e}")
        return {"code": stock_code, "price": None, "change": None}
    
    async def collect_all(self, stock_codes: List[str]) -> pd.DataFrame:
        """Collect in batches"""
        self.semaphore = asyncio.Semaphore(self.max_concurrent)
        
        async with aiohttp.ClientSession() as session:
            tasks = [
                self.fetch_stock_price(session, code) 
                for code in stock_codes
            ]
            results = await asyncio.gather(*tasks)
        
        # Convert to a DataFrame
        df = pd.DataFrame([r for r in results if r["price"] is not None])
        return df

# Usage example
async def main():
    collector = AsyncDataCollector(max_concurrent=20)
    stock_codes = [f"00{i:04d}.SZ" for i in range(100)]  # 100 stocks
    df = await collector.collect_all(stock_codes)
    print(f"Successfully collected data for {len(df)} stocks")

asyncio.run(main())
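A note on error handling: fetch_stock_price above catches exceptions itself. An alternative (a small sketch with a made-up `flaky` coroutine, not from the article) is `asyncio.gather`'s `return_exceptions` flag, which hands failures back as values instead of letting one bad task take down the whole batch:

```python
import asyncio

async def flaky(n: int) -> int:
    # Simulated fetch: odd inputs fail, standing in for a failed HTTP request
    if n % 2:
        raise ValueError(f"bad item {n}")
    await asyncio.sleep(0)
    return n * 10

async def main() -> list:
    tasks = [flaky(n) for n in range(4)]
    # return_exceptions=True keeps one failure from cancelling the whole batch;
    # exceptions come back as result objects you can filter out afterwards
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in results if not isinstance(r, Exception)]

print(asyncio.run(main()))  # [0, 20]
```

This keeps success/failure bookkeeping out of the fetch function itself.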

4. Performance Comparison

My real-world test (collecting from 100 API endpoints):

| Approach             | Elapsed | Speedup |
| -------------------- | ------- | ------- |
| Synchronous requests | 187s    | 1x      |
| Async aiohttp        | 9s      | ~20x    |

A 20x speedup. Not bad, right?

5. Common Pitfalls

Pitfall 1: forgetting await

# ❌ Wrong: this just creates a coroutine object; nothing actually runs
result = fetch_data(url)

# ✅ Correct
result = await fetch_data(url)
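To see what really happens when the await is missing, here is a tiny runnable check (`fetch` is a stand-in coroutine, not the fetch_data from earlier):

```python
import asyncio

async def fetch() -> int:
    await asyncio.sleep(0)
    return 42

async def main() -> int:
    missing = fetch()              # no await: just builds a coroutine object
    print(type(missing).__name__)  # prints: coroutine
    return await missing           # awaiting it is what actually runs the body

print(asyncio.run(main()))  # prints: 42
```

If you never await the object, Python also emits a "coroutine was never awaited" warning.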

Pitfall 2: too much concurrency gets you rate-limited

# ❌ Firing 1000 requests at once may get your IP banned
tasks = [fetch_data(url) for url in urls]

# ✅ Throttle with a Semaphore
semaphore = asyncio.Semaphore(10)  # at most 10 concurrent
async def bounded_fetch(url):
    async with semaphore:
        return await fetch_data(url)

Pitfall 3: mixing in synchronous libraries

# ❌ time.sleep() blocks the whole event loop
await asyncio.sleep(2)  # ✅ use this instead

# ❌ requests.get() is synchronous
# ✅ use aiohttp or httpx

6. When (Not) to Use It

Good fits for async:

  • Batch HTTP requests (crawlers, API calls)
  • Batch file reads and writes
  • Batch database operations
  • Concurrent IO tasks

Poor fits for async:

  • CPU-bound tasks (image processing, cryptographic computation) → use multiprocessing
  • A single one-off request → not worth the trouble
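For the CPU-bound case, the usual bridge (a sketch using only the standard library) is to push the heavy function into a process pool with `run_in_executor`, so the event loop stays free for IO:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n: int) -> int:
    # Pure computation: no await points, so running it inline would block the loop
    return sum(i * i for i in range(n))

async def main() -> list:
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # Each call runs in a separate worker process, off the event loop
        futures = [loop.run_in_executor(pool, cpu_heavy, n) for n in (10_000, 20_000)]
        return await asyncio.gather(*futures)

if __name__ == "__main__":
    print(asyncio.run(main()))
```

The `__main__` guard matters here: on platforms that spawn worker processes, the module gets re-imported in each worker.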

Summary

# Async programming in one sentence

# Sync: queue up, one at a time
for item in items:
    result = sync_fetch(item)

# Async: order takeout; place all the orders together, take each as it arrives
tasks = [async_fetch(item) for item in items]
results = await asyncio.gather(*tasks)

Remember this formula: IO-bound tasks → async → a roughly 20x speedup.


船长Talk - hands-on tech content, no filler
