Abstract: Python offers three main approaches to concurrency: threads, processes, and async IO. The GIL makes choosing between them subtle. This article uses real benchmark numbers to help you pick the right one for each scenario.
The GIL: The Topic You Can't Avoid

Python's Global Interpreter Lock (GIL) guarantees that only one thread executes Python bytecode at a time. The consequences:

- CPU-bound tasks: multithreading gives essentially no speedup
- IO-bound tasks: multithreading still works (the GIL is released while waiting on IO)
```python
import threading
import time

# CPU-bound: two threads are actually slower than one
def cpu_task():
    total = 0
    for i in range(10_000_000):
        total += i

# Single thread
start = time.time()
cpu_task()
cpu_task()
print(f'single thread: {time.time() - start:.2f}s')  # ~1.8s

# Two threads
start = time.time()
t1 = threading.Thread(target=cpu_task)
t2 = threading.Thread(target=cpu_task)
t1.start(); t2.start()
t1.join(); t2.join()
print(f'two threads: {time.time() - start:.2f}s')  # ~2.0s (slower!)
```
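The flip side is worth seeing once too. When a task spends its time waiting rather than computing, the same two-thread pattern does help. A minimal sketch, using `time.sleep` as a stand-in for a blocking IO call (sleeping, like waiting on a socket, releases the GIL):

```python
import threading
import time

def io_task():
    time.sleep(0.2)  # stand-in for a blocking IO call; the GIL is released here

# Serial: the waits add up (~0.4s)
start = time.time()
io_task()
io_task()
serial = time.time() - start

# Two threads: the waits overlap (~0.2s)
start = time.time()
t1 = threading.Thread(target=io_task)
t2 = threading.Thread(target=io_task)
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.time() - start
```

Same thread code as the CPU example above; only the nature of the work changed.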
The Three Concurrency Models Compared

Multithreading (threading)
```python
import threading
import requests
from concurrent.futures import ThreadPoolExecutor

def download(url):
    resp = requests.get(url, timeout=10)
    return len(resp.content)

urls = ['https://httpbin.org/delay/1'] * 10

# Option 1: manage threads by hand (note: return values are discarded)
threads = []
for url in urls:
    t = threading.Thread(target=download, args=(url,))
    threads.append(t)
    t.start()
for t in threads:
    t.join()

# Option 2: thread pool (recommended) -- also collects the return values
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(download, urls))
Multiprocessing (multiprocessing)
```python
from multiprocessing import Pool
from concurrent.futures import ProcessPoolExecutor

def heavy_compute(n):
    """CPU-bound task."""
    total = 0
    for i in range(n):
        total += i ** 2
    return total

numbers = [10_000_000] * 8

# Option 1: Pool
with Pool(processes=4) as pool:
    results = pool.map(heavy_compute, numbers)

# Option 2: ProcessPoolExecutor (same interface as the thread pool)
with ProcessPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(heavy_compute, numbers))
Async IO (asyncio)
```python
import asyncio
import aiohttp

async def download(session, url):
    async with session.get(url) as resp:
        return len(await resp.read())

async def main():
    urls = ['https://httpbin.org/delay/1'] * 10
    async with aiohttp.ClientSession() as session:
        tasks = [download(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
        return results

results = asyncio.run(main())
```
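`gather` launches every task at once; with thousands of URLs you usually want a cap on in-flight requests. The standard pattern is an `asyncio.Semaphore`. A sketch with `asyncio.sleep` standing in for `session.get` so it runs without aiohttp:

```python
import asyncio

async def fetch(sem, i):
    async with sem:                # at most 3 coroutines inside at once
        await asyncio.sleep(0.01)  # stand-in for the actual HTTP request
        return i * 2

async def main():
    sem = asyncio.Semaphore(3)
    tasks = [fetch(sem, i) for i in range(10)]
    return await asyncio.gather(*tasks)

results = asyncio.run(main())
```

All ten tasks are created up front, but only three make progress past the semaphore at any moment; `gather` still returns results in input order.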
Benchmarks

IO-bound: downloading 100 pages
```python
import time
from concurrent.futures import ThreadPoolExecutor
import asyncio
import aiohttp
import requests

urls = ['https://httpbin.org/delay/0.1'] * 100

# Serial
start = time.time()
for url in urls:
    requests.get(url)
serial_time = time.time() - start

# Threads
start = time.time()
with ThreadPoolExecutor(20) as ex:
    list(ex.map(requests.get, urls))
thread_time = time.time() - start

# Async
async def fetch(session, url):
    async with session.get(url) as resp:  # context manager releases the connection
        await resp.read()

async def async_test():
    async with aiohttp.ClientSession() as s:
        await asyncio.gather(*[fetch(s, u) for u in urls])

start = time.time()
asyncio.run(async_test())
async_time = time.time() - start
```
| Model | Time | Speedup |
|---|---|---|
| Serial | 12.5s | 1x |
| Threads (20) | 0.8s | 15x |
| Async IO (100) | 0.3s | 40x |
| Processes (4) | 3.5s | 3.5x |

For IO-bound work: async > threads >> processes > serial
CPU-bound: computing large factorials

```python
import math

def compute(n):
    return len(str(math.factorial(n)))

numbers = [100000] * 8
```
| Model | Time | Speedup |
|---|---|---|
| Serial | 16.2s | 1x |
| Threads (4) | 16.8s | 0.96x (slower) |
| Processes (4) | 4.5s | 3.6x |

For CPU-bound work: processes >> serial ≈ threads
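The table above comes from a harness along these lines (with `n` shrunk here so it finishes quickly; the absolute times depend on your hardware and core count):

```python
import math
import time
from concurrent.futures import ProcessPoolExecutor

def compute(n):
    # Digit count of n! -- pure-Python CPU work, no IO
    return len(str(math.factorial(n)))

def bench(numbers):
    start = time.time()
    serial = [compute(n) for n in numbers]
    serial_time = time.time() - start

    start = time.time()
    with ProcessPoolExecutor(max_workers=4) as ex:
        parallel = list(ex.map(compute, numbers))
    process_time = time.time() - start
    return serial, parallel, serial_time, process_time

if __name__ == '__main__':
    serial, parallel, st, pt = bench([5_000] * 8)
    assert serial == parallel  # same answers, different wall-clock time
```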
A Decision Tree for Choosing

What kind of task is it?
├── IO-bound (network requests, file IO, database queries)
│   ├── Need maximum throughput → asyncio + aiohttp
│   ├── Simplicity first → ThreadPoolExecutor
│   └── Existing sync code you don't want to rewrite → ThreadPoolExecutor
│
├── CPU-bound (numeric computation, image processing, data analysis)
│   ├── Pure-Python computation → multiprocessing
│   ├── NumPy/Pandas → their internals already release the GIL, so threads work too
│   └── Can use a C extension → GIL-releasing C extension + threads
│
└── Mixed
    └── Multiple processes, each running an async IO loop
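The "GIL-releasing C extension" branch is less exotic than it sounds: `hashlib` in the standard library, for instance, releases the GIL while hashing buffers larger than roughly 2 KiB, so plain threads parallelize it. A sketch:

```python
import hashlib
import threading

# Large buffers -> hashlib releases the GIL during the C hashing loop
data = [b'x' * 5_000_000 for _ in range(4)]
digests = [None] * len(data)

def hash_one(i):
    digests[i] = hashlib.sha256(data[i]).hexdigest()

threads = [threading.Thread(target=hash_one, args=(i,)) for i in range(len(data))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With buffers this size the four hashes genuinely run in parallel on separate cores, no processes needed.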
In Practice: A Mixed Workload

A crawler is the classic mixed workload: downloading is IO-bound, parsing is CPU-bound:
```python
import asyncio
import aiohttp
from concurrent.futures import ProcessPoolExecutor
from bs4 import BeautifulSoup

def parse_html(html):
    """CPU-bound: parse the HTML (runs in a child process)."""
    soup = BeautifulSoup(html, 'lxml')
    return {
        'title': soup.title.string if soup.title else '',
        'links': len(soup.find_all('a')),
        'text_length': len(soup.get_text()),
    }

async def fetch(session, url):
    """IO-bound: download a page (async)."""
    async with session.get(url) as resp:
        return await resp.text()

async def main(urls):
    # Process pool for the CPU-bound parsing
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=4) as process_pool:
        async with aiohttp.ClientSession() as session:
            # Download all pages concurrently on the event loop
            htmls = await asyncio.gather(*[fetch(session, url) for url in urls])
            # Parse in parallel across processes
            tasks = [
                loop.run_in_executor(process_pool, parse_html, html)
                for html in htmls
            ]
            results = await asyncio.gather(*tasks)
    return results
```
Thread Safety

Threads share memory, so watch out for data races:
```python
import threading
from queue import Queue

# ❌ Unsafe
counter = 0
def increment():
    global counter
    for _ in range(100000):
        counter += 1  # not atomic!

# ✅ Protect with a lock
lock = threading.Lock()
counter = 0
def safe_increment():
    global counter
    for _ in range(100000):
        with lock:
            counter += 1

# ✅ Better: use a Queue and avoid shared state entirely
def worker(q, results):
    while True:
        item = q.get()
        if item is None:  # sentinel value: stop
            break
        result = process(item)  # process() is whatever per-item work you need
        results.put(result)
        q.task_done()
```
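Fleshed out into a runnable sketch: `process()` here is a placeholder for real per-item work, and `None` serves as the shutdown sentinel, one per worker:

```python
import threading
from queue import Queue

def process(item):
    return item * 10  # placeholder for real per-item work

def worker(q, results):
    while True:
        item = q.get()
        if item is None:   # sentinel: this worker is done
            q.task_done()
            break
        results.put(process(item))
        q.task_done()

q, results = Queue(), Queue()
workers = [threading.Thread(target=worker, args=(q, results)) for _ in range(3)]
for t in workers:
    t.start()
for item in range(5):
    q.put(item)
for _ in workers:
    q.put(None)            # one sentinel per worker thread
for t in workers:
    t.join()
collected = sorted(results.get() for _ in range(results.qsize()))
```

No lock anywhere: `Queue` does its own locking internally, which is exactly why it beats hand-rolled shared state.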
Summary

- IO-bound → asyncio first, threads second
- CPU-bound → multiprocessing, unless your hot loop already lives in a GIL-releasing C extension
- Mixed → processes + async IO
- For simple cases, use concurrent.futures: the thread and process pools share one interface, so switching between them is easy
- With threads, mind thread safety; prefer a Queue over a Lock where you can

Pick the right concurrency model and the speedup is an order of magnitude. Pick the wrong one and you gain nothing but complexity.