14. 🔀 Threads and Processes: The Python Concurrency Essentials You Must Know


⚙️ Want to make your crawlers, downloaders, and scripts 5x faster? Then you need to master Python threads (Thread) and processes (Process). This article walks you through building practical concurrency tools: no jargon, straight to the projects!


✅ Goals of This Article

  • Understand when to use threading vs multiprocessing
  • Build a concurrent downloader (multithreading) and batch CPU-heavy processing (multiprocessing)
  • Learn progress control, data sharing, and task management

🧠 1. Threads vs Processes, Explained in Plain Terms

| Feature | Thread | Process |
| --- | --- | --- |
| Best for | I/O-bound work (e.g. network requests) | CPU-bound work (e.g. image processing) |
| Memory space | Shared | Separate |
| Creation overhead | Low | High |
| GIL | Limited by it | Not limited |
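The GIL row in the table above is the key point: in CPython, only one thread at a time can execute Python bytecode, so threads give CPU-bound work essentially no speedup. A minimal sketch to see this yourself (the loop size and exact timings are illustrative, not benchmarks):

```python
import threading
import time

def burn(n):
    # CPU-bound busy loop: pure Python bytecode, so the GIL serializes it
    while n > 0:
        n -= 1

N = 2_000_000

# run twice, one after the other
start = time.perf_counter()
burn(N)
burn(N)
sequential = time.perf_counter() - start

# run twice "in parallel" with two threads
start = time.perf_counter()
t1 = threading.Thread(target=burn, args=(N,))
t2 = threading.Thread(target=burn, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

# under the GIL the threaded version is not meaningfully faster
print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

Run the same experiment with `multiprocessing` instead of `threading` and the two workers genuinely run on two cores.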

🧵 2. threading in Practice: A Multithreaded Crawler/Downloader

```python
import threading
import requests

urls = [
    "https://www.baidu.com",
    "https://juejin.cn",
    "https://zhihu.com",
    "https://python.org"
]

def fetch(url):
    print(f"🌐 Fetching {url}")
    # always set a timeout so a stuck request can't hang the thread forever
    resp = requests.get(url, timeout=10)
    print(f"✅ Done {url}, length: {len(resp.text)}")

threads = []

for u in urls:
    t = threading.Thread(target=fetch, args=(u,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print("🎉 All tasks finished")
```
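Starting one thread per URL is fine for a handful of links, but for hundreds you should cap how many run at once. A sketch using `threading.Semaphore` (the limit of 2 and the simulated fetch are my choices for a self-contained example; real code would call `requests.get` inside the `with sem:` block):

```python
import threading

sem = threading.Semaphore(2)   # at most 2 fetches in flight at any moment
results = []
res_lock = threading.Lock()

def fetch_limited(url):
    with sem:                  # blocks if 2 threads are already inside
        # stand-in for requests.get(url); we just record the URL
        with res_lock:
            results.append(url)

urls = ["https://www.baidu.com", "https://juejin.cn", "https://zhihu.com"]
threads = [threading.Thread(target=fetch_limited, args=(u,)) for u in urls]
for t in threads: t.start()
for t in threads: t.join()

print(len(results))  # 3
```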


🧠 3. multiprocessing in Practice: Batch CPU-Bound Tasks

For example, computing Fibonacci numbers:

```python
from multiprocessing import Pool

def fib(n):
    if n <= 2:
        return 1
    return fib(n - 1) + fib(n - 2)

if __name__ == "__main__":
    with Pool(4) as p:
        results = p.map(fib, [30, 31, 32, 33])
        print(results)
```

👉 Output: `[832040, 1346269, 2178309, 3524578]`
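`Pool.map` blocks until every task is done. If you want the progress reporting mentioned in the goals above, `apply_async` with a callback lets the parent process react as each task finishes. A sketch (the `run_batch` helper and the progress message are my additions):

```python
from multiprocessing import Pool

def fib(n):
    if n <= 2:
        return 1
    return fib(n - 1) + fib(n - 2)

def run_batch():
    done = []

    def on_done(result):
        # the callback runs in the parent process as each task completes
        done.append(result)
        print(f"progress: {len(done)}/4")

    with Pool(4) as p:
        for n in [30, 31, 32, 33]:
            p.apply_async(fib, (n,), callback=on_done)
        p.close()   # no more tasks will be submitted
        p.join()    # wait for all workers to finish
    return sorted(done)

if __name__ == "__main__":
    print(run_batch())
```

Because tasks finish in an unpredictable order, `done` is sorted before returning.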


🔄 4. concurrent.futures: A More Elegant Style

```python
import requests
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    print(f"Fetching {url}")
    return requests.get(url, timeout=10).status_code

urls = [
    "https://www.baidu.com",
    "https://www.zhihu.com",
    "https://juejin.cn"
]

with ThreadPoolExecutor(max_workers=3) as executor:
    results = executor.map(fetch, urls)

for code in results:
    print("Status code:", code)
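`executor.map` returns results in input order and re-raises the first exception mid-iteration, which makes per-task error handling awkward. `submit` plus `as_completed` handles each result as it finishes and lets you catch failures task by task. A sketch (the `work` function is a stand-in for a network call; the forced failure for `n == 3` is there just to show the error path):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(n):
    # stand-in for a network call; raises for one input to exercise error handling
    if n == 3:
        raise ValueError("simulated failure")
    return n * n

results, errors = {}, {}
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = {executor.submit(work, n): n for n in range(5)}
    for fut in as_completed(futures):          # yields futures as they finish
        n = futures[fut]
        try:
            results[n] = fut.result()          # re-raises the task's exception here
        except ValueError as e:
            errors[n] = str(e)

print(results)  # squares for 0, 1, 2, 4 (completion order varies)
print(errors)   # the one simulated failure
```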


🔐 5. Thread Locks and Shared Data (Avoiding Data Races)

```python
import threading

count = 0
lock = threading.Lock()

def add():
    global count
    for _ in range(100000):
        with lock:   # only one thread at a time may run the increment
            count += 1

threads = [threading.Thread(target=add) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()

print("final count =", count)
```
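To see why the lock matters, here is the same counter wrapped in a helper that can run with or without it (the `run` helper is my addition). `count += 1` is a read-modify-write, so without the lock two threads can read the same value and one update gets lost; the unlocked total is nondeterministic and may come out below 1,000,000:

```python
import threading

def run(use_lock, n_threads=10, n_iters=100_000):
    count = 0
    lock = threading.Lock()

    def add():
        nonlocal count
        for _ in range(n_iters):
            if use_lock:
                with lock:
                    count += 1
            else:
                count += 1   # unsynchronized read-modify-write: updates may be lost

    threads = [threading.Thread(target=add) for _ in range(n_threads)]
    for t in threads: t.start()
    for t in threads: t.join()
    return count

print("with lock:   ", run(True))    # always 1000000
print("without lock:", run(False))   # nondeterministic, may be lower
```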


📦 Hands-On Project: A Multithreaded Batch Image Downloader

```python
import threading
import requests
import os

img_urls = [
    "https://picsum.photos/300/300",
    "https://picsum.photos/400/400",
    "https://picsum.photos/500/500"
]

os.makedirs("imgs", exist_ok=True)

def download_img(i, url):
    r = requests.get(url, timeout=10)
    with open(f"imgs/img_{i}.jpg", "wb") as f:
        f.write(r.content)
    print(f"✅ Downloaded: img_{i}.jpg")

threads = []
for i, u in enumerate(img_urls):
    t = threading.Thread(target=download_img, args=(i, u))
    threads.append(t)
    t.start()

for t in threads:
    t.join()
```
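Real downloads fail, so a production downloader needs retries. A minimal retry loop you could wrap around `download_img` (the `with_retry` helper, the 3-attempt limit, and the `flaky` stand-in that fails twice before succeeding are all my choices for a self-contained sketch; libraries like `tenacity` do this more robustly):

```python
import time

def with_retry(func, attempts=3, delay=0.1):
    # call func(); on exception, sleep and retry, up to `attempts` total tries
    for i in range(attempts):
        try:
            return func()
        except Exception:
            if i == attempts - 1:
                raise          # out of attempts: let the error propagate
            time.sleep(delay)

calls = {"n": 0}

def flaky():
    # stand-in for a download that fails twice, then succeeds
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

print(with_retry(flaky))  # "ok", on the 3rd attempt
```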


💡 Bonus Challenges

  1. Use multiprocessing.Pool to write a video transcoding script (multi-core speedup)
  2. Auto-retry failed downloads (with the retrying or tenacity library)
  3. Build a task-dispatch center with a thread pool + queue
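Challenge 3 can be sketched with `queue.Queue` plus worker threads. Workers pull tasks until they receive a `None` sentinel; `task_done`/`join` track completion. The doubling "work" is a placeholder for real jobs, and the 3-worker count is arbitrary:

```python
import queue
import threading

task_q = queue.Queue()
results = []
res_lock = threading.Lock()

def worker():
    while True:
        item = task_q.get()
        if item is None:          # sentinel: shut this worker down
            task_q.task_done()
            break
        with res_lock:
            results.append(item * 2)   # stand-in for real work
        task_q.task_done()

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

for i in range(10):               # enqueue 10 tasks
    task_q.put(i)
for _ in workers:                 # one sentinel per worker
    task_q.put(None)

task_q.join()                     # wait until every put() has a matching task_done()
for w in workers:
    w.join()

print(sorted(results))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The queue is FIFO, so the sentinels are only consumed after all real tasks, which guarantees a clean shutdown.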

🧠 One-Sentence Summary

threading speeds up I/O-bound work like crawlers, multiprocessing squeezes every CPU core; master concurrency and you can write Python tools that are both fast and powerful.