⚙️ Want your crawler, downloader, or scripts to run 5x faster? Then you need to master threads (Thread) and processes (Process) in Python. This post walks you through building practical concurrency tools: no jargon overload, straight to the projects!
✅ Goals of this post
- Understand when to use threading vs. multiprocessing
- Build a concurrent downloader (multithreading) and a batch CPU-heavy processor (multiprocessing)
- Learn progress control, data sharing, and task management
🧠 1. Threads & Processes: the difference in plain terms
| Trait | Thread | Process |
|---|---|---|
| Best for | I/O-bound work (e.g. network requests) | CPU-bound work (e.g. image processing) |
| Memory | Shared memory | Separate memory |
| Creation cost | Low | High |
| GIL | Constrained by the GIL | Not constrained |
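One practical consequence of the "shared memory" row: all threads see the same objects, so they can append to one shared list. A minimal sketch (the `work` function and `results` list are just illustrative names):

```python
import threading

results = []  # lives in the parent; every thread sees this same list

def work(n):
    results.append(n * n)  # list.append is atomic in CPython, so this is safe

threads = [threading.Thread(target=work, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # [0, 1, 4, 9]
```

With multiprocessing, each child would append to its own copy and the parent's list would stay empty; sharing state across processes requires primitives like `multiprocessing.Queue` or a `Manager` list.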
🧵 2. threading in practice: a multithreaded crawler/downloader
```python
import threading
import requests

urls = [
    "https://www.baidu.com",
    "https://juejin.cn",
    "https://zhihu.com",
    "https://python.org"
]

def fetch(url):
    print(f"🌐 Fetching {url}")
    resp = requests.get(url, timeout=10)  # always set a timeout on network calls
    print(f"✅ Done {url}, length: {len(resp.text)}")

threads = []
for u in urls:
    t = threading.Thread(target=fetch, args=(u,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print("🎉 All tasks finished")
```
🧠 3. multiprocessing in practice: batch CPU-bound work
For example, computing Fibonacci numbers:
```python
from multiprocessing import Pool

def fib(n):
    if n <= 2:
        return 1
    return fib(n - 1) + fib(n - 2)

if __name__ == "__main__":
    with Pool(4) as p:
        results = p.map(fib, [30, 31, 32, 33])
    print(results)
```
👉 Output: `[832040, 1346269, 2178309, 3524578]`
🔄 4. A cleaner style with concurrent.futures
```python
import requests
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    print(f"Fetching {url}")
    return requests.get(url, timeout=10).status_code

urls = [
    "https://www.baidu.com",
    "https://www.zhihu.com",
    "https://juejin.cn"
]

with ThreadPoolExecutor(max_workers=3) as executor:
    results = executor.map(fetch, urls)

for code in results:
    print("Status code:", code)
```
🔐 5. Thread locks and data sharing (avoiding data races)
```python
import threading

count = 0
lock = threading.Lock()

def add():
    global count
    for _ in range(100000):
        with lock:  # only one thread may increment at a time
            count += 1

threads = [threading.Thread(target=add) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("Final count =", count)  # always 1000000 thanks to the lock
```
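An alternative to a lock around a shared global: have threads hand results to a `queue.Queue`, which does its own locking internally. A small sketch (the `worker` function is illustrative):

```python
import threading
import queue

q = queue.Queue()

def worker(n):
    q.put(n * n)  # Queue.put is thread-safe; no explicit lock needed

threads = [threading.Thread(target=worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Drain the queue once all producers have finished
collected = sorted(q.get() for _ in range(q.qsize()))
print(collected)  # [0, 1, 4, 9, 16]
```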
📦 Hands-on project: a multithreaded batch image downloader
```python
import os
import threading
import requests

img_urls = [
    "https://picsum.photos/300/300",
    "https://picsum.photos/400/400",
    "https://picsum.photos/500/500"
]

os.makedirs("imgs", exist_ok=True)

def download_img(i, url):
    r = requests.get(url, timeout=10)
    with open(f"imgs/img_{i}.jpg", "wb") as f:
        f.write(r.content)
    print(f"✅ Downloaded: img_{i}.jpg")

threads = []
for i, u in enumerate(img_urls):
    t = threading.Thread(target=download_img, args=(i, u))
    threads.append(t)
    t.start()

for t in threads:
    t.join()
```
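To report per-task progress (one of the goals above), submit each download individually and consume results with `as_completed`. This offline sketch swaps the real `requests.get` call for a hypothetical `fake_download` so it runs without a network:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_download(i):
    # Hypothetical stand-in for requests.get + file write, so the sketch runs offline
    return f"img_{i}.jpg"

done = []
with ThreadPoolExecutor(max_workers=3) as ex:
    futures = {ex.submit(fake_download, i): i for i in range(3)}
    for fut in as_completed(futures):  # yields each future as soon as it finishes
        name = fut.result()
        done.append(name)
        print(f"✅ Finished {name} ({len(done)}/{len(futures)})")

print(sorted(done))  # ['img_0.jpg', 'img_1.jpg', 'img_2.jpg']
```

Unlike `executor.map`, `submit` + `as_completed` gives you results in completion order, which is what a progress bar or retry scheduler needs.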
💡 Challenges to try
- Use multiprocessing.Pool to write a video transcoding script (multi-core speedup)
- Automatically retry failed downloads (with retrying or tenacity)
- Build a thread pool + queue task dispatcher
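For the retry challenge, here is a hand-rolled sketch using only the standard library (libraries like retrying and tenacity wrap this same pattern in a decorator); `flaky` and its failure count are purely illustrative:

```python
import time

def retry(fn, attempts=3, delay=0.1):
    """Call fn(); on exception, retry up to `attempts` times total."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(delay)

calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

result = retry(flaky)
print(result)  # "ok", after two failed attempts
```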
🧠 One-line summary
Use threading to speed up I/O-bound crawlers and multiprocessing to squeeze every CPU core. Master concurrency, and you can write Python tools that are both fast and powerful.