Multiprocessing

`Process`

In Python, the multiprocessing module provides multi-process support. Its Process class represents a single process; instantiate it to create a process object.

from multiprocessing import Process
import os

# Code executed by the child process
def run_proc(name):
    print('Run child process %s (%s)...' % (name, os.getpid()))


if __name__=='__main__':
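    print('Parent process %s.' % os.getpid())  # print the pid of the current process

    # Create a process object. args must be a tuple; with a single argument, write (arg,)
    p = Process(target=run_proc, args=('test',))

    print('Child process will start.')
    p.start()
    p.join()  # join() waits for the child process to finish before continuing; it is commonly used to synchronize processes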
    print('Child process end.')


'''
Output (PIDs will differ from run to run):
Parent process 928.
Child process will start.
Run child process test (929)...
Child process end.
'''

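Process Pool

When a large number of child processes has to be started, use a process pool to keep system resource usage in check. The pool reuses a fixed set of worker processes to run many tasks.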

from multiprocessing import Pool
import os, time, random

def long_time_task(name):
    print('Run task %s (%s)...' % (name, os.getpid()))
    start = time.time()
    time.sleep(random.random() * 3)
    end = time.time()
    print('Task %s runs %0.2f seconds.' % (name, (end - start)))

if __name__=='__main__':
    print('Parent process %s.' % os.getpid())

    p = Pool(4)  # create a process pool; the argument is the maximum number of worker processes

    for i in range(5):
        p.apply_async(long_time_task, args=(i,))  # note: apply_async is asynchronous and does not block
    print('Waiting for all subprocesses done...')
    p.close()
    p.join()
    print('All subprocesses done.')

'''
Output (PIDs and timings will differ from run to run):

Parent process 669.
Waiting for all subprocesses done...
Run task 0 (671)...
Run task 1 (672)...
Run task 2 (673)...
Run task 3 (674)...
Task 2 runs 0.14 seconds.
Run task 4 (673)...
Task 1 runs 0.27 seconds.
Task 3 runs 0.86 seconds.
Task 0 runs 1.41 seconds.
Task 4 runs 1.91 seconds.
All subprocesses done.
'''

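join() waits for all worker processes in the pool to finish before the code after it runs.

close() (or terminate()) must be called before join().

After close() has been called, no new tasks can be submitted to the pool.

Because the pool was created with a size of 4, tasks 0, 1, 2 and 3 start immediately, while task 4 has to wait until one of them finishes. At most 4 worker processes run at the same time.

If no size is given, Pool defaults to the number of CPUs the system reports (see os.cpu_count()).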
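
The example above discards the return values of the tasks. The following is a minimal sketch of how they could be collected, assuming a hypothetical worker function square and helper compute_squares (neither is part of the original example). apply_async returns an AsyncResult object, and its get() method blocks until that task's result is ready.

```python
import os
from multiprocessing import Pool


def square(x):
    # Hypothetical task used only for illustration.
    return x * x


def compute_squares(numbers, workers=None):
    # workers=None lets Pool pick its default size.
    # Leaving the with-block terminates the pool, so the results
    # are collected while the pool is still alive.
    with Pool(workers) as pool:
        async_results = [pool.apply_async(square, args=(n,)) for n in numbers]
        return [r.get() for r in async_results]


if __name__ == '__main__':
    print('CPUs reported by os.cpu_count():', os.cpu_count())
    print(compute_squares(range(5), workers=4))
```

The results come back in submission order because get() is called on the AsyncResult objects in that order, regardless of which worker finishes first.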

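Multithreading

Threads are execution units supported directly by the operating system.

CPython has a Global Interpreter Lock (GIL): a thread must acquire the GIL before it can execute Python bytecode. In Python 2 the interpreter released the GIL after every 100 bytecode instructions; since Python 3.2 a running thread is instead asked to release it after a switch interval (5 ms by default, see sys.getswitchinterval()), and the GIL is also released during blocking I/O. Because the GIL serializes all bytecode execution, threads in one Python process take turns rather than run in parallel, so multithreading does not speed up CPU-bound code, although it still helps with I/O-bound work.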
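
As a rough illustration of when threads still pay off under the GIL, the sketch below runs several threads that only wait. time.sleep releases the GIL while it blocks, so the waits overlap. The helper names io_task and run_threads are made up for this example, and the measured time will vary between machines.

```python
import sys
import threading
import time


def io_task(delay):
    # time.sleep releases the GIL while waiting, as blocking I/O calls do.
    time.sleep(delay)


def run_threads(count, delay):
    """Run `count` sleeping threads and return the total wall-clock time."""
    threads = [threading.Thread(target=io_task, args=(delay,)) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_threads(4, 0.2)
print('switch interval: %s s' % sys.getswitchinterval())
print('4 threads x 0.2 s sleep took %.2f s' % elapsed)
```

The total should be close to a single 0.2 s wait rather than the 0.8 s a sequential run would need. A CPU-bound loop in place of the sleep would not show this gain.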

Threading

The threading module provides multithreading in Python.

import time, threading

# Code executed by the new thread:
def loop():
    print('thread %s is running...' % threading.current_thread().name)
    n = 0
    while n < 5:
        n = n + 1
        print('thread %s >>> %s' % (threading.current_thread().name, n))
        time.sleep(1)
    print('thread %s ended.' % threading.current_thread().name)

print('thread %s is running...' % threading.current_thread().name)
t = threading.Thread(target=loop, name='LoopThread')
t.start()
t.join()
print('thread %s ended.' % threading.current_thread().name)


'''
Output:

thread MainThread is running...
thread LoopThread is running...
thread LoopThread >>> 1
thread LoopThread >>> 2
thread LoopThread >>> 3
thread LoopThread >>> 4
thread LoopThread >>> 5
thread LoopThread ended.
thread MainThread ended.
'''

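Every process starts one thread by default, called the main thread, and the main thread can start new threads in turn. A thread's name is only used for display and has no other meaning.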
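
A small sketch of the main thread and thread names, using a hypothetical report helper and a thread named 'Worker' (both chosen only for this example):

```python
import threading


def report(names):
    # Record the name of the thread that runs this function.
    names.append(threading.current_thread().name)


names = []
report(names)  # runs in the calling (main) thread

t = threading.Thread(target=report, args=(names,), name='Worker')
t.start()
t.join()

print(names)
print(threading.current_thread() is threading.main_thread())
```

When run as a script, the first entry is the main thread's name and the second is 'Worker'. threading.main_thread() returns the thread object of the main thread, which makes it possible to check whether the current code is running there.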

Lock

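In a multithreaded program, all variables are shared by all threads, so several threads modifying the same variable at the same time can easily corrupt its value.

The Lock class in the threading module can guard a section of code. Only one thread can hold the lock at a time, so as long as every thread that modifies the shared variable acquires the same lock first, the variable cannot be changed by another thread while the locked section runs.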

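import threading

balance = 0  # shared variable (assumed here for illustration)
lock = threading.Lock()

def change_it(n):
    # Illustrative stand-in: deposit then withdraw, so the net result should be 0.
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n):
    for i in range(100000):
        # Acquire the lock first:
        lock.acquire()
        try:
            # Now it is safe to modify the shared variable:
            change_it(n)
        finally:
            # Always release the lock when done:
            lock.release()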
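
The acquire/try/finally/release pattern can also be written with a with statement, since Lock objects support the context manager protocol. The sketch below assumes a shared variable balance and a worker function deposit_and_withdraw, both introduced only for this example.

```python
import threading

balance = 0  # shared variable, assumed for illustration
lock = threading.Lock()


def deposit_and_withdraw(n, rounds):
    global balance
    for _ in range(rounds):
        # `with lock:` acquires the lock and releases it on exit,
        # even if the body raises an exception.
        with lock:
            balance = balance + n
            balance = balance - n


threads = [
    threading.Thread(target=deposit_and_withdraw, args=(n, 100000))
    for n in (5, 8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balance)  # 0
```

Because both threads take the same lock around the two updates, each deposit is always followed by its own withdrawal and the balance ends at 0.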

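Coroutines

Coroutines take advantage of the time a program spends waiting. Everything still runs in a single thread, which keeps switching between the code blocks being executed.

In Python, coroutines can be implemented with the gevent module.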

import time
import gevent
from gevent import monkey

# Monkey-patch: replace blocking calls in the program (such as time.sleep) with gevent's cooperative versions
monkey.patch_all()

def continue_work(name):
    for i in range(5):
        print(name, i)
        time.sleep(0.5)

gevent.joinall([
    gevent.spawn(continue_work, 'work_1'),  # (function, arguments passed to the function)
    gevent.spawn(continue_work, 'work_2'),
])
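
gevent is a third-party package. As a rough standard-library analogue, the same cooperative switching can be sketched with asyncio. Note that this is a different technique: asyncio uses explicit async/await instead of monkey patching, and the log list and main function are additions made only for this example.

```python
import asyncio


async def continue_work(name, log):
    for i in range(3):
        log.append((name, i))
        # Awaiting hands control back to the event loop,
        # which can then run the other coroutine.
        await asyncio.sleep(0.01)


async def main():
    log = []
    await asyncio.gather(
        continue_work('work_1', log),
        continue_work('work_2', log),
    )
    return log


log = asyncio.run(main())
print(log)
```

Both coroutines run in one thread. Each await on asyncio.sleep is a switch point, so the entries of work_1 and work_2 end up interleaved in the log instead of one coroutine finishing before the other starts.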

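Processes, Threads and Coroutines Compared

A process is the unit of resource allocation; a thread is the unit of operating-system scheduling.
Creating and switching processes is expensive because each process has its own set of resources, so efficiency is relatively low.
Switching threads takes fewer resources, but efficiency is only moderate.
Switching coroutines is cheap, and coroutines make use of the time a thread would otherwise spend waiting (so for I/O-bound multitasking, consider coroutines first).
Multiple processes can run in parallel, and so can threads in principle (in CPython the GIL keeps threads from executing Python bytecode in parallel), whereas coroutines within one thread are concurrent but never parallel.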
