A Solution for Running Tasks with Multiple Processes


While using Python's multiprocessing module, I ran into a problem: I could not get multiple processes to run concurrently. The code is as follows:


import multiprocessing

def run_normalizers(config, debug, num_threads, name=None):

    def _run():
        print('Started process for normalizer')
        sqla_engine = init_sqla_from_config(config)
        image_vfs = create_s3vfs_from_config(config, config.AWS_S3_IMAGE_BUCKET)
        storage_vfs = create_s3vfs_from_config(config, config.AWS_S3_STORAGE_BUCKET)

        pp = PipedPiper(config, image_vfs, storage_vfs, debug=debug)

        if name:
            pp.run_pipeline_normalizers(name)
        else:
            pp.run_all_normalizers()
        print('Normalizer process complete')

    # these are processes, not threads; plain loops are clearer than
    # list comprehensions used only for their side effects
    processes = []
    for i in range(num_threads):
        processes.append(multiprocessing.Process(target=_run))
    for p in processes:
        p.start()
    for p in processes:
        p.join()


run_normalizers(...)

The problem is that although the code creates multiple processes, they do not seem to run concurrently: the overall speed is about the same as running a single process.
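As a sanity check (a minimal sketch, unrelated to the original codebase), `multiprocessing.Process` does run CPU-bound work in parallel across cores; timing a toy workload like the one below can help confirm that the bottleneck lies inside `_run` itself (for example a shared lock or a serialized resource) rather than in process creation:

```python
import multiprocessing
import time

def busy(n):
    # burn CPU so differences in wall time are visible
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed_run(num_procs, n):
    # start num_procs processes and return the total wall time
    procs = [multiprocessing.Process(target=busy, args=(n,))
             for _ in range(num_procs)]
    start = time.time()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.time() - start

if __name__ == '__main__':
    # with true parallelism, four processes should take far less
    # than four times as long as one
    print('1 process :', timed_run(1, 500_000))
    print('4 processes:', timed_run(4, 500_000))
```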

Solution

After restructuring the code around a task queue and a pool of workers, the tasks now run concurrently. The implementation is as follows:

1. Define the task class

import abc

# in Python 3, subclass abc.ABC; the old __metaclass__ attribute is ignored
class AbstractTask(abc.ABC):
    """
        The base task
    """

    @abc.abstractmethod
    def run_task(self):
        pass
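A concrete subclass only needs to implement `run_task`. As a hypothetical example (not from the original code), a task that squares a number might look like:

```python
import abc

class AbstractTask(abc.ABC):
    """The base task."""

    @abc.abstractmethod
    def run_task(self):
        pass

class SquareTask(AbstractTask):
    """Hypothetical task: squares a value and stores the result."""

    def __init__(self, value):
        self.value = value
        self.result = None

    def run_task(self):
        self.result = self.value * self.value

task = SquareTask(3)
task.run_task()
print(task.result)  # 9
```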

2. Define the task runner class

from queue import Queue, Empty
from threading import Thread

class TaskRunner(object):

    def __init__(self, queue_size, num_threads=1, stop_on_exception=False):
        super(TaskRunner, self).__init__()
        self.queue              = Queue(queue_size)
        self.execute_tasks      = True
        self.stop_on_exception  = stop_on_exception

        # create a worker
        def _worker():
            while self.execute_tasks:

                # block for up to 1 second waiting for a task; the original
                # get(False, 1) was non-blocking, so the timeout was ignored
                # and the loop busy-spun on an empty queue
                task = None
                try:
                    task = self.queue.get(True, 1)
                except Empty:
                    continue

                # execute the task; catch exceptions so a failing task
                # does not silently kill the worker thread
                failed = True
                try:
                    task.run_task()
                    failed = False
                except Exception as e:
                    print('Task failed: %s' % e)
                finally:
                    if failed and self.stop_on_exception:
                        print('Stopping due to exception')
                        self.execute_tasks = False
                    self.queue.task_done()

        # start the daemon worker threads
        for i in range(int(num_threads)):
            t = Thread(target=_worker)
            t.daemon = True
            t.start()


    def add_task(self, task, block=True, timeout=None):
        """
            Adds a task
        """
        if not self.execute_tasks:
            raise Exception('TaskRunner is not accepting tasks')
        self.queue.put(task, block, timeout)


    def wait_for_tasks(self):
        """
            Waits for tasks to complete
        """
        if not self.execute_tasks:
            raise Exception('TaskRunner is not accepting tasks')
        self.queue.join()

3. Use the task runner in your code

# create a TaskRunner
task_runner = TaskRunner(queue_size=1000, num_threads=4)

# create tasks (MyTask is any AbstractTask subclass implementing run_task)
tasks = []
for i in range(1000):
    task = MyTask()
    tasks.append(task)

# add tasks to the task runner
for task in tasks:
    task_runner.add_task(task)

# wait for tasks to complete
task_runner.wait_for_tasks()

With this approach, tasks are added to a queue and a pool of worker threads pulls them off and executes them. Note that the workers are threads rather than processes; for I/O-bound work such as the S3 operations here, this still yields real concurrency, because the GIL is released during I/O.
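For comparison, the standard library's `concurrent.futures.ThreadPoolExecutor` provides this same queue-plus-worker-threads pattern out of the box; a minimal sketch (with a stand-in `normalize` function):

```python
from concurrent.futures import ThreadPoolExecutor

def normalize(item):
    # stand-in for a real normalizer run
    return item.upper()

items = ['a', 'b', 'c']
with ThreadPoolExecutor(max_workers=4) as pool:
    # map distributes the items across the worker threads
    results = list(pool.map(normalize, items))
print(results)  # ['A', 'B', 'C']
```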