Fastest way to solve a problem using itertools.combinations


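1. Problem

The following function needs to be sped up. For every increasing 6-tuple of row indices, it takes values alternately from column 1 and column 2 and keeps the tuple when those values zigzag (c1 > c2 < c3 > c4 < c5 > c6):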

import numpy as np
import itertools
import timeit

def combcol(myarr):
    ndims = myarr.shape[0]
    solutions = []
    # Visit every increasing 6-tuple of row indices
    for idx1, idx2, idx3, idx4, idx5, idx6 in itertools.combinations(np.arange(ndims), 6):
        # Take values alternately from column 1 and column 2
        c1, c2, c3, c4, c5, c6 = myarr[idx1,1], myarr[idx2,2], myarr[idx3,1], myarr[idx4,2], myarr[idx5,1], myarr[idx6,2]
        # Keep the tuple if the values zigzag: c1 > c2 < c3 > c4 < c5 > c6
        if c1-c2>0 and c2-c3<0 and c3-c4>0 and c4-c5<0 and c5-c6>0 :
            solutions.append(((idx1, idx2, idx3, idx4, idx5, idx6),(c1, c2, c3, c4, c5, c6)))
    return solutions
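The function is slow mainly because it visits every 6-element combination of row indices, so the work grows as C(n, 6), roughly n^6/720 for large n. A quick check with `math.comb` (Python 3.8+) shows how fast that grows; the row counts and the helper name `combinations_visited` are illustrative, not part of the original problem.

```python
import math


def combinations_visited(n, k=6):
    """Number of index tuples a brute-force loop over combinations(range(n), k) visits."""
    return math.comb(n, k)


for n in (20, 50, 100):
    print(n, combinations_visited(n))
# 20 38760
# 50 15890700
# 100 1192052400
```

Doubling the row count multiplies the work by roughly 2^6 = 64, which is why constant-factor tricks alone only go so far.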

2. Solutions

Method 1: Extract the NumPy columns as Python lists

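This version still loops over every combination. It gets its speedup by cutting per-iteration overhead, not by NumPy broadcasting. `myarr[:,1]` and `myarr[:,2]` are converted once into plain Python lists with `tolist()`, because indexing a NumPy array one element at a time is slow (each access creates a NumPy scalar), while indexing a list is cheap. `range` replaces `np.arange`, so the indices are plain ints, and the five conditions become a single chained comparison instead of five subtractions.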

def combcol2(myarr):
    ndims = myarr.shape[0]
    # Convert the two columns to plain Python lists once, up front
    myarr1 = myarr[:,1].tolist()
    myarr2 = myarr[:,2].tolist()
    solutions = []
    # range() yields plain ints, which index Python lists quickly
    for idx1, idx2, idx3, idx4, idx5, idx6 in itertools.combinations(range(ndims), 6):
        # Chained comparison: same test as c1>c2 and c2<c3 and c3>c4 and c4<c5 and c5>c6
        if myarr1[idx1] > myarr2[idx2] < myarr1[idx3] > myarr2[idx4] < myarr1[idx5] > myarr2[idx6]:
            solutions.append(((idx1, idx2, idx3, idx4, idx5, idx6),(myarr1[idx1], myarr2[idx2], myarr1[idx3], myarr2[idx4], myarr1[idx5], myarr2[idx6])))
    return solutions
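For a version that does use NumPy's element-wise comparisons instead of a Python-level `if`, one option is to materialize all index tuples as an integer array, gather the values with fancy indexing, and build a boolean mask. This is a minimal sketch, not the original author's method; the name `combcol_vectorized` is hypothetical, and it returns two arrays rather than a list of tuples.

```python
import itertools

import numpy as np


def combcol_vectorized(myarr):
    """Vectorized sketch: materializes every 6-combination, so memory is O(C(n, 6))."""
    ndims = myarr.shape[0]
    a = myarr[:, 1]  # values used at positions 1, 3, 5
    b = myarr[:, 2]  # values used at positions 2, 4, 6
    # All index tuples as an (m, 6) integer array, in itertools order
    flat = itertools.chain.from_iterable(itertools.combinations(range(ndims), 6))
    idx = np.fromiter(flat, dtype=np.intp).reshape(-1, 6)
    # Gather values: even columns from a, odd columns from b
    vals = np.empty(idx.shape, dtype=np.result_type(a, b))
    vals[:, 0::2] = a[idx[:, 0::2]]
    vals[:, 1::2] = b[idx[:, 1::2]]
    # Element-wise comparisons over all tuples at once: c1 > c2 < c3 > c4 < c5 > c6
    mask = ((vals[:, 0] > vals[:, 1]) & (vals[:, 1] < vals[:, 2]) &
            (vals[:, 2] > vals[:, 3]) & (vals[:, 3] < vals[:, 4]) &
            (vals[:, 4] > vals[:, 5]))
    return idx[mask], vals[mask]


if __name__ == "__main__":
    myarr = np.random.default_rng(0).random((20, 3))  # hypothetical test data
    indices, values = combcol_vectorized(myarr)
    print(indices.shape, values.shape)
```

The comparisons run in C, but the index array still has to be built from `itertools`, and memory grows as C(n, 6): at n = 50 the index array alone is about 760 MB on a 64-bit platform. It suits moderate n only.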

Method 2: Using Cython

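Cython compiles Python code to C, which can make it run faster. It is a superset of Python that lets you declare C data types for variables and arguments, and most of the speedup comes from those declarations. Untyped code that spends its time in `itertools` and list operations, like the function below, usually speeds up only modestly.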

# Put this in its own module (e.g. combcol3.py) and compile it with Cython;
# run as plain Python, the decorator is a no-op and nothing gets faster.
import itertools
import cython

@cython.ccall  # cpdef: stays callable from Python (@cython.cfunc would hide it)
def combcol3(myarr):
    ndims = myarr.shape[0]
    myarr1 = myarr[:,1].tolist()
    myarr2 = myarr[:,2].tolist()
    solutions = []
    for idx1, idx2, idx3, idx4, idx5, idx6 in itertools.combinations(range(ndims), 6):
        if myarr1[idx1] > myarr2[idx2] < myarr1[idx3] > myarr2[idx4] < myarr1[idx5] > myarr2[idx6]:
            solutions.append(((idx1, idx2, idx3, idx4, idx5, idx6),(myarr1[idx1], myarr2[idx2], myarr1[idx3], myarr2[idx4],
                                                                     myarr1[idx5], myarr2[idx6])))
    return solutions
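Compiling does not reduce the number of combinations visited. A complementary idea, not from the original post, is to build the tuple one index at a time and abandon a prefix as soon as one link of the zigzag fails; because each condition only involves two consecutive indices, this pruning never drops a valid tuple. The sketch below is plain Python (the name `combcol_pruned` is hypothetical); explicit integer loops of this shape are also what Cython speeds up most once the indices and lists are given C types.

```python
import numpy as np


def combcol_pruned(myarr):
    """Depth-first search over increasing index tuples with early pruning.

    Returns the same (indices, values) pairs, in the same order, as combcol2.
    """
    a = myarr[:, 1].tolist()  # column 1: positions 1, 3, 5
    b = myarr[:, 2].tolist()  # column 2: positions 2, 4, 6
    n = len(a)
    solutions = []
    for i1 in range(n - 5):
        for i2 in range(i1 + 1, n - 4):
            if not a[i1] > b[i2]:
                continue  # prune: no extension of (i1, i2) can match
            for i3 in range(i2 + 1, n - 3):
                if not b[i2] < a[i3]:
                    continue
                for i4 in range(i3 + 1, n - 2):
                    if not a[i3] > b[i4]:
                        continue
                    for i5 in range(i4 + 1, n - 1):
                        if not b[i4] < a[i5]:
                            continue
                        for i6 in range(i5 + 1, n):
                            if a[i5] > b[i6]:
                                solutions.append(((i1, i2, i3, i4, i5, i6),
                                                  (a[i1], b[i2], a[i3], b[i4], a[i5], b[i6])))
    return solutions
```

How much this helps depends on the data: the more often an early link fails, the more of the C(n, 6) tuples are never generated.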

Method 3: Using parallel computing

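Parallel computing splits one task into several subtasks and runs them at the same time on multiple processors. For this search, the combinations can be divided among CPU cores. At best this divides the running time by the number of cores; starting worker processes and sending data to them adds overhead, so it only pays off when the search is large.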

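import itertools
import numpy as np
from joblib import Parallel, delayed

def _combcol4_chunk(myarr1, myarr2, idx1):
    # Subtask: all combinations whose first index is idx1
    ndims = len(myarr1)
    solutions = []
    for idx2, idx3, idx4, idx5, idx6 in itertools.combinations(range(idx1 + 1, ndims), 5):
        if myarr1[idx1] > myarr2[idx2] < myarr1[idx3] > myarr2[idx4] < myarr1[idx5] > myarr2[idx6]:
            solutions.append(((idx1, idx2, idx3, idx4, idx5, idx6),
                              (myarr1[idx1], myarr2[idx2], myarr1[idx3], myarr2[idx4], myarr1[idx5], myarr2[idx6])))
    return solutions

def combcol4(myarr, n_jobs=-1):
    ndims = myarr.shape[0]
    myarr1 = myarr[:,1].tolist()
    myarr2 = myarr[:,2].tolist()
    # One task per possible first index; n_jobs=-1 uses all CPU cores
    chunks = Parallel(n_jobs=n_jobs)(
        delayed(_combcol4_chunk)(myarr1, myarr2, idx1) for idx1 in range(ndims - 5))
    # Concatenate in idx1 order so the result matches combcol2
    return [sol for chunk in chunks for sol in chunk]

if __name__ == "__main__":
    myarr = np.random.default_rng(0).random((30, 3))  # hypothetical test data
    print(len(combcol4(myarr)))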
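One way to decompose this search into subtasks, as the paragraph above describes, is to split the enumeration by its first index. The sequential sketch below (the helper name `combinations_by_first_index` is hypothetical and does not use joblib) checks that concatenating the per-first-index chunks reproduces `itertools.combinations(range(n), 6)` exactly and in order, so no combination is lost or duplicated when the chunks are handed to workers.

```python
import itertools


def combinations_by_first_index(n, k=6):
    """Yield, for each possible first index i, the k-combinations of range(n) starting with i."""
    for i in range(n - k + 1):
        # The remaining k-1 indices are chosen from the indices after i
        yield [(i,) + rest for rest in itertools.combinations(range(i + 1, n), k - 1)]


n = 10
chunks = list(combinations_by_first_index(n))
flat = [c for chunk in chunks for c in chunk]
print(len(chunks), len(flat), flat == list(itertools.combinations(range(n), 6)))
# 5 210 True
```

The chunks are very unequal (for n = 10 their sizes are 126, 56, 21, 6 and 1), so the largest chunk bounds how much a split by first index can gain; with more rows the imbalance shrinks.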