Background:
A recent project required writing a very large amount of data into MySQL. With the previous single-threaded approach, the load was estimated to take more than a month to finish; at that pace we might as well have started looking for new jobs.
Solution:
At the software level the options come down to multithreading, multiprocessing, coroutines, code optimization, and SQL tuning; here we only look at processes and threads.
This time I went with multithreading. The core code is shown below; core_inset is our project's core REPLACE INTO function.
Library used: ThreadPoolExecutor
Result: done in 2 hours.
# To avoid lock problems we handle them via the parameters: each thread works on a different slice of the data. The code below is a simplified extract from the real program.
from concurrent.futures.thread import ThreadPoolExecutor

def core_inset(n):
    # stub for the real REPLACE INTO writer; here it just echoes its slice
    print(n)
    return n

file_names_father = list(range(100))
# total number of worker threads
T_COUNT = 5
pool = ThreadPoolExecutor(T_COUNT)
# size of the slice handed to each thread
addend = len(file_names_father) // T_COUNT
count_ = 0
count_1 = 0
while count_1 <= T_COUNT:
    if count_1 == T_COUNT:
        # final submit picks up whatever is left after the even slices
        pool.submit(core_inset, file_names_father[count_:])
        count_1 += 1
    else:
        pool.submit(core_inset, file_names_father[count_: count_ + addend])
        count_1 += 1
        count_ += addend
# block until every submitted slice has been processed
pool.shutdown(wait=True)
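In the snippet above core_inset is just a stub. For a sense of what the real REPLACE INTO writer could look like, here is a minimal sketch, assuming pymysql and a hypothetical records(id, payload) table; the key point, matching the comment above, is that every call opens its own connection, so threads never share a connection or cursor:

import pymysql

def core_inset(rows):
    # each call gets its own connection, so worker threads never share one
    conn = pymysql.connect(host='127.0.0.1', user='user',
                           password='secret', database='demo',
                           charset='utf8mb4')
    try:
        with conn.cursor() as cur:
            # hypothetical records(id, payload) table; REPLACE INTO overwrites
            # any row whose primary key already exists
            cur.executemany(
                "REPLACE INTO records (id, payload) VALUES (%s, %s)",
                [(r, str(r)) for r in rows])
        conn.commit()
    finally:
        conn.close()
    return len(rows)

Batching a whole slice through executemany is generally much cheaper than issuing one statement per row, which is where most of the speedup beyond the threading itself comes from.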
# Multiprocessing version
from concurrent.futures.process import ProcessPoolExecutor

def core_inset(n):
    print(n)
    return n

file_names_father = list(range(100))
T_COUNT = 5
# pool = ThreadPoolExecutor(T_COUNT)  # threads swapped out for processes
pool = ProcessPoolExecutor()
addend = len(file_names_father) // T_COUNT
count_ = 0
count_1 = 0
if __name__ == '__main__':  # on Windows the submits must live under this guard (see note below)
    while count_1 <= T_COUNT:
        if count_1 == T_COUNT:
            pool.submit(core_inset, file_names_father[count_:])
            count_1 += 1
        else:
            pool.submit(core_inset, file_names_father[count_: count_ + addend])
            count_1 += 1
            count_ += addend
    pool.shutdown(wait=True)
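One thing the loop above does not do is look at the returned futures. pool.submit returns a Future, and an exception raised inside core_inset is stored on that Future rather than printed, so a failed batch can go unnoticed. A small sketch of collecting the results with as_completed (the range-based slicing is just a more compact way to build the same chunks):

from concurrent.futures import as_completed
from concurrent.futures.process import ProcessPoolExecutor

def core_inset(n):
    print(n)
    return n

if __name__ == '__main__':
    data = list(range(100))
    chunk = len(data) // 5
    with ProcessPoolExecutor(5) as pool:
        futures = [pool.submit(core_inset, data[i:i + chunk])
                   for i in range(0, len(data), chunk)]
        for fut in as_completed(futures):
            # result() re-raises any exception thrown inside core_inset,
            # so a failed batch does not fail silently
            fut.result()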
Note: on Windows the multiprocessing code must sit under the main guard. The docs explain:
"For an explanation of why (on Windows) the if __name__ == '__main__'
part is necessary, see Programming guidelines."
# My understanding: it prevents child processes from being created recursively, because under the spawn start method each child re-imports the main module.
From the programming guidelines, under "Safe importing of main module":
> Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).
# The error you get when the multiprocessing code is not under the main guard:
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
        if __name__ == '__main__':
            freeze_support()
            ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
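As the message says, freeze_support() only matters if the script is later frozen into an executable (e.g. with PyInstaller); when it is needed, it goes right at the top of the main guard. A minimal sketch:

from multiprocessing import freeze_support
from concurrent.futures.process import ProcessPoolExecutor

def core_inset(n):
    print(n)
    return n

if __name__ == '__main__':
    freeze_support()  # no-op unless the program has been frozen into an .exe
    with ProcessPoolExecutor() as pool:
        pool.submit(core_inset, list(range(10)))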