Environment: CentOS 7.9
VMware Pro 16.2.5
parallel_studio_xe_2020_update4_cluster_edition is already installed and its environment variables are configured; see my earlier article
Installing lammps-3Mar2020.tar.gz
Build libfftw3xf_intel.a
In your own install path, ~/intel/compilers_and_libraries_2020.4.304/linux/mkl/interfaces/fftw3xf, run make libintel64 to build libfftw3xf_intel.a
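As a rough sketch of this step, the following runs the build and then looks for the resulting archive. The `find_lib` helper is a hypothetical name introduced here, and because the output location may depend on the MKL version and makefile settings, the sketch searches the whole MKL tree instead of assuming a directory.

```shell
#!/bin/sh
# find_lib ROOT NAME: print the first regular file called NAME under ROOT.
# (hypothetical helper, not part of MKL or LAMMPS)
find_lib() {
    find "$1" -type f -name "$2" 2>/dev/null | head -n 1
}

# Path taken from the text above; adjust it to your own install location.
MKL_DIR="$HOME/intel/compilers_and_libraries_2020.4.304/linux/mkl"

if [ -d "$MKL_DIR/interfaces/fftw3xf" ]; then
    # Build in a subshell so the caller's working directory is unchanged.
    (cd "$MKL_DIR/interfaces/fftw3xf" && make libintel64)
    lib=$(find_lib "$MKL_DIR" libfftw3xf_intel.a)
    if [ -n "$lib" ]; then
        echo "built: $lib"
    else
        echo "libfftw3xf_intel.a not found under $MKL_DIR" >&2
    fi
else
    echo "directory not found: $MKL_DIR/interfaces/fftw3xf" >&2
fi
```

If the directory is missing, the sketch only prints a message and does nothing else, so it is safe to run before the Intel suite is installed.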
Extract and build
Extract the LAMMPS archive with tar -xvf lammps-*****, then change into lammps-3Mar20/src
Run the following commands in order:
make yes-all
make no-lib
make -j n intel_cpu_intelmpi (n is the number of CPU cores)
When the build finishes you get the executable: lmp_intel_cpu_intelmpi
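A minimal sketch of the build commands above that lets the shell choose n. It assumes `nproc` (GNU coreutils) is available and falls back to `getconf`; `core_count` and `LAMMPS_SRC` are names made up for this example, and the default path is an assumption based on the /share/lammps-3Mar20 location used in the test section.

```shell
#!/bin/sh
# core_count: print the number of online CPU cores (hypothetical helper).
core_count() {
    n=$(nproc 2>/dev/null) || n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    echo "$n"
}

# Assumed location of the unpacked source tree; change it to match yours.
LAMMPS_SRC="${LAMMPS_SRC:-/share/lammps-3Mar20/src}"

if [ -d "$LAMMPS_SRC" ]; then
    (
        cd "$LAMMPS_SRC" || exit 1
        make yes-all                                  # enable all packages
        make no-lib                                   # disable packages that need extra libraries
        make -j "$(core_count)" intel_cpu_intelmpi    # parallel build
        # Confirm that the executable was produced.
        if [ -x lmp_intel_cpu_intelmpi ]; then
            echo "build OK: $LAMMPS_SRC/lmp_intel_cpu_intelmpi"
        else
            echo "build failed: lmp_intel_cpu_intelmpi not found" >&2
        fi
    )
else
    echo "source directory not found: $LAMMPS_SRC" >&2
fi
```

On a virtual machine, `core_count` reports the cores assigned to the guest, which may be fewer than the host has.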
Test
mkdir lammps_test && cd lammps_test
[root@mgt lammps_test]# cp -r /share/lammps-3Mar20/examples/shear . && cd shear/
[root@mgt shear]# mpirun -np 4 /share/lammps-3Mar20/src/lmp_intel_cpu_intelmpi < in.shear
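The number after -np is the number of MPI processes to launch (the log below shows 4 MPI tasks with 1 OpenMP thread each); set it according to your own hardware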
LAMMPS (3 Mar 2020)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (../comm.cpp:94)
using 1 OpenMP thread(s) per MPI task
Lattice spacing in x,y,z = 3.52 3.52 3.52
Created orthogonal box = (0 0 0) to (56.32 35.2 9.95606)
2 by 2 by 1 MPI processor grid
Lattice spacing in x,y,z = 3.52 4.97803 4.97803
Created 1912 atoms
create_atoms CPU = 0.00295113 secs
Reading potential file Ni_u3.eam with DATE: 2007-06-11
264 atoms in group lower
264 atoms in group upper
528 atoms in group boundary
1384 atoms in group mobile
Setting atom values ...
264 settings made for type
Setting atom values ...
264 settings made for type
WARNING: Temperature for thermo pressure is not for group all (../thermo.cpp:485)
Neighbor list info ...
update every 1 steps, delay 5 steps, check yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 5.1
ghost atom cutoff = 5.1
binsize = 2.55, bins = 23 14 4
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair eam, perpetual
attributes: half, newton on
pair build: half/bin/atomonly/newton
stencil: half/bin/3d/newton
bin: standard
Setting up Verlet run ...
Unit style : metal
Current step : 0
Time step : 0.001
Per MPI rank memory allocation (min/avg/max) = 3.372 | 3.372 | 3.372 Mbytes
Step Temp E_pair E_mol TotEng Press Volume
0 300 -8317.4367 0 -8263.8067 -7100.7667 19547.02
25 219.81848 -8272.1577 0 -8232.8615 5206.8057 19547.02
50 300 -8238.3413 0 -8184.7112 13308.809 19688.933
75 294.78636 -8232.2217 0 -8179.5237 13192.782 19748.176
100 300 -8248.1223 0 -8194.4923 7352.0246 19816.321
Loop time of 0.0675873 on 4 procs for 100 steps with 1912 atoms
Performance: 127.835 ns/day, 0.188 hours/ns, 1479.568 timesteps/s
90.3% CPU use with 4 MPI tasks x 1 OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 0.056151 | 0.057039 | 0.058211 | 0.3 | 84.39
Neigh | 0.0018907 | 0.001902 | 0.0019073 | 0.0 | 2.81
Comm | 0.0053503 | 0.0066736 | 0.0077041 | 1.1 | 9.87
Output | 9.3606e-05 | 0.0001099 | 0.00014841 | 0.0 | 0.16
Modify | 0.00062962 | 0.00069847 | 0.00081757 | 0.0 | 1.03
Other | | 0.001165 | | | 1.72
Nlocal: 478 ave 490 max 466 min
Histogram: 1 0 1 0 0 0 0 1 0 1
Nghost: 1036.25 ave 1046 max 1027 min
Histogram: 1 1 0 0 0 0 0 1 0 1
Neighs: 11488 ave 11948 max 11157 min
Histogram: 1 0 1 0 1 0 0 0 0 1
Total # of neighbors = 45952
Ave neighs/atom = 24.0335
Neighbor list builds = 4
Dangerous builds = 0
WARNING: Temperature for thermo pressure is not for group all (../thermo.cpp:485)
Setting up Verlet run ...
Unit style : metal
Current step : 0
Time step : 0.001
Per MPI rank memory allocation (min/avg/max) = 3.372 | 3.372 | 3.372 Mbytes
Step Temp E_pair E_mol TotEng Press Volume
0 302.29407 -8248.1223 0 -8212.0956 6393.6774 19845.81
100 291.61298 -8259.5472 0 -8224.7933 -1300.9229 19874.36
200 293.36405 -8256.9998 0 -8222.0373 -799.49219 19965.148
300 305.94188 -8252.9181 0 -8216.4566 -1335.0012 20062.063
400 309.95918 -8247.5756 0 -8210.6354 -1062.2448 20094.446
500 301.94062 -8239.3596 0 -8203.375 797.08496 20172.635
600 302.21507 -8230.7027 0 -8194.6854 3987.1988 20265.23
700 296.32595 -8221.2036 0 -8185.8881 5409.7911 20394.703
800 291.23487 -8207.8671 0 -8173.1583 10667.09 20510.74
900 297.88948 -8196.1164 0 -8160.6146 13967.96 20646.32
1000 301.54921 -8182.0007 0 -8146.0627 17939.885 20752.586
1100 308.95153 -8164.9247 0 -8128.1046 22823.971 20889.388
1200 301.95399 -8153.476 0 -8117.4898 25618.698 21000.539
1300 300 -8143.3818 0 -8107.6284 26668.263 21122.684
1400 300 -8136.2928 0 -8100.5395 26328.325 21252.157
1500 300 -8132.5465 0 -8096.7931 23584.447 21379.187
1600 300 -8129.9298 0 -8094.1764 20684.486 21497.667
1700 300 -8131.655 0 -8095.9016 15384.272 21617.369
1800 300 -8149.3135 0 -8113.5601 9698.7054 21738.292
1900 300 -8156.1776 0 -8120.4243 9887.2669 21861.658
2000 300 -8161.9857 0 -8126.2324 8382.4517 21988.688
2100 300 -8163.9644 0 -8128.211 5288.1872 22107.168
2200 309.9432 -8171.1806 0 -8134.2422 331.97612 22234.198
2300 300 -8173.679 0 -8137.9256 -2756.1784 22346.571
2400 300 -8183.2429 0 -8147.4895 -6494.1612 22472.38
2500 309.13407 -8186.7918 0 -8149.9499 -8827.4368 22599.41
2600 299.71761 -8177.7445 0 -8142.0248 -7906.1647 22721.555
2700 300 -8174.4672 0 -8138.7138 -8920.5441 22832.706
2800 306.09492 -8173.4147 0 -8136.935 -10981.226 22960.958
2900 303.27397 -8168.2141 0 -8132.0706 -8905.5017 23078.216
3000 301.48023 -8165.8151 0 -8129.8854 -10668.385 23201.582
Loop time of 2.17941 on 4 procs for 3000 steps with 1912 atoms
Performance: 118.931 ns/day, 0.202 hours/ns, 1376.521 timesteps/s
94.2% CPU use with 4 MPI tasks x 1 OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 1.8085 | 1.8659 | 1.9289 | 3.1 | 85.61
Neigh | 0.11359 | 0.11912 | 0.12499 | 1.2 | 5.47
Comm | 0.079575 | 0.15125 | 0.21504 | 12.4 | 6.94
Output | 0.00070049 | 0.00082434 | 0.0011907 | 0.0 | 0.04
Modify | 0.019464 | 0.01998 | 0.020798 | 0.4 | 0.92
Other | | 0.02237 | | | 1.03
Nlocal: 478 ave 509 max 446 min
Histogram: 2 0 0 0 0 0 0 0 0 2
Nghost: 1009.5 ave 1054 max 963 min
Histogram: 2 0 0 0 0 0 0 0 0 2
Neighs: 11210.5 ave 12215 max 10197 min
Histogram: 1 0 1 0 0 0 0 1 0 1
Total # of neighbors = 44842
Ave neighs/atom = 23.4529
Neighbor list builds = 225
Dangerous builds = 0
Total wall time: 0:00:02
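To compare runs, it can help to pull the timing summary out of a saved log. The sketch below assumes the output above was saved to a file, for example by appending `> shear.log` to the `mpirun` command (a file name chosen here for illustration). The hypothetical `summarize_log` function prints the step count, loop time in seconds and ns/day for each run section.

```shell
#!/bin/sh
# summarize_log FILE: print "steps loop_time_s ns_per_day" for every run in a LAMMPS log.
summarize_log() {
    awk '
        /^Loop time of/ { t = $4; steps = $9 }   # e.g. "Loop time of 2.17941 on 4 procs for 3000 steps ..."
        /^Performance:/ { print steps, t, $2 }   # e.g. "Performance: 118.931 ns/day, ..."
    ' "$1"
}

# Only run the summary if the assumed log file exists.
if [ -f shear.log ]; then
    summarize_log shear.log
fi
```

For the log shown above this would print one line per run: `100 0.0675873 127.835` and `3000 2.17941 118.931`.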
Test script (Slurm batch job)
#!/bin/sh
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --ntasks-per-node=1
#SBATCH --partition=normal
#SBATCH --output=%j.out
#SBATCH --error=%j.err
#source /share/intel/parallel_studio_xe_2020.4.912/bin/psxevars.sh intel64
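# The environment is already configured system-wide by default, so the source line above can be left commented or uncommented without any effect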
export PATH=/share/lammps-3Mar20/src:$PATH
mpirun -np $SLURM_NTASKS /share/lammps-3Mar20/src/lmp_intel_cpu_intelmpi < in.shear
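The script above requests a single task, so `$SLURM_NTASKS` expands to 1. As a sketch of how to scale it up, the hypothetical `write_job` function below writes a variant for a chosen number of tasks on one node. The partition name and paths are copied from the script above. Setting `OMP_NUM_THREADS=1` explicitly is an addition that should avoid the "OMP_NUM_THREADS environment is not set" message seen in the test output.

```shell
#!/bin/sh
# write_job NTASKS FILE: write a single-node Slurm batch script for NTASKS MPI processes.
# (hypothetical helper; partition and paths are taken from the script in the text)
write_job() {
    ntasks=$1
    out=$2
    # Unquoted heredoc: $ntasks is expanded now, \$SLURM_NTASKS is left for Slurm to set.
    cat > "$out" <<EOF
#!/bin/sh
#SBATCH -N 1
#SBATCH -n $ntasks
#SBATCH --ntasks-per-node=$ntasks
#SBATCH --partition=normal
#SBATCH --output=%j.out
#SBATCH --error=%j.err
export OMP_NUM_THREADS=1
mpirun -np \$SLURM_NTASKS /share/lammps-3Mar20/src/lmp_intel_cpu_intelmpi < in.shear
EOF
}
```

For example, `write_job 4 shear_4.slurm` followed by `sbatch shear_4.slurm` from the directory containing in.shear would submit a 4-process run, provided the node has at least 4 cores.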