How to run HPL/HPCG/IO500 in WSL
3. Compile HPL
- Get HPL from www.netlib.org/benchmark/h….
wget https://www.netlib.org/benchmark/hpl/hpl-2.3.tar.gz
- Unpack the archive and enter the source directory.
tar -xzvf hpl-2.3.tar.gz && cd hpl-2.3
- Copy a sample configuration from the setup directory.
cp setup/Make.Linux_PII_CBLAS ./
- Edit the Make.Linux_PII_CBLAS file. Any editor works (vi, vim, gedit, nano); I'll use nano as an example.
nano Make.Linux_PII_CBLAS
Change the following lines:
TOPdir = $(HOME)/hpl-2.3
MPdir = /usr/lib/x86_64-linux-gnu/openmpi
MPlib = $(MPdir)/lib/libmpi.so
LAdir = /usr/lib/x86_64-linux-gnu/openblas-pthread
LAlib = $(LAdir)/libopenblas.a $(LAdir)/libblas.a
CC = /usr/bin/mpicc
LINKER = /usr/bin/gfortran
In nano, press Ctrl + o and then Enter to save, and Ctrl + x to exit.
TOPdir must point to the directory where you unpacked HPL; the value above assumes hpl-2.3 sits in your home directory. If you have other MPI or BLAS libraries installed, you also need to point MPdir, MPlib, LAdir and LAlib at the corresponding installation paths and library files.
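If you prefer not to edit the file by hand, the same changes can be scripted. The following is a minimal sketch, assuming GNU sed (the default in WSL Ubuntu) and the paths listed in this step; set_var and apply_hpl_settings are hypothetical helper names, not part of HPL.

```shell
#!/usr/bin/env bash
# Sketch only: set_var and apply_hpl_settings are hypothetical helpers.
set -eu

# Rewrite the value of one "NAME = value" line in a makefile, keeping the
# original spacing before the "=" sign. Requires GNU sed (for -i).
set_var() {
    local file=$1 name=$2 value=$3 escaped
    # Escape \, & and | so sed takes the value literally.
    escaped=$(printf '%s' "$value" | sed 's/[\\&|]/\\&/g')
    sed -i -E "s|^(${name}[[:space:]]*=).*|\1 ${escaped}|" "$file"
}

# Apply the values from this tutorial; adjust them for other MPI/BLAS installs.
apply_hpl_settings() {
    local file=$1
    set_var "$file" TOPdir '$(HOME)/hpl-2.3'
    set_var "$file" MPdir  '/usr/lib/x86_64-linux-gnu/openmpi'
    set_var "$file" MPlib  '$(MPdir)/lib/libmpi.so'
    set_var "$file" LAdir  '/usr/lib/x86_64-linux-gnu/openblas-pthread'
    set_var "$file" LAlib  '$(LAdir)/libopenblas.a $(LAdir)/libblas.a'
    set_var "$file" CC     '/usr/bin/mpicc'
    set_var "$file" LINKER '/usr/bin/gfortran'
}

# Only touch the file if it is present in the current directory.
if [ -f Make.Linux_PII_CBLAS ]; then
    apply_hpl_settings Make.Linux_PII_CBLAS
fi
```

Run it from inside the hpl-2.3 directory after copying the sample configuration. Because the pattern anchors on the variable name followed by "=", setting CC leaves CCNOOPT and CCFLAGS untouched.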
- Compile HPL. The number after -j is how many build jobs make runs in parallel (8 here); a larger number speeds up compilation on a machine with enough cores.
make arch=Linux_PII_CBLAS -j8
Wait for the compilation to finish. If your configuration is correct, you will find HPL.dat and xhpl under bin/Linux_PII_CBLAS.
If you don't find xhpl in bin but can find it in testing, you may have entered a wrong path. If you can't find it even in the testing directory, there may be a configuration problem; please check the Make.Linux_PII_CBLAS file.
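The troubleshooting logic above can be turned into a quick check. This is a rough sketch, assuming GNU find and that you run it from the hpl-2.3 directory; check_hpl_build is a hypothetical helper name, and it searches anywhere under testing rather than assuming an exact location for xhpl there.

```shell
#!/usr/bin/env bash
# Sketch only: check_hpl_build is a hypothetical helper, not part of HPL.
set -eu

# $1 = HPL top directory, $2 = arch name used with "make arch=..."
check_hpl_build() {
    local top=$1 arch=$2
    if [ -x "$top/bin/$arch/xhpl" ] && [ -f "$top/bin/$arch/HPL.dat" ]; then
        echo "ok: xhpl and HPL.dat found in $top/bin/$arch"
    elif [ -d "$top/testing" ] &&
         [ -n "$(find "$top/testing" -type f -name xhpl -print -quit)" ]; then
        # Built, but not placed in bin: a path in the Make file is likely wrong.
        echo "wrong-path: xhpl exists under $top/testing but not in bin"
    else
        # Nothing was built: review the Make.<arch> settings.
        echo "config-problem: xhpl not found, check Make.$arch"
    fi
}

# Defaults: current directory and the arch used in this tutorial.
check_hpl_build "${1:-.}" "${2:-Linux_PII_CBLAS}"
```

The function only reports what it finds; it does not modify anything, so it is safe to run after every build attempt.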
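- To run the sample HPL test, we use mpirun to launch it as multiple processes; the number after -np is the number of processes to start. HPL needs at least P × Q processes for each process grid in HPL.dat, which is 4 in the sample configuration.
cd bin/Linux_PII_CBLAS
touch HPL.out
mpirun -np 8 xhpl
You will get a lot of output; only a small snippet is shown here.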
================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :      29       30       34       35
NB     :       1        2        3        4
PMAP   : Row-major process mapping
P      :       2        1        4
Q      :       2        4        1
PFACT  :    Left    Crout    Right
NBMIN  :       2        4
NDIV   :       2
RFACT  :    Left    Crout    Right
BCAST  :   1ring
DEPTH  :       0
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00L2L2          29     1     2     2               0.00             1.9114e-02
HPL_pdgesv() start time
HPL_pdgesv() end time
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   1.88218349e-02 ...... PASSED
...
...
...
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00R2R4          35     4     4     1               0.00             5.9052e-01
HPL_pdgesv() start time
HPL_pdgesv() end time
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   1.99396688e-02 ...... PASSED
================================================================================

Finished    864 tests with the following results:
            864 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================

We need to focus on the Gflops value; my maximum value in the sample configuration is 1.3211e+00.
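With 864 result blocks, scanning for the best Gflops by eye is tedious. Here is a minimal sketch for pulling out the maximum, assuming you saved the run's output to a file (hpl.log is a hypothetical name, for example via `mpirun -np 8 xhpl | tee hpl.log`) and that result rows look like the ones above: seven fields, starting with a T/V code such as WR00L2L2. max_gflops is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: max_gflops is a hypothetical helper, not part of HPL.
set -eu

# Print the largest Gflops value found in HPL output.
# Reads the files given as arguments, or standard input if none are given.
max_gflops() {
    awk '
        # Result rows: T/V code (W + R or C + digit ...), then N NB P Q Time Gflops.
        $1 ~ /^W[RC][0-9]/ && NF == 7 {
            g = $7 + 0                     # convert "5.9052e-01" to a number
            if (!seen || g > max) { max = g; seen = 1 }
        }
        END { if (seen) printf "%.4e\n", max }
    ' "$@"
}

# Example: report the best result from a saved log, if one exists.
if [ -f hpl.log ]; then
    max_gflops hpl.log
fi
```

If no result rows are found, the function prints nothing, which usually means the run did not get as far as the first test.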