How to run HPL/HPCG/IO500 in WSL (3) | Youth Training Camp Notes



4. Performance Tuning

0. Tuning is an iterative process of experimentation, so be patient.

1. Adjust the HPL.dat file

You can refer to www.netlib.org/benchmark/h… when adjusting HPL.dat, or generate one directly from www.advancedclustering.com/act_kb/tune… using inputs such as these:

Input                   Value
Nodes                   1
Cores per Node          1
Memory per Node (MB)    512
Block Size (NB)         192

Output:

 HPLinpack benchmark input file
 Innovative Computing Laboratory, University of Tennessee
 HPL.out      output file name (if any) 
 6            device out (6=stdout,7=stderr,file)
 1            # of problems sizes (N)
 7296         Ns
 1            # of NBs
 192          NBs
 0            PMAP process mapping (0=Row-,1=Column-major)
 1            # of process grids (P x Q)
 1            Ps
 1            Qs
 16.0         threshold
 1            # of panel fact
 2            PFACTs (0=left, 1=Crout, 2=Right)
 1            # of recursive stopping criterium
 4            NBMINs (>= 1)
 1            # of panels in recursion
 2            NDIVs
 1            # of recursive panel fact.
 1            RFACTs (0=left, 1=Crout, 2=Right)
 1            # of broadcast
 1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
 1            # of lookahead depth
 1            DEPTHs (>=0)
 2            SWAP (0=bin-exch,1=long,2=mix)
 64           swapping threshold
 0            L1 in (0=transposed,1=no-transposed) form
 0            U  in (0=transposed,1=no-transposed) form
 1            Equilibration (0=no,1=yes)
 8            memory alignment in double (> 0)
 ##### This line (no. 32) is ignored (it serves as a separator). ######
 0                               Number of additional problem sizes for PTRANS
 1200 10000 30000                values of N
 0                               number of additional blocking sizes for PTRANS
 40 9 8 13 13 20 16 32 64        values of NB
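The Ns value of 7296 is consistent with a common rule of thumb: give the N × N matrix of 8-byte doubles about 80% of the available memory, then round N down to a multiple of NB. The sketch below reproduces that number, assuming the generator treats "MB" as MiB (1024 × 1024 bytes); the 0.8 fraction and the function names are my own assumptions, not the generator's actual code. It also picks a P × Q grid for a given process count, following the usual advice of a near-square grid with P ≤ Q. HPL uses P × Q processes, so mpirun must start at least that many; as far as I can tell, any extra ranks simply sit out of the computation.

```python
import math


def problem_size(mem_mib_per_node: float, nodes: int, nb: int,
                 fraction: float = 0.8) -> int:
    """Largest N (a multiple of NB) whose N x N matrix of 8-byte doubles
    fits in `fraction` of the total memory. The 0.8 default is an assumed
    rule of thumb, not a value taken from HPL itself."""
    total_bytes = mem_mib_per_node * 1024 * 1024 * nodes  # assumes MB means MiB
    n = int(math.sqrt(fraction * total_bytes / 8))         # 8 bytes per double
    return (n // nb) * nb                                   # round down to NB


def process_grid(nprocs: int) -> tuple[int, int]:
    """Return (P, Q) with P * Q == nprocs, P <= Q, as close to square as possible."""
    if nprocs < 1:
        raise ValueError("need at least one process")
    p = math.isqrt(nprocs)
    while nprocs % p != 0:  # walk down to the nearest divisor of nprocs
        p -= 1
    return p, nprocs // p


if __name__ == "__main__":
    print(problem_size(512, 1, 192))  # the generator inputs used above
    print(process_grid(4))
```

For example, problem_size(512, 1, 192) returns 7296, matching the generated file, and process_grid(4) returns (2, 2).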

Replace HPL.dat with the generated file and run xhpl again:

 mpirun -np 4 xhpl

Output (partial):

 ================================================================================
 HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
 Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
 Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
 Modified by Julien Langou, University of Colorado Denver
 ================================================================================

 An explanation of the input/output parameters follows:
 T/V    : Wall time / encoded variant.
 N      : The order of the coefficient matrix A.
 NB     : The partitioning blocking factor.
 P      : The number of process rows.
 Q      : The number of process columns.
 Time   : Time in seconds to solve the linear system.
 Gflops : Rate of execution for solving the linear system.

 The following parameter values will be used:

 N      :    7296
 NB     :     192
 PMAP   : Row-major process mapping
 P      :       1
 Q      :       1
 PFACT  :   Right
 NBMIN  :       4
 NDIV   :       2
 RFACT  :   Crout
 BCAST  :  1ringM
 DEPTH  :       1
 SWAP   : Mix (threshold = 64)
 L1     : transposed form
 U      : transposed form
 EQUIL  : yes
 ALIGN  : 8 double precision words

 --------------------------------------------------------------------------------

 - The matrix A is randomly generated for each test.
 - The following scaled residual check will be computed:
       ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
 - The relative machine precision (eps) is taken to be               1.110223e-16
 - Computational tests pass if scaled residuals are less than                16.0

 ================================================================================
 T/V                N    NB     P     Q               Time                 Gflops
 --------------------------------------------------------------------------------
 WR11C2R4        7296   192     1     1               1.82             1.4243e+02
 HPL_pdgesv() start time Fri May 19 23:52:43 2023

 HPL_pdgesv() end time   Fri May 19 23:52:45 2023

 --------------------------------------------------------------------------------
 ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   3.98844764e-03 ...... PASSED
 ================================================================================

 Finished      1 tests with the following results:
               1 tests completed and passed residual checks,
               0 tests completed and failed residual checks,
               0 tests skipped because of illegal input values.
 --------------------------------------------------------------------------------

 End of Tests.
 ================================================================================

Compared with the first run, the score has improved by 10781%.

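Then I modified the following lines. Note that HPL reads only as many values as the preceding "# of ..." line specifies, so those counts must be raised as well (3 PFACTs, 2 NBMINs, 3 RFACTs, 2 BCASTs); otherwise only the first value of each list is used: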

 file         device out (6=stdout,7=stderr,file)
 2            # of problems sizes (N)
 16384 20352  Ns
 0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
 2 8          NBMINs (>= 1)
 0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
 3 2          BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
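Because each list-valued parameter in HPL.dat is preceded by a count line, it is easy to change a value list and forget its count. One way to avoid that is to generate the file from a script that derives every "# of ..." line from the length of its list. The sketch below is a hypothetical helper of my own, not part of HPL; the fixed settings (PMAP, threshold, SWAP, L1/U form, equilibration, alignment) and the defaults for the list parameters are copied from the generated HPL.dat earlier in this post.

```python
from typing import Sequence


def _line(value, label: str) -> str:
    # HPL reads only the leading value(s) on each line; the rest is a comment.
    # Pad to a column, but always keep at least one space before the label.
    return f"{value!s:<12} {label}"


def _join(values: Sequence) -> str:
    return " ".join(str(v) for v in values)


def _counted(values: Sequence, count_label: str, values_label: str) -> list:
    # Emit the "# of ..." line from len(values), followed by the values themselves.
    return [_line(len(values), count_label), _line(_join(values), values_label)]


def render_hpl_dat(ns, nbs, ps, qs, pfacts=(2,), nbmins=(4,), ndivs=(2,),
                   rfacts=(1,), bcasts=(1,), depths=(1,),
                   device_out=6, out_file="HPL.out") -> str:
    """Render the 31 lines of HPL.dat that HPL reads."""
    if len(ps) != len(qs):
        raise ValueError("Ps and Qs must have the same number of entries")
    lines = [
        "HPLinpack benchmark input file",
        "Innovative Computing Laboratory, University of Tennessee",
        _line(out_file, "output file name (if any)"),
        _line(device_out, "device out (6=stdout,7=stderr,file)"),
        *_counted(ns, "# of problems sizes (N)", "Ns"),
        *_counted(nbs, "# of NBs", "NBs"),
        _line(0, "PMAP process mapping (0=Row-,1=Column-major)"),
        _line(len(ps), "# of process grids (P x Q)"),  # one count covers Ps and Qs
        _line(_join(ps), "Ps"),
        _line(_join(qs), "Qs"),
        _line(16.0, "threshold"),
        *_counted(pfacts, "# of panel fact", "PFACTs (0=left, 1=Crout, 2=Right)"),
        *_counted(nbmins, "# of recursive stopping criterium", "NBMINs (>= 1)"),
        *_counted(ndivs, "# of panels in recursion", "NDIVs"),
        *_counted(rfacts, "# of recursive panel fact.", "RFACTs (0=left, 1=Crout, 2=Right)"),
        *_counted(bcasts, "# of broadcast", "BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)"),
        *_counted(depths, "# of lookahead depth", "DEPTHs (>=0)"),
        _line(2, "SWAP (0=bin-exch,1=long,2=mix)"),
        _line(64, "swapping threshold"),
        _line(0, "L1 in (0=transposed,1=no-transposed) form"),
        _line(0, "U  in (0=transposed,1=no-transposed) form"),
        _line(1, "Equilibration (0=no,1=yes)"),
        _line(8, "memory alignment in double (> 0)"),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The lists from the modified lines above; redirect stdout to HPL.dat to use it.
    print(render_hpl_dat(ns=[16384, 20352], nbs=[192], ps=[1], qs=[1],
                         pfacts=[0, 1, 2], nbmins=[2, 8], rfacts=[0, 1, 2],
                         bcasts=[3, 2]), end="")
```

With those lists the script writes counts of 2, 3, 2, 3 and 2, so HPL runs every combination: 2 × 3 × 2 × 3 × 2 = 72 tests.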

Running xhpl again produces the following results:

 ================================================================================
 T/V                N    NB     P     Q               Time                 Gflops
 --------------------------------------------------------------------------------
 WR13L2L2       16384   192     1     1              10.85             2.7028e+02
 --------------------------------------------------------------------------------
 ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   2.48156500e-03 ...... PASSED
 ================================================================================
 T/V                N    NB     P     Q               Time                 Gflops
 --------------------------------------------------------------------------------
 WR13L2L2       20352   192     1     1              20.03             2.8064e+02
 --------------------------------------------------------------------------------
 ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   1.90299920e-03 ...... PASSED
 ================================================================================

The score is now 22253% higher than in the first run, and 2.8064e+02 Gflops would have been enough to enter the June 2003 TOP500 list.
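With several values per parameter, HPL prints one result row per combination, so it helps to scan the output for the fastest one. The sketch below is a hypothetical helper of my own: it matches result rows by their shape (an encoded variant starting with W, then N, NB, P, Q, Time and Gflops), picks the row with the highest Gflops, and reads "improved by X%" as (new / old − 1) × 100. The regular expression is written against the output format shown in this post and may need adjusting for other HPL builds.

```python
import re

# One HPL result row: T/V, N, NB, P, Q, Time (s), Gflops.
ROW = re.compile(
    r"^\s*(W\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+([\d.]+)\s+([\d.]+(?:[eE][+-]?\d+)?)\s*$"
)


def parse_results(text: str) -> list:
    """Return one dict per result row found in HPL output text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            tv, n, nb, p, q, secs, gflops = m.groups()
            rows.append({"tv": tv, "n": int(n), "nb": int(nb), "p": int(p),
                         "q": int(q), "time": float(secs), "gflops": float(gflops)})
    return rows


def best(rows: list) -> dict:
    """Row with the highest Gflops."""
    return max(rows, key=lambda r: r["gflops"])


def improvement_pct(new: float, old: float) -> float:
    """Percentage improvement of `new` over `old`."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    sample = """\
WR13L2L2       16384   192     1     1              10.85             2.7028e+02
WR13L2L2       20352   192     1     1              20.03             2.8064e+02
"""
    top = best(parse_results(sample))
    print(top["tv"], top["n"], top["gflops"])
    print(f"{improvement_pct(280.64, 142.43):.1f}%")
```

Applied to the two rows above, best() picks the N = 20352 run, and improvement_pct(280.64, 142.43) is roughly 97%, so the second tuning round nearly doubled the first tuned result.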
