How to run HPL/HPCG/IO500 in WSL (2) | Youth Training Camp Notes

3. Compile HPL

  1. Get HPL from www.netlib.org/benchmark/h….

    wget https://www.netlib.org/benchmark/hpl/hpl-2.3.tar.gz
    
  2. Unpack the archive and enter the source directory.

    tar -xzvf hpl-2.3.tar.gz && cd hpl-2.3
    
  3. Copy a sample configuration file from the setup directory.

    cp setup/Make.Linux_PII_CBLAS ./
    
  4. Edit the Make.Linux_PII_CBLAS file. You can use any editor (vi/vim, gedit, nano, ...); I'll use nano as an example.

    nano Make.Linux_PII_CBLAS
    

    Change the following lines (TOPdir must point to the directory where you unpacked HPL):

    TOPdir       = $(HOME)/hpl-2.3
    
    MPdir        = /usr/lib/x86_64-linux-gnu/openmpi
    MPlib        = $(MPdir)/lib/libmpi.so
    
    LAdir        = /usr/lib/x86_64-linux-gnu/openblas-pthread
    LAlib        = $(LAdir)/libopenblas.a $(LAdir)/libblas.a
    
    CC           = /usr/bin/mpicc
    LINKER       = /usr/bin/gfortran
    

    In nano, press Ctrl+O and then Enter to save, and Ctrl+X to exit.

    If you use a different MPI or BLAS library, or yours is installed elsewhere, set MPdir/MPlib and LAdir/LAlib to the corresponding installation paths and library files.
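
    If you would rather script these edits than make them in nano, the sketch below is one way to do it. It is a convenience of my own, not part of HPL: set_make_var is a hypothetical helper that uses GNU sed's in-place mode to rewrite a "NAME = value" line, and it assumes each variable appears once at the start of a line, as in the sample Make file.

    ```shell
    # set_make_var FILE NAME VALUE  (hypothetical helper, not part of HPL)
    # Replaces the line "NAME = ..." in FILE with "NAME = VALUE" using GNU sed -i.
    # VALUE must not contain '|', '&' or '\', which are special in this sed expression.
    set_make_var() {
        local file="$1" name="$2" value="$3"
        # Requiring '=' right after NAME (plus optional spaces) means that
        # e.g. CC does not also match CCFLAGS.
        sed -i "s|^${name}[[:space:]]*=.*|${name} = ${value}|" "$file"
    }

    # Example, run inside hpl-2.3 (single quotes keep $(HOME) for make to expand):
    #   set_make_var Make.Linux_PII_CBLAS TOPdir '$(HOME)/hpl-2.3'
    #   set_make_var Make.Linux_PII_CBLAS CC /usr/bin/mpicc
    #   set_make_var Make.Linux_PII_CBLAS LINKER /usr/bin/gfortran
    ```

    Afterwards, grep -E '^(TOPdir|MPdir|MPlib|LAdir|LAlib|CC|LINKER)[[:space:]]*=' Make.Linux_PII_CBLAS prints the resulting values so you can compare them with the list above.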

  5. Compile HPL. The -j8 option lets make run up to 8 jobs in parallel; a larger number (up to your CPU core count) generally speeds up compilation.

    make arch=Linux_PII_CBLAS -j8
    

    Wait for compilation to finish. If your configuration is correct, you will find HPL.dat and xhpl under bin/Linux_PII_CBLAS.

    If xhpl is not in bin but you can find it under the testing directory, you may have entered the wrong path. If it is not in testing either, there is probably a configuration problem; check the Make.Linux_PII_CBLAS file.
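
    A quick first check is whether every file named in the Make file actually exists. The sketch below assumes the Ubuntu OpenMPI/OpenBLAS locations used in the configuration above; check_paths is a hypothetical helper, so substitute your own paths.

    ```shell
    # check_paths PATH...  (hypothetical helper)
    # Prints "missing: PATH" for every argument that does not exist and
    # returns 1 if anything is missing, 0 otherwise.
    check_paths() {
        local missing=0 p
        for p in "$@"; do
            if [ ! -e "$p" ]; then
                echo "missing: $p"
                missing=1
            fi
        done
        return "$missing"
    }

    # Example with the values from Make.Linux_PII_CBLAS above:
    #   check_paths /usr/bin/mpicc /usr/bin/gfortran \
    #       /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so \
    #       /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblas.a
    ```

    If everything exists, rebuilding with make arch=Linux_PII_CBLAS 2>&1 | tee build.log (without -j, so messages from parallel jobs are not interleaved) and searching build.log for "error" usually points to the step that failed.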

  6. To run the sample HPL test, use mpirun to launch it as multiple MPI processes; the number after -np is the number of processes to start.

    cd bin/Linux_PII_CBLAS
    touch HPL.out
    mpirun -np 8 xhpl
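
    The sample HPL.dat defines three process grids (P x Q = 2 x 2, 1 x 4 and 4 x 1, matching the P and Q rows in the output below), so each grid uses exactly 4 processes; with -np 8, the processes outside the grid are left idle. The sketch below prints the largest P x Q in an HPL.dat, assuming the layout of the sample file (line 10 holds the number of grids, line 11 the Ps, line 12 the Qs); hpl_min_np is a hypothetical helper name.

    ```shell
    # hpl_min_np [FILE]  (hypothetical helper; FILE defaults to HPL.dat)
    # Prints the largest P*Q over all process grids, i.e. the smallest -np
    # that can run every grid. Assumes the sample HPL.dat layout:
    # line 10 = number of grids, line 11 = Ps, line 12 = Qs.
    hpl_min_np() {
        awk 'NR == 10 { n = $1 }
             NR == 11 { for (i = 1; i <= n; i++) p[i] = $i }
             NR == 12 { for (i = 1; i <= n; i++) if (p[i] * $i > max) max = p[i] * $i }
             END      { print max + 0 }' "${1:-HPL.dat}"
    }

    # Example, inside bin/Linux_PII_CBLAS:
    #   mpirun -np "$(hpl_min_np)" ./xhpl
    ```

    On a WSL machine with fewer cores than -np, Open MPI's mpirun may refuse to start with an error saying there are not enough slots available; adding --oversubscribe (for example mpirun --oversubscribe -np 8 ./xhpl) allows the run, at the cost of processes sharing cores.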
    

    This produces a lot of output; here is a short excerpt.

    ================================================================================
    HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
    Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
    Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
    Modified by Julien Langou, University of Colorado Denver
    ================================================================================
    
    An explanation of the input/output parameters follows:
    T/V    : Wall time / encoded variant.
    N      : The order of the coefficient matrix A.
    NB     : The partitioning blocking factor.
    P      : The number of process rows.
    Q      : The number of process columns.
    Time   : Time in seconds to solve the linear system.
    Gflops : Rate of execution for solving the linear system.
    
    The following parameter values will be used:
    
    N      :      29       30       34       35
    NB     :       1        2        3        4
    PMAP   : Row-major process mapping
    P      :       2        1        4
    Q      :       2        4        1
    PFACT  :    Left    Crout    Right
    NBMIN  :       2        4
    NDIV   :       2
    RFACT  :    Left    Crout    Right
    BCAST  :   1ring
    DEPTH  :       0
    SWAP   : Mix (threshold = 64)
    L1     : transposed form
    U      : transposed form
    EQUIL  : yes
    ALIGN  : 8 double precision words
    
    --------------------------------------------------------------------------------
    
    - The matrix A is randomly generated for each test.
    - The following scaled residual check will be computed:
          ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
    - The relative machine precision (eps) is taken to be               1.110223e-16
    - Computational tests pass if scaled residuals are less than                16.0
    
    ================================================================================
    T/V                N    NB     P     Q               Time                 Gflops
    --------------------------------------------------------------------------------
    WR00L2L2          29     1     2     2               0.00             1.9114e-02
    HPL_pdgesv() start time
    
    HPL_pdgesv() end time
    
    --------------------------------------------------------------------------------
    ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   1.88218349e-02 ...... PASSED
    
    ...
    ...
    ...
    
    ================================================================================
    T/V                N    NB     P     Q               Time                 Gflops
    --------------------------------------------------------------------------------
    WR00R2R4          35     4     4     1               0.00             5.9052e-01
    HPL_pdgesv() start time
    
    HPL_pdgesv() end time
    
    --------------------------------------------------------------------------------
    ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   1.99396688e-02 ...... PASSED
    ================================================================================
    
    Finished    864 tests with the following results:
                864 tests completed and passed residual checks,
                  0 tests completed and failed residual checks,
                  0 tests skipped because of illegal input values.
    --------------------------------------------------------------------------------
    
    End of Tests.
    ================================================================================
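
    The count of 864 is consistent with HPL running every combination of the parameter lists in HPL.dat. Multiplying the list lengths shown in the "parameter values" header above (4 values of N, 4 of NB, 3 grids, 3 PFACT, 2 NBMIN, 1 NDIV, 3 RFACT, 1 BCAST, 1 DEPTH) reproduces it:

    ```latex
    \text{tests} = \underbrace{4}_{N} \cdot \underbrace{4}_{NB} \cdot \underbrace{3}_{P \times Q} \cdot \underbrace{3}_{\text{PFACT}} \cdot \underbrace{2}_{\text{NBMIN}} \cdot \underbrace{1}_{\text{NDIV}} \cdot \underbrace{3}_{\text{RFACT}} \cdot \underbrace{1}_{\text{BCAST}} \cdot \underbrace{1}_{\text{DEPTH}} = 864
    ```

    Shortening any of these lists in HPL.dat shrinks the run accordingly.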
    

    The value to focus on is Gflops; the highest value I got with the sample configuration was 1.3211e+00.
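
    With 864 result blocks, scanning for the best Gflops by eye is tedious. A minimal sketch, assuming the output was saved to a file (for example mpirun -np 8 xhpl | tee hpl-run.log, where the log name is arbitrary): result rows start with WR or WC (the R or C encodes the process mapping) and Gflops is the 7th column, as in the excerpt above. hpl_best_gflops is a hypothetical helper, and sort -g is GNU sort's general-numeric mode, which understands e-notation.

    ```shell
    # hpl_best_gflops LOGFILE  (hypothetical helper)
    # Result rows look like "WR00L2L2  29  1  2  2  0.00  1.9114e-02", i.e.
    # T/V, N, NB, P, Q, Time, Gflops, so Gflops is field 7.
    hpl_best_gflops() {
        awk '$1 ~ /^W[RC]/ && NF >= 7 { print $7 }' "$1" | sort -g | tail -n 1
    }

    # Example:
    #   mpirun -np 8 xhpl | tee hpl-run.log
    #   hpl_best_gflops hpl-run.log
    ```

    For context, HPL computes Gflops as (2/3·N³ + 3/2·N²) / (Time · 10⁹). Because the sample HPL.dat only uses N between 29 and 35, each run finishes almost instantly (Time shows 0.00), so these figures mainly confirm that the build works; a representative number needs a much larger N set in HPL.dat.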