Single-Node Slurm Deployment


1. Install munge

munge is the authentication service Slurm uses internally. Install it along with its libraries and development headers:

yum install -y munge munge-libs munge-devel

Configure it by generating a key:

/usr/sbin/create-munge-key

Start it:

systemctl enable munge --now

2. Install PMIx

PMIx is the process-management layer that MPI jobs use to set up communication.

First install its two dependency libraries:

yum install -y hwloc-devel libevent-devel 

Download the latest release from github.com/openpmix/op…

Extract it, cd into the directory, then build and install:

./configure && make -j $(nproc) && make install

3. Build Slurm

Download the latest source tarball:

github.com/SchedMD/slu…

Extract it and cd in:

./configure && make -j $(nproc) && make install

4. Configure Slurm

In the source directory, doc/html contains an auto-generated configurator page. Copy it locally, open it in a browser, and fill in the fields:

- ClusterName, SlurmctldHost, NodeName: all take the value of hostname
- CPUs, Sockets, CoresPerSocket: fill in from lscpu
- RealMemory: fill in from free -m
- SlurmUser: root for convenience here; in production, use a less-privileged user
- Default MPI Type: optional; PMIX is chosen here
- Process Tracking: LinuxProc
- Resource Selection: Cons_res
- SelectTypeParameters: CR_Memory
- Job Accounting Gather: Linux
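The hardware values above can be read off with a couple of commands (a quick sketch; the exact field labels depend on your distro's lscpu output):

```shell
# CPU topology: gives CPUs, Sockets, CoresPerSocket
lscpu | grep -E '^(CPU\(s\)|Thread|Core|Socket)'

# Total memory in MB, for RealMemory
free -m | awk '/^Mem:/ {print $2}'
```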

Then click Submit to get a configuration file, and write it to /usr/local/etc/slurm.conf.
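For reference, a minimal single-node slurm.conf matching those choices looks roughly like this (hostname spack as in the sinfo output below; the CPU and memory numbers are placeholders, so substitute your own lscpu/free values):

```
ClusterName=spack
SlurmctldHost=spack
SlurmUser=root
MpiDefault=pmix
ProctrackType=proctrack/linuxproc
SelectType=select/cons_res
SelectTypeParameters=CR_Memory
JobAcctGatherType=jobacct_gather/linux
NodeName=spack CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=7821 State=UNKNOWN
PartitionName=test Nodes=spack Default=YES MaxTime=INFINITE State=UP
```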

In the etc/ directory of the source tree:

chmod 755 *.service && cp *.service /etc/systemd/system

Start the daemons:

systemctl enable slurmctld --now
systemctl enable slurmd --now

Check the queue:

$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
test*        up   infinite      1   idle spack

Verify with an MPI job:

$ cat>hello.cpp<<EOF
#include "mpi.h"
#include <iostream>
int main(int argc,  char* argv[])
{
        int rank;
        int size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        std::cout<<"Hello world from process "<<rank<<" of "<<size<<std::endl;

        MPI_Finalize();

        return 0;
}
EOF
$ mpicxx hello.cpp -o hello
$ srun -p test --mpi=pmi2 -n 4 ./hello
Hello world from process 1 of 4
Hello world from process 2 of 4
Hello world from process 0 of 4
Hello world from process 3 of 4
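Beyond srun, the same job can be submitted in batch mode. A minimal script (assuming the hello binary and the test partition from above):

```shell
# Write a minimal batch script; #SBATCH lines are directives
# parsed by sbatch, not by bash.
cat > hello.sbatch <<'EOF'
#!/bin/bash
#SBATCH -p test
#SBATCH -n 4
srun --mpi=pmi2 ./hello
EOF

# Submit it and watch the queue:
# sbatch hello.sbatch && squeue
```

sbatch returns immediately; by default the job's output lands in slurm-&lt;jobid&gt;.out in the submission directory.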