rping: No such device
# server side
$ rping -d -s -a 0.0.0.0
# client side
$ rping -d -c -a 0.0.0.0
client
port 2
created cm_id 0x55c75f037e30
cma_event type RDMA_CM_EVENT_ADDR_RESOLVED cma_id 0x55c75f037e30 (parent)
rdma_resolve_route: No such device
waiting for addr/route resolution state 11
destroy cm_id 0x55c75f037e30
destroy cm_id 0x55c75f037e30
The cause is the -a argument: changing the client side's -a to this machine's own IP address fixes it. Using localhost produces the same error.
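The fix above can be sketched as a small guard that refuses the addresses reported to fail before building the client command. This is only an illustration: the function names `is_usable_rping_addr` and `rping_client_cmd` are hypothetical, `192.0.2.10` is a documentation placeholder rather than a real host, and rejecting the whole `127.*` range is an assumption extrapolated from the localhost failure, not something verified in these notes.

```shell
# Hypothetical helper: reject addresses that fail with
# "rdma_resolve_route: No such device" according to the note above.
# 0.0.0.0 and localhost come from the note; 127.* is an assumption.
is_usable_rping_addr() {
    case "$1" in
        ""|0.0.0.0|localhost|127.*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical wrapper: print the client command instead of running it,
# so the sketch also works on a machine without an RDMA device.
rping_client_cmd() {
    if is_usable_rping_addr "$1"; then
        printf 'rping -d -c -a %s\n' "$1"
    else
        printf 'refusing to use "%s" as the client address\n' "$1" >&2
        return 1
    fi
}
```

Calling `rping_client_cmd 192.0.2.10` prints the command to run, while `rping_client_cmd 0.0.0.0` and `rping_client_cmd localhost` fail with a message on stderr. Replace the placeholder with the IP address assigned to the RDMA-capable interface.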
rdma_xserver & rdma_xclient: Cannot allocate memory
RoCE does not support XRC; switching to RC makes the problem go away.
According to entry 48 of the Open MPI FAQ, XRC has some known problems:
48. Does Open MPI support XRC? (openib BTL)
Older versions of Open MPI support XRC.
XRC (eXtended Reliable Connection) decreases the memory consumption of Open MPI and improves its scalability by significantly decreasing number of QPs per machine.
XRC is available on Mellanox ConnectX family HCAs with OFED 1.4 and later.
XRC was removed in the middle of multiple release streams (which were effectively concurrent in time) because there were known problems with it and no one was going to fix it. Here are the versions where XRC support was disabled:
- In the 2.0.x series, XRC was disabled in v2.0.4.
- In the 2.1.x series, XRC was disabled in v2.1.2.
- In the 3.0.x series, XRC was disabled prior to the v3.0.0 release.
Specifically: v2.1.1 was the latest release that contained XRC support. Note that it is not known whether it actually works, however.
See this FAQ entry for instructions on how to tell Open MPI to use XRC receive queues.