
Device to Device transfers don't work with OpenMPI + LinkX provider on AMD GPUs #13048

Open
angainor opened this issue Jan 22, 2025 · 8 comments

@angainor
OpenMPI 5.0.6 with shm+cxi:lnx fails to perform Device - Device transfers on the LUMI system (AMD GPUs) with the OSU benchmarks. Host - Host transfers work as expected for intra- and inter-node transfers. For Device - Device transfers, OpenMPI fails with:

export FI_LNX_PROV_LINKS=shm+cxi
mpirun --mca opal_common_ofi_provider_include "shm+cxi:lnx" -np 2 -map-by numa ./osu_bibw -m 131072: D D

# OSU MPI-ROCM Bi-Directional Bandwidth Test v7.4
# Datatype: MPI_CHAR.
# Size      Bandwidth (MB/s)
--------------------------------------------------------------------------
Open MPI failed to register your buffer.
This error is fatal, your job will abort

  Buffer Type: rocm
  Buffer Address: 0x154beaa00000
  Buffer Length: 131072
  Error: Required key not available (4294967030)
--------------------------------------------------------------------------

@hppritcha identified the problem as being related to #11076. A fix for this issue exists in #12290, but it was not merged to the 5.x branch.
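For comparison, a minimal sketch of the Host - Host run that is reported above to work, using the same flags as the failing case (the H H arguments select host buffers in the OSU benchmarks):

export FI_LNX_PROV_LINKS=shm+cxi
mpirun --mca opal_common_ofi_provider_include "shm+cxi:lnx" -np 2 -map-by numa ./osu_bibw -m 131072: H H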

@jsquyres (Member)

AMD: Can you reply?

@edgargabriel (Member)

I will need help here from @hppritcha and @amirshehataornl (who developed the linkx provider in libfabric), since I am not that familiar with this code path. If it's as simple as backporting PR #12290, then it shouldn't be a challenge.

@hppritcha (Member)

This should be assigned to @naughtont3

@jsquyres (Member)

Thanks @edgargabriel. We might want to look into this soon so that it can get into 5.0.7 final, if possible.

@tmh97 commented Feb 12, 2025

I am hitting the same issue with CUDA buffers when doing:

export FI_LNX_PROV_LINKS=shm+opx
mpirun --mca opal_common_ofi_provider_include "shm+opx:lnx"

Side note:
I tried the workaround that was applied to cxi here and replaced cxi with shm on the referenced lines. I was able to run osu_bw D D intra-node, but the run only got through 1 byte and then segfaulted.
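A hypothetical sketch of the kind of provider-name check being described; the function name and the substring-vs-exact match choice are assumptions rather than the actual Open MPI source, and only the libfabric fi_info fields are taken as given:

#include <stdbool.h>
#include <string.h>
#include <rdma/fabric.h>

/* Hypothetical illustration only: match the active provider's name.
 * info->fabric_attr->prov_name carries names such as "cxi", "shm", or the
 * LinkX composite "shm+cxi:lnx"; a substring match (as here) also covers
 * the composite name, while an exact strcmp() would not. */
static bool provider_name_matches(const struct fi_info *info, const char *needle)
{
    return strstr(info->fabric_attr->prov_name, needle) != NULL;
}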

@hppritcha (Member)

Does using the opx provider alone work for D to D? Why would you mix opx with shm? It should already be giving good intra-node messaging performance.
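A sketch of the run being asked about, assuming the same benchmark setup tmh97 described (the flags follow the pattern used earlier in this thread; the exact provider-include value is an assumption):

mpirun --mca opal_common_ofi_provider_include "opx" -np 2 ./osu_bw D D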

@edgargabriel (Member) commented Feb 12, 2025

Created the backport of #12290 in #13090.

@jsquyres modified the milestones: v5.0.6, v5.0.8 (Feb 18, 2025)
@tmh97 commented Feb 20, 2025

@hppritcha The OPX provider alone works for D to D intra-node; we have our own SHM FIFO implementation. But I was curious whether the SHM provider has better D to D intra-node performance. It would be convenient for OPX to hook into a shared memory implementation that gives us IPC support and other nice things.
