Frontier GPU-aware MPI #405

Closed
sbryngelson opened this issue May 3, 2024 · 2 comments · Fixed by #448
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

sbryngelson commented May 3, 2024

Support and document GPU-aware MPI on Frontier.

  • If not oversubscribing the GPUs, make sure we launch with the srun flags -c 7 --gpus-per-task=1 --gpu-bind=closest. Affinity is very important here because the NICs are connected directly to the GPUs, so each rank should be bound to the GPU closest to its cores.

  • Since the build already includes everything needed, enabling this should just be a matter of setting the runtime flags.

    • Runtime flag to set: MPICH_GPU_SUPPORT_ENABLED=1
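
A minimal sketch of how the settings above might be combined in a job script. The task count and the executable name (./simulation) are hypothetical placeholders, and the DRY_RUN switch is an illustrative convenience, not part of any existing tooling; only the srun flags and MPICH_GPU_SUPPORT_ENABLED=1 come from this issue.

```shell
# Runtime flag from this issue: enable GPU-aware MPI.
export MPICH_GPU_SUPPORT_ENABLED=1

# Hypothetical placeholders; adjust to the actual job.
NTASKS="${NTASKS:-8}"
EXE="${EXE:-./simulation}"

# Affinity flags from this issue: 7 cores per task, one GPU per task,
# each task bound to its closest GPU.
SRUN_FLAGS="-c 7 --gpus-per-task=1 --gpu-bind=closest"

if [ "${DRY_RUN:-1}" = "0" ]; then
  # Word splitting of SRUN_FLAGS is intentional here.
  srun -n "$NTASKS" $SRUN_FLAGS "$EXE"
else
  # Default: only print the command that would be run.
  echo "MPICH_GPU_SUPPORT_ENABLED=$MPICH_GPU_SUPPORT_ENABLED srun -n $NTASKS $SRUN_FLAGS $EXE"
fi
```

By default the script only prints the launch line; setting DRY_RUN=0 inside an allocation would actually invoke srun.
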
sbryngelson added the enhancement label May 3, 2024
sbryngelson (Member, Author) commented

From some old 2021 docs:

• Set the environment variable CRAY_ACC_USE_UNIFIED_MEM=1
• CCE offloading runtime library will auto-detect user-allocations of pinned or managed memory
• No explicit allocations or transfers will be issued for such memory
• Original pointers passed directly into GPU kernels
• CRAY_ACC_DEBUG runtime messages reflect this capability

https://www.olcf.ornl.gov/wp-content/uploads/2021/04/2021-05-20-Frontier-Tutorial-CCE.pdf
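
The bullets above could be tried out with something like the following sketch. The debug verbosity level of 1 is an assumption chosen for illustration; the slides quoted here only say that CRAY_ACC_DEBUG messages reflect the unified-memory behavior, not which level to use.

```shell
# From the 2021 slides: let the CCE offloading runtime detect pinned or
# managed allocations and skip explicit allocations and transfers for them.
export CRAY_ACC_USE_UNIFIED_MEM=1

# Assumption: a low verbosity level is enough to see the runtime messages.
export CRAY_ACC_DEBUG="${CRAY_ACC_DEBUG:-1}"

echo "CRAY_ACC_USE_UNIFIED_MEM=$CRAY_ACC_USE_UNIFIED_MEM CRAY_ACC_DEBUG=$CRAY_ACC_DEBUG"
```
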

sbryngelson added the help wanted label May 23, 2024
sbryngelson (Member, Author) commented

From here: https://www.openmp.org/wp-content/uploads/2022-04-29-ECP-OMP-Telecon-HPE-Compiler.pdf

CCE OPENMP UNIFIED SHARED MEMORY SUPPORT FOR AMD MI250X

• Dynamically enable GPU unified memory for OpenMP map clauses
  • Set env vars CRAY_ACC_USE_UNIFIED_MEM=1 and HSA_XNACK=1
  • Skips explicit allocate/transfer for all system memory
  • Global "declare target" variables will still be allocated separately (compiler statically emits a device copy)
• Statically enable GPU unified memory for OpenMP map clauses
  • Compile with "requires unified_shared_memory" directive
  • Set env var HSA_XNACK=1
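
As a rough sketch, the two modes described in the slide could be selected from a job script like this. The UM_MODE variable is a hypothetical switch introduced only for illustration; the environment variables themselves are the ones named in the slide.

```shell
# Hypothetical switch: "dynamic" (runtime opt-in) or "static" (compile-time opt-in).
UM_MODE="${UM_MODE:-dynamic}"

# Both modes in the slide require XNACK to be enabled.
export HSA_XNACK=1

case "$UM_MODE" in
  dynamic)
    # Runtime opt-in: skip explicit allocate/transfer for system memory.
    export CRAY_ACC_USE_UNIFIED_MEM=1
    ;;
  static)
    # Nothing more to set here: the code itself must be compiled with the
    # "requires unified_shared_memory" directive.
    :
    ;;
  *)
    echo "unknown UM_MODE: $UM_MODE" >&2
    ;;
esac

echo "UM_MODE=$UM_MODE HSA_XNACK=$HSA_XNACK CRAY_ACC_USE_UNIFIED_MEM=${CRAY_ACC_USE_UNIFIED_MEM:-unset}"
```
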
