Atomic min max GS operator #1704

Merged
merged 14 commits into from
Feb 13, 2025

@tuananhdao (Collaborator) commented Feb 10, 2025

Changes

  • Support GS_OP_MIN and GS_OP_MAX for the gather-scatter (gs) operation on the CUDA and HIP backends
  • Add unit tests for the new operators in tests/unit/gather_scatter
  • Add benchmark tests in tests/bench/gs
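
To illustrate what the min and max operators compute, here is a minimal host-side C++ sketch. It is not the CUDA/HIP kernel code from this PR: the names `GsOp`, `atomic_min`, `atomic_max`, `atomic_add`, and `gather_scatter` are hypothetical, and the compare-and-swap loop is just one common way to build a floating-point atomic min/max when no native instruction is assumed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical operation tags, named after the PR's GS_OP_* constants.
enum class GsOp { Add, Min, Max };

// Atomically replace target with min(target, value) using a CAS loop.
inline void atomic_min(std::atomic<double>& target, double value) {
    double current = target.load(std::memory_order_relaxed);
    // On failure compare_exchange_weak reloads `current`, so the loop
    // re-checks whether `value` is still smaller before retrying.
    while (value < current &&
           !target.compare_exchange_weak(current, value,
                                         std::memory_order_relaxed)) {
    }
}

// Atomically replace target with max(target, value) using a CAS loop.
inline void atomic_max(std::atomic<double>& target, double value) {
    double current = target.load(std::memory_order_relaxed);
    while (value > current &&
           !target.compare_exchange_weak(current, value,
                                         std::memory_order_relaxed)) {
    }
}

// Atomic add via CAS, to avoid relying on C++20 floating-point fetch_add.
inline void atomic_add(std::atomic<double>& target, double value) {
    double current = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_weak(current, current + value,
                                         std::memory_order_relaxed)) {
    }
}

// Gather: reduce each local value u[i] into the shared slot v[dof[i]].
// Scatter: copy the reduced value back to every local entry.
std::vector<double> gather_scatter(const std::vector<double>& u,
                                   const std::vector<std::size_t>& dof,
                                   std::size_t n_shared, GsOp op) {
    const double inf = std::numeric_limits<double>::infinity();
    // Start from the identity element of the chosen reduction.
    const double identity =
        (op == GsOp::Add) ? 0.0 : (op == GsOp::Min) ? inf : -inf;

    std::vector<std::atomic<double>> v(n_shared);
    for (auto& slot : v) slot.store(identity, std::memory_order_relaxed);

    // Serial here; the atomics keep the updates safe if this loop is
    // distributed over threads, as it would be on a GPU.
    for (std::size_t i = 0; i < u.size(); ++i) {
        switch (op) {
            case GsOp::Add: atomic_add(v[dof[i]], u[i]); break;
            case GsOp::Min: atomic_min(v[dof[i]], u[i]); break;
            case GsOp::Max: atomic_max(v[dof[i]], u[i]); break;
        }
    }

    std::vector<double> out(u.size());
    for (std::size_t i = 0; i < u.size(); ++i) {
        out[i] = v[dof[i]].load(std::memory_order_relaxed);
    }
    return out;
}
```

For example, with local values `{3, 1, 2, 5}` and the index map `{0, 0, 1, 1}`, the min operator leaves every entry of the first shared point at 1 and every entry of the second at 2, while the max operator gives 3 and 5 respectively.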

Benchmark results

Output from running tests/bench/gs on each backend:

HIP, rocm/6.2.0

 -------------Mesh-------------
 Reading a binary Neko file box.nmsh
 gdim = 3, nelements =     27000
 Reading elements
 Reading BC/zone data
 Reading deformation data
 Mesh read, setting up connectivity
 Done setting up mesh and connectivity
 Mesh and connectivity setup (excluding read) time (s): 93.528476

 --------Gather-Scatter--------
 Comm         :   Device MPI
 Avg. internal:      2646000
 Avg. external:            0
 Backend      :          hip
 
GS_OP_ADD mean:   0.4491E-03, stddev:   0.3858E-05
 
GS_OP_MIN mean:   0.4492E-03, stddev:   0.3962E-05
 
GS_OP_MAX mean:   0.4490E-03, stddev:   0.4021E-05

CUDA

 -------------Mesh-------------
 Reading a binary Neko file box.nmsh
 gdim = 3, nelements =     27000
 Reading elements
 Reading BC/zone data
 Reading deformation data
 Mesh read, setting up connectivity
 Done setting up mesh and connectivity
 Mesh and connectivity setup (excluding read) time (s): 69.145985

 --------Gather-Scatter--------
 Comm         :          MPI
 Avg. internal:      2646000
 Avg. external:            0
 Backend      :         cuda

GS_OP_ADD mean:   0.7639E-04, stddev:   0.7434E-05

GS_OP_MIN mean:   0.7630E-04, stddev:   0.7027E-05

GS_OP_MAX mean:   0.7629E-04, stddev:   0.6781E-05

@tuananhdao added the NVIDIA (NVIDIA GPUs and CUDA), AMD (AMD GPUs and HIP), don't merge (Don't merge yet!), and performance labels Feb 11, 2025
@tuananhdao marked this pull request as ready for review February 13, 2025 12:07
@tuananhdao removed the don't merge label Feb 13, 2025

@MartinKarp (Collaborator) left a comment

Champ!

@MartinKarp merged commit 811450d into develop Feb 13, 2025
28 checks passed
@MartinKarp deleted the feat/atomic-min-max branch February 13, 2025 16:03
@tuananhdao restored the feat/atomic-min-max branch February 13, 2025 16:05
Labels: AMD (AMD GPUs and HIP), NVIDIA (NVIDIA GPUs and CUDA), performance
Project status: Done
Linked issue (may be closed by merging this pull request): Gather-scatter is missing Min/Max operations for device aware MPI backend
4 participants