Atomic min max GS operator #1704

tuananhdao · 2025-02-10T09:44:00Z

Changes

Support GS_OP_MIN and GS_OP_MAX in the gather-scatter (gs) operation on the CUDA and HIP backends
Add unit tests for the new operators in tests/unit/gather_scatter
Add benchmark tests in tests/bench/gs
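
For readers unfamiliar with what a min/max gather-scatter computes, here is a minimal serial reference sketch in host C++. It is not code from this PR: the `GsOp` enum, the `gather_scatter` function, and the `dof_to_shared` mapping are hypothetical names chosen for illustration. It assumes the usual two-phase pattern, where values that map to the same shared point are first reduced (gather) and the reduced value is then written back to every contributor (scatter).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical operator tags, loosely mirroring GS_OP_ADD/MIN/MAX.
enum class GsOp { Add, Min, Max };

// Serial reference for a gather-scatter over shared points.
// dof_to_shared[i] is the shared id (0 .. n_shared-1) of local value u[i].
void gather_scatter(std::vector<double>& u,
                    const std::vector<std::size_t>& dof_to_shared,
                    std::size_t n_shared, GsOp op) {
    assert(u.size() == dof_to_shared.size());
    std::vector<double> v(n_shared, 0.0);
    std::vector<bool> seen(n_shared, false);

    // Gather: reduce all local values that map to the same shared id.
    for (std::size_t i = 0; i < u.size(); ++i) {
        const std::size_t g = dof_to_shared[i];
        assert(g < n_shared);
        if (!seen[g]) {            // first contribution initialises the slot
            v[g] = u[i];
            seen[g] = true;
            continue;
        }
        switch (op) {
            case GsOp::Add: v[g] += u[i]; break;
            case GsOp::Min: v[g] = std::min(v[g], u[i]); break;
            case GsOp::Max: v[g] = std::max(v[g], u[i]); break;
        }
    }

    // Scatter: every contributor receives the reduced value.
    for (std::size_t i = 0; i < u.size(); ++i) {
        u[i] = v[dof_to_shared[i]];
    }
}
```

With `u = {1, 5, -2, 3}` and `dof_to_shared = {0, 1, 0, 1}`, a `GsOp::Min` call leaves `u = {-2, 3, -2, 3}`. On a GPU the gather loop is typically run by many threads at once, which is why the reduction into `v[g]` has to be atomic.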

Benchmark results

Obtained by running tests/bench/gs.

HIP, `rocm/6.2.0`

 -------------Mesh-------------
 Reading a binary Neko file box.nmsh
 gdim = 3, nelements =     27000
 Reading elements
 Reading BC/zone data
 Reading deformation data
 Mesh read, setting up connectivity
 Done setting up mesh and connectivity
 Mesh and connectivity setup (excluding read) time (s): 93.528476

 --------Gather-Scatter--------
 Comm         :   Device MPI
 Avg. internal:      2646000
 Avg. external:            0
 Backend      :          hip
 
GS_OP_ADD mean:   0.4491E-03, stddev:   0.3858E-05
 
GS_OP_MIN mean:   0.4492E-03, stddev:   0.3962E-05
 
GS_OP_MAX mean:   0.4490E-03, stddev:   0.4021E-05

CUDA

 -------------Mesh-------------
 Reading a binary Neko file box.nmsh
 gdim = 3, nelements =     27000
 Reading elements
 Reading BC/zone data
 Reading deformation data
 Mesh read, setting up connectivity
 Done setting up mesh and connectivity
 Mesh and connectivity setup (excluding read) time (s): 69.145985

 --------Gather-Scatter--------
 Comm         :          MPI
 Avg. internal:      2646000
 Avg. external:            0
 Backend      :         cuda

GS_OP_ADD mean:   0.7639E-04, stddev:   0.7434E-05

GS_OP_MIN mean:   0.7630E-04, stddev:   0.7027E-05

GS_OP_MAX mean:   0.7629E-04, stddev:   0.6781E-05


tests/unit/gather_scatter/gather_scatter_parallel.pf

MartinKarp

Champ!

tuananhdao added 9 commits February 6, 2025 16:14

atomicMin atomicMax and unit tests

5998615

add benchmarks for GS_OP_MIN and GS_OP_MAX

8de2743

add atomicCAS for comparison

9b275a2

fix sp/dp compiling hip

40e43cf

fix sp,dp for atomicMax_CAS

a272928

update cuda kernels

fd96077

fix check negative for long long presentation of double

1170881

add comment to remove atomicCAS for the PR

2240af0

Merge branch 'develop' into feat/atomic-min-max

2ddc1d8
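
The commits above mention an atomicCAS-based variant that was added for comparison. The general idea of a compare-and-swap retry loop for floating-point min/max can be sketched on the host with `std::atomic<double>`. This is an illustrative analogue only, not the kernels from this PR; the function names `atomic_max_cas` and `atomic_min_cas` are hypothetical, and the sketch assumes non-NaN inputs.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>

// Atomically performs target = max(target, value) with a CAS retry loop.
inline void atomic_max_cas(std::atomic<double>& target, double value) {
    assert(!std::isnan(value));  // NaN has no ordering; excluded in this sketch
    double observed = target.load(std::memory_order_relaxed);
    // Retry only while the stored value is still smaller than ours.
    // On failure, compare_exchange_weak reloads `observed` with the
    // current contents, so the condition is re-evaluated on fresh data.
    while (observed < value &&
           !target.compare_exchange_weak(observed, value,
                                         std::memory_order_relaxed)) {
    }
}

// Atomically performs target = min(target, value) with a CAS retry loop.
inline void atomic_min_cas(std::atomic<double>& target, double value) {
    assert(!std::isnan(value));
    double observed = target.load(std::memory_order_relaxed);
    while (observed > value &&
           !target.compare_exchange_weak(observed, value,
                                         std::memory_order_relaxed)) {
    }
}
```

A CAS loop is simple and works for any ordered type, but each failed exchange costs another round trip, which is one reason a version built on native integer atomics can be attractive under contention.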

tuananhdao added the NVIDIA (NVIDIA GPUs and CUDA), AMD (AMD GPUs and HIP), don't merge (Don't merge yet!), and performance labels Feb 11, 2025

tuananhdao added 3 commits February 12, 2025 11:06

correct atomicMin atomicMax

7ab320b

improve performance for atomic Min Max

4d0290c

sync cuda with hip gs kernels

46e4552

tuananhdao marked this pull request as ready for review February 13, 2025 12:07

tuananhdao removed the don't merge label Feb 13, 2025

njansson approved these changes Feb 13, 2025


njansson requested review from timofeymukha and timfelle February 13, 2025 12:10

tuananhdao added 2 commits February 13, 2025 13:38

cast second argument of atomicMin to unsigned long long for GNU compilers

0dad4bc

guard cuda compute capability >= 6.0 for 64-bit atomics

69d39e7
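
Several commit messages in this PR refer to the long long representation of a double, a sign check, and an unsigned long long cast. These hint at a well-known trick: IEEE-754 doubles can be ordered with integer comparisons on their bit patterns, using a signed comparison when the incoming value is non-negative and an unsigned comparison with the opposite operation when it is negative. The host C++ sketch below only demonstrates that ordering property; it is not the PR's kernel code, the helper names are hypothetical, and it assumes non-NaN inputs. On a device, the `std::min`/`std::max` calls would correspond to integer atomic min/max operations on the same memory location.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::int64_t), "64-bit double expected");

// Bit-pattern conversions via memcpy (well-defined type punning).
inline std::int64_t  to_i64(double x)        { std::int64_t b;  std::memcpy(&b, &x, sizeof b); return b; }
inline std::uint64_t to_u64(double x)        { std::uint64_t b; std::memcpy(&b, &x, sizeof b); return b; }
inline double        from_i64(std::int64_t b)  { double x; std::memcpy(&x, &b, sizeof x); return x; }
inline double        from_u64(std::uint64_t b) { double x; std::memcpy(&x, &b, sizeof x); return x; }

// max(stored, value) using integer comparisons only.
inline double max_via_bits(double stored, double value) {
    assert(!std::isnan(stored) && !std::isnan(value));
    if (!std::signbit(value)) {
        // Non-negative value: signed bit patterns order like the doubles,
        // and any negative stored value has a negative signed pattern.
        return from_i64(std::max(to_i64(stored), to_i64(value)));
    }
    // Negative value: unsigned patterns of negatives order in reverse,
    // and any non-negative stored value has a smaller unsigned pattern.
    return from_u64(std::min(to_u64(stored), to_u64(value)));
}

// min(stored, value) using integer comparisons only.
inline double min_via_bits(double stored, double value) {
    assert(!std::isnan(stored) && !std::isnan(value));
    if (!std::signbit(value)) {
        return from_i64(std::min(to_i64(stored), to_i64(value)));
    }
    return from_u64(std::max(to_u64(stored), to_u64(value)));
}
```

For example, `max_via_bits(-3.0, -1.0)` compares the patterns as unsigned integers and takes the smaller one, which is the pattern of `-1.0`. The choice between the signed and unsigned view depends only on the sign of the incoming value, so a device version needs a single atomic operation per update rather than a retry loop.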

timfelle approved these changes Feb 13, 2025


tuananhdao requested a review from MartinKarp February 13, 2025 15:51

MartinKarp approved these changes Feb 13, 2025


MartinKarp merged commit 811450d into develop Feb 13, 2025
28 checks passed

MartinKarp deleted the feat/atomic-min-max branch February 13, 2025 16:03

tuananhdao restored the feat/atomic-min-max branch February 13, 2025 16:05

tuananhdao mentioned this pull request Feb 13, 2025

Fix spacing, indentation #1704 #1707

Merged

njansson linked an issue Feb 13, 2025 that may be closed by this pull request

Gather-scatter is missing Min/Max operations for device aware MPI backend #1333

Closed
