Provide generic and safe C++ interfaces for warp shuffle: Issue #2976 #3210

soumikiith · 2024-12-20T13:01:30Z

Description

I have provided a generic and safe C++ interface for warp shuffle (shuffle_sync only for now). The safety features include: (1) checking for allowable data types, and (2) handling of variables that consist of 4 bytes (32 bits).
I will soon post the feature to handle 16-bit and 64-bit data types.

Provide generic and safe C++ interfaces for warp shuffle: Issue #2976

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.


copy-pr-bot · 2024-12-20T13:01:35Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

fbusato · 2024-12-20T17:35:46Z

Thanks for the contribution, @soumikiith. I have a couple of initial comments.

cmath provides a set of mathematical operations, while warp shuffles are about data movement. I would create a separate header, cuda/shuffle.
You don't need to handle all data types one by one, or by size. My suggestion is to create an array of uint32_t and then use memcpy. Even better if you can find a way to use bit_cast.
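
The memcpy suggestion can be sketched on the host as follows. This is only an illustration under stated assumptions, not the PR's code: `shuffle_words` and its `word_op` callback are hypothetical names, and the callback stands in for whatever 32-bit shuffle primitive a real interface would call once per word.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not part of the PR): passes a value of any trivially
// copyable type through a caller-supplied operation on 32-bit words.
// `word_op` is a stand-in for a per-word shuffle primitive.
template <typename T, typename WordOp>
T shuffle_words(const T& value, WordOp word_op)
{
  static_assert(std::is_trivially_copyable<T>::value,
                "memcpy requires a trivially copyable type");
  static_assert(std::is_default_constructible<T>::value,
                "the result object is default-constructed before being filled");

  // Round the size up so that 1-, 2-, 4-, and 8-byte types all fit.
  constexpr std::size_t num_words =
    (sizeof(T) + sizeof(std::uint32_t) - 1) / sizeof(std::uint32_t);

  // Zero-initialized so that padding bytes of small types are well defined.
  std::uint32_t words[num_words] = {};
  std::memcpy(words, &value, sizeof(T));

  // One 32-bit operation per word, regardless of the original type.
  for (std::size_t i = 0; i < num_words; ++i)
  {
    words[i] = word_op(words[i]);
  }

  T result;
  std::memcpy(&result, words, sizeof(T));
  return result;
}
```

With an identity callback the value round-trips unchanged, which is a convenient way to check the packing logic for 16-bit, 32-bit, and 64-bit types before wiring in a real shuffle.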

fbusato · 2024-12-20T18:47:37Z

I updated #2976 to better formalize the features and checks of these functions.

soumikiith · 2024-12-21T06:08:45Z

One Question:

While computing laneid, can I use the modulo operator? Or is it preferable to fetch it directly from assembly using asm instructions?

Note that my doubt is only in the context of shfl_up and shfl_down.

Also, why does a mask value need to be passed to shfl_xor (I know that a default value is assigned)? Isn't passing lanemask alone sufficient?

fbusato · 2024-12-23T17:25:20Z

> While computing laneid, can I use the modulo operator? Or is it preferable to fetch it directly from assembly using asm instructions?

You can use the C++ API for PTX; see https://nvidia.github.io/cccl/libcudacxx/ptx/instructions/special_registers.html#laneid
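
For comparison with the special-register route, the modulo approach from the question can be sketched on the host. The helper names are hypothetical, and the sketch assumes a warp size of 32 and CUDA's x-fastest linearization of thread indices within a block; for a 1D block it reduces to `threadIdx.x % 32`.

```cpp
#include <cassert>

// Assumed warp size; hypothetical constant name.
constexpr unsigned lanes_per_warp = 32;

// Hypothetical helper: linear index of a thread inside its block,
// with x varying fastest, then y, then z.
inline unsigned linear_thread_index(unsigned tx, unsigned ty, unsigned tz,
                                    unsigned dim_x, unsigned dim_y)
{
  assert(tx < dim_x && ty < dim_y);
  return tx + ty * dim_x + tz * dim_x * dim_y;
}

// Lane index derived by modulo from the linear thread index.
inline unsigned lane_from_linear_index(unsigned linear_index)
{
  return linear_index % lanes_per_warp;
}
```

The modulo form needs the block dimensions to linearize the index, whereas reading the lane id through the PTX accessor documented at the link above avoids that dependency.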

> Also, why does a mask value need to be passed to shfl_xor (I know that a default value is assigned)? Isn't passing lanemask alone sufficient?

Referring to the official documentation, laneMask and mask have different meanings. mask represents the active lanes, while laneMask is the value applied by the XOR operator to select the source lane, i.e. laneid() ^ laneMask.
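
The distinction can be made concrete with a small host-side sketch. Both helper names are hypothetical and the warp size of 32 is an assumption: one function computes the source lane that an XOR shuffle reads from, and the other tests whether a lane is named in the participation mask.

```cpp
#include <cassert>
#include <cstdint>

// Assumed warp size; hypothetical constant name.
constexpr unsigned warp_size = 32;

// laneMask role: selects the partner lane, laneid ^ laneMask.
inline unsigned xor_source_lane(unsigned lane_id, unsigned lane_mask)
{
  assert(lane_id < warp_size && lane_mask < warp_size);
  return lane_id ^ lane_mask;
}

// mask role: a 32-bit set of lanes taking part in the shuffle,
// one bit per lane.
inline bool lane_in_member_mask(std::uint32_t member_mask, unsigned lane_id)
{
  assert(lane_id < warp_size);
  return ((member_mask >> lane_id) & 1u) != 0u;
}
```

For example, with laneMask equal to 1, lanes 4 and 5 read from each other, while a mask of 0x0000FFFF names only lanes 0 through 15 as participants. The two parameters answer different questions, so one cannot replace the other.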


soumikiith · 2024-12-24T07:37:14Z

Hi, I have added the checks (I still need to fix the assertion statements, though). Please check them and let me know whether they meet your requirements. I will soon commit the casting of different data types using memcpy.

Please let me know of any additional requirements.

soumikiith · 2024-12-25T05:42:47Z

Hi,
I have added the code to perform the __shfl operations for various data types. Please let me know if anything should be added or if anything is flawed. I will happily revise my code.

Merry Christmas !!

22344f4 Provide generic and safe C++ interfaces for warp shuffle: Issue NVIDI…A#2976

soumikiith requested review from a team as code owners December 20, 2024 13:01

soumikiith requested review from wmaxey and alliepiper December 20, 2024 13:01

soumikiith force-pushed the main branch from 23b1242 to 22344f4 Compare December 24, 2024 07:17

soumikiith and others added 3 commits December 24, 2024 12:48

a998c6c Fix: Moved the contents to a new location for better locality, and ad…ded extra supports for checks.
94a55dc Merge branch 'main' into main
764d4c4 Merge remote-tracking branch 'origin/main'

599f49a Added shfl operations using memcpy.
