Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DRAFT] Changes fp8 implementation to more closely match NCCL, and added logi… #1619

Draft
wants to merge 8 commits into
base: develop
Choose a base branch
from

Conversation

corey-derochie-amd
Copy link
Collaborator

…c to handle FNUZ types.

Details

Do not mention proprietary info or link to internal work items in this PR.

Work item: "Internal", or link to GitHub issue (if applicable).

What were the changes?
One sentence describing the work done.

Why were the changes made?
Explain the motivation behind the work. Provide any publicly-available historical context.

How was the outcome achieved?
Technical details behind the work. Explain any publicly-available hardware peculiarities.

Additional Documentation:
What else should the reviewer know?

Approval Checklist

Do not approve until these items are satisfied.

  • Verify the CHANGELOG has been updated, if
    • there are any NCCL API version changes,
    • any changes impact library users, and/or
    • any changes impact any other ROCm library.

@corey-derochie-amd corey-derochie-amd force-pushed the fp8-fnuz branch 2 times, most recently from 25f9475 to e4fcde4 Compare April 1, 2025 16:48
@corey-derochie-amd corey-derochie-amd marked this pull request as draft April 1, 2025 22:49
@corey-derochie-amd corey-derochie-amd changed the title Changes fp8 implementation to more closely match NCCL, and added logi… [DRAFT] Changes fp8 implementation to more closely match NCCL, and added logi… Apr 1, 2025
@@ -277,11 +277,19 @@ static ncclResult_t hostToDevRedOp(
#if defined(RCCL_FLOAT8)
case ncclFloat8e4m3:
opFull->op = ncclDevPreMulSum;
fp8_e4m3 = (rccl_float8)(float(1.0/comm->nRanks));
if (rccl_float8_useFnuz) {
fp8_e4m3_fnuz = (rccl_float8_fnuz)(float(1.0/comm->nRanks));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we cast from double to float and then again to the 8-bit float type? Are casts from double to rccl_float8_fnuz not possible?

I'm also curious why we're not using static_cast instead of c-style casts.

case ncclFloat8e4m3: F1[idx] = rccl_float8(ReduceOp(op, float(F1[idx]), float(inputCpu.F1[idx]))); break;
case ncclFloat8e4m3:
{
if (PtrUnion_Float8UseFnuz) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this slow down rccl unit testing? If so, by how much?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants