[DRAFT] Changes fp8 implementation to more closely match NCCL, and added logi… #1619
base: develop
25f9475 to e4fcde4
27a6182 to da94a6f
@@ -277,11 +277,19 @@ static ncclResult_t hostToDevRedOp(
#if defined(RCCL_FLOAT8)
case ncclFloat8e4m3:
opFull->op = ncclDevPreMulSum;
fp8_e4m3 = (rccl_float8)(float(1.0/comm->nRanks));
if (rccl_float8_useFnuz) {
fp8_e4m3_fnuz = (rccl_float8_fnuz)(float(1.0/comm->nRanks));
Why do we cast from double to float and then again to the 8-bit float type? Is a direct cast from double to rccl_float8_fnuz not possible?
I'm also curious why we're using C-style casts instead of static_cast.
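One possible explanation for the intermediate float, offered as an assumption rather than something confirmed in this PR, is overload ambiguity: if the 8-bit type exposes explicit constructors from more than one arithmetic type, a double argument matches none of them exactly and the direct cast does not compile. The sketch below uses a hypothetical `Fp8Sketch` type (not the real `rccl_float8`) and a hypothetical helper `reciprocalOfRanks` to show the same conversion chain written with `static_cast`.

```cpp
#include <cassert>

// Hypothetical stand-in for an 8-bit float wrapper; NOT the real rccl_float8.
// It only records the value it was built from so the conversion path is visible.
struct Fp8Sketch {
    float stored;
    bool  builtFromInt;
    explicit Fp8Sketch(float v) : stored(v), builtFromInt(false) {}
    explicit Fp8Sketch(int v)   : stored(static_cast<float>(v)), builtFromInt(true) {}
};

// Computes 1/nRanks in double, narrows to float explicitly, then converts.
inline Fp8Sketch reciprocalOfRanks(int nRanks) {
    assert(nRanks > 0);
    const double exact    = 1.0 / nRanks;
    const float  narrowed = static_cast<float>(exact);  // explicit double -> float
    // static_cast<Fp8Sketch>(exact) would be rejected for this type:
    // double -> float and double -> int are standard conversions of equal rank,
    // so the choice between the two constructors is ambiguous.
    return static_cast<Fp8Sketch>(narrowed);             // exact match: Fp8Sketch(float)
}
```

With a single float constructor the direct cast from double would compile, so whether the intermediate step is needed depends on the constructors the real type declares. Either way, `static_cast` makes each narrowing step easy to search for and rules out the reinterpret-style conversions a C-style cast can silently fall back to.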
test/common/PtrUnion.cpp
Outdated
case ncclFloat8e4m3: F1[idx] = rccl_float8(ReduceOp(op, float(F1[idx]), float(inputCpu.F1[idx]))); break;
case ncclFloat8e4m3:
{
if (PtrUnion_Float8UseFnuz) {
Does this slow down RCCL unit testing? If so, by how much?
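A quick way to answer this would be a micro-benchmark that compares testing the flag per element against testing it once outside the loop. The sketch below is only an illustration: all function names are hypothetical, the reduction works on plain floats, and the saturating add is a placeholder for the real 8-bit conversions. The only format facts it relies on are the largest finite E4M3 values, 448 for the non-FNUZ encoding and 240 for FNUZ.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Largest finite magnitudes of the two E4M3 encodings.
constexpr float kMaxE4m3Fn   = 448.0f;
constexpr float kMaxE4m3Fnuz = 240.0f;

// Placeholder reduction: add in float, then saturate to the format's range.
// The real test code converts through the 8-bit types instead.
inline float addSaturated(float a, float b, float maxVal) {
    return std::max(-maxVal, std::min(a + b, maxVal));
}

// Variant A: the format flag is tested for every element.
inline void reduceBranchPerElement(std::vector<float>& acc,
                                   const std::vector<float>& in, bool useFnuz) {
    assert(acc.size() == in.size());
    for (std::size_t i = 0; i < acc.size(); ++i) {
        if (useFnuz) acc[i] = addSaturated(acc[i], in[i], kMaxE4m3Fnuz);
        else         acc[i] = addSaturated(acc[i], in[i], kMaxE4m3Fn);
    }
}

// Variant B: the flag is tested once, before the loop.
inline void reduceBranchHoisted(std::vector<float>& acc,
                                const std::vector<float>& in, bool useFnuz) {
    assert(acc.size() == in.size());
    const float maxVal = useFnuz ? kMaxE4m3Fnuz : kMaxE4m3Fn;
    for (std::size_t i = 0; i < acc.size(); ++i)
        acc[i] = addSaturated(acc[i], in[i], maxVal);
}

// Wall-clock duration of one call to fn, in microseconds.
template <typename Fn>
long long elapsedMicros(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Wrapping each variant in `elapsedMicros` over a large buffer, repeated several times, would give a rough per-element cost. A predictable branch on a flag that never changes during a run is usually cheap, but only a measurement on the actual test binary can say how much it matters here.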
…c to handle FNUZ types.
Details
Do not mention proprietary info or link to internal work items in this PR.
Work item: "Internal", or link to GitHub issue (if applicable).
What were the changes?
One sentence describing the work done.
Why were the changes made?
Explain the motivation behind the work. Provide any publicly-available historical context.
How was the outcome achieved?
Technical details behind the work. Explain any publicly-available hardware peculiarities.
Additional Documentation:
What else should the reviewer know?
Approval Checklist
Do not approve until these items are satisfied.