Add new data type Float8_e8m0fnu #4665
Conversation
```cpp
? (outputs[0]
       .as<at::Tensor>()
       .ge(ref_output / 2)
       .logical_and(
           outputs[0].as<at::Tensor>().le(ref_output * 2))
       .all()
       .item<bool>())
```
Looks like PyTorch's implementation is different from NVIDIA's implementation...
Could you explain what you're doing here? Looks like it accounts for a 2x error range?
Yes, exactly. e8m0 has no mantissa, so it can only represent values of the form 1 * 2^x, and I am asserting that, if the results differ, then x differs by at most 1.
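For context, a minimal standalone sketch (not code from this PR, and assuming a positive input while ignoring the NaN encoding) of why a 2x band is the right tolerance for a format that only stores an exponent:

```cpp
#include <cassert>
#include <cmath>

// Standalone sketch: every e8m0 value is an exact power of two, so the two
// nearest representable neighbors of any positive input differ by exactly a
// factor of 2. If two implementations round to different neighbors, the
// results still fall inside [ref_output / 2, ref_output * 2].
int main() {
  double input = 0.75; // arbitrary positive value
  int e_down = static_cast<int>(std::floor(std::log2(input)));
  double down = std::ldexp(1.0, e_down);   // 2^floor(log2(input)) = 0.5
  double up = std::ldexp(1.0, e_down + 1); // next representable neighbor = 1.0
  assert(up == 2.0 * down); // neighboring e8m0 values are always 2x apart
  return 0;
}
```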
!test
!test
LGTM
Yet another variant of fp8, commonly used as scaling factors for mxfp4
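As a reference for reviewers, here is a hypothetical decoder sketch of what the e8m0fnu encoding represents (this is not the code added by the PR): 8 exponent bits with bias 127, no sign bit, no mantissa, no infinity, and the all-ones byte reserved for NaN.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper, for illustration only: decode one e8m0fnu byte.
// Every finite encoding represents 2^(bits - 127); 0xFF is NaN; there is
// no sign bit, no mantissa, and no infinity ("fnu").
float decode_e8m0(uint8_t bits) {
  if (bits == 0xFF) {
    return std::numeric_limits<float>::quiet_NaN();
  }
  return std::ldexp(1.0f, static_cast<int>(bits) - 127);
}
```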