-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
math_brute_force: treat reciprocal as unary function #2281
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this direction and this is something we should continue pursuing, but unfortunately this isn't working for me in its current form.
Treat reciprocal as a unary function, instead of handling it through the binary function testing mechanism and special-casing it there. This addresses two shortcomings of the previous implementation: - Testing took significantly longer as the entire input domain was tested many times (e.g. fp16 reciprocal has only 2^16 possible input values, but binary function testing iterates over 2^16 * 2^16 input values). - The reciprocal test kernel was identical to the divide kernel. Thus the device compiler would see a regular divide operation instead of a reciprocal operation and would be unlikely to emit a specialized reciprocal sequence. This reverts all of the changes in binary_operator*.cpp made by bcfa1f7 ("Added corrections to re-enable reciprocal test in math_brute_force suite for relaxed math mode (KhronosGroup#2221)", 2025-02-04). Signed-off-by: Sven van Haastregt <[email protected]>
a9a4efa
to
c33e9b0
Compare
Works fine with my nvidia and intel devices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This works for the device I've tested, and in some cases it's a significant performance improvement.
Discussed in the February 25th teleconference. Waiting one more week for additional review approvals. |
Treat reciprocal as a unary function, instead of handling it through the binary function testing mechanism and special-casing it there.
This addresses two shortcomings of the previous implementation:
Testing took significantly longer as the entire input domain was tested many times (e.g. fp16 reciprocal has only 2^16 possible input values, but binary function testing iterates over 2^16 * 2^16 input values).
The reciprocal test kernel was identical to the divide kernel. Thus the device compiler would see a regular divide operation instead of a reciprocal operation and would be unlikely to emit a specialized reciprocal sequence.
This reverts all of the changes in binary_operator*.cpp made by bcfa1f7 ("Added corrections to re-enable reciprocal test in math_brute_force suite for relaxed math mode (#2221)", 2025-02-04).