-
Notifications
You must be signed in to change notification settings - Fork 378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EAMxx: refactor field manipulation interfaces, and add new ones #6970
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int->double conversions are going to be nice!
@tcclevenger I added a few more improvements:
The biggest rework compared to earlier is the CombineViewsHelper struct. I needed that on CUDA, since apparently you can't use an |
09f6f77
to
094230b
Compare
@tcclevenger I'm adding some more stuff. Since compilation times are going up a lot, I am adding ETI for the update method. The combinations are quite a few, but mostly b/c of the fine grain combinations in CombineMode: enum class CombineMode {
ScaleUpdate, // out = beta*out + alpha*in (most generic case)
Update, // out = beta*out + in (special case of ScaleUpdate wiht alpha=1)
ScaleAdd, // out = out + alpha*in (special case of ScaleUpdate with beta=1)
ScaleReplace, // out = alpha*in (special case of ScaleUpdate with beta=0)
Add, // out = out + in (special case of ScaleUpdate with alpha=beta=1)
Rescale, // out = beta*out
Replace, // out = in
Multiply, // out = out*in
Divide, // out = out/in
Max, // out = max(out,in)
Min // out = min(out,in)
}; I was thinking: what if we remove all except the last four and the first? All others can be expressed in terms of ScaleUpdate (which we may just rename "Update" for simplicity). The reason I added all of those was to allow a single interface (which used alpha and beta), but guarantee zero overhead when the combine mode did not require that coefficient (e.g., avoid doing enum class CombineMode {
Update, // out = beta*out + alpha*in
Multiply, // out = alpha*out*in
Divide, // out = beta*out/in
Max, // out = max(out,in)
Min // out = min(out,in)
}; This would drastically reduce the number of ETI calls we do (by a factor 6/11). These remaining 5 cannot be expressed in terms of each other, so it clearly cannot be reduced further. |
e064584
to
44d44f5
Compare
Ok, I decidedto pull the trigger on reducing CombineMode options, as well as adding some ETI for Field methods, which started to grow. I ran some timing tests on my laptop (not a great data point, but still): without ETI:
with ETI:
The compilation of the templated methods for HostOrDevice::Device is what takes most of the time (that's b/c of the update methods, which have a lot of ramifications). I have only added ETI for update for Update/Multiply/Divide. The Max/Min impl are currently unused, although I plan to use them in the scorpio output class (I will just rely on implicit instantiation in that file). Also, I split device and host ETI into two files, so that they can be built in parallel, reducing a bit the compilation time. Notice that here I'm only showing the compilation time saved for field_tests.cpp.o, but all cpp.o files that include field.hpp and use these common ETI-ed methods will see some compilation time improvement. We are paying the price once in the compilation of I will run a bit more precise experiments tomorrow. |
44d44f5
to
903848c
Compare
* Implement scale/scale_inv in terms of update * Use fill_val only if needed
That is, we can do y.scale(x), if y's data type is double and x's data type is int, but not viceversa
In Field::update_impl, in order to use if constexpr in a kernel we must use a functor, and not an extended host-device lambda.
The scalar type of the two fields may differ, so we must retrieve fill val from *this and x according to their scalar types.
Namely, remove those that were special cases of ScaleUpdate, and rename ScaleUpdate simply Update
* ETI separately for int/float/double and host/device * ETI separately each function * Remove possibility of updating a field with another with different (but compatible) scalar type. It generates too many template instantiations * Do not implement deep_copy in terms of update. It would prevent replacing NaN with valid values, since update's arithmetic implementation would propagate NaN's. Signed-off-by: Luca Bertagna <[email protected]>
85fbf10
to
8d577d4
Compare
@tcclevenger this PR is growing into much more than it originally was. Yet, I think it's valuable. Here's a recap of all you will find:
For the ETI, I clocked building the src folder (no tpls or system tests) on my workstation (haswell, 10 cores) with make -j10. I did that with ETI, as well as commenting out all the ETI-related stuff. The timings are:
Now, this was just my machine, and things may be quite different on other archs, but it definitely helped, so I expect it to help a bit on other platforms too. Also, the PR has commits adding something, and then taking it back. Let me know if you'd rather me rewrite the git history in a cleaner way. |
Cannot rely on Kokkos::deep_copy, since one view may be padded and the other may be not padded. Signed-off-by: Luca Bertagna <[email protected]>
Simplify implementation of the scale/scale_inv method, and add new ones for max/min.
[BFB]
First, the current implementation had several methods implemented that could just be expressed as a single call to
update
, if onlyupdate
accepted different combine mode values (likeupdate_impl
did). The solution was simple: templateupdate
on the combine mode, defaulting to the current behavior (i.e.,ScaleUpdate
). With this modification,scale
,scale_impl
and alsodeep_copy
(from a Field rhs) could be rewritten as a 1-liner call toupdate
.Second, the PR adds two more values for
CombineMode
, namelyMax
andMin
. The goal is to later use Field operations insidescorpio_output.cpp
, to avoid manually handling the same kind of stuff in that class, simplifying its implementation.Finally, allow do do stuff like
y.scale(x)
if y's data type is, say, double but x's data type is int (but not the other way around!).