preserving allocation domain transform on aliased output #3970
Conversation
!test
csrc/fusion.cpp
Outdated
NVF_CHECK(
    output->isA<TensorView>() && input->isA<TensorView>(),
    "aliasing output to input is only supported for TensorView");
auto new_domain = TransformReplay::selfAllocationReplay(
    output->as<TensorView>()->domain(), input->as<TensorView>()->domain());
output->as<TensorView>()->setAllocationDomain(
    new_domain->allocation(), new_domain->contiguity());
When we are aliasing an output to an input, the two need to have identical allocation domains and contiguity. (Technically we only need a compatible allocation domain, but it's easier to have a fixed target.)
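For a concrete picture of the scenario this guards, a hedged test-style sketch (assuming nvFuser's C++ test utilities such as makeSymbolicTensor and set, and the aliasOutputToInput / AllocationType::ReuseBuffer API; not code from this PR):

#include <fusion.h>
#include <ops/all_ops.h>
using namespace nvfuser;

Fusion fusion;
FusionGuard fg(&fusion);

// Input with a permuted (column-major) allocation domain.
TensorView* in = makeSymbolicTensor(2);
fusion.addInput(in);
in->setAllocationDomain({in->axis(1), in->axis(0)}, /*new_contiguity=*/true);

TensorView* out = set(in);
fusion.addOutput(out);

// `out` reuses `in`'s buffer, so it must describe that buffer identically;
// the selfAllocationReplay call above rewrites out's allocation domain and
// contiguity to match in's.
fusion.aliasOutputToInput(out, in, AllocationType::ReuseBuffer);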
!test
keeping my 🤞
!test --diff
csrc/transform_replay.cpp
Outdated
@@ -233,6 +233,97 @@ TensorDomain* TransformReplay::fullSelfReplay(
      new_self_root->contiguity());
}

TensorDomain* TransformReplay::selfAllocationReplay(
I tagged @wujingyue because I think you asked for something similar to this? Though I'm not totally sure this does what you want, because I'm addressing a requirement from remove_bcast_squeeze. We could also refactor this and put some of the logic on the user side if we need a more generic replay later.
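For a mental model of the new replay, a hedged usage sketch (argument order follows the call in csrc/fusion.cpp above, i.e. (target domain, reference domain); the names orig/replacement are illustrative):

// `orig` has a permuted allocation domain; `replacement` starts with the
// default layout (allocation == logical).
TensorView* orig = makeSymbolicTensor(3);
orig->setAllocationDomain(
    {orig->axis(2), orig->axis(0), orig->axis(1)}, /*new_contiguity=*/true);
TensorView* replacement = makeSymbolicTensor(3);

// Replay orig's allocation-domain transforms onto replacement's domain,
// then adopt the resulting allocation domain and contiguity, so
// replacement can take over orig's spot without changing memory layout.
TensorDomain* replayed = TransformReplay::selfAllocationReplay(
    replacement->domain(), orig->domain());
replacement->setAllocationDomain(
    replayed->allocation(), replayed->contiguity());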
LGTM overall -- there are some details that I want to make sure I understand to have a good mental model
csrc/transform_replay.cpp
Outdated
TensorDomain* TransformReplay::selfAllocationReplay(
Note: this can probably be consolidated with the existing fullReplay (Fuser/csrc/transform_replay.cpp, line 1220 in 7524c9f):

TensorDomain* fullReplay(
@@ -389,6 +390,16 @@ TensorView* maybeDoReplacement(TensorView* orig) {
  }
}

// If we are replacing an output, we need to preserve the memory layout by
Can we do this check in Fusion::replaceOutput?
That check would need to be opt-in. We have places where we explicitly alter memory layout, e.g. mark-alias prepare and sharding propagation, which intentionally change the memory layout. But it would be nice to have a check for that.
So the check would actually be checking for compatible transforms, i.e. the two transforms might not necessarily be the same, but they should be equivalent... This feels like something @wujingyue and @zasdfgbnm were discussing together.
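A hypothetical sketch of that opt-in check (everything here except Fusion::replaceOutput and NVF_CHECK is invented for illustration; in particular, haveEquivalentAllocationTransforms does not exist in the codebase):

// Wrapper around Fusion::replaceOutput with opt-in layout validation.
// Callers that intentionally change layout (mark-alias prepare, sharding
// propagation) would pass validate_layout = false.
void replaceOutputChecked(
    Fusion* fusion,
    TensorView* old_out,
    TensorView* new_out,
    bool validate_layout) {
  if (validate_layout) {
    // "Compatible" means the two allocation-domain transform histories
    // are equivalent, not necessarily expressed identically.
    NVF_CHECK(
        haveEquivalentAllocationTransforms(old_out, new_out),
        "replaceOutput would silently change the output's memory layout");
  }
  fusion->replaceOutput(old_out, new_out);
}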
Opened #3994 to track this.
!test --diff
I'm surprised to see a segfault from the python_bc_advisory thing. I didn't change any Python API; no idea where that segfault is coming from.
!test --diff
NVF_ERROR(
    it != replay.getReplay().end(),
    "Error during replay, didn't replay an axis.");
// Contiguity is std::nullopt exactly for broadcast IDs, so equality here
// means the replayed ID's broadcast-ness disagrees with the presence of
// a contiguity value.
if (it->second->isBroadcast() == self_contiguity[i].has_value()) {
This feels like a bug or a misuse. Is there a good use case for is-broadcast to be inconsistent between self and new_self?
This is for the same reason as the comment above about symbolic IDs: if we are dealing with a symbolic ID, we just need to make sure that the contiguity flag on the given ID is consistent with itself.
Later, during concretization, the consistency will be fixed globally.
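For reference, a small standalone illustration of the contiguity convention the check relies on (the convention, one std::optional<bool> per allocation ID with nullopt for broadcast, matches TensorDomain; the snippet itself is plain C++):

#include <optional>
#include <vector>

int main() {
  // One entry per allocation-domain ID: broadcast IDs have no meaningful
  // contiguity, so they carry std::nullopt; every other ID carries a bool.
  std::vector<std::optional<bool>> contiguity = {
      true,          // concrete ID, contiguous with the next allocated ID
      std::nullopt,  // broadcast ID
      false,         // concrete ID, non-contiguous
  };
  // The replay check flags an ID whose broadcast-ness disagrees with the
  // presence of a contiguity value: isBroadcast() == has_value() is the
  // inconsistent case.
  return 0;
}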
!test --diff
The failure doesn't seem to be relevant. We also have all tests clearing in 4e6d81e. I'm merging this PR as-is.
Fixes #3957
The issue was that when we specify an output alias, we need to update the allocation domain and contiguity of the alias to be identical to the source's.