support unspill for SpillableHostBuffer #12186
base: branch-25.04
Conversation
Let's add some details on how unspill works in this PR when you get a chance. I would also like to see a benchmark that shows the performance change with this PR: what is the impact of unspill in your scenario? Mostly to understand how bad things really are today.
Hi @abellina, I have refined my PR and updated the description with how it works. Please take a look. As for the benchmark, take the example in #12184 (comment) (we need #12215 fixed to run this example, but that in turn depends on this current issue): if you limit the host memory to a relatively low threshold, you'll find it takes forever for the shuffle concat to finish, because its input pieces are frequently churning.
build
Force-pushed from 2bb50b8 to 7ea3d9b
@binmahone please see my comment here: #12184 (comment). Is there a reason why this pattern change suggestion doesn't work?
So this brings up a very interesting point. Do we want to be able to run the concat operation when we do not have enough host memory to store the entire input shuffle table and the output? My idea to stop the churn is in #12236 (comment), but it assumes that we want to have enough host memory to do the entire operation. If we have some kind of a requirement to do it with less memory, I would prefer it if we handled the split-and-retry exception and then tried to concat fewer KudoTable buffers instead.
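A hypothetical sketch of that split-and-retry fallback, using placeholder types and a placeholder exception rather than the plugin's real retry framework:

```scala
// Placeholder for the retry framework's split-and-retry signal; the real
// plugin has its own OOM/retry exception types and helpers.
class HostConcatOOM extends RuntimeException

// If concatenating every buffer at once does not fit in host memory, split
// the input in half and concatenate the smaller groups instead.
def concatWithRetry[Buf, Out](
    buffers: Seq[Buf],
    concatAll: Seq[Buf] => Out): Seq[Out] = {
  try {
    Seq(concatAll(buffers))
  } catch {
    case _: HostConcatOOM if buffers.length > 1 =>
      // Halve the batch of KudoTable-like buffers and retry each half.
      val (left, right) = buffers.splitAt(buffers.length / 2)
      concatWithRetry(left, concatAll) ++ concatWithRetry(right, concatAll)
  }
}
```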
@binmahone I think your idea of unspill and my idea of unspill are a bit different.
For me, unspill implies that we can have both a DiskHandle and a HostMemoryBuffer at the same time inside of the SpillableHostBufferHandle. So if we spill data to disk and then read it back in again, we keep a copy of it cached in host. If we go to spill and we have both disk and host, then it is simple: we just release host so it can be cleaned up, no writing to disk needed. If we run into a situation where we are in the middle of spilling and we want to materialize the buffer, then ideally we let the spill code know that it can finish writing to disk, but it should not delete the host buffer because someone grabbed it.
For me, the only downside to this approach is the added complexity in the SpillFramework code and the slightly higher probability that we might need to call spill more frequently if the lifetime of the SpillableHostBufferHandle is much longer than the lifetime of the HostMemoryBuffer that is returned by materialize.
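A hedged sketch of this design, with simplified stand-in state instead of the real SpillableHostBufferHandle fields:

```scala
// Sketch only: placeholder state and I/O, not the real SpillFramework code.
class CachedSpillSketch {
  private var host: Option[Array[Byte]] = None  // cached host copy
  private var disk: Option[String] = None       // durable disk copy (file path)

  // Reading spilled data back keeps a cached copy in host memory.
  def materialize(): Array[Byte] = synchronized {
    host.getOrElse {
      val buf = readFromDisk(disk.get)
      host = Some(buf)
      buf
    }
  }

  // With both copies present, spilling is just dropping the host copy;
  // no disk write is needed.
  def spill(): Unit = synchronized {
    (host, disk) match {
      case (Some(_), Some(_)) => host = None
      case (Some(buf), None)  => disk = Some(writeToDisk(buf)); host = None
      case _                  => // nothing in host to spill
    }
  }

  private def readFromDisk(path: String): Array[Byte] = ???  // placeholder I/O
  private def writeToDisk(buf: Array[Byte]): String = ???    // placeholder I/O
}
```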
I think the answer is NO, that's why the code pattern will be "lock all input buffers first" (check out the snapshot in #12184 (comment)); it will make sure all inputs are already in memory.
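As a rough illustration of that pattern (placeholder function types, not the plugin's actual API):

```scala
// Materialize every spillable input before the concat starts so none of the
// inputs can be spilled (and churned) mid-operation.
def concatAllInputs[Buf, Out](
    inputs: Seq[() => Buf],          // each call pins one input in host memory
    concat: Seq[Buf] => Out,
    close: Buf => Unit): Out = {
  val pinned = inputs.map(_.apply()) // every input now lives in host memory
  try {
    concat(pinned)
  } finally {
    pinned.foreach(close)            // release the pinned host copies
  }
}
```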
Sorry @revans2, I'm a little confused; it appears to me that this PR's implementation matches your description? I need some advice on the next steps for this PR.
There are a lot of places where it is not doing what I called out.
spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/spill/SpillFramework.scala Lines 392 to 397 in 48a2011
If we want to keep the disk around along with the host memory, then why are we setting disk to None here? If anyone asks to "unspill" this buffer the
That code is not implemented.
spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/spill/SpillFramework.scala Line 413 in 48a2011
If we have both a disk and a host, then line 413 will be false, which in turn sets thisThreadSpills to false. That also means that once we are in a situation with both host and disk, the host will never be released, because disk is used as the gate to see if it can be spilled.
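To illustrate the gating issue with simplified placeholder state (not the real fields):

```scala
// If the spill check only fires when nothing is on disk yet, a handle holding
// BOTH a host and a disk copy never releases its cached host copy.
case class HandleState(host: Option[AnyRef], disk: Option[AnyRef])

// Gate as described for the current code: spill only when disk is empty.
def spillsToday(s: HandleState): Boolean = s.host.isDefined && s.disk.isEmpty

// Gate needed once a host copy can be cached next to a disk copy: any live
// host copy is spillable (dropping it is cheap when a disk copy exists).
def spillsWithUnspill(s: HandleState): Boolean = s.host.isDefined
```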
spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/spill/SpillFramework.scala Lines 455 to 457 in 48a2011
Technically that is true about this code, but it also does not set the
As for next steps with this PR: you should retest with the changes that were made in #12236 and no unspilling, and see if this is even needed at all. My guess is that the changes there make it so there is much less churn and this is not really needed. If the performance shows that something like this would still help, then please look at the places that I pointed out above so that unspill does what I want/expect. Then we would need follow-on issues to look at all of the handles to see if there are similar changes that would be helpful for them too.
This PR closes #12184.
It only takes care of SpillableHostBuffer by enhancing SpillableHostBufferHandle. In SpillableHostBufferHandle, materialize() is replaced by materialize(unspill: Boolean = false), so that the caller can express the intention to guarantee the host buffer is unspilled to host (so that they won't pay the price of reading this buffer from disk anymore). Internally, if set to true, an unspill step is added that puts the buffer back into the host store after reading it from disk (if applicable).
The tricky part is when materialize(true) and spill() are called at the same time; to be more specific, when materialize(true) happens after spilling is set to true but the disk write has not finished. In this PR we use a flag called spillingInterrupted to indicate that the host memory buffer has been rescued; on seeing spillingInterrupted set to true, the spilling thread should abandon what has already been written to disk and abort the current spilling procedure.
In the PR, I also tried upgrading the mockito version from 3.12.4 to 4.11.0, in the hope of it fixing a static mocking problem for me. Unfortunately it didn't work out. However, I think it's still worthwhile to keep the upgrade, to keep the mockito version consistent with org.scalatestplus/mockito-4-11, which literally speaking requires mockito 4.11.
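A hedged sketch of the race handling described above; names and state are simplified stand-ins for the real handle, and the finer-grained locking the real SpillFramework needs is omitted:

```scala
import java.util.concurrent.atomic.AtomicBoolean

class SpillingHandleSketch {
  @volatile private var host: Option[Array[Byte]] = None
  @volatile private var disk: Option[String] = None
  private val spilling = new AtomicBoolean(false)
  private val spillingInterrupted = new AtomicBoolean(false)

  // Called by the spill thread.
  def spill(): Unit = {
    if (spilling.compareAndSet(false, true)) {
      host.foreach { buf =>
        val path = writeToDisk(buf)
        if (spillingInterrupted.getAndSet(false)) {
          // materialize(unspill = true) rescued the host buffer mid-spill:
          // abandon what was written and keep the host copy.
          deleteFromDisk(path)
        } else {
          disk = Some(path)
          host = None
        }
      }
      spilling.set(false)
    }
  }

  // Called by a task thread that wants the data guaranteed to stay in host memory.
  def materialize(unspill: Boolean = false): Array[Byte] = {
    host match {
      case Some(buf) =>
        if (unspill && spilling.get()) {
          spillingInterrupted.set(true) // tell the spill thread not to drop host
        }
        buf
      case None =>
        val buf = readFromDisk(disk.get)
        if (unspill) {
          host = Some(buf) // unspill: put the buffer back into the host store
        }
        buf
    }
  }

  private def writeToDisk(buf: Array[Byte]): String = ???    // placeholder I/O
  private def deleteFromDisk(path: String): Unit = ???       // placeholder I/O
  private def readFromDisk(path: String): Array[Byte] = ???  // placeholder I/O
}
```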