[#1264][FEATURE][SPARK] Support cancel async thread of handle blockEvent and rpc when writer is killed #2381

Open
wants to merge 1 commit into
base: master

summaryzb
Contributor

@summaryzb summaryzb commented Mar 6, 2025

What changes were proposed in this pull request?

Cancel all runnables that are waiting to be executed or are blocked waiting for an RPC callback.
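
A minimal sketch of this kind of cancellation, using only `java.util.concurrent` and not the PR's actual code. The class `CancelPendingSketch` and its methods are hypothetical names chosen for illustration. One task blocks (standing in for a thread waiting on an RPC callback) while two more sit in the queue behind it; `Future.cancel(true)` covers both cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; none of these names come from Uniffle.
class CancelPendingSketch {

  /** Cancels every future and returns how many were cancelled before they completed. */
  static int cancelAll(List<Future<?>> futures) {
    int cancelled = 0;
    for (Future<?> future : futures) {
      // cancel(true) prevents a queued task from running and interrupts a running one.
      if (future.cancel(true)) {
        cancelled++;
      }
    }
    return cancelled;
  }

  /** Submits one blocked task plus two queued tasks, then cancels all of them. */
  static int demo() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CountDownLatch started = new CountDownLatch(1);
    List<Future<?>> futures = new ArrayList<>();
    // Stand-in for a runnable blocked while waiting for an RPC callback.
    futures.add(pool.submit(() -> {
      started.countDown();
      try {
        Thread.sleep(60_000);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }));
    // These two wait in the queue because the pool has a single thread.
    futures.add(pool.submit(() -> {}));
    futures.add(pool.submit(() -> {}));
    try {
      started.await(); // make sure the first task is really running
      return cancelAll(futures);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return -1;
    } finally {
      pool.shutdownNow();
    }
  }
}
```

Note that interruption only helps if the blocked code responds to it; a blocking call that ignores interrupts would need its own cancellation hook.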

Why are the changes needed?

This is helpful when the current task is killed because a speculative task attempt has succeeded, while the RPC that sends its data keeps retrying.

Fix: #1264

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests (UT).

@codecov-commenter

codecov-commenter commented Mar 6, 2025

Codecov Report

Attention: Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.

Project coverage is 51.21%. Comparing base (8ad0f8d) to head (7b0f563).
Report is 1 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ffle/client/request/RssSendShuffleDataRequest.java | 0.00% | 4 Missing ⚠️ |
| ...ffle/client/impl/grpc/ShuffleServerGrpcClient.java | 0.00% | 1 Missing ⚠️ |
| ...client/impl/grpc/ShuffleServerGrpcNettyClient.java | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2381      +/-   ##
============================================
- Coverage     51.34%   51.21%   -0.14%     
+ Complexity     3615     3016     -599     
============================================
  Files           571      481      -90     
  Lines         32892    23193    -9699     
  Branches       2833     2140     -693     
============================================
- Hits          16890    11878    -5012     
+ Misses        14932    10569    -4363     
+ Partials       1070      746     -324     


github-actions bot commented Mar 6, 2025

Test Results

 3 008 files   - 11   3 008 suites   - 11   6h 38m 27s ⏱️ - 6m 23s
 1 178 tests ± 0   1 175 ✅  -  2   1 💤 ±0  0 ❌ ±0  2 🔥 +2 
14 911 runs   - 11  14 893 ✅  - 14  15 💤 ±0  0 ❌ ±0  3 🔥 +3 

For more details on these errors, see this check.

Results for commit be8ca4e. ± Comparison against base commit 94e211c.

♻️ This comment has been updated with latest results.

@summaryzb
Contributor Author

@jerqi @LuciferYang PTAL

@LuciferYang
Contributor

also cc @advancedxy

@LuciferYang
Contributor

It seems we should add a new test to cover this.

@summaryzb summaryzb force-pushed the eagerly_cancel branch 2 times, most recently from 26b3240 to 56fa5ad Compare March 7, 2025 16:21
@summaryzb
Contributor Author

gentle ping @LuciferYang @advancedxy

@@ -29,26 +30,29 @@ public class RssSendShuffleDataRequest {
private int retryMax;
private long retryIntervalMax;
private Map<Integer, Map<Integer, List<ShuffleBlockInfo>>> shuffleIdToBlocks;
private Supplier<Boolean> needCancel;
Contributor

I think this leaks details: it makes RssSendShuffleDataRequest hold a reference to the sending class, which for Spark is DataPusher. I'm not sure this is an elegant way to do it.

Is it possible for
boolean result = ClientUtils.waitUntilDoneOrFail(futures, allowFastFail); in ShuffleWriteClientImpl to be aware of interruption/Spark cancellation and to cancel all the sending futures?
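
To make the suggestion concrete, here is a rough sketch of an interruption-aware wait. It is not the real `ClientUtils.waitUntilDoneOrFail`; `InterruptAwareWaitSketch` and `waitOrCancel` are hypothetical names, and the sketch assumes each sending future yields a non-null `Boolean`. If the waiting thread is interrupted (for example because Spark killed the task), every outstanding future is cancelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; not the Uniffle implementation.
class InterruptAwareWaitSketch {

  /** Returns true only if every future yields true; cancels all futures on interruption. */
  static boolean waitOrCancel(List<Future<Boolean>> futures) {
    try {
      for (Future<Boolean> future : futures) {
        if (!future.get()) {
          return false;
        }
      }
      return true;
    } catch (InterruptedException e) {
      // The waiting thread was interrupted: stop every send that is still in flight.
      for (Future<Boolean> future : futures) {
        future.cancel(true);
      }
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return false;
    } catch (ExecutionException | CancellationException e) {
      return false;
    }
  }

  /** Simulates a task kill by interrupting the waiting thread before it waits. */
  static boolean demoInterrupted() {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    List<Future<Boolean>> futures = new ArrayList<>();
    for (int i = 0; i < 2; i++) {
      // Stand-in for a long-running send that keeps retrying.
      futures.add(pool.submit(() -> {
        Thread.sleep(60_000);
        return true;
      }));
    }
    Thread.currentThread().interrupt();
    boolean succeeded = waitOrCancel(futures);
    boolean allCancelled = futures.stream().allMatch(Future::isCancelled);
    Thread.interrupted(); // clear the flag so the demo leaves no side effect
    pool.shutdownNow();
    return !succeeded && allCancelled;
  }
}
```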

Contributor Author

Actually, the current DataPusher already leaks details to the sending class. This PR does not make that worse, but it achieves eager cancellation at the RPC retry level.
Being aware of interruption/Spark cancellation is a good idea; I'll follow that approach.
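
For reference, cancellation at the retry level could look roughly like the sketch below. It assumes a `Supplier<Boolean> needCancel` like the field shown in the diff, checked before each attempt; `CancellableRetrySketch`, `sendWithRetry`, and `Outcome` are hypothetical names, and the backoff between attempts is omitted.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration; not the PR's actual retry code.
class CancellableRetrySketch {

  enum Outcome { SUCCESS, CANCELLED, EXHAUSTED }

  /** Tries the attempt up to retryMax times, stopping early once needCancel reports true. */
  static Outcome sendWithRetry(
      Supplier<Boolean> attempt, int retryMax, Supplier<Boolean> needCancel) {
    for (int i = 0; i < retryMax; i++) {
      // Check for cancellation before each attempt so a killed writer stops retrying.
      if (needCancel.get()) {
        return Outcome.CANCELLED;
      }
      if (attempt.get()) {
        return Outcome.SUCCESS;
      }
      // A real client would sleep here with a bounded backoff before the next attempt.
    }
    return Outcome.EXHAUSTED;
  }

  /** An always-failing send that gets cancelled after two attempts; returns attempts made. */
  static int attemptsBeforeCancel() {
    AtomicInteger attempts = new AtomicInteger();
    Outcome outcome = sendWithRetry(
        () -> {
          attempts.incrementAndGet();
          return false;
        },
        5,
        () -> attempts.get() >= 2);
    return outcome == Outcome.CANCELLED ? attempts.get() : -1;
  }
}
```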

@summaryzb summaryzb marked this pull request as draft March 17, 2025 11:25
@summaryzb summaryzb force-pushed the eagerly_cancel branch 2 times, most recently from 212a7f9 to 2ddac28 Compare March 21, 2025 11:33
@summaryzb summaryzb marked this pull request as ready for review April 3, 2025 22:46
@summaryzb summaryzb force-pushed the eagerly_cancel branch 2 times, most recently from b3bf49f to 18b7a80 Compare April 5, 2025 04:59
@summaryzb summaryzb changed the title [#2380] Improvement: Eagerly cancel rpc request [#1264][FEATURE][SPARK] Support cancel async thread of handle blockEvent and rpc when writer is killed Apr 5, 2025
Successfully merging this pull request may close these issues.

[FEATURE][SPARK] Support cancel async thread of handle blockEvent and rpc when writer is killed
4 participants