Add downgrade cancellation e2e tests #19252
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files: see 25 files with indirect coverage changes.

@@            Coverage Diff             @@
##             main   #19252      +/-   ##
==========================================
- Coverage   68.90%   68.82%   -0.08%
==========================================
  Files         420      420
  Lines       35706    35706
==========================================
- Hits        24602    24576      -26
- Misses       9683     9705      +22
- Partials     1421     1425       +4

Continue to review the full report in Codecov by Sentry.
Force-pushed from 7afa927 to 9da71dc
Force-pushed from 9da71dc to 6004df8
/retest
Force-pushed from 6004df8 to b88f908
Rebased onto main as #19244 has been merged.
/retest
@siyuanfoundation PTAL, thx
Force-pushed from fe94c09 to 172edc8
Force-pushed from 6359e49 to 5f6fd54
The more nodes in the cluster that have been downgraded, the more likely the downgrade cancellation is to time out. I don't have a better idea for reducing the flakiness so far, so I added this retry mechanism. It might not be elegant, but at least the CI tests should be reasonably stable. Tested with this command
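(The exact command isn't shown above. As a rough illustration of the retry idea being described, here is a minimal Go sketch; `cancelDowngrade` is a hypothetical callback, not the actual code added in this PR.)

```go
package e2e_sketch

import (
	"testing"
	"time"
)

// retryDowngradeCancel retries a flaky downgrade-cancellation step a few
// times before failing the test. cancelDowngrade is a hypothetical callback
// standing in for whatever issues the cancel request and waits for it to
// take effect on the cluster.
func retryDowngradeCancel(t *testing.T, attempts int, cancelDowngrade func() error) {
	t.Helper()
	var lastErr error
	for i := 0; i < attempts; i++ {
		if lastErr = cancelDowngrade(); lastErr == nil {
			return // cancellation succeeded
		}
		t.Logf("downgrade cancel attempt %d/%d failed: %v", i+1, attempts, lastErr)
		time.Sleep(2 * time.Second) // back off before the next attempt
	}
	t.Fatalf("downgrade cancellation did not succeed after %d attempts: %v", attempts, lastErr)
}
```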
/retest
/retest
Force-pushed from 73e07fd to b49c70b
/retest
/retest
/retest
Force-pushed from 36f4722 to a15bcbd
Signed-off-by: Chun-Hung Tseng <[email protected]>
Signed-off-by: Siyuan Zhang <[email protected]>
Force-pushed from a15bcbd to 4d1a207
Thanks for working on this. Overall looks good to me.
After the downgrade is cancelled, users should be able to restore the cluster to its original state. They just need to replace the binary to the previous version for each downgraded member. I think we should cover that as well. Of course, it can be addressed in follow-up PRs.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ahrtr, henrybear327, siyuanfoundation. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
I can quickly work on this. Could you elaborate on "replace the binary to the previous version for each downgraded member"? Thank you @ahrtr!
For example, take a 3-member cluster running 3.6. You downgraded 2 of the members to 3.5, then cancelled the downgrade. The cluster version should be 3.5. The recovery process is straightforward: just replace the binary for the 2 downgraded members one by one.
When that's done, the cluster version should be 3.6, and the cluster should be able to serve client requests correctly.
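A minimal Go sketch of that recovery flow, assuming hypothetical helpers (`Member`, `StartWithBinary`, `clusterVersion`) rather than the actual e2e framework API used by etcd's tests:

```go
package e2e_sketch

import "testing"

// Member is a stand-in for an e2e test-cluster member handle; the real
// framework types are different.
type Member interface {
	Stop() error
	StartWithBinary(path string) error
}

// recoverAfterCancel swaps each downgraded member back to the original
// (pre-downgrade) binary one at a time, then checks the reported cluster
// version, e.g. expecting it to move from 3.5 back to 3.6.
func recoverAfterCancel(t *testing.T, downgraded []Member, originalBinary, wantVersion string, clusterVersion func() (string, error)) {
	t.Helper()
	for i, m := range downgraded {
		if err := m.Stop(); err != nil {
			t.Fatalf("stop downgraded member %d: %v", i, err)
		}
		if err := m.StartWithBinary(originalBinary); err != nil {
			t.Fatalf("restart member %d with %s: %v", i, originalBinary, err)
		}
	}
	v, err := clusterVersion()
	if err != nil {
		t.Fatalf("query cluster version: %v", err)
	}
	if v != wantVersion {
		t.Fatalf("cluster version = %s, want %s", v, wantVersion)
	}
}
```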
Maybe I am mistaken. Isn't this what we are doing in
You can reuse
I will submit a follow-up PR and see if it meets your expectations! :)
Part 2 of the e2e test - downgrade cancellation is called after the cluster has been partially downgraded (refer to part 1: #19244)
Reference: #17976
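Roughly, the scenario in this PR looks like the sketch below. All names (`Cluster`, `EnableDowngrade`, `DowngradeMember`, `CancelDowngrade`, `PutGet`) are hypothetical stand-ins, not the actual e2e framework functions used in the test:

```go
package e2e_sketch

import "testing"

// Cluster is a stand-in for the e2e test cluster handle.
type Cluster interface {
	EnableDowngrade(targetVersion string) error   // e.g. enable a downgrade to "3.5"
	DowngradeMember(idx int, binary string) error // restart one member on the old binary
	CancelDowngrade() error                       // issue the downgrade cancel request
	PutGet(key, value string) error               // simple read/write health check
}

// testDowngradeCancelAfterPartialDowngrade exercises part 2 of the scenario:
// the cancellation happens after only some members have been downgraded.
func testDowngradeCancelAfterPartialDowngrade(t *testing.T, c Cluster, oldBinary string, membersToDowngrade []int) {
	t.Helper()
	if err := c.EnableDowngrade("3.5"); err != nil {
		t.Fatalf("enable downgrade: %v", err)
	}
	// Downgrade only a subset of the members (partial downgrade).
	for _, idx := range membersToDowngrade {
		if err := c.DowngradeMember(idx, oldBinary); err != nil {
			t.Fatalf("downgrade member %d: %v", idx, err)
		}
	}
	// Cancel the in-progress downgrade while the cluster is mixed-version.
	if err := c.CancelDowngrade(); err != nil {
		t.Fatalf("cancel downgrade: %v", err)
	}
	// The cluster should keep serving client requests after cancellation.
	if err := c.PutGet("foo", "bar"); err != nil {
		t.Fatalf("cluster not serving requests after cancel: %v", err)
	}
}
```

As discussed in the review thread above, after a cancel during a partial downgrade the cluster version stays at the lower version until the downgraded members are brought back to the original binary.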
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.