
Add downgrade cancellation e2e tests #19252

Merged

Conversation

@henrybear327 (Contributor) commented Jan 21, 2025

Part 2 of the e2e tests: downgrade cancellation is invoked after the cluster has been partially downgraded (see part 1: #19244).

Reference: #17976



codecov bot commented Jan 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.82%. Comparing base (3cc3daf) to head (4d1a207).
Report is 12 commits behind head on main.

Additional details and impacted files

see 25 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #19252      +/-   ##
==========================================
- Coverage   68.90%   68.82%   -0.08%     
==========================================
  Files         420      420              
  Lines       35706    35706              
==========================================
- Hits        24602    24576      -26     
- Misses       9683     9705      +22     
- Partials     1421     1425       +4     


@henrybear327 force-pushed the e2e/downgrade_cancellation_partial branch 3 times, most recently from 7afa927 to 9da71dc on January 28, 2025 at 12:34
@henrybear327 force-pushed the e2e/downgrade_cancellation_partial branch from 9da71dc to 6004df8 on January 28, 2025 at 13:00
@henrybear327 marked this pull request as ready for review on January 28, 2025 at 13:00
@henrybear327 (Contributor, Author)

/retest

@henrybear327 force-pushed the e2e/downgrade_cancellation_partial branch from 6004df8 to b88f908 on January 28, 2025 at 13:51
@henrybear327 (Contributor, Author)

Rebased onto main, as #19244 has been merged.

@henrybear327 (Contributor, Author)

/retest

@ahrtr (Member) commented Jan 28, 2025

@siyuanfoundation PTAL, thx

@henrybear327 force-pushed the e2e/downgrade_cancellation_partial branch 4 times, most recently from 6359e49 to 5f6fd54 on February 3, 2025 at 20:38
@henrybear327 (Contributor, Author)

The more nodes in the cluster that have been downgraded, the more likely the downgrade cancellation is to time out.

I don't have a better idea for reducing the flakiness so far, so I added this retry mechanism. It might not be elegant, but at least the CI tests should be reasonably stable.

Tested with the command `PASSES="release e2e" PKG=./tests/e2e TESTCASE="TestDowngradeCancellationAfterDowngrading4InClusterOf5" TIMEOUT=40m ./scripts/test.sh -v -count=50 -failfast`, as this test usually times out on my machine every 5 runs or so.
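For readers unfamiliar with this pattern, the kind of retry wrapper described above usually looks like the minimal sketch below. This is only an illustration assuming a simple fixed-backoff policy; the helper name and signature are hypothetical and are not the code added in this PR.

```go
// Hypothetical bounded-retry helper (illustration only, not the PR's code).
package retryexample

import (
	"fmt"
	"time"
)

// retry runs op up to attempts times, waiting backoff between tries.
// It returns nil on the first success, or the last error otherwise.
func retry(attempts int, backoff time.Duration, op func() error) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		if lastErr = op(); lastErr == nil {
			return nil
		}
		time.Sleep(backoff)
	}
	return fmt.Errorf("operation failed after %d attempts: %w", attempts, lastErr)
}
```

In an e2e test, the flaky step (here, waiting for the downgrade cancellation to take effect) would be wrapped in such a helper so that a single slow run does not fail the whole test.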

@henrybear327 (Contributor, Author)

/retest

1 similar comment
@henrybear327 (Contributor, Author)

/retest

tests/framework/e2e/downgrade.go: two outdated review threads (resolved)
@henrybear327 force-pushed the e2e/downgrade_cancellation_partial branch 4 times, most recently from 73e07fd to b49c70b on February 4, 2025 at 09:43
@henrybear327 (Contributor, Author)

/retest

@siyuanfoundation (Contributor)

/retest

1 similar comment
@henrybear327 (Contributor, Author)

/retest

@henrybear327 force-pushed the e2e/downgrade_cancellation_partial branch 2 times, most recently from 36f4722 to a15bcbd on February 4, 2025 at 17:20
Signed-off-by: Chun-Hung Tseng <[email protected]>
Signed-off-by: Siyuan Zhang <[email protected]>
@henrybear327 force-pushed the e2e/downgrade_cancellation_partial branch from a15bcbd to 4d1a207 on February 4, 2025 at 17:31
@ahrtr (Member) left a comment

Thanks for working on this. Overall looks good to me.

After the downgrade is cancelled, users should be able to restore the cluster to its original state. They just need to replace the binary with the previous version for each downgraded member. I think we should cover it as well. Of course, it can be addressed in follow-up PRs.

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, henrybear327, siyuanfoundation


@henrybear327 (Contributor, Author)

> Thanks for working on this. Overall looks good to me.
>
> After the downgrade is cancelled, users should be able to restore the cluster to its original state. They just need to replace the binary with the previous version for each downgraded member. I think we should cover it as well. Of course, it can be addressed in follow-up PRs.

I can quickly work on this.

Could you elaborate on "replace the binary with the previous version for each downgraded member"?

Thank you @ahrtr!

@ahrtr (Member) commented Feb 4, 2025

> Could you elaborate on "replace the binary with the previous version for each downgraded member"?

For example, take a 3-member cluster running 3.6. You downgraded 2 of the members to 3.5, then cancelled the downgrade. The cluster version should be 3.5.

The recovery process is straightforward: just replace the binary for the 2 downgraded members, one by one:

  • stop the member;
  • replace the binary (with the 3.6 binary in this example);
  • start the member.

When it's done, the cluster version should be 3.6, and the cluster should be able to serve client requests correctly.
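To make the recovery steps above concrete, here is a minimal sketch in Go. It is only an illustration of the member-by-member flow under the assumption of a generic Member abstraction; the interface and helper below are hypothetical and do not reflect the actual e2e framework API.

```go
// Illustration only: hypothetical types, not the etcd e2e framework API.
package recoveryexample

// Member abstracts the operations needed to swap an etcd member's binary.
type Member interface {
	Stop() error
	SetBinaryPath(path string) error // point the member at a different etcd binary
	Start() error
}

// restoreMembers walks the previously downgraded members one by one:
// stop the member, switch it back to the original binary, restart it.
func restoreMembers(downgraded []Member, originalBinary string) error {
	for _, m := range downgraded {
		if err := m.Stop(); err != nil {
			return err
		}
		if err := m.SetBinaryPath(originalBinary); err != nil {
			return err
		}
		if err := m.Start(); err != nil {
			return err
		}
	}
	return nil
}
```

After the loop completes, a test would assert that the reported cluster version is back to the original and that the cluster still serves client requests correctly.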

@henrybear327 (Contributor, Author)

> > Could you elaborate on "replace the binary with the previous version for each downgraded member"?
>
> For example, take a 3-member cluster running 3.6. You downgraded 2 of the members to 3.5, then cancelled the downgrade. The cluster version should be 3.5.
>
> The recovery process is straightforward: just replace the binary for the 2 downgraded members, one by one:
>
> • stop the member;
> • replace the binary (with the 3.6 binary in this example);
> • start the member.
>
> When it's done, the cluster version should be 3.6, and the cluster should be able to serve client requests correctly.

Maybe I am mistaken. Isn't this what we are doing in DowngradeUpgradeMembers now? :)

@ahrtr (Member) commented Feb 4, 2025

You can reuse DowngradeUpgradeMembers. The goal is to extend the e2e test case to verify that users are able to recover the partially downgraded cluster to its original state.

@ahrtr merged commit 3842a5a into etcd-io:main on February 4, 2025
36 checks passed
@henrybear327 deleted the e2e/downgrade_cancellation_partial branch on February 4, 2025 at 19:51
@henrybear327 (Contributor, Author) commented Feb 4, 2025

> You can reuse DowngradeUpgradeMembers. The goal is to extend the e2e test case to verify that users are able to recover the partially downgraded cluster to its original state.

I will submit a follow-up PR and see if it meets your expectations! :)
