
PR Discussion - CI chart upgrade tests #1427

Closed
consideRatio opened this issue Sep 27, 2019 · 7 comments

@consideRatio
Member

While seeing pangeo-data/pangeo-cloud-federation#418, and simply because releasing things for z2jh can be hard, I'd like to add an upgrade test to our CI system that performs a Helm chart upgrade from the latest minor release.

I feel hopeful about doing this now that I've spent a lot of time working on #1422!

@consideRatio consideRatio changed the title PR Discussion PR Discussion - CI Upgrade test Sep 27, 2019
@consideRatio
Member Author

consideRatio commented Sep 28, 2019

Upgrade tests

helm upgrade can change the chart version as well as the chart configuration, and we want to make sure we test some aspects of both. Note that changing the chart version can often be seen as changing the configuration.

Changing a Service type to ClusterIP from LoadBalancer

Transitioning to Services of type ClusterIP instead of using a NodePort, explicitly or indirectly, as required for use with nginx-ingress. Here is a related code block that I think needs fixing. For an error example, see #641 (comment).

```yaml
# upgrading to this could cause issues
proxy:
  service:
    type: ClusterIP
```
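For contrast, here is a sketch of the kind of configuration one might be upgrading from when exposing the proxy for nginx-ingress. The field names are based on the z2jh chart's proxy.service values, and the port number is purely illustrative; treat both as assumptions to verify against the chart's documented schema.

```yaml
# upgrading away from this, to type: ClusterIP, is the transition to test
proxy:
  service:
    type: NodePort      # exposes the proxy on a port on every node
    nodePorts:
      http: 30080       # hypothetical port value, for illustration only
```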

Changing Deployment's spec.strategy to Recreate

See issue (re-)found in #1404.

Upgrading from something before #1401 to #1404 and later could cause issues, but I'm not sure when, or whether they only occur without the use of --force. In the test we should stick to not using --force, so that we notice if we require it for some changes.
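The change in question, roughly sketched as a plain Kubernetes Deployment fragment (the actual z2jh template and resource names may differ; this is for illustration only):

```yaml
# Fragment of a Deployment manifest. Switching strategy between chart
# versions is the kind of in-place change that `helm upgrade` may not
# apply cleanly without --force, which is what the test should surface.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hub
spec:
  strategy:
    type: Recreate   # was RollingUpdate before the change
```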

Running users remain running (#786 (comment))

Such a test should also cover upgrading while there are running users to make sure that the updated Hub doesn't lose track of users and proxy routes, etc. Not every Hub version upgrade will allow this, but if it doesn't work we should know about it and advertise it loudly.
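A sketch of what such a test could look like as CI shell steps, assuming a live test cluster. The release name, config file, and placeholder user pod are assumptions (helm 3 style commands shown), not existing scripts:

```shell
#!/usr/bin/env bash
# Sketch: verify a running user pod survives a chart upgrade.
set -eu

# 1. Install the previous release and start a stand-in "user" pod.
helm install jhub jupyterhub/jupyterhub --version 0.8.2 --values config.yaml
kubectl run test-user --image=python:3 --restart=Never -- sleep 3600
kubectl wait pod/test-user --for=condition=Ready --timeout=120s

# 2. Upgrade to the chart under test, without --force.
helm upgrade jhub ./jupyterhub --values config.yaml

# 3. The user pod should still be running after the upgrade.
kubectl wait pod/test-user --for=condition=Ready --timeout=60s
```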

Other

  • Do we need to run --upgrade-db between tagged releases?

@consideRatio consideRatio changed the title PR Discussion - CI Upgrade test PR Discussion - CI chart upgrade tests Sep 28, 2019
@consideRatio consideRatio mentioned this issue Oct 7, 2019
@consideRatio
Member Author

Upgrade tests take more time, and are perhaps most important before releases. So... I figure it could make sense to make them something you run manually now and then.

By triggering a custom build, and having a .travis.yml that builds differently depending on whether if: type = api holds, one could also use the supplied commit message to signal what kind of build one wants in the .travis.yml file.

https://blog.travis-ci.com/2017-08-24-trigger-custom-build
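A sketch of what that condition could look like in .travis.yml. The job layout and script name are assumptions; only `if: type = api` is Travis CI's documented condition syntax for API-triggered builds:

```yaml
# .travis.yml sketch: run the upgrade test only for API-triggered builds
jobs:
  include:
    - name: upgrade-test
      if: type = api               # only when triggered via the Travis API
      script: ./ci/upgrade-test.sh # hypothetical script name
```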

@manics
Member

manics commented Oct 19, 2019

Does it matter if the upgrade test takes a long time, especially if it's part of a Travis matrix build so you can see the results of the non-upgrade tests early if you really need to? I think asking people to decide whether an upgrade test is required for a PR is an unnecessary extra step. If the PR shouldn't affect the upgrade, the test should pass quietly and no one cares; if it fails due to an unexpected side effect, it's done its job!

If the upgrade test did turn out to be a problem it could be allowed to fail in the Travis matrix, so there's at least a record of when the process potentially broke.

@consideRatio
Member Author

consideRatio commented Oct 19, 2019

@manics deal, I agree.

To mitigate CI delay issues, we could perhaps also limit ourselves to four jobs in our test stage. Also... it feels kind of silly to fix this now and not sooner, but we should not rerun all tests if no changes have been made to the chart (#1451).

@consideRatio
Member Author

As of #1459, TravisCI will always run a primitive upgrade test, but locally you won't. There is room for improvement. By the "primitive part" of the upgrade tests, I mean that it only verifies that when we go from 0.8.2 to the current master, using the same config before and after, we land in a functional state without errors during the upgrade process.
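In other words, roughly this sequence (a sketch; the release name, config path, and deployment names are assumptions to check against the actual CI scripts):

```shell
#!/usr/bin/env bash
# Sketch of the "primitive" chart version upgrade test:
# same config before and after, published 0.8.2 chart -> current checkout.
set -eu

helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update

# install the latest tagged release with the test config
helm install jhub jupyterhub/jupyterhub --version 0.8.2 --values dev-config.yaml

# upgrade in place to the chart built from the current checkout
helm upgrade jhub ./jupyterhub --values dev-config.yaml

# a "functional state": the hub and proxy rollouts complete cleanly
kubectl rollout status deployment/hub --timeout=120s
kubectl rollout status deployment/proxy --timeout=120s
```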

So, we should perhaps refer to this as a "chart version upgrade test" as compared to a "configuration change upgrade test" or similar, which would also be relevant to try for various configurations.

@manics
Member

manics commented Oct 30, 2019

Supporting and testing a major upgrade with the same config sounds good and is the process that most people will go through.

Testing changes in configuration, with or without an upgrade, involves a potentially infinite number of combinations. It might be nice to have, but assuming(!) Helm cleans up after itself, the main persistent state will be storage volumes and the JupyterHub DB. The latter could mostly be tested in JupyterHub or some other repo outside Z2JH; there's no need for a full K8S deployment in most cases.

@consideRatio
Member Author

We are now doing "chart version upgrade tests" - closing this.
