
PR Discussion - CI chart upgrade tests #1427

Closed
consideRatio opened this issue Sep 27, 2019 · 7 comments

@consideRatio
Member

While seeing pangeo-data/pangeo-cloud-federation#418, and simply because releasing things for z2jh can be hard, I'd like to add an upgrade test to our CI system that performs a Helm chart upgrade from the latest minor release.

I feel hopeful about doing this now that I've spent a lot of time working on #1422!

@consideRatio consideRatio changed the title PR Discussion PR Discussion - CI Upgrade test Sep 27, 2019
@consideRatio
Member Author

consideRatio commented Sep 28, 2019

Upgrade tests

helm upgrade can change the chart version as well as the chart configuration, and we want to make sure we test some aspects of both. Note that changing the chart version can often be seen as changing the configuration.

Changing a Service type to ClusterIP from LoadBalancer

Transitioning to Services of type ClusterIP instead of using a NodePort, explicitly or indirectly, as required for use with nginx-ingress. Here is a related code block that I think needs fixing. For an error example, see #641 (comment).

```yaml
# upgrading to this could cause issues
proxy:
  service:
    type: ClusterIP
```
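For contrast, here is a sketch of the kind of configuration one might be upgrading from when exposing the proxy for nginx-ingress. The field names are based on the z2jh chart's proxy.service values, and the port number is purely illustrative; treat both as assumptions to verify against the chart's documented schema.

```yaml
# upgrading away from this, to type: ClusterIP, is the transition to test
proxy:
  service:
    type: NodePort      # exposes the proxy on a port on every node
    nodePorts:
      http: 30080       # hypothetical port value, for illustration only
```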

Changing Deployment's spec.strategy to Recreate

See issue (re-)found in #1404.

Upgrading from something before #1401 to #1404 and later could cause issues, but I'm not sure when, or whether they only occur without the use of --force. In the test we should stick to not using --force, so that we notice if we require it for some changes.
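The change in question, roughly sketched as a plain Kubernetes Deployment fragment (the actual z2jh template and resource names may differ; this is for illustration only):

```yaml
# Fragment of a Deployment manifest. Switching strategy between chart
# versions is the kind of in-place change that `helm upgrade` may not
# apply cleanly without --force, which is what the test should surface.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hub
spec:
  strategy:
    type: Recreate   # was RollingUpdate before the change
```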

Running users remain running (#786 (comment))

Such a test should also cover upgrading while there are running users to make sure that the updated Hub doesn't lose track of users and proxy routes, etc. Not every Hub version upgrade will allow this, but if it doesn't work we should know about it and advertise it loudly.
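A sketch of what such a test could look like as CI shell steps, assuming a live test cluster. The release name, config file, and placeholder user pod are assumptions (helm 3 style commands shown), not existing scripts:

```shell
#!/usr/bin/env bash
# Sketch: verify a running user pod survives a chart upgrade.
set -eu

# 1. Install the previous release and start a stand-in "user" pod.
helm install jhub jupyterhub/jupyterhub --version 0.8.2 --values config.yaml
kubectl run test-user --image=python:3 --restart=Never -- sleep 3600
kubectl wait pod/test-user --for=condition=Ready --timeout=120s

# 2. Upgrade to the chart under test, without --force.
helm upgrade jhub ./jupyterhub --values config.yaml

# 3. The user pod should still be running after the upgrade.
kubectl wait pod/test-user --for=condition=Ready --timeout=60s
```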

Other

  • Do we need to run --upgrade-db between tagged releases?

@consideRatio consideRatio changed the title PR Discussion - CI Upgrade test PR Discussion - CI chart upgrade tests Sep 28, 2019
@consideRatio consideRatio mentioned this issue Oct 7, 2019
@consideRatio
Member Author

Upgrade tests take more time, and are perhaps most important before releases. So... I figure it could make sense to make them something you run manually now and then.

By triggering a custom build, and having a .travis.yml that builds differently depending on whether if: type = api holds, one could also use the supplied commit message to signal what kind of build one wants in the .travis.yml file.

https://blog.travis-ci.com/2017-08-24-trigger-custom-build
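A sketch of what that condition could look like in .travis.yml. The job layout and script name are assumptions; only `if: type = api` is Travis CI's documented condition syntax for API-triggered builds:

```yaml
# .travis.yml sketch: run the upgrade test only for API-triggered builds
jobs:
  include:
    - name: upgrade-test
      if: type = api               # only when triggered via the Travis API
      script: ./ci/upgrade-test.sh # hypothetical script name
```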

@manics
Member

manics commented Oct 19, 2019

Does it matter if the upgrade test takes a long time, especially if it's part of a Travis matrix build so you can see the results of the non-upgrade tests early if you really need to? I think asking people to decide whether an upgrade test is required for a PR is an unnecessary extra step. If the PR shouldn't affect the upgrade, the test should pass quietly and no one cares; if it fails due to an unexpected side effect, it's done its job!

If the upgrade test did turn out to be a problem it could be allowed to fail in the Travis matrix, so there's at least a record of when the process potentially broke.

@consideRatio
Member Author

consideRatio commented Oct 19, 2019

@manics deal, I agree.

To mitigate CI delay issues, we could perhaps also limit ourselves to four jobs in our test stage. Also... it feels kind of silly to fix this now and not sooner, but we should not rerun all tests if no changes have been made to the chart (#1451).

@consideRatio
Member Author

As of #1459, TravisCI will always run a primitive upgrade test, but locally you won't. There is room for improvement. By the "primitive part" of the upgrade tests, I mean that it only verifies that when we go from 0.8.2 to the current master, using the same config before and after, we land in a functional state without errors during the upgrade process.
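In other words, roughly this sequence (a sketch; the release name, config path, and deployment names are assumptions to check against the actual CI scripts):

```shell
#!/usr/bin/env bash
# Sketch of the "primitive" chart version upgrade test:
# same config before and after, published 0.8.2 chart -> current checkout.
set -eu

helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update

# install the latest tagged release with the test config
helm install jhub jupyterhub/jupyterhub --version 0.8.2 --values dev-config.yaml

# upgrade in place to the chart built from the current checkout
helm upgrade jhub ./jupyterhub --values dev-config.yaml

# a "functional state": the hub and proxy rollouts complete cleanly
kubectl rollout status deployment/hub --timeout=120s
kubectl rollout status deployment/proxy --timeout=120s
```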

So, we should perhaps refer to this as a "chart version upgrade test" as compared to a "configuration change upgrade test" or similar, which would also be relevant to try for various configurations.

@manics
Member

manics commented Oct 30, 2019

Supporting and testing a major upgrade with the same config sounds good and is the process that most people will go through.

Testing changes in configuration, with or without an upgrade, involves a potentially infinite number of combinations. It might be nice to have, but assuming(!) Helm cleans up after itself, the main persistent state will be storage volumes and the JupyterHub DB. The latter could mostly be tested in JupyterHub or some other repo outside Z2JH; there's no need for a full K8S deployment in most cases.

@consideRatio
Member Author

We are now doing "chart version upgrade tests" - closing this.
