docs: iteration on aws k8s upgrade docs #4099

Merged: 4 commits merged into 2i2c-org:main on May 21, 2024

@consideRatio (Contributor) commented May 19, 2024

  • No longer assumes empty user nodes
    Our docs previously assumed that the user nodes had no users on them, so that a re-creation upgrade strategy could be used. This iteration removes that assumption.
  • Removed out of scope section
    I removed the prerequisite section "Consider changes to template.jsonnet", which I now consider too far out of scope to suggest doing during a k8s upgrade.
  • Refactoring to centralize misc content
    Content on node upgrade strategies, AWS auth, and version skew (and possibly more) is now centralized.
  • Check cluster status and activity
    Added a step to get an overview of what is going on in the cluster before upgrading. This can, for example, help rule out that something broke because of the upgrade when it was already broken beforehand.
  • Facilitate upgrading multiple clusters in parallel
    Upgrading multiple clusters in parallel is reasonable, and the guide is now structured so that it is easier to scale to parallel upgrades.
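The version skew notes mentioned above boil down to some simple arithmetic. As an illustration only, here is a minimal Python sketch assuming two rules that are not stated in this PR: the upstream Kubernetes version skew policy (a kubelet may be up to three minor versions older than the control plane, and never newer), and the EKS constraint that the control plane is upgraded one minor version at a time. The function names are hypothetical and not part of the docs this PR changes.

```python
"""Minimal sketch of version-skew arithmetic for a k8s upgrade.

Assumptions (not taken from the PR): a kubelet may be at most 3 minor
versions older than, and never newer than, the control plane; the control
plane is upgraded one minor version at a time.
"""

MAX_KUBELET_MINOR_LAG = 3  # assumed upstream limit for kubelets >= 1.25


def minor(version: str) -> int:
    """Return the minor number of a "1.Y" or "1.Y.Z" version string."""
    major, minor_part = version.split(".")[:2]
    if major != "1":
        raise ValueError(f"unexpected major version in {version!r}")
    return int(minor_part)


def control_plane_path(current: str, target: str) -> list[str]:
    """List each control plane version to pass through, one minor at a time."""
    start, end = minor(current), minor(target)
    if end < start:
        raise ValueError("downgrades are not supported")
    return [f"1.{m}" for m in range(start + 1, end + 1)]


def node_skew_ok(control_plane: str, node: str) -> bool:
    """True if a node's kubelet version is allowed with this control plane."""
    lag = minor(control_plane) - minor(node)
    return 0 <= lag <= MAX_KUBELET_MINOR_LAG


if __name__ == "__main__":
    # Hypothetical cluster: control plane and nodes on 1.27, target 1.29.
    for step in control_plane_path("1.27", "1.29"):
        print(step, "nodes on 1.27 allowed:", node_skew_ok(step, "1.27"))
```

Checking skew at every control plane step shows whether existing nodes, including user nodes that still have users on them, can stay on the old version until they are upgraded. Verify the limit against the current upstream policy before relying on the constant.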

Related

This was worked on in preparation for #4009. Had it not been done now, #4009 would have been harder to handle, even though getting this done wasn't part of the scope of #4009.

Review

I think the most time-efficient approach is to review this in practice: merge it, use it, and iterate further if needed to fix any issues.

```{warning}
This upgrade will cause disruptions for users and trigger alerts for
[](uptime-checks). To help other engineers, communicate that your are starting a
cluster upgrade in the `#maintenance-notices` Slack channel and setup a [snooze](uptime-checks:snoozes)
```

@consideRatio (Contributor Author) commented:

We shouldn't need a snooze, so I removed that. I moved the note about communicating in Slack to a dedicated step.

Comment on lines -11 to -15
```{warning}
We haven't yet established a policy for planning and communicating maintenance
procedures to users. So preliminary, only make a k8s cluster upgrade while the
cluster is unused or that the maintenance is communicated ahead of time.
```

@consideRatio (Contributor Author) commented:

We still haven't, but as long as upgrades are only minimally disruptive (brief networking interruptions, etc.), we can probably upgrade clusters while they are in use anyway.

I opted to remove this rather than refine it. I acknowledge that we should have a policy, but I'd like us to iterate on one and reach agreement collectively, based on shared experience, rather than have me declare and justify one in a PR about the technical steps of a k8s upgrade.

@consideRatio force-pushed the pr/upgrade-aws-upgrade-docs branch 2 times, most recently from e9d59e2 to 090ba6c, May 20, 2024 14:53
@consideRatio force-pushed the pr/upgrade-aws-upgrade-docs branch from 090ba6c to 42dee07, May 21, 2024 08:29
@consideRatio (Contributor Author) commented:

Thanks for reviewing, @GeorgianaElena! I rebased and added a commit with some adjustments after trying things out myself.

@consideRatio merged commit 482a255 into 2i2c-org:main on May 21, 2024
4 checks passed