k8s maintenance: k8s node pool upgrades on EKS clusters #4009
Comments
Should be paired with someone else (TBD) |
@sgibson91 agreed to do this work during this sprint. I'm around to discuss anything if requested. |
Most clusters are done. The only ones remaining are clusters that still have some user servers up
|
2 of the nasa-cryo user servers have been up for more than 2 days and 1 of the nasa-ghg user servers has been up for 36 hours. I suspect these might be abandoned, so maybe they're fine to kick off? |
You could do a rolling upgrade of the node pool they are using without draining it. At the "taint and wait" step, push the changes made up to that point and get a PR merged, leaving only a comment saying we also need to delete a node pool, which can hopefully be done next week. |
I've done that for nasa-cryo. I was about to do it for nasa-ghg, but the public ssh key isn't in the repo and so eksctl commands failed 😕 |
Is there a way to reverse this command?
|
@sgibson91 ah hmmm okay! Hmmmm, deleting a taint can be done by |
Thanks Erik. I used that command to remove the taint, and have opened #4122 |
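For reference, `kubectl taint` removes a taint when the taint spec is given with a trailing hyphen. The sketch below is not the command used in this thread; the node name, taint key, and the `taint_cmd` helper are hypothetical placeholders. It only prints the commands so they can be reviewed before being run by hand.

```shell
#!/bin/sh
# Print (do not run) a kubectl command that adds or removes a taint on a node.
# The node name, key, and effect passed in are assumptions for illustration,
# not values taken from this issue.
taint_cmd() {
  # usage: taint_cmd add|remove NODE KEY EFFECT
  action="$1"; node="$2"; key="$3"; effect="$4"
  case "$action" in
    # Adding a taint with a key and no value: KEY:EFFECT
    add)    printf 'kubectl taint nodes %s %s:%s\n' "$node" "$key" "$effect" ;;
    # Removing the same taint: identical spec with a trailing "-"
    remove) printf 'kubectl taint nodes %s %s:%s-\n' "$node" "$key" "$effect" ;;
    *)      echo "unknown action: $action" >&2; return 1 ;;
  esac
}
```

Calling `taint_cmd remove node-1 example-key NoSchedule` prints the removal command for a hypothetical node `node-1`; the output can be pasted into a shell once the node name and taint key have been checked against `kubectl describe node`.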
Upgrading the core node group for nasa-ghg was also going to remove node group "nb-c5-4xlarge", which I believe comes from #4100, so I used |
I've put in a reminder for myself on Tuesday to check if the old node groups have drained (Monday is a holiday in the UK and I'll be coming back from a weekend away) |
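One way to check whether an old node has emptied out is to list the pods still scheduled on it with a field selector. This is a sketch under assumptions: the node name is a hypothetical placeholder and `pods_on_node_cmd` is an illustrative helper, not part of the documented upgrade procedure. It prints the command instead of running it.

```shell
#!/bin/sh
# Print (do not run) a kubectl command that lists pods still scheduled on a node.
# The node name is an assumption for illustration; real names come from
# `kubectl get nodes`.
pods_on_node_cmd() {
  node="$1"
  printf 'kubectl get pods --all-namespaces --field-selector spec.nodeName=%s\n' "$node"
}
```

If the printed command, once run, shows no user server pods on the nodes of the old node group, that node group is likely safe to delete.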
@consideRatio I learned a one-liner! Example, to remove the
The key is the extra So I imagine a more targeted command would be
|
Thank you for sharing this @sgibson91!! |
#4007 upgraded all control planes to 1.29, but we need to bring the node pools to 1.29 as well. Documentation on how to do this was updated in #4099 and is made available at https://infrastructure.2i2c.org/howto/upgrade-cluster/aws/ - where step 4 about upgrading the control plane is already done.
Clusters with node groups to upgrade
Can't be upgraded due to missing public ssh key, tracked in "nasa-ghg cluster missing public ssh key from the repo" #4122