I have a question about recovering EKS-A provisioned clusters from a broken state.
Suppose a cluster has failed machines in both the control plane and the worker nodes, and these machines are unrecoverable (i.e., physically broken, so new bare metal machines have to be added). How should this be handled when we want to run a cluster upgrade?
```
example@example-admin:~$ kubectl get nodes
NAME                 STATUS                        ROLES           AGE    VERSION
example-cp3-26       Ready                         control-plane   191d   v1.28.15
example-cp3-27       NotReady,SchedulingDisabled   control-plane   191d   v1.29.13
example-cp5-26       Ready                         control-plane   191d   v1.28.15
example-gpu-wk3-9    NotReady,SchedulingDisabled   <none>          191d   v1.29.13
example-gpu-wk5-11   Ready                         <none>          191d   v1.28.15
```
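For what it's worth, my understanding from the bare metal docs is that replacement machines would first be described in a hardware CSV along these lines (all values below are hypothetical placeholders, and the labels are assumed to match the cluster spec's hardwareSelector):

```
hostname,bmc_ip,bmc_username,bmc_password,mac,ip_address,netmask,gateway,nameservers,labels,disk
example-cp3-28,10.0.0.28,admin,REDACTED,00:00:00:00:00:28,10.0.1.28,255.255.255.0,10.0.1.1,8.8.8.8,type=cp,/dev/sda
example-gpu-wk3-10,10.0.0.40,admin,REDACTED,00:00:00:00:00:40,10.0.1.40,255.255.255.0,10.0.1.1,8.8.8.8,type=gpu-worker,/dev/sda
```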
Are we expected to manually add healthy control plane and worker nodes first, and then proceed with the cluster upgrade?
Or are we expected to re-provision the cluster from scratch and restore from backups?
I'm trying to understand the intended recovery path when the cluster is in an unstable state and cannot be restored using the originally provisioned machines.
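If the first option is the intended path, the flow I had in mind is roughly the sketch below. This is only my reading of the docs, not a verified procedure; the machine names and file names are assumptions on my part (in particular, the actual CAPI Machine names may differ from the node names shown above):

```
# Remove the CAPI Machine objects for the physically broken hosts so the
# controllers stop waiting on them (assuming EKS-A's default eksa-system namespace;
# machine names here mirror the node names purely for illustration)
kubectl delete machine example-cp3-27 example-gpu-wk3-9 -n eksa-system

# Register the replacement hardware and run the upgrade in one step
eksctl anywhere upgrade cluster -f cluster.yaml --hardware-csv replacement-hardware.csv
```

Is something along these lines supported, or is a full re-provision the expected route here?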