Use etcd-druid to set tolerations, node affinity and TSC policies for HA etcd clusters #899
Labels: area/control-plane (Control plane related), area/high-availability (High availability related), kind/enhancement (Enhancement, improvement, extension)
How to categorize this issue?
/area control-plane
/area high-availability
/kind enhancement
What would you like to be added:
Add the capability in etcd-druid to determine and add NodeAffinity, TSC (TopologySpreadConstraints) and Tolerations to the etcd StatefulSet pods of an HA etcd cluster.
Why is this needed:
When etcd-druid is used in Gardener today, the gardener-resource-manager HA webhook does the following:
Unfortunately, while mutating the TSC policies, the webhook sets the LabelSelector from the labels in PodTemplateSpec.Labels of the respective StatefulSet (STS). See code. This is problematic because new labels can be added during an upgrade, and not all labels are used to uniquely identify the pods belonging to a StatefulSet (i.e. an etcd cluster). Consider the following scenario:
Starting state:
There is a non-HA etcd cluster (replicas=1). Let's assume that the pods of the StatefulSet provisioned for the etcd cluster have the following labels:
Pod etcd-test-0 is currently scheduled in zone-A.
Upgrade the etcd cluster to HA:
1. A new label, druid.gardener.cloud/etcd-cluster-size, is added to the STS.
2. When etcd-test-1 and etcd-test-2 come up, they have the new label as well.
3. Because the TSC label selector is built from the updated PodTemplateSpec.Labels, only the etcd-test-1 and etcd-test-2 pods are visible when TSC is evaluated. etcd-test-0 is not included in the set because its labels differ.
4. Assume one of the new pods gets scheduled in zone-A and the other in zone-B.
5. etcd-test-0 is now updated and, after the update, it has the new label as well. When TSC is evaluated again, this pod cannot be placed in zone-A: zone-A already holds one of the new pods, so a second pod there would exceed the maxSkew of 1 across zones while another zone still has none. It also cannot be scheduled in any other zone because its PV is bound to zone-A. Therefore this pod remains pending.
After discussing with @timuthy, it was agreed that since there is no generic way to determine the subset of labels that uniquely identifies the pods of a StatefulSet (and should therefore form the label selector), it is prudent to let etcd-druid set the TSC. Since etcd-druid then sets the TSC, it can also set the remaining scheduling properties: tolerations and node affinity.
Gardener should still mutate the replicas to 3.
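The point of letting etcd-druid own the TSC is that it can build the selector from labels it knows are stable, instead of copying every template label. The sketch below illustrates that idea under stated assumptions: IDENTIFYING_KEYS and the label values are hypothetical, and the dictionary only mirrors the field names of a Kubernetes TopologySpreadConstraint (maxSkew, topologyKey, whenUnsatisfiable, labelSelector). It is not etcd-druid's implementation, and it covers only the TSC part, not tolerations or node affinity.

```python
# Hypothetical: a fixed set of identifying label keys that etcd-druid
# would never change across upgrades. Not etcd-druid's real label keys.
IDENTIFYING_KEYS = ("app", "instance")


def stable_selector(template_labels):
    """Keep only the identifying labels; extra labels added later are ignored."""
    return {k: template_labels[k] for k in IDENTIFYING_KEYS}


def zone_tsc(template_labels, max_skew=1):
    """Build a zone spread constraint shaped like a PodSpec TSC entry."""
    return {
        "maxSkew": max_skew,
        "topologyKey": "topology.kubernetes.io/zone",
        "whenUnsatisfiable": "DoNotSchedule",
        "labelSelector": {"matchLabels": stable_selector(template_labels)},
    }


old_template = {"app": "etcd", "instance": "etcd-test"}  # hypothetical values
new_template = {**old_template, "druid.gardener.cloud/etcd-cluster-size": "3"}

# Adding a label during the upgrade leaves the constraint unchanged, so the
# pre-upgrade pod keeps being counted together with the new pods.
print(zone_tsc(old_template) == zone_tsc(new_template))  # True
```

Because the selector no longer depends on the full PodTemplateSpec.Labels, the old and new pods stay in the same spread set throughout the upgrade.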