In the Gardener use case, shoots are hibernated, and as part of hibernating a shoot the following is done:
A full snapshot is taken for clusters where backup is enabled. This ensures that when the etcd-cluster wakes up it does not lose any data and can reliably restore from the last known state.
The etcd-cluster is scaled down to 0 replicas.
This issue proposes that hibernation and wake-up from hibernation be offered directly by etcd-druid, so that they can be used both inside and outside the Gardener context. The consumer can mark an etcd-cluster to be hibernated and watch Etcd.Status to track the status of hibernation. The steps performed to hibernate a cluster are determined from the Etcd resource itself.
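To make the idea of "mark the cluster, then watch Etcd.Status" more concrete, here is a minimal sketch of how a request field and a status phase could relate. Everything in it is an assumption for illustration: the `Hibernate` field, the `HibernationPhase` type and its values, and the `nextPhase` helper are hypothetical and not part of the current etcd-druid API, and the structs are trimmed-down stand-ins for the real Etcd resource.

```go
package main

import "fmt"

// HibernationPhase is a hypothetical status value; it is not defined by etcd-druid today.
type HibernationPhase string

const (
	PhaseAwake       HibernationPhase = "Awake"
	PhaseHibernating HibernationPhase = "Hibernating"
	PhaseHibernated  HibernationPhase = "Hibernated"
	PhaseWakingUp    HibernationPhase = "WakingUp"
)

// EtcdSpec is a trimmed-down stand-in for the spec of the Etcd resource.
type EtcdSpec struct {
	Replicas  int32
	Hibernate bool // hypothetical field carrying the consumer's request
}

// EtcdStatus is a trimmed-down stand-in for the status of the Etcd resource.
type EtcdStatus struct {
	ReadyReplicas int32
	Hibernation   HibernationPhase // hypothetical field the consumer would watch
}

// nextPhase derives the phase to report from the request and the observed state.
// Simplification: an awake cluster with missing replicas is reported as WakingUp.
func nextPhase(spec EtcdSpec, status EtcdStatus) HibernationPhase {
	switch {
	case spec.Hibernate && status.ReadyReplicas == 0:
		return PhaseHibernated
	case spec.Hibernate:
		return PhaseHibernating
	case status.ReadyReplicas < spec.Replicas:
		return PhaseWakingUp
	default:
		return PhaseAwake
	}
}

func main() {
	spec := EtcdSpec{Replicas: 3, Hibernate: true}
	status := EtcdStatus{ReadyReplicas: 3}

	// Request received, members still running.
	status.Hibernation = nextPhase(spec, status)
	fmt.Println(status.Hibernation)

	// All members scaled down.
	status.ReadyReplicas = 0
	status.Hibernation = nextPhase(spec, status)
	fmt.Println(status.Hibernation)
}
```

Whether the request is expressed as a spec field, an annotation, or something else is exactly the kind of decision this issue leaves open; the sketch only shows that a single request flag plus observed replicas is enough to report progress.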
The following points can be considered when defining the API:
The API should provide a way to signal a request to hibernate and a request to wake up.
An etcd cluster can be created with backup disabled. For these clusters, hibernation will not include taking a full snapshot, as there is no safety net configured to back up full and delta snapshots.
Issue #859, "Optimize costs on cluster hibernation", discusses how to optimize costs when hibernating etcd clusters. This could be made configurable and used for clusters that do not want a backup but do want to preserve their data solely on the network-attached disks (PVs) that are attached to the node and used by the etcd pods. (This is optional.)
Taking and uploading a full snapshot could fail. One could offer two modes for taking the backup before scaling down: preferred | required. If the mode is required, consider providing a timeout beyond which etcd-druid stops retrying and reports a failure. Manual intervention is then needed to fix whatever is blocking the full snapshot from being taken and uploaded to the bucket, after which the operation can be re-triggered.
Why is this needed:
Hibernation and wake-up of etcd clusters are already supported in Gardener via reconcile loops running in gardenlet. However, this is not exposed as functionality to non-Gardener users. A clear, well-defined API to signal hibernation and wake-up of an etcd cluster would ease consumption. It also makes semantic sense for etcd-druid (an etcd operator) to abstract all activities performed as part of hibernation and wake-up, reducing the burden on consumers to understand the details.
How to categorize this issue?
/area control-plane
/kind enhancement