Support Ephemeral Storage #887

Open
vlerenc opened this issue Oct 8, 2024 · 0 comments
Labels
area/cost Cost related

Comments

@vlerenc
Member

vlerenc commented Oct 8, 2024

What would you like to be added:
Please support operating ETCD with ephemeral persistent volumes (which sounds like a contradiction), e.g. hostPath or, better and safer, local volumes, so that network-attached persistent volumes can be avoided. These are often a scarce machine resource (e.g. AWS can only attach 26 or 32 volumes for most machine types; Alicloud and Azure even fewer).

Why is this needed:
We observe that machines can rarely be fully utilised because of the high ratio of pods with volumes to pods without volumes in the control plane of a Gardener-managed shoot cluster. If the ETCD for events could be configured to avoid network-attached persistent volumes, we could improve machine utilisation considerably (at the expense of only limited additional network costs to "catch up" when a pod is moved to another node).

Considerations:

  • Losing 1 of 1 pods (non-HA) or 2 of 3 pods (HA) will result in an unrecoverable, permanent quorum loss. Because this could happen more frequently without network-attached persistent volumes, ETCD druid should detect it and, in the case of ephemeral persistent volumes, discard the StatefulSet and recreate it from scratch. In the context of events, this seems acceptable in many cases, as the default events TTL is only 1h anyway and events are not a critical/essential resource for the operation of a cluster.
  • While backup and restore can be added later, it doesn't have to be there right from the start. Whoever uses ETCD druid should be free to opt for ephemeral persistent volumes.
  • In order to stick to StatefulSets (we don't have to, but it would make things easier), we need to find a PV(C) type that works for us, e.g. local. So we need to experiment with it and check whether it works as expected, whether it can be configured dynamically (multiple ETCD pods would now need different local paths on the node), and whether the cleanup works (data is deleted once the pod is descheduled from the node).
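
To make the quorum arithmetic in the first consideration concrete, here is a minimal sketch of the decision ETCD druid would have to take. It is not taken from the druid code base: the function names (`quorum_size`, `quorum_lost`, `recovery_action`) and the returned action strings are hypothetical, and the only facts it relies on are the majority rule (a cluster of n members needs n // 2 + 1 of them) and the proposal above to recreate the StatefulSet when the volumes are ephemeral.

```python
def quorum_size(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def quorum_lost(members: int, lost: int) -> bool:
    """True if the members that still have their data no longer form a majority."""
    return members - lost < quorum_size(members)


def recovery_action(members: int, lost: int, ephemeral: bool) -> str:
    """Hypothetical decision helper; the action names are illustrative only."""
    if not quorum_lost(members, lost):
        # The remaining members keep serving; a replaced member can
        # rejoin and catch up over the network.
        return "none"
    if ephemeral:
        # Proposal from this issue: discard the StatefulSet and start from scratch.
        return "recreate"
    # Quorum loss with non-ephemeral volumes is not covered by this sketch.
    return "other"


if __name__ == "__main__":
    # The two cases named above: 1 of 1 (non-HA) and 2 of 3 (HA).
    for members, lost in [(1, 1), (3, 1), (3, 2)]:
        print(members, lost, recovery_action(members, lost, ephemeral=True))
```

Losing a single pod of a three-member cluster leaves two members, which still meets the majority of two, so only the 1-of-1 and 2-of-3 cases trigger the recreate path.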
vlerenc added the area/cost (Cost related) label Oct 8, 2024