
CA-MCM overhaul #251

Open
himanshu-kun opened this issue Sep 25, 2023 · 5 comments
Labels: kind/discussion (Discussion, engaging others in deciding about multiple options), kind/enhancement (Enhancement, improvement, extension), lifecycle/stale (Nobody worked on this for 6 months, will further age), priority/1 (Priority, lower number equals higher priority)

Comments

@himanshu-kun

Reason for discussion:

There are currently some CA-MCM interaction issues that we want to fix. One possible solution is to change the entire way CA and MCM work together today.
This issue is to discuss the feasibility of such approaches.

Terms for the discussion (to avoid confusion):

k/CA = kubernetes CA
g/CA = gardener CA (fork of k/CA)
new-CA = new CA code we would implement, which could be a component or a library

Dimensions of discussion:

  1. Possible Goals
    1. Use new-CA as a library inside MCM; the new-CA library only recommends and MCM decides (see the interface sketch after this list). Currently g/CA's recommendation is binding.

      • Ditch g/CA entirely; design and implement from scratch, basically leveraging kube-scheduler predicates more directly
      • Get rid of node-groups
      • Benefits
        • Can support more than the 1000 nodes that the CA is limited to today
        • Can fit more pods on the nodes
    2. Leverage the current k/CA

      • Combine MCM into g/CA, so that the CA runs the MCM controller, and drop the current standalone MCM controller completely
      • We still maintain the fork, but the aim is to leverage the current features and the community support that upstream offers
      • Benefits:
        • Solves "MCM is down but CA is up" kinds of issues
        • Targeted removal of a machine becomes easier
  2. High-demand stories (which use the current design)
  3. Impact of the overhaul on current problems
    • Current CA functionality that is unpleasant (still needs verification):
      • The kube-scheduler configuration can differ from the scheduler code imported by the CA
      • Limitation of one machine type per node group
      • Many CLI flags in k/CA, which could confuse customers
      • Cannot handle WaitForFirstConsumer PVs
      • Increasing the utilisation of seeds doesn't seem achievable with the current CA
      • Scale-down is treated as secondary, scale-up as the primary goal
        • Scale-down is not performed in the same RunOnce() flow if a scale-up happened (or until it happens), or while scale-down is in cool-down
  4. Time to be invested (excluding any time spent on the current design and other dev tasks)
    • 1 year minimum
  5. Maintenance effort and support
    • We need to deal with all the issues (verifying them) and implement them ourselves, even if they are already provided by k/CA
    • Community support will be lost
  6. Rollout strategy (if we implement this)
    • Keep the current design running, deploy MCM with a recommendation-only CA (Goal 1) in parallel, and compare the recommendations (see the sketch below)
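To make Goal 1 ("new-CA recommends, MCM decides") and the rollout comparison in dimension 6 a bit more concrete, here is a minimal, purely illustrative Go sketch. None of these types exist today; the names `Recommendation`, `Recommender`, and `Decider` are assumptions made for this discussion, not a design decision:

```go
// Sketch only: "new-CA recommends, MCM decides" (Goal 1).
package scaling

import "context"

// Recommendation is advisory; unlike today's g/CA, nothing here scales a
// MachineDeployment directly.
type Recommendation struct {
	MachineDeployment string // target MachineDeployment (no node-group notion)
	DesiredReplicas   int32  // proposed replica count
	Reason            string // e.g. "3 pending pods fit on one additional machine"
}

// Recommender would be the new-CA library embedded in MCM.
type Recommender interface {
	Recommend(ctx context.Context) ([]Recommendation, error)
}

// Decider is the MCM-side logic that accepts, defers, or rejects
// recommendations. During the rollout comparison (dimension 6) it would only
// log the recommendations and diff them against what the currently deployed
// g/CA actually did.
type Decider interface {
	Decide(ctx context.Context, recs []Recommendation) error
}
```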
@himanshu-kun added the kind/enhancement, kind/discussion, and priority/1 labels on Sep 25, 2023
@himanshu-kun
Author

/assign @elankath @unmarshall @rishabh-11

@himanshu-kun
Author

/assign

@vlerenc
Member

vlerenc commented Sep 25, 2023

I was wondering whether we can leverage the "ground truth" a.k.a. the kube-scheduler more directly (no CA at all), e.g.:

  • Have "simulated" (non-existing) nodes, provide the machines after the kube-scheduler has scheduled non-daemonset pods onto them, and then move the pods over once the machine has joined the cluster; but that's probably too ugly/visible to the end users
  • Run a second kube-scheduler that accesses a restricted KAPI proxy of the real KAPI into which these fake nodes are added (the rest is similar to the above: once the machine comes up and joins the cluster, the pods are moved to the real node and the fake one remains "open"; if a fake node is full, more fake nodes are added, in case more capacity is required).

The point is, every simulation that is not based on the real kube-scheduler will be flawed, so why not find a way to trick it into doing what we need instead of an approximation (like the CA tries)?
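A rough, purely illustrative sketch (not part of any proposal) of what such a "fake" node object could look like, assuming it is only served to the second kube-scheduler through the restricted KAPI proxy; the label, taint key, and machine-type name below are invented for this example:

```go
// Purely illustrative sketch of a simulated node for a shadow kube-scheduler.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func simulatedNode(name, cpu, memory string) *corev1.Node {
	return &corev1.Node{
		ObjectMeta: metav1.ObjectMeta{
			Name:   name,
			Labels: map[string]string{"node.gardener.cloud/simulated": "true"}, // hypothetical marker label
		},
		Spec: corev1.NodeSpec{
			// Taint so that, should the object ever leak into the real KAPI,
			// the real scheduler ignores it; the shadow scheduler would tolerate it.
			Taints: []corev1.Taint{{
				Key:    "node.gardener.cloud/simulated", // hypothetical taint key
				Effect: corev1.TaintEffectNoSchedule,
			}},
		},
		Status: corev1.NodeStatus{
			// Capacity/allocatable mirror the machine type the fake node stands for.
			Capacity: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse(cpu),
				corev1.ResourceMemory: resource.MustParse(memory),
				corev1.ResourcePods:   resource.MustParse("110"),
			},
			Allocatable: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse(cpu),
				corev1.ResourceMemory: resource.MustParse(memory),
				corev1.ResourcePods:   resource.MustParse("110"),
			},
			// Marked Ready so the shadow scheduler considers it schedulable.
			Conditions: []corev1.NodeCondition{{
				Type:   corev1.NodeReady,
				Status: corev1.ConditionTrue,
			}},
		},
	}
}

func main() {
	n := simulatedNode("simulated-node-0", "2", "8Gi")
	fmt.Println(n.Name, n.Status.Allocatable)
}
```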

@elankath

Running a second kube-scheduler (it can run in-process) that operates on a simulated model is a pretty nice idea. It would also reduce the implementation effort for such a large task. Though, do we have any Gardener customers that run custom schedulers?

@vlerenc
Member

vlerenc commented Sep 25, 2023

@elankath I don't know, but the Kubernetes or Gardener CA, or our own simulation, would also not match such a custom scheduler. The reason we introduced kube-scheduler configurability (e.g. bin-packing) was that even large teams didn't want to run control plane components themselves. I don't think we need to consider those cases, and if we do, we probably shouldn't offer an automated solution but rather provide them with an API so that they can provision and deprovision nodes themselves. They would then have to build the bridge between their scheduler and our API themselves, but unless somebody asks, I wouldn't even consider that. I have difficulty imagining many/some/anybody running their own kube-scheduler.

@gardener-robot added the lifecycle/stale label on Jun 3, 2024