
CA-MCM overhaul #251

Open
himanshu-kun opened this issue Sep 25, 2023 · 5 comments
Labels: kind/discussion (Discussion, engaging others in deciding about multiple options), kind/enhancement (Enhancement, improvement, extension), lifecycle/stale (Nobody worked on this for 6 months, will further age), priority/1 (Priority, lower number equals higher priority)

Comments

@himanshu-kun

Reason for discussion:

There are currently some CA-MCM interaction issues that we want to fix. One possible solution is to change the entire way CA and MCM work together today.
This issue is to discuss the feasibility of such approaches.

Terms for the discussion (to avoid confusion):

k/CA = kubernetes CA
g/CA = gardener CA (fork of k/CA)
new-CA = new CA code we would implement, which could be a component or a library

Dimensions of discussion:

  1. Possible Goals
    1. Use new-CA as a library inside MCM; the new-CA library only recommends and MCM decides (see the interface sketch after this list). Currently g/CA's recommendation is binding.

      • Ditch g/CA entirely; design and implement from scratch, basically leveraging kube-scheduler predicates more directly
      • Get rid of node-groups
      • Benefits
        • Can support more than the 1000 nodes that the CA is limited to today
        • Can fit more pods on the nodes
    2. Leverage the current k/CA

      • Combine MCM into g/CA, so that the CA runs the MCM controller, and drop the current standalone MCM controller completely
      • We still maintain the fork, but the aim is to leverage the current features and the community support that upstream offers
      • Benefits:
        • Solves "MCM is down but CA is up" kinds of issues
        • Targeted removal of a machine becomes easier
  2. High-demand stories (which use the current design)
  3. Impact of the overhaul on current problems
    • Current CA functionality that is unpleasant (still needs verification):
      • The kube-scheduler configuration can differ from the scheduler code imported by the CA
      • Limitation of one machine type per node group
      • Many CLI flags in k/CA, which could confuse customers
      • Cannot handle WaitForFirstConsumer PVs
      • Increasing the utilisation of seeds doesn't seem achievable with the current CA
      • Scale-down is treated as secondary, scale-up as the primary goal
        • Scale-down is not performed in the same RunOnce() flow if a scale-up happened (or until it happens), or while scale-down is in cool-down
  4. Time to be invested (excluding any time spent on the current design and other dev tasks)
    • 1 year minimum
  5. Maintenance effort and support
    • We need to deal with all the issues (verifying them) and implement them ourselves, even if they are already provided by k/CA
    • Community support will be lost
  6. Rollout strategy (if we implement this)
    • Keep the current design running, deploy MCM with a recommendation-only CA (Goal 1) in parallel, and compare the recommendations (see the sketch below)
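To make Goal 1 ("new-CA recommends, MCM decides") and the rollout comparison in dimension 6 a bit more concrete, here is a minimal, purely illustrative Go sketch. None of these types exist today; the names `Recommendation`, `Recommender`, and `Decider` are assumptions made for this discussion, not a design decision:

```go
// Sketch only: "new-CA recommends, MCM decides" (Goal 1).
package scaling

import "context"

// Recommendation is advisory; unlike today's g/CA, nothing here scales a
// MachineDeployment directly.
type Recommendation struct {
	MachineDeployment string // target MachineDeployment (no node-group notion)
	DesiredReplicas   int32  // proposed replica count
	Reason            string // e.g. "3 pending pods fit on one additional machine"
}

// Recommender would be the new-CA library embedded in MCM.
type Recommender interface {
	Recommend(ctx context.Context) ([]Recommendation, error)
}

// Decider is the MCM-side logic that accepts, defers, or rejects
// recommendations. During the rollout comparison (dimension 6) it would only
// log the recommendations and diff them against what the currently deployed
// g/CA actually did.
type Decider interface {
	Decide(ctx context.Context, recs []Recommendation) error
}
```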
@himanshu-kun added the kind/enhancement, kind/discussion, and priority/1 labels on Sep 25, 2023
@himanshu-kun
Author

/assign @elankath @unmarshall @rishabh-11

@himanshu-kun
Author

/assign

@vlerenc
Member

vlerenc commented Sep 25, 2023

I was wondering whether we can leverage the "ground truth" a.k.a. the kube-scheduler more directly (no CA at all), e.g.:

  • Have "simulated" (non-existing) nodes, provide the machines after the kube-scheduler has scheduled non-daemonset pods onto them, and then move the pods over once the machine has joined the cluster; but that's probably too ugly/visible to the end users
  • Run a second kube-scheduler that accesses a restricted KAPI proxy of the real KAPI into which these fake nodes are added (the rest is similar to the above: once the machine comes up and joins the cluster, the pods are moved to the real node and the fake one remains "open"; if a fake node is full, more fake nodes are added, in case more capacity is required).

The point is, every simulation that is not based on the real kube-scheduler will be flawed, so why not find a way to trick it into doing what we need instead of an approximation (like the CA tries)?
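A rough, purely illustrative sketch (not part of any proposal) of what such a "fake" node object could look like, assuming it is only served to the second kube-scheduler through the restricted KAPI proxy; the label, taint key, and machine-type name below are invented for this example:

```go
// Purely illustrative sketch of a simulated node for a shadow kube-scheduler.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func simulatedNode(name, cpu, memory string) *corev1.Node {
	return &corev1.Node{
		ObjectMeta: metav1.ObjectMeta{
			Name:   name,
			Labels: map[string]string{"node.gardener.cloud/simulated": "true"}, // hypothetical marker label
		},
		Spec: corev1.NodeSpec{
			// Taint so that, should the object ever leak into the real KAPI,
			// the real scheduler ignores it; the shadow scheduler would tolerate it.
			Taints: []corev1.Taint{{
				Key:    "node.gardener.cloud/simulated", // hypothetical taint key
				Effect: corev1.TaintEffectNoSchedule,
			}},
		},
		Status: corev1.NodeStatus{
			// Capacity/allocatable mirror the machine type the fake node stands for.
			Capacity: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse(cpu),
				corev1.ResourceMemory: resource.MustParse(memory),
				corev1.ResourcePods:   resource.MustParse("110"),
			},
			Allocatable: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse(cpu),
				corev1.ResourceMemory: resource.MustParse(memory),
				corev1.ResourcePods:   resource.MustParse("110"),
			},
			// Marked Ready so the shadow scheduler considers it schedulable.
			Conditions: []corev1.NodeCondition{{
				Type:   corev1.NodeReady,
				Status: corev1.ConditionTrue,
			}},
		},
	}
}

func main() {
	n := simulatedNode("simulated-node-0", "2", "8Gi")
	fmt.Println(n.Name, n.Status.Allocatable)
}
```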

@elankath

Running a second kube-scheduler (it can run in-process) that operates on a simulated model is a pretty nice idea. It would also reduce the implementation effort for such a large task. Though, do we have any Gardener customers that run custom schedulers?

@vlerenc
Member

vlerenc commented Sep 25, 2023

@elankath I don't know, but the Kubernetes or Gardener CA, or our own simulation, would also not match such a custom scheduler. The reason we introduced kube-scheduler configurability (e.g. bin-packing) was that even large teams didn't want to run control plane components themselves. I don't think we need to consider those cases, and if we do, we probably shouldn't offer an automated solution but rather provide them with an API so that they can provision and deprovision nodes themselves. They would then have to build the bridge between their scheduler and our API themselves, but unless somebody asks, I wouldn't even consider that. I have difficulty imagining many/some/anybody running their own kube-scheduler.

@gardener-robot added the lifecycle/stale label on Jun 3, 2024