
kube-state-metrics with autosharding stops updating shards when the labels of the statefulset are updated #2355

Open
pkoutsovasilis opened this issue Mar 29, 2024 · 20 comments · May be fixed by #2347
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@pkoutsovasilis

pkoutsovasilis commented Mar 29, 2024

What happened:

When kube-state-metrics is deployed with autosharding and I update the labels of the ksm statefulset, updates, and thus the recalculation of shards, stop reaching the pods that don't get restarted.

What you expected to happen:

I would expect shard updates to keep happening even when the statefulset labels are updated, since k8s allows this and it does not cause the pods to be restarted.

How to reproduce it (as minimally and precisely as possible):

  1. clone kube-state-metrics repo git clone https://github.com/kubernetes/kube-state-metrics && cd kube-state-metrics
  2. with the editor of your choice, add a label that is expected to change to the metadata of the statefulset in examples/autosharding/statefulset.yaml, e.g. app.revision: '1', and set replicas to 2 (see the manifest sketch after these steps)
  3. deploy ksm with autosharding kubectl apply -k ./examples/autosharding
  4. change the label kubectl patch sts kube-state-metrics -n kube-system --patch '{"metadata": {"labels": {"app.revision": "2"}}}'
  5. scale the statefulset to 3 replicas kubectl scale statefulsets kube-state-metrics --replicas=3 -n kube-system
  6. check the logs of pod/kube-state-metrics-0 and pod/kube-state-metrics-1: they did not update their shard calculation. This recovers only after all pods are restarted and the labels are not changed again.
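
For reference, after step 2 the metadata and replicas of examples/autosharding/statefulset.yaml would look roughly like the fragment below (the pre-existing labels in the file may differ; app.revision is the label added for this reproduction):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.revision: '1'   # label that will be changed in step 4
spec:
  replicas: 2
  # ... rest of the manifest unchanged
```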

Anything else we need to know?:

Environment:

  • kube-state-metrics version: 2.11.0
  • Kubernetes version (use kubectl version):
    Client Version: v1.29.3
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: v1.27.7-gke.1121002
    
  • Cloud provider or hardware configuration: Google Kubernetes Engine (GKE)
  • Other info: N/A
@pkoutsovasilis pkoutsovasilis added the kind/bug Categorizes issue or PR as related to a bug. label Mar 29, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If kube-state-metrics contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Mar 29, 2024
@pkoutsovasilis pkoutsovasilis changed the title kube-state-metrics with autosharding stop updating shard when statefulset labels are updated kube-state-metrics with autosharding stops updating shards when the labels of the statefulset are updated Mar 29, 2024
@pkoutsovasilis
Author

@CatherineF-dev I'm tagging you since I see you are active throughout the issues, and I thank you in advance; would you know who is the most appropriate person to have a look at this issue and the PR I have opened about it? 🙂

@CatherineF-dev
Contributor

I can have a look. Will try reproducing this issue to understand what happened.

@CatherineF-dev
Contributor

I prefer label selectors, which are native in k8s.

qq: why do you want to change labels? Want to see whether it's a common use case or not.

@pkoutsovasilis
Author

pkoutsovasilis commented Apr 9, 2024

I prefer label selectors, which are native in k8s.

qq: why do you want to change labels? Want to see whether it's a common use case or not.

A field selector is native in k8s as well?! Most, if not all, operators update labels to keep track of things, precisely because this doesn't cause the underlying pods to get restarted, which is native k8s behavior. I was trying to embed kube-state-metrics in an operator when I happened to stumble upon this. In my understanding a field selector is more robust, as it eliminates any scenario in which pods stop calculating the shards properly. Would you care to expand a little more on why you prefer a label selector over a field selector and, if you can, on what scenario you see the field selector falling short against the label one?

@CatherineF-dev
Contributor

From my experience, a label selector is more dynamic and unique compared to a field selector.

I was considering a case where there are two KSMs running (it happens when deploying another KSM using a commercial k8s distribution). Using a label selector is easier than a field selector.

Could you use the label selector together with some other way to mitigate this issue?

@pkoutsovasilis
Author

pkoutsovasilis commented Apr 9, 2024

if a field selector is a show stopper, how about defining a CLI arg that allows specifying labels to be excluded from the label selector, i.e. the ones that are prone to change? That way we maintain the label selector and give the user the option to manually mitigate this. A rough sketch of the idea is below.
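
For illustration only, something along these lines (the flag name --auto-sharding-label-exclude and the helper are hypothetical, not existing kube-state-metrics options):

```go
package example

import (
	"strings"

	"k8s.io/apimachinery/pkg/labels"
)

// buildShardingLabelSelector drops the keys listed in a hypothetical
// --auto-sharding-label-exclude=key1,key2 flag from the statefulset labels
// before the watch selector is built, so churn on those keys cannot
// invalidate the selector.
func buildShardingLabelSelector(ssLabels map[string]string, excludeFlag string) labels.Selector {
	excluded := map[string]struct{}{}
	for _, key := range strings.Split(excludeFlag, ",") {
		if key = strings.TrimSpace(key); key != "" {
			excluded[key] = struct{}{}
		}
	}

	filtered := labels.Set{}
	for k, v := range ssLabels {
		if _, skip := excluded[k]; !skip {
			filtered[k] = v
		}
	}
	return labels.SelectorFromSet(filtered)
}
```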

@CatherineF-dev
Contributor

change the label kubectl patch sts kube-state-metrics -n kube-system --patch '{"metadata": {"labels": {"app.revision": "2"}}}'

qq: why do you want to update the label here? What are the use cases of this new label?

@pkoutsovasilis
Author

As a simple example, it is a common pattern for operators to store some sort of hash in the labels of a k8s object, a statefulset in our case, so that the next time a reconcile fires it can compute the hash of the statefulset and compare it with the one in the labels. If they match, the operator's reconcile loop can move on to the next step, if any. If they don't match, which is our scenario, it performs whatever steps it needs for this object, e.g. updating a secret, and writes the new hash back to the labels. This way, by using labels, which don't cause any workload restart, an operator can keep track of what it has already processed in its reconcile loop and what it hasn't. So this label is changed by the operator (a minimal sketch of the pattern is below). Does this example help a little bit more? 🙂
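
A minimal sketch of that hash-in-labels pattern (the label key and helper names are illustrative; nothing here is defined by kube-state-metrics or a specific operator):

```go
package example

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"

	appsv1 "k8s.io/api/apps/v1"
)

// hashLabel is an illustrative label key an operator might use to remember
// what it last reconciled.
const hashLabel = "example.com/spec-hash"

// specHash computes a stable hash of the statefulset spec.
func specHash(ss *appsv1.StatefulSet) string {
	raw, _ := json.Marshal(ss.Spec)
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:])
}

// reconcileNeeded compares the hash stored in the labels with a fresh one.
func reconcileNeeded(ss *appsv1.StatefulSet) bool {
	return ss.Labels[hashLabel] != specHash(ss)
}

// markReconciled writes the new hash back into the labels; updating only a
// label does not restart the statefulset's pods, which is why operators like
// to keep this kind of state there.
func markReconciled(ss *appsv1.StatefulSet) {
	if ss.Labels == nil {
		ss.Labels = map[string]string{}
	}
	ss.Labels[hashLabel] = specHash(ss)
}
```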

@pkoutsovasilis
Author

pkoutsovasilis commented Apr 9, 2024

From my experience, a label selector is more dynamic and unique compared to a field selector.

I was considering a case where there are two KSMs running (it happens when deploying another KSM using a commercial k8s distribution). Using a label selector is easier than a field selector.

Could you use the label selector together with some other way to mitigate this issue?

I am still trying to wrap my head around this. So here, in the AddFunc of the SharedIndexInformer, and in the UpdateFunc, we check the name; isn't that equivalent to a field selector? Because if we filter out the events by sts name, what is the purpose of the label selector?!
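
To illustrate what I mean, a simplified sketch (not verbatim kube-state-metrics code) of that kind of name check in the handlers:

```go
package example

import (
	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/client-go/tools/cache"
)

// addNameFilteredHandlers shows the point above: even though the watch itself
// is narrowed by a label selector, the handlers already drop every event whose
// statefulset name does not match, which is effectively a per-name filter.
func addNameFilteredHandlers(informer cache.SharedIndexInformer, statefulSetName string, recalculateShards func(*appsv1.StatefulSet)) {
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if ss, ok := obj.(*appsv1.StatefulSet); ok && ss.Name == statefulSetName {
				recalculateShards(ss)
			}
		},
		UpdateFunc: func(_, newObj interface{}) {
			if ss, ok := newObj.(*appsv1.StatefulSet); ok && ss.Name == statefulSetName {
				recalculateShards(ss)
			}
		},
	})
}
```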

@pkoutsovasilis
Author

And on that note, this is a little bit trickier: when somebody deploys kube-state-metrics without any labels at the statefulset level (these are not mandatory), we get events for all statefulsets in the namespace, because fields.SelectorFromSet(ss.Labels) with empty ss.Labels returns an Everything() selector. IMO that's another valid reason for this to be a field selector 🙂
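
A quick demo of the empty-set behavior (shown here with labels.SelectorFromSet for illustration): an empty set yields a selector with no requirements, so it matches everything and the watch filters nothing.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// A statefulset deployed without any metadata labels yields an empty set.
	ssLabels := labels.Set{}

	sel := labels.SelectorFromSet(ssLabels)
	fmt.Println(sel.Empty())  // true: no requirements at all
	fmt.Println(sel.String()) // "": as a list/watch option this filters nothing

	// Consequently any statefulset in the namespace satisfies the selector.
	fmt.Println(sel.Matches(labels.Set{"app": "some-other-sts"})) // true
}
```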

@CatherineF-dev
Contributor

CatherineF-dev commented Apr 9, 2024

So here, in the AddFunc of the SharedIndexInformer, and in the UpdateFunc, we check the name; isn't that equivalent to a field selector? Because if we filter out the events by sts name, what is the purpose of the label selector?!

The existing code can handle the 2-KSMs case more easily.

  • Watch: labelSelector
  • UpdateFunc/AddFunc: a simple check

@pkoutsovasilis
Author

pkoutsovasilis commented Apr 9, 2024

Why doesn't a field selector support 2 KSMs easily!? It is bound to a namespace and it essentially allows the same events (wrt the label selector) that pass the if checks to flow through the SharedIndexInformer. But I am probably missing something, so could you please help me?

@pkoutsovasilis
Author

this is kube-state-metrics deployed with my PR that contains the field selector, so:

  • two KSMs separate namespace different name
output.mp4
  • two KSMs same namespace different name
output2.mp4

I can do more experiments if you help me with the case that the field selector is not able to cover

@CatherineF-dev
Contributor

CatherineF-dev commented Apr 9, 2024

two KSMs separate namespace different name

I was considering the same-name case. It might be a breaking change for users who use the labelSelector to differentiate two KSMs.

@CatherineF-dev
Contributor

This is recovered only after all pods are restarted and labels are not changed again.

Is it self-recovered?

@pkoutsovasilis
Author

pkoutsovasilis commented Apr 9, 2024

I was considering the same-name case. It might be a breaking change for users who use the labelSelector to differentiate two KSMs.

even in that case field selector (different namespace same name) under my open PR is working as expected @CatherineF-dev

output3.mp4

Is it self-recovered?

nope, if the labels change it does not self-recover; you have to restart the pods for them to see the new labels. On the contrary, with a field selector nothing can break the sharding calculation.

It is simple: you have some criteria for keeping your shards updated properly, and you can set them only at the start time of the app. However, the current criteria (aka the label selector) are, by k8s design, allowed to change without any pod invalidation, so no restart happens. Such a change can render your criteria invalid, as they may no longer match any object. Every time I try to wrap my head around this, for keeping the shards of a statefulset always in line and correct, a field selector is what I would choose. (A sketch of a field-selector-based watch is below.)
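
For context, roughly how such a watch could be wired up with a field selector on the immutable object name; this is a sketch with my own naming, not the exact code from my PR:

```go
package example

import (
	"time"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// newStatefulSetInformer watches exactly one statefulset, identified by
// namespace and name. Both are immutable, so no later change to the object's
// labels can ever stop shard updates from reaching the informer.
func newStatefulSetInformer(client kubernetes.Interface, namespace, name string) cache.SharedIndexInformer {
	lw := cache.NewFilteredListWatchFromClient(
		client.AppsV1().RESTClient(),
		"statefulsets",
		namespace,
		func(options *metav1.ListOptions) {
			// metadata.name cannot change for the lifetime of the object.
			options.FieldSelector = fields.OneTermEqualSelector("metadata.name", name).String()
		},
	)
	return cache.NewSharedIndexInformer(lw, &appsv1.StatefulSet{}, time.Minute, cache.Indexers{})
}
```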

@CatherineF-dev
Contributor

CatherineF-dev commented Apr 17, 2024

even in that case field selector (different namespace same name) under my open PR is working as expected

Cool. The tricky case might be the same namespace and name with different labels.

@pkoutsovasilis
Author

@CatherineF-dev are you kidding me?

quoting from k8s page

Each object in your cluster has a Name that is unique for that type of resource. Every Kubernetes object also has a UID that is unique across your whole cluster.

For example, you can only have one Pod named myapp-1234 within the same namespace, but you can have one Pod and one Deployment that are each named myapp-1234.

For non-unique user-provided attributes, Kubernetes provides labels and annotations.

From the above, and from personal experience, I take it that k8s does not allow an object of the same type (a statefulset in our case) to exist in the same namespace with the same name. But please, if you know otherwise, educate me

@CatherineF-dev
Contributor

You're right, forgot about that.

LGTM.
