High memory consumption due to ConfigMap watches #1050

Open
jotak opened this issue Nov 2, 2023 · 4 comments

@jotak

jotak commented Nov 2, 2023

Describe the bug
Before anything else, note that I am not an ArgoCD user: I'm a developer of another OLM-based operator. While investigating memory issues, I tested a number of other operators out of curiosity to see which ones were affected by the same issue, and the ArgoCD operator appears to be one of them. I haven't investigated argocd-operator in depth; if you think this is a false positive, I apologize for the inconvenience and you can close this issue.

The problem: I ran a simple test. I installed a number of operators, monitored their memory consumption, then created a dummy namespace and many ConfigMaps in that namespace. For some operators, memory consumption remained stable; for others, including this one, it increased linearly with the number of ConfigMaps created.

My assumption is that your operator is unlikely to need to watch every ConfigMap (is that correct?). This is a fairly common problem, documented here: https://sdk.operatorframework.io/docs/best-practices/designing-lean-operators/#overview :

"One of the pitfalls that many operators are failing into is that they watch resources with high cardinality like secrets possibly in all namespaces. This has a massive impact on the memory used by the controller on big clusters."

In my experience, for some customers this can amount to gigabytes of overhead. I would add that it's not only about memory usage: it also puts stress on the Kubernetes API server with a lot of traffic.

The article above suggests a remediation using cache configuration: if that solves the problem for you, great! If it doesn't, you might want to chime in here: kubernetes-sigs/controller-runtime#2570. I'm proposing to add more cache management options to controller-runtime, but first I would like to survey the different use cases among OLM users, to understand whether the solution I'm suggesting would help others. I guess the goal is to find a solution that suits most of us, rather than each operator implementing its own custom cache management.
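As a rough illustration of why an unfiltered watch scales with the number of objects, here is a toy Go model using only the standard library. It is not controller-runtime code: the `ConfigMap`, `Cache`, and `Simulate` names and the label key are hypothetical. An unfiltered cache keeps every object it is notified about, whereas a cache restricted by a label selector keeps only the matching ones.

```go
package main

import (
	"fmt"
	"strings"
)

// ConfigMap is a simplified, hypothetical stand-in for a Kubernetes ConfigMap.
type ConfigMap struct {
	Namespace string
	Name      string
	Labels    map[string]string
	Data      string
}

// Cache is a toy model of a watch cache: it stores every object its filter accepts.
type Cache struct {
	accept func(ConfigMap) bool
	store  map[string]ConfigMap
}

// NewCache builds an empty cache with the given acceptance filter.
func NewCache(accept func(ConfigMap) bool) *Cache {
	return &Cache{accept: accept, store: make(map[string]ConfigMap)}
}

// OnAdd simulates receiving an "added" watch event.
func (c *Cache) OnAdd(cm ConfigMap) {
	if c.accept(cm) {
		c.store[cm.Namespace+"/"+cm.Name] = cm
	}
}

// Len returns the number of cached objects.
func (c *Cache) Len() int { return len(c.store) }

// Bytes returns the total payload size held by the cache.
func (c *Cache) Bytes() int {
	total := 0
	for _, cm := range c.store {
		total += len(cm.Data)
	}
	return total
}

// AcceptAll models an unfiltered, cluster-wide watch.
func AcceptAll(ConfigMap) bool { return true }

// HasLabel models a watch restricted by a label selector.
func HasLabel(key, value string) func(ConfigMap) bool {
	return func(cm ConfigMap) bool { return cm.Labels[key] == value }
}

// Simulate creates n unlabeled ConfigMaps of the given payload size in the
// "test" namespace and feeds them to a cache using the given filter.
func Simulate(n, size int, accept func(ConfigMap) bool) *Cache {
	c := NewCache(accept)
	payload := strings.Repeat("x", size)
	for i := 1; i <= n; i++ {
		c.OnAdd(ConfigMap{Namespace: "test", Name: fmt.Sprintf("test-cm-%d", i), Data: payload})
	}
	return c
}

func main() {
	unfiltered := Simulate(500, 1024, AcceptAll)
	// The label key and value below are assumptions chosen for illustration.
	filtered := Simulate(500, 1024, HasLabel("example.com/managed-by", "my-operator"))
	fmt.Println("unfiltered:", unfiltered.Len(), "objects,", unfiltered.Bytes(), "bytes")
	fmt.Println("filtered:  ", filtered.Len(), "objects,", filtered.Bytes(), "bytes")
}
```

In a real operator, the equivalent restriction would come from the manager's cache configuration rather than from a hand-written filter like this one.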

To Reproduce

  • Install the operator
  • Watch memory used
  • kubectl create namespace test
  • for i in {1..500}; do kubectl create cm test-cm-$i -n test --from-file=<INSERT BIG FILE HERE> ; done

Expected behavior

Memory usage should remain fairly stable; instead, it increases with every ConfigMap created.

Screenshots

Screenshot from 2023-10-27 09-03-34

@chengfang
Contributor

@jotak thanks for reporting it. Will evaluate it.

@chengfang
Contributor

This issue will be tracked in https://issues.redhat.com/browse/GITOPS-3602

@chetan-rns
Collaborator

@jotak I couldn't reproduce this issue. I followed the steps, running the operator locally against an OpenShift cluster, and created close to 500 ConfigMaps in a test namespace. The memory increased slightly at first (by a negligible amount) but soon became stable. Just to be sure, I also repeated the steps in an argocd namespace where the instance is running, but got the same results. I wonder if something else caused the memory to increase linearly in your case. Can you please share the operator and K8s versions that were used for testing?

I investigated the code to check how the operator watches the configmaps. AFAIK it watches the configmaps in two cases:

  1. ConfigMaps owned by the operand, for example the Argo CD workload ConfigMaps.
  2. ConfigMaps with a specific name that are used for Appset GitLab TLS.

Except for the above cases, the operator shouldn't watch configmaps in all namespaces. @chengfang @jaideepr97 Please feel free to correct me.
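
The two cases above can be sketched as a simple predicate. This is a minimal illustration with simplified stand-in types, not the operator's actual code: the `OwnerRef` and `ConfigMap` types, the `IsRelevant` function, and the `gitlabTLSConfigMapName` value are all hypothetical, and treating "owned by the operand" as an owner reference of kind `ArgoCD` is an assumption.

```go
package main

import "fmt"

// OwnerRef and ConfigMap are simplified, hypothetical stand-ins for the
// corresponding Kubernetes types.
type OwnerRef struct {
	Kind string
	Name string
}

type ConfigMap struct {
	Namespace string
	Name      string
	Owners    []OwnerRef
}

// gitlabTLSConfigMapName is a placeholder, not the operator's real constant.
const gitlabTLSConfigMapName = "example-appset-gitlab-tls-cm"

// IsRelevant reports whether a ConfigMap falls into one of the two cases:
// it has an owner of kind ArgoCD (assumed meaning of "owned by the operand"),
// or it carries the specific, well-known name.
func IsRelevant(cm ConfigMap) bool {
	for _, o := range cm.Owners {
		if o.Kind == "ArgoCD" {
			return true
		}
	}
	return cm.Name == gitlabTLSConfigMapName
}

func main() {
	cms := []ConfigMap{
		{Namespace: "argocd", Name: "argocd-cm", Owners: []OwnerRef{{Kind: "ArgoCD", Name: "example"}}},
		{Namespace: "argocd", Name: gitlabTLSConfigMapName},
		{Namespace: "test", Name: "test-cm-1"},
	}
	for _, cm := range cms {
		fmt.Printf("%s/%s relevant=%t\n", cm.Namespace, cm.Name, IsRelevant(cm))
	}
}
```

Under this model, the ConfigMaps created by the reproduction steps in the `test` namespace match neither case.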

@jotak
Author

jotak commented Jan 10, 2024

@chetan-rns I think this commit has fixed the issue: c8e4909#diff-2873f79a86c0d8b3335cd7731b0ecf7dd4301eb19a82ef7a1cba7589b5252261R264-R284
It was probably not there when I first tried.

PS: I confirm that I can no longer reproduce the issue today.
