Reconciliation takes too long to execute #1097

skhalash · 2024-05-21T11:31:26Z

Description

Fixing the managed Kyma dashboards exposed an issue with the CR reconciliation duration across all three pipeline types and the Telemetry CR. The median reconciliation duration for the pipelines is approximately 1 second, and the 99th percentile reaches around 4 seconds for long-running pipelines that were deployed months ago. Ideally, once a pipeline has been deployed, every subsequent reconciliation should be a no-op because nothing has changed. The Telemetry CR fares slightly better, but its reconciliation duration is still of the same order of magnitude.

What can cause the problem?

The client cache configuration contains a list of concrete GVKs to be cached. However, this list has not been maintained for a while, so it does not contain all GVKs deployed by the different operator controllers (e.g. Fluent Bit, OTel Collector, and Self-Monitor resources). We could instead use the DefaultNamespaces cache option and automatically cache everything in the kyma-system namespace.
We also suspect that the CreateOrUpdate utils have never actually worked as intended: instead of checking for a diff and returning early, they always perform an API call.

Expected result

A no-op reconciliation should not take that long.

Actual result

A no-op reconciliation takes seconds.

Steps to reproduce

Troubleshooting

Release Notes


skhalash · 2024-05-21T11:59:29Z

People ran into the same problem with comparing resources here: kubernetes-sigs/kubebuilder#592

github-actions · 2024-07-21T00:11:53Z

This issue has been automatically marked as stale due to the lack of recent activity. It will soon be closed if no further activity occurs.
Thank you for your contributions.

github-actions · 2024-07-28T00:12:29Z

This issue has been automatically closed due to the lack of recent activity.
/lifecycle rotten

skhalash added area/logs LogPipeline area/metrics MetricPipeline area/traces TracePipeline area/manager Manager or module changes kind/bug Categorizes issue or PR as related to a bug. labels May 21, 2024

github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 21, 2024

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 28, 2024

kyma-bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 28, 2024

skhalash reopened this Jul 28, 2024

skhalash removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 28, 2024

rakesh-garimella assigned rakesh-garimella and unassigned rakesh-garimella Aug 6, 2024
