plugin/metrics: Update metrics in-place when possible #1277

Merged
merged 5 commits into from
Feb 26, 2025


sharnoff
Member

@sharnoff sharnoff commented Feb 21, 2025

Fixes #1276.

Currently, we update these node metrics by removing all the old ones, then adding back the current values. We do it that way so that the old values are cleaned up when labels change.

However, if metrics are scraped between removing the old values and adding the new ones, we can end up with single-datapoint gaps for one node at a time.

So to fix this, we should avoid removing the old metrics if and only if the labels are unchanged -- which we can check just by storing the previous labels we used.
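
As a rough illustration of the idea (not the code in this PR), here is a minimal sketch. `NodeMetrics`, `Labels`, and `Update` are hypothetical names, and a plain map stands in for the real metrics registry. Locking is omitted, so the sketch only shows the update logic, not the concurrent scrape.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Labels is the set of label values attached to one node's series.
type Labels map[string]string

// key builds a canonical string so that equal label sets map to the same series.
func (l Labels) key() string {
	names := make([]string, 0, len(l))
	for name := range l {
		names = append(names, name)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "%s=%q,", name, l[name])
	}
	return b.String()
}

// NodeMetrics is a hypothetical stand-in for a labelled gauge vector.
type NodeMetrics struct {
	series map[string]float64 // canonical label key -> current value
	prev   map[string]string  // node name -> label key used on the last update
}

func NewNodeMetrics() *NodeMetrics {
	return &NodeMetrics{series: map[string]float64{}, prev: map[string]string{}}
}

// Update sets the value for a node. The old series is deleted only when the
// label set changed; otherwise the value is overwritten in place, so there is
// no moment at which the series is absent.
func (m *NodeMetrics) Update(node string, labels Labels, value float64) {
	newKey := labels.key()
	if oldKey, ok := m.prev[node]; ok && oldKey != newKey {
		delete(m.series, oldKey)
	}
	m.series[newKey] = value
	m.prev[node] = newKey
}

func main() {
	m := NewNodeMetrics()
	m.Update("node-a", Labels{"node": "node-a", "zone": "z1"}, 4)
	m.Update("node-a", Labels{"node": "node-a", "zone": "z1"}, 6) // same labels: updated in place
	m.Update("node-a", Labels{"node": "node-a", "zone": "z2"}, 6) // labels changed: old series removed
	fmt.Println(len(m.series)) // prints 1
}
```

Storing only the previous label key per node keeps the comparison cheap; the full label map would be needed only if the old series had to be deleted through an API that takes label values.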

@sharnoff sharnoff requested review from a team and petuhovskiy and removed request for a team February 21, 2025 13:58

github-actions bot commented Feb 21, 2025

No changes to the coverage.

@mikhail-sakhnov
Contributor

question (to gain more project context): why do we delete metrics for previous node?

@sharnoff
Member Author

> why do we delete metrics for previous node?

We include some node labels in the metrics, and if those node labels change, the metrics update needs to also remove the series with the old labels.
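
To make that concrete, here is a small sketch of what happens if nothing is removed. `seriesKey` and `setWithoutCleanup` are hypothetical names used only for illustration; a series is identified by its full label set, so a changed label creates a second series and the old one lingers.

```go
package main

import "fmt"

// seriesKey is a hypothetical label set; a series is identified by all of
// its label values together.
type seriesKey struct {
	Node string
	Zone string
}

// setWithoutCleanup writes a value under the given labels and never deletes anything.
func setWithoutCleanup(series map[seriesKey]float64, k seriesKey, v float64) {
	series[k] = v
}

func main() {
	series := map[seriesKey]float64{}
	setWithoutCleanup(series, seriesKey{Node: "node-a", Zone: "z1"}, 4)
	setWithoutCleanup(series, seriesKey{Node: "node-a", Zone: "z2"}, 4) // node relabelled
	fmt.Println(len(series)) // prints 2: the series with the old labels is still exported
}
```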

@petuhovskiy petuhovskiy assigned sharnoff and unassigned petuhovskiy Feb 25, 2025
@sharnoff sharnoff assigned petuhovskiy and unassigned sharnoff Feb 25, 2025
@sharnoff sharnoff requested a review from petuhovskiy February 25, 2025 20:41
@petuhovskiy petuhovskiy assigned sharnoff and unassigned petuhovskiy Feb 26, 2025
@sharnoff sharnoff merged commit 3656682 into main Feb 26, 2025
33 checks passed
@sharnoff sharnoff deleted the sharnoff/plugin-better-node-metrics-updates branch February 26, 2025 09:39
Development

Successfully merging this pull request may close these issues.

Bug: Gaps under load in scheduler's node resource metrics
3 participants