-
Notifications
You must be signed in to change notification settings - Fork 204
fix(power): correct power attribution by separating used vs idle power #2125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(power): correct power attribution by separating used vs idle power #2125
Conversation
32eedaa
to
1c74d29
Compare
1c74d29
to
cf402c6
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## reboot #2125 +/- ##
==========================================
+ Coverage 92.46% 92.65% +0.19%
==========================================
Files 33 33
Lines 2587 2779 +192
==========================================
+ Hits 2392 2575 +183
- Misses 154 160 +6
- Partials 41 44 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
962c3c2
to
f461dfa
Compare
38d74ab
to
2b1b99b
Compare
2b1b99b
to
c1d1b52
Compare
…ibution Previously, workloads were attributed power from total node consumption, causing over-attribution during low CPU usage periods. This commit implements proportional power attribution using CPU utilization ratios. Changes: - Add NodeUsage type with active/idle power breakdown - Calculate CPU usage ratio from /proc/stat - Attribute only active power to workloads based on CPU time - Export separate active/idle power metrics Fixes power attribution accuracy by ensuring workloads are only attributed power proportional to their actual CPU utilization. Signed-off-by: Sunil Thaha <[email protected]>
Signed-off-by: Sunil Thaha <[email protected]>
c1d1b52
to
913a0be
Compare
// active over total, where active = total - idle | ||
// and total = user + nice + system + idle + iowait + irq + softirq + steal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update comment to say active = total - (idle + iowait)
@@ -14,32 +14,37 @@ import ( | |||
"k8s.io/utils/clock" | |||
) | |||
|
|||
type Node struct { | |||
CPUTimeDelta float64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this field contains sum(proc.CPUTimeDelta)
, not related to node
ch <- prometheus.MustNewConstMetric( | ||
c.nodeCPUActiveJoulesDesc, | ||
prometheus.CounterValue, | ||
energy.ActiveEnergy.Joules(), | ||
zoneName, path, | ||
) | ||
|
||
ch <- prometheus.MustNewConstMetric( | ||
c.nodeCPUIdleJoulesDesc, | ||
prometheus.CounterValue, | ||
energy.IdleEnergy.Joules(), | ||
zoneName, path, | ||
) | ||
|
||
// watts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are you plotting? rate ?
You are right ... I am not incrementing (prev + current) the counter and only calculating the irate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, not with rate
@vimalk78 I am merging this so that pod can be rebased on this. I will continue to fix the incorrect implementation of ActiveEnergy counter as a follow up commit. |
This PR fixes a critical power attribution bug where workloads (processes/containers/VMs)
were being attributed the full node power consumption instead of only the power
corresponding to their actual CPU utilization.
Key Changes:
kepler_node_cpu_[active|idle]_[watts|joules_total]
kepler_node_cpu_usage_ratio
metricImpact: Provides significantly more accurate power attribution that correctly reflects the relationship between CPU utilization and energy consumption, eliminating over-attribution to workloads.