Unclear values for vmware_vm_cpu_usage_average metric #3

Closed
pryorda opened this issue Jun 29, 2018 · 16 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@pryorda
Owner

pryorda commented Jun 29, 2018

From @dannyk81 on February 10, 2018 1:53

I can't quite figure out the values of the vmware_vm_cpu_usage_average metric, for example:

vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz1"} | 202
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz2"} | 225
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz3"} | 4015
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz4"} | 207
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz5"} | 209

According to https://www.vmware.com/support/developer/converter-sdk/conv61_apireference/cpu_counters.html

the description of this counter is "Amount of actively used virtual CPU, as a percentage of total available CPU", but the values I'm seeing do not look like percentages.

Any clues?

Copied from original issue: rverchere/vmware_exporter#29

@pryorda
Owner Author

pryorda commented Jun 29, 2018

From @dannyk81 on February 15, 2018 17:47

@rverchere

So, seems like dividing the value by 100 gets the correct result 😄 (compared to figures we see in vCenter)

Perhaps this is due to converting the value to float here: https://github.com/rverchere/vmware_exporter/blob/aeccb035d368dcc8e6bc52628d7eef786345725b/vmware_exporter/vmware_exporter.py#L386

@pryorda
Owner Author

pryorda commented Jun 29, 2018

From @dannyk81 on February 15, 2018 17:54

Same issue with the vmware_vm_mem_usage_average metric: the value needs to be divided by 100 to get the correct result.

@dannyk81
Collaborator

dannyk81 commented Oct 7, 2018

@pryorda was this actually fixed in #16?

@pryorda
Owner Author

pryorda commented Oct 7, 2018 via email

@dannyk81
Collaborator

@pryorda

Here's a sample:

# HELP vmware_vm_cpu_usage_average vmware_vm_cpu_usage_average
# TYPE vmware_vm_cpu_usage_average gauge
vmware_vm_cpu_usage_average{cluster_name="MAD-PROD",dc_name="MAD",host_name="esx-prod-4.foo.bar",vm_name="ELASTICDATA-03.PRD.MOVES.MAD"} 533.0
vmware_vm_cpu_usage_average{cluster_name="MAD-PROD",dc_name="MAD",host_name="esx-prod-4.foo.bar",vm_name="BLUE-KUBM-01.PRD.MOVES.MAD"} 774.0
vmware_vm_cpu_usage_average{cluster_name="MAD-PROD",dc_name="MAD",host_name="esx-prod-4.foo.bar",vm_name="PROMETHEUS-01.MAD"} 1474.0
vmware_vm_cpu_usage_average{cluster_name="MAD-PROD",dc_name="MAD",host_name="esx-prod-4.foo.bar",vm_name="BLUE-KUBW-01.PRD.MOVES.MAD"} 888.0
vmware_vm_cpu_usage_average{cluster_name="MAD-PROD",dc_name="MAD",host_name="esx-prod-4.foo.bar",vm_name="KAF-05.PRD.MOVES.MAD"} 175.0

The values above should be percentages, but I only get a meaningful value after dividing them by 100.

Can you confirm it's the same in your case?
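For illustration, here is a minimal sketch of the division applied to the sample above. It assumes the raw values are expressed in hundredths of a percent, which is what the comparison with vCenter suggests; the helper name `to_percent` is hypothetical and not part of the exporter.

```python
# Raw samples copied from the exporter output above (VM name -> raw value).
raw_samples = {
    "ELASTICDATA-03.PRD.MOVES.MAD": 533.0,
    "BLUE-KUBM-01.PRD.MOVES.MAD": 774.0,
    "PROMETHEUS-01.MAD": 1474.0,
    "BLUE-KUBW-01.PRD.MOVES.MAD": 888.0,
    "KAF-05.PRD.MOVES.MAD": 175.0,
}


def to_percent(raw_value: float) -> float:
    """Convert a raw value (assumed to be in hundredths of a percent) to percent."""
    return raw_value / 100.0


for vm_name, raw in raw_samples.items():
    print(f"{vm_name}: {to_percent(raw):.2f}%")
```

With this assumption, PROMETHEUS-01.MAD would read as 14.74% rather than 1474.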

@pryorda
Owner Author

pryorda commented Oct 20, 2018

Dividing by 100 definitely looks better. I'll add a PR later tonight.

@pryorda
Owner Author

pryorda commented Dec 27, 2018

Ugh, never got around to this but based on the verbiage in here: https://www.vmware.com/support/developer/converter-sdk/conv61_apireference/cpu_counters.html

VM - Amount of actively used virtual CPU, as a percentage of total available CPU. This is the host's view of the CPU usage, not the guest operating system view. It is the average CPU utilization over all available virtual CPUs in the virtual machine. For example, if a virtual machine with one virtual CPU is running on a host that has four physical CPUs and the CPU usage is 100%, the virtual machine is using one physical CPU completely.

and

Memory usage as percentage of total configured or available memory

I'm wondering if we should create some kind of mapping to get the correct values?

Something like:

mem_usage -> percent
cpu_usage -> percent

if type == "percent":
    value = value / 100

@dannyk81
Collaborator

dannyk81 commented Dec 27, 2018

The perf metrics data object returned should include the Unit information which can be used to normalize the values.

https://www.vmware.com/support/developer/converter-sdk/conv61_apireference/vim.PerformanceManager.CounterInfo.Unit.html

This way, we should be able to use a generic function to check the Unit of the metric and apply any relevant normalization.
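A rough sketch of what such a generic function could look like follows. It is not the exporter's actual code: the `NORMALIZERS` table and the `normalize` function are hypothetical names, and the sketch assumes the unit arrives as a plain string key such as "percent" (how that key is read from the counter info object is left out).

```python
from typing import Callable, Dict

# Hypothetical mapping from a counter's unit key to a normalization function.
# Only "percent" is handled here; the divisor of 100 is the assumption
# discussed in this thread, not something read from the API.
NORMALIZERS: Dict[str, Callable[[float], float]] = {
    "percent": lambda value: value / 100.0,
}


def normalize(value: float, unit: str) -> float:
    """Apply the normalization registered for this unit, if any.

    Values with an unknown unit are returned unchanged.
    """
    normalizer = NORMALIZERS.get(unit)
    return normalizer(value) if normalizer else value


print(normalize(4015, "percent"))     # percent values are divided by 100
print(normalize(18953, "megaHertz"))  # other units pass through unchanged
```

Keeping the rules in a table means a new unit can be supported by adding one entry instead of another per-metric special case.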

@pryorda
Owner Author

pryorda commented Jun 19, 2019

According to this: https://www.vmware.com/support/developer/converter-sdk/conv61_apireference/cpu_counters.html

This is how it gets the value: virtual CPU usage = usagemhz / (# of virtual CPUs x core frequency)

@jdelvecchio

Thanks for your reply! However, this is what I get after running a few tests.

Example for a VM:
usagemhz: 18953
Number of virtual CPUs: 8
Core frequency: 2593.993 MHz

vmware_vm_cpu_usage_average = 18953 / (8 * 2593.993) = 0.9133

The value I get in Prometheus is 9133, so the result must then be multiplied by 10,000, which makes the correct formula:
vmware_vm_cpu_usage_average = usagemhz / (# of virtual CPUs * core frequency) * 10,000

Or I'm using the wrong unit for core frequency, because 18953 / (8 * 0.2593993) = 9133
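The arithmetic can be checked with a few lines of Python. This is only a sketch using the numbers quoted in this comment; the variable names are made up, and the factor of 10,000 is read here as "fraction to percent (x100), then percent to hundredths of a percent (x100)", which is an interpretation rather than something the documentation states.

```python
usage_mhz = 18953
num_vcpus = 8
core_frequency_mhz = 2593.993

# Fraction of the VM's total virtual CPU capacity in use (0.0 to 1.0).
fraction_used = usage_mhz / (num_vcpus * core_frequency_mhz)

# The same figure as a percentage, and scaled the way the metric appears
# in Prometheus (assumed to be hundredths of a percent).
percent_used = fraction_used * 100
raw_metric = fraction_used * 10_000

print(round(fraction_used, 4))  # 0.9133
print(round(percent_used, 2))   # 91.33
print(round(raw_metric))        # 9133
```

Under that reading, the raw 9133 divided by 100 gives 91.33% CPU used, so no unit change to the core frequency is needed.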

@pryorda
Owner Author

pryorda commented Jun 26, 2019

I'm not sure. I don't think we do any mangling of that, but I can double check. I "assume" it's the second formula.

@jdelvecchio

jdelvecchio commented Jun 26, 2019

Has anyone found a way to use this value? For example, how do I convert it to %CPU used?
I don't seem to get anything from it apart from a number that indicates CPU workload without any real unit.

Would be helpful!

@pryorda
Copy link
Owner Author

pryorda commented Jun 27, 2019

I usually just graph all the VMs and find the outliers. I don't alert on CPU usage, just load.

@dannyk81
Collaborator

dannyk81 commented Jun 28, 2019

@jdelvecchio I use this metric in various dashboards and simply divide the value by 100.

Running vmware_vm_cpu_usage_average / 100 > 100 returns no data for all our deployments (~1000 VMs), so the value is always between 0 and 100.
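The same sanity check can be sketched outside Prometheus. The function name below is hypothetical, and the sample values are the ones from the original report at the top of this issue.

```python
from typing import Dict


def vms_over_100_percent(raw_values: Dict[str, float]) -> Dict[str, float]:
    """Return the VMs whose value still exceeds 100 after dividing by 100."""
    return {
        vm: raw / 100.0
        for vm, raw in raw_values.items()
        if raw / 100.0 > 100.0
    }


samples = {"xyz1": 202, "xyz2": 225, "xyz3": 4015, "xyz4": 207, "xyz5": 209}
print(vms_over_100_percent(samples))  # {}
```

An empty result is consistent with the metric being a percentage scaled by 100; any VM showing up here would contradict that reading.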

I wonder if this has something to do with cores per socket (though it shouldn't). In our case, cores per socket is always 1. How about you?

@jdelvecchio

@dannyk81 Running vmware_vm_cpu_usage_average / 100 > 100 also returns no data for me.

I got my maths wrong; it does seem to be %used. Thanks to both of you for the details and the help.

As for cores per socket, it depends on the VM; we have a bit of both.

@dannyk81
Collaborator

dannyk81 commented Jul 4, 2019

Indeed, this metric is the average %used.

However, the value returned does not have a decimal point, hence the need to divide by 100.
