
Missing VMs metrics #15

Closed
pryorda opened this issue Jun 29, 2018 · 11 comments
Labels
enhancement New feature or request

Comments

@pryorda (Owner) commented Jun 29, 2018

From @rverchere on May 31, 2017 8:06

Add more VMs metrics

Copied from original issue: rverchere/vmware_exporter#3

@pryorda (Owner, Author) commented Jun 29, 2018

From @is4it-lab on July 5, 2017 13:44

Hi,
these additional counters would be useful:

            cpu.usagemhz.average
            cpu.ready.summation
            disk.totalLatency.average

disk.usage.average can be omitted; it's only the sum of read and write.

Robert

@dannyk81 (Collaborator) commented Dec 27, 2018

cpu.usagemhz.average and cpu.ready.summation are already covered.

disk.totalLatency.average would be an interesting metric to add, wdyt @pryorda?

EDIT: I would probably add disk.totalReadLatency.average and disk.totalWriteLatency.average

And disk.usage.average indeed seems redundant, since the same can be extracted from disk.read.average + disk.write.average.

@pryorda (Owner, Author) commented Dec 27, 2018

Since it's not an additional pull, I don't think we need to omit anything anymore. I want to consider creating a larger story to take in all the stats. Thoughts?

@dannyk81 (Collaborator)

Still, the number of available metrics is considerable; pulling them all could create other concerns.

I think some discretion is still required.

@pryorda (Owner, Author) commented Dec 27, 2018

We could have an exclusion pattern for stuff we don't want displayed. I know you can already do this on the Prometheus server side.

@dannyk81 (Collaborator)

Sure, you can drop metrics during ingestion using metric_relabel_configs, but you still need to:

  1. Query all the perfManager data from vCenter/ESXi in the exporter
  2. Transfer all the metric data to Prometheus during the scrape

You do reduce the impact on the TSDB since you drop the unwanted metrics, but the two steps above can be both resource- and time-consuming, especially in larger environments (think setups with thousands of VMs).
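
For illustration, a minimal sketch of the drop-at-ingestion approach (the job name, target address, and metric-name pattern below are assumptions, not from this thread):

```yaml
# prometheus.yml, scrape config excerpt -- a sketch only; the target
# address and the vmware_vm_net_.* pattern are illustrative assumptions.
scrape_configs:
  - job_name: 'vmware_exporter'
    static_configs:
      - targets: ['vmware-exporter:9272']
    metric_relabel_configs:
      # Drop matching series before they reach the TSDB. Note the
      # exporter has still collected them and sent them in the scrape.
      - source_labels: [__name__]
        regex: 'vmware_vm_net_.*'
        action: drop
```

This is exactly why dropping at ingestion doesn't help with points 1 and 2 above: the relabeling runs only after the scrape has completed.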

Perhaps we can provide perf collector categories (e.g. cpu, memory, disk, network, power) and allow enabling/disabling them, with some general use-case defaults.
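
Something along these lines, purely as a hypothetical sketch (these keys are not an actual supported schema):

```yaml
# Hypothetical exporter config -- shown only to illustrate the
# per-category toggle idea; the keys below are not a real schema.
collectors:
  cpu: true
  memory: true
  disk: true
  network: true
  power: false    # example of a category disabled by default
```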

@jnovack commented Oct 7, 2019

There's a small number that will only be in alarm when you are deep-diving, in which case you probably want to be in vCenter anyway (#gatekeeping). There are around 530+ metrics in vCenter 6.7, and they can definitely be adjudicated into "want", "need", and "why?".

I guess one (of a few) good measures of whether a metric belongs in this project is whether you would want to alarm on it.

I would want to alarm on disk.deltaused.latest, because I need to know when my snapshots get out of control. But I might not want to alarm on mem.llSwapOut.average, because by the time that alarms, I'll already have 4 other metrics in alarm. If I'm already troubleshooting an out-of-control VM, I'm not going to be yelling at my VMware admin asking "Where's the graph of mem.llSwapOut.average?!"

Or perhaps, metric parity with Telegraf?

@pryorda (Owner, Author) commented Oct 8, 2019

@jnovack Your metrics should not determine your alerting logic; that should be up to Alertmanager. If you have multiple alerts for a VM, or for any Prometheus metric, the alerts should be bundled together: https://prometheus.io/docs/alerting/alertmanager/#grouping
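
A minimal sketch of that grouping (the receiver name and grouping label are illustrative assumptions):

```yaml
# alertmanager.yml, route excerpt -- receiver and label names are
# assumptions; group_by bundles alerts that share those label values.
route:
  receiver: 'ops-team'
  group_by: ['vm_name']   # several firing alerts for one VM -> one notification
  group_wait: 30s         # how long to wait to batch alerts of a new group
  group_interval: 5m      # minimum time between notifications for a group
```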

@jnovack commented Oct 9, 2019

Of course, and in the land of unicorns and rainbows, we can accept every metric without delay. I'm merely suggesting that when resources are not infinite and magical, a good place to start for metrics to collect is the set you want to report or alert on.

@pryorda (Owner, Author) commented Oct 9, 2019

@jnovack What's the delay and issue you're trying to solve? Is the scraping taking too long? Do you have logs you can provide from the exporter?

You can also fork the codebase and remove the metrics you don't want from the script. Keep in mind there is some spaghetti code that checks the values of some of those metrics before fetching other metrics.

@pryorda (Owner, Author) commented Feb 14, 2022

Closing due to inactivity.

@pryorda closed this as completed Feb 14, 2022