
Errors during vm or host metrics collection #217

Open
seyfbarhoumi opened this issue Jul 24, 2020 · 7 comments
Labels
enhancement New feature or request

Comments

seyfbarhoumi commented Jul 24, 2020

2020-07-24 14:33:36,624 INFO:Starting vm metrics collection
2020-07-24 14:33:36,624 INFO:Fetching vim.VirtualMachine inventory
2020-07-24 14:33:36,624 INFO:Retrieving service instance content
2020-07-24 14:33:36,627 INFO:START: _vmware_get_vm_perf_manager_metrics
2020-07-24 14:33:37,288 INFO:Retrieved service instance content
2020-07-24 14:33:58,121 INFO:FIN: _vmware_get_vm_perf_manager_metrics
2020-07-24 14:33:58,186 INFO:Finished collecting metrics from bams-vcenter.bams.corp
2020-07-24 14:33:59,125 ERROR:Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
StopIteration: [<prometheus_client.core.GaugeMetricFamily object at 0x7f1f70c75710>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f70d13f28>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f70d13358>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73b86c50>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73b86358>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73b86fd0>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f70c7bcc0>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f70a23550>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f70a23eb8>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f70a10048>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f70a10898>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f715f7b38>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f715f7a20>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e80320>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e803c8>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e804a8>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e805f8>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e80630>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e805c0>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e806d8>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e80748>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e80780>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e807f0>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e80898>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e80828>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e80908>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e80940>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e808d0>, <prometheus_client.core.GaugeMetricFamily object at 
0x7f1f73e809b0>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e80ba8>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e80eb8>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e80ef0>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e80e80>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e80f98>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e8d0b8>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e8d0f0>, <prometheus_client.core.GaugeMetricFamily object at 0x7f1f73e8d128>]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/vmware_exporter/vmware_exporter.py", line 1786, in _async_render_GET
yield self.generate_latest_metrics(request)
File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python3.6/site-packages/vmware_exporter/vmware_exporter.py", line 1833, in generate_latest_metrics
request.finish()
File "/usr/local/lib/python3.6/site-packages/twisted/web/server.py", line 286, in finish
return http.Request.finish(self)
File "/usr/local/lib/python3.6/site-packages/twisted/web/http.py", line 1080, in finish
"Request.finish called on a request after its connection was lost; "
RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this.

2020-07-24 14:33:59,126 INFO:Fetched vim.VirtualMachine inventory (0:00:22.501970)
Unhandled error in Deferred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 501, in errback
self._startRunCallbacks(fail)
File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 568, in _startRunCallbacks
self._runCallbacks()
File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1475, in gotResult
_inlineCallbacks(r, g, status)
--- ---
File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "/usr/local/lib/python3.6/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/usr/local/lib/python3.6/site-packages/vmware_exporter/vmware_exporter.py", line 1791, in _async_render_GET
request.finish()
File "/usr/local/lib/python3.6/site-packages/twisted/web/server.py", line 286, in finish
return http.Request.finish(self)
File "/usr/local/lib/python3.6/site-packages/twisted/web/http.py", line 1080, in finish
"Request.finish called on a request after its connection was lost; "
builtins.RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this.

2020-07-24 14:34:00,061 INFO:Finished vm metrics collection
2020-07-24 14:34:01,579 ERROR:Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/vmware_exporter/vmware_exporter.py", line 1786, in _async_render_GET
yield self.generate_latest_metrics(request)
twisted.internet.defer.FirstError: FirstError[#1, [Failure instance: Traceback: <class 'twisted.internet.defer.FirstError'>: FirstError[#0, [Failure instance: Traceback: <class 'pyVmomi.VmomiSupport.vmodl.fault.ManagedObjectNotFound'>: (vmodl.fault.ManagedObjectNotFound) {
dynamicType = ,
dynamicProperty = (vmodl.DynamicProperty) [],
msg = 'This object has been deleted or haven't been entirely created',
faultCause = ,
faultMessage = (vmodl.LocalizableMessage) [],
obj = 'vim.VirtualMachine:vm-10546'
}
/usr/local/lib/python3.6/threading.py:916:_bootstrap_inner
/usr/local/lib/python3.6/threading.py:864:run
/usr/local/lib/python3.6/site-packages/twisted/_threads/_threadworker.py:46:work
/usr/local/lib/python3.6/site-packages/twisted/_threads/_team.py:190:doWork
--- ---
/usr/local/lib/python3.6/site-packages/twisted/python/threadpool.py:250:inContext
/usr/local/lib/python3.6/site-packages/twisted/python/threadpool.py:266:
/usr/local/lib/python3.6/site-packages/twisted/python/context.py:122:callWithContext
/usr/local/lib/python3.6/site-packages/twisted/python/context.py:85:callWithContext
/usr/local/lib/python3.6/site-packages/pyVmomi/VmomiSupport.py:706:
/usr/local/lib/python3.6/site-packages/pyVmomi/VmomiSupport.py:512:_InvokeMethod
/usr/local/lib/python3.6/site-packages/pyVmomi/SoapAdapter.py:1397:InvokeMethod
]]
--- ---
/usr/local/lib/python3.6/site-packages/vmware_exporter/vmware_exporter.py:1342:_vmware_get_vm_perf_manager_metrics
/usr/local/lib/python3.6/site-packages/vmware_exporter/defer.py:99:parallelize
]]

Unhandled Error
Traceback (most recent call last):
File "/usr/local/bin/vmware_exporter", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python3.6/site-packages/vmware_exporter/vmware_exporter.py", line 1899, in main
reactor.run()
File "/usr/local/lib/python3.6/site-packages/twisted/internet/base.py", line 1283, in run
self.mainLoop()
File "/usr/local/lib/python3.6/site-packages/twisted/internet/base.py", line 1292, in mainLoop
self.runUntilCurrent()
--- ---
File "/usr/local/lib/python3.6/site-packages/twisted/internet/base.py", line 886, in runUntilCurrent
f(*a, **kw)
File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 501, in errback
self._startRunCallbacks(fail)
File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 568, in _startRunCallbacks
self._runCallbacks()
File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 699, in _runCallbacks
current.result.cleanFailure()
File "/usr/local/lib/python3.6/site-packages/twisted/python/failure.py", line 627, in cleanFailure
self.value.traceback = None
File "/usr/local/lib/python3.6/site-packages/pyVmomi/VmomiSupport.py", line 663, in setattr
CheckField(self._GetPropertyInfo(name), val)
File "/usr/local/lib/python3.6/site-packages/pyVmomi/VmomiSupport.py", line 468, in GetPropertyInfo
raise AttributeError(name)
builtins.AttributeError: traceback

2020-07-24 14:34:06,631 INFO:Start collecting metrics from bams-vcenter.bams.corp
2020-07-24 14:34:06,631 INFO:Starting vm metrics collection
2020-07-24 14:34:06,631 INFO:Fetching vim.VirtualMachine inventory
2020-07-24 14:34:06,631 INFO:Retrieving service instance content
2020-07-24 14:34:06,634 INFO:START: _vmware_get_vm_perf_manager_metrics
2020-07-24 14:34:07,118 INFO:Retrieved service instance content
Unhandled error in Deferred:

Traceback (most recent call last):
--- ---
File "/usr/local/lib/python3.6/site-packages/vmware_exporter/vmware_exporter.py", line 1342, in _vmware_get_vm_perf_manager_metrics
self.vm_labels,
File "/usr/local/lib/python3.6/site-packages/vmware_exporter/defer.py", line 99, in parallelize
results = yield defer.DeferredList(args, fireOnOneErrback=True)
twisted.internet.defer.FirstError: FirstError[#0, [Failure instance: Traceback: <class 'pyVmomi.VmomiSupport.vmodl.fault.ManagedObjectNotFound'>: (vmodl.fault.ManagedObjectNotFound) {
dynamicType = ,
dynamicProperty = (vmodl.DynamicProperty) [],
msg = 'This object has been deleted or haven't been entirely created',
faultCause = ,
faultMessage = (vmodl.LocalizableMessage) [],
obj = 'vim.VirtualMachine:vm-10546'
}
/usr/local/lib/python3.6/threading.py:916:_bootstrap_inner
/usr/local/lib/python3.6/threading.py:864:run
/usr/local/lib/python3.6/site-packages/twisted/_threads/_threadworker.py:46:work
/usr/local/lib/python3.6/site-packages/twisted/_threads/_team.py:190:doWork
--- ---
/usr/local/lib/python3.6/site-packages/twisted/python/threadpool.py:250:inContext
/usr/local/lib/python3.6/site-packages/twisted/python/threadpool.py:266:
/usr/local/lib/python3.6/site-packages/twisted/python/context.py:122:callWithContext
/usr/local/lib/python3.6/site-packages/twisted/python/context.py:85:callWithContext
/usr/local/lib/python3.6/site-packages/pyVmomi/VmomiSupport.py:706:
/usr/local/lib/python3.6/site-packages/pyVmomi/VmomiSupport.py:512:_InvokeMethod
/usr/local/lib/python3.6/site-packages/pyVmomi/SoapAdapter.py:1397:InvokeMethod
]]

2020-07-24 14:34:18,899 INFO:Fetched vim.VirtualMachine inventory (0:00:12.267172)
2020-07-24 14:34:19,117 INFO:Finished vm metrics collection
2020-07-24 14:34:23,446 INFO:FIN: _vmware_get_vm_perf_manager_metrics
2020-07-24 14:34:23,462 INFO:Finished collecting metrics from bams-vcenter.bams.corp
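The ManagedObjectNotFound fault in the logs above is raised when a VM is deleted (here vm-10546) between the inventory fetch and the metric query, and one such fault currently aborts the whole batch via FirstError. A minimal sketch of tolerating this per object instead (hypothetical helper, not the exporter's actual code; the exception class stands in for pyVmomi's vmodl.fault.ManagedObjectNotFound):

```python
class ManagedObjectNotFound(Exception):
    """Stand-in for pyVmomi's vmodl.fault.ManagedObjectNotFound."""


def collect_vm_metrics(vms, fetch):
    """Call fetch(vm) for each vm; skip VMs deleted mid-collection.

    Returns (results, skipped) so the caller can log which objects
    vanished instead of failing the entire scrape.
    """
    results = {}
    skipped = []
    for vm in vms:
        try:
            results[vm] = fetch(vm)
        except ManagedObjectNotFound:
            # VM was deleted (or not fully created) between the
            # inventory fetch and the metric query; note it and move on.
            skipped.append(vm)
    return results, skipped
```

With this shape, a single deleted VM costs one skipped entry rather than an unhandled FirstError and a lost scrape.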

@seyfbarhoumi seyfbarhoumi changed the title Errors during vm_metrics collection Errors during vm or host metrics collection Jul 24, 2020
pryorda (Owner) commented Aug 1, 2020

Update the Prometheus scrape timeout for this endpoint.
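For reference, a minimal sketch of what that looks like in prometheus.yml (job name, target, and the exact durations are placeholders; `scrape_timeout` must not exceed `scrape_interval`):

```yaml
scrape_configs:
  - job_name: vmware_exporter
    scrape_interval: 120s
    # Give the exporter enough time to walk the vCenter inventory.
    scrape_timeout: 115s
    static_configs:
      - targets: ['vmware-exporter:9272']
```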

seyfbarhoumi (Author) commented Aug 4, 2020

This resolved the problem, but there's something I wanted to let you know: sometimes, for whatever reason, a scrape takes longer than expected and exceeds the scrape timeout. The problem is that this creates a kind of concurrency between the scrape that timed out and the next scrape, which makes scrapes take even longer; at some point the exporter gets stuck and no longer fetches any metrics.

pryorda (Owner) commented Aug 17, 2020

If I'm understanding you correctly, we need to implement some sort of lock to prevent multiple scrapes from overlapping?
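A minimal sketch of such a lock (hypothetical class, not the exporter's actual code): if a collection is already in flight, the handler bails out immediately instead of piling a new collection on top of the slow one. A non-blocking `threading.Lock.acquire` is enough for this.

```python
import threading


class SingleFlightCollector:
    """Run at most one collection at a time; report 'busy' otherwise."""

    def __init__(self, collect):
        self._collect = collect          # the (potentially slow) collection function
        self._lock = threading.Lock()

    def scrape(self):
        # acquire(blocking=False) returns False if a scrape is in flight,
        # so overlapping requests return immediately instead of queuing.
        if not self._lock.acquire(blocking=False):
            return None                  # caller should answer 503 / empty
        try:
            return self._collect()
        finally:
            self._lock.release()
```

The caller would translate `None` into an HTTP 503 (or an empty response), so Prometheus records a failed scrape rather than stacking collections until the exporter wedges.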

vsulimanec commented Oct 21, 2020

Hi,

I would say yes; we have the same problem on version 0.13.0.
We'll test 0.13.2 today.

billabongrob commented

I'm finding that in large environments this will occur, and I'm unsure whether it's a slow response from the vSphere (6.7) API or the exporter itself. If I disable VM collection, it works relatively well. As soon as it's enabled, it hits the fan: the first run takes ~26 seconds, the second upwards of 10-15 minutes. I'm unsure what the best practice would be for multiple datacenters/clusters.

pryorda (Owner) commented Mar 12, 2021

@billabongrob You can use different config sections.

kong62 commented Jun 29, 2021

I'm finding that in large instances, this will occur and am unsure as to whether or not it's a slow response of the vSphere (6.7) API or the exporter itself. If I disable VM collection, it works relatively good. As soon as it's enabled, it hits the fan. First run will take ~26 seconds, second run upwards of 10-15 minutes. Unsure of what the best practice would be for multiple datacenters/clusters.

version: pryorda/vmware_exporter:v0.16.1

  1. I have the same problem; for now I disabled VM collection and it's fine.

  2. The Prometheus timeout alone can't solve the problem:

spec:
  podMetricsEndpoints:
  - interval: 60s
    scrapeTimeout: 55s
    path: /metrics
    port: http
  3. When I set the LIMITED section, there is no error, but no metrics either:
# vi config.yml 
kind: ConfigMap
metadata:
  labels:
    app: vmware-exporter
  name: vmware-exporter-config
apiVersion: v1
data:
  VSPHERE_USER: "[email protected]"
  VSPHERE_HOST: "vCenterSRV01.hupu.local"
  VSPHERE_IGNORE_SSL: "True"
  VSPHERE_COLLECT_HOSTS: "True"
  VSPHERE_COLLECT_DATASTORES: "True"
  VSPHERE_COLLECT_SNAPSHOTS: "true"
  VSPHERE_COLLECT_VMS: "false"
  VSPHERE_COLLECT_VMGUESTS: "false"
  VSPHERE_LIMITED_USER: "[email protected]"
  VSPHERE_LIMITED_HOST: "vCenterSRV01.hupu.local"
  VSPHERE_LIMITED_PASSWORD: "ss%%m#sE2L"
  VSPHERE_LIMITED_IGNORE_SSL: "True"
  VSPHERE_LIMITED_COLLECT_HOSTS: "false"
  VSPHERE_LIMITED_COLLECT_DATASTORES: "false"
  VSPHERE_LIMITED_COLLECT_SNAPSHOTS: "false"
  VSPHERE_LIMITED_COLLECT_VMS: "true"
  VSPHERE_LIMITED_COLLECT_VMGUESTS: "false"

@pryorda pryorda added the enhancement New feature or request label Feb 16, 2022
5 participants