Grafana dashboard #13

Closed
pryorda opened this issue Jun 29, 2018 · 20 comments
Labels
enhancement New feature or request hacktoberfest help wanted Extra attention is needed

Comments

@pryorda
Owner

pryorda commented Jun 29, 2018

From @rverchere on June 27, 2017 21:7

Add grafana dashboard using this exporter.

Copied from original issue: rverchere/vmware_exporter#8

@pryorda pryorda added hacktoberfest help wanted Extra attention is needed labels Jun 29, 2018
@pryorda pryorda added the enhancement New feature or request label Jun 29, 2018
@akurach
Contributor

akurach commented Sep 9, 2018

I think it's very personal and depends on your VC config. I created something like this:
(dashboard screenshot)

@pryorda
Owner Author

pryorda commented Sep 9, 2018

@akurach looks good. Want to do a PR for it? I want to get mine added, but yours looks better. Did you build any alert manager rules for it?

Currently we have this:

ALERT Host_Warn_Cpu_Usage
  IF
    avg(vmware_host_cpu_usage / vmware_host_cpu_max) by (host_name, environment) * 100 >= 80
  FOR 30m
  LABELS {
    severity = "warning",
    alert_category = "vmware",
    instance = "{{ $labels.host_name }}",
    team = "prod-services",
    run_book = "exampe.com/wiki/what-to-do"
  }
  ANNOTATIONS {
    summary = "High cpu usage on {{ $labels.host_name }}: {{ $value | printf \"%.2f\" }}%",
    description = "High cpu usage on {{ $labels.host_name }}: {{ $value | printf \"%.2f\" }}%"
  }

ALERT Host_Crit_Cpu_Usage
  IF
    avg(vmware_host_cpu_usage / vmware_host_cpu_max) by (host_name, environment) * 100 >= 95
  FOR 10m
  LABELS {
    severity = "critical",
    alert_category = "vmware",
    instance = "{{ $labels.host_name }}",
    team = "prod-services",
    run_book = "exampe.com/wiki/what-to-do"
  }
  ANNOTATIONS {
    summary = "High cpu usage on {{ $labels.host_name }}: {{ $value | printf \"%.2f\" }}%",
    description = "High cpu usage on {{ $labels.host_name }}: {{ $value | printf \"%.2f\" }}%"
  }

ALERT Host_Warn_Mem_Usage
  IF
    avg(vmware_host_memory_usage / vmware_host_memory_max) by (host_name, environment) * 100 >= 80
  FOR 30m
  LABELS {
    severity = "warning",
    alert_category = "vmware",
    instance = "{{ $labels.host_name }}",
    team = "prod-services",
    run_book = "exampe.com/wiki/what-to-do"
  }
  ANNOTATIONS {
    summary = "High memory usage on {{ $labels.host_name }}: {{ $value | printf \"%.2f\" }}%",
    description = "High memory usage on {{ $labels.host_name }}: {{ $value | printf \"%.2f\" }}%. Consider rebalancing virtual machines on the cluster in vmware."
  }

ALERT Host_Crit_Mem_Usage
  IF
    avg(vmware_host_memory_usage / vmware_host_memory_max) by (host_name, environment) * 100 >= 98
  FOR 5m
  LABELS {
    severity = "critical",
    alert_category = "vmware",
    instance = "{{ $labels.host_name }}",
    team = "prod-services",
    run_book = "exampe.com/wiki/what-to-do"
  }
  ANNOTATIONS {
    summary = "High memory usage on {{ $labels.host_name }}: {{ $value | printf \"%.2f\" }}%",
    description = "High memory usage on {{ $labels.host_name }}: {{ $value | printf \"%.2f\" }}%. Rebalance virtual machines on the cluster in vmware."
  }

ALERT Predict_Disk_Space_Warn
  IF
    (avg(vmware_datastore_freespace_size) by (ds_name, environment, vcenter_host)/((avg(vmware_datastore_freespace_size offset 7d ) by (ds_name, environment, vcenter_host) - avg(vmware_datastore_freespace_size) by (ds_name, environment, vcenter_host))/7+1) >= 0) <= 1
  FOR 60m
  LABELS {
    severity = "warning",
    alert_category = "vmware",
    instance = "{{ $labels.vcenter_host }}:{{ $labels.ds_name }}",
    team = "prod-services",
    run_book = "exampe.com/wiki/what-to-do"
  }
  ANNOTATIONS {
    summary = "Disk space on vmware datastore {{ $labels.ds_name }} could run out in {{ $value | printf \"%.2f\" }} days",
    description = "Disk space on vmware datastore {{ $labels.ds_name }} could run out in {{ $value | printf \"%.2f\" }} days"
  }

ALERT Predict_Disk_Space_Crit
  IF
    (avg(vmware_datastore_freespace_size) by (ds_name, environment, vcenter_host)/((avg(vmware_datastore_freespace_size offset 6h ) by (ds_name, environment, vcenter_host) - avg(vmware_datastore_freespace_size) by (ds_name, environment, vcenter_host))/6 + 1) >= 0) <= 6
  FOR 5m
  LABELS {
    severity = "critical",
    alert_category = "vmware",
    instance = "{{ $labels.vcenter_host }}:{{ $labels.ds_name }}",
    team = "prod-services",
    run_book = "exampe.com/wiki/what-to-do"
  }
  ANNOTATIONS {
    summary = "Disk space on vmware datastore {{ $labels.ds_name }} could run out in {{ $value | printf \"%.2f\" }} hours",
    description = "Disk space on vmware datastore {{ $labels.ds_name }} could run out in {{ $value | printf \"%.2f\" }} hours"
  }
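
Note for Prometheus 2.x users: the rules above use the 1.x rule syntax, which Prometheus 2.0 replaced with YAML rule files. As a rough sketch, the Predict_Disk_Space_Warn rule might translate as shown here. The group name `vmware-datastores` is made up for illustration, the label set is trimmed, and the comments are one reading of the expression rather than something stated in this thread.

```yaml
groups:
  - name: vmware-datastores   # hypothetical group name
    rules:
      - alert: Predict_Disk_Space_Warn
        # Free space now, divided by the space consumed per day over the last
        # 7 days, gives an estimate of the days left until the datastore is full.
        # The "+ 1" avoids dividing by zero when free space has not changed.
        # ">= 0" drops series whose estimate is negative, which is typically
        # the case when free space is growing.
        expr: |
          (
            avg(vmware_datastore_freespace_size) by (ds_name, environment, vcenter_host)
            /
            (
              (
                avg(vmware_datastore_freespace_size offset 7d) by (ds_name, environment, vcenter_host)
                - avg(vmware_datastore_freespace_size) by (ds_name, environment, vcenter_host)
              ) / 7 + 1
            ) >= 0
          ) <= 1
        for: 60m
        labels:
          severity: warning
          alert_category: vmware
          instance: "{{ $labels.vcenter_host }}:{{ $labels.ds_name }}"
        annotations:
          summary: 'Disk space on vmware datastore {{ $labels.ds_name }} could run out in {{ $value | printf "%.2f" }} days'
```

The critical variant follows the same pattern with `offset 6h`, a divisor of 6, and a threshold of 6, so its value reads as hours rather than days.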

@akurach
Contributor

akurach commented Sep 9, 2018

I can try to make my dash more templated....

About alerts: for now I use the default Grafana alerts via Pushover and Telegram, but I want to migrate to Alertmanager in the future.

@pryorda
Owner Author

pryorda commented Sep 9, 2018

@akurach No rush, I think we will find use in your dashboard.

@pryorda
Owner Author

pryorda commented Oct 18, 2018

@akurach Any progress on this?

@pete-leese

Has anyone got any good example dashboards to share? Cheers.

@pryorda
Owner Author

pryorda commented Jan 23, 2019

There should be one in the grafana dir

@pete-leese

Yes, I saw the ESX hosts dashboard, but I was wondering whether there were any more examples, such as datastore storage, IOPS, or VMs, before I got stuck into making my own.

@pryorda
Owner Author

pryorda commented Jan 26, 2019

At this time there is not.

@akurach
Contributor

akurach commented Jun 28, 2019

for now I've got something like this

I can give examples of some of them)

Screenshot 2019-06-28 at 16 14 55

Screenshot 2019-06-28 at 16 15 12

Screenshot 2019-06-28 at 16 15 25

@icanhazbeer

Hello,
Are there any examples or templates out there using this exporter? I could not find any on the grafana website.

@noesberger

> for now I've got something like this
>
> I can give examples of some of them)
>
> Screenshot 2019-06-28 at 16 14 55 Screenshot 2019-06-28 at 16 15 12 Screenshot 2019-06-28 at 16 15 25

Hi

Can you provide the dashboards you've built in Grafana for the Prometheus VMware Exporter? They look great.

@PickingUpPieces

@akurach They look great actually!
Is it possible to contribute those?

@dannyk81
Collaborator

dannyk81 commented Oct 9, 2019

I always find that it's almost impossible to provide a one-size-fit-all dashboard that will work well for everyone, since our monitoring/observability requirements differ and we all look at slightly different things.

My recommendation would be to suggest that users who would like to share or contribute their dashboards use Grafana's dashboard library: https://grafana.com/grafana/dashboards

This way we can have multiple variants published and maintained in an appropriately designed repository so that users can mix-and-match or use the variant that better fits their needs.

@pryorda this seems a more sensible approach to me, wdyt?

@pryorda
Owner Author

pryorda commented Oct 9, 2019

I think having a standard template would be good: RAM/CPU/DISK. Other than that, you're correct.

@PickingUpPieces

@dannyk81 I feel that people are more likely to keep their dashboards updated if they're in the git repository than if they're on Grafana dashboards. I'm with @pryorda on this; I would even like a more stacked version, so everyone can just delete the panels they don't need.

@dannyk81
Collaborator

Well, I see at least 3 versions above, and my own dashboards are quite different... even displaying RAM/CPU/DISK can be done in so many ways 😄

I would instead create a list of sample queries and alerts that focus on various aspects of the system's health; users can then use these samples to build their own alerting rules and dashboards.

Again, this is my preference and my 2 cents on the subject...
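
As a sketch of what such a list could look like, the ratios already used in the alert rules earlier in this thread can be written as Prometheus 2.x recording rules and reused in both panels and alerts. The group and rule names are invented for illustration; only the metric and label names come from the rules above, and labels such as `environment` may not exist in every setup.

```yaml
groups:
  - name: vmware-sample-queries   # hypothetical group name
    rules:
      # Host CPU usage as a percentage of the host's maximum.
      - record: vmware:host_cpu_usage:percent
        expr: avg by (host_name, environment) (vmware_host_cpu_usage / vmware_host_cpu_max) * 100
      # Host memory usage as a percentage of the host's maximum.
      - record: vmware:host_memory_usage:percent
        expr: avg by (host_name, environment) (vmware_host_memory_usage / vmware_host_memory_max) * 100
      # Change in datastore free space over the last 7 days (negative = shrinking).
      - record: vmware:datastore_freespace:delta7d
        expr: |
          avg by (ds_name, environment, vcenter_host) (vmware_datastore_freespace_size)
          - avg by (ds_name, environment, vcenter_host) (vmware_datastore_freespace_size offset 7d)
```

Each `expr` can also be pasted directly into a Grafana panel query without defining the recording rule.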

@PickingUpPieces

Sounds alright to me :) But I still think that some finished example boards that work out of the box (more or less) would be pretty helpful for newbies.

@medeirosjrm

pryorda, I found the alerts in your reply to @akurach above very good. I have great difficulty creating alerts like these for disk space and for virtual machines that are down. Would you have any examples?

Thank you very much in advance

@pryorda
Owner Author

pryorda commented Feb 14, 2022

I would start by understanding the metrics you want to grab, and then possibly look at the predictive alerts in Prometheus. Let me know if this doesn't help.
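
For the disk-space case, one reading of "predictive alerts" is PromQL's `predict_linear()` function, which fits a linear regression over a range of samples and extrapolates it. A minimal sketch in Prometheus 2.x rule format follows. The group and alert names, the 7d window, the 4-day horizon, and the `for` durations are arbitrary choices. `vmware_vm_power_state` with a `vm_name` label and a value of 0 for "powered off" is an assumption about what the exporter exposes, so check the exporter's /metrics output before relying on it.

```yaml
groups:
  - name: vmware-predictive   # hypothetical group name
    rules:
      - alert: Datastore_Predicted_Full
        # Extrapolate the last 7 days of free space 4 days ahead
        # (4 * 24 * 3600 seconds) and fire if the prediction drops below zero.
        expr: predict_linear(vmware_datastore_freespace_size[7d], 4 * 24 * 3600) < 0
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Datastore {{ $labels.ds_name }} is predicted to run out of space within 4 days"

      - alert: VM_Powered_Off
        # Assumed metric and semantics: 1 = powered on, 0 = powered off.
        expr: vmware_vm_power_state == 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "VM {{ $labels.vm_name }} has been powered off for at least 10 minutes"
```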

@pryorda pryorda closed this as completed Feb 14, 2022