Provide example useful queries that can be run with this exporter #17

Open
KushalP opened this issue Oct 14, 2021 · 8 comments

Comments

KushalP commented Oct 14, 2021

It would be useful to collect sample queries that show what information this exporter can produce.

My hope is that this issue can act as a starting point for this. Ideally it would extend the two queries shown in the README.

@mblaschke
Member

Any special information you're looking for?

KushalP commented Oct 14, 2021

Two goals here:

  1. Make it easier to create recording rules to speed up this process
  2. To understand what data the exporter does and doesn't have

For point 2, I'm not sure what data can be extracted, because I don't know how the creators of this exporter expected it to be used.

KushalP commented Oct 14, 2021

Some useful queries to begin with:

  • Total number of incidents by service by time duration (can be shown as different lines for created/acked/etc)
  • Count of tags on each incident, over time (I don't think tags are shown?)
  • Heatmap of incidents by weekday

It would be great to hear from others how they're using this exporter. Especially to better understand how they're visualising this data (Grafana?) and what plugins they're using.
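
As a rough illustration of the weekday heatmap idea above, here is a minimal Python sketch that buckets incident creation times into (weekday, hour) cells. It assumes the timestamps are already available as ISO 8601 strings from some other source; the function name `weekday_hour_counts` and the sample data are hypothetical, and nothing here reflects what the exporter actually exposes.

```python
from collections import Counter
from datetime import datetime, timezone

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_hour_counts(created_at):
    """Count incidents per (weekday, hour) cell from ISO 8601 timestamps."""
    counts = Counter()
    for ts in created_at:
        # Rewrite a trailing "Z" as an explicit offset so that
        # datetime.fromisoformat accepts it on older Python versions.
        dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        dt = dt.astimezone(timezone.utc)
        counts[(WEEKDAYS[dt.weekday()], dt.hour)] += 1
    return counts


# Hypothetical sample data, not real exporter output.
sample = [
    "2021-10-11T09:15:00Z",  # a Monday
    "2021-10-11T09:45:00Z",  # the same Monday, same hour
    "2021-10-14T22:05:00Z",  # a Thursday
]
heatmap = weekday_hour_counts(sample)
for (day, hour), count in sorted(heatmap.items()):
    print(day, hour, count)
```

Each cell count could then be plotted as one square of a heatmap, for example in Grafana, if the same grouping can be reproduced from the exporter's metrics.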

mblaschke commented Oct 17, 2021

"total number of incidents" -> working on that.. thinking about limiting the stats to "month to date" to reduce the load on pagerduty servers.

can you explain why you need the count of tags? how do you use them?

The heatmap could be derived from the "total number of incidents" data.

@mblaschke
Member

Please try 21.10.0-beta1. Does the metric pagerduty_summary_incident_statuschange_count (counter) help you?

I also added the metric pagerduty_summary_incident_count (gauge), which reflects the number of incidents found (by summary duration), and the metric pagerduty_summary_incident_resolve_duration (histogram), which groups incidents into buckets by resolve duration.
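
To make the three metrics a little more concrete, the following Python sketch builds PromQL expressions for them as plain strings. The metric names come from the comment above; the `serviceID` label, the helper names, and the `_bucket` suffix (the usual Prometheus histogram convention) are assumptions that should be checked against the exporter's actual /metrics output.

```python
STATUSCHANGE = "pagerduty_summary_incident_statuschange_count"
INCIDENT_COUNT = "pagerduty_summary_incident_count"
RESOLVE_DURATION = "pagerduty_summary_incident_resolve_duration"


def selector(metric, **labels):
    """Render a PromQL selector such as metric{key="value"}."""
    if not labels:
        return metric
    matchers = ",".join(
        f'{key}="{value}"' for key, value in sorted(labels.items())
    )
    return f"{metric}{{{matchers}}}"


def status_changes(window="1d", **labels):
    """Growth of the status-change counter over the given window."""
    series = selector(STATUSCHANGE, **labels)
    return f"sum(increase({series}[{window}]))"


def incident_count(**labels):
    """Current value of the incident-count gauge, summed over series."""
    series = selector(INCIDENT_COUNT, **labels)
    return f"sum({series})"


def resolve_duration_quantile(quantile, **labels):
    """Estimated resolve-duration quantile from the histogram buckets.

    Assumption: the buckets are exposed as <metric>_bucket with an "le" label.
    """
    buckets = selector(RESOLVE_DURATION + "_bucket", **labels)
    return f"histogram_quantile({quantile}, sum by (le) ({buckets}))"


# "PABC123" is a made-up service ID; "serviceID" is an assumed label name.
print(status_changes(serviceID="PABC123"))
print(incident_count())
print(resolve_duration_quantile(0.9))
```

Expressions like these could also serve as the `expr` field of a Prometheus recording rule, which relates to the first goal mentioned earlier in the thread.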

KushalP commented Oct 18, 2021

> working on that.. thinking about limiting the stats to "month to date" to reduce the load on pagerduty servers.

What about limiting the count to the serviceIDs? If you're a big enough organisation that aggregate value can be quite noisy.

> can you explain why you need the count of tags? how do you use them?

Two examples come to mind: tags can be used to track specific microservices/subsystems against an incident. They can also be used to track any other things we're aware of. I'd like to use it as a view on the "weekly" theme for incidents.

@mblaschke
Member

> What about limiting the count to the serviceIDs? If you're a big enough organisation that aggregate value can be quite noisy.

The exporter now fetches 31 days of incidents and generates the summary metrics from them. It also uses that list to check which status changes were made, so it should not create much load.
A serviceID filter should be easy 🤔

> Two examples come to mind: tags can be used to track specific microservices/subsystems against an incident. They can also be used to track any other things we're aware of. I'd like to use it as a view on the "weekly" theme for incidents.

Tags are not used at the moment. I'm thinking about how they could be integrated.

feel free to provide feedback if the current solution with 21.10.0-beta1 is going in the right direction.

KushalP commented Oct 27, 2021

> feel free to provide feedback if the current solution with 21.10.0-beta1 is going in the right direction.

It looks like a good start. I need to run this for a month to have enough data to give reasonable feedback.
