IF event's attribute (environment) value filter not working in Issue Alert #83685
Comments
Assigning to @getsentry/support for routing ⏲️ |
Routing to @getsentry/product-owners-alerts for triage ⏲️ |
@ceorourke - can you help take a look at this one? Thanks! |
I have spent a fair amount of time looking into this without being able to reproduce it. First I tried the simplest case: I made one rule using the environment picker and one rule using the filter for the environment. Both rules fired. Then I wrote a test for the same setup so I could trace what may have been happening, and both rules fired there as well. We then added the additional filters from the customer's rule and still didn't encounter any problems.
Next we dug through the rule processing pipeline and the two different ways the environment is evaluated in these rules, but couldn't find any problem. We also dug through the logs to figure out where it may have gone wrong and could not find anything. Our best guess for now: because the rules have the "Number of events in an issue is more than 100 in 1w" condition, they go through our delayed processing pipeline (a buffer that's flushed every minute). Perhaps the rules were put into different buckets, so that when we got the number of events in the last hour it was evaluated at two slightly different timestamps and one rule failed on that. We can't find any problem with the environment options.
I did notice that the rules were created one minute apart, as if the creator already had a problem with a separate rule. Is that the case? It seems unlikely that someone would choose to test the two different environment options otherwise. Maybe we can look deeper into the original problem rule, if it exists. |
Really appreciate the detailed notes @ceorourke!
Yes, this is the case. @realkosty may have linked it in the customer case you have, it will have been invalidated since then and I could provide him the latest link. We observed a false negative in the slightly more complicated alert rule (it mainly includes more actions, and the tag filter uses
This is an interesting theory, but I'm not sure it applies. Based on your description, it sounds like this bucket mismatch should no longer be a concern after 1 minute? But the issues that trigger one alert and not the other have >120 events, and receive them spread over hours, not seconds. Does that disprove the theory, or did I misunderstand the delayed pipeline? |
For the delayed pipeline: if the time window were 5 minutes, the buckets would be, say, 1:05 - 1:10 and 1:06 - 1:11, so the total number of events in each bucket may differ. I don't see an alert link for the original rule, but I'll ask about it. |
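The bucket example above can be sketched in a few lines. This is only an illustration of how two windows offset by one minute can count different numbers of events; the event times and the helper name `count_in_window` are hypothetical and are not taken from Sentry's code.

```python
from datetime import datetime, timedelta


def count_in_window(timestamps, window_end, window):
    """Count events with window_end - window < ts <= window_end."""
    start = window_end - window
    return sum(1 for ts in timestamps if start < ts <= window_end)


# Hypothetical event times (assumption, for illustration only).
base = datetime(2025, 1, 1, 1, 0)
events = [
    base + timedelta(minutes=5, seconds=20),   # 1:05:20
    base + timedelta(minutes=5, seconds=40),   # 1:05:40
    base + timedelta(minutes=8),               # 1:08:00
    base + timedelta(minutes=10, seconds=30),  # 1:10:30
]

window = timedelta(minutes=5)
count_a = count_in_window(events, base + timedelta(minutes=10), window)  # 1:05 - 1:10
count_b = count_in_window(events, base + timedelta(minutes=11), window)  # 1:06 - 1:11

print(count_a, count_b)
```

With a threshold of "more than 2", the first bucket would pass and the second would not, even though both rules describe the same condition. The effect depends on events landing near a window edge, so it shrinks as the window grows relative to the offset.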
Got it -- the alert window here is 7 days, threshold at 100, and these had volumes of 120, 140, etc. It sounds like this theory doesn't hold up then? I'm going to adjust the two debugging alerts to remove this filter and see what we can learn. |
I am facing the same issue. The event attribute filter doesn't work for |
Hi @mitsuyuki418, We are investigating the issue. Do you mind sharing the other conditions on your rule? In the meantime, please continue to use the tag filter for |
@mifu67 I'm sorry, but I found that in my case it only affects the preview. When I use the environment tag, it shows the preview items, but with the environment attribute it shows nothing. However, an alert was triggered for both the tag and the attribute, so there's no problem with the alert triggers in my case. (It would be nice to show the correct preview for the env attribute, though :) ) Thanks, |
Thank you for the additional information! |
@ceorourke @mifu67 we made progress investigating it with customer:
|
Are you able to reproduce it outside of their project, or figure out what it is about their events that's different and can answer why we can't reproduce it anywhere else? |
@ceorourke yes! We're working hard on a proper repro: we're capturing event envelopes and plan to replay them in a test org, so that you can hopefully run it locally in a debugger. Question: how does our alerting behave with respect to super delayed events? I.e. should we fudge timestamps to make them fresh when creating a repro? |
Alerts are evaluated just after event ingestion as a post processing step |
@ceorourke do you know if alert evaluation step discards late events or treats them as if they just occurred (only looking and |
When we make the query to determine if the event frequency filter passes we use the current time and the duration set on the alert to determine the window of time we're looking at |
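As a rough sketch of the window computation described above: the window ends at the current time and starts one alert duration earlier. The function names are hypothetical, and the sketch assumes the count is taken over each event's stored timestamp, which the thread does not confirm for delayed events.

```python
from datetime import datetime, timedelta, timezone


def frequency_window(now, duration):
    """Window is anchored on the evaluation time, not on the event's time."""
    return now - duration, now


def passes_frequency_condition(event_timestamps, now, duration, threshold):
    """True if more than `threshold` events fall inside the window."""
    start, end = frequency_window(now, duration)
    count = sum(1 for ts in event_timestamps if start <= ts <= end)
    return count > threshold


now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
week = timedelta(weeks=1)

# 120 events spread hourly over the last five days: all inside the window.
fresh = [now - timedelta(hours=i) for i in range(120)]
# The same 120 events, but timestamped a month earlier (assumption: the
# original timestamps are what the count query sees).
stale = [ts - timedelta(days=30) for ts in fresh]

print(passes_frequency_condition(fresh, now, week, 100))
print(passes_frequency_condition(stale, now, week, 100))
```

Under that assumption, replayed envelopes with old timestamps would fall outside a window anchored on the current time, which is one reason to consider refreshing timestamps when building a repro.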
@ceorourke gotcha, thanks! and the triggering event itself - doesn't matter if it's a time capsule from a month ago, we will still alert? |
That might be a question better directed towards SNS or whichever team manages event ingestion - I don't know if delayed events retain the original timestamp somewhere. If they did I could imagine that might change the results of the snuba query - as far as alerting is concerned, we run through the logic to determine if it should fire just after ingestion. |
Environment
SaaS (https://sentry.io/)
Steps to Reproduce
event's {attribute="environment"} value {match="equals"} {value="beta"}
filter, and the other one has the same environment selected at the top instead.
Expected Result
both alerts fire
Actual Result
only the alert with top-level environment filter fires
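The expected result amounts to saying that the two rule configurations should agree on every event. A toy model of that expectation is below; it is not Sentry's rule code, and the function names and the flat event dictionary are assumptions made for illustration.

```python
def attribute_filter_matches(event, attribute, match, value):
    """Model of: event's {attribute} value {match} {value}."""
    actual = event.get(attribute)
    if match == "equals":
        return actual == value
    raise ValueError(f"unsupported match type: {match}")


def environment_picker_matches(event, rule_environment):
    """Model of the top-level environment selection (None = all environments)."""
    return rule_environment is None or event.get("environment") == rule_environment


# Hypothetical events, reduced to the one field both rules look at.
beta_event = {"environment": "beta"}
prod_event = {"environment": "production"}

for event in (beta_event, prod_event):
    via_filter = attribute_filter_matches(event, "environment", "equals", "beta")
    via_picker = environment_picker_matches(event, "beta")
    print(event["environment"], via_filter, via_picker)
```

In this model the two checks always return the same answer, so a real-world disagreement would point at something outside the comparison itself, such as where each path reads the environment from or when each rule is evaluated.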
Product Area
Alerts
Link
No response
DSN
No response
Version
No response