User-Agent filter list in the UI / new crawler request #12753
That would be awesome: a way to add custom crawlers (by adding a User-Agent regex) that are not covered by "Filter out known web crawlers". In my case the requests come from site24x7.com.
We would find this feature really useful. Right now we receive a lot of events from a variety of bots, and it would help if we could apply inbound filters to ignore them. The majority of the bots we see are:
Routing to @getsentry/product-owners-issues for triage ⏲️
One relatively easy first step would be to add a note in the product (or a link to the docs) suggesting that users either file an issue or open a PR to update https://github.com/getsentry/relay/blob/master/relay-filter/src/web_crawlers.rs or https://github.com/getsentry/relay/blob/master/relay-filter/src/browser_extensions.rs, as appropriate.
Summary
Add a setting in the interface to block bots based on a User-Agent. Or at least record the User-Agents so that we could block them manually.
Motivation
There is a toggle called "Filter out known web crawlers" on the Inbound Filters settings page. Sometimes new crawlers appear that slip past that filter. The problem is that ignoring them in `beforeSend` doesn't always work even when the User-Agent is unambiguous. They might be serving cached versions of pages, or they might strip script objects from pages; in any case, for us the flood of about 200 events per hour hasn't stopped even after adding this code in `beforeSend()`:
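For context, a `beforeSend` User-Agent filter of the kind described above can be sketched as follows. This is a minimal illustration assuming the standard Sentry JS SDK `beforeSend` signature; `BOT_UA_PATTERNS` and `isBotUserAgent` are hypothetical names, and as noted above this approach does not catch crawlers that replay cached pages or strip scripts:

```javascript
// Hypothetical list of bot User-Agent patterns to drop (for illustration only).
const BOT_UA_PATTERNS = [/site24x7/i, /bingbot/i];

// Returns true when the given User-Agent string matches any bot pattern.
function isBotUserAgent(ua) {
  return BOT_UA_PATTERNS.some((re) => re.test(ua || ""));
}

// Sketch of wiring this into the SDK: returning null from beforeSend
// drops the event before it is sent to Sentry.
// Sentry.init({
//   dsn: "...",
//   beforeSend(event) {
//     return isBotUserAgent(window.navigator.userAgent) ? null : event;
//   },
// });
```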
Additional Context
Here is the issue page: https://sentry.io/organizations/policeone/issues/982063112/events/latest/?project=67360. It actually started after upgrading to the new JS SDK (4.6.6) from raven (3.26.4). Prior to that, even checking `window.navigator.userAgent` was sufficient for blocking errors from that bot. The crawler's UA in almost 100% of cases is