User-Agent filter list in the UI / new crawler request #12753

@dryoma

Summary

Add a setting in the interface to block bots based on their User-Agent. Or, at the very least, could we report new User-Agents to you so that you can block them manually?

Motivation

The Inbound Filters settings page has a toggle called "Filter out known web crawlers", but new crawlers sometimes appear that slip past that filter. Worse, ignoring them in beforeSend doesn't always work, even when the User-Agent is unambiguous. They might be using cached versions of pages, or they might be stripping script objects from pages. Either way, the flood of about 200 events per hour didn't stop for us even after we added this code in beforeSend():

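  // Drop the event if either the browser UA or the reported request header names the bot.
  // The header may be absent, so fall back to an empty string before searching it.
  var headers = (event.request && event.request.headers) || {};
  if (/lyticsbot/.test(window.navigator.userAgent) ||
    (headers['User-Agent'] || '').search('lyticsbot') !== -1) {
    return null;
  }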
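
For context, here is a minimal sketch of how a check like the one above might be factored into a reusable filter. The names BLOCKED_UA_PATTERNS and isBlockedCrawler are hypothetical, not part of the SDK, and the case-insensitive match is an assumption. This only filters on the client; it would not help if the crawler never runs this code, which may be what is happening here.

```javascript
// Hypothetical list of crawler User-Agent patterns; extend as new bots appear.
const BLOCKED_UA_PATTERNS = [/lyticsbot/i];

// Returns true when any available User-Agent string matches a blocked pattern.
// Hypothetical helper: checks both the browser UA and the event's request header.
function isBlockedCrawler(event, navigatorUserAgent) {
  const headers = (event && event.request && event.request.headers) || {};
  const candidates = [navigatorUserAgent, headers['User-Agent']];
  return candidates.some(
    (ua) => typeof ua === 'string' && BLOCKED_UA_PATTERNS.some((re) => re.test(ua))
  );
}

// Intended to be passed as the beforeSend option of Sentry.init().
// Returning null drops the event; returning the event lets it through.
function beforeSend(event) {
  const navigatorUA = typeof navigator !== 'undefined' ? navigator.userAgent : '';
  return isBlockedCrawler(event, navigatorUA) ? null : event;
}
```

It would be wired up by passing beforeSend in the options object given to Sentry.init(), alongside the project's DSN.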

Additional Context

The issue page is https://sentry.io/organizations/policeone/issues/982063112/events/latest/?project=67360 and the flood started after upgrading from raven (3.26.4) to the new JS SDK (4.6.6). Before the upgrade, checking window.navigator.userAgent alone was sufficient to block errors from that bot.

In almost 100% of cases, the crawler's UA is:

User-Agent: lyticsbot-external
