Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loosen IP address scrubbing in span.domain tag #3572

Closed
gggritso opened this issue May 9, 2024 · 17 comments
Closed

Loosen IP address scrubbing in span.domain tag #3572

gggritso opened this issue May 9, 2024 · 17 comments
Assignees
Labels
enhancement New feature or request

Comments

@gggritso
Copy link
Member

gggritso commented May 9, 2024

Problem Statement

IP addresses like 8.8.8.8 get fully scrubbed out to *.*.*.*. When users look at the requests module, all requests to IP addresses get mashed together.

Solution Brainstorm

  • no scrubbing at all (IPv4 addresses have a known cardinality. It's only 4 billion! That's fine, right? I'm kidding)
  • partial scrubbing (e.g., 8.8.8.8 becomes *.*.8.8 or *.*.*.8 or 8.8.*.*)
  • customizable IP scrubbing allowlist (let people add IPs that don't get scrubbed)
  • allow self-hosted users to turn IP scrubbing off (people who have their own infra might want to turn this off, and bear the weight of high cardinality)

Product Area

Ingestion and Filtering

@gggritso gggritso changed the title Loosen IP address scrubbing in Requests module Loosen IP address scrubbing in span.domain tag May 9, 2024
@gggritso gggritso transferred this issue from getsentry/sentry May 9, 2024
@gggritso gggritso added the enhancement New feature or request label May 9, 2024
@jjbayer
Copy link
Member

jjbayer commented May 10, 2024

customizable IP scrubbing allowlist (let people add IPs that don't get scrubbed)

I wonder if we could get to an "80%" solution by hardcoding a list of"special" IPs. E.g. reserved IPs, DNS servers, .... Unless this does not cover the majority of the cases.

@aldy505
Copy link
Contributor

aldy505 commented May 11, 2024

I'd like to see private IPs being not redacted on SaaS and self-hosted. For public IPs, I think it's better if it's redacted on SaaS and not redacted on self-hosted. Since on self-hosted, it's a different environment, it's only possible that the outgoing request is made only for those defined on the tracePropagationTargets config (from the sdk side).

@aldy505
Copy link
Contributor

aldy505 commented Jul 6, 2024

Hey @jjbayer @gggritso seeing this PR I think I can implement this one.. partially. Right now we only allow 127.0.0.1 as seen on here:

Ipv4Addr::LOCALHOST => "127.0.0.1",

But considering what @jjbayer said, it's a good idea to whitelist "special IPs", with what I had in mind are:

  • 192.168.0.0–192.168.255.255 (65,536 addresses)
  • 10.0.0.0–10.255.255.255 (16,777,216 addresses)
  • 100.64.0.0–100.127.255.255 (4,194,304 addresses)
  • 172.16.0.0–172.31.255.255 (1,048,576 addresses)
  • 192.0.0.0–192.0.0.255 (256 addresses)
  • `198.18.0.0–198.19.255.255 (131,072 addresses)

Private IPv6 address has higher address count, so I believe that's out of the question. For popular IPs like DNS servers and such, I don't think there's any good use case for it to be included in a span. Although I might be wrong here.

@Dav1dde
Copy link
Member

Dav1dde commented Jul 8, 2024

I think we should just use Ipv4Addr::is_private, if we decide to implement it.

@jjbayer
Copy link
Member

jjbayer commented Jul 8, 2024

Private addresses can still be high cardinality, so I wouldn't unconditionally allowlist them all. IMO the simplest thing we can start with is add a global option that allows self-hosted users to disable IP scrubbing. @gggritso what do you think?

@aldy505
Copy link
Contributor

aldy505 commented Jul 8, 2024

Private addresses can still be high cardinality, so I wouldn't unconditionally allowlist them all.

I'm wondering about how "high" is the high cardinality for you?

IMO the simplest thing we can start with is add a global option that allows self-hosted users to disable IP scrubbing. @gggritso what do you think?

Yes, this should also work, since self-hosted has known range. But the default should be set to false (disabled) though.

@jjbayer
Copy link
Member

jjbayer commented Jul 8, 2024

I'm wondering about how "high" is the high cardinality for you?

Can't give you an exact number, but the purpose of the scrubbing is to merge spans into "meaningful" groups. So accepting every single private IP as a separate group would be too fine grained IMO, but we could do partial scrubbing for private IPs like the issue description proposes (e.g. 192.168.*.*).

@gggritso
Copy link
Member Author

gggritso commented Jul 8, 2024

We haven't heard a lot of complaints about this issue lately, so I don't have a strong opinion. I think starting with allowing self-hosted users to turn off scrubbing completely feels like a good solution even though it'll let people degrade their own experience with Sentry. Either that, or an IP allowlist for self-hosted folks seems like it'd solve a lot of these problems!

@aldy505
Copy link
Contributor

aldy505 commented Jul 9, 2024

I'm wondering about how "high" is the high cardinality for you?

Can't give you an exact number, but the purpose of the scrubbing is to merge spans into "meaningful" groups. So accepting every single private IP as a separate group would be too fine grained IMO, but we could do partial scrubbing for private IPs like the issue description proposes (e.g. 192.168.*.*).

I'd rather go with @gggritso's solution on whitelisting a certain IPs. I'm also thinking of specifying CIDRs? Something along the line of 192.168.18.0/24 instead of declaring all 255 IPs on the relay config.

@jjbayer
Copy link
Member

jjbayer commented Jul 11, 2024

an IP allowlist for self-hosted folks

Sounds good to me. @aldy505 I'll add it to our backlog, I cannot make any promises on when this will be implemented though.

@aldy505
Copy link
Contributor

aldy505 commented Jul 11, 2024

an IP allowlist for self-hosted folks

Sounds good to me. @aldy505 I'll add it to our backlog, I cannot make any promises on when this will be implemented though.

I might be able to work on this later this weekend, can you point out to me somethings that I need to take a look at? (Other than this file https://github.com/getsentry/relay/blob/master/relay-event-normalization/src/normalize/span/description/mod.rs)

@jjbayer
Copy link
Member

jjbayer commented Jul 11, 2024

@aldy505 that would be great and very welcome!

Best would probably be to register an option for the allowlist on sentry side, add it to this list: https://github.com/getsentry/sentry/blob/23ccc70817cc6dbc43230ddd86c2f6329ecb34ab/src/sentry/relay/globalconfig.py#L13-L14, then declare it in relay as well:

/// All options passed down from Sentry to Relay.
#[derive(Default, Clone, Debug, Serialize, Deserialize, PartialEq)]
#[serde(default)]
#[cfg_attr(feature = "jsonschema", derive(JsonSchema))]
pub struct Options {
and pass the allowlist into the normalization module you referenced.

@aldy505
Copy link
Contributor

aldy505 commented Jul 11, 2024

@jjbayer Quick question, since I need to implement the allowlist here:

pub fn scrub_host(host: Host<&str>) -> Cow<'_, str> {
match host {
Host::Ipv4(ip) => Cow::Borrowed(scrub_ipv4(ip)),
Host::Ipv6(ip) => Cow::Borrowed(scrub_ipv6(ip)),
Host::Domain(domain) => scrub_domain_name(domain),
}
}

And since we can't make GlobalOptions as Arc<> since there's no struct that would hold it for long, how would you recommend me to pass the allowlist? I'm thinking of modifying the function signature, but that wouldn't be nice.

@jjbayer
Copy link
Member

jjbayer commented Jul 11, 2024

I have no better idea than passing it all the way through the call stack, like we do here:

let ai_model_costs = global_config.ai_model_costs.clone().ok();

@aldy505
Copy link
Contributor

aldy505 commented Jul 11, 2024

I see, thanks. Might do this tomorrow.

jjbayer added a commit that referenced this issue Aug 2, 2024
As discussed on #3572.

### Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated
in the State of Delaware in 2015 as Functional Software, Inc. and is
gonna need some rights from me in order to utilize my contributions in
this here PR. So here's the deal: I retain all rights, title and
interest in and to my contributions, and by keeping this boilerplate
intact I confirm that Sentry can use, modify, copy, and redistribute my
contributions, under Sentry's choice of terms.

---------

Co-authored-by: Joris Bayer <[email protected]>
@jjbayer
Copy link
Member

jjbayer commented Aug 2, 2024

The allowlist approach is now implemented and merged (see linked PRs).

@jjbayer jjbayer closed this as completed Aug 2, 2024
@gggritso
Copy link
Member Author

gggritso commented Aug 2, 2024

Awesome!

jjbayer added a commit to getsentry/sentry that referenced this issue Aug 5, 2024
…74195)

Complementary relay config for
getsentry/relay#3813, based on the discussion on
getsentry/relay#3572

### Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated
in the State of Delaware in 2015 as Functional Software, Inc. and is
gonna need some rights from me in order to utilize my contributions in
this here PR. So here's the deal: I retain all rights, title and
interest in and to my contributions, and by keeping this boilerplate
intact I confirm that Sentry can use, modify, copy, and redistribute my
contributions, under Sentry's choice of terms.

---------

Co-authored-by: Joris Bayer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Archived in project
Development

No branches or pull requests

4 participants