Rule based anonymization of URL patterns #422

Closed
m90 opened this issue Jul 7, 2020 · 0 comments
m90 commented Jul 7, 2020

URLs often contain potentially sensitive information, in the sense that it could identify the user. Some of these cases can be resolved using canonical meta tags, but that mostly applies when the information is present in the query string (example: example.com/editprofile?id=1234 => example.com/editprofile).

However, there are cases where information that is part of the path can be considered sensitive, but there is no canonical URL to reduce these to, as they point to different content. For example, example.com/profiles/[id] might reveal information about a user's activity if the URL is only accessible after logging in and the ids are specific to the user.

Approach: Anonymizing data in the Auditorium

Operators define rules for anonymizing data in the Auditorium, most likely using a pattern / regexp based approach like:

  • /profiles/{user_id:[a-z0-9]*} is defined as an anonymization rule
  • Pageviews on /profiles/alice and /profiles/bob will be collapsed and aggregated into /profiles/user_id
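
A minimal sketch of how such a rule could be interpreted, written in Python for brevity (the Auditorium itself runs in the browser). The `{name:regexp}` placeholder syntax is taken from the example above; `compile_rule` and `anonymize` are hypothetical names, and the sketch assumes that a rule has to match the entire path and that the first matching rule wins.

```python
import re

# Matches placeholders of the assumed form {name:regexp}. Braces inside the
# inner regexp (e.g. the quantifier {2,3}) are not supported by this sketch.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*):([^{}]*)\}")


def compile_rule(rule):
    """Turn a rule string into a (compiled regex, replacement path) pair."""
    pattern, replacement, pos = "", "", 0
    for match in PLACEHOLDER.finditer(rule):
        literal = rule[pos:match.start()]
        # Literal segments are escaped, the placeholder regexp is used as is.
        pattern += re.escape(literal) + "(?:" + match.group(2) + ")"
        # The placeholder name stands in for the matched value.
        replacement += literal + match.group(1)
        pos = match.end()
    pattern += re.escape(rule[pos:])
    replacement += rule[pos:]
    return re.compile(pattern), replacement


def anonymize(path, rules):
    """Return the replacement of the first rule matching the whole path."""
    for regex, replacement in rules:
        if regex.fullmatch(path):
            return replacement
    return path


rules = [compile_rule("/profiles/{user_id:[a-z0-9]*}")]
print(anonymize("/profiles/alice", rules))  # /profiles/user_id
print(anonymize("/about", rules))           # /about
```

Requiring a full match keeps a rule from accidentally rewriting longer paths such as /profiles/alice/settings; whether that is the desired behavior would need to be decided as part of the rule language.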

Rules are stored on the server. On load, the Auditorium fetches the ruleset and applies it to the query results before displaying them.
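
The display-time step could look roughly like the following Python sketch. The shape of the ruleset (a list of `pattern` / `replacement` pairs) and of the query result rows (`path` and `pageviews`) are assumptions made for illustration, not the actual Offen data model.

```python
import re
from collections import Counter

# Hypothetical ruleset as it might be returned by the server.
ruleset = [
    {"pattern": r"/profiles/[a-z0-9]*", "replacement": "/profiles/user_id"},
]

# Hypothetical (already decrypted) query results.
rows = [
    {"path": "/profiles/alice", "pageviews": 3},
    {"path": "/profiles/bob", "pageviews": 2},
    {"path": "/pricing", "pageviews": 5},
]


def apply_ruleset(rows, ruleset):
    """Rewrite matching paths and sum up the pageviews per resulting path."""
    compiled = [(re.compile(r["pattern"]), r["replacement"]) for r in ruleset]
    totals = Counter()
    for row in rows:
        path = row["path"]
        for regex, replacement in compiled:
            if regex.fullmatch(path):
                path = replacement
                break
        totals[path] += row["pageviews"]
    return dict(totals)


print(apply_ruleset(rows, ruleset))
# {'/profiles/user_id': 5, '/pricing': 5}
```

Note that the stored rows are left untouched here, which is what allows rules to be changed later on, and also what allows them to be bypassed.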

Pros

  • Easy to implement technically
  • Rules can be changed at any time and will be applied retroactively, i.e. leaks can be fixed after they have been discovered

Cons

  • Pattern language for defining rules might be complicated for some operators. (How are bad patterns handled?)
  • Data is only anonymized at display time, i.e. operators who go to great lengths can undo the application of these rules.
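
One possible answer to the question of bad patterns is to validate each rule when it is saved and to skip rules that do not compile. A minimal Python sketch, assuming rules are plain regular expressions; `validate_pattern` and `usable_rules` are hypothetical helpers:

```python
import re


def validate_pattern(pattern):
    """Return None if the pattern compiles, else a message for the operator."""
    try:
        re.compile(pattern)
    except re.error as err:
        return f"invalid pattern {pattern!r}: {err}"
    return None


def usable_rules(patterns):
    """Keep only the patterns that compile, so one bad rule cannot break the rest."""
    return [p for p in patterns if validate_pattern(p) is None]


print(validate_pattern(r"/profiles/[a-z0-9]*"))  # None
print(validate_pattern(r"/profiles/[a-z"))       # message describing the error
```

This only catches syntactically broken patterns. A pattern that compiles but matches more (or less) than the operator intended would still need some form of preview.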

Approach: Anonymizing data on collection

Operators define rules for anonymizing data when deploying the Offen script, either using a data attribute or a JavaScript global, e.g.:

<script src="/script.js" data-anonymize="/profiles/{user_id:[a-z0-9]*}" data-account-id="the-account-id"></script>

The script reads these rules and sanitizes data before it is encrypted.
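
The order of operations matters here: the URL is rewritten first, and only the sanitized payload is handed to encryption. The following Python sketch illustrates that order only. The real script runs in the browser, and the event shape (`type`, `href`), the `encrypt` callback, and the function names are all assumptions made for this example.

```python
import json
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical rule, already translated into a regex / replacement pair.
RULES = [(re.compile(r"/profiles/[a-z0-9]*"), "/profiles/user_id")]


def sanitize_url(url, rules=RULES):
    """Rewrite only the path component of a URL if a rule matches it."""
    parts = urlsplit(url)
    for regex, replacement in rules:
        if regex.fullmatch(parts.path):
            parts = parts._replace(path=replacement)
            break
    return urlunsplit(parts)


def record_event(event, encrypt):
    """Sanitize the event first, then pass the serialized result to `encrypt`."""
    sanitized = dict(event, href=sanitize_url(event["href"]))
    return encrypt(json.dumps(sanitized).encode("utf-8"))


# An identity function stands in for the real encryption step.
payload = record_event(
    {"type": "PAGEVIEW", "href": "https://example.com/profiles/alice"},
    encrypt=lambda data: data,
)
print(json.loads(payload)["href"])  # https://example.com/profiles/user_id
```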

Pros

  • Data is never stored in non-anonymized form, so privacy is preserved perfectly

Cons

  • Deployment is relatively complicated for operators, as there is no graphical UI that can give immediate feedback (the Auditorium could contain a "tester" applet though, or an offen rule subcommand could be added to the offen command)
  • Rules cannot be applied retroactively: data that has been altered by a rule once cannot be changed anymore.

m90 added this to Backlog in Roadmap Jul 7, 2020
m90 changed the title from "Rule based anonymization of data" to "Rule based anonymization of URL patterns" Jul 8, 2020
m90 closed this as not planned May 24, 2024