Support mixed types in Privacy Metrics #134

Open
@npatki

Problem Description

The Privacy Metrics assume an adversarial attack model where a user with access to a few key_fields might be able to predict sensitive_fields.

I understand that we need to fit different models depending on whether the sensitive_fields are categorical or numeric. However, the metrics also expect all the key_fields to be of that same type. Does this need to be the case? What if I think some categorical columns might be crucial in leaking numeric data (and vice versa)?

Expected behavior

Depending on the type of the sensitive_fields, it would be nice to convert the input columns so that they are compatible with the tests.

  1. If the sensitive_fields are numeric, then we can convert categorical key_fields to numeric, similar to how we do it in KSTestExtended.
  2. If the sensitive_fields are categorical, then it may be possible to bin the numeric key_fields.
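
To make the two conversions above concrete, here is a minimal sketch using plain pandas. The helper name `harmonize_key_fields` and its parameters are hypothetical, not part of the library, and the integer encoding is only a stand-in: it is an assumption, not necessarily how KSTestExtended converts categories. Note that integer codes impose an arbitrary order on the categories, which may or may not be acceptable for the numeric attack models.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def harmonize_key_fields(data, key_fields, sensitive_is_numeric, bins=5):
    """Return a copy of `data` whose key_fields match the sensitive type.

    Hypothetical helper for illustration only.
    """
    converted = data.copy()
    for field in key_fields:
        column_is_numeric = is_numeric_dtype(converted[field])
        if sensitive_is_numeric and not column_is_numeric:
            # Categorical -> numeric: one integer code per distinct value
            # (missing values are encoded as -1 by pd.factorize).
            codes, _ = pd.factorize(converted[field])
            converted[field] = codes
        elif not sensitive_is_numeric and column_is_numeric:
            # Numeric -> categorical: equal-width bins with string labels.
            labels = [f"bin_{i}" for i in range(bins)]
            converted[field] = pd.cut(converted[field], bins=bins, labels=labels)
    return converted


data = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "city": ["a", "b", "a", "c"],
})

# Sensitive fields are numeric: "city" becomes integer codes.
as_numeric = harmonize_key_fields(data, ["age", "city"], sensitive_is_numeric=True)

# Sensitive fields are categorical: "age" is split into two bins.
as_categorical = harmonize_key_fields(
    data, ["age", "city"], sensitive_is_numeric=False, bins=2
)
```

Columns that already match the sensitive type are left untouched, so the same call works for datasets with any mix of key_fields.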

Additional context

  • What should the user API be? It would be ideal to guide the user into making a choice (to drop the columns or to convert them).
  • Should we be converting the columns ourselves, or should we expect users to do this first (e.g. using a transformer)?
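
As one possible answer to the API question, the sketch below shows an explicit option that forces the user to choose. The function `resolve_key_fields` and the `mixed_types` parameter are hypothetical names invented for this illustration; nothing like them exists in the library today. The default raises an error that lists the mismatched columns and the available choices, so the decision is never made silently.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

VALID_OPTIONS = ("error", "drop", "convert")


def resolve_key_fields(data, key_fields, sensitive_fields, mixed_types="error"):
    """Decide which key_fields to use when their types are mixed.

    Hypothetical API sketch: returns the list of key_fields to keep.
    """
    if mixed_types not in VALID_OPTIONS:
        raise ValueError(f"mixed_types must be one of {VALID_OPTIONS}")

    sensitive_kinds = {is_numeric_dtype(data[f]) for f in sensitive_fields}
    if len(sensitive_kinds) != 1:
        raise ValueError("sensitive_fields must be all numeric or all categorical")
    target_is_numeric = sensitive_kinds.pop()

    # Key fields whose type differs from the sensitive fields.
    mismatched = [
        f for f in key_fields if is_numeric_dtype(data[f]) != target_is_numeric
    ]
    if not mismatched or mixed_types == "convert":
        # With "convert", all fields are kept and converted in a later step.
        return list(key_fields)
    if mixed_types == "drop":
        return [f for f in key_fields if f not in mismatched]
    raise ValueError(
        f"key_fields {mismatched} do not match the type of the sensitive_fields. "
        "Pass mixed_types='drop' to ignore them or mixed_types='convert' to "
        "convert them."
    )


data = pd.DataFrame({
    "age": [20, 30, 40],
    "city": ["a", "b", "a"],
    "income": [1000.0, 2000.0, 1500.0],
})

kept = resolve_key_fields(data, ["age", "city"], ["income"], mixed_types="drop")
```

Whether the conversion itself happens inside the metric or is left to the user (for example through a transformer) stays an open question; this sketch only covers the selection step.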
