Problem Description
The Privacy Metrics assume an adversarial attack model where a user with access to a few key_fields might be able to predict sensitive_fields.
I understand that we need to fit different models depending on whether the sensitive_fields are categorical or numeric. However, the metrics also expect all of the key_fields to be of that same type. Does this need to be the case? What if I think some categorical columns might be crucial in leaking numeric data (or vice versa)?
Expected behavior
Depending on the type of the sensitive_fields, it would be nice to convert the input columns so that they are compatible with the tests.
- If the sensitive_fields are numeric, then we can convert categorical key_fields to numeric, similar to how we do it in KSTestExtended
- If the sensitive_fields are categorical, then it may be possible to bin the key_fields
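To make the two conversions above concrete, here is a minimal sketch using plain pandas. The helper name `make_key_fields_compatible` and its `n_bins` parameter are hypothetical and only for illustration; this is not the existing metric code, and it is not necessarily how KSTestExtended performs its conversion. It also assumes the columns contain no missing values.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def make_key_fields_compatible(data, key_fields, sensitive_fields, n_bins=5):
    """Return a copy of `data` whose key fields match the sensitive fields' type.

    Hypothetical helper for discussion; not part of any library.
    """
    out = data.copy()
    # Treat the sensitive fields as numeric only if every one of them is numeric.
    sensitive_numeric = all(is_numeric_dtype(out[col]) for col in sensitive_fields)

    for col in key_fields:
        col_numeric = is_numeric_dtype(out[col])
        if sensitive_numeric and not col_numeric:
            # Categorical -> numeric: replace each category with an integer code
            # (codes follow order of first appearance).
            codes, _ = pd.factorize(out[col])
            out[col] = codes
        elif not sensitive_numeric and col_numeric:
            # Numeric -> categorical: equal-width bins, stored as string labels.
            out[col] = pd.cut(out[col], bins=n_bins, labels=False).astype(str)
    return out


df = pd.DataFrame({
    "age": [23, 35, 47, 59, 61],
    "city": ["A", "B", "A", "C", "B"],
    "income": [30.0, 52.5, 61.0, 75.0, 48.0],
})

# Numeric sensitive field: the categorical key field "city" becomes integer codes.
converted = make_key_fields_compatible(df, ["age", "city"], ["income"])

# Categorical sensitive field: the numeric key field "age" is binned.
binned = make_key_fields_compatible(df, ["age"], ["city"])
```

One caveat with this sketch: integer codes impose an arbitrary ordering on the categories, which a numeric model may treat as meaningful, so a different encoding may be preferable in practice.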
Additional context
- What should the user API be? It would be ideal to guide the user into making a choice (to drop the columns or convert them).
- Should we be converting the columns ourselves, or should we expect users to do this first (e.g. using a transformer)?
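As one possible answer to both questions, the sketch below fails by default with a message that names the offending columns, and lets the user opt in to dropping them or to supplying their own conversion. The function `resolve_key_fields` and the `on_mismatch` parameter are hypothetical names for discussion, not an existing API.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def resolve_key_fields(data, key_fields, sensitive_fields, on_mismatch="error"):
    """Hypothetical helper: return (data, key_fields) with type mismatches resolved.

    on_mismatch may be "error" (default), "drop", or a callable taking
    (data, mismatched_columns) and returning the converted data.
    """
    target_numeric = all(is_numeric_dtype(data[c]) for c in sensitive_fields)
    mismatched = [
        c for c in key_fields if is_numeric_dtype(data[c]) != target_numeric
    ]

    if not mismatched:
        return data, list(key_fields)
    if on_mismatch == "drop":
        # Keep the data untouched and simply ignore the incompatible key fields.
        return data, [c for c in key_fields if c not in mismatched]
    if callable(on_mismatch):
        # Let the user convert the columns themselves (e.g. with a transformer).
        return on_mismatch(data, mismatched), list(key_fields)
    if on_mismatch == "error":
        kind = "numeric" if target_numeric else "categorical"
        raise TypeError(
            f"key_fields {mismatched} are not {kind} like the sensitive_fields; "
            "pass on_mismatch='drop' to ignore them or a callable to convert them."
        )
    raise ValueError(f"Unknown on_mismatch option: {on_mismatch!r}")


df = pd.DataFrame({
    "age": [23, 35, 47],
    "city": ["A", "B", "A"],
    "income": [30.0, 52.5, 61.0],
})

# Drop the categorical key field when the sensitive field is numeric.
_, kept = resolve_key_fields(df, ["age", "city"], ["income"], on_mismatch="drop")
```

Raising by default forces an explicit choice, while accepting a callable leaves room for users who prefer to run their own conversion first.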