Privacy Metrics error if target column has missing values #135

Open
npatki opened this issue Nov 23, 2021 · 1 comment
Labels
bug Something isn't working


npatki commented Nov 23, 2021

Environment Details

  • SDV version: 0.13.0
  • Python version: 3.8.9
  • Operating System: MacOS

Error Description

The Numerical Privacy Metrics throw an error whenever the target columns (sensitive_fields) contain missing values.

Steps to Reproduce

Go through the User Guide to import & load data. Then, scroll down to the Privacy Metrics section.

The following code should work as-is according to the user guide.

from sdv.metrics.tabular import NumericalLR

NumericalLR.compute(
    real_data, synthetic_data,
    key_fields=['second_perc', 'mba_perc', 'degree_perc'],
    sensitive_fields=['salary'])

However, when I try to run this, I get an error from sklearn because the salary column contains NaN values:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Note: The same error is thrown when any of the key_fields contain missing values too, e.g. if I swap salary and degree_perc in the above example.
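A quick way to see which of the fields passed to the metric would trigger this is to check them for NaN values before calling compute. The snippet below is only a sketch: `columns_with_missing` is a hypothetical helper, not part of SDV, and the small DataFrame is a made-up stand-in for the user guide's dataset.

```python
import numpy as np
import pandas as pd


def columns_with_missing(data, key_fields, sensitive_fields):
    """Return the key/sensitive columns of `data` that contain NaN values.

    Hypothetical helper for illustration; not an SDV function.
    """
    fields = list(key_fields) + list(sensitive_fields)
    return [col for col in fields if data[col].isna().any()]


# Toy stand-in for the user guide's data (values are made up).
real_data = pd.DataFrame({
    'second_perc': [67.0, 79.3, 65.0],
    'mba_perc': [58.8, 66.3, 57.8],
    'degree_perc': [58.0, np.nan, 64.0],
    'salary': [27000.0, np.nan, 25000.0],
})

missing = columns_with_missing(
    real_data,
    key_fields=['second_perc', 'mba_perc', 'degree_perc'],
    sensitive_fields=['salary'],
)
print(missing)  # ['degree_perc', 'salary']
```

Any column this reports, whether it is used as a key field or a sensitive field, would be expected to hit the sklearn error described above.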

Suggested Fix

This used to work, so there must have been a recent change in either SDV or sklearn. What were we doing before? Were we dropping the NaN values, filling them or imputing them?

Also, maybe it's ok if it crashes on the first run. Maybe the user can re-run with a flag for handling the missing values.
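To make the flag idea concrete, here is a minimal sketch of what such an option could look like as a preprocessing step. Everything in it is an assumption for illustration: the function name `prepare_for_privacy_metric`, the `missing` parameter and its three values are hypothetical and do not describe how SDV handled (or handles) missing values.

```python
import numpy as np
import pandas as pd


def prepare_for_privacy_metric(data, fields, missing='error'):
    """Apply a missing-value policy to `fields` before computing a metric.

    Hypothetical sketch, not SDV's API. `missing` may be:
      'error'     - raise if any of the fields contain NaN (crash first)
      'drop'      - drop rows with NaN in any of the fields
      'fill_mean' - fill NaN in each field with that column's mean
    """
    if missing == 'error':
        bad = [col for col in fields if data[col].isna().any()]
        if bad:
            raise ValueError(f'Missing values found in columns: {bad}')
        return data
    if missing == 'drop':
        return data.dropna(subset=fields)
    if missing == 'fill_mean':
        means = data[fields].mean().to_dict()
        return data.fillna(means)
    raise ValueError(f'Unknown missing-value policy: {missing!r}')


# Made-up example data with one missing salary.
example = pd.DataFrame({
    'degree_perc': [58.0, 77.5, 64.0],
    'salary': [100.0, np.nan, 300.0],
})
fields = ['degree_perc', 'salary']

dropped = prepare_for_privacy_metric(example, fields, missing='drop')
filled = prepare_for_privacy_metric(example, fields, missing='fill_mean')
```

With the default `'error'` policy the first run fails with a message naming the affected columns, and the user can then re-run with `'drop'` or `'fill_mean'`. The same policy would need to be applied to both the real and the synthetic data before they are passed to the metric.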

@npatki npatki added bug Something isn't working metrics labels Nov 23, 2021
@npatki npatki transferred this issue from sdv-dev/SDV Jun 10, 2022
@npatki npatki removed the metrics label Jul 14, 2022

npatki commented Jul 14, 2022

Closing as a duplicate of #58
