Environment Details
Error Description
The Numerical Privacy Metrics throw an error whenever the target columns (sensitive_fields) contain missing values.
Steps to Reproduce
Go through the User Guide to import & load data. Then, scroll down to the Privacy Metrics section.
The following code should work as-is according to the user guide.
However, when I try to run this, I get an error from sklearn because the salary column contains NaN values:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
Note: The same error is thrown when any of the key_fields contain missing values too, e.g. if I swap salary and degree_perc in the above example.
Suggested Fix
This used to work, so this is presumably caused by a recent change in either SDV or sklearn. What were we doing before? Were we dropping the NaN values, filling them, or imputing them?
Also, maybe it's OK if it crashes on the first run. Maybe the user can then re-run with a flag for handling the missing values.
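To make the options concrete, here is a rough sketch of what such a flag could do, written as a pandas-only preprocessing step that runs before the metric is computed. The helper name handle_missing and its strategy argument are hypothetical and are not part of SDV or sklearn; the column names are the ones from this issue and the sample values are made up for illustration. This is not a claim about what SDV did previously.

```python
import pandas as pd


def handle_missing(real, synthetic, columns, strategy="drop"):
    """Return copies of both tables with NaNs in `columns` handled.

    Hypothetical helper for illustration; not an SDV or sklearn API.
    """
    if strategy == "drop":
        # Discard every row with a missing value in any listed column.
        return real.dropna(subset=columns), synthetic.dropna(subset=columns)
    if strategy == "mean":
        # Impute using the real data's column means (numeric columns only).
        fill_values = real[columns].mean()
        return real.fillna(fill_values), synthetic.fillna(fill_values)
    raise ValueError(f"Unknown strategy: {strategy!r}")


# Toy data, invented for this example.
real = pd.DataFrame(
    {"degree_perc": [60.0, 75.0, 80.0], "salary": [2000.0, None, 3000.0]}
)
synthetic = pd.DataFrame(
    {"degree_perc": [65.0, None, 70.0], "salary": [2500.0, 2700.0, None]}
)

clean_real, clean_synthetic = handle_missing(
    real, synthetic, ["degree_perc", "salary"]
)
```

Dropping rows is the simplest choice but shrinks the tables, which may itself affect a privacy score. Mean imputation keeps every row but introduces values that were never observed. Either way, the cleaned tables contain no NaN in the listed columns, so they should get past sklearn's input validation.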