Problem
When using DateToUnitCircleTransformer, null dates are replaced with (0, 0), which is not a point on the unit circle.
For example, with DateToUnitCircleTransformer and TimePeriod HourOfDay, dates in the format MM-DD-YYYY are converted to MM-DD-YYYY 00h00m00s and therefore have the circular representation (1, 0).
We would expect null values to map to (1, 0) as well.
Solution
Use (1, 0) instead of (0, 0) as the default value for nulls.
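The proposed change can be illustrated with a minimal standalone sketch. It is not the library's code: the function `hour_to_unit_circle`, its `null_default` parameter, and the (cos, sin) ordering are assumptions, chosen so that midnight maps to (1, 0) as described above.

```python
import math
from datetime import datetime
from typing import Optional, Tuple

HOURS_PER_DAY = 24


def hour_to_unit_circle(
    date: Optional[datetime],
    null_default: Tuple[float, float] = (1.0, 0.0),
) -> Tuple[float, float]:
    """Map the hour of day to a point on the unit circle (hypothetical helper).

    Null dates fall back to `null_default`, which here is (1, 0),
    the same point as midnight, rather than (0, 0).
    """
    if date is None:
        return null_default
    radians = 2 * math.pi * date.hour / HOURS_PER_DAY
    return (math.cos(radians), math.sin(radians))


# A date with no time component is read as midnight.
midnight = hour_to_unit_circle(datetime(2018, 3, 14))
# A null date now lands on the same point as midnight.
missing = hour_to_unit_circle(None)
# Any non-null hour stays on the circle (norm close to 1).
six_am = hour_to_unit_circle(datetime(2018, 3, 14, 6))
```

With this default, every output has norm 1, whereas the old (0, 0) replacement has norm 0 and sits at the centre of the circle.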
Alternatives
These alternatives concern not only this transformer but also the other vectorizers that can use the mode as an imputation technique.
Instead of imputing the mode, randomly select an existing non-null value so that the distribution of the feature is not changed.
However, this remains difficult:
DateToUnitCircleTransformer is not an estimator.
As an estimator, it would have to store all the distinct non-null values of the dataset as a fitted param.
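To make the alternative concrete, here is a rough sketch of such an estimator in plain Python. The class `RandomValueImputer` and its `fit`/`transform` methods are hypothetical and do not correspond to any existing library API. Storing a count next to each distinct value is an added assumption, so that sampling follows the observed distribution.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence


class RandomValueImputer:
    """Hypothetical estimator: impute nulls by sampling observed values."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self.values_: List[float] = []  # fitted: distinct non-null values
        self.counts_: List[int] = []    # fitted: how often each one occurred

    def fit(self, column: Sequence[Optional[float]]) -> "RandomValueImputer":
        counts = Counter(v for v in column if v is not None)
        self.values_ = list(counts.keys())
        self.counts_ = list(counts.values())
        return self

    def transform(self, column: Sequence[Optional[float]]) -> List[float]:
        if not self.values_:
            raise ValueError("fit() saw no non-null values to sample from")
        return [
            # Weighted draw so imputed values follow the fitted distribution.
            self._rng.choices(self.values_, weights=self.counts_)[0]
            if v is None else v
            for v in column
        ]


imputer = RandomValueImputer(seed=0).fit([1.0, None, 3.0, 3.0])
imputed = imputer.transform([None, 2.0, None])
```

The fitted state grows with the number of distinct values, which is part of why this approach is harder than using a constant default.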
Additional context
The context is that the HourOfDay circular representation of a MM-DD-YYYY 00h00m00s date is not thrown out by SanityChecker because its variance is not 0.
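A small standalone computation shows how a null default can affect that variance. This is a sketch, not SanityChecker itself: the helper names are hypothetical, and it assumes the non-zero variance comes from nulls mapped to (0, 0) sitting alongside midnight values at (1, 0).

```python
import math
from statistics import pvariance
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def encode_hour(hour: Optional[int], null_default: Point) -> Point:
    """Hypothetical helper: hour of day to (cos, sin), nulls to null_default."""
    if hour is None:
        return null_default
    radians = 2 * math.pi * hour / 24
    return (math.cos(radians), math.sin(radians))


def component_variances(
    hours: List[Optional[int]], null_default: Point
) -> Tuple[float, float]:
    """Population variance of the x and y components of the encoded column."""
    xs, ys = zip(*(encode_hour(h, null_default) for h in hours))
    return pvariance(xs), pvariance(ys)


# Date-only values all read as hour 0; one entry is null.
hours = [0, 0, None, 0]

# Nulls at (0, 0): the x component varies, so the column looks informative.
old_var_x, old_var_y = component_variances(hours, (0.0, 0.0))
# Nulls at (1, 0): both components are constant.
new_var_x, new_var_y = component_variances(hours, (1.0, 0.0))
```

Under these assumptions, only the (1, 0) default leaves the column with zero variance in both components, so a variance-based check can treat it as constant.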