Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Null Imputation for DateUnitCircleVectorizer #555

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

michaelweilsalesforce
Copy link
Contributor

@michaelweilsalesforce michaelweilsalesforce commented Jun 7, 2021

Related issues
Issue

Describe the proposed solution
Using (1, 0) instead of (0, 0) for null default value.

Describe alternatives you've considered
Alternatives do not only concern this transformer but the other vectorizer that can return the mode as imputation technique.
Instead of getting the mode, randomly select an existing non null value so that the distribution of the feature is not changed.
However, this remains difficult :

  • DateToUnitCircleTransformer is not an estimator
  • As an estimator, you would store as a fitted param all the distinct non null values of the dataset.

Additional context
This is in the context where we have this HourOfDay circular representation of a MM-DD-YYYY 00h00m00s date not being thrown out by SanityChecker because of Variance not being 0.

@salesforce-cla
Copy link

salesforce-cla bot commented Jun 7, 2021

Thanks for the contribution! It looks like @mweilsalesforce is an internal user so signing the CLA is not required. However, we need to confirm this.

@codecov
Copy link

codecov bot commented Jun 7, 2021

Codecov Report

Merging #555 (49425a8) into master (8dc772a) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 49425a8 differs from pull request most recent head 0cab7fb. Consider uploading reports for the commit 0cab7fb to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master     #555      +/-   ##
==========================================
- Coverage   86.78%   86.78%   -0.01%     
==========================================
  Files         347      347              
  Lines       12026    12024       -2     
  Branches      403      387      -16     
==========================================
- Hits        10437    10435       -2     
  Misses       1589     1589              
Impacted Files Coverage Δ
...s/impl/feature/DateMapToUnitCircleVectorizer.scala 100.00% <100.00%> (ø)
...ges/impl/feature/DateToUnitCircleTransformer.scala 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f82a301...0cab7fb. Read the comment docs.

Copy link

@emitc2h emitc2h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty clean to me!

@michaelweilsalesforce michaelweilsalesforce changed the title [WIP] Null Imputation for DateUnitCircleVectorizer Null Imputation for DateUnitCircleVectorizer Jun 7, 2021
Copy link
Collaborator

@TuanNguyen27 TuanNguyen27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@michaelweilsalesforce
Copy link
Contributor Author

How do I get rid of this CLA stuff?

@TuanNguyen27
Copy link
Collaborator

@michaelweilsalesforce did you fill out the form ?

@michaelweilsalesforce
Copy link
Contributor Author

michaelweilsalesforce commented Jun 7, 2021 via email

@anish anish closed this Jun 15, 2021
@anish anish reopened this Jun 15, 2021
@anish
Copy link

anish commented Jun 15, 2021

kicking cla bot

@anish anish closed this Jun 15, 2021
@anish anish reopened this Jun 15, 2021
@anish
Copy link

anish commented Jun 15, 2021

@michaelweilsalesforce do you have your salesforce email associated with @mweilsalesforce as well ? Or did you accidentally switch accounts and not realize it ? The cla bot is seeing two authors and it needs both authors to sign cla.

@michaelweilsalesforce
Copy link
Contributor Author

@michaelweilsalesforce do you have your salesforce email associated with @mweilsalesforce as well ? Or did you accidentally switch accounts and not realize it ? The cla bot is seeing two authors and it needs both authors to sign cla.

Both emails are [email protected] and [email protected]

@michaelweilsalesforce
Copy link
Contributor Author

Somehow I couldn't attach my salesforce account because it was linked to github enterprise. That's why I have created this other account

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants