Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Integer Feature Type #564

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Add Integer Feature Type #564

wants to merge 2 commits into from

Conversation

mrunfeldt
Copy link
Contributor

In conversion from data type to OP type and back, the FeatureSparkTypes class in TMOG converts Int data to Long. This causes failures downstream when the types do not match up. This has become a problem now because we are splitting AutoML into 3 separate, modular stages (DataPrep, FeatEng, Modeling).

Describe the proposed solution
Add an Integer feature type. This adds lots of code added to the tmog dependency, but clean logic in modular automl (we don't have to track type conversions).

Describe alternatives you've considered
When the DataPrepStage converts Int features to Long, we can just convert them all back to Int. This would require no changes to Tmog (delete this PR) and small code change in automl, but then we will always

Additional context
Any insight into why an Integral[Long] feature was used for input data of [Int] would be helpful. My guess is that there was no need to maintain input type when this was all one stage (output is always scores and metrics), so it made sense to use the type with more bits.

@mrunfeldt
Copy link
Contributor Author

Hello @tovbinm @leahmcguire . We'd love your input on this if you find some time

@tovbinm
Copy link
Collaborator

tovbinm commented Oct 7, 2021

The original reason to only have Integral (backed by Long) was to minimize the amount of types we had to manage as it comes with a cost. Since Long can contain Int values, but not vice versa, we thought it would serve us great for both scenarios.

I would like to know more what kind of problems downstream you're experiencing. We can perhaps discuss this in a private conversation?

@leahmcguire
Copy link
Collaborator

If you are moving in and out of Tmog it makes sense to add this. When writing code that interacted across pure spark I wrote converters back and forth in the past but that required that you realize that you are going to get this error with type definitions when you move back and forth. This is certainly the cleaner solution. One point is if you have looked through all of the existing types to ensure that this doesn't change (and should not change) the inheritance model.

@mrunfeldt
Copy link
Contributor Author

Thank you so very much for your feedback, Leah and Matthew!

Matthew, if you'd like to discuss more, please email me and we can set-up a time for hangout or just talk via email:
[email protected]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants