-
Notifications
You must be signed in to change notification settings - Fork 394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Scala 2.12 / Spark 3 upgrade #550
base: master
Are you sure you want to change the base?
Conversation
… made to decision tree pruning in Spark 2.4. If nodes are split, but both child nodes lead to the same prediction then the split is pruned away. This updates the test so this doesn't happen for feature 'b'
@@ -192,7 +192,7 @@ class FeatureDistributionTest extends FlatSpec with PassengerSparkFixtureTest wi | |||
val fd2 = FeatureDistribution("A", None, 20, 20, Array(2, 8, 0, 0, 12), Array.empty) | |||
fd1.hashCode() shouldBe fd1.hashCode() | |||
fd1.hashCode() shouldBe fd1.copy(summaryInfo = fd1.summaryInfo).hashCode() | |||
fd1.hashCode() should not be fd1.copy(summaryInfo = Array.empty).hashCode() | |||
fd1.hashCode() shouldBe fd1.copy(summaryInfo = Array.empty).hashCode() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tovbinm I just want to make sure this is correct. In principle hashCode equality and equals should be consistent and this is what I'm trying to accomplish here, but I figured you might have had a reason for wanting something different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test was invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for deleting all the join functionality from data readers in this PR?
@leahmcguire I think it is just not being used - #550 (comment) |
We should be careful in how we define unused in a public project. Also that functionality would be needed to migrate projects on Transmogrifai V0... |
…eration in scala 2.12
…tMapVectorizerTest
… rely on spark.ml)
Hey @tovbinm,there's a unit test failure I've been investigating that's the result of a bug in Spark: https://issues.apache.org/jira/browse/SPARK-34805?focusedCommentId=17337491&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17337491. I'm wondering why I'm assuming Spark won't fix this any time soon, and I'm having trouble finding an alternative way of putting in the metadata in the schema of |
Also pinging @Jauntbox (we know you're out there!) for question above. |
This is a known issue indeed. We have been copying over the metadata between fields each time we apply our transformers, e.g |
I mean that there is a new problem with Spark 3.1. Even |
|
… in OpWorkflowModelLocal
…ata actually works well enough when all input data schema metadata is set properly
Hello, any estimation on when we can get this PR ready? Thank you! |
@hedibejaoui , we are running internal forks of TransmogrifAI and MLeap on Spark 3.1.1, so the bulk of the work has been done. For public release, the MLeap dependency needs to be upgraded now that they're on Spark 3 too: combust/mleap#765 But since they've upgraded to Spark 3.0.2 and TransmogrifAI to 3.1.1, we have some testing left to do. |
@nicodv Thanks for the information. Actually, we are using Spark 3.0.x because of some internal dependencies, any chance we get a public release of TransmogrifAI for that version? |
Hello, When do you think this PR will merged for the public use? Thank you! |
Hi, Thank you |
Hi, we are waiting for the new PR adds. When it will be available ? Thanks |
Related issues
#336
#332
Describe the proposed solution
Upgrade to Scala 2.12 and Spark 3
Describe alternatives you've considered
Living in the past, suffering from security issues and missing out on feature and speed improvements
Additional context
Add any other context about the changes here.