Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spark 2.4 support #402

Merged
merged 64 commits into from
Feb 19, 2020
Merged

Spark 2.4 support #402

merged 64 commits into from
Feb 19, 2020

Conversation

tovbinm
Copy link
Collaborator

@tovbinm tovbinm commented Sep 4, 2019

Related issues
We are currently on Spark 2.3, but 2.4 is out

Describe the proposed solution
Upgrade to Spark 2.4

Describe alternatives you've considered
N/A

Additional Context
This branch to be parked here until we are ready to upgrade to Spark 2.4

@codecov
Copy link

codecov bot commented Sep 4, 2019

Codecov Report

Merging #402 into master will decrease coverage by 5.31%.
The diff coverage is 95.38%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #402      +/-   ##
==========================================
- Coverage    86.9%   81.59%   -5.32%     
==========================================
  Files         340      341       +1     
  Lines       11450    11507      +57     
  Branches      364      374      +10     
==========================================
- Hits         9951     9389     -562     
- Misses       1499     2118     +619
Impacted Files Coverage Δ
...sforce/op/stages/impl/feature/Transmogrifier.scala 95.82% <100%> (-2.23%) ⬇️
...c/main/scala/com/salesforce/op/ModelInsights.scala 93.06% <100%> (+0.31%) ⬆️
...m/salesforce/op/utils/spark/OpVectorMetadata.scala 85.93% <85.71%> (+0.48%) ⬆️
...om/salesforce/op/SensitiveFeatureInformation.scala 97.14% <97.14%> (ø)
...ce/op/stages/impl/feature/TextLenTransformer.scala 100% <0%> (ø) ⬆️
...op/stages/impl/feature/TimePeriodTransformer.scala 100% <0%> (ø) ⬆️
...ala/com/salesforce/op/utils/tuples/RichTuple.scala 100% <0%> (ø) ⬆️
...force/op/stages/impl/feature/OpIndexToString.scala 100% <0%> (ø) ⬆️
...alesforce/op/aggregators/TimeBasedAggregator.scala 100% <0%> (ø) ⬆️
...com/salesforce/op/test/TestOpWorkflowBuilder.scala 100% <0%> (ø) ⬆️
... and 62 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c64eb51...d79bcc3. Read the comment docs.

@tovbinm tovbinm mentioned this pull request Sep 25, 2019
@@ -128,7 +128,7 @@ Start by picking TransmogrifAI version to match your project dependencies from t

| TransmogrifAI Version | Spark Version | Scala Version | Java Version |
|-------------------------------------------------------|:-------------:|:-------------:|:------------:|
| 0.6.2 (unreleased, master) | 2.3 | 2.11 | 1.8 |
| 0.7.0 (unreleased, master) | 2.4 | 2.11 | 1.8 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO for a future PR: I really hope we start making strides to automating this stuff. We should have here placeholders that are populated from build. Even if it's some quick and dirty version of sed script

build.gradle Show resolved Hide resolved
build.gradle Show resolved Hide resolved
@@ -69,20 +69,16 @@ task copyTemplates(type: Copy) {
fileName.replace(".gradle.template", ".gradle")
}
expand([
databaseHostname: 'db.company.com',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we absolutely need to change this file ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a cleanup, why not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the PR is big enough, and for easier maintenance I hoped to keep it focussed on the Spark upgrade

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this has no material difference for the change. You always get a bonus of such with my PRs 😛

@@ -138,7 +138,7 @@ case class AutomaticSchema(recordClassName: String)(dataFile: File) extends Sche
case Some(actualType) =>
val newSchema = Schema.create(actualType)
val schemaField =
new Schema.Field(field.name, newSchema, "auto-generated", orgSchemaField.defaultValue)
new Schema.Field(field.name, newSchema, "auto-generated", orgSchemaField.defaultVal())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this change? defaultValue is deprecated but still there, technically not required to change until it's dropped entirely

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am keeping the existing functionality as is. We can change the behavior separately.

@tovbinm
Copy link
Collaborator Author

tovbinm commented Jan 30, 2020

@gerashegalov thanks for reviewing. I hope we can get this merged and released soon. Especially because I don't even see an option now to download Spark lower than 2.4 from the official website, so our users might struggle to run current version of TransmogrifAI.

Copy link
Contributor

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGETM

@gerashegalov
Copy link
Contributor

@nicodv feel free to address the comments that make sense now (or not) and merge.

@Helw150
Copy link

Helw150 commented Feb 13, 2020

@tovbinm Do you have an ETA on merging this? I'm using Spark 2.4, but would love to get started doing a proposed POC with TransmogrifAI so am waiting excitedly!

@tovbinm
Copy link
Collaborator Author

tovbinm commented Feb 14, 2020

I bumped Spark to 2.4.5 (it was just recently released). Should we bump other dependencies as well, e.g. mleap?

@nicodv
Copy link
Contributor

nicodv commented Feb 15, 2020

@tovbinm , could you please revert the commits related to Spark 2.4.5? We're planning to do an internal release of TMog on 2.4.4 soon, and don't want to redo any testing / take on additional risk (even if it's just a patch release).

We can do a separate upgrade of TMog to Spark 2.4.5 soon after that in a separate PR.

@tovbinm
Copy link
Collaborator Author

tovbinm commented Feb 15, 2020

@nicodv ok

@nicodv nicodv merged commit fbad63d into master Feb 19, 2020
@nicodv nicodv deleted the revert-399-mt/revert-spark-2.4 branch February 19, 2020 20:18
@leahmcguire
Copy link
Collaborator

Woohooo!

@tovbinm
Copy link
Collaborator Author

tovbinm commented Feb 19, 2020

Made my day!

@salesforce-cla
Copy link

Thanks for the contribution! It looks like @Jauntbox is an internal user so signing the CLA is not required. However, we need to confirm this.

@salesforce-cla
Copy link

Thanks for the contribution! Before we can merge this, we need @wsuchy to sign the Salesforce.com Contributor License Agreement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants