Commit 9713dfd

krasinski and h2o-ops authored
[GH-5670] Add Spark 3.5 support (#5664)
* Add Spark 3.5 support
* fix py docs
* add py 3.10 3.11
* Update Docker Image Version
* rowencoder expression encoder fix

---------

Co-authored-by: h2o-ops <[email protected]>
1 parent 342fa82 commit 9713dfd

19 files changed: +298 -26 lines changed

README.rst

Lines changed: 14 additions & 12 deletions
@@ -18,8 +18,9 @@ Getting Started
 User Documentation
 ~~~~~~~~~~~~~~~~~~

-`Read the documentation for Spark 3.4 <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/index.html>`__ (or
-`3.3 <http://docs.h2o.ai/sparkling-water/3.2/latest-stable/doc/index.html>`__ ,
+`Read the documentation for Spark 3.5 <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/index.html>`__ (or
+`3.4 <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/index.html>`__ ,
+`3.3 <http://docs.h2o.ai/sparkling-water/3.3/latest-stable/doc/index.html>`__ ,
 `3.2 <http://docs.h2o.ai/sparkling-water/3.2/latest-stable/doc/index.html>`__ ,
 `3.1 <http://docs.h2o.ai/sparkling-water/3.1/latest-stable/doc/index.html>`__,
 `3.0 <http://docs.h2o.ai/sparkling-water/3.0/latest-stable/doc/index.html>`__,
@@ -29,7 +30,8 @@ User Documentation
 Download Binaries
 ~~~~~~~~~~~~~~~~~

-`Download the latest version for Spark 3.4 <http://h2o-release.s3.amazonaws.com/sparkling-water/spark-3.4/latest.html>`__ (or
+`Download the latest version for Spark 3.5 <http://h2o-release.s3.amazonaws.com/sparkling-water/spark-3.5/latest.html>`__ (or
+`3.4 <http://h2o-release.s3.amazonaws.com/sparkling-water/spark-3.4/latest.html>`__,
 `3.3 <http://h2o-release.s3.amazonaws.com/sparkling-water/spark-3.3/latest.html>`__,
 `3.2 <http://h2o-release.s3.amazonaws.com/sparkling-water/spark-3.2/latest.html>`__,
 `3.1 <http://h2o-release.s3.amazonaws.com/sparkling-water/spark-3.1/latest.html>`__,
@@ -95,20 +97,20 @@ Use Sparkling Water with PySpark
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Sparkling Water can be also used directly from PySpark and the integration is called PySparkling.

-See `PySparkling README <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/pysparkling.html>`__ to learn about PySparkling.
+See `PySparkling README <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/pysparkling.html>`__ to learn about PySparkling.

 Use Sparkling Water via Spark Packages
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-To see how Sparkling Water can be used as Spark package, please see `Use as Spark Package <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/tutorials/use_as_spark_package.html>`__.
+To see how Sparkling Water can be used as Spark package, please see `Use as Spark Package <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/tutorials/use_as_spark_package.html>`__.

 Use Sparkling Water in Windows environments
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-See `Windows Tutorial <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/tutorials/run_on_windows.html>`__ to learn how to use Sparkling Water in Windows environments.
+See `Windows Tutorial <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/tutorials/run_on_windows.html>`__ to learn how to use Sparkling Water in Windows environments.

 Sparkling Water examples
 ~~~~~~~~~~~~~~~~~~~~~~~~
-To see how to run examples for Sparkling Water, please see `Running Examples <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/devel/running_examples.html>`__.
+To see how to run examples for Sparkling Water, please see `Running Examples <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/devel/running_examples.html>`__.

 Maven packages
 ~~~~~~~~~~~~~~
@@ -140,26 +142,26 @@ backend. The backend can be specified before creation of the
 ``H2OContext``.

 For more details regarding the internal or external backend, please see
-`Backends <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/deployment/backends.html>`__.
+`Backends <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/deployment/backends.html>`__.

 --------------

 FAQ
 ---

-List of all Frequently Asked Questions is available at `FAQ <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/FAQ.html>`__.
+List of all Frequently Asked Questions is available at `FAQ <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/FAQ.html>`__.

 --------------

 Development
 -----------

-Complete development documentation is available at `Development Documentation <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/devel/devel.html>`__.
+Complete development documentation is available at `Development Documentation <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/devel/devel.html>`__.

 Build Sparkling Water
 ~~~~~~~~~~~~~~~~~~~~~

-To see how to build Sparkling Water, please see `Build Sparkling Water <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/devel/build.html>`__.
+To see how to build Sparkling Water, please see `Build Sparkling Water <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/devel/build.html>`__.

 Develop applications with Sparkling Water
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -187,7 +189,7 @@ We also respond to questions tagged with sparkling-water and h2o tags on the `St
 Change Logs
 ~~~~~~~~~~~

-Change logs are available at `Change Logs <http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/CHANGELOG.html>`__.
+Change logs are available at `Change Logs <http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/CHANGELOG.html>`__.

 ---------------

core/src/test/scala/ai/h2o/sparkling/TestUtils.scala

Lines changed: 2 additions & 4 deletions
@@ -16,13 +16,11 @@
  */
 package ai.h2o.sparkling

+import ai.h2o.sparkling.sql.catalyst.encoders.RowEncoder
 import java.io.File
-import java.nio.file.Files
 import java.sql.Timestamp
-
 import org.apache.spark.mllib
 import org.apache.spark.rdd.RDD
-import org.apache.spark.sql.catalyst.encoders.{ExpressionEncoder, RowEncoder}
 import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
 import org.apache.spark.sql.functions.{lit, rand}
 import org.apache.spark.sql.types._
@@ -130,7 +128,7 @@ object TestUtils extends Matchers {
       spark: SparkSession,
       schemaHolder: SchemaHolder,
       settings: GenerateDataFrameSettings): DataFrame = {
-    implicit val encoder: ExpressionEncoder[Row] = RowEncoder(schemaHolder.schema)
+    implicit val encoder = RowEncoder(schemaHolder.schema)
     val numberOfPartitions = Math.max(1, settings.numberOfRows / settings.rowsPerPartition)
     spark
       .range(settings.numberOfRows)

gradle-spark3.3.properties

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 sparkVersion=3.3.2
 minSupportedJavaVersion=1.8
-supportedEmrVersion=emr-6.10.0
+supportedEmrVersion=emr-6.11.1
 unsupportedMinorSparkVersions=
 scalaVersion=2.12.15
 databricksVersion=11.0.x-cpu-ml-scala2.12

gradle-spark3.4.properties

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 sparkVersion=3.4.1
 minSupportedJavaVersion=1.8
-supportedEmrVersion=emr-6.10.0
+supportedEmrVersion=emr-6.13.0
 unsupportedMinorSparkVersions=
 scalaVersion=2.12.17
 databricksVersion=13.0.x-cpu-ml-scala2.12

gradle-spark3.5.properties

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+sparkVersion=3.5.0
+minSupportedJavaVersion=1.8
+supportedEmrVersion=emr-6.10.0
+unsupportedMinorSparkVersions=
+scalaVersion=2.12.18
+databricksVersion=14.0.x-cpu-ml-scala2.12
+fabricK8sClientVersion=6.4.1
+executorOverheadMemoryOption=spark.executor.memoryOverhead
+driverOverheadMemoryOption=spark.driver.memoryOverhead
+supportedPythonVersions=3.8 3.9 3.10 3.11
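
The new per-version properties file is plain `key=value` text. As a rough illustration only, the sketch below parses the ten added lines and checks which Python versions they declare. The helper names `parse_properties` and `supports_python` are hypothetical; they are not part of this commit or of the Sparkling Water build scripts, which read these files through Gradle.

```python
# Hypothetical helper, not part of the commit: parse simple key=value text.
def parse_properties(text):
    """Return a dict of key -> value, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only; an empty value (e.g. "key=") is kept as "".
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Contents of gradle-spark3.5.properties exactly as added in this commit.
SPARK_35_PROPERTIES = """\
sparkVersion=3.5.0
minSupportedJavaVersion=1.8
supportedEmrVersion=emr-6.10.0
unsupportedMinorSparkVersions=
scalaVersion=2.12.18
databricksVersion=14.0.x-cpu-ml-scala2.12
fabricK8sClientVersion=6.4.1
executorOverheadMemoryOption=spark.executor.memoryOverhead
driverOverheadMemoryOption=spark.driver.memoryOverhead
supportedPythonVersions=3.8 3.9 3.10 3.11
"""


def supports_python(props, version):
    """Check whether a Python version appears in the space-separated list."""
    return version in props.get("supportedPythonVersions", "").split()


if __name__ == "__main__":
    props = parse_properties(SPARK_35_PROPERTIES)
    for candidate in ("3.7", "3.8", "3.11"):
        print(candidate, supports_python(props, candidate))
```

Splitting the list on whitespace, rather than testing for a substring, avoids false matches such as `3.1` inside `3.10`.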

gradle.properties

Lines changed: 3 additions & 3 deletions
@@ -19,15 +19,15 @@ systemProp.org.gradle.internal.publish.checksums.insecure=true
 # Version of Terraform used in the script creating the docker image
 terraformVersion=0.12.25
 # Version of docker image used in Jenkins tests
-dockerImageVersion=83
+dockerImageVersion=84
 # Is this build nightly build
 isNightlyBuild=false
 # Supported Major Spark Versions
-supportedSparkVersions=2.3 2.4 3.0 3.1 3.2 3.3 3.4
+supportedSparkVersions=2.3 2.4 3.0 3.1 3.2 3.3 3.4 3.5
 # The list of python environments used in automated tests
 pythonEnvironments=3.6 3.7 3.8 3.9 3.10 3.11
 # Select for which Spark version is Sparkling Water built by default
-spark=3.4
+spark=3.5
 # Sparkling Water Version
 version=3.44.0.1-1-SNAPSHOT
 # Spark version from which is Kubernetes Supported

py-scoring/README.rst

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ This package contains just functionality for scoring with Sparkling Water, H2O-3

 Documentation describing scoring with H2O-3 MOJO models is located at:

+- For Spark 3.5 - https://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/deployment/load_mojo.html
 - For Spark 3.4 - https://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/deployment/load_mojo.html
 - For Spark 3.3 - https://docs.h2o.ai/sparkling-water/3.3/latest-stable/doc/deployment/load_mojo.html
 - For Spark 3.2 - https://docs.h2o.ai/sparkling-water/3.2/latest-stable/doc/deployment/load_mojo.html

py/README.rst

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ to use this package for scoring with Driverless AI MOJO models.

 PySparkling Documentation is hosted at our documentation page:

+- For Spark 3.5 - http://docs.h2o.ai/sparkling-water/3.5/latest-stable/doc/pysparkling.html
 - For Spark 3.4 - http://docs.h2o.ai/sparkling-water/3.4/latest-stable/doc/pysparkling.html
 - For Spark 3.3 - http://docs.h2o.ai/sparkling-water/3.3/latest-stable/doc/pysparkling.html
 - For Spark 3.2 - http://docs.h2o.ai/sparkling-water/3.2/latest-stable/doc/pysparkling.html

scoring/src/main/scala/ai/h2o/sparkling/ml/models/H2OMOJOPipelineModel.scala

Lines changed: 4 additions & 4 deletions
@@ -17,21 +17,21 @@

 package ai.h2o.sparkling.ml.models

-import java.io._
 import ai.h2o.mojos.runtime.MojoPipeline
 import ai.h2o.mojos.runtime.api.{MojoPipelineService, PipelineConfig}
 import ai.h2o.mojos.runtime.frame.MojoColumn.Type
 import ai.h2o.mojos.runtime.frame.MojoFrame
 import ai.h2o.sparkling.ml.params.{H2OAlgorithmMOJOParams, H2OBaseMOJOParams, HasFeatureTypesOnMOJO}
-import org.apache.spark.ml.param._
-import org.apache.spark.sql._
+import ai.h2o.sparkling.sql.catalyst.encoders.RowEncoder
 import com.google.common.collect.Iterators
 import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.ml.Model
-import org.apache.spark.sql.catalyst.encoders.RowEncoder
+import org.apache.spark.ml.param._
+import org.apache.spark.sql._
 import org.apache.spark.sql.functions._
 import org.apache.spark.sql.types._

+import java.io._
 import scala.collection.JavaConverters._

 class H2OMOJOPipelineModel(override val uid: String)

utils/src/main/scala/ai/h2o/sparkling/ml/utils/SchemaUtils.scala

Lines changed: 1 addition & 1 deletion
@@ -17,9 +17,9 @@

 package ai.h2o.sparkling.ml.utils

+import ai.h2o.sparkling.sql.catalyst.encoders.RowEncoder
 import org.apache.spark.ml.attribute.AttributeGroup
 import org.apache.spark.rdd.RDD
-import org.apache.spark.sql.catalyst.encoders.RowEncoder
 import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
 import org.apache.spark.sql.functions.col
 import org.apache.spark.sql.types._
