Releases: IBM/spark-s3-shuffle

Integrate build for Spark 3.2.3

05 Apr 07:39

This build integrates pre-built binaries for Spark 3.2.3.

Bug fixes and performance improvements

16 Mar 14:25

Configuration changes:

  • Renamed the shuffle manager from org.apache.spark.shuffle.S3ShuffleManager to org.apache.spark.shuffle.sort.S3ShuffleManager.
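For jobs that were configured against the earlier class name, the rename should amount to a one-line change in the submit flags. The following is a minimal sketch, assuming the setting is passed on the command line in the same --conf form used in the 0.5 notes; the variable names are illustrative and not part of the plugin:

```shell
# Old value (before this release): org.apache.spark.shuffle.S3ShuffleManager
# New value (this release):        org.apache.spark.shuffle.sort.S3ShuffleManager
SHUFFLE_MANAGER="org.apache.spark.shuffle.sort.S3ShuffleManager"

# Illustrative variable holding the flag to append to a spark-submit invocation.
SHUFFLE_CONF="--conf spark.shuffle.manager=${SHUFFLE_MANAGER}"
echo "${SHUFFLE_CONF}"
```

Only the shuffle manager setting is shown here, since it is the only class these notes list as renamed.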

Bugfixes:

  • Fixed an issue in serialized shuffle that prevented TeraSort from working properly.
  • Fixed an off-by-one error in the bypass shuffle block manager (thanks to @fhalde).

Improvements:

  • This plugin now relies on S3ShuffleDataIO to write the shuffle output to the target location.
  • Implemented the optional createSingleFileMapOutputWriter interface for the S3ShuffleDataIO component. This improves performance when Spark is able to write a shuffle file without spilling.
  • Removed S3SortShuffleWriter and S3BypassMergeSortShuffleWriter, since these classes duplicated existing features.

Deprecated options:

  • spark.shuffle.s3.forceBypassMergeSort
  • spark.shuffle.s3.allowSerializedShuffle
  • spark.shuffle.s3.sort.cloneRecords
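One way to check whether an existing configuration still sets any of the deprecated options is to scan the configuration file for their keys. The sketch below assumes the options live in a properties-style file with one key per line (such as spark-defaults.conf); the function name find_deprecated_options is hypothetical and not part of the plugin:

```shell
# Hypothetical helper: prints every line of the given config file that sets
# one of the deprecated spark.shuffle.s3.* options listed above.
# Returns a non-zero status when no deprecated option is found.
find_deprecated_options() {
  conf_file="$1"
  grep -E '^[[:space:]]*spark\.shuffle\.s3\.(forceBypassMergeSort|allowSerializedShuffle|sort\.cloneRecords)([[:space:]=]|$)' "$conf_file"
}
```

It could be called as `find_deprecated_options "$SPARK_HOME/conf/spark-defaults.conf"`, assuming that is where the job's defaults are kept. Options passed directly with --conf on the command line would need to be checked separately.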

CI:

  • Use Travis to build releases.
  • Added Spark 3.3.2 as a build target.

Release version 0.5

13 Jul 15:49
3d2b56d

Changes from the initial open-source release:

  • Moved the Scala classes from the sort package into the shuffle package. SparkS3Shuffle now needs to be activated with:
     --conf spark.shuffle.manager="org.apache.spark.shuffle.S3ShuffleManager"
     --conf spark.shuffle.sort.io.plugin.class="org.apache.spark.shuffle.S3ShuffleDataIO"
    
  • Created an I/O plugin class so that the plugin can leverage more of the Spark shuffle infrastructure.
  • Integrated SerializedShuffle (can be disabled with spark.shuffle.s3.allowSerializedShuffle).
  • Added the flag spark.shuffle.s3.sort.cloneRecords, which copies Array[_] key/value pairs in SortShuffle before they are inserted into the Spark ExternalSorter.
  • Migrated from Maven to sbt.
  • GitHub Actions now creates builds for Spark 3.2.0, 3.2.1 and 3.3.0 for both Scala 2.12 and 2.13.
  • Added additional tests for Sort Shuffle.

Initial release with SortShuffle fix.

13 Jul 09:56
5cc4f8c
v0.4

CI: Fix version in build. (#5)