Build: Bump io.delta:delta-spark_2.12 from 3.0.0 to 3.1.0 #262

Closed

dependabot[bot]

@dependabot dependabot bot commented on behalf of github Feb 4, 2024

Bumps io.delta:delta-spark_2.12 from 3.0.0 to 3.1.0.

Release notes

Sourced from io.delta:delta-spark_2.12's releases.

Delta Lake 3.1.0

We are excited to announce the release of Delta Lake 3.1.0. This release includes several exciting new features.

A Few Highlights

  • Delta-Spark: Support for merge with deletion vectors to reduce the write overhead for merge operations. This feature improves the performance of merge severalfold.
  • Delta-Spark: Support for optimizing min/max aggregation queries using the table metadata, which improves the performance of simple aggregation queries (e.g., SELECT min(x) FROM deltaTable) by up to 100x.
  • Delta-Spark: Support for querying tables shared through Delta Sharing protocol.
  • Kernel: Support for data skipping for given query predicates to reduce the number of files read during the table scan.
  • Uniform: Enhanced Iceberg support for Delta tables, adding MAP and LIST types and ease-of-use improvements for enabling Uniform on a Delta table.
  • Delta-Flink: Improved startup latency for Flink write jobs using Kernel.
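
The data skipping idea in the Kernel highlight can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Kernel's API: the `FileStats` class, the `files_to_scan` function, and the file names are hypothetical, and each file is assumed to carry exact min/max statistics for a single integer column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file statistics for one integer column."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(files, lower, upper):
    """Keep only files whose [min, max] range can overlap lower <= x <= upper."""
    return [f for f in files if f.max_value >= lower and f.min_value <= upper]


# Illustrative file list; the paths and ranges are made up for this sketch.
files = [
    FileStats("part-0.parquet", 0, 99),
    FileStats("part-1.parquet", 100, 199),
    FileStats("part-2.parquet", 200, 299),
]

# A predicate such as 150 <= x <= 210 cannot match any row in part-0,
# so that file is skipped without being opened.
selected = files_to_scan(files, 150, 210)
print([f.path for f in selected])
```

The fewer files whose value range overlaps the predicate, the fewer files the scan has to read.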

Details by component.

Delta Spark

Delta Spark 3.1.0 is built on Apache Spark™ 3.5. Similar to Apache Spark, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.

The key features of this release are:

  • Support for merge with deletion vectors to reduce the write overhead for merge operations. This feature improves the performance of merge severalfold. Refer to the documentation on deletion vectors for more information.
  • Support for optimizing min/max aggregation queries using the table metadata, which improves the performance of simple aggregation queries (e.g., SELECT min(x) FROM deltaTable) by up to 100x.
  • (Preview) Liquid clustering for better table layout. Delta now allows clustering the data in a Delta table for better data skipping. This is currently an experimental feature. See the documentation and example for how to try out this feature.
  • Support for DEFAULT value columns. Delta supports defining default expressions for columns on Delta tables. Delta will generate default values for columns when users do not explicitly provide values for them when writing to such tables, or when the user explicitly specifies the DEFAULT SQL keyword for any such column. See the documentation on how to enable this feature and try it out.
  • Support for Hive Metastore schema sync. Adds a mechanism for syncing the table schema to HMS. External tools can now directly consume the schema from HMS instead of accessing it from the Delta table directory. See the documentation on how to enable this feature.
  • Auto compaction to address the small files problem during table writes. Auto compaction, which runs at the end of the write query, combines small files within partitions into larger files to reduce the metadata size and improve query performance. See the documentation for details on how to enable this feature.
  • Optimized write, an optimization that repartitions and rebalances data before writing it out to a Delta table. Optimized writes improve file sizes and reduce the small files problem as data is written, and benefit subsequent reads on the table. See the documentation for details on how to enable this feature.
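
To see why answering min/max queries from table metadata can be so much faster than scanning, consider the following toy model. It is a sketch, not Delta's implementation: the `file_stats` layout and the function names are hypothetical, statistics are assumed to be exact and present for every file, and a `None` entry stands for a file whose column values are all null.

```python
def min_from_metadata(file_stats):
    """Answer SELECT min(x) from per-file minimums only; no data rows are read."""
    minimums = [s["min"] for s in file_stats if s["min"] is not None]
    return min(minimums) if minimums else None


def max_from_metadata(file_stats):
    """Answer SELECT max(x) from per-file maximums only; no data rows are read."""
    maximums = [s["max"] for s in file_stats if s["max"] is not None]
    return max(maximums) if maximums else None


# Hypothetical statistics for column x, one entry per data file.
file_stats = [
    {"path": "part-0.parquet", "min": 7, "max": 42},
    {"path": "part-1.parquet", "min": -3, "max": 15},
    {"path": "part-2.parquet", "min": None, "max": None},  # all values null
]

print(min_from_metadata(file_stats), max_from_metadata(file_stats))
```

The work is proportional to the number of files rather than the number of rows, which is where a large speedup for simple aggregations would come from in this model.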

Other notable changes include:

  • Performance improvement by removing redundant jobs when performing DML operations with deletion vectors.
  • The UPDATE command now writes deletion vectors by default when the table has deletion vectors enabled.
  • Support for writing partition columns to data files.
  • Support for phaseout of v2 checkpoint table feature.
  • Fix an issue with case-sensitive column names in Merge.
  • Make the VACUUM command Delta protocol aware so that it only vacuums tables whose protocol it supports.
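
Several of the items above rely on deletion vectors. The general idea, marking rows as removed instead of rewriting the data file, can be sketched with a toy in-memory model. This is an illustration only and does not reflect Delta's on-disk format or APIs: the `soft_delete` and `read_live_rows` functions are hypothetical, and a plain Python set of row positions stands in for the deletion vector.

```python
def soft_delete(deletion_vector, rows, predicate):
    """Return a new set of deleted row positions; the rows themselves are untouched."""
    newly_deleted = {i for i, row in enumerate(rows) if predicate(row)}
    return deletion_vector | newly_deleted


def read_live_rows(rows, deletion_vector):
    """Readers filter out positions recorded in the deletion vector."""
    return [row for i, row in enumerate(rows) if i not in deletion_vector]


# One immutable "data file", modeled as a list of rows.
rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]

# Deleting id == 2 records position 1 instead of rewriting the whole file.
dv = soft_delete(set(), rows, lambda r: r["id"] == 2)
live = read_live_rows(rows, dv)
print(sorted(dv), [r["id"] for r in live])
```

In this model a write only produces a small set of positions, while the cost of filtering moves to read time.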

Delta Sharing Spark

This release of Delta adds a new module called delta-sharing-spark, which enables reading Delta tables shared using the Delta Sharing protocol in Apache Spark™. The module was migrated from the https://github.com/delta-io/delta-sharing/tree/main/spark repository to the https://github.com/delta-io/delta/tree/master/sharing repository. The last version of delta-sharing-spark released from the previous location is 1.0.4; the next is released with the current Delta release, 3.1.0.

Supported read types are: reading a snapshot of the table, incrementally reading the table using streaming, and reading the changes (Change Data Feed) between two versions of the table.

... (truncated)

Commits
  • 71b09f0 Setting version to 3.1.0
  • 12ee152 [Spark][Sharing] Fix Delta Sharing DataFrame not updated for Snapshot Query
  • 121c1c8 [Doc][3.1] Add a link to the V2 Checkpoint specification in the DROP TABLE Fe...
  • a2357eb [Docs] Add auto-compact docs
  • 98db14c [Docs] Update version in docs
  • 8f6f3c7 [Spark][Sharing] Add doc for delta sharing
  • 6704f0a [Docs] Fix documentation for default columns
  • 85c8cb7 [Docs] Add docs for dropping table feature
  • 559d1f8 [Spark] Add Writer Protocol check in Vacuum Command
  • e0c3bfd [Kernel] Update the usage docs to reflect the recent API changes
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [io.delta:delta-spark_2.12](https://github.com/delta-io/delta) from 3.0.0 to 3.1.0.
- [Release notes](https://github.com/delta-io/delta/releases)
- [Commits](delta-io/delta@v3.0.0...v3.1.0)

---
updated-dependencies:
- dependency-name: io.delta:delta-spark_2.12
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot bot added the dependencies and java labels Feb 4, 2024

dependabot bot commented on behalf of github Feb 22, 2024

Looks like io.delta:delta-spark_2.12 is up-to-date now, so this is no longer needed.

@dependabot dependabot bot closed this Feb 22, 2024
@dependabot dependabot bot deleted the dependabot/gradle/io.delta-delta-spark_2.12-3.1.0 branch February 22, 2024 06:20