[Spark] Allow type widening for all supported type changes with Spark 4.0 #3024
Conversation
spark/src/test/scala/org/apache/spark/sql/delta/DeltaTypeWideningSuite.scala (outdated review thread, resolved)
Will there be an option in the future to change a column's type from int to string without rewriting the entire table? Unless such an option is already available (but I don't remember one).
There's currently no plan to support type changes other than the ones mentioned in the PR description. Converting values when reading from a table that had one of these widening type changes applied can easily be done directly in the Parquet reader, but other type changes are harder, either because:
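To illustrate the distinction, a minimal sketch (assuming a spark-shell session with spark in scope; the table name and the delta.enableTypeWidening property are illustrative assumptions, not taken from this PR) of a widening change that the Parquet reader can absorb, versus the int -> string change asked about above:

```scala
// Sketch only: table name and the delta.enableTypeWidening property are
// illustrative assumptions, not taken from this PR.
spark.sql("CREATE TABLE t (id INT) USING delta")
spark.sql("ALTER TABLE t SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true')")
spark.sql("INSERT INTO t VALUES (1), (2)")

// Widening change: existing Parquet files keep their INT32 values and the
// reader converts them to long on read, so no data files are rewritten.
spark.sql("ALTER TABLE t CHANGE COLUMN id TYPE BIGINT")

// int -> string is not a widening change and is not supported; it would
// require rewriting every existing data file.
// spark.sql("ALTER TABLE t CHANGE COLUMN id TYPE STRING")
```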
(force-pushed from 66006ff to 1cf90d3)
(force-pushed from 6f6825a to cce8b48)
looks good, left some comments
/**
 * Type widening only supports a limited set of type changes with Spark 3.5 due to the parquet
 * readers lacking the corresponding conversions that were added in Spark 4.0.
Not sure about the mechanics, but shouldn't this go in a scala-spark-4.0 directory instead of master? What happens when 4.0 is cut/released?
Spark 4.0 isn't cut yet, so that's not possible; the build system currently only knows master and latest (3.5). I imagine once Spark 4.0 is cut, the scala-spark-master folder will be copied over to scala-spark-4.0.
case (ByteType | ShortType, IntegerType) => true
case _ => false
}
TypeWideningShims.isTypeChangeSupported(fromType, toType)
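For readers without the full diff, a rough sketch of what the Spark 3.5 variant of this shim could look like (the parameter types and surrounding layout are assumptions; only the method name comes from the diff above):

```scala
// Hypothetical sketch of the Spark 3.5 shim; the real shim lives in a
// version-specific shim source directory and its exact signature may differ.
import org.apache.spark.sql.types._

object TypeWideningShims {

  // With Spark 3.5 only byte -> short -> int is allowed; the Spark
  // master / 4.0 shim additionally allows the changes listed in the PR
  // description (int -> long, float -> double, date -> timestampNTZ, ...).
  def isTypeChangeSupported(fromType: DataType, toType: DataType): Boolean =
    (fromType, toType) match {
      case (ByteType, ShortType) => true
      case (ByteType | ShortType, IntegerType) => true
      case _ => false
    }
}
```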
For my education, how is the TypeWideningShims object visible here?
The build script accepts a sparkVersion argument that toggles between two different build targets, each pulling in its own set of shim files: https://github.com/delta-io/delta/blob/master/build.sbt#L163. TypeWideningShims is declared in the same package as TypeWidening, so it's visible without an explicit import.
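As a simplified illustration of that mechanism (setting and directory names are assumptions for this sketch; the actual logic is in the build.sbt linked above), the shim selection amounts to adding a version-specific source directory per build target:

```scala
// Simplified sbt sketch (build.sbt style); names are illustrative only.
val sparkVersion = settingKey[String]("Spark version to compile against")

lazy val spark = (project in file("spark"))
  .settings(
    sparkVersion := sys.props.getOrElse("sparkVersion", "latest"),
    // Pull in the shim sources for the selected target, e.g.
    // spark/src/main/scala-spark-3.5 or spark/src/main/scala-spark-master.
    Compile / unmanagedSourceDirectories += {
      val shimDir =
        if (sparkVersion.value == "master") "scala-spark-master"
        else "scala-spark-3.5"
      baseDirectory.value / "src" / "main" / shimDir
    }
  )
```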
lgtm
This PR adds shims to ungate the remaining type changes that only work with Spark 4.0 / master. Spark 4.0 contains the required changes to Parquet readers to be able to read the data after applying the type changes.
Description
Extend the list of supported type changes for type widening to include changes that can be supported with Spark 4.0:
- (byte, short, int) -> long
- float -> double
- date -> timestampNTZ
- (byte, short, int) -> double
- decimal -> decimal (with increased precision/scale that doesn't cause precision loss)
- (byte, short, int, long) -> decimal
Shims are added to support these changes when compiling against Spark 4.0/master and to only allow byte -> short -> int when compiling against Spark 3.5.
How was this patch tested?
Adding test cases for the new type changes in the existing type widening test suites. The list of supported / unsupported changes covered in tests differs between Spark 3.5 and Spark 4.0, shims are also provided to handle this.
Does this PR introduce any user-facing changes?
Yes: allow using the listed type changes with type widening, either via ALTER TABLE CHANGE COLUMN TYPE or during schema evolution in MERGE and INSERT.
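As a rough illustration of the schema evolution path (a sketch only: table and column names are hypothetical, and the delta.enableTypeWidening table property is an assumption about how the feature is enabled on the table), one of the newly allowed changes, float -> double, applied during an append:

```scala
// Sketch only: assumes a spark-shell session (`spark` in scope), a Delta
// table with a FLOAT column, and type widening enabled on that table via
// the (assumed) delta.enableTypeWidening property.
import spark.implicits._

spark.sql("CREATE TABLE scores (value FLOAT) USING delta")
spark.sql("ALTER TABLE scores SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true')")

// Appending doubles into the float column with schema merging enabled is
// expected to widen the column to DOUBLE instead of failing the write.
Seq(1.5d, 2.5d).toDF("value")
  .write.format("delta")
  .option("mergeSchema", "true")
  .mode("append")
  .saveAsTable("scores")

spark.sql("DESCRIBE TABLE scores").show()
```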