-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Spark] Test type widening compatibility with other Delta features #3053
Conversation
cd9c3e6
to
d5ed864
Compare
d5ed864
to
b336db5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few suggestions for additional tests. Do we already have tests for constraints and generated columns. AFAIK we disallow type changes on generated columns, but constraints might be interesting as it's possible to create a constraint after changing the type.
spark/src/test/scala/org/apache/spark/sql/delta/DeltaTypeWideningSuite.scala
Outdated
Show resolved
Hide resolved
spark/src/test/scala/org/apache/spark/sql/delta/DeltaTypeWideningSuite.scala
Show resolved
Hide resolved
spark/src/test/scala/org/apache/spark/sql/delta/DeltaTypeWideningSuite.scala
Outdated
Show resolved
Hide resolved
spark/src/test/scala/org/apache/spark/sql/delta/DeltaTypeWideningSuite.scala
Show resolved
Hide resolved
7d9e6f3
to
aabad0e
Compare
There are tests already for generated columns and constraints, added as part of #2881 I added a test here to cover constraints + RESTORE which is an interesting case:
The constraint was added on type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…elta-io#3053) ## Description Additional tests covering type widening and: - Reading CDF - Column mapping - Time travel - RESTORE - CLONE ## How was this patch tested? Test only
…elta-io#3053) Additional tests covering type widening and: - Reading CDF - Column mapping - Time travel - RESTORE - CLONE Test only
…iles to rewrite (#3155) ## What changes were proposed in this pull request? The initial approach to identify files that contain a type that differs from the table schema and that must be rewritten before dropping the type widening table feature is convoluted and turns out to be more brittle than intended. This change switches instead to directly reading the file schema from the Parquet footer and rewriting all files that have a mismatching type. ### Additional Context Files are identified using their default row commit version (a part of the row tracking feature) and matched against type changes previously applied to the table and recorded in the table metadata: any file written before the latest type change should use a different type and must be rewritten. This requires multiple pieces of information to be accurately tracked: - Default row commit versions must be correctly assigned to all files. E.p. files that are copied over without modification must never be assigned a new default row commit version. On the other hand, default row commit versions are preserved across CLONE but these versions don't match anything in the new cloned table. - Type change history must be reliably recorded and preserved across schema changes, e.g. column mapping. Any bug will likely lead to files not being correctly rewritten before removing the table feature, potentially leaving the table in an unreadable state. ## How was this patch tested? Tests added in previous PR to cover CLONE and RESTORE: #3053 Tests added and updated in this PR to cover rewriting files with different column types when removing the table feature.
Description
Additional tests covering type widening and:
How was this patch tested?
Test only