Adds SQL support for Configurable Table Snapshot History #262
Thanks @Will-Lo. Overall looks good. Added some comments.
Can we please add details regarding the SQL API contract? For example, if both versions and max age are present, what is expected?
minor comments. But overall LGTM
Thanks for refactoring the tests and answering the questions. Added minor comment, otherwise LGTM!
lgtm
## Summary

<!--- HINT: Replace #nnn with corresponding Issue number, if you are fixing an existing issue -->

[[Issue](https://github.com/linkedin/openhouse/issues/#nnn)] Briefly discuss the summary of the changes made in this pull request in 2-3 lines.

Following up from #262 and #259, this PR adds support for the snapshot expiration table maintenance job to use the history policies defined. Most notably, snapshot expiration will follow the settings of `maxAge`, `granularity`, and `versions` as follows:

1. If maxAge is provided, it will expire snapshots older than maxAge in granularity timeunit.
2. If versions is provided, it will retain the last versions snapshots regardless of their age.
3. If both are provided, it will prioritize maxAge; only retain up to versions number of snapshots younger than the maxAge. This is done by pruning the snapshots older than maxAge, and then running a second expiration to keep N versions after that.

Note: If versions are defined and there are fewer than N versions in the history, then there were not enough commits (within that timespan if defined). Snapshot expiration will always keep at least 1 version.

The default behavior of the snapshot expiration job will remain the same: keep snapshots within the last 3 days.

## Changes

- [ ] Client-facing API Changes
- [ ] Internal API Changes
- [ ] Bug Fixes
- [ ] New Features
- [ ] Performance Improvements
- [ ] Code Style
- [ ] Refactoring
- [ ] Documentation
- [ ] Tests

For all the boxes checked, please include additional details of the changes made in this pull request.

## Testing Done

<!--- Check any relevant boxes with "x" -->

- [ ] Manually Tested on local docker setup. Please include commands ran, and their output.
- [x] Added new tests for the changes made.
- [x] Updated existing tests to reflect the changes made.
- [ ] No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
- [ ] Some other form of testing like staging or soak time in production. Please explain.

For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.

# Additional Information

- [ ] Breaking Changes
- [ ] Deprecations
- [ ] Large PR broken into smaller PRs, and PR plan linked in the description.

For all the boxes checked, include additional details of the changes made in this pull request.
Summary
Adds Spark SQL support for configurable table snapshots, which controls the versioning of the Openhouse tables.
Syntax is similar to retention but is instead defined as `HISTORY`. History configuration supports both `MAX_AGE` and `VERSIONS`, where we retain all table snapshots that live within `MAX_AGE` and within `VERSIONS`.
Example:
A table with `MAX_AGE = 1d` will retain all snapshots that are within 1 day of when the snapshot retention job last ran.
A table with `VERSIONS = 5` will retain the last 5 snapshots of the table without considering the age of the snapshots.
If both `MAX_AGE = 1d` and `VERSIONS = 5` are defined, keep the last 5 snapshots within the last day. Note: If there are fewer than 5 snapshots, then fewer than 5 commits were made in the past day.
`MAX_AGE` and `VERSIONS` cannot be defined as less than 1.
The default maximums of `MAX_AGE` and `VERSIONS` defined in #259 are 3 days and 100 versions respectively.
Examples:
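The retention rules described in this summary can be modelled with a small Python sketch. This is an illustration only, not OpenHouse code and not the Spark SQL syntax the PR adds: the function name `snapshots_to_retain` and its parameters are hypothetical, snapshots are represented by their commit timestamps, and `job_run_time` stands in for "when the snapshot retention job last ran". The fallback of 3 days when nothing is configured and the floor of one retained snapshot are taken from the follow-up PR description quoted earlier on this page.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# Fallback used when no history policy is set (3 days, per the follow-up description).
DEFAULT_MAX_AGE = timedelta(days=3)


def snapshots_to_retain(
    commit_times: List[datetime],
    job_run_time: datetime,
    max_age: Optional[timedelta] = None,
    versions: Optional[int] = None,
) -> List[datetime]:
    """Return the commit times (oldest first) of snapshots that survive expiration.

    Assumes versions, when given, is >= 1.
    """
    ordered = sorted(commit_times)  # oldest first
    if max_age is None and versions is None:
        max_age = DEFAULT_MAX_AGE

    kept = ordered
    if max_age is not None:
        # First pass: drop everything older than the age cutoff.
        cutoff = job_run_time - max_age
        kept = [t for t in kept if t >= cutoff]
    if versions is not None:
        # Second pass: keep only the most recent `versions` snapshots.
        kept = kept[-versions:]

    # At least one snapshot is always kept.
    if not kept and ordered:
        kept = ordered[-1:]
    return kept
```

With `MAX_AGE = 1d` and `VERSIONS = 5`, a table that received only three commits in the past day ends up with three snapshots, which matches the note above about seeing fewer than 5.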
Testing Done
Tested on local docker running spark:
Tested setting both policies:
Setting only versions:
Setting only max age:
Also tested negative cases (invalid numbers, values past the maximums defined in #259)
e.g.
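As a rough model of the negative cases, the bounds stated in the summary (neither value below 1, maximums of 3 days and 100 versions) can be written as a small check. This is a hypothetical sketch: `validate_history` and its parameter names are invented for illustration, `MAX_AGE` is assumed to be given in whole days, and nothing here indicates where OpenHouse actually enforces these limits or what its error messages look like.

```python
from typing import Optional

# Limits quoted in the summary as the maximums from #259.
MAX_AGE_LIMIT_DAYS = 3
VERSIONS_LIMIT = 100


def validate_history(
    max_age_days: Optional[int] = None,
    versions: Optional[int] = None,
) -> None:
    """Raise ValueError if a provided value falls outside [1, limit]."""
    if max_age_days is not None and not 1 <= max_age_days <= MAX_AGE_LIMIT_DAYS:
        raise ValueError(
            f"MAX_AGE must be between 1 and {MAX_AGE_LIMIT_DAYS} days, got {max_age_days}"
        )
    if versions is not None and not 1 <= versions <= VERSIONS_LIMIT:
        raise ValueError(
            f"VERSIONS must be between 1 and {VERSIONS_LIMIT}, got {versions}"
        )
```

Under these assumptions, `validate_history(max_age_days=1, versions=5)` passes, while `validate_history(versions=0)` and `validate_history(max_age_days=4)` raise `ValueError`.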