Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Spark] Handle case when Checkpoints.findLastCompleteCheckpoint is passed MAX_VALUE #3105

Merged

Conversation

prakharjain09
Copy link
Collaborator

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Fixes an issue where Checkpoints.findLastCompleteCheckpoint goes into an almost infinite loop if it is passed a Checkpoint.MAX_VALUE.

How was this patch tested?

UT

Does this PR introduce any user-facing changes?

No

@prakharjain09 prakharjain09 force-pushed the fixFindLastCompleteCheckpointBefore-2 branch from f95b428 to 2d51f80 Compare May 17, 2024 17:29
Comment on lines +458 to +461
// If someone passes the upperBound as 0 or sentinel value, we should not do backward
// listing. Instead we should list the entire directory from 0 and return the latest
// available checkpoint.
.filterNot(cv => cv.version < 0 || cv.version == CheckpointInstance.MaxValue.version)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the fix and there is a corresponding test: findLastCompleteCheckpoint with CheckpointInstance.MAX value .

Copy link
Collaborator

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch

Comment on lines +149 to +150
assert(eventData1("iterations") == "1")
assert(eventData1("numFilesScanned") == "1004")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work for better error messages?

Suggested change
assert(eventData1("iterations") == "1")
assert(eventData1("numFilesScanned") == "1004")
assert(eventData1("iterations") === "1")
assert(eventData1("numFilesScanned") === "1004")

(bunch of others, including existing code... maybe worth a cleanup PR some day)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency in the test suite, I am keeping these as ==. Will open a PR which converts all == to ===.

@tdas tdas merged commit 57df2c0 into delta-io:master May 20, 2024
9 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants