-
Notifications
You must be signed in to change notification settings - Fork 2.4k
fix: File listing fix for Append handle on log file creation #14112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
fix: File listing fix for Append handle on log file creation #14112
Conversation
? fileSliceOpt.get().getLatestLogFile() | ||
: Option.empty(); | ||
|
||
// Compute the next log file version: use latest version + 1 if available and append is not supported, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets do this only for v6 or below.
from v8, log file names will contain the inflight delta commit's instant time. so, every new write's first log file name created will not be present in storage at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nsivabalan makes sense. this change only affects the v6 or below as this is added in the else block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't know what this PR what to fix, the log version already set up in line 327, and why by default add the log version by 1? this would makes the file append never work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danny0405 I see that we're always rolling over to the next log if current log file exists when writing for the first time. I see that append to existing log file is not supported.
https://github.com/vamsikarnika/hudi/blob/d4d719d45dfc7e55870590e03a53281b246d755b/hudi-hadoop-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatWriter.java#L106
@nsivabalan Originally this change was intended to get rid of the unnecessary filesystem call to fetch the latest log version. But looks like we're not making that file system call anymore. We simply increase the fileversion, and if some other writer has written same og file, writer fails to createFile and simply moves to next log version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see we have optimized the rollover
logic in 1.1 even for table version 6 compared to say 0.14.1.
Here is 0.14.1 rollover logic
int newVersion = FSUtils.computeNextLogVersion(fs, path.getParent(), fileId, extension, baseCommitTime); |
which does fs listing.
while in latest master,
public HoodieLogFile rollOver(String logWriteToken) { |
Considering this, this fix may not be required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I refactored the fs listing out in 1.x.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs more clarification.
Describe the issue this Pull Request addresses
HoodieWriteHandle now creates a new log file version when the file scheme doesn’t support append operations. This avoids unnecessary scans of the table’s filesystem view to locate the latest log file slice for a file group.
Summary and Changelog
Improves write performance by reducing the number of hoodie table filesystem view calls to the driver.
Impact
Improves write performance by reducing the number of hoodie table filesystem view calls to the driver.
Risk Level
LOW
Documentation Update
NA
Contributor's checklist