Skip to content

Conversation

vamsikarnika
Copy link
Contributor

@vamsikarnika vamsikarnika commented Oct 17, 2025

Describe the issue this Pull Request addresses

HoodieWriteHandle now creates a new log file version when the file scheme doesn’t support append operations. This avoids unnecessary scans of the table’s filesystem view to locate the latest log file slice for a file group.

Summary and Changelog

Improves write performance by reducing the number of hoodie table filesystem view calls to the driver.

Impact

Improves write performance by reducing the number of hoodie table filesystem view calls to the driver.

Risk Level

LOW

Documentation Update

NA

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@vamsikarnika vamsikarnika marked this pull request as draft October 17, 2025 08:21
@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Oct 17, 2025
@vamsikarnika vamsikarnika marked this pull request as ready for review October 17, 2025 12:43
@hudi-bot
Copy link

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

? fileSliceOpt.get().getLatestLogFile()
: Option.empty();

// Compute the next log file version: use latest version + 1 if available and append is not supported,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets do this only for v6 or below.
from v8, log file names will contain the inflight delta commit's instant time. so, every new write's first log file name created will not be present in storage at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nsivabalan makes sense. this change only affects the v6 or below as this is added in the else block

Copy link
Contributor

@danny0405 danny0405 Oct 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't know what this PR what to fix, the log version already set up in line 327, and why by default add the log version by 1? this would makes the file append never work.

Copy link
Contributor Author

@vamsikarnika vamsikarnika Oct 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danny0405 I see that we're always rolling over to the next log if current log file exists when writing for the first time. I see that append to existing log file is not supported.
https://github.com/vamsikarnika/hudi/blob/d4d719d45dfc7e55870590e03a53281b246d755b/hudi-hadoop-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatWriter.java#L106

@nsivabalan Originally this change was intended to get rid of the unnecessary filesystem call to fetch the latest log version. But looks like we're not making that file system call anymore. We simply increase the fileversion, and if some other writer has written same og file, writer fails to createFile and simply moves to next log version

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we have optimized the rollover logic in 1.1 even for table version 6 compared to say 0.14.1.
Here is 0.14.1 rollover logic

int newVersion = FSUtils.computeNextLogVersion(fs, path.getParent(), fileId, extension, baseCommitTime);

which does fs listing.

while in latest master,

public HoodieLogFile rollOver(String logWriteToken) {

Considering this, this fix may not be required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I refactored the fs listing out in 1.x.

Copy link
Contributor

@danny0405 danny0405 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs more clarification.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:M PR with lines of changes in (100, 300]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants