@yihua yihua commented Oct 13, 2025

Describe the issue this Pull Request addresses

This PR addresses inefficient memory usage when serializing log records into log blocks.

Summary and Changelog

  • Sort the records by record key for the HFile file format.
  • HFileUtils#serializeRecordsToLogBlock now assumes that the input records are already sorted by record key. The intermediate sorted record map, sortedRecordsMap, is therefore no longer needed, which avoids the additional memory usage when writing an HFile log block.
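
The idea in the two items above can be sketched roughly as follows. This is a minimal illustration, not the actual Hudi code: KeyedRecord, SortedBlockWriterSketch, sortByKey, and serializeSorted are hypothetical names, and the length-prefixed byte layout is a stand-in for the real HFile writer. It assumes the caller sorts the records once, so the writer can stream them in a single pass without first copying them into a sorted map.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a log record: a record key plus its serialized payload.
final class KeyedRecord {
  final String key;
  final byte[] payload;

  KeyedRecord(String key, byte[] payload) {
    this.key = key;
    this.payload = payload;
  }
}

// Hypothetical sketch; not Hudi's HFileUtils.
final class SortedBlockWriterSketch {

  // Caller side: sort the list once, in place, before serialization.
  static void sortByKey(List<KeyedRecord> records) {
    records.sort(Comparator.comparing((KeyedRecord r) -> r.key));
  }

  // Writer side: a single pass over records that are assumed to be sorted.
  // No intermediate sorted map is built, so no second copy of the records is held.
  static byte[] serializeSorted(List<KeyedRecord> sortedRecords) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      String previousKey = null;
      for (KeyedRecord record : sortedRecords) {
        // Cheap guard for the new precondition: keys must be non-decreasing.
        if (previousKey != null && previousKey.compareTo(record.key) > 0) {
          throw new IllegalArgumentException(
              "Records are not sorted by key: " + record.key + " follows " + previousKey);
        }
        byte[] keyBytes = record.key.getBytes(StandardCharsets.UTF_8);
        // Illustrative layout only: length-prefixed key, then length-prefixed payload.
        out.writeInt(keyBytes.length);
        out.write(keyBytes);
        out.writeInt(record.payload.length);
        out.write(record.payload);
        previousKey = record.key;
      }
    } catch (IOException e) {
      // Not expected for an in-memory stream; rethrown unchecked to keep the sketch simple.
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }
}
```

The trade-off is that ordering becomes a precondition the caller must uphold. The ordering check inside the loop is an optional guard added here for illustration; whether the actual change validates the order is not stated in this PR description.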

Impact

Reduces memory usage when writing HFile log blocks.

Risk Level

low

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Oct 13, 2025

@nsivabalan nsivabalan left a comment

Left a minor comment.

@nsivabalan

Hey @yihua: did you address the comment?

@nsivabalan nsivabalan left a comment

Addressed my own feedback and pushed a commit.

@yihua yihua marked this pull request as ready for review October 20, 2025 04:32
@nsivabalan nsivabalan force-pushed the perf-improve-hfile-log-writing branch from f7a717d to ee89a96 Compare October 20, 2025 05:20
@nsivabalan nsivabalan force-pushed the perf-improve-hfile-log-writing branch from ee89a96 to f7bc7f7 Compare October 20, 2025 05:21
@hudi-bot

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build
