refactor: migrate to ScanV2Internal API and remove ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN config #17520
+61
−513
Describe the issue this Pull Request addresses
This PR removes the deprecated ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN configuration and migrates all log scanning operations to use the ScanV2Internal API as the default implementation. The change simplifies the codebase by eliminating the dual-path scanning logic that was maintained for backward compatibility.
Summary and Changelog
Users no longer need to configure ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN, as optimized log scanning is now the default behavior. This change streamlines the log reading path and removes approximately 436 lines of legacy code.
Changes:
- Removed the ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN configuration from HoodieReaderConfig and HoodieCompactionConfig
- Removed enableOptimizedLogBlocksScan across log scanning components
- Simplified AbstractHoodieLogRecordScanner and BaseHoodieLogRecordReader by removing the legacy scan path
- Updated HoodieMergedLogRecordReader, HoodieMergedLogRecordScanner, and HoodieUnMergedLogRecordScanner to use ScanV2Internal exclusively
Impact
Breaking Change: The ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN configuration option has been removed. Users who explicitly set this configuration will need to remove it from their configurations. The new default behavior is equivalent to having this config enabled.
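For jobs that assemble their options programmatically, the cleanup described above can be as small as dropping one key before the options are handed to Hudi. The sketch below is illustrative only: the class and method names are hypothetical, and the key string is an assumption about what ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN resolved to, so verify it against the release you are upgrading from.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Hudi.
class RemovedConfigCleaner {
    // ASSUMPTION: key string behind ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN; verify before use.
    static final String ASSUMED_REMOVED_KEY = "hoodie.optimized.log.blocks.scan.enable";

    // Returns a copy of the options without the removed key; the input map is not modified.
    static Map<String, String> stripRemovedKey(Map<String, String> options) {
        Map<String, String> cleaned = new LinkedHashMap<>(options);
        cleaned.remove(ASSUMED_REMOVED_KEY);
        return cleaned;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("hoodie.table.name", "trips");
        options.put(ASSUMED_REMOVED_KEY, "true");
        // Only the table name entry remains after cleaning.
        System.out.println(stripRemovedKey(options).keySet());
    }
}
```

Code that references the removed constant directly will no longer compile after this change, so those references need to be deleted rather than filtered at runtime.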
Performance: No performance impact is expected for users who already had the config enabled, since ScanV2Internal was already the recommended, optimized path. Users who had the config disabled should see performance improvements.
Risk Level
Low - The ScanV2Internal API has been available and tested for several releases. This change only removes the legacy fallback path. All existing tests pass with the new default behavior.
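To make the shape of the removed fallback concrete, here is a minimal sketch of flag-based dual-path dispatch collapsing to a single path. It is not the actual Hudi code: the class and the method names scanBefore, scanAfter, scanV2, and scanLegacy are hypothetical stand-ins, and only the flag name enableOptimizedLogBlocksScan comes from this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a log record scanner; it does not mirror Hudi's real classes.
class LogScannerSketch {
    // Records which path ran, so the behavior is observable.
    final List<String> calls = new ArrayList<>();

    // Before: a boolean flag selected between the optimized and the legacy scan path.
    void scanBefore(boolean enableOptimizedLogBlocksScan) {
        if (enableOptimizedLogBlocksScan) {
            scanV2();
        } else {
            scanLegacy();
        }
    }

    // After: the flag and the legacy branch are gone; the optimized path always runs.
    void scanAfter() {
        scanV2();
    }

    private void scanV2() {
        calls.add("v2");
    }

    private void scanLegacy() {
        calls.add("legacy");
    }
}
```

In this sketch, callers that previously passed true observe no change, which matches the claim that the new default is equivalent to having the config enabled.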
Documentation Update
ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN
Contributor's checklist