[HUDI-18496] Respect configured record type when filtering index keys#18715
[HUDI-18496] Respect configured record type when filtering index keys#18715201573 wants to merge 1 commit into
Conversation
|
Gentle ping for review when convenient. Azure CI is green on the latest update. |
hudi-agent
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the contribution! This PR threads the configured record merger type through to index key filtering so Lance base files can use a Spark reader, while keeping AVRO as the default fallback. One potential regression worth double-checking in the inline comments around ORC/HFile base files when a Spark record merger is configured. Please take a look at any inline comments, and this should be ready for a Hudi committer or PMC member to take it from here.
| } | ||
|
|
||
| public HoodieRecord.HoodieRecordType resolveRecordType(HoodieRecord.HoodieRecordType fallback) { | ||
| return requiresSparkRecordType() ? HoodieRecord.HoodieRecordType.SPARK : fallback; |
There was a problem hiding this comment.
🤖 Could this introduce a regression for .orc (and .hfile) base files? Previously filterKeysFromFile always used the AVRO reader factory (which supports Parquet/ORC/HFile via HoodieAvroFileReaderFactory). After this change, when config.getRecordMerger().getRecordType() returns SPARK, resolveRecordTypeForExtension(".orc", SPARK) returns SPARK — and HoodieSparkFileReaderFactory.newOrcFileReader throws HoodieIOException("Not support read orc file") (same for HFile). It might be safer to special-case Lance only and keep returning AVRO for the others, or override the fallback only when requiresSparkRecordType() is true (which it already does — the concern is whether getRecordMerger().getRecordType() being SPARK + ORC base file was previously working via the AVRO path). @yihua could you confirm whether ORC bloom-index filtering with a Spark merger was previously functional?
- AI-generated; verify before applying. React 👍/👎 to flag quality.
Describe the issue this Pull Request addresses
Closes #18496
Summary and Changelog
Impact
Keeps index key filtering consistent with the configured record merger type while preserving the existing AVRO fallback for existing callers.
Risk level
Low. The previous filterKeysFromFile signature still defaults to AVRO, and format-specific overrides are limited to the file format resolution helper.
Test Plan