Describe the problem
In HoodieIndexUtils.filterKeysFromFile the record type used to open the base file reader for key filtering is computed as:
HoodieRecord.HoodieRecordType recordType =
    HoodieFileFormat.fromFileExtension(FSUtils.getFileExtension(filePath.toString()))
        .resolveRecordType(HoodieRecord.HoodieRecordType.AVRO);
resolveRecordType overrides the fallback only when the file format requires a specific record type (currently only Lance). For every other format (Parquet, ORC, HFile), the hardcoded AVRO fallback is returned as-is, so the merger's configured record type is ignored. This behavior predates the Lance work (#18375).
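To make the fallback behavior concrete, here is a simplified, self-contained Java model of the expression above. RecordType and FileFormat are hypothetical stand-ins for HoodieRecord.HoodieRecordType and HoodieFileFormat (not the real Hudi classes), and modeling Lance as requiring SPARK is an assumption made only for illustration.

```java
import java.util.Locale;

/**
 * Hypothetical, simplified model of how the key-filter reader's record type is chosen today.
 * RecordType and FileFormat are stand-ins, NOT the real Hudi classes.
 */
public class CurrentRecordTypeSelection {

  enum RecordType { AVRO, SPARK }

  enum FileFormat {
    PARQUET(null), ORC(null), HFILE(null),
    // Assumption for illustration only: Lance pins the SPARK record type.
    LANCE(RecordType.SPARK);

    private final RecordType requiredType; // null = no format-specific requirement

    FileFormat(RecordType requiredType) {
      this.requiredType = requiredType;
    }

    // Maps ".parquet" -> PARQUET, etc. (simplified; assumes a known, non-empty extension).
    static FileFormat fromFileExtension(String extension) {
      return valueOf(extension.substring(1).toUpperCase(Locale.ROOT));
    }

    // Described contract: the fallback wins unless the format requires a specific type.
    RecordType resolveRecordType(RecordType fallback) {
      return requiredType != null ? requiredType : fallback;
    }
  }

  // Returns the extension including the leading dot, e.g. ".parquet".
  static String getFileExtension(String path) {
    int dot = path.lastIndexOf('.');
    return dot < 0 ? "" : path.substring(dot);
  }

  // Current behavior: AVRO is hardcoded, so the merger's record type never reaches this decision.
  static RecordType readerRecordTypeToday(String filePath) {
    return FileFormat.fromFileExtension(getFileExtension(filePath))
        .resolveRecordType(RecordType.AVRO);
  }

  public static void main(String[] args) {
    // Even if the table's merger is SPARK-typed, a Parquet file resolves to AVRO.
    System.out.println(readerRecordTypeToday("/tbl/p1/f1.parquet")); // AVRO
    System.out.println(readerRecordTypeToday("/tbl/p1/f1.lance"));   // SPARK (per the assumption above)
  }
}
```

Because AVRO is passed unconditionally, only a format-level requirement can change the result; nothing about the merger's configuration reaches this call.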
Impact
If a user configures a SPARK-type merger for a Parquet table, key filtering during index lookup will still open an AVRO reader, incurring unnecessary Avro<->InternalRow conversions in the key-filter code path.
This is not a correctness bug — filterRowKeys only reads record keys — but it is an efficiency/consistency gap flagged during review of #18375 (cf. review thread).
Proposed fix
Thread the configured merger's record type into filterKeysFromFile (either directly, or by passing the config as an argument and reading the type from it) and use it as the fallback to resolveRecordType instead of hardcoding AVRO.
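A minimal sketch of the proposed fix, under the same simplifications: the merger's record type becomes the fallback. RecordType and FileFormat are hypothetical stand-ins for the Hudi classes, and the readerRecordType helper with its mergerRecordType parameter is an assumed shape for the change, not an existing Hudi API; the real fix might instead read the value from the write config.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of the proposed fix: use the merger's configured record type as the
 * fallback instead of hardcoding AVRO. All names here are illustrative stand-ins.
 */
public class ProposedRecordTypeSelection {

  enum RecordType { AVRO, SPARK }

  enum FileFormat {
    PARQUET(null), ORC(null), HFILE(null),
    // Assumption for illustration only: Lance pins the SPARK record type.
    LANCE(RecordType.SPARK);

    private final RecordType requiredType; // null = no format-specific requirement

    FileFormat(RecordType requiredType) {
      this.requiredType = requiredType;
    }

    // Derives the format from the text after the last '.' (assumes a known extension).
    static FileFormat fromPath(String path) {
      String ext = path.substring(path.lastIndexOf('.') + 1);
      return valueOf(ext.toUpperCase(Locale.ROOT));
    }

    // Unchanged contract: a format requirement overrides whatever fallback is passed in.
    RecordType resolveRecordType(RecordType fallback) {
      return requiredType != null ? requiredType : fallback;
    }
  }

  // Proposed: the caller threads the merger's record type through as the fallback.
  static RecordType readerRecordType(String filePath, RecordType mergerRecordType) {
    return FileFormat.fromPath(filePath).resolveRecordType(mergerRecordType);
  }

  public static void main(String[] args) {
    System.out.println(readerRecordType("/tbl/p1/f1.parquet", RecordType.SPARK)); // SPARK
    System.out.println(readerRecordType("/tbl/p1/f1.parquet", RecordType.AVRO));  // AVRO
    System.out.println(readerRecordType("/tbl/p1/f1.lance", RecordType.AVRO));    // SPARK
  }
}
```

Format-level requirements still take precedence, so Lance behaves as before; only Parquet, ORC, and HFile start honoring the merger's configured type.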
Context
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java, around line 254 (the // TODO: AVRO fallback comment).