HoodieIndexUtils.filterKeysFromFile hardcodes AVRO fallback, ignoring configured record merger type #18496

@wombatu-kun

Describe the problem

In HoodieIndexUtils.filterKeysFromFile the record type used to open the base file reader for key filtering is computed as:

HoodieRecord.HoodieRecordType recordType =
    HoodieFileFormat.fromFileExtension(FSUtils.getFileExtension(filePath.toString()))
        .resolveRecordType(HoodieRecord.HoodieRecordType.AVRO);

resolveRecordType overrides the fallback only when the file format requires a specific record type (currently only Lance). For all other formats (Parquet, ORC, HFile) it returns the fallback unchanged, so the hardcoded AVRO value always wins and the merger's configured record type is ignored. The hardcoded fallback predates the Lance work (#18375).
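To make the behaviour concrete, here is a minimal, self-contained sketch of the fallback logic described above. It does not use Hudi's classes: RecordType, FileFormat and currentBehaviour are hypothetical stand-ins, and the assumption that Lance requires SPARK is made purely for illustration.

```java
import java.util.Optional;

public class ResolveRecordTypeSketch {
    // Hypothetical stand-in for HoodieRecord.HoodieRecordType.
    public enum RecordType { AVRO, SPARK }

    // Hypothetical stand-in for HoodieFileFormat.
    public enum FileFormat {
        PARQUET(null),
        ORC(null),
        HFILE(null),
        LANCE(RecordType.SPARK); // assumption: the required type is chosen for illustration only

        private final RecordType required;

        FileFormat(RecordType required) {
            this.required = required;
        }

        // Returns the format's required record type if it has one, otherwise the fallback.
        public RecordType resolveRecordType(RecordType fallback) {
            return Optional.ofNullable(required).orElse(fallback);
        }
    }

    // Mirrors the call shape reported in this issue: the configured type is never consulted.
    public static RecordType currentBehaviour(FileFormat format, RecordType configured) {
        return format.resolveRecordType(RecordType.AVRO);
    }

    public static void main(String[] args) {
        // A SPARK merger is configured, but a Parquet file still resolves to AVRO.
        System.out.println(currentBehaviour(FileFormat.PARQUET, RecordType.SPARK)); // prints AVRO
        // A format with a required type overrides the fallback.
        System.out.println(currentBehaviour(FileFormat.LANCE, RecordType.AVRO)); // prints SPARK
    }
}
```

In this sketch the configured argument is deliberately unused, which is the gap being reported.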

Impact

If a user configures a SPARK-type merger for a Parquet table, key filtering during index lookup will still open an AVRO reader, incurring unnecessary Avro<->InternalRow conversions in the key-filter code path.

This is not a correctness bug, since filterRowKeys only reads record keys, but it is an efficiency/consistency gap flagged during review of #18375 (cf. review thread).

Proposed fix

Thread the configured merger's record type into filterKeysFromFile, either as a new argument or by reading it from a config object passed in, and pass it to resolveRecordType as the fallback instead of hardcoding AVRO.
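A rough sketch of what the threaded version could look like, again with hypothetical stand-in types rather than Hudi's real classes. The method name readerRecordType, its signature, and the assumption that Lance requires SPARK are illustrative only; the actual change would need to fit filterKeysFromFile's existing parameters and callers.

```java
public class ThreadedRecordTypeSketch {
    // Hypothetical stand-in for HoodieRecord.HoodieRecordType.
    public enum RecordType { AVRO, SPARK }

    // Hypothetical stand-in for HoodieFileFormat.
    public enum FileFormat {
        PARQUET(null),
        LANCE(RecordType.SPARK); // assumption: required type chosen for illustration only

        private final RecordType required;

        FileFormat(RecordType required) {
            this.required = required;
        }

        public RecordType resolveRecordType(RecordType fallback) {
            return required != null ? required : fallback;
        }

        public static FileFormat fromFileExtension(String extension) {
            switch (extension) {
                case ".parquet": return PARQUET;
                case ".lance":   return LANCE;
                default: throw new IllegalArgumentException("Unknown extension: " + extension);
            }
        }
    }

    // The caller supplies the merger's configured record type, which becomes the fallback.
    public static RecordType readerRecordType(String filePath, RecordType configuredRecordType) {
        int dot = filePath.lastIndexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("No file extension in: " + filePath);
        }
        return FileFormat.fromFileExtension(filePath.substring(dot))
            .resolveRecordType(configuredRecordType);
    }

    public static void main(String[] args) {
        // Parquet now follows the configured merger type.
        System.out.println(readerRecordType("/tmp/f1.parquet", RecordType.SPARK)); // prints SPARK
        System.out.println(readerRecordType("/tmp/f1.parquet", RecordType.AVRO));  // prints AVRO
        // A format with a required type still overrides the fallback.
        System.out.println(readerRecordType("/tmp/f2.lance", RecordType.AVRO));    // prints SPARK
    }
}
```

Passing the record type as an explicit argument keeps the helper free of config lookups; reading it from a config object inside the method would work equally well in this sketch.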

Context
