Phase 24: Restore == comparison with HOODIE_METADATA_SCHEMA in HoodieMetadataPayload #17532

@voonhous

Description

Task Description

Replace the temporary .equals() comparison with == to restore performance.

Before migration in phase 5:

  • When schemas were read from log file headers, they went through AvroSchemaCache.intern(Schema)
  • AvroSchemaCache is keyed by Schema (using Schema.equals()), so even if a new Schema object was parsed, intern() would return the same cached reference as HOODIE_METADATA_SCHEMA
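The interning behaviour described above can be sketched in plain Java. This is only an illustration under stated assumptions: FakeSchema and SchemaInterner are hypothetical stand-ins for the Avro Schema and AvroSchemaCache, not the actual Hudi classes, and METADATA_SCHEMA merely plays the role of HOODIE_METADATA_SCHEMA.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class InternSketch {
    // Hypothetical stand-in for an Avro Schema; a record gets structural equals()/hashCode().
    record FakeSchema(String name, List<String> fields) {}

    // Hypothetical interner keyed by equals(); not Hudi's actual AvroSchemaCache.
    static final class SchemaInterner {
        private final Map<FakeSchema, FakeSchema> cache = new ConcurrentHashMap<>();

        // Returns the first instance seen among all schemas that are equal to the argument.
        FakeSchema intern(FakeSchema schema) {
            FakeSchema existing = cache.putIfAbsent(schema, schema);
            return existing != null ? existing : schema;
        }
    }

    static final SchemaInterner INTERNER = new SchemaInterner();

    // Plays the role of the HOODIE_METADATA_SCHEMA constant (assumed to be interned).
    static final FakeSchema METADATA_SCHEMA =
        INTERNER.intern(new FakeSchema("HoodieMetadataRecord", List.of("key", "type")));

    // Simulates parsing a schema from a log file header: always a fresh object.
    static FakeSchema parseFromHeader() {
        return new FakeSchema("HoodieMetadataRecord", List.of("key", "type"));
    }

    public static void main(String[] args) {
        FakeSchema parsed = parseFromHeader();
        // The freshly parsed object is equal but not identical to the constant.
        System.out.println(parsed == METADATA_SCHEMA);
        // After interning, the canonical reference comes back, so == succeeds.
        System.out.println(INTERNER.intern(parsed) == METADATA_SCHEMA);
    }
}
```

In this sketch, a reference check against the constant only works for schemas that have passed through intern().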

After migration in phase 5:

  • Schemas are parsed into HoodieSchema via HoodieSchema.parse()
  • HoodieSchemaCache.intern(HoodieSchema) caches the HoodieSchema wrapper
  • When .toAvroSchema() is called, it returns the underlying Avro Schema that was created by avroParser.parse()
  • This is a new object, not the same reference as HOODIE_METADATA_SCHEMA

During these transitional PRs, we have the following mismatch:

  • AvroSchemaCache ensured Avro Schema reference reuse
  • HoodieSchemaCache ensures HoodieSchema wrapper reuse, but the wrapped Avro Schema is still a different object
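A minimal sketch of this mismatch, again with hypothetical stand-ins (FakeSchema for the Avro Schema, FakeWrapper for HoodieSchema, WRAPPER_CACHE for HoodieSchemaCache). It assumes the wrapper's equality delegates to the wrapped schema, which may not match the real implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WrapperCacheSketch {
    // Hypothetical stand-in for an Avro Schema with structural equality.
    record FakeSchema(String name, List<String> fields) {}

    // Hypothetical stand-in for HoodieSchema; record equality delegates to the wrapped schema.
    record FakeWrapper(FakeSchema avro) {
        // Parsing always builds a fresh underlying schema object.
        static FakeWrapper parse(String name, List<String> fields) {
            return new FakeWrapper(new FakeSchema(name, fields));
        }

        FakeSchema toAvroSchema() {
            return avro;
        }
    }

    // Plays the role of HOODIE_METADATA_SCHEMA, built independently of any wrapper.
    static final FakeSchema METADATA_SCHEMA =
        new FakeSchema("HoodieMetadataRecord", List.of("key", "type"));

    // Hypothetical wrapper cache: interns wrappers, not the schemas inside them.
    static final Map<FakeWrapper, FakeWrapper> WRAPPER_CACHE = new ConcurrentHashMap<>();

    static FakeWrapper intern(FakeWrapper wrapper) {
        FakeWrapper existing = WRAPPER_CACHE.putIfAbsent(wrapper, wrapper);
        return existing != null ? existing : wrapper;
    }

    public static void main(String[] args) {
        FakeWrapper first = intern(FakeWrapper.parse("HoodieMetadataRecord", List.of("key", "type")));
        FakeWrapper second = intern(FakeWrapper.parse("HoodieMetadataRecord", List.of("key", "type")));

        // Wrapper references are reused by the cache.
        System.out.println(first == second);
        // The unwrapped schema is still a different object from the constant...
        System.out.println(second.toAvroSchema() == METADATA_SCHEMA);
        // ...so only the structural comparison matches.
        System.out.println(second.toAvroSchema().equals(METADATA_SCHEMA));
    }
}
```

Under these assumptions the interned wrappers compare equal by reference, while a reference check on the unwrapped schema fails and a structural equals() is needed.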

The .equals() comparison is a temporary bridge during the migration.

Once HoodieRecordPayload.getInsertValue() accepts HoodieSchema instead of Schema, the == comparison can be restored.

Details of context can be found in this comment:
#14340 (comment)

What needs to be done:
Replace the .equals() comparison with == in HoodieMetadataPayload when checking against HOODIE_METADATA_SCHEMA, after the Avro.Schema to HoodieSchema migration for HoodieRecordPayload.

Why this task is needed:
Restore the O(1) reference comparison; .equals() on a Schema is a structural comparison rather than a reference check.

Task Type

Code improvement/refactoring
