Description
Task Description
Replace the temporary .equals() comparison with == to restore performance.
Before migration in phase 5:
- When schemas were read from log file headers, they went through AvroSchemaCache.intern(Schema)
- AvroSchemaCache is keyed by Schema (using Schema.equals()), so even if a new Schema object was parsed, intern() would return the same cached reference as HOODIE_METADATA_SCHEMA
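The pre-migration behavior described above can be illustrated with a minimal sketch of an equality-keyed intern cache. It does not use the real Avro or Hudi classes: `InternSketch`, `FakeSchema`, and this `intern` method are hypothetical stand-ins, assumed only to show why an equal-but-distinct object resolves to the same cached reference.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InternSketch {
    // Hypothetical stand-in for an Avro Schema: equality is structural (by content).
    static final class FakeSchema {
        final String json;
        FakeSchema(String json) { this.json = json; }
        @Override public boolean equals(Object o) {
            return o instanceof FakeSchema && ((FakeSchema) o).json.equals(json);
        }
        @Override public int hashCode() { return json.hashCode(); }
    }

    // Cache keyed by equals()/hashCode(); the value is the canonical instance.
    static final Map<FakeSchema, FakeSchema> CACHE = new ConcurrentHashMap<>();

    // Returns the first instance ever interned that equals the argument.
    static FakeSchema intern(FakeSchema s) {
        FakeSchema previous = CACHE.putIfAbsent(s, s);
        return previous != null ? previous : s;
    }

    public static void main(String[] args) {
        FakeSchema canonical = intern(new FakeSchema("{\"type\":\"record\"}"));
        FakeSchema reparsed = new FakeSchema("{\"type\":\"record\"}");
        System.out.println(reparsed == canonical);         // false: a freshly parsed object
        System.out.println(intern(reparsed) == canonical); // true: intern returns the cached reference
    }
}
```

Because every reader of the cache receives the canonical instance, a later `==` check against a well-known constant succeeds without a structural comparison.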
After migration in phase 5:
- Schemas are parsed into HoodieSchema via HoodieSchema.parse()
- HoodieSchemaCache.intern(HoodieSchema) caches the HoodieSchema wrapper
- When .toAvroSchema() is called, it returns the underlying Avro Schema, which was created by avroParser.parse(). This is a NEW object, not the same reference as HOODIE_METADATA_SCHEMA
During these transitional PRs, we have the following mismatch:
- AvroSchemaCache ensured Avro Schema reference reuse
- HoodieSchemaCache ensures HoodieSchema wrapper reuse, but the wrapped Avro Schema is still a different object
The .equals() comparison is a temporary bridge during the migration.
Once HoodieRecordPayload.getInsertValue() accepts HoodieSchema instead of Schema, the == comparison can be restored.
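The mismatch and its eventual resolution can be sketched as follows. This is an illustration under stated assumptions, not the Hudi implementation: `InnerSchema` stands in for the Avro Schema, `WrapperSchema` for HoodieSchema, and `toInner()` for `.toAvroSchema()`. All names here are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WrapperInternSketch {
    // Hypothetical stand-in for an Avro Schema.
    static final class InnerSchema {
        final String json;
        InnerSchema(String json) { this.json = json; }
        @Override public boolean equals(Object o) {
            return o instanceof InnerSchema && ((InnerSchema) o).json.equals(json);
        }
        @Override public int hashCode() { return json.hashCode(); }
    }

    // Hypothetical stand-in for HoodieSchema: each parse creates a new inner object.
    static final class WrapperSchema {
        final InnerSchema inner;
        private WrapperSchema(InnerSchema inner) { this.inner = inner; }
        static WrapperSchema parse(String json) { return new WrapperSchema(new InnerSchema(json)); }
        InnerSchema toInner() { return inner; }
        @Override public boolean equals(Object o) {
            return o instanceof WrapperSchema && ((WrapperSchema) o).inner.equals(inner);
        }
        @Override public int hashCode() { return inner.hashCode(); }
    }

    // The cache interns wrappers only; it knows nothing about inner references.
    static final Map<WrapperSchema, WrapperSchema> CACHE = new ConcurrentHashMap<>();

    static WrapperSchema intern(WrapperSchema w) {
        WrapperSchema previous = CACHE.putIfAbsent(w, w);
        return previous != null ? previous : w;
    }

    static final String METADATA_JSON = "{\"name\":\"metadata\"}";
    // Constant held as an inner schema (the situation during the migration).
    static final InnerSchema METADATA_INNER = new InnerSchema(METADATA_JSON);
    // Constant held as an interned wrapper (the situation once the API accepts wrappers).
    static final WrapperSchema METADATA_WRAPPER = intern(WrapperSchema.parse(METADATA_JSON));

    public static void main(String[] args) {
        WrapperSchema fromLog = intern(WrapperSchema.parse(METADATA_JSON));
        System.out.println(fromLog.toInner() == METADATA_INNER);      // false: different inner object
        System.out.println(fromLog.toInner().equals(METADATA_INNER)); // true: the temporary bridge
        System.out.println(fromLog == METADATA_WRAPPER);              // true: wrapper references are reused
    }
}
```

In this sketch, reference comparison only becomes valid again when both sides of the check are interned wrappers, which mirrors the condition stated above for restoring `==`.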
More context can be found in this comment:
#14340 (comment)
What needs to be done:
Replace the .equals() comparison with == in HoodieMetadataPayload when checking against HOODIE_METADATA_SCHEMA, once the Avro.Schema to HoodieSchema migration for HoodieRecordPayload is complete.
Why this task is needed:
Restore the O(1) reference comparison in place of the structural Schema.equals() check.
Task Type
Code improvement/refactoring
Related Issues
Parent feature issue: (if applicable )
Related issues: