Core: Make totalRecordCount optional in PartitionStats#12226
Core: Make totalRecordCount optional in PartitionStats#12226nastra merged 7 commits intoapache:mainfrom
Conversation
37d1917 to
1c356a6
Compare
jbonofre
left a comment
There was a problem hiding this comment.
It looks good to me:
- I see the change to
Longinstead oflongandnull"support - I also see the deprecation of the
long totalRecordCount()method
|
@aokolnychyi @Fokko wdyt ? |
|
We have already finalized on this change by discussing on #12098 (comment) But looks like Anton is not active. Anyone else can help in merging this? cc: @RussellSpitzer, @nastra |
| * @deprecated since 1.9.0, will be removed in 1.10.0, use {@link #totalRecordCountOptional()} | ||
| * instead. | ||
| */ | ||
| @Deprecated |
There was a problem hiding this comment.
do we really need to deprecate the existing method and introduce a new one to handle this properly? I'm not a fan of having a method with optional in the name. Also what if the field itself becomes a Long but the method's return type stays the same?
There was a problem hiding this comment.
Maybe no one using it as the writer code is not present in previous version.
But just following the guidelines as it is a public interface and introduced in previous version.
There was a problem hiding this comment.
method's return type stays the same
We wanted to return null when not computed as per the discussion. We can't do that if the return type is primitive.
| && stats1.equalityDeleteRecordCount() == stats2.equalityDeleteRecordCount() | ||
| && stats1.equalityDeleteFileCount() == stats2.equalityDeleteFileCount() | ||
| && stats1.totalRecordCount() == stats2.totalRecordCount() | ||
| && Objects.equals(stats1.totalRecords(), stats2.totalRecords()) |
There was a problem hiding this comment.
not directly related to this PR, but why can't we handle equals/hashcode in PartitionStats directly?
There was a problem hiding this comment.
PartitionStats has StructLike member for partition, it needs a comparator for equals(). But storing comparator for each PartitionStats is not efficient. Plus we also want to keep the members of this class same as partition stats spec.
It was discussed in one of the old PR.
Spec was already optional for totalRecordCount.
https://iceberg.apache.org/spec/#partition-statistics-file
During the implementation, we decided to make all the counters to be initialized to zero.
Recently, we concluded that total record count counter can still be optional.
Based on the discussions from #12098 (comment)
and https://lists.apache.org/thread/1dzcv7989q0hchjytyb8g219wo7nc1xm