Skip to content

Core: Make totalRecordCount optional in PartitionStats#12226

Merged
nastra merged 7 commits intoapache:mainfrom
ajantha-bhat:optional
Mar 19, 2025
Merged

Core: Make totalRecordCount optional in PartitionStats#12226
nastra merged 7 commits intoapache:mainfrom
ajantha-bhat:optional

Conversation

@ajantha-bhat
Copy link
Member

@ajantha-bhat ajantha-bhat commented Feb 11, 2025

Spec was already optional for totalRecordCount.
https://iceberg.apache.org/spec/#partition-statistics-file

During the implementation, we decided to make all the counters to be initialized to zero.
Recently, we concluded that total record count counter can still be optional.

Based on the discussions from #12098 (comment)
and https://lists.apache.org/thread/1dzcv7989q0hchjytyb8g219wo7nc1xm

Copy link
Member

@jbonofre jbonofre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me:

  • I see the change to Long instead of long and null "support
  • I also see the deprecation of the long totalRecordCount() method

@jbonofre
Copy link
Member

jbonofre commented Mar 6, 2025

@aokolnychyi @Fokko wdyt ?

@ajantha-bhat
Copy link
Member Author

We have already finalized on this change by discussing on #12098 (comment)
and https://lists.apache.org/thread/1dzcv7989q0hchjytyb8g219wo7nc1xm

But looks like Anton is not active. Anyone else can help in merging this?

cc: @RussellSpitzer, @nastra

@ajantha-bhat
Copy link
Member Author

@nastra, @pvary: I need your help to merge this. I am planning for 1.9.0 release this week. Will start a mailing list discussion tomorrow.

* @deprecated since 1.9.0, will be removed in 1.10.0, use {@link #totalRecordCountOptional()}
* instead.
*/
@Deprecated
Copy link
Contributor

@nastra nastra Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need to deprecate the existing method and introduce a new one to handle this properly? I'm not a fan of having a method with optional in the name. Also what if the field itself becomes a Long but the method's return type stays the same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe no one using it as the writer code is not present in previous version.

But just following the guidelines as it is a public interface and introduced in previous version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method's return type stays the same

We wanted to return null when not computed as per the discussion. We can't do that if the return type is primitive.

&& stats1.equalityDeleteRecordCount() == stats2.equalityDeleteRecordCount()
&& stats1.equalityDeleteFileCount() == stats2.equalityDeleteFileCount()
&& stats1.totalRecordCount() == stats2.totalRecordCount()
&& Objects.equals(stats1.totalRecords(), stats2.totalRecords())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not directly related to this PR, but why can't we handle equals/hashcode in PartitionStats directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PartitionStats has StructLike member for partition, it needs a comparator for equals(). But storing comparator for each PartitionStats is not efficient. Plus we also want to keep the members of this class same as partition stats spec.

It was discussed in one of the old PR.

@nastra nastra merged commit cf98065 into apache:main Mar 19, 2025
42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants