Apache Iceberg version
0.8.1 (latest release)
Please describe the bug 🐞
Following this Slack thread:
It seems that column statistics are not fully collected when writing data, either with the Arrow Dataframe API or with the `add_files` method.

Example 1: Stats after using the `add_files` method (shown using Trino `show stats`):

As you can see, the `location` column is missing statistics when using `add_files`.

In addition, while calling `add_files` I encountered this error many times (but the operation succeeded):

PyArrow statistics missing for column 1 when writing file

Example 2: Stats after using the Arrow Dataframe API to load the same Parquet files (shown using Trino `show stats`):

On the other hand, after collecting table statistics using Trino, the column statistics look more complete:
Willingness to contribute