format/spec.md
25 additions & 19 deletions
@@ -989,11 +989,11 @@ Partition statistics file must be registered in the table metadata file to be co
 
 `partition-statistics` field of table metadata is an optional list of structs with the following fields:
 
-| v1 | v2 | Field name | Type | Description |
-|----|----|------------|------|-------------|
-|_required_|_required_|**`snapshot-id`**|`long`| ID of the Iceberg table's snapshot the partition statistics file is associated with. |
-|_required_|_required_|**`statistics-path`**|`string`| Path of the partition statistics file. See [Partition statistics file](#partition-statistics-file). |
-|_required_|_required_|**`file-size-in-bytes`**|`long`| Size of the partition statistics file. |
+| v1 | v2 | v3 | Field name | Type | Description |
+|----|----|----|------------|------|-------------|
+|_required_|_required_|_required_|**`snapshot-id`**|`long`| ID of the Iceberg table's snapshot the partition statistics file is associated with. |
+|_required_|_required_|_required_|**`statistics-path`**|`string`| Path of the partition statistics file. See [Partition statistics file](#partition-statistics-file). |
+|_required_|_required_|_required_|**`file-size-in-bytes`**|`long`| Size of the partition statistics file. |
 
 ##### Partition Statistics File
 
@@ -1002,20 +1002,21 @@ These rows must be sorted (in ascending manner with NULL FIRST) by `partition` f
 
 The schema of the partition statistics file is as follows:
 
-| v1 | v2 | Field id, name | Type | Description |
-|----|----|----------------|------|-------------|
-|_required_|_required_|**`1 partition`**|`struct<..>`| Partition data tuple, schema based on the unified partition type considering all specs in a table |
-|_required_|_required_|**`2 spec_id`**|`int`| Partition spec id |
-|_required_|_required_|**`3 data_record_count`**|`long`| Count of records in data files |
-|_required_|_required_|**`4 data_file_count`**|`int`| Count of data files |
-|_required_|_required_|**`5 total_data_file_size_in_bytes`**|`long`| Total size of data files in bytes |
-|_optional_|_optional_|**`6 position_delete_record_count`**|`long`| Count of records in position delete files |
-|_optional_|_optional_|**`7 position_delete_file_count`**|`int`| Count of position delete files |
-|_optional_|_optional_|**`8 equality_delete_record_count`**|`long`| Count of records in equality delete files |
-|_optional_|_optional_|**`9 equality_delete_file_count`**|`int`| Count of equality delete files |
-|_optional_|_optional_|**`10 total_record_count`**|`long`| Accurate count of records in a partition after applying the delete files if any |
-|_optional_|_optional_|**`11 last_updated_at`**|`long`| Timestamp in milliseconds from the unix epoch when the partition was last updated |
-|_optional_|_optional_|**`12 last_updated_snapshot_id`**|`long`| ID of snapshot that last updated this partition |
+| v1 | v2 | v3 | Field id, name | Type | Description |
+|----|----|----|----------------|------|-------------|
+|_required_|_required_|_required_|**`1 partition`**|`struct<..>`| Partition data tuple, schema based on the unified partition type considering all specs in a table |
+|_required_|_required_|_required_|**`2 spec_id`**|`int`| Partition spec id |
+|_required_|_required_|_required_|**`3 data_record_count`**|`long`| Count of records in data files |
+|_required_|_required_|_required_|**`4 data_file_count`**|`int`| Count of data files |
+|_required_|_required_|_required_|**`5 total_data_file_size_in_bytes`**|`long`| Total size of data files in bytes |
+|_optional_|_optional_|_required_|**`6 position_delete_record_count`**|`long`| Count of position deletes across position delete files and deletion vectors |
+|_optional_|_optional_|_required_|**`7 position_delete_file_count`**|`int`| Count of position delete files ignoring deletion vectors |
+|||_required_|**`13 dv_count`**|`int`| Count of deletion vectors |
+|_optional_|_optional_|_required_|**`8 equality_delete_record_count`**|`long`| Count of records in equality delete files |
+|_optional_|_optional_|_required_|**`9 equality_delete_file_count`**|`int`| Count of equality delete files |
+|_optional_|_optional_|_optional_|**`10 total_record_count`**|`long`| Accurate count of records in a partition after applying deletes if any |
+|_optional_|_optional_|_optional_|**`11 last_updated_at`**|`long`| Timestamp in milliseconds from the unix epoch when the partition was last updated |
+|_optional_|_optional_|_optional_|**`12 last_updated_snapshot_id`**|`long`| ID of snapshot that last updated this partition |
 
 Note that partition data tuple's schema is based on the partition spec output using partition field ids for the struct field ids.
 The unified partition type is a struct containing all fields that have ever been a part of any spec in the table
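
The per-version requirements in the schema table above can be checked mechanically. The following is a minimal sketch, not part of the spec or of any Iceberg library: the `REQUIREMENTS` mapping is transcribed from the v1/v2/v3 columns of the new table, and the helper name `missing_required_fields` is hypothetical. It assumes a row is represented as a plain mapping from field name to value, with `None` standing for an absent field.

```python
from typing import Any, Mapping

# Requirement per format version (v1, v2, v3), transcribed from the table above.
# None means the field is not defined for that version. Names are illustrative.
REQUIREMENTS = {
    "partition": ("required", "required", "required"),
    "spec_id": ("required", "required", "required"),
    "data_record_count": ("required", "required", "required"),
    "data_file_count": ("required", "required", "required"),
    "total_data_file_size_in_bytes": ("required", "required", "required"),
    "position_delete_record_count": ("optional", "optional", "required"),
    "position_delete_file_count": ("optional", "optional", "required"),
    "dv_count": (None, None, "required"),
    "equality_delete_record_count": ("optional", "optional", "required"),
    "equality_delete_file_count": ("optional", "optional", "required"),
    "total_record_count": ("optional", "optional", "optional"),
    "last_updated_at": ("optional", "optional", "optional"),
    "last_updated_snapshot_id": ("optional", "optional", "optional"),
}


def missing_required_fields(row: Mapping[str, Any], format_version: int) -> list[str]:
    """Return the required fields that are absent (or None) in `row`."""
    if format_version not in (1, 2, 3):
        raise ValueError(f"unsupported format version: {format_version}")
    idx = format_version - 1
    return [
        name
        for name, reqs in REQUIREMENTS.items()
        if reqs[idx] == "required" and row.get(name) is None
    ]


# A row carrying only the fields that are required in v1 and v2.
row = {
    "partition": {"day": "2024-01-01"},
    "spec_id": 0,
    "data_record_count": 100,
    "data_file_count": 2,
    "total_data_file_size_in_bytes": 4096,
}
v2_missing = missing_required_fields(row, 2)
v3_missing = missing_required_fields(row, 3)
```

Under this sketch the same row is complete for v2 but lacks the five delete-related counters that become required in v3.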
@@ -1032,6 +1033,11 @@ The unified partition type looks like `Struct<field#1, field#2, field#3>`.
 and then the table has evolved into `spec#1` which has just one field `{field#2}`.
 The unified partition type looks like `Struct<field#1, field#2>`.
 
+When a v2 table is upgraded to v3 or later, the `position_delete_record_count` field must account for all position deletes, including those from remaining v2 position delete files and any deletion vectors added after the upgrade.
+
+Calculating `total_record_count` for a table with equality deletes or v2 position delete files requires reading data. In such cases, implementations may omit this field and must write `NULL`, indicating that the exact record count in a partition is unknown.
+If a table has no deletes or only deletion vectors, implementations are encouraged to populate `total_record_count` using metadata in manifests.
+
 #### Encryption Keys
 
 Keys used for table encryption can be tracked in table metadata as a list named `encryption-keys`. The schema of each key is a struct with the following fields:
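
The counting rules added in this hunk for `position_delete_record_count` and `total_record_count` can be illustrated with a short sketch. Everything here is hypothetical and not taken from any Iceberg implementation: the `PartitionDeleteCounts` class, its field names, and both helper functions exist only for illustration. The sketch also assumes that every position recorded in a deletion vector removes exactly one distinct row, so that a metadata-only subtraction is exact.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartitionDeleteCounts:
    """Hypothetical per-partition counters gathered from manifest metadata."""
    data_record_count: int
    v2_position_delete_records: int  # records in remaining v2 position delete files
    v2_position_delete_files: int
    dv_records: int                  # positions marked deleted across deletion vectors
    dv_count: int
    equality_delete_files: int


def position_delete_record_count(c: PartitionDeleteCounts) -> int:
    # After a v2 -> v3 upgrade, both sources of position deletes are counted.
    return c.v2_position_delete_records + c.dv_records


def total_record_count(c: PartitionDeleteCounts) -> Optional[int]:
    # Equality deletes or v2 position delete files would require reading data,
    # so the exact count is reported as unknown (None, written as NULL).
    if c.equality_delete_files > 0 or c.v2_position_delete_files > 0:
        return None
    # Assumption: each deletion-vector position removes one distinct row.
    return c.data_record_count - c.dv_records


dv_only = PartitionDeleteCounts(100, 0, 0, 7, 2, 0)
upgraded = PartitionDeleteCounts(100, 5, 1, 7, 2, 0)
```

With these inputs, the partition that has only deletion vectors gets an exact total from metadata, while the upgraded partition that still holds a v2 position delete file reports the total as unknown and counts position deletes from both sources.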