Commit a214e9b

Spec: Update partition stats for V3 (#12098)
1 parent 7f3f450 commit a214e9b

1 file changed: +25 -19 lines changed

format/spec.md

Lines changed: 25 additions & 19 deletions
@@ -989,11 +989,11 @@ Partition statistics file must be registered in the table metadata file to be co
 
 `partition-statistics` field of table metadata is an optional list of structs with the following fields:
 
-| v1 | v2 | Field name | Type | Description |
-|----|----|------------|------|-------------|
-| _required_ | _required_ | **`snapshot-id`** | `long` | ID of the Iceberg table's snapshot the partition statistics file is associated with. |
-| _required_ | _required_ | **`statistics-path`** | `string` | Path of the partition statistics file. See [Partition statistics file](#partition-statistics-file). |
-| _required_ | _required_ | **`file-size-in-bytes`** | `long` | Size of the partition statistics file. |
+| v1 | v2 | v3 | Field name | Type | Description |
+|----|----|----|------------|------|-------------|
+| _required_ | _required_ | _required_ | **`snapshot-id`** | `long` | ID of the Iceberg table's snapshot the partition statistics file is associated with. |
+| _required_ | _required_ | _required_ | **`statistics-path`** | `string` | Path of the partition statistics file. See [Partition statistics file](#partition-statistics-file). |
+| _required_ | _required_ | _required_ | **`file-size-in-bytes`** | `long` | Size of the partition statistics file. |
 
 ##### Partition Statistics File
 

@@ -1002,20 +1002,21 @@ These rows must be sorted (in ascending manner with NULL FIRST) by `partition` f
 
 The schema of the partition statistics file is as follows:
 
-| v1 | v2 | Field id, name | Type | Description |
-|----|----|----------------|------|-------------|
-| _required_ | _required_ | **`1 partition`** | `struct<..>` | Partition data tuple, schema based on the unified partition type considering all specs in a table |
-| _required_ | _required_ | **`2 spec_id`** | `int` | Partition spec id |
-| _required_ | _required_ | **`3 data_record_count`** | `long` | Count of records in data files |
-| _required_ | _required_ | **`4 data_file_count`** | `int` | Count of data files |
-| _required_ | _required_ | **`5 total_data_file_size_in_bytes`** | `long` | Total size of data files in bytes |
-| _optional_ | _optional_ | **`6 position_delete_record_count`** | `long` | Count of records in position delete files |
-| _optional_ | _optional_ | **`7 position_delete_file_count`** | `int` | Count of position delete files |
-| _optional_ | _optional_ | **`8 equality_delete_record_count`** | `long` | Count of records in equality delete files |
-| _optional_ | _optional_ | **`9 equality_delete_file_count`** | `int` | Count of equality delete files |
-| _optional_ | _optional_ | **`10 total_record_count`** | `long` | Accurate count of records in a partition after applying the delete files if any |
-| _optional_ | _optional_ | **`11 last_updated_at`** | `long` | Timestamp in milliseconds from the unix epoch when the partition was last updated |
-| _optional_ | _optional_ | **`12 last_updated_snapshot_id`** | `long` | ID of snapshot that last updated this partition |
+| v1 | v2 | v3 | Field id, name | Type | Description |
+|----|----|----|----------------|------|-------------|
+| _required_ | _required_ | _required_ | **`1 partition`** | `struct<..>` | Partition data tuple, schema based on the unified partition type considering all specs in a table |
+| _required_ | _required_ | _required_ | **`2 spec_id`** | `int` | Partition spec id |
+| _required_ | _required_ | _required_ | **`3 data_record_count`** | `long` | Count of records in data files |
+| _required_ | _required_ | _required_ | **`4 data_file_count`** | `int` | Count of data files |
+| _required_ | _required_ | _required_ | **`5 total_data_file_size_in_bytes`** | `long` | Total size of data files in bytes |
+| _optional_ | _optional_ | _required_ | **`6 position_delete_record_count`** | `long` | Count of position deletes across position delete files and deletion vectors |
+| _optional_ | _optional_ | _required_ | **`7 position_delete_file_count`** | `int` | Count of position delete files ignoring deletion vectors |
+| | | _required_ | **`13 dv_count`** | `int` | Count of deletion vectors |
+| _optional_ | _optional_ | _required_ | **`8 equality_delete_record_count`** | `long` | Count of records in equality delete files |
+| _optional_ | _optional_ | _required_ | **`9 equality_delete_file_count`** | `int` | Count of equality delete files |
+| _optional_ | _optional_ | _optional_ | **`10 total_record_count`** | `long` | Accurate count of records in a partition after applying deletes if any |
+| _optional_ | _optional_ | _optional_ | **`11 last_updated_at`** | `long` | Timestamp in milliseconds from the unix epoch when the partition was last updated |
+| _optional_ | _optional_ | _optional_ | **`12 last_updated_snapshot_id`** | `long` | ID of snapshot that last updated this partition |
 
 Note that partition data tuple's schema is based on the partition spec output using partition field ids for the struct field ids.
 The unified partition type is a struct containing all fields that have ever been a part of any spec in the table
@@ -1032,6 +1033,11 @@ The unified partition type looks like `Struct<field#1, field#2, field#3>`.
 and then the table has evolved into `spec#1` which has just one field `{field#2}`.
 The unified partition type looks like `Struct<field#1, field#2>`.
 
+When a v2 table is upgraded to v3 or later, the `position_delete_record_count` field must account for all position deletes, including those from remaining v2 position delete files and any deletion vectors added after the upgrade.
+
+Calculating `total_record_count` for a table with equality deletes or v2 position delete files requires reading data. In such cases, implementations may omit this field and must write `NULL`, indicating that the exact record count in a partition is unknown.
+If a table has no deletes or only deletion vectors, implementations are encouraged to populate `total_record_count` using metadata in manifests.
+
 #### Encryption Keys
 
 Keys used for table encryption can be tracked in table metadata as a list named `encryption-keys`. The schema of each key is a struct with the following fields:
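
Editor's note: the v3 rules added in the last hunk can be read as a small decision procedure. The Python sketch below is only an illustration of that reading, not code from the Iceberg project. The `PartitionDeleteCounts` class, its field names, and the `v3_delete_stats` function are hypothetical. It also assumes that the positions recorded in deletion vectors are distinct rows of the partition's data files, so that subtracting them from `data_record_count` gives an exact count when no other delete type is present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartitionDeleteCounts:
    """Hypothetical per-partition totals, assumed to be gathered from manifests."""
    data_record_count: int
    v2_position_delete_records: int  # records in remaining v2 position delete files
    v2_position_delete_files: int
    dv_deleted_positions: int        # positions marked deleted across all deletion vectors
    dv_count: int
    equality_delete_records: int
    equality_delete_files: int


def v3_delete_stats(c: PartitionDeleteCounts) -> dict:
    """Derive the delete-related v3 partition stats fields for one partition."""
    row = {
        # Must cover v2 position delete files AND deletion vectors.
        "position_delete_record_count": c.v2_position_delete_records + c.dv_deleted_positions,
        # Counts position delete files only; deletion vectors are tracked in dv_count.
        "position_delete_file_count": c.v2_position_delete_files,
        "dv_count": c.dv_count,
        "equality_delete_record_count": c.equality_delete_records,
        "equality_delete_file_count": c.equality_delete_files,
    }
    # Equality deletes or v2 position delete files would require reading data,
    # so the exact count is left unknown (NULL) in that case.
    needs_data_read = c.equality_delete_files > 0 or c.v2_position_delete_files > 0
    total: Optional[int] = None
    if not needs_data_read:
        # Assumption: DV positions are distinct rows, so plain subtraction is exact.
        total = c.data_record_count - c.dv_deleted_positions
    row["total_record_count"] = total
    return row


# Partition with only deletion vectors: total_record_count is 100 - 7 = 93.
only_dvs = v3_delete_stats(PartitionDeleteCounts(100, 0, 0, 7, 2, 0, 0))

# Upgraded v2 table that still has one v2 position delete file: total is None.
mixed = v3_delete_stats(PartitionDeleteCounts(100, 5, 1, 7, 2, 0, 0))
```

In the second case `position_delete_record_count` is still the sum of both sources (12 here), while `total_record_count` stays `None` because resolving the v2 position delete file against the data would require a read.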
