|`write.parquet.compression-codec`|`{uncompressed,zstd,gzip,snappy}`| zstd | Sets the Parquet compression codec.|
60
-
|`write.parquet.compression-level`| Integer | null | Parquet compression level for the codec. If not set, PyIceberg chooses the level|
61
-
|`write.parquet.row-group-limit`| Number of rows | 1048576 | The upper bound of the number of entries within a single row group|
62
-
|`write.parquet.page-size-bytes`| Size in bytes | 1MB | Set a target threshold for the approximate encoded size of data pages within a column chunk|
63
-
|`write.parquet.page-row-limit`| Number of rows | 20000 | Set a target threshold for the maximum number of rows within a column chunk|
64
-
|`write.parquet.dict-size-bytes`| Size in bytes | 2MB | Set the dictionary page size limit per row group|
65
-
|`write.metadata.previous-versions-max`| Integer | 100 | The max number of previous version metadata files to keep before deleting after commit.|
66
-
|`write.object-storage.enabled`| Boolean | True | Enables the [`ObjectStoreLocationProvider`](configuration.md#object-store-location-provider) that adds a hash component to file paths. Note: the default value of `True` differs from Iceberg's Java implementation |
67
-
|`write.object-storage.partitioned-paths`| Boolean | True | Controls whether [partition values are included in file paths](configuration.md#partition-exclusion) when object storage is enabled|
68
-
|`write.py-location-provider.impl`| String of form `module.ClassName`| null | Optional, [custom `LocationProvider`](configuration.md#loading-a-custom-location-provider) implementation|
69
-
|`write.data.path`| String pointing to location |∅ | Sets the location where to write the data. If not set, it will use the table location postfixed with `data/`. |
|`write.parquet.compression-codec`|`{uncompressed,zstd,gzip,snappy}`| zstd | Sets the Parquet compression codec.|
60
+
|`write.parquet.compression-level`| Integer | null | Parquet compression level for the codec. If not set, PyIceberg chooses the level|
61
+
|`write.parquet.row-group-limit`| Number of rows | 1048576 | The upper bound of the number of entries within a single row group|
62
+
|`write.parquet.page-size-bytes`| Size in bytes | 1MB | Set a target threshold for the approximate encoded size of data pages within a column chunk|
63
+
|`write.parquet.page-row-limit`| Number of rows | 20000 | Set a target threshold for the maximum number of rows within a column chunk|
64
+
|`write.parquet.dict-size-bytes`| Size in bytes | 2MB | Set the dictionary page size limit per row group|
65
+
|`write.metadata.previous-versions-max`| Integer | 100 | The max number of previous version metadata files to keep before deleting after commit.|
66
+
|`write.object-storage.enabled`| Boolean | True | Enables the [`ObjectStoreLocationProvider`](configuration.md#object-store-location-provider) that adds a hash component to file paths. Note: the default value of `True` differs from Iceberg's Java implementation |
67
+
|`write.object-storage.partitioned-paths`| Boolean | True | Controls whether [partition values are included in file paths](configuration.md#partition-exclusion) when object storage is enabled |
68
+
|`write.py-location-provider.impl`| String of form `module.ClassName`| null | Optional, [custom `LocationProvider`](configuration.md#loading-a-custom-location-provider) implementation |
69
+
|`write.data.path`| String pointing to location |`{metadata.location}/data`| Sets the location under which data is written. |
70
70
71
71
### Table behavior options
72
72
@@ -211,8 +211,8 @@ file paths that are optimized for object storage.
211
211
212
212
### Simple Location Provider
213
213
214
-
The `SimpleLocationProvider` places a table's file names underneath a `data` directory in the table's base storage
215
-
location (this is `table.metadata.location` - see the [Iceberg table specification](https://iceberg.apache.org/spec/#table-metadata)).
214
+
The `SimpleLocationProvider` provides paths prefixed by `{location}/data/`, where `location` comes from the [table metadata](https://iceberg.apache.org/spec/#table-metadata-fields). This can be overridden by setting the [`write.data.path` table configuration](#write-options).
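
The prefix rule can be sketched in plain Python. This is a minimal illustration under stated assumptions, not PyIceberg's implementation: `resolve_data_prefix` and `simple_data_file_path` are hypothetical helpers, and the bucket and file names are made up.

```python
def resolve_data_prefix(table_location: str, properties: dict[str, str]) -> str:
    """Return the prefix under which data files are written (hypothetical helper)."""
    # `write.data.path` takes precedence when it is set.
    override = properties.get("write.data.path")
    if override:
        return override.rstrip("/")
    # Otherwise fall back to `{location}/data`.
    return f"{table_location.rstrip('/')}/data"


def simple_data_file_path(table_location: str, properties: dict[str, str], file_name: str) -> str:
    """Join the resolved prefix and the file name, as a simple provider might."""
    return f"{resolve_data_prefix(table_location, properties)}/{file_name}"


default_path = simple_data_file_path("s3://bucket/ns/table", {}, "0000-0-example.parquet")
custom_path = simple_data_file_path(
    "s3://bucket/ns/table",
    {"write.data.path": "s3://other-bucket/files"},
    "0000-0-example.parquet",
)
print(default_path)
print(custom_path)
```

With no override the file lands under the table location's `data` directory; with `write.data.path` set, the table location no longer appears in the data file path.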
215
+
216
216
For example, a non-partitioned table might have a data file with location:
217
217
218
218
```txt
@@ -240,9 +240,9 @@ When several files are stored under the same prefix, cloud object stores such as
240
240
resulting in slowdowns. The `ObjectStoreLocationProvider` counteracts this by injecting deterministic hashes, in the form of binary directories,
241
241
into file paths, to distribute files across a larger number of object store prefixes.
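
The idea of injecting deterministic binary directories can be sketched as follows. This is an illustration only: SHA-256 from the standard library stands in for whatever hash function the real provider uses, the bit widths (20 bits, split into three 4-bit directories plus a remainder) are assumptions chosen for the example, and `entropy_dirs` and `object_store_path` are hypothetical names. Partition values are left out for brevity.

```python
import hashlib


def entropy_dirs(file_name: str, bits: int = 20, dir_len: int = 4, depth: int = 3) -> str:
    """Turn a deterministic hash of `file_name` into nested binary directories."""
    # Stand-in hash: SHA-256 is an assumption, not the provider's actual hash.
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    # Keep the lowest `bits` bits and render them as a zero-padded binary string.
    value = int.from_bytes(digest[:4], "big") & ((1 << bits) - 1)
    binary = format(value, f"0{bits}b")
    # The first `depth` groups of `dir_len` bits become directories;
    # the remaining bits form the final directory.
    cut = dir_len * depth
    parts = [binary[i:i + dir_len] for i in range(0, cut, dir_len)]
    parts.append(binary[cut:])
    return "/".join(parts)


def object_store_path(prefix: str, file_name: str) -> str:
    """Place the hash directories between the data prefix and the file name."""
    return f"{prefix}/{entropy_dirs(file_name)}/{file_name}"


path = object_store_path("s3://bucket/ns/table/data", "0000-0-example.parquet")
print(path)
```

Because the directories are derived from the file name alone, the same file always maps to the same path, while different files spread across many prefixes.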
242
242
243
-
Paths still contain partitions just before the file name, in Hive-style, and a `data` directory beneath the table's location,
244
-
in a similar manner to the [`SimpleLocationProvider`](configuration.md#simple-location-provider). For example, a table
245
-
partitioned over a string column `category` might have a data file with location: (note the additional binary directories)
243
+
Paths are still prefixed by `{location}/data/`, where `location` comes from the [table metadata](https://iceberg.apache.org/spec/#table-metadata-fields), in a similar manner to the [`SimpleLocationProvider`](configuration.md#simple-location-provider). This can be overridden by setting the [`write.data.path` table configuration](#write-options).
244
+
245
+
For example, a table partitioned over a string column `category` might have a data file with location: (note the additional binary directories)