Commit 69e3836

Move file system migration guides

We are removing the docs for the legacy systems, but for now we want to keep the migration guides so that people looking at the current docs can still read how to upgrade. We will remove these docs in the future, potentially as part of the removal of the code, or even later.

1 parent b240af5 commit 69e3836

6 files changed: +335 -9 lines changed

docs/src/main/sphinx/object-storage/file-system-azure.md

Lines changed: 49 additions & 0 deletions

@@ -117,3 +117,52 @@ storage accounts:

use the **Client ID**, **Secret** and **Tenant ID** values from the
application registration, to configure the catalog using properties from
[](azure-oauth-authentication).

(fs-legacy-azure-migration)=
## Migration from legacy Azure Storage file system

Trino includes legacy Azure Storage support to use with a catalog using the
Delta Lake, Hive, Hudi, or Iceberg connectors. Upgrading existing deployments to
the current native implementation is recommended. Legacy support is deprecated
and will be removed.

To migrate a catalog to use the native file system implementation for Azure,
make the following edits to your catalog configuration:

1. Add the `fs.native-azure.enabled=true` catalog configuration property.
2. Configure the `azure.auth-type` catalog configuration property.
3. Refer to the following table to rename your existing legacy catalog
   configuration properties to the corresponding native configuration
   properties. Supported configuration values are identical unless otherwise
   noted.

   :::{list-table}
   :widths: 35, 35, 65
   :header-rows: 1
   * - Legacy property
     - Native property
     - Notes
   * - `hive.azure.abfs-access-key`
     - `azure.access-key`
     -
   * - `hive.azure.abfs.oauth.endpoint`
     - `azure.oauth.endpoint`
     - Also see `azure.oauth.tenant-id` in [](azure-oauth-authentication).
   * - `hive.azure.abfs.oauth.client-id`
     - `azure.oauth.client-id`
     -
   * - `hive.azure.abfs.oauth.secret`
     - `azure.oauth.secret`
     -
   * - `hive.azure.abfs.oauth2.passthrough`
     - `azure.use-oauth-passthrough-token`
     -
   :::

4. Remove the following legacy configuration properties if they exist in your
   catalog configuration:

   * `hive.azure.abfs-storage-account`
   * `hive.azure.wasb-access-key`
   * `hive.azure.wasb-storage-account`
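As a sketch of the migration steps above, the following before-and-after comparison assumes a hypothetical Delta Lake catalog that authenticates with a storage account access key; the metastore URI, account name, and key value are placeholders:

```properties
# Before: legacy Azure Storage file system
connector.name=delta_lake
hive.metastore.uri=thrift://example.net:9083
hive.azure.abfs-storage-account=examplestorageaccount
hive.azure.abfs-access-key=examplekey

# After: native Azure Storage file system
connector.name=delta_lake
hive.metastore.uri=thrift://example.net:9083
fs.native-azure.enabled=true
azure.auth-type=ACCESS_KEY
azure.access-key=examplekey
```

Note that `hive.azure.abfs-storage-account` has no native equivalent and is removed: the native implementation reads the storage account from the `abfs://` location URI.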
docs/src/main/sphinx/object-storage/file-system-gcs.md

Lines changed: 31 additions & 0 deletions

@@ -78,3 +78,34 @@ Cloud Storage:

- Path to the JSON file on each node that contains your Google Cloud Platform
  service account key. Not to be set together with `gcs.json-key`.
:::

(fs-legacy-gcs-migration)=
## Migration from legacy Google Cloud Storage file system

Trino includes legacy Google Cloud Storage support to use with a catalog using
the Delta Lake, Hive, Hudi, or Iceberg connectors. Upgrading existing
deployments to the current native implementation is recommended. Legacy support
is deprecated and will be removed.

To migrate a catalog to use the native file system implementation for Google
Cloud Storage, make the following edits to your catalog configuration:

1. Add the `fs.native-gcs.enabled=true` catalog configuration property.
2. Refer to the following table to rename your existing legacy catalog
   configuration properties to the corresponding native configuration
   properties. Supported configuration values are identical unless otherwise
   noted.

   :::{list-table}
   :widths: 35, 35, 65
   :header-rows: 1
   * - Legacy property
     - Native property
     - Notes
   * - `hive.gcs.use-access-token`
     - `gcs.use-access-token`
     -
   * - `hive.gcs.json-key-file-path`
     - `gcs.json-key-file-path`
     - Also see `gcs.json-key` in preceding sections.
   :::
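A minimal before-and-after sketch of the steps above, assuming a hypothetical Delta Lake catalog authenticating with a JSON key file (the metastore URI and key file path are placeholders):

```properties
# Before: legacy Google Cloud Storage file system
connector.name=delta_lake
hive.metastore.uri=thrift://example.net:9083
hive.gcs.json-key-file-path=/secrets/gcp-credentials.json

# After: native Google Cloud Storage file system
connector.name=delta_lake
hive.metastore.uri=thrift://example.net:9083
fs.native-gcs.enabled=true
gcs.json-key-file-path=/secrets/gcp-credentials.json
```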

docs/src/main/sphinx/object-storage/file-system-s3.md

Lines changed: 127 additions & 0 deletions

@@ -277,3 +277,130 @@ Example JSON configuration:

are converted to a colon.
Choose a value not used in any of your IAM ARNs.
:::

(fs-legacy-s3-migration)=
## Migration from legacy S3 file system

Trino includes legacy Amazon S3 support to use with a catalog using the Delta
Lake, Hive, Hudi, or Iceberg connectors. Upgrading existing deployments to the
current native implementation is recommended. Legacy support is deprecated and
will be removed.

To migrate a catalog to use the native file system implementation for S3, make
the following edits to your catalog configuration:

1. Add the `fs.native-s3.enabled=true` catalog configuration property.
2. Refer to the following table to rename your existing legacy catalog
   configuration properties to the corresponding native configuration
   properties. Supported configuration values are identical unless otherwise
   noted.

   :::{list-table}
   :widths: 35, 35, 65
   :header-rows: 1
   * - Legacy property
     - Native property
     - Notes
   * - `hive.s3.aws-access-key`
     - `s3.aws-access-key`
     -
   * - `hive.s3.aws-secret-key`
     - `s3.aws-secret-key`
     -
   * - `hive.s3.iam-role`
     - `s3.iam-role`
     - Also see `s3.role-session-name` in preceding sections for more role
       configuration options.
   * - `hive.s3.external-id`
     - `s3.external-id`
     -
   * - `hive.s3.endpoint`
     - `s3.endpoint`
     - Add the `https://` prefix to make the value a correct URL.
   * - `hive.s3.region`
     - `s3.region`
     -
   * - `hive.s3.sse.enabled`
     - None
     - `s3.sse.type` set to the default value of `NONE` is equivalent to
       `hive.s3.sse.enabled=false`.
   * - `hive.s3.sse.type`
     - `s3.sse.type`
     -
   * - `hive.s3.sse.kms-key-id`
     - `s3.sse.kms-key-id`
     -
   * - `hive.s3.upload-acl-type`
     - `s3.canned-acl`
     - See preceding sections for supported values.
   * - `hive.s3.streaming.part-size`
     - `s3.streaming.part-size`
     -
   * - `hive.s3.proxy.host`, `hive.s3.proxy.port`
     - `s3.http-proxy`
     - Specify the host and port in one URL, for example `localhost:8888`.
   * - `hive.s3.proxy.protocol`
     - `s3.http-proxy.secure`
     - Set to `TRUE` to enable HTTPS.
   * - `hive.s3.proxy.non-proxy-hosts`
     - `s3.http-proxy.non-proxy-hosts`
     -
   * - `hive.s3.proxy.username`
     - `s3.http-proxy.username`
     -
   * - `hive.s3.proxy.password`
     - `s3.http-proxy.password`
     -
   * - `hive.s3.proxy.preemptive-basic-auth`
     - `s3.http-proxy.preemptive-basic-auth`
     -
   * - `hive.s3.sts.endpoint`
     - `s3.sts.endpoint`
     -
   * - `hive.s3.sts.region`
     - `s3.sts.region`
     -
   * - `hive.s3.max-error-retries`
     - `s3.max-error-retries`
     - Also see `s3.retry-mode` in preceding sections for more retry behavior
       configuration options.
   * - `hive.s3.connect-timeout`
     - `s3.connect-timeout`
     -
   * - `hive.s3.connect-ttl`
     - `s3.connection-ttl`
     - Also see `s3.connection-max-idle-time` in preceding sections for more
       connection keep-alive options.
   * - `hive.s3.socket-timeout`
     - `s3.socket-read-timeout`
     - Also see `s3.tcp-keep-alive` in preceding sections for more socket
       connection keep-alive options.
   * - `hive.s3.max-connections`
     - `s3.max-connections`
     -
   * - `hive.s3.path-style-access`
     - `s3.path-style-access`
     -
   :::

3. Remove the following legacy configuration properties if they exist in your
   catalog configuration:

   * `hive.s3.storage-class`
   * `hive.s3.signer-type`
   * `hive.s3.signer-class`
   * `hive.s3.staging-directory`
   * `hive.s3.pin-client-to-current-region`
   * `hive.s3.ssl.enabled`
   * `hive.s3.sse.enabled`
   * `hive.s3.kms-key-id`
   * `hive.s3.encryption-materials-provider`
   * `hive.s3.streaming.enabled`
   * `hive.s3.max-client-retries`
   * `hive.s3.max-backoff-time`
   * `hive.s3.max-retry-time`
   * `hive.s3.multipart.min-file-size`
   * `hive.s3.multipart.min-part-size`
   * `hive.s3-file-system-type`
   * `hive.s3.user-agent-prefix`
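A before-and-after sketch of the S3 migration steps above, assuming a hypothetical Hive catalog using static AWS credentials; all values are placeholders, and note how the endpoint gains the `https://` prefix while `hive.s3.ssl.enabled` is dropped:

```properties
# Before: legacy S3 file system
connector.name=hive
hive.metastore.uri=thrift://example.net:9083
hive.s3.aws-access-key=AKIAEXAMPLE
hive.s3.aws-secret-key=examplesecret
hive.s3.endpoint=s3.us-east-1.amazonaws.com
hive.s3.ssl.enabled=true

# After: native S3 file system
connector.name=hive
hive.metastore.uri=thrift://example.net:9083
fs.native-s3.enabled=true
s3.aws-access-key=AKIAEXAMPLE
s3.aws-secret-key=examplesecret
s3.endpoint=https://s3.us-east-1.amazonaws.com
s3.region=us-east-1
```

Whether you need `s3.region` explicitly depends on your deployment; it is shown here for completeness.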
Lines changed: 120 additions & 0 deletions

@@ -0,0 +1,120 @@

# Legacy Google Cloud Storage support

Object storage connectors can access
[Google Cloud Storage](https://cloud.google.com/storage/) data using the
`gs://` URI prefix.

:::{warning}
Legacy support is not recommended and will be removed. Use [](file-system-gcs).
:::

## Requirements

To use Google Cloud Storage with non-anonymous access objects, you need:

- A [Google Cloud service account](https://console.cloud.google.com/projectselector2/iam-admin/serviceaccounts)
- The key for the service account in JSON format

(hive-google-cloud-storage-configuration)=
## Configuration

To use legacy support, the `fs.hadoop.enabled` property must be set to `true` in
your catalog configuration file.

The use of Google Cloud Storage as a storage location for an object storage
catalog requires setting a configuration property that defines the
[authentication method for any non-anonymous access object](https://cloud.google.com/storage/docs/authentication).
Access methods cannot be combined.

The default root path used by the `gs://` prefix is set in the catalog by the
contents of the specified key file, or the key file used to create the OAuth
token.

:::{list-table} Google Cloud Storage configuration properties
:widths: 35, 65
:header-rows: 1

* - Property Name
  - Description
* - `hive.gcs.json-key-file-path`
  - JSON key file used to authenticate your Google Cloud service account with
    Google Cloud Storage.
* - `hive.gcs.use-access-token`
  - Use client-provided OAuth token to access Google Cloud Storage.
:::

The following example uses the Delta Lake connector in a minimal configuration
file for an object storage catalog using a JSON key file:

```properties
connector.name=delta_lake
hive.metastore.uri=thrift://example.net:9083
hive.gcs.json-key-file-path=${ENV:GCP_CREDENTIALS_FILE_PATH}
```

## General usage

Create a schema to use if one does not already exist, as in the following
example:

```sql
CREATE SCHEMA storage_catalog.sales_data_in_gcs WITH (location = 'gs://example_location');
```

Once you have created a schema, you can create tables in the schema, as in the
following example:

```sql
CREATE TABLE storage_catalog.sales_data_in_gcs.orders (
  orderkey BIGINT,
  custkey BIGINT,
  orderstatus VARCHAR(1),
  totalprice DOUBLE,
  orderdate DATE,
  orderpriority VARCHAR(15),
  clerk VARCHAR(15),
  shippriority INTEGER,
  comment VARCHAR(79)
);
```

This statement creates the folder `orders` in the `gs://example_location` root
folder defined for the schema.

Your table is now ready to populate with data using `INSERT` statements.
Alternatively, you can use `CREATE TABLE AS` statements to create and
populate the table in a single statement.

## Migration to Google Cloud Storage file system

Trino includes a [native implementation to access Google Cloud
Storage](/object-storage/file-system-gcs) with a catalog using the Delta Lake,
Hive, Hudi, or Iceberg connectors. Upgrading existing deployments to the new
native implementation is recommended. Legacy support is deprecated and will be
removed.

To migrate a catalog to use the native file system implementation for Google
Cloud Storage, make the following edits to your catalog configuration:

1. Add the `fs.native-gcs.enabled=true` catalog configuration property.
2. Refer to the following table to rename your existing legacy catalog
   configuration properties to the corresponding native configuration
   properties. Supported configuration values are identical unless otherwise
   noted.

   :::{list-table}
   :widths: 35, 35, 65
   :header-rows: 1
   * - Legacy property
     - Native property
     - Notes
   * - `hive.gcs.use-access-token`
     - `gcs.use-access-token`
     -
   * - `hive.gcs.json-key-file-path`
     - `gcs.json-key-file-path`
     - Also see `gcs.json-key` in [](/object-storage/file-system-gcs)
   :::

For more information, see [](/object-storage/file-system-gcs).
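As a sketch of the `CREATE TABLE AS` alternative mentioned in the general usage section, the following statement creates and populates a hypothetical `orders_copy` table in one step, copying from the `orders` table created earlier:

```sql
CREATE TABLE storage_catalog.sales_data_in_gcs.orders_copy
AS SELECT * FROM storage_catalog.sales_data_in_gcs.orders;
```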

docs/src/main/sphinx/object-storage/legacy-s3.md

Lines changed: 0 additions & 1 deletion

@@ -335,7 +335,6 @@ the `org.apache.hadoop.conf.Configurable` interface from the Hadoop Java API, th

  is passed in after the object instance is created, and before it is asked to provision or retrieve any
  encryption keys.

- (fs-legacy-s3-migration)=

  ## Migration to S3 file system

  Trino includes a [native implementation to access Amazon

docs/src/main/sphinx/release/release-458.md

Lines changed: 8 additions & 8 deletions

@@ -20,8 +20,8 @@

    support](file-system-configuration) with
    `fs.native-azure.enabled`,`fs.native-gcs.enabled`, `fs.native-s3.enabled`, or
    `fs.hadoop.enabled` in each catalog. Use the migration guides for [Azure
-   Storage](fs-legacy-azure-migration), Google Cloud
-   Storage, and [S3](fs-legacy-s3-migration) to assist
+   Storage](fs-legacy-azure-migration), [Google Cloud
+   Storage](fs-legacy-gcs-migration), and [S3](fs-legacy-s3-migration) to assist
    if you have not switched from legacy support. ({issue}`23343`)
  * Add JMX monitoring to the [](/object-storage/file-system-s3). ({issue}`23177`)
  * Reduce the number of file system operations when reading from Delta Lake

@@ -39,8 +39,8 @@

    support](file-system-configuration) with
    `fs.native-azure.enabled`,`fs.native-gcs.enabled`, `fs.native-s3.enabled`, or
    `fs.hadoop.enabled` in each catalog. Use the migration guides for [Azure
-   Storage](fs-legacy-azure-migration), Google Cloud
-   Storage, and [S3](fs-legacy-s3-migration) to assist
+   Storage](fs-legacy-azure-migration), [Google Cloud
+   Storage](fs-legacy-gcs-migration), and [S3](fs-legacy-s3-migration) to assist
    if you have not switched from legacy support. ({issue}`23343`)
  * Add JMX monitoring to the native S3 file system support. ({issue}`23177`)
  * Reduce the number of file system operations when reading tables with file system

@@ -57,8 +57,8 @@

    support](file-system-configuration) with
    `fs.native-azure.enabled`,`fs.native-gcs.enabled`, `fs.native-s3.enabled`, or
    `fs.hadoop.enabled` in each catalog. Use the migration guides for [Azure
-   Storage](fs-legacy-azure-migration), Google Cloud
-   Storage, and [S3](fs-legacy-s3-migration) to assist
+   Storage](fs-legacy-azure-migration), [Google Cloud
+   Storage](fs-legacy-gcs-migration), and [S3](fs-legacy-s3-migration) to assist
    if you have not switched from legacy support. ({issue}`23343`)
  * Add JMX monitoring to the native S3 file system support. ({issue}`23177`)
  * Fix rare, long planning times when Hive metastore caching is enabled. ({issue}`23401`)

@@ -70,8 +70,8 @@

    support](file-system-configuration) with
    `fs.native-azure.enabled`,`fs.native-gcs.enabled`, `fs.native-s3.enabled`, or
    `fs.hadoop.enabled` in each catalog. Use the migration guides for [Azure
-   Storage](fs-legacy-azure-migration), Google Cloud
-   Storage, and [S3](fs-legacy-s3-migration) to assist
+   Storage](fs-legacy-azure-migration), [Google Cloud
+   Storage](fs-legacy-gcs-migration), and [S3](fs-legacy-s3-migration) to assist
    if you have not switched from legacy support. ({issue}`23343`)
  * Add JMX monitoring to the native S3 file system support. ({issue}`23177`)
  * Fix rare, long planning times when Hive metastore caching is enabled. ({issue}`23401`)
