Commit 69e3836

Move file system migration guides

We are removing the docs for the legacy systems, but for now we want to keep the migration guides so that people looking at the current docs can still read how to upgrade. We will remove these docs in the future, potentially as part of the removal of the code, or even later.

1 parent b240af5 commit 69e3836

6 files changed: +335 -9 lines changed

docs/src/main/sphinx/object-storage/file-system-azure.md

Lines changed: 49 additions & 0 deletions

@@ -117,3 +117,52 @@ storage accounts:

use the **Client ID**, **Secret** and **Tenant ID** values from the
application registration, to configure the catalog using properties from
[](azure-oauth-authentication).

(fs-legacy-azure-migration)=
## Migration from legacy Azure Storage file system

Trino includes legacy Azure Storage support to use with a catalog using the
Delta Lake, Hive, Hudi, or Iceberg connectors. Upgrading existing deployments to
the current native implementation is recommended. Legacy support is deprecated
and will be removed.

To migrate a catalog to use the native file system implementation for Azure,
make the following edits to your catalog configuration:

1. Add the `fs.native-azure.enabled=true` catalog configuration property.
2. Configure the `azure.auth-type` catalog configuration property.
3. Refer to the following table to rename your existing legacy catalog
   configuration properties to the corresponding native configuration
   properties. Supported configuration values are identical unless otherwise
   noted.

   :::{list-table}
   :widths: 35, 35, 65
   :header-rows: 1
   * - Legacy property
     - Native property
     - Notes
   * - `hive.azure.abfs-access-key`
     - `azure.access-key`
     -
   * - `hive.azure.abfs.oauth.endpoint`
     - `azure.oauth.endpoint`
     - Also see `azure.oauth.tenant-id` in [](azure-oauth-authentication).
   * - `hive.azure.abfs.oauth.client-id`
     - `azure.oauth.client-id`
     -
   * - `hive.azure.abfs.oauth.secret`
     - `azure.oauth.secret`
     -
   * - `hive.azure.abfs.oauth2.passthrough`
     - `azure.use-oauth-passthrough-token`
     -
   :::

4. Remove the following legacy configuration properties if they exist in your
   catalog configuration:

   * `hive.azure.abfs-storage-account`
   * `hive.azure.wasb-access-key`
   * `hive.azure.wasb-storage-account`
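As a sketch of the migration steps above, the following before-and-after comparison assumes a hypothetical Delta Lake catalog that authenticates with a storage account access key; the metastore URI, account name, and key value are placeholders:

```properties
# Before: legacy Azure Storage file system
connector.name=delta_lake
hive.metastore.uri=thrift://example.net:9083
hive.azure.abfs-storage-account=examplestorageaccount
hive.azure.abfs-access-key=examplekey

# After: native Azure Storage file system
connector.name=delta_lake
hive.metastore.uri=thrift://example.net:9083
fs.native-azure.enabled=true
azure.auth-type=ACCESS_KEY
azure.access-key=examplekey
```

Note that `hive.azure.abfs-storage-account` has no native equivalent and is removed: the native implementation reads the storage account from the `abfs://` location URI.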
docs/src/main/sphinx/object-storage/file-system-gcs.md

Lines changed: 31 additions & 0 deletions

@@ -78,3 +78,34 @@ Cloud Storage:

- Path to the JSON file on each node that contains your Google Cloud Platform
  service account key. Not to be set together with `gcs.json-key`.
:::

(fs-legacy-gcs-migration)=
## Migration from legacy Google Cloud Storage file system

Trino includes legacy Google Cloud Storage support to use with a catalog using
the Delta Lake, Hive, Hudi, or Iceberg connectors. Upgrading existing
deployments to the current native implementation is recommended. Legacy support
is deprecated and will be removed.

To migrate a catalog to use the native file system implementation for Google
Cloud Storage, make the following edits to your catalog configuration:

1. Add the `fs.native-gcs.enabled=true` catalog configuration property.
2. Refer to the following table to rename your existing legacy catalog
   configuration properties to the corresponding native configuration
   properties. Supported configuration values are identical unless otherwise
   noted.

   :::{list-table}
   :widths: 35, 35, 65
   :header-rows: 1
   * - Legacy property
     - Native property
     - Notes
   * - `hive.gcs.use-access-token`
     - `gcs.use-access-token`
     -
   * - `hive.gcs.json-key-file-path`
     - `gcs.json-key-file-path`
     - Also see `gcs.json-key` in preceding sections.
   :::
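A minimal before-and-after sketch of the steps above, assuming a hypothetical Delta Lake catalog authenticating with a JSON key file (the metastore URI and key file path are placeholders):

```properties
# Before: legacy Google Cloud Storage file system
connector.name=delta_lake
hive.metastore.uri=thrift://example.net:9083
hive.gcs.json-key-file-path=/secrets/gcp-credentials.json

# After: native Google Cloud Storage file system
connector.name=delta_lake
hive.metastore.uri=thrift://example.net:9083
fs.native-gcs.enabled=true
gcs.json-key-file-path=/secrets/gcp-credentials.json
```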

docs/src/main/sphinx/object-storage/file-system-s3.md

Lines changed: 127 additions & 0 deletions

@@ -277,3 +277,130 @@ Example JSON configuration:

are converted to a colon.
Choose a value not used in any of your IAM ARNs.
:::

(fs-legacy-s3-migration)=
## Migration from legacy S3 file system

Trino includes legacy Amazon S3 support to use with a catalog using the Delta
Lake, Hive, Hudi, or Iceberg connectors. Upgrading existing deployments to the
current native implementation is recommended. Legacy support is deprecated and
will be removed.

To migrate a catalog to use the native file system implementation for S3, make
the following edits to your catalog configuration:

1. Add the `fs.native-s3.enabled=true` catalog configuration property.
2. Refer to the following table to rename your existing legacy catalog
   configuration properties to the corresponding native configuration
   properties. Supported configuration values are identical unless otherwise
   noted.

   :::{list-table}
   :widths: 35, 35, 65
   :header-rows: 1
   * - Legacy property
     - Native property
     - Notes
   * - `hive.s3.aws-access-key`
     - `s3.aws-access-key`
     -
   * - `hive.s3.aws-secret-key`
     - `s3.aws-secret-key`
     -
   * - `hive.s3.iam-role`
     - `s3.iam-role`
     - Also see `s3.role-session-name` in preceding sections for more role
       configuration options.
   * - `hive.s3.external-id`
     - `s3.external-id`
     -
   * - `hive.s3.endpoint`
     - `s3.endpoint`
     - Add the `https://` prefix to make the value a correct URL.
   * - `hive.s3.region`
     - `s3.region`
     -
   * - `hive.s3.sse.enabled`
     - None
     - `s3.sse.type` set to the default value of `NONE` is equivalent to
       `hive.s3.sse.enabled=false`.
   * - `hive.s3.sse.type`
     - `s3.sse.type`
     -
   * - `hive.s3.sse.kms-key-id`
     - `s3.sse.kms-key-id`
     -
   * - `hive.s3.upload-acl-type`
     - `s3.canned-acl`
     - See preceding sections for supported values.
   * - `hive.s3.streaming.part-size`
     - `s3.streaming.part-size`
     -
   * - `hive.s3.proxy.host`, `hive.s3.proxy.port`
     - `s3.http-proxy`
     - Specify the host and port in one URL, for example `localhost:8888`.
   * - `hive.s3.proxy.protocol`
     - `s3.http-proxy.secure`
     - Set to `TRUE` to enable HTTPS.
   * - `hive.s3.proxy.non-proxy-hosts`
     - `s3.http-proxy.non-proxy-hosts`
     -
   * - `hive.s3.proxy.username`
     - `s3.http-proxy.username`
     -
   * - `hive.s3.proxy.password`
     - `s3.http-proxy.password`
     -
   * - `hive.s3.proxy.preemptive-basic-auth`
     - `s3.http-proxy.preemptive-basic-auth`
     -
   * - `hive.s3.sts.endpoint`
     - `s3.sts.endpoint`
     -
   * - `hive.s3.sts.region`
     - `s3.sts.region`
     -
   * - `hive.s3.max-error-retries`
     - `s3.max-error-retries`
     - Also see `s3.retry-mode` in preceding sections for more retry behavior
       configuration options.
   * - `hive.s3.connect-timeout`
     - `s3.connect-timeout`
     -
   * - `hive.s3.connect-ttl`
     - `s3.connection-ttl`
     - Also see `s3.connection-max-idle-time` in preceding sections for more
       connection keep-alive options.
   * - `hive.s3.socket-timeout`
     - `s3.socket-read-timeout`
     - Also see `s3.tcp-keep-alive` in preceding sections for more socket
       connection keep-alive options.
   * - `hive.s3.max-connections`
     - `s3.max-connections`
     -
   * - `hive.s3.path-style-access`
     - `s3.path-style-access`
     -
   :::

3. Remove the following legacy configuration properties if they exist in your
   catalog configuration:

   * `hive.s3.storage-class`
   * `hive.s3.signer-type`
   * `hive.s3.signer-class`
   * `hive.s3.staging-directory`
   * `hive.s3.pin-client-to-current-region`
   * `hive.s3.ssl.enabled`
   * `hive.s3.sse.enabled`
   * `hive.s3.kms-key-id`
   * `hive.s3.encryption-materials-provider`
   * `hive.s3.streaming.enabled`
   * `hive.s3.max-client-retries`
   * `hive.s3.max-backoff-time`
   * `hive.s3.max-retry-time`
   * `hive.s3.multipart.min-file-size`
   * `hive.s3.multipart.min-part-size`
   * `hive.s3-file-system-type`
   * `hive.s3.user-agent-prefix`
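A before-and-after sketch of the S3 migration steps above, assuming a hypothetical Hive catalog using static AWS credentials; all values are placeholders, and note how the endpoint gains the `https://` prefix while `hive.s3.ssl.enabled` is dropped:

```properties
# Before: legacy S3 file system
connector.name=hive
hive.metastore.uri=thrift://example.net:9083
hive.s3.aws-access-key=AKIAEXAMPLE
hive.s3.aws-secret-key=examplesecret
hive.s3.endpoint=s3.us-east-1.amazonaws.com
hive.s3.ssl.enabled=true

# After: native S3 file system
connector.name=hive
hive.metastore.uri=thrift://example.net:9083
fs.native-s3.enabled=true
s3.aws-access-key=AKIAEXAMPLE
s3.aws-secret-key=examplesecret
s3.endpoint=https://s3.us-east-1.amazonaws.com
s3.region=us-east-1
```

Whether you need `s3.region` explicitly depends on your deployment; it is shown here for completeness.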
Lines changed: 120 additions & 0 deletions

@@ -0,0 +1,120 @@

# Legacy Google Cloud Storage support

Object storage connectors can access
[Google Cloud Storage](https://cloud.google.com/storage/) data using the
`gs://` URI prefix.

:::{warning}
Legacy support is not recommended and will be removed. Use [](file-system-gcs).
:::

## Requirements

To use Google Cloud Storage with non-anonymous access objects, you need:

- A [Google Cloud service account](https://console.cloud.google.com/projectselector2/iam-admin/serviceaccounts)
- The key for the service account in JSON format

(hive-google-cloud-storage-configuration)=
## Configuration

To use legacy support, the `fs.hadoop.enabled` property must be set to `true` in
your catalog configuration file.

The use of Google Cloud Storage as a storage location for an object storage
catalog requires setting a configuration property that defines the
[authentication method for any non-anonymous access object](https://cloud.google.com/storage/docs/authentication).
Access methods cannot be combined.

The default root path used by the `gs://` prefix is set in the catalog by the
contents of the specified key file, or the key file used to create the OAuth
token.

:::{list-table} Google Cloud Storage configuration properties
:widths: 35, 65
:header-rows: 1

* - Property Name
  - Description
* - `hive.gcs.json-key-file-path`
  - JSON key file used to authenticate your Google Cloud service account with
    Google Cloud Storage.
* - `hive.gcs.use-access-token`
  - Use client-provided OAuth token to access Google Cloud Storage.
:::

The following example uses the Delta Lake connector in a minimal configuration
file for an object storage catalog using a JSON key file:

```properties
connector.name=delta_lake
hive.metastore.uri=thrift://example.net:9083
hive.gcs.json-key-file-path=${ENV:GCP_CREDENTIALS_FILE_PATH}
```

## General usage

Create a schema to use if one does not already exist, as in the following
example:

```sql
CREATE SCHEMA storage_catalog.sales_data_in_gcs WITH (location = 'gs://example_location');
```

Once you have created a schema, you can create tables in the schema, as in the
following example:

```sql
CREATE TABLE storage_catalog.sales_data_in_gcs.orders (
  orderkey BIGINT,
  custkey BIGINT,
  orderstatus VARCHAR(1),
  totalprice DOUBLE,
  orderdate DATE,
  orderpriority VARCHAR(15),
  clerk VARCHAR(15),
  shippriority INTEGER,
  comment VARCHAR(79)
);
```

This statement creates the folder `orders` in the `gs://example_location` root
folder defined for the schema.

Your table is now ready to populate with data using `INSERT` statements.
Alternatively, you can use `CREATE TABLE AS` statements to create and
populate the table in a single statement.

## Migration to Google Cloud Storage file system

Trino includes a [native implementation to access Google Cloud
Storage](/object-storage/file-system-gcs) with a catalog using the Delta Lake,
Hive, Hudi, or Iceberg connectors. Upgrading existing deployments to the new
native implementation is recommended. Legacy support is deprecated and will be
removed.

To migrate a catalog to use the native file system implementation for Google
Cloud Storage, make the following edits to your catalog configuration:

1. Add the `fs.native-gcs.enabled=true` catalog configuration property.
2. Refer to the following table to rename your existing legacy catalog
   configuration properties to the corresponding native configuration
   properties. Supported configuration values are identical unless otherwise
   noted.

   :::{list-table}
   :widths: 35, 35, 65
   :header-rows: 1
   * - Legacy property
     - Native property
     - Notes
   * - `hive.gcs.use-access-token`
     - `gcs.use-access-token`
     -
   * - `hive.gcs.json-key-file-path`
     - `gcs.json-key-file-path`
     - Also see `gcs.json-key` in [](/object-storage/file-system-gcs)
   :::

For more information, see [](/object-storage/file-system-gcs).
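As a sketch of the `CREATE TABLE AS` alternative mentioned in the general usage section, the following statement creates and populates a hypothetical `orders_copy` table in one step, copying from the `orders` table created earlier:

```sql
CREATE TABLE storage_catalog.sales_data_in_gcs.orders_copy
AS SELECT * FROM storage_catalog.sales_data_in_gcs.orders;
```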

docs/src/main/sphinx/object-storage/legacy-s3.md

Lines changed: 0 additions & 1 deletion

@@ -335,7 +335,6 @@ the `org.apache.hadoop.conf.Configurable` interface from the Hadoop Java API, th

  is passed in after the object instance is created, and before it is asked to provision or retrieve any
  encryption keys.

- (fs-legacy-s3-migration)=

  ## Migration to S3 file system

  Trino includes a [native implementation to access Amazon

docs/src/main/sphinx/release/release-458.md

Lines changed: 8 additions & 8 deletions

@@ -20,8 +20,8 @@

    support](file-system-configuration) with
    `fs.native-azure.enabled`,`fs.native-gcs.enabled`, `fs.native-s3.enabled`, or
    `fs.hadoop.enabled` in each catalog. Use the migration guides for [Azure
-   Storage](fs-legacy-azure-migration), Google Cloud
-   Storage, and [S3](fs-legacy-s3-migration) to assist
+   Storage](fs-legacy-azure-migration), [Google Cloud
+   Storage](fs-legacy-gcs-migration), and [S3](fs-legacy-s3-migration) to assist
    if you have not switched from legacy support. ({issue}`23343`)
  * Add JMX monitoring to the [](/object-storage/file-system-s3). ({issue}`23177`)
  * Reduce the number of file system operations when reading from Delta Lake

@@ -39,8 +39,8 @@

    support](file-system-configuration) with
    `fs.native-azure.enabled`,`fs.native-gcs.enabled`, `fs.native-s3.enabled`, or
    `fs.hadoop.enabled` in each catalog. Use the migration guides for [Azure
-   Storage](fs-legacy-azure-migration), Google Cloud
-   Storage, and [S3](fs-legacy-s3-migration) to assist
+   Storage](fs-legacy-azure-migration), [Google Cloud
+   Storage](fs-legacy-gcs-migration), and [S3](fs-legacy-s3-migration) to assist
    if you have not switched from legacy support. ({issue}`23343`)
  * Add JMX monitoring to the native S3 file system support. ({issue}`23177`)
  * Reduce the number of file system operations when reading tables with file system

@@ -57,8 +57,8 @@

    support](file-system-configuration) with
    `fs.native-azure.enabled`,`fs.native-gcs.enabled`, `fs.native-s3.enabled`, or
    `fs.hadoop.enabled` in each catalog. Use the migration guides for [Azure
-   Storage](fs-legacy-azure-migration), Google Cloud
-   Storage, and [S3](fs-legacy-s3-migration) to assist
+   Storage](fs-legacy-azure-migration), [Google Cloud
+   Storage](fs-legacy-gcs-migration), and [S3](fs-legacy-s3-migration) to assist
    if you have not switched from legacy support. ({issue}`23343`)
  * Add JMX monitoring to the native S3 file system support. ({issue}`23177`)
  * Fix rare, long planning times when Hive metastore caching is enabled. ({issue}`23401`)

@@ -70,8 +70,8 @@

    support](file-system-configuration) with
    `fs.native-azure.enabled`,`fs.native-gcs.enabled`, `fs.native-s3.enabled`, or
    `fs.hadoop.enabled` in each catalog. Use the migration guides for [Azure
-   Storage](fs-legacy-azure-migration), Google Cloud
-   Storage, and [S3](fs-legacy-s3-migration) to assist
+   Storage](fs-legacy-azure-migration), [Google Cloud
+   Storage](fs-legacy-gcs-migration), and [S3](fs-legacy-s3-migration) to assist
    if you have not switched from legacy support. ({issue}`23343`)
  * Add JMX monitoring to the native S3 file system support. ({issue}`23177`)
  * Fix rare, long planning times when Hive metastore caching is enabled. ({issue}`23401`)
