diff --git a/UseCases/ExternalObjectStore/README.md b/UseCases/ExternalObjectStore/README.md index 778d623b..c25789d2 100644 --- a/UseCases/ExternalObjectStore/README.md +++ b/UseCases/ExternalObjectStore/README.md @@ -2,37 +2,39 @@ ### Introduction -The following is a summary of how to access different formats of data stored in an external object store. You can copy and modify the example queries below to access your own datasets. For simplicity the included datasets are setup to not need credentials, but it is highly recommended that you use credentials to access your own datasets. +The following is a summary of how to access different formats of data stored in an object store. You can copy and modify the example queries below to access your own datasets. For simplicity the included datasets are setup to not need credentials, but it is highly recommended that you use credentials to access your own datasets. -You can use similar SQL to access your own external object store. Simply replace the following: -* __LOCATION__ - Replace with the location of your object store. The location must begin with /s3/ (Amazon) or /az/ (Azure). -* __USER__ or __ACCESS_ID__ - Add the user name for your external object store. -* __PASSWORD__ or __ACCESS_KEY__ - Add the password of the user on your external object store. +You can use similar SQL to access your own object stores. Simply replace the following: +* __LOCATION__ - Replace with the location of your object store. The location must begin with /s3/ (Amazon), /az/ (Azure), or /gs/ (Google). +* __USER__ or __ACCESS_ID__ - Add the user name for your object store. +* __PASSWORD__ or __ACCESS_KEY__ - Add the password of the user on your object store. +* __SESSION_TOKEN__ - You can also add an optional session token if using AWS Service Token Service to provide you with the access credentials. * Uncomment the EXTERNAL SECURITY clause as necessary -When modifying to access your data - your external object store must be configured to allow access from the Vantage environment. Provide your credentials in USER and PASSWORD (used in the CREATE AUTHORIZATION command) and ACCESS_ID and ACCESS_KEY (used in the READ_NOS command). +When modifying to access your data - your object store must be configured to allow access from the Vantage environment. Provide your credentials in USER and PASSWORD (used in the CREATE AUTHORIZATION command) or AUTHORIZATION simple syntax (used in the READ_NOS command if you don't wish to use the Authorization Object). -### Accessing External Object Storage +### Accessing Object Storage -There are two ways to read data from an external object store: +There are two ways to read data from an object store: #### READ_NOS READ_NOS allows you to do the following: -* Perform an ad hoc query on data that is in CSV and JSON formats with the data in-place on an external object store -* Examine the schema of PARQUET formatted data +* Perform an ad hoc query on data that is in Parquet, CSV, and JSON formats with the data in-place on an object store +* Examine the structure (layoout) of the object store +* Examine the schema of the formatted data * Bypass creating a foreign table in the database #### Foreign Tables -Users with CREATE TABLE privilege can create a foreign table inside the database, point this virtual table to an external storage location, and use SQL to translate the external data into a form useful for business. -Using a foreign table in gives you the ability to: +Users with CREATE TABLE privilege can create a foreign table inside the database, point this virtual table to an object store location, and use SQL to translate the data into a form useful for business. +Using a foreign table gives you the ability to: * Load external data to the database * Join external data to data stored in the database * Filter the data -* Use views to simplify how the data appears to your users +* Use views to simplify how the data appears to your users or provide added security access controls -Data read through a foreign server is not automatically stored on disk and the data can only be seen by that query. +Data read through a foreign server is not persisted and the data can only be seen by that query. Data can be loaded into the database by selecting from READ_NOS or a Foreign Table in a CREATE TABLE AS ... WITH DATA statement. @@ -42,202 +44,61 @@ Create an authorization object to contain the credentials to your external objec ```sql -CREATE AUTHORIZATION InvAuth -AS INVOKER TRUSTED +CREATE AUTHORIZATION MyAuth USER 'ACCESS_KEY_ID' PASSWORD 'SECRET_ACCESS_KEY'; ``` -### Accessing CSV Data Stored on Amazon S3 with READ_NOS +### Accessing Data Stored on Amazon S3 with READ_NOS Select data from external object store using READ_NOS: ```sql -SELECT TOP 2 payload..* FROM READ_NOS( -ON ( SELECT CAST( NULL AS DATASET STORAGE FORMAT CSV ) ) USING -LOCATION('/s3/s3.amazonaws.com/trial-datasets/IndoorSensor/') ---ACCESS_ID ( 'ACCESS_KEY_ID' ) ---ACCESS_KEY ( 'SECRET_ACCESS_KEY' ) +SELECT TOP 2 * FROM ( +LOCATION='/s3/trial-datasets.s3.amazonaws.com/IndoorSensor/' +AUTHORIZATION=MyAuth +--AUTHORIZATION='{"access_id":"ACCESS_KEY_ID","access_key":"SECRET_ACCESS_KEY"}' --[optional AUTHORIZATION using direct credentials] +--RETURNTYPE='NOSREAD_KEY' --[optional if wanting to list the layout of the object store] +--RETURNTYPE='NOSREAD_SCHEMA' --[optional if wanting to display the schema of the data files] ) AS D; ``` -### Accessing CSV Data Stored on Amazon S3 with CREATE FOREIGN TABLE +### Accessing Data Stored on Amazon S3 with CREATE FOREIGN TABLE Create a foreign table: ```sql -CREATE FOREIGN TABLE sample_csv ---, EXTERNAL SECURITY INVOKER TRUSTED InvAuth -( Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC, Payload DATASET INLINE LENGTH 64000 STORAGE FORMAT CSV -) -USING (LOCATION('/s3/s3.amazonaws.com/trial-datasets/IndoorSensor/')); +CREATE FOREIGN TABLE sample_table +--, EXTERNAL SECURITY MyAuth +USING (LOCATION('/s3/trial-datasets.s3.amazonaws.com/IndoorSensor/')); ``` View some data using the foreign table: ```sql -SELECT TOP 2 CAST(payload AS VARCHAR(64000)) -FROM sample_csv; +SELECT TOP 2 * FROM sample_table; ``` -### Import data into Vantage from CSV data stored on Amazon S3 +### Importing Data Into Vantage From Data Stored on Amazon S3 -To persist the data from an external object store we can use a CREATE TABLE AS statement as follows: +To persist the data from an object store we can use a CREATE TABLE AS statement as follows: -First we need a schema to apply to the data: +CREATE MULTISET TABLE sample_local_table AS ( + SELECT * FROM ( + LOCATION='/s3/trial-datasets.s3.amazonaws.com/IndoorSensor/' + AUTHORIZATION=MyAuth + ) AS D +) WITH DATA; -```sql -CREATE CSV SCHEMA sample_csv_schema AS -'{"field_delimiter":",","field_names":["date", "time", "epoch", "moteid", "temperature", "humidity", "light", "voltage"]}'; -``` - -Create a foreign table that uses that schema: - - -```sql -CREATE FOREIGN TABLE sample_csv_ft -( Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC, - Payload DATASET INLINE LENGTH 64000 STORAGE FORMAT CSV WITH SCHEMA sample_csv_schema -) -USING (LOCATION('/s3/s3.amazonaws.com/trial-datasets/IndoorSensor/data.csv')); -``` - -Create a view that splits out the CSV into individual columns: - - -```sql -REPLACE VIEW sample_csv_view - AS - (SELECT - CAST(payload.."date" AS DATE FORMAT 'YYYY-MM-DD') sensdate, - CAST(payload.."time" AS TIME(6) FORMAT 'HH:MI:SSDS(F)') senstime, - CAST(payload..epoch AS INTEGER) epoch, - CAST(payload..moteid AS INTEGER) moteid, - CAST(payload..temperature AS FLOAT) ( FORMAT '-ZZZ9.99') temperature, - CAST(payload..humidity AS FLOAT) ( FORMAT '-ZZZ9.99') humidity, - CAST(payload..light AS FLOAT) ( FORMAT '-ZZZ9.99') light, - CAST(payload..voltage AS FLOAT) ( FORMAT '-ZZZ9.99') voltage, - CAST(payload.."date" || ' ' || payload.."time" AS TIMESTAMP FORMAT 'YYYY-MM-DDBHH:MI:SSDS(F)') sensdatetime - FROM sample_csv_ft); -``` - - -```sql -SELECT TOP 2 * -FROM sample_csv_view; -``` - -### Accessing JSON Data Stored on Amazon S3 with READ_NOS - -Select data from external object store using READ_NOS: - - -```sql -SELECT TOP 2 payload.* FROM READ_NOS ( -ON ( SELECT CAST( NULL AS JSON ) ) -USING -LOCATION ('/S3/s3.amazonaws.com/trial-datasets/EVCarBattery/') ---ACCESS_ID ( 'ACCESS_KEY_ID' ) ---ACCESS_KEY ( 'SECRET_ACCESS_KEY' ) -) AS D -``` - -### Accessing JSON Data Stored on Amazon S3 with CREATE FOREIGN TABLE - -Create a foreign table: - - -```sql -CREATE FOREIGN TABLE sample_json ---, EXTERNAL SECURITY INVOKER TRUSTED InvAuth -( -LOCATION VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC , Payload JSON INLINE LENGTH 32000 CHARACTER SET UNICODE ) -USING( -LOCATION('/S3/s3.amazonaws.com/trial-datasets/EVCarBattery/') ) -``` - -View some data using the foreign table: - - -```sql -SELECT TOP 2 * FROM sample_json -``` - -### Exploring Parquet Data Stored on Amazon S3 with READ_NOS - -See what files are in the Parquet bucket: - - -```sql -SELECT location(char(255)), ObjectLength -FROM read_nos ( -ON (select cast(NULL AS DATASET INLINE LENGTH 64000 STORAGE FORMAT CSV)) -USING - LOCATION ('/S3/s3.amazonaws.com/trial-datasets/SalesOffload/') - RETURNTYPE ('NOSREAD_KEYS') - --ACCESS_ID ( 'ACCESS_KEY_ID' ) - --ACCESS_KEY ( 'SECRET_ACCESS_KEY' ) -) AS D -ORDER BY 1 -``` - -See what the schema of the data is (specify one file - assuming all files in the bucket are formatted the same): - - -```sql -SELECT * FROM READ_NOS ( - USING - LOCATION ('/s3/s3.amazonaws.com/trial-datasets/SalesOffload/2010/1/object_33_0_1.parquet') - RETURNTYPE ('NOSREAD_PARQUET_SCHEMA') - FULLSCAN ('TRUE') - --ACCESS_ID ( 'ACCESS_KEY_ID' ) - --ACCESS_KEY ( 'SECRET_ACCESS_KEY' ) -) AS D -``` - -### Accessing Parquet Data Stored on Amazon S3 with CREATE FOREIGN TABLE - -Create a foreign table: - - -```sql -CREATE FOREIGN TABLE sample_parquet ---, EXTERNAL SECURITY INVOKER TRUSTED InvAuth -( -Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC, -TheYear INTEGER, -TheMonth INTEGER, -sales_date DATE FORMAT 'YY/MM/DD', -customer_id INTEGER, -store_id INTEGER, -basket_id INTEGER, -product_id INTEGER, -sales_quantity INTEGER, -discount_amount FLOAT FORMAT '-ZZZ9.99' -) -USING ( -LOCATION ('/s3/s3.amazonaws.com/trial-datasets/SalesOffload') -STOREDAS ('PARQUET')) -NO PRIMARY INDEX -PARTITION BY COLUMN -``` - -View some data using the foreign table: - - -```sql -SELECT TOP 2 * FROM sample_parquet -``` - -# Writing Data to an External Object Store +# Writing Data to an Object Store ## Introduction -The following is a summary of how to copy data from Vantage to an external object store. You must provide your own bucket to execute the example queries below." +The following is a summary of how to copy data from Vantage to an object store. You must provide your own bucket to execute the example queries below." ## WRITE_NOS @@ -249,8 +110,9 @@ WRITE_NOS allows you to do the following: Before running the following examples, replace the following fields in the example scripts: * *YourBucketName* : with the name of your bucket -* *AccessID* : from the Access Key for your bucket - Access key ID example: AKIAIOSFODNN7EXAMPLE -* *AccessKey* : from the Access Key for your bucket - Secret Access Key example: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY +* *YourAuthObject* : your authorization object containing your storage credentials +* [optionally] *AccessID* : from the Access Key for your bucket - Access key ID example: AKIAIOSFODNN7EXAMPLE +* [optionally] *AccessKey* : from the Access Key for your bucket - Secret Access Key example: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY ### Example 1 This first example will select all rows in local sample_csv_local to copy the dataset to the object store's *sample1* partition: @@ -260,7 +122,8 @@ SELECT * FROM WRITE_NOS ( ON ( SELECT * FROM sample_csv_local ) USING LOCATION ('/s3/YourBucketName.s3.amazonaws.com/sample1/') - AUTHORIZATION ('{\"Access_ID\":\"AccessID\",\"Access_Key\":\"AccessKey\"}') + AUTHORIZATION (MyAuth) +-- AUTHORIZATION ('{\"Access_ID\":\"AccessID\",\"Access_Key\":\"AccessKey\"}') STOREDAS ('PARQUET') ) AS d; ``` @@ -286,7 +149,8 @@ SELECT * FROM WRITE_NOS ( PARTITION BY TheYear ORDER BY TheYear USING LOCATION ('/s3/YourBucketName.s3.amazonaws.com/sample2/') - AUTHORIZATION ('{\"Access_ID\":\"AccessID\",\"Access_Key\":\"AccessKey\"}') + AUTHORIZATION (MyAuth) +-- AUTHORIZATION ('{\"Access_ID\":\"AccessID\",\"Access_Key\":\"AccessKey\"}') NAMING ('DISCRETE') INCLUDE_ORDERING ('FALSE') STOREDAS ('PARQUET') @@ -304,29 +168,13 @@ Drop the objects we created in our own database schema. ```sql -DROP AUTHORIZATION InvAuth; -``` - -```sql -DROP TABLE sample_csv; -``` - -```sql -DROP VIEW sample_csv_view; -``` - -```sql -DROP TABLE sample_csv_ft; -``` - -```sql -DROP CSV SCHEMA sample_csv_schema; +DROP AUTHORIZATION MyAuth; ``` ```sql -DROP TABLE sample_json; +DROP TABLE sample_table; ``` ```sql -DROP TABLE sample_parquet; +DROP TABLE sample_local_table; ```