This component downloads files from S3 to `/data/out/files`.

- Supports `*` wildcards
- Handles subfolders
- Can process only new files
- Skips files stored in Glacier and Glacier Deep Archive
Configuration parameters:

- `loginType` (required) -- Login type (`credentials` or `role`)
- `accessKeyId` (required if `loginType` is `"credentials"`) -- AWS Access Key ID
- `#secretAccessKey` (required if `loginType` is `"credentials"`) -- AWS Secret Access Key
- `accountId` (required if `loginType` is `"role"`) -- AWS Account ID
- `bucket` (required) -- AWS S3 bucket name (the region will be autodetected)
- `key` (required) -- Search key prefix, optionally ending with a `*` wildcard. All files downloaded using a wildcard are stored in `/data/out/files/wildcard`.
- `saveAs` (optional) -- Store all downloaded files in a specified folder.
- `includeSubfolders` (optional) -- Download all subfolders. Available only when using a wildcard in the search key prefix.
    - The subfolder structure will be flattened, replacing `/` in the path with `-`, e.g., `folder1/file1.csv` => `folder1-file1.csv`.
    - Existing `-` characters will be escaped to avoid collisions with another `-`, e.g., `collision-file.csv` => `collision--file.csv`. (See the sketch after this list.)
- `newFilesOnly` (optional) -- Download only new files.
    - The last downloaded file's timestamp is stored in the `lastDownloadedFileTimestamp` property of the state file.
    - If multiple files have the same timestamp, `processedFilesInLastTimestampSecond` records all processed files within that second.
- `limit` (optional, default `0`) -- Maximum number of files to download.
    - If `key` matches more files than `limit`, the oldest files will be downloaded first.
    - When used with `newFilesOnly`, the extractor will process up to `limit` new files that have not been downloaded yet.
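The flattening and escaping rules above fit in a couple of lines. Here is a minimal sketch of the same transformation (the `flattenKey` helper is ours for illustration, not the component's actual code):

```php
<?php

// Escape existing "-" as "--" first, then replace "/" with "-".
// Doing it in this order keeps original dashes distinguishable from
// dashes that used to be path separators.
function flattenKey(string $key): string
{
    return str_replace('/', '-', str_replace('-', '--', $key));
}

echo flattenKey('folder1/file1.csv') . "\n";   // folder1-file1.csv
echo flattenKey('collision-file.csv') . "\n";  // collision--file.csv
```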
Sample configurations:

A simple configuration using credentials:

```json
{
  "parameters": {
    "accessKeyId": "AKIA****",
    "#secretAccessKey": "****",
    "bucket": "myBucket",
    "key": "myfile.csv",
    "includeSubfolders": false,
    "newFilesOnly": false
  }
}
```
A configuration using role-based login:

```json
{
  "parameters": {
    "accountId": "1234567890",
    "bucket": "myBucket",
    "key": "myfile.csv",
    "includeSubfolders": false,
    "newFilesOnly": false
  }
}
```
A wildcard configuration storing the downloaded files in the `myfolder` folder:

```json
{
  "parameters": {
    "accessKeyId": "AKIA****",
    "#secretAccessKey": "****",
    "bucket": "myBucket",
    "key": "myfolder/*",
    "saveAs": "myfolder",
    "includeSubfolders": false,
    "newFilesOnly": false
  }
}
```
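With this configuration, everything matched by `myfolder/*` lands in the `saveAs` folder under the output directory; for illustration (the file names are hypothetical):

```
/data/out/files/myfolder/file1.csv
/data/out/files/myfolder/file2.csv
```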
A configuration downloading only new files, including subfolders:

```json
{
  "parameters": {
    "accessKeyId": "AKIA****",
    "#secretAccessKey": "****",
    "bucket": "myBucket",
    "key": "myfolder/*",
    "includeSubfolders": true,
    "newFilesOnly": true
  }
}
```

Note: `state.json` must be provided in this case.
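For the `newFilesOnly` examples, the state file carries the two properties described under the parameter above. A hypothetical `state.json` (the timestamp and file name are illustrative only):

```json
{
  "lastDownloadedFileTimestamp": 1561034400,
  "processedFilesInLastTimestampSecond": [
    "myfolder/file1.csv"
  ]
}
```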
A configuration combining `newFilesOnly` with a `limit` of 100 files:

```json
{
  "parameters": {
    "accessKeyId": "AKIA****",
    "#secretAccessKey": "****",
    "bucket": "myBucket",
    "key": "myfolder/*",
    "includeSubfolders": true,
    "newFilesOnly": true,
    "limit": 100
  }
}
```

Note: `state.json` must be provided in this case.
Development setup:

- Create an AWS S3 bucket and IAM user using the `aws-services.json` CloudFormation template.
- Create a `.env` file. Use the output of the `aws-services` CloudFormation stack to populate the variables, along with your Redshift credentials.

```
AWS_S3_BUCKET=
AWS_REGION=
UPLOAD_USER_AWS_ACCESS_KEY=
UPLOAD_USER_AWS_SECRET_KEY=
DOWNLOAD_USER_AWS_ACCESS_KEY=
DOWNLOAD_USER_AWS_SECRET_KEY=
KEBOOLA_USER_AWS_ACCESS_KEY=
KEBOOLA_USER_AWS_SECRET_KEY=
ACCOUNT_ID=
ROLE_NAME=
KBC_PROJECTID=
KBC_STACKID=
```
- Build the Docker images:

```
docker-compose build
```

- Install the Composer packages:

```
docker-compose run --rm dev composer install --prefer-dist --no-interaction
```
Run the tests with the following command:

```
docker-compose run --rm dev ./vendor/bin/phpunit
```
MIT licensed, see the LICENSE file.