File Asset Bundle

A set of Teraslice processors for working with data stored in files on disk. The readers utilize the chunked-file-reader module to break data into records.

All the readers in this asset bundle produce DataEntities, so the file path for a slice can be retrieved from each record with record.getMetadata('path'). See the DataEntity documentation in the Teraslice docs for more information.
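
As a rough illustration of the metadata lookup described above, the sketch below counts records per source file. The HasMetadata interface, the countByPath helper, and the fakeRecord factory are hypothetical stand-ins written for this example; in a real job the records would be DataEntities produced by one of the readers.

```typescript
// Minimal stand-in for the one DataEntity method used here (hypothetical;
// real records come from the readers in this asset bundle).
interface HasMetadata {
    getMetadata(key: string): unknown;
}

// Count how many records came from each file path.
function countByPath(records: HasMetadata[]): Map<string, number> {
    const result = new Map<string, number>();
    for (const record of records) {
        const path = record.getMetadata('path');
        // Skip records whose path metadata is missing or not a string.
        if (typeof path !== 'string') continue;
        result.set(path, (result.get(path) || 0) + 1);
    }
    return result;
}

// Fake records for illustration only; they answer just the 'path' key.
function fakeRecord(path?: string): HasMetadata {
    return { getMetadata: (key: string) => (key === 'path' ? path : undefined) };
}

const counts = countByPath([
    fakeRecord('/data/a.json'),
    fakeRecord('/data/a.json'),
    fakeRecord('/data/b.json'),
    fakeRecord(), // no path metadata, so it is ignored
]);
console.log(Array.from(counts));
```

The same pattern should carry over to a processor that needs to tag or route records by their originating file, assuming the reader sets the path metadata as described.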

APIs

Operations

Releases

A list of releases, changes, and pre-built asset bundles is available on the repository's GitHub releases page.

Getting Started

This asset bundle requires a running Teraslice cluster (see the Teraslice documentation).

# Step 1: make sure you have teraslice-cli installed
yarn global add teraslice-cli

# Step 2:
# teraslice-cli assets deploy <cluster_alias> <asset-name[@version]>
# deploy the latest release to a teraslice cluster
teraslice-cli assets deploy cluster1 terascope/file-assets

# or deploy a specific version to a teraslice cluster
teraslice-cli assets deploy localCluster terascope/file-assets@<version>

# or build from source and deploy to a teraslice cluster
teraslice-cli assets deploy cluster2 --build

Connectors

A Terafoundation connector for S3-compatible clients.

S3 Connector

Configuration:

The S3 connector configuration, in your Teraslice configuration file, includes the following parameters:

| Configuration | Description | Type | Notes |
| --- | --- | --- | --- |
| endpoint | Target S3 HTTP endpoint, must be URL | String | optional, defaults to http://127.0.0.1:80 |
| accessKeyId | S3 access key ID | String | required |
| secretAccessKey | S3 secret access key | String | required |
| region | AWS Region where bucket is located | String | optional, defaults to us-east-1 |
| maxRetries | Maximum retry attempts | Number | optional, defaults to 3 |
| sslEnabled | Flag to enable/disable SSL communication | Boolean | optional, defaults to true |
| caCertificate | A string containing a single or multiple ca certificates | String | optional, defaults to ' ' |
| certLocation | DEPRECATED - use caCertificate. Location of ssl cert | String | optional, defaults to ' ' |
| forcePathStyle | Whether to force path style URLs for S3 objects | Boolean | optional, defaults to false |
| bucketEndpoint | Whether to use the bucket name as the endpoint for this request | Boolean | optional, defaults to false |

Terafoundation S3 configuration example:

terafoundation:
    connectors:
        s3:
            default:
                endpoint: "http://localhost:9000"
                accessKeyId: "yourId"
                secretAccessKey: "yourPassword"
                forcePathStyle: true
                sslEnabled: true
                caCertificate: |
                    -----BEGIN CERTIFICATE-----
                    MIICGTCCAZ+gAwIBAgIQCeCTZaz32ci5PhwLBCou8zAKBggqhkjOPQQDAzBOMQs
                    ...
                    DXZDjC5Ty3zfDBeWUA==
                    -----END CERTIFICATE-----
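
To make the required and optional parameters concrete, here is a minimal sketch of how the documented defaults could be applied to a partial configuration. The resolveS3Config helper, the S3ConnectorConfig type, and the S3_DEFAULTS constant are hypothetical names invented for this example and are not part of Terafoundation. The deprecated certLocation setting is left out, and an empty string is assumed as the caCertificate default.

```typescript
// Field names follow the connector's documented parameters. The helper
// itself is a hypothetical illustration, not Terafoundation code.
interface S3ConnectorConfig {
    endpoint: string;
    accessKeyId: string;
    secretAccessKey: string;
    region: string;
    maxRetries: number;
    sslEnabled: boolean;
    caCertificate: string;
    forcePathStyle: boolean;
    bucketEndpoint: boolean;
}

// Documented defaults for the optional parameters
// (assumption: caCertificate defaults to an empty string).
const S3_DEFAULTS = {
    endpoint: 'http://127.0.0.1:80',
    region: 'us-east-1',
    maxRetries: 3,
    sslEnabled: true,
    caCertificate: '',
    forcePathStyle: false,
    bucketEndpoint: false,
};

function resolveS3Config(input: Partial<S3ConnectorConfig>): S3ConnectorConfig {
    // accessKeyId and secretAccessKey are the only required parameters.
    if (!input.accessKeyId || !input.secretAccessKey) {
        throw new Error('accessKeyId and secretAccessKey are required');
    }
    // Values supplied by the caller override the defaults.
    const resolved = { ...S3_DEFAULTS, ...input } as S3ConnectorConfig;
    // The endpoint must be a URL; this is only a loose scheme check.
    if (!/^https?:\/\//.test(resolved.endpoint)) {
        throw new Error('endpoint must be an http(s) URL');
    }
    return resolved;
}

const resolved = resolveS3Config({
    endpoint: 'http://localhost:9000',
    accessKeyId: 'yourId',
    secretAccessKey: 'yourPassword',
    forcePathStyle: true,
});
console.log(resolved.region, resolved.maxRetries, resolved.forcePathStyle);
```

With the values from the YAML example, only the endpoint and forcePathStyle differ from the defaults; region, maxRetries, and bucketEndpoint fall back to the documented values.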

Development

Tests

Run the file-assets tests

Requirements:

  • Docker - A MinIO container will be created using Docker

yarn test

Build

Build a compiled asset bundle to deploy to a teraslice cluster.

Install the Teraslice CLI, then build the asset bundle:

yarn global add teraslice-cli
teraslice-cli assets build

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT licensed.

About

Teraslice processors for working with data stored in files on disk, S3 or HDFS.
