Description
New feature
Usage scenario
Many commonly used public datasets (e.g. AWS iGenomes) are stored in public S3 buckets that are configured to allow only anonymous access. The AWS CLI supports anonymous access to S3 through the --no-sign-request flag, which sends requests unsigned and does not load any AWS credentials.
On the other hand, running on a private AWS Batch cluster requires AWS credentials to access the AWS resources and services involved.
Nextflow allows configuring only a single set of AWS credentials (or a role) at runtime, and it uses them to access every S3 URL provided to channels. As a result, the public datasets mentioned above cannot be used when running on your own private AWS Batch cluster, either on their own or together with datasets stored in private S3 buckets.
Suggest implementation
My suggestion is to add a new boolean option to the fromPath channel factory (and maybe others as well). When the option is set to true and the path is an S3 URL, Nextflow would add the --no-sign-request flag to the AWS CLI command it generates to pull the data from that S3 bucket.
This would give users the granularity needed to access both public and private S3 buckets in the same run on a private AWS Batch cluster, regardless of how the public S3 buckets are configured.
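To illustrate the idea, here is a rough sketch in Python of the per-path logic I have in mind. It is not Nextflow code: the function name `build_s3_copy_command`, the `anonymous` parameter, and the bucket names are hypothetical and only stand in for whatever the real option would be called. The only real piece is the AWS CLI's `--no-sign-request` global flag.

```python
import shlex
from urllib.parse import urlparse


def build_s3_copy_command(source_url, target_path, anonymous=False):
    """Build an `aws s3 cp` argument list for a single S3 object.

    `anonymous` is a hypothetical per-path switch: when True, the
    command is generated with --no-sign-request so the request is
    sent unsigned, independently of the credentials configured for
    the rest of the run.
    """
    parsed = urlparse(source_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {source_url!r}")

    command = ["aws", "s3", "cp"]
    if anonymous:
        # Skip request signing for this download only.
        command.append("--no-sign-request")
    command += [source_url, target_path]
    return command


# Hypothetical bucket names, used only for illustration.
public_cmd = build_s3_copy_command(
    "s3://example-public-bucket/genome.fa", "genome.fa", anonymous=True
)
private_cmd = build_s3_copy_command(
    "s3://example-private-bucket/reads.fq", "reads.fq"
)

print(shlex.join(public_cmd))
print(shlex.join(private_cmd))
```

The point is that the decision is made per path rather than per run, so a public download can go out unsigned while everything else keeps using the configured credentials or role.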