
Conversation

@jorgee (Contributor) commented Nov 11, 2025

Closes #4732

Although public S3 buckets can be accessed without S3 credentials, access to these buckets through the AWS SDK or CLI can fail when the credentials come from instance or job roles that only allow access to certain private S3 buckets. This produces errors when running pipelines that combine public and private S3 buckets.

The aws.client.anonymous option does not solve the issue, for two reasons:
1. It only applies to SDK actions on the head node, while some stage-in operations happen in the AWS Batch job, which uses the AWS CLI.
2. It applies to all clients. If enabled, the head job cannot access private buckets.

This PR implements a fallback mechanism for S3 download operations in the CLI and SDK.

  • For task stage-in with the CLI, Nextflow first tries the normal call; if it fails with AccessDenied or Forbidden, it retries with --no-sign-request (a sketch follows this list).
  • For the S3 client, a client is created per bucket. When the client is created, Nextflow first tries to access the bucket with the configured credentials; if that fails, it tries anonymously. If anonymous access succeeds, the bucket client is configured with anonymous credentials (see the second sketch below).
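
A minimal sketch of the per-call CLI fallback from the first bullet, written as plain shell (the function name nxf_s3_cp_fallback and the exact error patterns are illustrative assumptions, not the PR's actual code):

    # Sketch only: try a signed copy first, retry unsigned on auth errors
    nxf_s3_cp_fallback() {
        local source=$1
        local target=$2
        local output
        # First attempt with the configured credentials
        if output=$(aws s3 cp --only-show-errors "$source" "$target" 2>&1); then
            return 0
        fi
        # Retry unsigned only when the failure looks like an authorization error
        if echo "$output" | grep -Eq "(AccessDenied|Forbidden|403)"; then
            aws s3 cp --only-show-errors --no-sign-request "$source" "$target"
        else
            echo "$output"
            return 1
        fi
    }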

I think this approach could allow us to deprecate the aws.client.anonymous flag.
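
The per-bucket probe from the second bullet is implemented with the AWS SDK in Java; the conceptual sketch below expresses the same decision with the CLI, assuming a head-bucket call is an adequate accessibility check (the function name is hypothetical):

    # Conceptual sketch of the per-bucket probe; the PR does this with the
    # AWS SDK, the CLI is used here only for illustration
    bucket_needs_unsigned() {
        local bucket=$1
        # Signed access works: keep the configured credentials for this bucket
        if aws s3api head-bucket --bucket "$bucket" >/dev/null 2>&1; then
            return 1
        fi
        # Anonymous access works: configure this bucket's client as unsigned
        if aws s3api head-bucket --bucket "$bucket" --no-sign-request >/dev/null 2>&1; then
            return 0
        fi
        # The bucket is not accessible with or without credentials
        return 2
    }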

@jorgee requested a review from bentsherman on Nov 11, 2025 at 12:44
netlify bot commented Nov 11, 2025

Deploy Preview for nextflow-docs-staging canceled.

🔨 Latest commit: 997ad7f
🔍 Latest deploy log: https://app.netlify.com/projects/nextflow-docs-staging/deploys/69132fa271a86500084024c0

echo "$output"
return 1
fi
else
Member commented:
There's no way to avoid the fallback logic and make this predictable?

@jorgee (Contributor, Author) replied:

It is tricky; the only way to check whether the bucket is public in this situation is to try to access it. We could reduce the number of fallbacks by reusing the ls attempt for the cp: if the ls worked with --no-sign-request, we use that flag for the cp as well. The code gets more complex, but it would fall back only once per file.

@jorgee (Contributor, Author) added:

This could be the alternative, falling back only for the ls:

    nxf_s3_download() {
        local source=\$1
        local target=\$2
        local file_name=\$(basename \$1)
        local opts=(--only-show-errors)
        local ls_output
        # List the source first; on an authorization error retry unsigned
        # and remember the flag for the later cp
        if ! ls_output=\$($cli s3 ls \$source 2>&1); then
            if echo "\$ls_output" | grep -Eq "(AccessDenied|Forbidden|403)"; then
                echo "Access denied, retrying unsigned request..."
                if ! ls_output=\$($cli s3 ls --no-sign-request \$source 2>&1); then
                    echo "\$ls_output"
                    return 1
                else
                    opts+=(--no-sign-request)
                fi
            else
                echo "\$ls_output"
                return 1
            fi
        fi

        # Directory sources show up as "PRE <name>/" in the listing
        local is_dir=\$(echo "\$ls_output" | grep -F "PRE \${file_name}/" -c)

        if [[ \$is_dir == 1 ]]; then
            opts+=(--recursive)
        fi
        $cli s3 cp "\${opts[@]}" "\$source" "\$target"
    }
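
For reference, a hypothetical invocation of the generated function from a task wrapper (the paths are illustrative):

    nxf_s3_download "s3://my-bucket/data/sample.fastq" "sample.fastq"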


Development

Successfully merging this pull request may close these issues:

Input data from misconfigured public S3 buckets when using AWS credentials (#4732)