Staging script not found when running on Google Batch #5888

@ejseqera

Bug report

When running Nextflow on Google Batch with gcsfuse-mounted directories and attempting to stage in many input files for a task, the task fails with the following error:

Error executing process > 'COPY_FILES'

Caused by:
  No such file or directory: /mnt/disks/nf-tower-test-eu-1/scratch/1rnFXXfWvpW08n/f7/b2319081b9b7957733eebb846c0fd3/.command.stage

This appears to be related to the fix implemented in #4282 for the problem reported in #4279, which was intended to disable the separate staging script entirely for remote object storage. However, the fix does not appear to work correctly on Google Batch.

Steps to reproduce the problem

  1. Run a Nextflow pipeline on GCP with a task that stages in many input files (e.g., ~1000 or more files):

# create 6000 empty placeholder files
for ((n=0;n<6000;n++)); do touch dummy_file_${n}.txt; done

# sync to a GCS bucket
gsutil -m rsync -r ./ gs://nf-tower-test-eu-1/esha/many_files_test/

One-process workflow:

process COPY_FILES {
    input:
        path files
    output:
        path("outdir", type: 'dir')
    script:
    """
    mkdir -p outdir
    for f in ${files}; do
        cp \$f outdir/
    done
    """
}

workflow {
    Channel.fromPath(params.input).collect()
    | COPY_FILES
}
  2. Stage in each file individually rather than staging in the directory as a whole; this results in a large .command.run exceeding 1MB.
  3. The task fails because it tries to access the .command.stage file, which isn't properly created or isn't accessible.
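
To check whether a failing task actually hit the size condition described in the steps above, the wrapper script in the task work directory can be measured directly. The following is a minimal sketch and not part of Nextflow: the helper name `exceeds_limit` and its 1048576-byte default are assumptions derived from the "1MB" figure in this report, and the exact threshold Nextflow applies may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nextflow): succeeds when FILE is larger
# than LIMIT bytes. The default of 1048576 bytes (1 MiB) is an assumption
# based on the "1MB" figure in this report.
exceeds_limit() {
    local file="$1"
    local limit="${2:-1048576}"
    local size
    # Arithmetic expansion strips the padding some wc implementations add.
    size=$(( $(wc -c < "$file") ))
    [ "$size" -gt "$limit" ]
}

# Example usage, run from inside a task work directory:
#   if exceeds_limit .command.run; then
#       echo ".command.run is over the limit; a separate .command.stage may be expected"
#   fi
```

If the check succeeds for a failed task and no .command.stage exists alongside .command.run, that is consistent with the behaviour reported here.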

Environment

  • Nextflow version: 24.10.5
  • Seqera Platform Cloud Version 24.3.0-cycle4_803f393
