new module: FASTQDL #7506

Open
wants to merge 10 commits into
base: master

@camlloyd camlloyd commented Feb 20, 2025

PR checklist

Closes #7505 by adding new module: FASTQDL

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the versions.yml file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

@camlloyd camlloyd marked this pull request as ready for review February 26, 2025 19:44

@SPPearce SPPearce left a comment

I think this is generally fine, but the linting doesn't currently cope with using pip. I'm not sure when this will be resolved...

SPPearce commented Mar 3, 2025

See here for the tools linting issue

camlloyd commented Mar 3, 2025

> See here for the tools linting issue

One for the March hackathon!

@famosab famosab left a comment

A few comments for the implementation of this module :)

"""

mkdir ${prefix}

I think this needs to be a bit more specific:

Suggested change
echo "" | gzip > ${accession}.fastq.gz
echo "" | gzip > ${accession}_1.fastq.gz
echo "" | gzip > ${accession}_2.fastq.gz
touch ${prefix}-run-info.tsv
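
For context on why the stub pipes an empty string through gzip instead of using `touch`: a zero-byte file carries no gzip header, so tools that try to decompress it during a stub run will typically reject it. A minimal shell sketch of the difference, assuming `gzip` and `mktemp` are available; the file names are hypothetical and not taken from the module:

```shell
#!/usr/bin/env bash
# Illustration only: compare a touched placeholder with a gzipped empty line.
# File names below are hypothetical.
workdir=$(mktemp -d)

touch "$workdir/touched.fastq.gz"            # zero bytes, no gzip header at all
echo "" | gzip > "$workdir/stub.fastq.gz"    # real gzip stream holding one newline

# gzip -t checks integrity and returns non-zero when the file is not valid gzip.
if gzip -t "$workdir/stub.fastq.gz" 2>/dev/null; then
    stub_ok=yes
else
    stub_ok=no
fi

echo "stub passes gzip -t: $stub_ok"
ls -l "$workdir"
```

The `.tsv` outputs are plain text, so `touch` is enough for those.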

tuple val(meta), val(accession)

output:
tuple val(meta), path("test"), emit: db

Suggested change
tuple val(meta), path("test"), emit: db
tuple val(meta), path("*.fastq.gz"), emit: fastq
tuple val(meta), path("*-run-info.tsv"), emit: runinfo
tuple val(meta), path("*-run-mergers.tsv"), emit: runmergers, optional: true

I would update this to emit each file that we expect (it might be a bit complicated now, but I think this will make handling everything in the pipeline easier).

After changing this here we also need to adjust the meta.yml :)

$args \\
--accession $accession \\
--cpus $task.cpus \\
--outdir ${prefix}

Suggested change
--outdir ${prefix}
--outdir .
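
The reasoning behind `--outdir .` appears to be that output patterns such as `path("*.fastq.gz")` are resolved against the top level of the task's working directory, so files written into a `${prefix}/` subdirectory would not be picked up. A small sketch of that behaviour, using plain shell globbing as a stand-in for Nextflow's matching; `count_matches`, `sample1`, and the accession `ERR0000000` are made up for illustration:

```shell
#!/usr/bin/env bash
# Illustration only: a top-level glob does not see files inside a subdirectory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical helper: count existing files matching *.fastq.gz in the current directory.
count_matches() {
    n=0
    for f in *.fastq.gz; do
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Case 1: the tool writes into a subdirectory (like --outdir ${prefix}).
mkdir sample1
touch sample1/ERR0000000_1.fastq.gz sample1/ERR0000000_2.fastq.gz
nested=$(count_matches)

# Case 2: the tool writes into the current directory (like --outdir .).
touch ERR0000000_1.fastq.gz ERR0000000_2.fastq.gz
flat=$(count_matches)

echo "matches with subdirectory output: $nested"
echo "matches with flat output: $flat"
```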

def prefix = task.ext.prefix ?: "${meta.id}"
"""

mkdir ${prefix}

Suggested change
mkdir ${prefix}

def prefix = task.ext.prefix ?: "${meta.id}"
"""
fastq-dl \\
$args \\

Suggested change
$args \\
$args \\
--prefix ${prefix} \\

@famosab famosab Mar 6, 2025

The run you used belongs to Experiment ERX4876079
BioSample: SAMEA7787147
Study / Project: PRJEB37886

But I think downloading a whole project or sample will definitely be too many files :D

Maybe we can also test with the experiment accession and then put a comment in the module that this is only tested for runs and experiments.

famosab commented Mar 6, 2025

(E|D|S)RR[0-9]{6,}: Run accession
(E|D|S)RX[0-9]{6,}: Experiment accession
(E|D|S)RS[0-9]{6,}: Sample accession
(E|D|S)RP[0-9]{6,}: Study accession

E = ENA, S = NCBI (SRA), D = DDBJ
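
As a rough illustration of how these four patterns could be used to decide which accession types a test or input check covers, here is a minimal shell sketch. `classify_accession` is a hypothetical helper, not part of fastq-dl or the module, and the `^`/`$` anchors are an addition that the patterns above do not include:

```shell
#!/usr/bin/env bash
# classify_accession: hypothetical helper that maps an accession to its type
# using the four patterns listed in the comment above (anchored here).
classify_accession() {
    acc=$1
    if printf '%s\n' "$acc" | grep -Eq '^(E|D|S)RR[0-9]{6,}$'; then
        echo "run"
    elif printf '%s\n' "$acc" | grep -Eq '^(E|D|S)RX[0-9]{6,}$'; then
        echo "experiment"
    elif printf '%s\n' "$acc" | grep -Eq '^(E|D|S)RS[0-9]{6,}$'; then
        echo "sample"
    elif printf '%s\n' "$acc" | grep -Eq '^(E|D|S)RP[0-9]{6,}$'; then
        echo "study"
    else
        echo "unknown"
    fi
}

classify_accession ERX4876079    # experiment accession quoted earlier in the thread
classify_accession SAMEA7787147  # BioSample ID, not covered by the four patterns
classify_accession PRJEB37886    # BioProject ID, not covered by the four patterns
```

Under these patterns, the BioSample and BioProject identifiers quoted earlier fall through to `unknown`, so a check like this would need extra patterns if those identifier styles were meant to be accepted.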
