Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix issue #5612 - boolean param to optionally log skipped fastqs #5638

Conversation

glichtenstein
Copy link
Contributor

@glichtenstein glichtenstein commented May 21, 2024

skipped_fastqs.log

PR checklist

Closes Issue #5612
nf-core/demultiplex counterpart: nf-core/demultiplex#190

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the versions.yml file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

Description

This pull request adds the optional parameter --log_skipped_fastqs to the demultiplex pipeline. This change was necessary to address the issue with logging skipped FASTQ files and preventing the AWS S3 issue.

Patch Details

The changes in this PR were applied using a patch from a specific commit in the nf-core/modules repository. The commit details are as follows:

The patch was applied to ensure that the demultiplex repository includes the latest changes from nf-core/modules necessary for this feature.

Testing

true
nextflow run glichtenstein/demultiplex -r 7d9538e --input nf-core-samplesheet.csv --outdir /data/scratch/iseq-DI/output -profile docker -work-dir /data/scratch/iseq-DI/workdir -resume --skip_tools fastp --log_empty_fastqs=true
false
nextflow run glichtenstein/demultiplex -r 7d9538e --input nf-core-samplesheet.csv --outdir /data/scratch/iseq-DI/output -profile docker -work-dir /data/scratch/iseq-DI/workdir -resume --skip_tools fastp --log_empty_fastqs=false

executor > local (5)
[1a/e1ae22] process > NFCORE_DEMULTIPLEX:DEMULTIPLEX:BCL_DEMULTIPLEX:BCLCONVERT [100%] 1 of 1 ✔
[5f/05036f] process > NFCORE_DEMULTIPLEX:DEMULTIPLEX:FALCO (iseq-DI_S1_L001) [100%] 1 of 1 ✔
[84/64923f] process > NFCORE_DEMULTIPLEX:DEMULTIPLEX:MD5SUM (iseq-DI_S1_L001) [100%] 2 of 2 ✔
[11/351698] process > NFCORE_DEMULTIPLEX:DEMULTIPLEX:MULTIQC [100%] 1 of 1 ✔
-[nf-core/demultiplex] Pipeline completed successfully-

cat output/empty_fastqs.log
Empty or invalid FASTQ file: /data/scratch/iseq-DI/workdir/1a/e1ae22.../Bad-iseq-DI_S2_L001_R1_001.fastq.gz
Empty or invalid FASTQ file: /data/scratch/iseq-DI/workdir/1a/e1ae22.../Bad-iseq-DI_S2_L001_R2_001.fastq.gz

Dataset used

  • A public BCL from 10X Genomics. I will add it to nf-core tests datasets when possible.
mkdir -p iseq-DI; cd iseq-DI
wget https://cf.10xgenomics.com/supp/spatial-exp/demultiplexing/iseq-DI.tar.gz
wget https://cf.10xgenomics.com/supp/spatial-exp/demultiplexing/bcl_convert_samplesheet.csv

SampleSheet was modified to include bad barcodes.

nf-core_samplesheet.csv

id,samplesheet,lane,flowcell,per_flowcell_manifest
AA0BCDFGH,/data/scratch/iseq-DI/bcl_convert_samplesheet.csv,1,/data/scratch/iseq-DI/iseq-DI.tar.gz

bcl_convert_samplesheet.csv

[Header],,,
FileFormatVersion,2,,
,,,
[BCLConvert_Settings],,,
CreateFastqForIndexReads,0,,
,,,
[BCLConvert_Data],,,
Lane,Sample_ID,index,index2
1,iseq-DI,GTAACATGCG,AGGTAACACT
1,Bad-iseq-DI,TATACAGATA,GATACAGATA

@glichtenstein glichtenstein changed the title boolean parameter --log_skipped_fastqs added, filename changed to ski… optional boolean parameter to --log_skipped_fastqs added May 21, 2024
@glichtenstein glichtenstein changed the title optional boolean parameter to --log_skipped_fastqs added Boolean parameter to --log_skipped_fastqs May 21, 2024
@glichtenstein glichtenstein changed the title Boolean parameter to --log_skipped_fastqs fix issue #5612 - boolean param to optionally log skipped fastqs May 21, 2024
appendToLogFile(
"Empty or invalid FASTQ file: ${fastq}",
logFile
if (params.log_skipped_fastqs) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (params.log_skipped_fastqs) {
if (log_skipped_fastqs) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, when I remove params. I get ERROR ~ No such variable: log_skipped_fastqs

@@ -8,12 +8,13 @@ include { BCLCONVERT } from "../../../modules/nf-core/bclconvert/main"
include { BCL2FASTQ } from "../../../modules/nf-core/bcl2fastq/main"

// Define the log file path before the workflow starts
def logFile = new File("${params.outdir}/invalid_fastqs.log")
def logFile = new File("${params.outdir}/skipped_fastqs.log")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we generally try to avoid using params in subworkflows 😦

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, if not it was ending in some temp workdir, this way it is saved inside the outdir provided in the nextflow run --outdir /some/path/outdir, shall I remove it and leave it as def logFile = new File("skipped_fastqs.log")?

@glichtenstein glichtenstein force-pushed the boolean-parameter-to--log_skipped_fastqs branch from 90fe8e7 to 8e87edb Compare May 24, 2024 16:29
@k1sauce k1sauce mentioned this pull request May 29, 2024
@glichtenstein
Copy link
Contributor Author

superseded by #5720

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

2 participants