Add support for variant scoring by MIVMIR, GICAM models#812
Add support for variant scoring by MIVMIR, GICAM models#812torbjorgen wants to merge 1 commit intonf-core:devfrom
Conversation
|
Warning Newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.5.1. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation. |
|
I'd appreciate your input here @ramprasadn thanks : ) |
262d52c to
543604f
Compare
|
@ramprasadn @peterpru Appreciate your review here. Thanks : ) |
384856c to
5d2f14b
Compare
Signed-off-by: Tor Björgen <tor.bjorgen@scilifelab.se>
5d2f14b to
f986c17
Compare
fellen31
left a comment
There was a problem hiding this comment.
Nice! Some comments/questions.
There was a problem hiding this comment.
Is this something that will eventually make its way into other pipelines as well? If so I think it would be beneficial to put rdds on pypi, make a seqera container, and add it as nf-core modules so they can be shared between pipelines.
There was a problem hiding this comment.
Good point! I think this will depend on the degree of adoption/ interest in this tool going forward.
I'll make sure to consider this if there's a need. 👍
There was a problem hiding this comment.
I was thinking if this should be included in Nallo, for example.
There was a problem hiding this comment.
I'd recommend building a separate SV model for Nallo purposes. We can talk about the overlap in current design and Nallo to see what we can put together.
EDIT: for clarification purposes, if we'd be running this for SNVs and the annotations are the same, we should be OK in Nallo as well.
| beforeScript "mkdir ${task.workDir}/rdds-tmp" | ||
| afterScript "rm -r ${task.workDir}/rdds-tmp" | ||
| containerOptions {[ | ||
| workflow.containerEngine.equals("singularity") ? "--bind ${task.workDir}/rdds-tmp:/rdds/tmp" : "", | ||
| workflow.containerEngine.equals("docker") ? "--tmpfs /rdds/tmp": "", | ||
| "" | ||
| ].minus("").join(" ")} |
There was a problem hiding this comment.
Could a tmp folder creation not be handled within rdds?
There was a problem hiding this comment.
The reason for designing it this way is because of RAM consumption reasons. When I do multiprocessing in the container, I often follow the pattern to write data to disk and then input it to sharded processes. If I'd be doing this in a RAM native /tmp directory it would consume container RAM instead eventually causing OOM. This scales better because it's not dependent on Nextflow resource configs.
There was a problem hiding this comment.
Not sure I completely follow, I haven't used --tmpfs myself. But we would want Nextflow to pass resource allocations to e.g. SLURM so it can give OOM errors if a process exceeds those resource allocations. Do you mean that the folder creation needs to happen before the container is invoked, so that you can use --tmpfs to write the temporary files to disk instead to RAM? I don't see how that is different from creating temporary files directly from the python code, but again, python is not my strong side.
There was a problem hiding this comment.
Sorry, there are two reasons for the above code snippet.
-
In the production setting (many available CPU cores), the above note applies to Singularity hosted runners.
This is because we're usingbindoption and the way singularity handles the ownership, file permissions. -
The docker config, using
--tmpfsflag is a mere workaround to allow running this in a dockerized environment for testing/CI purposes. Volume mounting in docker is a pain, and has issues with file ownership, permissions causing a headache when passing data in and out. I'd expect the dockerized environment to have potential issues with OOM if running on a large case, but this allows for simple testing for now.
|
|
||
| // Run MIVMIR - GICAM scoring (not supported for MT SNVs and SVs) | ||
| if (rank_with_mivmir_gicam) { | ||
| ch_genmod_gicam_score_config = channel.fromPath("$projectDir/modules/local/gicam/rank_model_genmod_gicam.ini", checkIfExists: true).collect() |
There was a problem hiding this comment.
This file should probably be input the same as all other reference files, see the general genmod score config (ch_score_config) for an example).
There was a problem hiding this comment.
No this is by design. The genmod scoring config is integral to the GICAM optimisation process and cannot be changed without retraining gicam.
I'll add a note in the source on this to be more explicit.
There was a problem hiding this comment.
Ok. I still think don't think it should be set in the subworkflow, but rather likech_cadd_header which also is not meant to be changed:
Line 259 in 8ee3032
| GICAM_INFER(MIVMIR_INFER.out.vcf) | ||
| TABIX_BGZIPTABIX_GICAM(GICAM_INFER.out.vcf) |
There was a problem hiding this comment.
Always nice if the tool/module can read/output a compressed VCF directly, to use less temporary space: https://nf-co.re/docs/specifications/components/modules/general#compression-of-input-and-output-files
| BCFTOOLS_MERGE_GENMOD_GICAM(ch_merge_genmod_gicam) | ||
| TABIX_BGZIP_GENMOD_GICAM(BCFTOOLS_MERGE_GENMOD_GICAM.out.vcf).output |
There was a problem hiding this comment.
bcftools merge can create an index with --write-index=tbi and output a compressed VCF, no need for tabix + bgzip.
Add support for SNV variant ranking using MIVMIR, GICAM models.
TODOs
Fix rank_variants subworkflow tests (failing due to no data in Genmod annotate VCF)PR checklist
nf-core pipelines lint).nextflow run . -profile test,docker --outdir <OUTDIR>).Ensure the test suite passes (nextflow run . -profile test_singleton,docker --outdir <OUTDIR>).ERRORS in other modules prevents this
nextflow run . -profile debug,test,docker --outdir <OUTDIR>).docs/usage.mdis updated.docs/output.mdis updated.CHANGELOG.mdis updated.README.mdis updated (including new tool citations and authors/contributors).