Add support for variant scoring by MIVMIR, GICAM models by torbjorgen · Pull Request #812 · nf-core/raredisease

torbjorgen · 2026-04-16T11:44:29Z

Add support for SNV ranking using the MIVMIR and GICAM models.

TODOs

Open access docker registry
Unit test for MIVMIR, GICAM
- Unit test
- Merge test data: "Add unit test data for mivmir, gicam modules" (test-datasets#2002)
- Fixup test data URL
~~Fix rank_variants subworkflow tests (failing due to no data in Genmod annotate VCF)~~

PR checklist

This comment contains a description of changes (with reason).
If you've fixed a bug or added code that should be tested, add tests!
If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
If necessary, also make a PR on the nf-core/raredisease branch on the nf-core/test-datasets repository.
Make sure your code lints (nf-core pipelines lint).
Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
~~Ensure the test suite passes (nextflow run . -profile test_singleton,docker --outdir <OUTDIR>).~~
Errors in other modules prevent this:

* --skip_tools (fastp,gens,peddy,germlinecnvcaller,eklipse,ngsbits): "fastp,gens,peddy,germlinecnvcaller,eklipse,ngsbits" does not match regular expression [^((fastp|gens|germlinecnvcaller|peddy|smncopynumbercaller|vcf2cytosure|fastqc|ngsbits)?,?)*(?<!,)$]

Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
Usage Documentation in docs/usage.md is updated.
Output Documentation in docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).
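
The `--skip_tools` validation error quoted in the checklist above can be reproduced outside the pipeline. Below is a minimal Groovy sketch, assuming the pattern is exactly the text between the square brackets of the error message. `eklipse` is not among the alternatives, so the whole value is rejected, while the same list without it passes. The variable names are illustrative only.

```groovy
// Pattern copied from the error message; the enclosing [ ] are assumed to
// belong to the message, not to the regular expression itself.
def skipToolsPattern = '^((fastp|gens|germlinecnvcaller|peddy|smncopynumbercaller|vcf2cytosure|fastqc|ngsbits)?,?)*(?<!,)$'

// Value reported in the error, and the same value with "eklipse" removed.
def reportedValue  = 'fastp,gens,peddy,germlinecnvcaller,eklipse,ngsbits'
def withoutEklipse = 'fastp,gens,peddy,germlinecnvcaller,ngsbits'

// ==~ requires the whole string to match the pattern.
assert !(reportedValue ==~ skipToolsPattern)
assert withoutEklipse ==~ skipToolsPattern
```

This only illustrates why the value fails; whether the schema pattern or the skipped-tools list should change is a separate question.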

nf-core-bot · 2026-04-16T11:45:06Z

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.5.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

torbjorgen · 2026-04-16T14:23:07Z

I'd appreciate your input here @ramprasadn thanks : )

torbjorgen · 2026-04-17T11:16:15Z

@ramprasadn @peterpru Appreciate your review here. Thanks : )

Signed-off-by: Tor Björgen <tor.bjorgen@scilifelab.se>

fellen31

Nice! Some comments/questions.

fellen31 · 2026-04-24T09:48:43Z

Is this something that will eventually make its way into other pipelines as well? If so I think it would be beneficial to put rdds on pypi, make a seqera container, and add it as nf-core modules so they can be shared between pipelines.

Good point! I think this will depend on the degree of adoption of, and interest in, this tool going forward.
I'll make sure to consider this if there's a need. 👍

I was thinking if this should be included in Nallo, for example.

I'd recommend building a separate SV model for Nallo purposes. We can talk about the overlap in current design and Nallo to see what we can put together.

EDIT: to clarify, if we run this for SNVs and the annotations are the same, we should be OK in Nallo as well.

fellen31 · 2026-04-24T09:51:32Z

+    beforeScript "mkdir ${task.workDir}/rdds-tmp"
+    afterScript "rm -r ${task.workDir}/rdds-tmp"
+    containerOptions {[
+        workflow.containerEngine.equals("singularity") ? "--bind ${task.workDir}/rdds-tmp:/rdds/tmp" : "",
+        workflow.containerEngine.equals("docker") ? "--tmpfs /rdds/tmp": "",
+        ""
+    ].minus("").join(" ")}


Could a tmp folder creation not be handled within rdds?

It is designed this way to limit RAM consumption. When I do multiprocessing in the container, I often follow the pattern of writing data to disk and then feeding it to sharded processes. If I did this in a RAM-backed /tmp directory, it would consume container RAM instead, eventually causing OOM. This scales better because it does not depend on Nextflow resource configs.

Not sure I completely follow; I haven't used --tmpfs myself. But we would want Nextflow to pass resource allocations to e.g. SLURM, so that it can give OOM errors if a process exceeds those allocations. Do you mean that the folder creation needs to happen before the container is invoked, so that you can use --tmpfs to write the temporary files to disk instead of to RAM? I don't see how that differs from creating temporary files directly from the Python code, but again, Python is not my strong suit.

Sorry, there are two reasons for the above code snippet.

In the production setting (many available CPU cores), the note above applies to Singularity-hosted runners.
This is because we use the bind option, and because of how Singularity handles file ownership and permissions.

The Docker config, which uses the --tmpfs flag, is merely a workaround that allows this to run in a dockerized environment for testing/CI purposes. Volume mounting in Docker is a pain: issues with file ownership and permissions cause headaches when passing data in and out. I'd expect the dockerized environment to run into OOM on a large case, but this allows for simple testing for now.
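
To make the discussion concrete, here is a plain-Groovy sketch of how the `containerOptions` closure in the diff resolves per container engine. The function name and its two parameters are illustrative stand-ins for `workflow.containerEngine` and `task.workDir`; this is not the module code itself.

```groovy
// Stand-in for the closure in the diff: builds the extra container flags
// for a given engine and task work directory.
def rddsContainerOptions(String engine, String workDir) {
    [
        // Singularity: bind the directory created by beforeScript into the container
        engine == "singularity" ? "--bind ${workDir}/rdds-tmp:/rdds/tmp" : "",
        // Docker: mount a tmpfs at the same path inside the container
        engine == "docker" ? "--tmpfs /rdds/tmp" : ""
    ].minus("").join(" ")
}

assert rddsContainerOptions("singularity", "/work/ab/12cd") == "--bind /work/ab/12cd/rdds-tmp:/rdds/tmp"
assert rddsContainerOptions("docker", "/work/ab/12cd") == "--tmpfs /rdds/tmp"
// Any other engine gets no extra options.
assert rddsContainerOptions("podman", "/work/ab/12cd") == ""
```

Under Singularity the temporary files land in the task work directory on disk, whereas the Docker branch uses a tmpfs mount, which matches the expectation above that a large case may hit OOM when run with Docker.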

fellen31 · 2026-04-24T10:00:50Z


+        // Run MIVMIR - GICAM scoring (not supported for MT SNVs and SVs)
+        if (rank_with_mivmir_gicam) {
+            ch_genmod_gicam_score_config = channel.fromPath("$projectDir/modules/local/gicam/rank_model_genmod_gicam.ini", checkIfExists: true).collect()


This file should probably be provided as input in the same way as all other reference files; see the general genmod score config (ch_score_config) for an example.

No, this is by design. The genmod scoring config is integral to the GICAM optimisation process and cannot be changed without retraining GICAM.

I'll add a note in the source on this to be more explicit.

OK. I still don't think it should be set in the subworkflow, but rather handled like ch_cadd_header, which is also not meant to be changed:

raredisease/main.nf

Line 259 in 8ee3032

ch_cadd_header = channel.fromPath("$projectDir/assets/cadd_to_vcf_header_-1.0-.txt", checkIfExists: true).collect()
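
A rough sketch of what this suggestion could look like, assuming the config keeps the path used in the diff and that the subworkflow gains an extra input. The subworkflow name and its abbreviated signature are hypothetical, and the two fragments would live in separate files.

```groovy
// main.nf (sketch): define the fixed GICAM score config next to ch_cadd_header.
// The path is taken from the diff above; it stays bundled with the pipeline
// instead of being exposed as a parameter, since changing it would require
// retraining GICAM.
ch_genmod_gicam_score_config = channel.fromPath("$projectDir/modules/local/gicam/rank_model_genmod_gicam.ini", checkIfExists: true).collect()

// Subworkflow file (sketch, hypothetical name and inputs): receive the channel
// via `take:` instead of constructing it inside `if (rank_with_mivmir_gicam)`.
workflow RANK_VARIANTS_SKETCH {
    take:
    ch_vcf                         // channel: [ val(meta), path(vcf) ]
    ch_genmod_gicam_score_config   // channel: [ path(ini) ]

    main:
    ch_ranked = ch_vcf             // placeholder for the existing ranking steps

    emit:
    vcf = ch_ranked
}
```

This keeps the file fixed and non-configurable, as intended, while making all bundled inputs visible in one place.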

fellen31 · 2026-04-24T10:01:49Z

+            GICAM_INFER(MIVMIR_INFER.out.vcf)
+            TABIX_BGZIPTABIX_GICAM(GICAM_INFER.out.vcf)


Always nice if the tool/module can read/output a compressed VCF directly, to use less temporary space: https://nf-co.re/docs/specifications/components/modules/general#compression-of-input-and-output-files

fellen31 · 2026-04-24T10:02:41Z

+            BCFTOOLS_MERGE_GENMOD_GICAM(ch_merge_genmod_gicam)
+            TABIX_BGZIP_GENMOD_GICAM(BCFTOOLS_MERGE_GENMOD_GICAM.out.vcf).output


bcftools merge can create an index with --write-index=tbi and output a compressed VCF, so there is no need for a separate tabix + bgzip step.
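
One way this could be wired up is through the module arguments in the pipeline config. The fragment below is a sketch under assumptions: the selector and file location are hypothetical, the bcftools version in the container must support `--write-index`, and the merge module is assumed to expose the index as an output.

```groovy
// Config sketch (hypothetical selector; e.g. placed in a conf/modules/*.config file).
process {
    withName: '.*BCFTOOLS_MERGE_GENMOD_GICAM' {
        // --output-type z   : write a bgzip-compressed VCF
        // --write-index=tbi : write a .tbi index alongside the output
        ext.args = '--output-type z --write-index=tbi'
    }
}
```

With this in place, the following TABIX_BGZIP_GENMOD_GICAM call in the diff would no longer be needed, provided downstream steps read the VCF and index from the merge outputs.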

torbjorgen force-pushed the variant-scoring-by-mivmir-gicam branch 5 times, most recently from 262d52c to 543604f Compare April 17, 2026 09:20

torbjorgen marked this pull request as ready for review April 17, 2026 11:13

torbjorgen requested review from peterpru and ramprasadn April 20, 2026 08:59

torbjorgen linked an issue Apr 20, 2026 that may be closed by this pull request

Add MIVMIR, GICAM models for SNV ranking #816

Open

torbjorgen force-pushed the variant-scoring-by-mivmir-gicam branch 5 times, most recently from 384856c to 5d2f14b Compare April 23, 2026 07:36

Add support for variant scoring by MIVMIR, GICAM models

f986c17

Signed-off-by: Tor Björgen <tor.bjorgen@scilifelab.se>

torbjorgen force-pushed the variant-scoring-by-mivmir-gicam branch from 5d2f14b to f986c17 Compare April 23, 2026 07:39

fellen31 reviewed Apr 24, 2026




3 participants
