Skip to content

Add support for variant scoring by MIVMIR, GICAM models#812

Open
torbjorgen wants to merge 1 commit intonf-core:devfrom
torbjorgen:variant-scoring-by-mivmir-gicam
Open

Add support for variant scoring by MIVMIR, GICAM models#812
torbjorgen wants to merge 1 commit intonf-core:devfrom
torbjorgen:variant-scoring-by-mivmir-gicam

Conversation

@torbjorgen
Copy link
Copy Markdown

@torbjorgen torbjorgen commented Apr 16, 2026

Add support for SNV variant ranking using MIVMIR, GICAM models.

TODOs

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/raredisease branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Ensure the test suite passes (nextflow run . -profile test_singleton,docker --outdir <OUTDIR>).
    ERRORS in other modules prevents this
* --skip_tools (fastp,gens,peddy,germlinecnvcaller,eklipse,ngsbits): "fastp,gens,peddy,germlinecnvcaller,eklipse,ngsbits" does not match regular expression [^((fastp|gens|germlinecnvcaller|peddy|smncopynumbercaller|vcf2cytosure|fastqc|ngsbits)?,?)*(?<!,)$]
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@nf-core-bot
Copy link
Copy Markdown
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.5.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@torbjorgen
Copy link
Copy Markdown
Author

I'd appreciate your input here @ramprasadn thanks : )

@torbjorgen torbjorgen force-pushed the variant-scoring-by-mivmir-gicam branch 5 times, most recently from 262d52c to 543604f Compare April 17, 2026 09:20
@torbjorgen torbjorgen marked this pull request as ready for review April 17, 2026 11:13
@torbjorgen
Copy link
Copy Markdown
Author

@ramprasadn @peterpru Appreciate your review here. Thanks : )

@torbjorgen torbjorgen linked an issue Apr 20, 2026 that may be closed by this pull request
@torbjorgen torbjorgen force-pushed the variant-scoring-by-mivmir-gicam branch 5 times, most recently from 384856c to 5d2f14b Compare April 23, 2026 07:36
Signed-off-by: Tor Björgen <tor.bjorgen@scilifelab.se>
@torbjorgen torbjorgen force-pushed the variant-scoring-by-mivmir-gicam branch from 5d2f14b to f986c17 Compare April 23, 2026 07:39
Copy link
Copy Markdown
Contributor

@fellen31 fellen31 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Some comments/questions.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something that will eventually make its way into other pipelines as well? If so I think it would be beneficial to put rdds on pypi, make a seqera container, and add it as nf-core modules so they can be shared between pipelines.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I think this will depend on the degree of adoption/ interest in this tool going forward.
I'll make sure to consider this if there's a need. 👍

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking if this should be included in Nallo, for example.

Copy link
Copy Markdown
Author

@torbjorgen torbjorgen Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend building a separate SV model for Nallo purposes. We can talk about the overlap in current design and Nallo to see what we can put together.

EDIT: for clarification purposes, if we'd be running this for SNVs and the annotations are the same, we should be OK in Nallo as well.

Comment on lines +9 to +15
beforeScript "mkdir ${task.workDir}/rdds-tmp"
afterScript "rm -r ${task.workDir}/rdds-tmp"
containerOptions {[
workflow.containerEngine.equals("singularity") ? "--bind ${task.workDir}/rdds-tmp:/rdds/tmp" : "",
workflow.containerEngine.equals("docker") ? "--tmpfs /rdds/tmp": "",
""
].minus("").join(" ")}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could a tmp folder creation not be handled within rdds?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for designing it this way is because of RAM consumption reasons. When I do multiprocessing in the container, I often follow the pattern to write data to disk and then input it to sharded processes. If I'd be doing this in a RAM native /tmp directory it would consume container RAM instead eventually causing OOM. This scales better because it's not dependent on Nextflow resource configs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I completely follow, I haven't used --tmpfs myself. But we would want Nextflow to pass resource allocations to e.g. SLURM so it can give OOM errors if a process exceeds those resource allocations. Do you mean that the folder creation needs to happen before the container is invoked, so that you can use --tmpfs to write the temporary files to disk instead to RAM? I don't see how that is different from creating temporary files directly from the python code, but again, python is not my strong side.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, there are two reasons for the above code snippet.

  1. In the production setting (many available CPU cores), the above note applies to Singularity hosted runners.
    This is because we're using bind option and the way singularity handles the ownership, file permissions.

  2. The docker config, using --tmpfs flag is a mere workaround to allow running this in a dockerized environment for testing/CI purposes. Volume mounting in docker is a pain, and has issues with file ownership, permissions causing a headache when passing data in and out. I'd expect the dockerized environment to have potential issues with OOM if running on a large case, but this allows for simple testing for now.

Comment thread modules/local/gicam/main.nf
Comment thread modules/local/gicam/main.nf
Comment thread modules/local/gicam/main.nf
Comment thread modules/local/mivmir/meta.yml

// Run MIVMIR - GICAM scoring (not supported for MT SNVs and SVs)
if (rank_with_mivmir_gicam) {
ch_genmod_gicam_score_config = channel.fromPath("$projectDir/modules/local/gicam/rank_model_genmod_gicam.ini", checkIfExists: true).collect()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should probably be input the same as all other reference files, see the general genmod score config (ch_score_config) for an example).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this is by design. The genmod scoring config is integral to the GICAM optimisation process and cannot be changed without retraining gicam.

I'll add a note in the source on this to be more explicit.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I still think don't think it should be set in the subworkflow, but rather likech_cadd_header which also is not meant to be changed:

ch_cadd_header = channel.fromPath("$projectDir/assets/cadd_to_vcf_header_-1.0-.txt", checkIfExists: true).collect()

Comment on lines +44 to +45
GICAM_INFER(MIVMIR_INFER.out.vcf)
TABIX_BGZIPTABIX_GICAM(GICAM_INFER.out.vcf)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always nice if the tool/module can read/output a compressed VCF directly, to use less temporary space: https://nf-co.re/docs/specifications/components/modules/general#compression-of-input-and-output-files

Comment on lines +71 to +72
BCFTOOLS_MERGE_GENMOD_GICAM(ch_merge_genmod_gicam)
TABIX_BGZIP_GENMOD_GICAM(BCFTOOLS_MERGE_GENMOD_GICAM.out.vcf).output
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bcftools merge can create an index with --write-index=tbi and output a compressed VCF, no need for tabix + bgzip.

Comment thread CHANGELOG.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add MIVMIR, GICAM models for SNV ranking

3 participants