Commit
* use dev version of pygscatalog use dev version of pgscatalog-utils
* fix versions
* update images
* Add match error explanation to docs (#311)
* add match explanation
* Update match.rst Some edits
---------
Co-authored-by: Sam Lambert <[email protected]>
* Explicitly declare plugins in nextflow.config (#315)
* declare plugins explicitly
* add provenance to pipeline_info output
* update citation string
* remove deprecated parameter
* nf-schema migration updates
* fix validation parameter case
* update help message to use nf-schema
* update offline instructions with plugin information
* fix typo
* bump fraposa_pgsc version
* disable nf-schema validateparameters() for now
* update conda environments
* update pyyaml environment
* bump pgscatalog-utils to v1.1.1
* bump version to beta
* drop defaults channel from conda environment
* bump pgscatalog-utils version
* trigger action
---------
Co-authored-by: Sam Lambert <[email protected]>
Showing 26 changed files with 192 additions and 103 deletions.
@@ -7,5 +7,6 @@ Explanation
   :maxdepth: 1

   output
   match
   geneticancestry
   plink2
@@ -0,0 +1,63 @@
.. _matchrates:

Why do I get match rate errors?
===============================

When you're running the PGS Catalog Calculator you might see errors like:

.. code-block:: console

    pgscatalog.core.lib.pgsexceptions.ZeroMatchesError: All scores fail to meet match threshold 0.75

You might also see that some scoring files in the report are coloured red and are excluded from the output.

By default pgsc_calc will continue calculating if at least one score passes the **match rate threshold**, which is controlled by the ``--min_overlap`` parameter.
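
The threshold behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the calculator's implementation: ``ScoreMatch``, ``filter_scores``, the accessions, and the local ``ZeroMatchesError`` class are all hypothetical stand-ins, and treating a score as passing when its match rate is greater than or equal to the threshold is an assumption.

```python
from dataclasses import dataclass


class ZeroMatchesError(Exception):
    """Local stand-in for the calculator's error, so the sketch is self-contained."""


@dataclass
class ScoreMatch:
    accession: str   # hypothetical score identifier
    n_matched: int   # scoring-file variants found in the target genomes
    n_variants: int  # total variants in the scoring file

    @property
    def match_rate(self) -> float:
        return self.n_matched / self.n_variants


def filter_scores(scores, min_overlap=0.75):
    """Return the scores that pass the threshold; raise if none pass.

    Assumption: a score passes when match_rate >= min_overlap.
    """
    passed = [s for s in scores if s.match_rate >= min_overlap]
    if not passed:
        raise ZeroMatchesError(
            f"All scores fail to meet match threshold {min_overlap}"
        )
    return passed


scores = [
    ScoreMatch("scoreA", n_matched=950, n_variants=1000),  # rate 0.95, passes
    ScoreMatch("scoreB", n_matched=400, n_variants=1000),  # rate 0.40, excluded
]
print([s.accession for s in filter_scores(scores)])
```

In this sketch one passing score is enough for the run to continue, and the error is raised only when every score falls below the threshold.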

The default value is 0.75. This was chosen based on our experience applying PGS to new cohorts, where most scores exceed this threshold.

If scores match your target genomes poorly, it's typically because of a problem with the input data (target genomes or scoring files).

What is matching?
-----------------

The calculator carefully checks that variants (rows) in a scoring file are present in your target genomes.

The matching procedure `is described in the preprint supplement <https://www.medrxiv.org/content/10.1101/2024.05.29.24307783v1.supplementary-material>`_.

The matching procedure never makes any changes to target genome data and only seeks to match variants in the scoring file to the genome.

Adjusting ``--min_overlap`` is a bad idea
------------------------------------------

The aim of the PGS Catalog Calculator is to faithfully recalculate scores submitted by authors to the PGS Catalog on new target genomes.

If few variants in a published scoring file are present in a target genome, then the calculated score isn't a good representation of the original published score.

When you evaluate the predictive performance of a score with a low match rate, it is less likely to reproduce the metrics reported in the PGS Catalog.

If you reduce ``--min_overlap`` then the calculator will output scores calculated with the remaining variants, **but these scores may not be representative of the original data submitted to the PGS Catalog.**
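
A toy calculation may help show why a score built from the remaining variants can drift from the published one. The sketch below treats a score as a plain weighted sum of effect-allele dosages, which is a simplification for illustration only. The variant IDs, weights, dosages, and the ``weighted_sum`` helper are all made up.

```python
# Hypothetical effect weights from a scoring file (variant ID -> weight).
weights = {"v1": 0.5, "v2": -0.25, "v3": 1.0, "v4": 0.75}

# Hypothetical effect-allele dosages (0, 1 or 2) for one individual.
dosages = {"v1": 2, "v2": 1, "v3": 1, "v4": 2}


def weighted_sum(weights, dosages, available):
    """Sum weight * dosage over the variants that could be matched."""
    return sum(weights[v] * dosages[v] for v in weights if v in available)


# All four variants matched.
full = weighted_sum(weights, dosages, available={"v1", "v2", "v3", "v4"})

# Only two of four variants matched (match rate 0.5).
partial = weighted_sum(weights, dosages, available={"v1", "v2"})

print(full, partial)
```

Here the two missing variants carry most of the weight, so the partial sum is far from the full one. How large the difference is in practice depends on which variants are missing and how much weight they carry.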

Are your target genomes imputed? Are they WGS?
----------------------------------------------

The calculator assumes that target genotyping data were called from a limited number of markers on a genotyping array and imputed using a larger reference panel to increase variant density.

WGS data are not natively supported by the calculator (as homozygous REF sites are excluded from the variant sites). However, it's `possible to create compatible gVCFs from WGS data <https://github.com/PGScatalog/pgsc_calc/discussions/123#discussioncomment-6469422>`_.

In the future we plan to improve support for WGS.

Did you set the correct genome build?
-------------------------------------

The calculator will automatically grab scoring files in the correct genome build from the PGS Catalog. If match rates are low, it may be because you have specified the wrong genome build. If you're using custom scoring files and the match rate is low, it's possible that the ``--liftover`` parameter was omitted.

I'm still getting match rate errors. How do I figure out what's wrong?
----------------------------------------------------------------------

Problems with matching are normally caused by problems with input data rather than by the matching procedure.

If you're trying to reproduce a specific score and are experiencing problems, then some manual work is required.

Try checking the full variant matching log, which will be present in the work directory reported in the Nextflow error, to see which variants are missing.

It can be a good idea to manually search your target genotypes for missing variants to see what's happening.
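
One way to approach that manual search is sketched below. It assumes you have already extracted variants from the scoring file and the target genotypes into simple tuples. The ``diagnose`` helper and the example data are hypothetical, and comparing unordered allele pairs is a deliberate simplification that is not the calculator's matching procedure (for example, it ignores strand flips).

```python
def diagnose(scoring_variants, target_variants):
    """Classify each scoring-file variant against the target variant list.

    scoring_variants: iterable of (chrom, pos, effect_allele, other_allele)
    target_variants:  iterable of (chrom, pos, ref, alt)
    """
    # Index the target by position; keep every allele pair seen at each site.
    by_position = {}
    for chrom, pos, ref, alt in target_variants:
        by_position.setdefault((chrom, pos), set()).add(frozenset((ref, alt)))

    report = {}
    for chrom, pos, effect, other in scoring_variants:
        alleles_at_site = by_position.get((chrom, pos))
        if alleles_at_site is None:
            status = "position absent from target"
        elif frozenset((effect, other)) in alleles_at_site:
            status = "found"
        else:
            status = "position present, alleles differ"
        report[(chrom, pos)] = status
    return report


# Made-up example data.
scoring = [("1", 1000, "A", "G"), ("1", 2000, "C", "T"), ("2", 500, "G", "A")]
target = [("1", 1000, "G", "A"), ("2", 500, "G", "C")]

for site, status in diagnose(scoring, target).items():
    print(site, status)
```

If many positions are absent, the genome build or the variant density of the target data is a likely suspect. If many positions are present but the alleles differ, the alleles in the scoring file deserve a closer look.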
@@ -1,6 +1,6 @@
name: fraposa
name: fraposa-pgsc
channels:
- conda-forge
- bioconda
dependencies:
- python=3.10
- pip
- pip:
- fraposa-pgsc==0.1.0
- fraposa-pgsc=0.1.1
@@ -1,6 +1,6 @@
name: pgscatalog_utils
name: pgscatalog-utils
channels:
- conda-forge
- bioconda
dependencies:
- python=3.11
- pip
- pip:
- pgscatalog_utils==1.0.2
- pgscatalog.utils=1.1.2
@@ -1,6 +1,7 @@
name: pyyaml
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- python=3.10
- pip
- pip:
- pyyaml==6.0
- pyyaml=6.0.1
@@ -1,7 +1,7 @@
name: report
channels:
- conda-forge
- defaults
- bioconda
dependencies:
- r-jsonlite
- r-dplyr
@@ -1,3 +1,6 @@
name: zstd
channels:
- conda-forge
- bioconda
dependencies:
- zstd=1.4.8
@@ -44,7 +44,6 @@ params {
    normalization_method = "empirical mean mean+var"
    n_normalization = 4


    // compatibility params
    liftover = false
    target_build = null
@@ -54,11 +53,9 @@ params {
    min_overlap = 0.75
    keep_ambiguous = false
    keep_multiallelic = false
    fast_match = false
    copy_genomes = false
    genotypes_cache = null


    // Debug params
    only_bootstrap = false
    only_input = false
@@ -68,9 +65,6 @@ params {
    only_score = false
    skip_ancestry = true

    // deprecated params
    platform = null

    // Boilerplate options
    outdir = "$launchDir/results"
    publish_dir_mode = 'copy'
@@ -96,14 +90,6 @@ params {
    max_memory = '128.GB'
    max_cpus = 16
    max_time = '240.h'

    // Schema validation default options
    validationFailUnrecognisedParams = false
    validationLenientMode = false
    validationSchemaIgnoreParams = 'genomes,igenomes_base,platform,only_bootstrap,only_input,only_compatible,only_match,only_score'
    validationShowHiddenParams = false
    validate_params = true

}

// Load base.config by default for all pipelines
@@ -270,7 +256,7 @@ manifest {
    description = 'The Polygenic Score Catalog Calculator is a nextflow pipeline for polygenic score calculation'
    mainScript = 'main.nf'
    nextflowVersion = '>=23.10.0'
    version = '2.0.0-alpha.6'
    version = '2.0.0-beta'
}

// Load modules.config for DSL2 module specific options
@@ -308,3 +294,27 @@ def check_max(obj, type) {
        }
    }
}

plugins {
    id '[email protected]' // validation of parameters
    id '[email protected]' // workflow provenance
}

prov {
    enabled = true
    formats {
        bco {
            file = "${params.outdir}/pipeline_info/manifest_${trace_timestamp}.bco.json"
        }
    }
}

validation {
    // Schema validation default options
    monochromeLogs = params.monochrome_logs
    failUnrecognisedParams = false
    lenientMode = false
    defaultIgnoreParams = ['platform']
    ignoreParams = ['genomes','igenomes_base','only_bootstrap','only_input','only_compatible','only_match','only_score']
    showHiddenParams = false
}