DBSLMM

Overview

There are two versions of DBSLMM: the tuning version and the deterministic version. The tuning version examines three different heritability choices and requires a validation data to tune the heritability hyper-parameter. The deterministic version uses one heritability estimate and directly fit the model in the training data without a separate validation data. Both versions requires a reference data to compute the SNP correlation matrix.
For binary traits, especially for psychiatry diseases (i.e. major depression), DBSLMM usually outperforms some existing PGS construction methods.

Relative papers in my group

PGS tools

Method Paper: Yang S#, Zhou X. Accurate and Scalable Construction of Polygenic Scores in Large Biobank Data Sets. Am J Hum Genet. 2020 May 7;106(5):679-693.
Bechmarking Paper: Yang S#*, Zhou X. PGS-server: accuracy, robustness and transferability of polygenic score methods for biobank scale studies. Brief Bioinform. 2022 Mar 10;23(2):bbac039.
Database paper: Cao C, Zhang S, Wang J, Tian M, Ji X, Huang D*, Yang S*, Gu N. PGS-Depot: a comprehensive resource for polygenic scores constructed by summary statistics based methods. Nucleic Acids Res. 2024 Jan 5;52(D1):D963-D971.
Webserver paper: Yang S#*, Ye X#, Ji X#, Li Z, Tian M, Huang P, Cao C. PGSFusion streamlines polygenic score construction and epidemiological applications in biobank-scale cohorts. bioRxiv.

Applications

Psycho-metabolic nexus: Guo X, Feng Y, Ji X, Jia N, Mainaiti A, Lai J, Wang Z*, Yang S*, Hu S*. Shared genetic architecture and bidirectional clincial risks with psycho-metabolic nexus. eBioMedicine. (in press)
PTRS for Stroke: Cai X, Li H, Cao X, Ma X, Zhu W, Xu L, Yang S, Yu R, Huang P. Integrating transcriptomic and polygenic risk scores to enhance predictive accuracy for ischemic stroke subtypes. Hum Genet. 2024 Nov 18. doi: 10.1007/s00439-024-02717-7.

Update log

v1.0 User-friendly DBSLMM

We update the software/DBSLMM.R and software/TUNE.R.

integrates bigsnpr to estiamte the heritability, select large effect SNPs by clumping, and calculate the $\hat{y}$ for the validation set
provides 'map_hm3_plus.rds' and corresponding PLINK files for three ancestries from 1000 Genomes Project
fits the whole genomoe for three DBSLMM models with one R script
provides the online server PGSFsuion constructing PGS and performing epidemiological application in UKBB

v0.3 Online Server

provides the online version https://www.pgs-server.com/.

v0.3

fixes the bug of model fiting

v0.21

fits DBSLMM in different heritabiliies with one clumping
uses a script to make the usage of DBSLMM-tuning version more easier.

v0.2

validates the flods of heritability: 0.8, 1 and 1.2
fits the LMM, when the chromosome without any large effect SNPs
fits the external validation all by R code.

Tutorial for DBSLMM (v1.0)

In this version, we support DBSLMM and LMM models in automatic and tuning versions. The download link of dbslmm is <>.

Prepartion

download execution file dbslmm
download the reference panel from 1kg
ensure the summary statistics in GEMMA format

DBSLMM-auto

Summary_stat=/summary/statistics/in/GEMMA/fromat.txt
anc=EUR
BLOCK=${PACK_DIR}/block_data/${anc}
outpath=${DATA_DIR}output/
model=DBSLMM
Rscript ${PACK_DIR}/software/DBSLMM.R --summary ${Summary_stat} --dbslmm ${PACK_DIR}/dbslmm --type auto --model ${model} \
                                      --reference ${ref_panel} --block ${BLOCK} --N ${N} --outPath ${outpath}/

Tips

If you set model=LMM, we can fit DBSLMM-LMM model.
If the summary statistics is not EUR, you can set anc=AFR or anc=EAS.
If the trait is quantitative, you can set N=n_sample.
If the trait is binary, you can set N=${n_case},${n_control}. DBSLMM will use effective sample size to estimate heritability. You CAN also regard the binary trait as the quantitative trait.

DBSLMM-tuning

Summary_stat=/summary/statistics/in/GEMMA/fromat.txt
anc=EUR
BLOCK=${PACK_DIR}/block_data/${anc}
outpath=${DATA_DIR}output/
model=DBSLMM
Rscript ${PACK_DIR}/software/DBSLMM.R --summary ${Summary_stat} --dbslmm ${PACK_DIR}/dbslmm --type tuning --model DBSLMM
                                      --reference ${ref_panel} --block ${BLOCK} --N ${N} --outPath ${outpath}/ --h2f 0.8,1,1.2 		   
Summary_prefix=`echo "$Summary_stat" | awk -F'.txt' '{print $1}'`
Rscript ${PACK_DIR}/software/TUNE.R --dbslmm_eff ${Summary_prefix} --h2f 0.8,1,1.2 \
                                    --validation_g ${val_genotype} --validation_p ${val_phenotype}

Tips

h2f in DBSLMM.R and TUNE.R MUST be the same.
If you have covarites, you can specify cov flag.

Tutorial for external test only using summary statistics

Make SNP correlation matrix

We should use reference panel to construct SNP correlation matrix. You can treat 1000 Genome Project as reference panel. We also need the block information. The code is MAKELD.R.

mkld=/your/path/MAKELD.R
# construct SNP corrlation matrix for each chromosome
for chr in `seq 1 22`
do
bfile=/your/path/chr${chr}
blockfile=/your/path/chr${chr}.bed
plink=/your/path/plink-1.9
outpath=/your/path/outpath/
Rscript ${mkld} --bfile ${bfile} --blockfile ${blockfile} --plink ${plink} --outpath ${outpath} --chr ${chr}
done

Estimate R²

The format of extsumm is the same as that of LDSC.

SNP N Z A1 A2
rs1983865 253135 3.79310344827586 T C
rs1983864 251364 -4.51612903225806 T G
rs12411954 253213 0.413793103448276 T C
rs7077266 250092 -0.92 T G

We should use the standardized SNP effect size. You have two options. First, the variant ID is read from column 1, an allele code is read from the following column, the standardized effect size associated with the named allele is read from the column after the allele column. E.g.

esteff=/your/path/esteff.txt,1,2,3

Second, the variant ID is read from column 1, an allele code is read from the following column, the standardized effect size associated with the named allele is read from the column after the allele column, the MAF with the named allele is read from the last column. E.g.

esteff=/your/path/esteff.txt,1,2,3,4

The example of EXTERNAL.R is as following:

external=/your/path/EXTERNAL.R
extsumm=/your/path/extsumm.ldsc.gz
esteff=/your/path/esteff.txt,1,2,3
LDpath=/your/path/LDpath/
outpath=/your/path/outpath/
Rscript ${external} --extsumm ${extsumm} --esteff ${esteff} --LDpath ${LDpath} --outpath ${outpath}

Name		Name	Last commit message	Last commit date
Latest commit History 132 Commits
block_data		block_data
docs		docs
scr		scr
software		software
.gitignore		.gitignore
DBSLMM.Rproj		DBSLMM.Rproj
Manual.Rmd		Manual.Rmd
README.md		README.md
Scripts.Rmd		Scripts.Rmd
Validate.Rmd		Validate.Rmd
_site.yml		_site.yml
flow_chart.png		flow_chart.png
index.Rmd		index.Rmd

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DBSLMM

Overview

Relative papers in my group

PGS tools

Applications

Update log

v1.0 User-friendly DBSLMM

v0.3 Online Server

v0.3

v0.21

v0.2

Tutorial for DBSLMM (v1.0)

Prepartion

DBSLMM-auto

DBSLMM-tuning

Tutorial for external test only using summary statistics

Make SNP correlation matrix

Estimate R²

About

Releases

Packages

Languages

biostat0903/DBSLMM

Folders and files

Latest commit

History

Repository files navigation

DBSLMM

Overview

Relative papers in my group

PGS tools

Applications

Update log

v1.0 User-friendly DBSLMM

v0.3 Online Server

v0.3

v0.21

v0.2

Tutorial for DBSLMM (v1.0)

Prepartion

DBSLMM-auto

DBSLMM-tuning

Tutorial for external test only using summary statistics

Make SNP correlation matrix

Estimate R2

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Estimate R²

Packages