Cell-type-specific Methylation Analysis

This package provides functions that reproduce the analyses reported in Rahmani et al. 2021 (bioRxiv) comparing TCA (Rahmani et al. 2019, Nature Communications) and CellDMC (Zheng et al. 2018, Nature Methods). These functions were also used to generate the figuers in Rahmani et al. 2022, Frontiers in Bioinformatics; see under "2.1 Software and computational tools" in the Supplementary Information of the latter publication for more details.

Installation

The package has a requirement of R version 3.5.0 or greater. The results found in Rahmani et al. (2021) were run using R version 3.6.0. Other package requirements are listed in the Imports section of the DESCRIPTION file.

With the devtools package, you can install this package using the following command:

devtools::install_github("cozygene/CellTypeSpecificMethylationAnalysis")

Usage

The functions exported by this package will download most of the source data and output each figure found in Rahmani et al. (2021) into user-specified directories. Note that the data directory will contain just under 5GB of data and may temporarily max out at 13GB for temporary files.

Parametric Simulations

Figure 1 and Supplementary Figure 7 evaluate the methods when a phenotype affects methylation (X|Y). They can be reproduced with the following

parametric_simulations(data_dir, results_dir, plots_dir,
                       experiment_index=1)

which will save the simulation data, results, and plots (in tiff format) in the provided directories.

Figure 2 and Supplementary Figures 1 and 9 evaluate the methods when a phenotype is affected by methlation (Y|X), which can be reproduced with

parametric_simulations(data_dir, results_dir, plots_dir,
                       experiment_index=3)

Supplmentary Figures 3 and 4 perform the experiments in Figure 1 with smaller sample size and are generated with:

parametric_simulations(data_dir, results_dir, plots_dir,
                       experiment_index=4)

Supplementary Figure 8 performs the experiment in Figure 1 but considers the estimated effect direction when determining performance. It can be generated as follows:

parametric_simulations(data_dir, results_dir, plots_dir,
                       experiment_index=2)

Smoking EWAS

The methods were benchmarked on two independent whole-blood datasets (Hannum et al., Liu et al.) .

Figure 3 visualizes these results and can be reproduced as follows:

smoking.res <- smoking_analysis(hannum_smk_status_path="hannum_smoking_status.txt",
                                data_dir, results_dir, plot_dir)

where hannum_smk_status_path is the path to a CSV file that contains the smoking status of samples in the Hannum et al. dataset. This data is currently not publicly available but is required for this analysis. It contains a header indicating ID and smk columns that contain the sample IDs and smoking status, respectively. Smoking status is listed as either Never, Past Smoker, or Current Smoker in this file.

Cell Fraction Estimation

Supplementary Figure 2 evaluates estimation of cell fractions in two whole-blood datasets (Koestler et al., Jing et al.) with FACS-based fractions as ground truth. It can be reproduced with

refit.w.res <- refit_w_comparison(bbc_pheno_path="PhenoTypesBBC.Rd",
                                  bbc_facs_path="facsBBC.Rd",
                                  data_dir, plot_dir)

The Koestler et al. data that is analyzed within this function is available as an object exported by this package and is also publicly available on GEO (GSE77797). Preprocessing of these data in the above package followed procedures described in Lehne et al. (Genome Biology 2015). The BBC data, which is available through NODE (OER035661), is downloaded and processed by this function. Note that the phenotype and FACS data are currently under restriced acces. The bbc_pheno_path and bbc_facs_path parameters point to the Rd files that contain these data and were provided by Jing et al.

Runtime Comparison

The runtime of each method was also compared in Supplementary Figure 5. These results can be reproduced with the following function:

runtime.res <- runtime_comparison(plot_dir)

Note that this function runs each simulation serially (took on the order of a day on a single core).

TCA Optimization Comparison

Rahmani et al. (2021) also introduces an alternative procedure for fitting the TCA model in an efficient manner. While this generalized method of moments (GMM) approach provides different estimates than the original TCA maximum likelihood (MLE) model, we show in Supplementary Figure 6 that their estimates are highly concordant as sample sizes increase. It can be run with

mle.gmm.comparison.res <- var_mle_gmm_comparison(plot_dir)

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
R		R
data		data
man		man
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Cell-type-specific Methylation Analysis

Installation

Usage

Parametric Simulations

Smoking EWAS

Cell Fraction Estimation

Runtime Comparison

TCA Optimization Comparison

About

Releases

Packages

Contributors 2

Languages

cozygene/CellTypeSpecificMethylationAnalysis

Folders and files

Latest commit

History

Repository files navigation

Cell-type-specific Methylation Analysis

Installation

Usage

Parametric Simulations

Smoking EWAS

Cell Fraction Estimation

Runtime Comparison

TCA Optimization Comparison

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages