
Commit 2e85d46 by mrbarbitoff (parent 2573e59)

Commit message: Cleanup


321 files changed (+860137 / -1 lines)


`README.md` (45 additions, 1 deletion): the original one-line README, `# covid19-exome`, is replaced by the following content.
# COVID-19 exome project

## Directory `feature_processing`

Preparation of datasets, exploratory data analysis (EDA), filtering, and normalization of data.

- `feature_mapping.py`, `extra_feature_mapping.py` - name mappings of features (from Russian names to short English abbreviations).
- `part1_ids_preprocessing.ipynb` - preprocessing of VCF and phenotype IDs.
- `part2_phenotype_preprocessing.ipynb` - cleaning and preparation of phenotypes, renaming of columns, etc.
- `part2.5_new_table_analysis.ipynb` - the same as part 2, but with the extra features.
- `part3_march_pheno_eda_and_normalysing.ipynb` - EDA, filtering, and normalization of data.
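The mapping files are described only as Russian-to-English name mappings, and the normalization method is not stated. The following is a minimal stdlib sketch of both steps, assuming the mapping is a plain dict (the `FEATURE_MAP` entries below are hypothetical examples, not the project's real mapping), the table is a list of row dicts, and "normalization" means z-scoring a numeric column:

```python
from statistics import mean, stdev

# Hypothetical mapping for illustration only; the real dictionaries live in
# feature_mapping.py / extra_feature_mapping.py and are not reproduced here.
FEATURE_MAP = {
    "Возраст": "age",
    "Пол": "sex",
    "С-реактивный белок": "crp",
}


def rename_columns(rows, mapping):
    """Return new row dicts with mapped keys; unmapped keys are kept unchanged."""
    return [{mapping.get(key, key): value for key, value in row.items()} for row in rows]


def zscore_column(rows, column):
    """Standardize one numeric column in place (assumed normalization: z-score).

    Missing values (None) are skipped and left as None. Needs at least two
    non-missing, non-identical values, otherwise stdev fails or is zero.
    """
    values = [row[column] for row in rows if row[column] is not None]
    mu, sigma = mean(values), stdev(values)
    for row in rows:
        if row[column] is not None:
            row[column] = (row[column] - mu) / sigma
    return rows


# Toy phenotype rows, invented for illustration.
raw = [
    {"Возраст": 45, "Пол": "M", "С-реактивный белок": 12.0},
    {"Возраст": 67, "Пол": "F", "С-реактивный белок": None},
    {"Возраст": 58, "Пол": "M", "С-реактивный белок": 30.0},
]
table = zscore_column(rename_columns(raw, FEATURE_MAP), "age")
print(table[0])
```

With pandas, the renaming step is a single `DataFrame.rename(columns=...)` call; the plain-dict version only keeps the sketch dependency-free.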

## Directory `cvas_and_rvas`

Common variant association study (CVAS) and rare variant association study (RVAS).

- `sd3_gwas_com.ipynb` - CVAS.
- `final_rwas_pipeline_p1_hail_prepare.ipynb` - first step of the RVAS (preparation of tables).
- `final_rwas_pipeline_p2_statistics_and_plots.ipynb` - second step of the RVAS (tests and plots).
- `out_hail_gwas_com_sd3/` - directory with the CVAS p-values.
- `out_hail_rvas/` - directory with the RVAS p-values.
- `rvas_dataset/` - directory with the table for the second step of the RVAS.
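The README does not say which tests the RVAS notebooks run. As an illustration only, a common rare-variant approach is a gene-level burden (collapsing) test: a sample counts as a carrier if it has at least one qualifying rare allele in the gene, and carrier counts in cases and controls are compared with Fisher's exact test. The following stdlib sketch uses invented genotypes; `burden_test` and `count_carriers` are hypothetical helpers, not functions from this repository:

```python
from math import comb


def hypergeom_pmf(a, row1, col1, n):
    """Probability that the top-left cell equals `a` in a 2x2 table with fixed margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is
    no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = hypergeom_pmf(x, row1, col1, n)
        # Small relative tolerance so tables tied with the observed one survive rounding.
        if p <= p_obs * (1 + 1e-7):
            total += p
    return min(1.0, total)


def count_carriers(genotypes):
    """Number of samples with >= 1 alternate allele across a gene's rare variants.

    `genotypes` holds one list of alt-allele counts (0, 1 or 2) per sample.
    """
    return sum(1 for sample in genotypes if any(g > 0 for g in sample))


def burden_test(case_genotypes, control_genotypes):
    """Collapsing burden test: carriers vs non-carriers, cases vs controls."""
    case_carriers = count_carriers(case_genotypes)
    control_carriers = count_carriers(control_genotypes)
    return fisher_exact_two_sided(
        case_carriers, len(case_genotypes) - case_carriers,
        control_carriers, len(control_genotypes) - control_carriers,
    )


# Toy genotypes for one gene with 3 rare variants, invented for illustration.
cases = [[0, 1, 0], [0, 0, 0], [1, 0, 0], [0, 0, 2]]
controls = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
p_value = burden_test(cases, controls)
```

Collapsing to carrier status assumes all qualifying variants push risk in the same direction; variance-component tests such as SKAT relax that assumption, but they need more than the standard library.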

## Directory `risk_score_and_regression`

Validation of the identified SNPs: their annotation and statistical checks.

- `Variants_annotations_and_score_calculation.ipynb` - the main notebook with all annotations, statistical calculations, etc. (requires `columns_to_check.py`).
- `draw_data/` - directory with datasets for the R plots.
- `other_data/` - directory with outputs of this notebook that are not needed for drawing figures.
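The README does not define how the SNP score is calculated. One common construction, shown here only as an assumption, is a genetic risk score: the sum of risk-allele dosages over the selected SNPs, either unweighted or weighted per SNP (for example by log odds ratios). The SNP IDs, dosages and odds ratios below are hypothetical:

```python
from math import log


def risk_score(dosages, weights=None):
    """Sum of risk-allele dosages over SNPs, optionally weighted per SNP.

    dosages: {snp_id: number of risk alleles (0, 1 or 2)}
    weights: {snp_id: weight}, e.g. log odds ratios; None means every SNP counts 1.
    """
    if weights is None:
        return float(sum(dosages.values()))
    return sum(weights[snp] * dosage for snp, dosage in dosages.items())


# Hypothetical SNP IDs, dosages and odds ratios, for illustration only.
sample = {"snp_1": 2, "snp_2": 0, "snp_3": 1}
log_or = {"snp_1": log(1.5), "snp_2": log(2.0), "snp_3": log(1.2)}

unweighted = risk_score(sample)
weighted = risk_score(sample, log_or)
```

With log-odds-ratio weights, the weighted score is the log of the product of per-allele odds ratios, which is why it is additive across SNPs.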

## Directory `r_draw`

R code for drawing figures, and the figures themselves.

- `1_MH_QQ.R` - draws Manhattan and QQ plots (files: `Rectangular-Manhattan.*t*.pdf` and `QQplot*.pdf`, respectively).
- `2_PCA.R` - draws principal component plots from the EDA (for sex and death). Images: `images/pca_*.pdf`.
- `3_violin_regression.R` - draws violin plots and regressions for SNPs and their associated features (from `data/regression/regr_rs*_*.tsv`). Images: `images/regr_<rs>_<feature>.pdf`.
- `4_boxplots.R` - draws boxplots of features by death and severity (from `data/boxplots_analyses.tsv`). Images: `images/bozplot_violin_<death/severity>_<feature>.pdf`.
- `5_histplots.R` - draws histograms (from `data/features_for_hist.tsv` and `data/boxplots_analyses.tsv`). Images:
  - `images/histogram_<feature>___top_10_score.pdf` - histograms of the score by severity/death/storm;
  - `images/score_hist.pdf` - histogram of the SNPs' score;
  - `images/histogram_<death/severity>_<feature>.pdf` - histograms of features by death/severity.
- `images/` - drawn figures.
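`1_MH_QQ.R` draws the QQ plots in R, and the README does not describe how the plotted values are computed. For checking a result numerically, here is a Python sketch (not a port of the R script) of the usual QQ-plot coordinates and the genomic inflation factor lambda_GC, assuming two-sided p-values from 1-df tests:

```python
from math import log10
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()  # mean 0, sd 1
# Median of the chi-square distribution with 1 df (about 0.4549).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def qq_points(pvalues):
    """(expected, observed) -log10(p) pairs for a QQ plot, largest first."""
    n = len(pvalues)
    observed = sorted((-log10(p) for p in pvalues), reverse=True)
    expected = [-log10((i + 0.5) / n) for i in range(n)]
    return list(zip(expected, observed))


def genomic_inflation(pvalues):
    """lambda_GC: median 1-df chi-square statistic implied by the p-values / null median.

    Assumes two-sided p-values in (0, 1]; inv_cdf(p / 2) avoids 1 - p/2 rounding to 1.
    """
    chi2 = [_STD_NORMAL.inv_cdf(p / 2) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1DF_MEDIAN
```

Under the null, p-values are uniform and lambda_GC is close to 1; values well above 1 usually point to population stratification or other systematic inflation rather than true signal.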
