# COVID-19 exome project

## Directory `feature_processing`

Preparation of datasets, exploratory data analysis (EDA), filtering, and normalization of data.

- `feature_mapping.py`, `extra_feature_mapping.py` - mappings of feature names from Russian to short English abbreviations.
- `part1_ids_preprocessing.ipynb` - preprocessing of VCF and phenotype IDs.
- `part2_phenotype_preprocessing.ipynb` - cleaning and preparation of phenotypes, renaming of columns, etc.
- `part2.5_new_table_analysis.ipynb` - the same as `part2`, but including the extra features.
- `part3_march_pheno_eda_and_normalysing.ipynb` - EDA, filtering, and normalization of data.
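
The notebooks' exact preprocessing steps are not spelled out here. As a rough illustration only, the sketch below renames columns with a dictionary in the spirit of `feature_mapping.py` and then standardizes each feature to z-scores. The mapping entries, the column-oriented dict layout, and the choice of z-score normalization (population standard deviation, missing values skipped) are assumptions, not the project's actual code.

```python
from statistics import mean, pstdev

# Hypothetical mapping in the spirit of feature_mapping.py:
# Russian column names -> short English abbreviations.
FEATURE_MAPPING = {
    "Возраст": "AGE",
    "Ферритин": "FERR",
}


def rename_columns(table, mapping):
    """Return a copy of a column-oriented table with columns renamed.

    Columns missing from the mapping keep their original names.
    """
    return {mapping.get(name, name): values for name, values in table.items()}


def z_score(values):
    """Standardize to zero mean and unit population variance; None stays None."""
    present = [v for v in values if v is not None]
    mu = mean(present)
    sigma = pstdev(present)
    if sigma == 0:
        # Constant feature: every observed value maps to 0.
        return [0.0 if v is not None else None for v in values]
    return [(v - mu) / sigma if v is not None else None for v in values]


# Toy phenotype table (made-up values) with one missing age.
raw = {"Возраст": [40, 60, None, 80], "Ферритин": [200.0, 400.0, 600.0, 800.0]}
renamed = rename_columns(raw, FEATURE_MAPPING)
normalized = {name: z_score(col) for name, col in renamed.items()}
print(sorted(normalized))  # ['AGE', 'FERR']
```

Columns absent from the mapping keep their original names, so unmapped features pass through unchanged.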

## Directory `cvas_and_rvas`

Common variant association study (CVAS) and rare variant association study (RVAS).

- `sd3_gwas_com.ipynb` - CVAS.
- `final_rwas_pipeline_p1_hail_prepare.ipynb` - first step of RVAS (preparation of tables).
- `final_rwas_pipeline_p2_statistics_and_plots.ipynb` - second step of RVAS (tests and plots).
- `out_hail_gwas_com_sd3/` - directory with the CVAS p-values.
- `out_hail_rvas/` - directory with the RVAS p-values.
- `rvas_dataset/` - directory with the table for the second step of RVAS.
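
Which rare-variant tests the notebooks run (via Hail) is not described here. To illustrate the general idea of a gene-level RVAS, here is a minimal pure-Python collapsing (burden) test: rare variants in one gene are collapsed into a per-sample carrier flag, and carrier counts in cases versus controls are compared with a two-sided Fisher's exact test. The MAF threshold, the data layout, diploid genotypes with no missing calls, and the use of Fisher's test are all assumptions; this is neither Hail's API nor the project's pipeline.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    probs = {x: comb(col1, x) * comb(n - col1, row1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Small relative tolerance so tables tied with the observed one are counted.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def burden_test(samples, phenotype, gene_variants, maf_threshold=0.01):
    """Collapsing test for one gene (hypothetical helper).

    samples: dict sample_id -> {variant_id: alt allele count (0, 1 or 2)}
    phenotype: dict sample_id -> 1 (case) or 0 (control)
    """
    n = len(samples)
    rare = []
    for v in gene_variants:
        alt = sum(g.get(v, 0) for g in samples.values())
        maf = min(alt, 2 * n - alt) / (2 * n)  # assumes diploid, no missing calls
        if 0 < maf < maf_threshold:
            rare.append(v)
    # Rows: case / control; columns: carrier / non-carrier.
    table = [[0, 0], [0, 0]]
    for sample_id, genotypes in samples.items():
        carrier = int(any(genotypes.get(v, 0) > 0 for v in rare))
        table[1 - phenotype[sample_id]][1 - carrier] += 1
    (a, b), (c, d) = table
    return fisher_exact_two_sided(a, b, c, d)


# Toy data: s0..s9 are cases, s10..s19 controls; four cases carry variant "v1".
samples = {f"s{i}": ({"v1": 1} if i < 4 else {}) for i in range(20)}
phenotype = {f"s{i}": int(i < 10) for i in range(20)}
p = burden_test(samples, phenotype, ["v1"], maf_threshold=0.2)
print(round(p, 4))  # 0.0867
```

The toy example uses a large MAF threshold only because 20 samples are too few for any variant to fall below 1%.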

## Directory `risk_score_and_regression`

Validation of the SNPs found: their annotation and statistical checks.

- `Variants_annotations_and_score_calculation.ipynb` - the main notebook with all annotations, statistical calculations, etc. (requires `columns_to_check.py`).
- `draw_data/` - directory with datasets for the R plots.
- `other_data/` - directory with outputs of this notebook that are not needed for drawing figures.
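
The notebook's score definition is not given here (the figure file names only mention a score over the top 10 SNPs). A common way to build such a score, shown purely as an assumed sketch, is to take the SNPs with the smallest association p-values and sum each sample's risk-allele counts over them, optionally weighting each SNP (for example by a log odds ratio). The SNP names, p-values, genotypes, and helper names below are hypothetical.

```python
def top_snps(pvalues, k):
    """Return the k SNP IDs with the smallest p-values (ties keep input order)."""
    return sorted(pvalues, key=pvalues.get)[:k]


def risk_score(genotypes, snps, weights=None):
    """Weighted sum of risk-allele counts (0, 1, 2) over the selected SNPs.

    SNPs missing from `genotypes` count as 0; SNPs missing from `weights`
    get weight 1.0, so the default is a plain allele count.
    """
    weights = weights or {}
    return sum(weights.get(snp, 1.0) * genotypes.get(snp, 0) for snp in snps)


# Hypothetical CVAS p-values and one sample's risk-allele counts.
pvalues = {"snp_a": 1e-6, "snp_b": 0.03, "snp_c": 2e-8}
sample = {"snp_a": 2, "snp_b": 2, "snp_c": 1}

top = top_snps(pvalues, 2)
print(top)                      # ['snp_c', 'snp_a']
print(risk_score(sample, top))  # 3.0
```

An unweighted count treats every selected SNP as equally important; passing effect-size weights gives a score closer to a conventional polygenic risk score.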

## Directory `r_draw`

R code for drawing the figures, and the figures themselves.

- `1_MH_QQ.R` - draws Manhattan and QQ plots (files `Rectangular-Manhattan.*t*.pdf` and `QQplot*.pdf`, respectively).
- `2_PCA.R` - draws principal component plots from the EDA (by sex and death). Images: `images/pca_*.pdf`.
- `3_violin_regression.R` - draws violin plots and regressions for SNPs and their associated features (from `data/regression/regr_rs*_*.tsv`). Images: `images/regr_<rs>_<feature>.pdf`.
- `4_boxplots.R` - draws boxplots of features by death and severity (from `data/boxplots_analyses.tsv`). Images: `images/bozplot_violin_<death/severity>_<feature>.pdf`.
- `5_histplots.R` - draws histograms (from `data/features_for_hist.tsv` and `data/boxplots_analyses.tsv`). Images:
  - `images/histogram_<feature>___top_10_score.pdf` - histograms of the score by severity/death/storm;
  - `images/score_hist.pdf` - histogram of the SNPs' score;
  - `images/histogram_<death/severity>_<feature>.pdf` - histograms of features by death/severity.
- `images/` - drawn figures.
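
QQ plots of association p-values are usually read together with the genomic inflation factor λ. The sketch below computes the points such a plot shows (expected vs observed -log10 p-values, with expected quantiles taken as (i - 0.5)/n) and λ as the median 1-df chi-square statistic divided by its expected median (about 0.455). Whether `1_MH_QQ.R` uses these exact conventions is not stated; this is an illustrative Python sketch, not a translation of the R script.

```python
import math
from statistics import NormalDist, median

# Median of the chi-square distribution with 1 degree of freedom:
# P(Z^2 <= m) = 0.5  <=>  Phi(sqrt(m)) = 0.75.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def qq_points(pvalues):
    """Return (expected, observed) -log10 p-value pairs, both sorted ascending."""
    n = len(pvalues)
    observed = sorted(-math.log10(p) for p in pvalues)
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


def genomic_inflation(pvalues):
    """Genomic control lambda: median observed chi-square over its null median.

    A two-sided p-value p corresponds to chi2 = z^2 with z = Phi^-1(p / 2);
    using p / 2 (not 1 - p / 2) keeps very small p-values numerically stable.
    """
    chi2 = [NormalDist().inv_cdf(p / 2) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spread p-values mimic the null hypothesis, so lambda should be near 1.
null_p = [(i - 0.5) / 1000 for i in range(1, 1001)]
print(round(genomic_inflation(null_p), 2))  # 1.0
```

Under the null hypothesis λ is close to 1; values well above 1 point to inflation of the test statistics, for example from population stratification.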