Step 02 ‐ phenotype organisation

gibran hemani edited this page Oct 20, 2024 · 6 revisions

This step organises the phenotype and covariate data ready for the GWAS analysis. It creates summaries and plots for each phenotype so that we can evaluate the distributional overlap across cohorts. Efforts have been made to ensure that no individual-level data will be shared, and that no disclosive data are included in the summary data that is shared (e.g. jitter is applied to plots). Notes:

  • Check the static_covariates="sex yob" flag in the config.env file. You may wish to add cohort-specific covariates to your covariates file, in which case they must also be declared here. The GWAS is performed in an age-stratified manner, so covariates that covary with age should not necessarily be a problem, but please discuss with us if you're unsure.
  • See the Data-preparation and Phenotype-definitions pages to guide you as to how the phenotype and covariate data should be prepared.
  • This ran in ~20 minutes on 450k samples (two phenotypes) in UK Biobank with env_threads=100, using 10 GB of RAM.
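Before running the step, it can help to confirm which covariates are actually declared. The following is a minimal sketch, not part of the pipeline: it assumes config.env contains a single line of the form static_covariates="sex yob" with space-separated names, and the helper name list_static_covariates is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: print the covariates declared in config.env, one per line.
# Assumption: the file has a line like  static_covariates="sex yob"
# (space-separated names inside double quotes, nothing after the closing quote).

config_file="${1:-config.env}"

# list_static_covariates FILE
# Hypothetical helper: extracts the value of static_covariates from FILE
# without sourcing it, then splits the value on spaces.
list_static_covariates() {
    grep -E '^static_covariates=' "$1" \
        | tail -n 1 \
        | sed -E 's/^static_covariates=//; s/^"//; s/"$//' \
        | tr -s ' ' '\n'
}

if [ -f "$config_file" ]; then
    list_static_covariates "$config_file"
else
    echo "config file not found: $config_file" >&2
fi
```

Each name printed should correspond to a column you have prepared in your covariates file; any cohort-specific covariate missing from this list needs to be added to static_covariates.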

To run

./02-phenotype-organisation.sh

Check and archive the results

Please open $results_dir/phenotype_organisation.html and check manually that each phenotype is distributed as you would expect. Also check that the samples being included and excluded are as you expect.

This will generate the tarball and md5 sums for the results for this step:

./utils/archive.sh 02
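If you want to confirm that a tarball still matches its recorded checksum before uploading, a check along the following lines may be useful. This is a sketch under assumptions: it presumes the checksum file is in md5sum output format (digest first, then filename), that the md5sum command is available, and the helper name verify_md5 is hypothetical. The names and locations of the files produced by archive.sh are not stated here, so both paths are passed as arguments.

```shell
#!/usr/bin/env bash
# Sketch only: compare a file's MD5 digest with the digest recorded in a checksum file.
# Assumption: the checksum file holds the digest as its first whitespace-separated field,
# as written by `md5sum FILE > FILE.md5`.

# verify_md5 FILE MD5FILE
# Hypothetical helper: returns 0 if the digests match, non-zero otherwise.
verify_md5() {
    vm_expected=$(awk '{print $1; exit}' "$2")
    vm_actual=$(md5sum "$1" | awk '{print $1}')
    [ -n "$vm_expected" ] && [ "$vm_expected" = "$vm_actual" ]
}

# Run the check only when both paths are supplied on the command line.
if [ "$#" -eq 2 ]; then
    if verify_md5 "$1" "$2"; then
        echo "checksum OK: $1"
    else
        echo "checksum MISMATCH: $1" >&2
        exit 1
    fi
fi
```

Comparing only the digest field, rather than using md5sum -c, avoids depending on the path recorded inside the checksum file, which may differ from where the tarball currently sits.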

Upload the results to the SFTP

./utils/upload.sh 02

(Note that this will prompt you for your SFTP password. Contact us if you haven't received it.)

(Optional) Running in containers

Docker

Make sure you've pulled the latest image:

docker pull mrcieu/lifecourse-gwas:latest

and then run:

./utils/run_docker.sh ./02-phenotype-organisation.sh

Apptainer

Make sure you've pulled the latest image:

apptainer pull docker://mrcieu/lifecourse-gwas:latest

and then run:

./utils/run_apptainer.sh ./02-phenotype-organisation.sh