Step 01 ‐ ancestry

gibran hemani edited this page Jan 14, 2025 · 8 revisions

This step will generate principal components if required. Notes:

  • If the data has substantial relatedness, set env_family_data="true" in the config.env file so that the relatedness is handled appropriately. If it is set to "false", this step will identify a subset of approximately unrelated individuals for the main analyses.
  • By default 10 PCs are used; set the appropriate number for your study in the config.env file.
  • The pipeline expects the analysis to be run on a single ancestral group. If your cohort has substantial numbers of samples across multiple ancestral groups please ensure that you run the pipeline separately for each of those major ancestral groups. Some cohorts may be comprised of largely admixed individuals. In general we suggest treating such cohorts as a single group, but we would defer to your experience of how best to handle your cohort.
  • If you have pre-calculated PCs for the samples in these data you are welcome to provide them. Please store them in $genotype_processed_dir/pcs.txt, with columns FID, IID, PC1, PC2, ....
  • This step ran in ~2 hours on 450k samples in UK Biobank using env_threads=100 and up to 32 GB of RAM.
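
As a quick sanity check on a user-supplied PC file, the sketch below tests that the header matches the FID, IID, PC1, PC2, ... layout described above. It assumes the file is whitespace-delimited with a header row, which this page does not state explicitly. check_pcs_header is a hypothetical helper and is not part of the pipeline.

```shell
# Hypothetical helper (not part of the pipeline): check the layout of a PC file.
# Assumption: whitespace-delimited text with a header row FID IID PC1 PC2 ...
# Usage: check_pcs_header <pcs_file> <number_of_pcs>
check_pcs_header() {
    pcs_file="$1"
    n_pcs="$2"
    awk -v n="$n_pcs" '
        NR == 1 {
            # The first two columns must be the family and individual IDs.
            if ($1 != "FID" || $2 != "IID") {
                print "header must start with FID IID"; bad = 1; exit 1
            }
            # The next n columns must be named PC1..PCn, in order.
            for (i = 1; i <= n; i++) {
                if ($(i + 2) != ("PC" i)) {
                    print "missing column PC" i; bad = 1; exit 1
                }
            }
            ncol = NF
            next
        }
        # Every data row must have the same number of fields as the header.
        NF != ncol {
            print "line " NR ": expected " ncol " fields, got " NF
            bad = 1; exit 1
        }
        END {
            if (bad) exit 1
            if (NR == 0) { print "file is empty"; exit 1 }
        }
    ' "$pcs_file"
}
```

For example, `check_pcs_header "$genotype_processed_dir/pcs.txt" 10` returns a non-zero status and prints a short message if fewer than 10 PC columns are present.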

To run:

./01-ancestry.sh

If necessary, the script can be resumed from a particular step (e.g. related, pcs, grm, keeplists). For example:

./01-ancestry.sh pcs

will run from the pcs step to the end of the script.
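
To illustrate the resume behaviour, here is a minimal sketch of how "run from a named step onwards" can work. It is not the actual contents of 01-ancestry.sh: the steps_from function is hypothetical, and the step list and its order are assumed from the examples given above.

```shell
# Hypothetical illustration only: not taken from 01-ancestry.sh.
# Assumption: the steps run in this order and the list is complete.
steps="related pcs grm keeplists"

# Print the requested step and every step after it.
# With no argument, start from the first step.
# Returns non-zero if the requested step name is unknown.
steps_from() {
    start="${1:-related}"
    started=0
    for s in $steps; do
        [ "$s" = "$start" ] && started=1
        [ "$started" -eq 1 ] && echo "$s"
    done
    [ "$started" -eq 1 ]
}
```

Under these assumptions, `steps_from pcs` lists pcs, grm and keeplists, which mirrors the idea that earlier steps are skipped and everything from the named step onwards is rerun.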

Check and archive the results

Please manually look at the $results_dir/pcaplot.png file to check that ancestral clustering looks as expected before uploading the results.

This will generate the tarball and md5 sums for the results for this step:

./utils/archive.sh 01
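
If you want to confirm that the archive is intact before uploading, a check along the following lines may help. It is a sketch only: verify_md5 is a hypothetical helper, the paths are passed in as arguments because the archive file names are not given here, and it assumes that GNU md5sum is available and that the checksum file has the hash as its first field.

```shell
# Hypothetical helper (not part of the pipeline): compare a file's md5 sum
# with the hash stored in a checksum file.
# Assumptions: GNU md5sum is installed; the checksum file's first
# whitespace-separated field on its first line is the expected hash.
# Usage: verify_md5 <tarball> <md5_file>
verify_md5() {
    tarball="$1"
    md5_file="$2"
    expected=$(awk '{ print $1; exit }' "$md5_file")
    actual=$(md5sum "$tarball" | awk '{ print $1 }')
    # Succeeds only when a hash was read and it matches the recomputed one.
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

The function returns zero when the hashes match, so it can be used as `verify_md5 <tarball> <md5_file> && echo "checksum OK"`.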

Upload the results to the SFTP

./utils/upload.sh 01

(Note that this will prompt you for your SFTP password. Contact us if you haven't received it.)

(Optional) Running in containers

Docker

Make sure you've pulled the latest image:

docker pull mrcieu/lifecourse-gwas:latest

and then run:

./utils/run_docker.sh ./01-ancestry.sh

Apptainer

Make sure you've pulled the latest image:

apptainer pull docker://mrcieu/lifecourse-gwas:latest

and then run:

./utils/run_apptainer.sh ./01-ancestry.sh