Step 00 ‐ organise genotype data

gibran hemani edited this page Jan 14, 2025 · 3 revisions

This step checks the input genotype data, extracts an LD-pruned subset, and identifies a subset of high-quality variants; both subsets are used in subsequent analysis. Notes:

  • Please ensure that the genetic data is set up according to the instructions on the Data preparation page
  • This step ran in 25 minutes on 450k samples in UK Biobank with env_threads=100 and used approximately 10 GB of RAM

To run

./00a-genotype-organisation.sh
./00b-genotype-organisation.sh
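
If you want to launch both parts with one command, they can be wrapped so that the second part only starts once the first has finished successfully. The following is a minimal sketch: `run_steps` is a hypothetical helper written for illustration, not something provided by the pipeline, and it assumes each script exits with a non-zero status on failure.

```shell
#!/usr/bin/env bash
# Sketch: run step scripts in order, stopping at the first failure.
# run_steps is a hypothetical helper, not part of the pipeline repository.
run_steps() {
    local script
    for script in "$@"; do
        echo "Running ${script}"
        # Assumes a failing script returns a non-zero exit status;
        # if it does, later scripts are not started.
        if ! "${script}"; then
            echo "FAILED: ${script}" >&2
            return 1
        fi
    done
    echo "All steps completed"
}

# Usage (from the repository root):
#   run_steps ./00a-genotype-organisation.sh ./00b-genotype-organisation.sh
```

Stopping early avoids running the second part on incomplete output from the first.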

Check and archive the results

./utils/archive.sh 00

Upload the results to the SFTP

Contact us to let us know that you plan to upload your results!

./utils/upload.sh 00

(Note that this will prompt you for your SFTP password. Contact us if you haven't received it.)

(Optional) Running in containers

Docker

Make sure you've pulled the latest image:

docker pull mrcieu/lifecourse-gwas:latest

and then run:

./utils/run_docker.sh ./00a-genotype-organisation.sh
./utils/run_docker.sh ./00b-genotype-organisation.sh

Apptainer

Make sure you've pulled the latest image:

apptainer pull docker://mrcieu/lifecourse-gwas:latest

and then run:

./utils/run_apptainer.sh ./00a-genotype-organisation.sh
./utils/run_apptainer.sh ./00b-genotype-organisation.sh