
Step 04 ‐ GWAS


This step performs the GWAS analysis using fastGWA. It will attempt to fit a linear mixed model using the sparse kinships and PCs generated in step 01; if that fails to converge, it will fall back to running a linear model on unrelated samples. Because a potentially large number of GWASs will be run, the pipeline stores the results of each GWAS in a .gz file containing a standardised variant ID, beta, SE, N, p-value and effect allele frequency for each variant. It also creates a .summary.rds summary file with lambda QC values etc.
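
For example, once a GWAS has completed you can inspect the per-variant columns directly from the compressed output. The filename below is illustrative; use whichever *.fastGWA.gz file appears in your results directory, and make sure config.env has been sourced so that $results_dir is set:

# Peek at the first few rows of one GWAS output (filename is illustrative)
zcat $results_dir/04/phen1.fastGWA.gz | head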

To run

One GWAS needs to be run for every age x phenotype x sex combination. To run all of them at once:

./04-gwas.sh

This will use the env_threads variable in config.env to determine how many threads to use for each GWAS, but it will still run the GWASs one at a time.
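
env_threads is a plain shell variable in config.env; you can check what 04-gwas.sh will pick up with the command below (the value shown in the comment is just an example, matching the Slurm setup later on):

# Check the thread setting used for each GWAS
grep env_threads config.env
# e.g. env_threads=10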

You can specify which phenotype to run by providing the corresponding row number in the $phenotype_processed_dir/phenolist file. E.g. to run the first GWAS:

./04-gwas.sh 1
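
If you want to run a particular phenotype rather than just the first row, one way to find its row number is to grep the phenolist file (this assumes phenolist is a plain text file with one GWAS per row, and the phenotype name shown here is illustrative):

# Find the row number of a given phenotype ("bmi" is illustrative)
grep -n "bmi" $phenotype_processed_dir/phenolist

The row number reported by grep -n can then be passed to ./04-gwas.sh in place of 1 above.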

Recommended checks: This will likely be the most computationally demanding part of the pipeline. We suggest checking that things are working as expected before running all of the GWASs.

  1. Make sure that the QQ plot from 00a looks as expected - the observed test statistics should follow the expected values, with no inflation or deflation
  2. Make sure that the PRS results from 03 look as expected - you should have strong associations of the PRS with the traits as a positive control
  3. Run just one GWAS and check that you get a sensible result - the results/04/*fastGWA.summary.rds file stores information about the GWAS, including inflation lambda values etc. (one way to inspect it is sketched after this list)
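
A quick way to look inside one of these summary files from the command line is via Rscript (the filename below is illustrative, and the exact contents of the object will depend on your phenotypes):

# Print the structure of one GWAS summary object (filename is illustrative)
Rscript -e 'x <- readRDS("results/04/phen1.fastGWA.summary.rds"); str(x)'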

Once you are happy that things are performing as expected, go on to run the rest of the GWASs. Feel free to discuss any issues with the developers!

To run in parallel on a cluster (e.g. using Slurm), supposing you have 100 GWASs to run and env_threads=10 in your config.env, create a submit.sh script that looks something like this:

#!/bin/bash
#SBATCH --array 1-100
#SBATCH --account=<your HPC account name>
#SBATCH --partition=<your HPC partition name>
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=10
#SBATCH --time=00:10:00
./04-gwas.sh ${SLURM_ARRAY_TASK_ID}

and submit with sbatch submit.sh
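
If you are unsure how many GWASs there are (and hence what range to give --array), recall that there is one GWAS per row of the phenolist file, so you can count them with:

# Number of GWASs = number of rows in phenolist
wc -l < $phenotype_processed_dir/phenolist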

Check and archive the results

Please manually look at the $results_dir/04 directory. You should see a *.fastGWA.gz and *.fastGWA.summary.rds file for every phenotype x age x sex combination.
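
One quick completeness check (assuming config.env has been sourced so that the directory variables below are set) is to count the output files and compare against the number of rows in phenolist:

# Both file counts should match the number of rows in phenolist
ls $results_dir/04/*.fastGWA.gz | wc -l
ls $results_dir/04/*.fastGWA.summary.rds | wc -l
wc -l < $phenotype_processed_dir/phenolist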

Once all GWASs are done, run the following to generate the tarball and md5 sums for the results of this step:

utils/archive.sh 04

Note that we expect each GWAS to require ~150 MB of storage, so if you have run 100 GWASs the resulting tarball will be about 15 GB.
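
If you want to confirm how much space the results actually take up before archiving and uploading, du gives the total on disk:

# Total size of the step 04 results on disk
du -sh $results_dir/04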

Upload the results to the SFTP

utils/upload.sh 04

(Note that this will prompt you for your SFTP password - contact us if you haven't received this.)

(Optional) Running in containers

Docker

Make sure you've pulled the latest image:

docker pull mrcieu/lifecourse-gwas:latest

and then run:

./utils/run_docker.sh ./04-gwas.sh
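
To run a single GWAS inside the container, it is assumed here that the wrapper script forwards any extra arguments to the wrapped command; if so, the row-number argument can be passed in the same way as before:

# Run only the first GWAS inside the container (assumes the wrapper passes arguments through)
./utils/run_docker.sh ./04-gwas.sh 1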

Apptainer

Make sure you've pulled the latest image:

apptainer pull docker://mrcieu/lifecourse-gwas:latest
./utils/run_apptainer.sh ./04-gwas.sh