# Step 04 - GWAS
This step performs the GWAS analysis using fastGWA. It attempts to fit a linear mixed model using the sparse kinship matrix and PCs generated in step 01. If that fails to converge, it reverts to running a linear model on unrelated samples. Because a potentially large number of GWASs will be run, the pipeline stores the results of each GWAS in a `.gz` file containing a standardised variant ID, beta, SE, N, p-value and effect allele frequency for each variant. It also creates a `.summary.rds` summary file with lambda QC values etc.
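To illustrate the lambda QC value reported in the summary file, here is a minimal sketch of how a genomic inflation factor can be computed from a vector of GWAS p-values. This is illustrative only; the pipeline computes it internally and stores it in the `.summary.rds` file.

```python
# Sketch: genomic inflation factor (lambda GC) from GWAS p-values.
# Illustrative only - the pipeline computes this itself.
import random
from statistics import NormalDist, median

def lambda_gc(pvals):
    # Convert two-sided p-values to 1-df chi-square statistics via the
    # normal quantile, then compare the observed median to the expected
    # median of a chi-square(1) distribution (~0.4549)
    nd = NormalDist()
    chisq = [nd.inv_cdf(1 - p / 2) ** 2 for p in pvals]
    return median(chisq) / 0.4549

# Under the null, p-values are uniform and lambda should be close to 1
random.seed(1)
null_pvals = [random.random() for _ in range(10000)]
print(f"lambda = {lambda_gc(null_pvals):.2f}")
```

A lambda well above 1 suggests inflation (e.g. residual population stratification); well below 1 suggests deflation.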
One GWAS needs to be run for every age x phenotype x sex combination. To run all of them at once:

```
./04-gwas.sh
```
This uses the `env_threads` variable in `config.env` to determine how many threads to use for each GWAS, but it still runs the GWASs one at a time.
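For example, to allow 10 threads per GWAS, `config.env` would contain a line like this (the value 10 here is illustrative):

```shell
# config.env (excerpt): number of threads available to each GWAS
env_threads=10
```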
You can specify which phenotype to run by providing the corresponding row number of the `$phenotype_processed_dir/phenolist` file. E.g. to run the first GWAS:

```
./04-gwas.sh 1
```
Recommended checks: This is likely to be the most computationally demanding part of the pipeline. We suggest checking that things are working as expected before running all GWASs:

- Make sure that the QQ plot from step 00a looks as expected - the observed test statistics should follow the expected with no inflation or deflation.
- Make sure that the PRS results from step 03 look as expected - you should see strong associations of the PRS with the traits as a positive control.
- Run just one GWAS and check that you get a sensible result - the `results/04/*fastGWA.summary.rds` file stores information about the GWAS including inflation lambda values etc.

Once you are happy that things are performing as expected, go on to run the rest of the GWASs. Feel free to discuss with the developers!
To run in parallel on a cluster (e.g. using Slurm), supposing you have 100 GWASs to run and `env_threads=10`, create a `submit.sh` script that looks a bit like this:

```
#!/bin/bash

#SBATCH --array 1-100
#SBATCH --account=<your HPC account name>
#SBATCH --partition=<your HPC partition name>
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=10
#SBATCH --time=00:10:00

./04-gwas.sh ${SLURM_ARRAY_TASK_ID}
```
and submit it with `sbatch submit.sh`.
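If the number of GWASs changes between runs, the submit script can be generated rather than edited by hand. A minimal sketch (the row count and thread count here are placeholders - in practice they would come from `$phenotype_processed_dir/phenolist` and `config.env`):

```python
# Sketch: generate submit.sh with the SLURM array sized to the number of GWASs.
# n_gwas would normally be the row count of $phenotype_processed_dir/phenolist;
# 100 and 10 are placeholder values for illustration.
n_gwas = 100
threads = 10  # should match env_threads in config.env

submit = f"""#!/bin/bash
#SBATCH --array 1-{n_gwas}
#SBATCH --account=<your HPC account name>
#SBATCH --partition=<your HPC partition name>
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task={threads}
#SBATCH --time=00:10:00

./04-gwas.sh ${{SLURM_ARRAY_TASK_ID}}
"""

with open("submit.sh", "w") as f:
    f.write(submit)
```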
Please manually look at the `$results_dir/04` directory. You should see a `*.fastGWA.gz` and a `*.fastGWA.summary.rds` file for every phenotype x age x sex combination.
Once all GWASs are done, generate the tarball and md5 sums for the results of this step:

```
utils/archive.sh 04
```

Note that we expect one GWAS to require ~150Mb of storage, so if you have run 100 GWASs the resulting tarball will be about 15Gb.
To upload the results:

```
utils/upload.sh 04
```

(Note that this will prompt you for your SFTP password - contact us if you haven't received this.)
## Docker

Make sure you've pulled the latest image:

```
docker pull mrcieu/lifecourse-gwas:latest
```

and then run:

```
./utils/run_docker.sh ./04-gwas.sh
```
## Apptainer

Make sure you've pulled the latest image:

```
apptainer pull docker://mrcieu/lifecourse-gwas:latest
```

and then run:

```
./utils/run_apptainer.sh ./04-gwas.sh
```