Setup
- Linux x86 operating system
- GNU Bash 5.1.16 or higher
- R 4.1.0 or higher
- (Optional) Snakemake 8.19 or higher
- (Optional) Docker version 26.1.3 or higher
- (Optional) Apptainer version 1.3.2 or higher
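To confirm the minimum versions listed above before starting, a small script along these lines could be used. It is a minimal sketch and not part of the repository. version_ge is a hypothetical helper that relies on GNU sort -V, and only Bash and R are checked.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if version $1 >= version $2 (uses GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Compare the running Bash against the documented minimum (5.1.16).
bash_version="${BASH_VERSINFO[0]}.${BASH_VERSINFO[1]}.${BASH_VERSINFO[2]}"
if version_ge "$bash_version" "5.1.16"; then
  echo "Bash $bash_version OK"
else
  echo "Bash $bash_version is older than 5.1.16"
fi

# R is only checked if it is on PATH; the first line of 'R --version'
# looks like "R version 4.4.2 (...)", so the third field is the version.
if command -v R >/dev/null 2>&1; then
  r_version="$(R --version | head -n1 | awk '{print $3}')"
  if version_ge "$r_version" "4.1.0"; then
    echo "R $r_version OK"
  else
    echo "R $r_version is older than 4.1.0"
  fi
else
  echo "R not found on PATH"
fi
```

The optional tools (Snakemake, Docker, Apptainer) could be checked the same way if you plan to use them.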
We have attempted to make the pipeline as resource efficient as possible. For the UK Biobank analysis, steps 00-03 required < 10 GB RAM and completed in < 1 hour using 100 cores. The GWAS step (04) takes a couple of minutes per GWAS using 100 cores and < 5 GB RAM.
Use git to clone the repository:
git clone https://github.com/MRCIEU/Lifecourse-GWAS.git
Set up your directory locations. Copy config-template.env to a new file called config.env, then edit it to contain the paths to the genotype / phenotype data locations etc. as required:
cp config-template.env config.env
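Before running anything, it can help to confirm that no template placeholders remain in config.env. The sketch below assumes that unedited entries still contain the literal string /EDIT/THIS/PATH, as they do in the template. check_placeholders is a hypothetical helper, not something the pipeline provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print any lines of a config file that still hold the
# template placeholder (with line numbers) and fail if there are any.
check_placeholders() {
  local config_file="$1"
  if [ ! -f "$config_file" ]; then
    echo "$config_file not found" >&2
    return 2
  fi
  # grep -n prints "line_number:line" for each match and succeeds if any matched.
  if grep -n '/EDIT/THIS/PATH' "$config_file"; then
    echo "Some paths in $config_file still need editing." >&2
    return 1
  fi
  echo "No template placeholders left in $config_file."
}
```

Running check_placeholders config.env from the repository directory would list any remaining placeholder lines and return a non-zero status. It returns zero once every path has been filled in.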
We recommend using data paths outside the cloned code repository. You will need the following working data directories, ideally on fast disk that can be accessed by HPC nodes:
genotype_input_list="/EDIT/THIS/PATH"
phenotype_input_dir="/EDIT/THIS/PATH"
phenotype_processed_dir="/EDIT/THIS/PATH"
genotype_processed_dir="/EDIT/THIS/PATH"
Note that we will never ask you to transfer any data from the raw individual-level data directories listed above. All results from the pipeline will be stored in results_dir:
results_dir="/EDIT/THIS/PATH"
Only non-disclosive summary data that is safe to transfer to our servers for checking, subsequent meta-analysis, etc. will be stored here.
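A quick way to verify the directory settings is to load config.env and test each path. This is a minimal sketch under two assumptions. First, config.env contains plain Bash assignments (name="value") and can therefore be sourced. Second, genotype_input_list is left out because its name suggests a list file rather than a directory. check_dirs is a hypothetical helper. It fails on missing directories and only warns about paths inside the repository, since that placement is a recommendation rather than a requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that the working directories named in a config
# file exist, and warn if any of them lie inside the cloned repository.
# Assumes the config file is sourceable Bash (name="value" lines).
check_dirs() {
  local config_file="$1" repo_dir="$2"
  local status=0 var dir
  # shellcheck disable=SC1090
  source "$config_file"
  repo_dir="$(cd "$repo_dir" && pwd -P)"
  for var in phenotype_input_dir phenotype_processed_dir genotype_processed_dir results_dir; do
    dir="${!var}"   # indirect expansion: value of the variable named by $var
    if [ ! -d "$dir" ]; then
      echo "$var: '$dir' is not an existing directory" >&2
      status=1
    elif [[ "$(cd "$dir" && pwd -P)/" == "$repo_dir/"* ]]; then
      echo "$var: '$dir' is inside the repository; an outside path is recommended" >&2
    else
      echo "$var: OK"
    fi
  done
  return "$status"
}
```

From the repository directory, this might be run as check_dirs config.env . to compare the configured paths against the current checkout.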
The sftp_username parameter is used to upload results from the analysis to the SFTP server. Please contact us if you haven't received this username and password.
For the cohort_name parameter, please provide only an alphanumeric string with no spaces or special characters (e.g. ALSPAC).
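The cohort name rule can be expressed as a single regular-expression test. The sketch below is a hypothetical validator that follows the stated rule of letters and digits only, with no spaces or special characters. It is not the logic inside ./utils/check.sh, which remains the authoritative check.

```shell
#!/usr/bin/env bash
# Hypothetical validator for the stated rule: one or more ASCII letters or
# digits, nothing else (no spaces, hyphens, underscores, etc.).
is_valid_cohort_name() {
  [[ "$1" =~ ^[A-Za-z0-9]+$ ]]
}

for name in "ALSPAC" "UK Biobank" "my-cohort" "GenR2"; do
  if is_valid_cohort_name "$name"; then
    echo "$name: valid"
  else
    echo "$name: invalid"
  fi
done
```

With these inputs, ALSPAC and GenR2 pass, while "UK Biobank" (space) and "my-cohort" (hyphen) are rejected.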
Please check that the genome_build field is correct.
The locations of the .bgen and .sample files need to be specified in a particular way; please see here for details.
In R (ideally version 4.4.2), run the following:
install.packages("renv")
renv::restore()
This will automatically install all the correctly versioned R packages required to run the pipeline.
To check that the R packages are installed and the binaries are working, please run:
./utils/check.sh
You should see the following output:
Checking R packages...
No issues found -- the project is in a consistent state.
Checking plink...
All good!
Checking flashpca...
All good!
Checking king...
All good!
Checking cohort name...
<cohort_name> is a valid cohort name!
Checking genotype input list...
Checking 23 bgen files exist
All good!
If your output differs from this, let us know. You may need to use containers to provide the appropriate executable environment.
If you wish to run the analysis in a controlled containerised environment, tested to work with the relevant R packages and binaries pre-installed, you can use Docker or Apptainer. If you need to use an alternative container environment, let us know.
For Docker:
docker pull mrcieu/lifecourse-gwas:latest
./utils/run_docker.sh ./utils/check.sh
For Apptainer:
apptainer pull docker://mrcieu/lifecourse-gwas:latest
./utils/run_apptainer.sh ./utils/check.sh
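On systems where only one of the two runtimes is available (HPC clusters often provide Apptainer but not Docker), a small wrapper can pick the matching script automatically. The sketch below is hypothetical. pick_runner is not part of the repository; it only selects between the two wrapper scripts shown above. It prefers Docker when both runtimes are installed, which is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the repository wrapper script that matches the
# first container runtime found on PATH (Docker is tried first).
pick_runner() {
  if command -v docker >/dev/null 2>&1; then
    echo "./utils/run_docker.sh"
  elif command -v apptainer >/dev/null 2>&1; then
    echo "./utils/run_apptainer.sh"
  else
    echo "Neither docker nor apptainer found on PATH" >&2
    return 1
  fi
}
```

For example: runner="$(pick_runner)" && "$runner" ./utils/check.sh runs the checks in whichever container is available. You still need to pull the image first with the matching command above.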