Add ability to get usage information from colabfold_batch
jamesabbott committed May 31, 2023
1 parent 96eb985 commit cdc3207
Showing 3 changed files with 41 additions and 11 deletions.
12 changes: 8 additions & 4 deletions README.md
100644 → 100755
@@ -27,13 +27,13 @@ All necessary components are already available on the cluster.
1. Obtain a copy of this repository either using git i.e.
`git clone git://github.com/bartongroup/JCA_colabfold_batch.git`
or by downloading a release tarball from the link on the right under 'Releases'. Copy this tarball onto the cluster filesystem and extract with
`tar zxvf v1.5.2-beta2.tar.gz`
`tar zxvf v1.5.2-beta3.tar.gz`

2. Change into the directory which is created by step 1 - this will have the repository name if cloned from git, or the version number if obtained from a Release tarball.
a) From a repository clone:
`cd Colabfold_batch_installer`
b) From a release tarball:
`cd Colabfold_batch_installer-1.5.2-beta2`
`cd Colabfold_batch_installer-1.5.2-beta3`

3. Run the setup script:
`./setup.sh`
@@ -43,11 +43,13 @@ This will create a new conda environment named `colabfold_batch` based upon the

### Singularity

Usage: run_colabfold_singularity.sh -i /path/to/fasta/file [-c 'colabfold arguments']
Usage: run_colabfold_singularity.sh -i /path/to/fasta/file [-c 'colabfold arguments'] [-h] [-u]

The `run_colabfold_singularity.sh` script can be submitted directly to GridEngine, and requires at a minimum the path to an input fasta file. Any specific colabfold arguments can be provided using the `-c` argument. Log files will be written to a 'colabfold_logs' directory in the submission directory, while outputs will be written to a `colabfold_outputs` directory within the directory containing the submitted fasta file.

i.e. `qsub /path/to/run_colabfold.sh -i test/cadh5_arath.fa -c "--num-recycle 5 --amber --num-relax 5"`
i.e. `qsub /path/to/run_colabfold_singularity.sh -i test/cadh5_arath.fa -c "--num-recycle 5 --amber --num-relax 5"`

Full colabfold usage information can be found by running `run_colabfold_singularity.sh -u`

### Full Installation

@@ -73,6 +75,8 @@ Resulting job logs will be written into a subdirectory of the submission directo

**N.B. There are known issues with alphafold in relaxing models using Amber on GPUs - if this fails, omit the `--use-gpu-relax` argument and run amber only on CPUs - This part of the process on CPUs doesn't seem overly slow**

Full colabfold usage information can be found by running `run_colabfold.sh -u`

## Limitations

At present we do not have an in-house MMseqs2 server, so queries are directed to the default public server, which has limited capacity. Also bear in mind that use of a public resource would expose data externally which may not be appropriate.
17 changes: 14 additions & 3 deletions run_colabfold.sh
@@ -10,19 +10,31 @@
set -e

usage() {
echo "$0 -i /path/to/fasta/file [-c 'colabfold arguments']"
echo "Usage: $0 -i /path/to/fasta/file [-c 'colabfold arguments'] [-h] [-u]"
echo
echo "Note that colabfold arguments passed via '-c' must be surrounded with quotes to ensure they are all passed to colabfold"
echo
echo "run $0 -u for colabfold_batch help"
echo
exit 1
}

colabfold_usage() {
colabfold_batch -h
exit 1
}

while getopts "i:c:h" opt; do
while getopts "i:c:uh" opt; do
case $opt in
i)
input=$OPTARG
;;
c)
colabfold_args=$OPTARG
;;
u)
colabfold_usage
;;
h)
usage
;;
@@ -40,7 +52,6 @@ for arg in "${colabfold_args_list[@]}"; do
echo "Should this occur, rerun without --use-gpu-relax"
echo
fi

done

if [[ -z "$input" ]]; then
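The option handling added to `run_colabfold.sh` above can be sketched in isolation as follows. This is only a minimal illustration, not the script itself: `wrapped_help` is a hypothetical stand-in for `colabfold_batch -h`, `parse_opts` and `demo.sh` are made-up names, and the functions return rather than exit so the sketch can be sourced and exercised safely.

```shell
#!/usr/bin/env bash
# Sketch of a getopts loop with separate -h (wrapper help) and -u (wrapped tool help) flags.
# ASSUMPTION: wrapped_help stands in for `colabfold_batch -h`; it is not part of the commit.

wrapped_help() {
    echo "wrapped tool help text"
}

usage() {
    echo "Usage: demo.sh -i /path/to/fasta/file [-c 'colabfold arguments'] [-h] [-u]"
}

# Hypothetical helper: parses the same option string as the commit ("i:c:uh")
# and prints what was parsed instead of launching a job.
parse_opts() {
    local OPTIND=1          # reset so the function can be called repeatedly
    local opt
    local input=""
    local colabfold_args=""
    while getopts "i:c:uh" opt; do
        case $opt in
            i) input=$OPTARG ;;
            c) colabfold_args=$OPTARG ;;
            u) wrapped_help; return 0 ;;   # help requested: treated as success here
            h) usage; return 0 ;;
            *) usage >&2; return 1 ;;      # unknown option
        esac
    done
    if [ -z "$input" ]; then
        usage >&2
        return 1
    fi
    echo "input=$input args=$colabfold_args"
}
```

For instance, `parse_opts -i test/cadh5_arath.fa -c "--num-recycle 5 --amber"` prints the parsed input and argument string. One difference is deliberate: the commit exits with status 1 from both `usage` and `colabfold_usage`, whereas this sketch returns 0 when help is explicitly requested, which may be friendlier to callers that check exit codes.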
23 changes: 19 additions & 4 deletions run_colabfold_singularity.sh
@@ -12,18 +12,32 @@ set -e
image='/cluster/gjb_lab/cdr/colabfold/colabfold_batch.1.5.2.sif'

usage() {
echo "$0 -i /path/to/fasta/file [-c 'colabfold arguments']"
echo "Usage: $0 -i /path/to/fasta/file [-c 'colabfold arguments'] [-h] [-u]"
echo
echo "Note that colabfold arguments passed via '-c' must be surrounded with quotes to ensure they are all passed to colabfold"
echo
echo "run $0 -u for colabfold_batch help"
echo
exit 1
}

while getopts "i:c:h" opt; do
colabfold_usage() {
export TINI_SUBREAPER=1
singularity run ${image} colabfold_batch -h
exit 1
}

while getopts "i:c:uh" opt; do
case $opt in
i)
input=$OPTARG
;;
c)
colabfold_args=$OPTARG
;;
u)
colabfold_usage
;;
h)
usage
;;
@@ -36,7 +50,7 @@ read -a colabfold_args_list <<< "$colabfold_args"

for arg in "${colabfold_args_list[@]}"; do
if [[ "$arg" == "--use-gpu-relax" ]]; then
echo
echo
echo "WARNING: Running amber relaxation on GPUs is unreliable and may fail."
echo "Should this occur, rerun without --use-gpu-relax"
echo
@@ -63,5 +77,6 @@ echo "GPU: $CUDA_VISIBLE_DEVICES"
echo "Command line: colabfold_batch ${colabfold_args_list[@]} ${input} ${input_dir}/colabfold_outputs"

export TF_CPP_MIN_LOG_LEVEL=2
export TINI_SUBREAPER=1
singularity exec --nv -B ${input_dir}:/mnt ${image} \
colabfold_batch ${colabfold_args_list[@]} /mnt/${fasta_file} /mnt/colabfold_output
colabfold_batch ${colabfold_args_list[@]} /mnt/${fasta_file} /mnt/colabfold_output
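
Both scripts rely on the quoted `-c` string being split into an array and scanned for `--use-gpu-relax`. The sketch below shows that pattern on its own, assuming bash (it uses `read -a` and `[[ ]]`, as the scripts do). `check_args` is a hypothetical helper name, and the warning text is shortened for illustration.

```shell
#!/usr/bin/env bash
# Sketch: split a single quoted '-c' value into words and look for a specific flag.
# ASSUMPTION: check_args is an illustrative helper, not a function from the repository.

check_args() {
    local colabfold_args=$1
    local -a colabfold_args_list
    local arg
    # Word-split the quoted string into separate array elements.
    read -r -a colabfold_args_list <<< "$colabfold_args"
    echo "count=${#colabfold_args_list[@]}"
    for arg in "${colabfold_args_list[@]}"; do
        if [[ "$arg" == "--use-gpu-relax" ]]; then
            echo "WARNING: GPU relaxation requested"
        fi
    done
}
```

This is why the usage text asks for the arguments to be quoted: without quotes, `getopts` would assign only the first word to `-c` and the remaining words would be left as stray positional arguments.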
