Skip to content
/ gafl Public

Geometric Algebra Flow Matching (GAFL) for Protein Backbone Generation

License

Notifications You must be signed in to change notification settings

hits-mli/gafl

Repository files navigation

Geometric Algebra Flow Matching (GAFL) for Protein Backbone Generation

Source code for "Generating Highly Designable Proteins with Geometric Algebra Flow Matching" (https://arxiv.org/abs/2411.05238)

If you use this code, please cite:

@inproceedings{wagnerseute2024gafl,
  title={Generating Highly Designable Proteins with Geometric Algebra Flow Matching},
  author={Wagner, Simon and Seute, Leif and Viliuga, Vsevolod and Wolf, Nicolas and Gr{\"a}ter, Frauke and St{\"u}hmer, Jan},
  booktitle={Thirty-eighth Conference on Neural Information Processing Systems},
  year={2024}
}

This repository is based on FrameFlow (https://github.com/microsoft/protein-frame-flow).

The datasets and weights of the models reported in the paper will be made available in the future

Installation

TLDR

conda env create -f environment.yaml
conda activate gafl
bash install_gatr.sh # Apply patches to gatr

# Install package:
pip install -e .

Geometric Algebra Transformer

Geometric Algebra Transformer (gatr) in version 1.2.0 requires the xformers package that resulted in conflicting package dependencies. We therefore require to install gatr from source and apply patches to remove the dependency on xformers. Please note that gatr is distributed under its own license, which you can find in LICENSE.

To install gatr with the required patches please run

conda activate gafl
bash install_gatr.sh

Install package

After installing the requirements from environment.yaml and applying the patches to gatr, you can install the gafl package by running

pip install -e .

Usage

Inference

To sample backbone structures using the model (without the re-folding procedure, which is implemented e.g. in FrameDiff), run

python experiments/inference.py inference.ckpt_path=<path/to/ckpt>

You can specify the inference settings like the number of samples and timesteps via a config file like configs/inference.yaml.

Download model weights

The weights of the models reported in the paper are published at https://github.com/hits-mli/gafl/releases/download/v1.0.0/gafl-pdb.zip. Download the zip file, extract it and specify the path to the checkpoint in the inference command:

mkdir -p outputs
wget https://github.com/hits-mli/gafl/releases/download/v1.0.0/gafl-pdb.zip
unzip gafl-pdb.zip -d outputs
python experiments/inference.py inference.ckpt_path=outputs/gafl-pdb/gafl321.ckpt

Training

To train the model on the scope dataset, paste the path to your metadata csv file in configs/data/default.yaml and run

python experiments/train.py model=gafl

For training on pdb, set paths to the metadata csv file and to the cluster-defining file (as in FrameDiff) in configs/data/pdb.yaml and run

python experiments/train.py model=gafl data=pdb

Download dataset

GAFL was trained on the PDB dataset from FrameDiff. For ablations, the SCOPe dataset from FrameFlow was used.

About

Geometric Algebra Flow Matching (GAFL) for Protein Backbone Generation

Resources

License

Stars

Watchers

Forks

Packages

No packages published