
Commit 787ffc6

sokrypton, tomgoddard, YoshitakaMo, milot-mirdita, and dennissv authored
v2.3.1 - merge 'beta' into 'main' (sokrypton#371)
* Added an input_features_callback to the run() function, to allow plotting the MSA before structure prediction begins.
* Added lower-level plot_pae() and plot_protein_backbone() functions for making individual plots instead of multi-pane composite plots.
* Fixed a typo in the new protein-backbone plotting code.
* Tested v2.3.0.
* Added support for multimer_v3 and updated the alphafold-colabfold version number; v3 params are now downloaded by default.
* batch.py: convert unrelaxed protein features to numpy before saving, to speed up the save function.
* Added an option to enable/disable fuse.
* Added an option to iterate through random seed(s).
* Updated, then deleted, poetry.lock.
* test.yml: only run tests on the main branch.
* Fixed an encoding error that occurred when amber relax is used and the notebook is run multiple times.
* Refactored the code to write results to the jobname directory.
* Added model_type to output filenames.
* Disabled bfloat16 for the old (v1, v2) multimer models.
* Select the best view using pLDDT.
* Added a save_all option to save all model outputs.
* Added a max_msa option to the notebook, allowed max_msa settings for multimer, and adjusted the maximum settings.
* Updated dm-haiku to fix an error in jax.
* Added an option to disable use_cluster_profile.
* Code cleanup; notebooks regenerated using Colaboratory.
* models.py: added an option to enable bfloat16, fixed the tensorflow import, and made bfloat16 and fuse the defaults.
* Major bugfix for ptm + is_complex (sokrypton#360).
* Updated README.md.
* Added an option to control the number of top models to relax.
* Updated the pae filename and removed whitespace from saved images.
* Addressed a memory leak and a pae bugfix: all outputs are now saved as they are generated to avoid memory leaks, and scores are reloaded later for plotting. Bugfix: *_predicted_aligned_error_v1.json is now the best-ranked output (before, it was whichever model was saved first).
* Added leading zeros to the rank and seed id(s).
* Deleted colabfold_alphafold.py (unused file).
* Monomers can now be predicted with the multimer model!
* Fixed numbering.
* batch.py: bugfix for custom MSA input; the run function now returns model ranks and metrics, prints the metrics, and keeps track of the best model rank.
* Replaced the max_msa option with max_seq and max_extra_seq to make the limits easier to control (see the migration sketch below).
* batch.py: rank_by was not computed correctly for multimer ranking.
* Updated AlphaFold2_batch.ipynb.
* Added TPU support and removed the TPU warning.
* Disabled BiopythonDeprecationWarning.
* Moved load_models_and_params to before the job loop (sokrypton#368), to avoid issues when resuming from an existing directory with already finished jobs; later changed to check whether it is the first job instead of moving load_models_and_params.
* recompile_padding update: padding is now defined as a constant int instead of a float.
* Added iptm support to ptm (for complexes) and removed an extra printout.
* Added a save_recycles option to save intermediate results.
* Updated pyproject.toml to use the latest alphafold-colabfold v2.3, reverted that change (commit 05ec548), then updated alphafold-colabfold to 2.3.1.
* Attempted to fix the tests: updated the test data and test_colabfold.py.
* Added a link back to the old version.
* Updated the notebooks to use "main" instead of "beta".

---------

Co-authored-by: Tom Goddard <[email protected]>
Co-authored-by: YoshitakaMo <[email protected]>
Co-authored-by: Milot Mirdita <[email protected]>
Co-authored-by: Dennis Svedberg <[email protected]>
Co-authored-by: Martin Steinegger <[email protected]>
Co-authored-by: Martin Steinegger <[email protected]>
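One of the API changes above replaces the single max_msa string with two integer options. A minimal migration sketch, based on the load_models_and_params signature in the colabfold/alphafold/models.py diff further down this page; the 512/1024 values are illustrative, not defaults, and the call only runs if the AlphaFold weights are already present in data_dir:

```python
from pathlib import Path

from colabfold.alphafold.models import load_models_and_params

# v2.2 and earlier: one colon-separated string set both MSA limits
# load_models_and_params(num_models=1, use_templates=False, max_msa="512:1024")

# v2.3: two explicit integer options replace max_msa
model_runner_and_params = load_models_and_params(
    num_models=1,
    use_templates=False,
    data_dir=Path("."),   # directory containing the downloaded AlphaFold weights
    max_seq=512,          # was the first field of max_msa (max_msa_clusters)
    max_extra_seq=1024,   # was the second field of max_msa (max_extra_msa)
)
```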
1 parent 28c78c6 commit 787ffc6


47 files changed: +2874 −3388 lines

.github/workflows/test.yml

Lines changed: 7 additions & 2 deletions
@@ -1,7 +1,12 @@
 name: Test
 
-on: [ push, pull_request ]
-
+on:
+  push:
+    branches:
+      - main
+  pull_request:
+    branches:
+      - main
 jobs:
   run-tests:
     runs-on: ubuntu-latest

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -7,3 +7,4 @@
 /tmalign
 /pytestemp
 __pycache__
+.DS_Store

AlphaFold2.ipynb

Lines changed: 168 additions & 109 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 12 additions & 12 deletions
@@ -1,15 +1,7 @@
-# ColabFold
+# ColabFold - v1.5.0
 
 ```diff
-+ 2022/01/03: The MSA server's faulty hardware from 12/26 was replaced.
-+             There were intermittent failures on 12/26 and 1/3. Currently,
-+             there are no known issues. Let us know if you experience any.
-+ 2022/10/10: Bugfix: random_seed was not being used for alphafold-multimer.
-+             Same structure was returned regardless of defined seed. This
-+             has been fixed!
-+ 2022/07/13: We have set up a new ColabFold MSA server provided by Korean
-+             Bioinformation Center. It provides accelerated MSA generation,
-+             we updated the UniRef30 to 2022_02 and PDB/PDB70 to 220313.
++ 04Feb2023: ColabFold updated to use AlphaFold v2.3.1!
 ```
 <p align="center"><img src="https://github.com/sokrypton/ColabFold/raw/main/.github/ColabFold_Marv_Logo.png" height="250"/></p>
 
@@ -28,7 +20,7 @@
 | [OmegaFold](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/omegafold.ipynb) | Yes | Maybe | No | No| No |
 ||
 | **OLD retired notebooks** | | | | | |
-| [AlphaFold2_complexes](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2_complexes.ipynb) | No | Yes | No | No | No |
+| [AlphaFold2_complexes](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_complexes.ipynb) | No | Yes | No | No | No |
 | [AlphaFold2_jackhmmer](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold_wJackhmmer.ipynb) | Yes | No | Yes | Yes | No |
 | [AlphaFold2_noTemplates_noMD](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/verbose/alphafold_noTemplates_noMD.ipynb) |
 | [AlphaFold2_noTemplates_yesMD](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/verbose/alphafold_noTemplates_yesMD.ipynb) |
@@ -158,6 +150,15 @@ Searches against the ColabFoldDB can be done in two different modes:
 -----------------
 **OLD Updates**
 ```diff
+2023/01/03: The MSA server's faulty hardware from 12/26 was replaced.
+            There were intermittent failures on 12/26 and 1/3. Currently,
+            there are no known issues. Let us know if you experience any.
+2022/10/10: Bugfix: random_seed was not being used for alphafold-multimer.
+            Same structure was returned regardless of defined seed. This
+            has been fixed!
+2022/07/13: We have set up a new ColabFold MSA server provided by Korean
+            Bioinformation Center. It provides accelerated MSA generation,
+            we updated the UniRef30 to 2022_02 and PDB/PDB70 to 220313.
 11Mar2022: We use in default AlphaFold-multimer-v2 weights for complex modeling.
            We also offer the old complex modes "AlphaFold-ptm" or "AlphaFold-multimer-v1"
 04Mar2022: ColabFold now uses a much more powerful server for MSAs and searches through the ColabFoldDB instead of BFD/MGnify.
@@ -196,4 +197,3 @@ Searches against the ColabFoldDB can be done in two different modes:
 20Nov2021 "AMBER" is fixed thanks to Kevin Pan
 ```
 -----------------
-

batch/AlphaFold2_batch.ipynb

Lines changed: 1 addition & 1 deletion
@@ -260,4 +260,4 @@
    ]
   }
  ]
-}
+}

beta/AlphaFold2_advanced.ipynb

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@
    "colab_type": "text"
   },
   "source": [
-   "<a href=\"https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+   "<a href=\"https://colab.research.google.com/github/sokrypton/ColabFold/blob/beta/beta/AlphaFold2_advanced.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
File renamed without changes.

colabfold/alphafold/models.py

Lines changed: 96 additions & 106 deletions
@@ -1,9 +1,7 @@
 from pathlib import Path
 from functools import wraps, partialmethod
 from typing import Tuple, List, Optional
-
 import haiku
-
 from alphafold.model import model, config, data
 from alphafold.model.modules import AlphaFold
 from alphafold.model.modules_multimer import AlphaFold as AlphaFoldMultimer
@@ -12,129 +10,121 @@
 def load_models_and_params(
     num_models: int,
     use_templates: bool,
-    num_recycle: int = 3,
+    num_recycles: Optional[int] = None,
+    recycle_early_stop_tolerance: Optional[float] = None,
     num_ensemble: int = 1,
     model_order: Optional[List[int]] = None,
     model_suffix: str = "_ptm",
     data_dir: Path = Path("."),
-    recompile_all_models: bool = False,
     stop_at_score: float = 100,
-    rank_by: str = "plddt",
-    return_representations: bool = False,
-    training: bool = False,
-    max_msa: str = None,
+    rank_by: str = "auto",
+    max_seq: Optional[int] = None,
+    max_extra_seq: Optional[int] = None,
+    use_cluster_profile: Optional[bool] = None,
+    use_fuse: bool = True,
+    use_bfloat16: bool = True,
+    use_dropout: bool = False,
+
 ) -> List[Tuple[str, model.RunModel, haiku.Params]]:
     """We use only two actual models and swap the parameters to avoid recompiling.
 
     Note that models 1 and 2 have a different number of parameters compared to models 3, 4 and 5,
     so we load model 1 and model 3.
     """
 
-    if return_representations:
-        # this forces the AlphaFold to always return representations
-        AlphaFold.__call__ = partialmethod(
-            AlphaFold.__call__, return_representations=True
-        )
-
-        AlphaFoldMultimer.__call__ = partialmethod(
-            AlphaFoldMultimer.__call__, return_representations=True
-        )
-
-    if not model_order:
-        model_order = [3, 4, 5, 1, 2]
-
     # Use only two model and later swap params to avoid recompiling
     model_runner_and_params: [Tuple[str, model.RunModel, haiku.Params]] = []
 
-    if recompile_all_models:
-        for n, model_number in enumerate(model_order):
-            if n == num_models:
-                break
-            model_name = f"model_{model_number}"
-            params = data.get_model_haiku_params(
-                model_name=model_name + model_suffix, data_dir=str(data_dir)
-            )
-            model_config = config.model_config(model_name + model_suffix)
-            model_config.model.stop_at_score = float(stop_at_score)
-            model_config.model.stop_at_score_ranker = rank_by
-            if max_msa != None:
-                max_msa_clusters, max_extra_msa = [int(x) for x in max_msa.split(":")]
-                model_config.data.eval.max_msa_clusters = max_msa_clusters
-                model_config.data.common.max_extra_msa = max_extra_msa
-            if model_suffix == "_ptm":
-                model_config.data.common.num_recycle = num_recycle
-                model_config.model.num_recycle = num_recycle
-                model_config.data.eval.num_ensemble = num_ensemble
-            elif model_suffix.startswith("_multimer"):
-                model_config.model.num_recycle = num_recycle
-                if training:
-                    model_config.model.num_ensemble_train = num_ensemble
-                else:
-                    model_config.model.num_ensemble_eval = num_ensemble
-            model_runner_and_params.append(
-                (
-                    model_name,
-                    model.RunModel(model_config, params, is_training=training),
-                    params,
-                )
-            )
+    if model_order is None: model_order = [1, 2, 3, 4, 5]
+
+    model_build_order = [3, 4, 5, 1, 2]
+    if "multimer" in model_suffix:
+        models_need_compilation = [3]
     else:
+        # only models 1,2 use templates
         models_need_compilation = [1, 3] if use_templates else [3]
-        model_build_order = [3, 4, 5, 1, 2]
-        model_runner_and_params_build_order: [
-            Tuple[str, model.RunModel, haiku.Params]
-        ] = []
-        model_runner = None
-        for model_number in model_build_order:
-            if model_number in models_need_compilation:
-                model_config = config.model_config(
-                    "model_" + str(model_number) + model_suffix
-                )
-                model_config.model.stop_at_score = float(stop_at_score)
-                model_config.model.stop_at_score_ranker = rank_by
-                if max_msa != None:
-                    max_msa_clusters, max_extra_msa = [
-                        int(x) for x in max_msa.split(":")
-                    ]
-                    model_config.data.eval.max_msa_clusters = max_msa_clusters
-                    model_config.data.common.max_extra_msa = max_extra_msa
-                if model_suffix == "_ptm":
-                    model_config.data.common.num_recycle = num_recycle
-                    model_config.model.num_recycle = num_recycle
-                    model_config.data.eval.num_ensemble = num_ensemble
-                elif model_suffix.startswith("_multimer"):
-                    model_config.model.num_recycle = num_recycle
-                    if training:
-                        model_config.model.num_ensemble_train = num_ensemble
-                    else:
-                        model_config.model.num_ensemble_eval = num_ensemble
-                model_runner = model.RunModel(
-                    model_config,
-                    data.get_model_haiku_params(
-                        model_name="model_" + str(model_number) + model_suffix,
-                        data_dir=str(data_dir),
-                    ),
-                    is_training=training,
-                )
-            model_name = f"model_{model_number}"
+
+    model_runner_and_params_build_order: [Tuple[str, model.RunModel, haiku.Params]] = []
+    model_runner = None
+    for model_number in model_build_order:
+        if model_number in models_need_compilation:
+
+            # get configurations
+            model_config = config.model_config("model_" + str(model_number) + model_suffix)
+            model_config.model.stop_at_score = float(stop_at_score)
+            model_config.model.rank_by = rank_by
+
+            # set dropouts
+            model_config.model.global_config.eval_dropout = use_dropout
+
+            # set bfloat options
+            model_config.model.global_config.bfloat16 = use_bfloat16
+
+            # set fuse options
+            model_config.model.embeddings_and_evoformer.evoformer.triangle_multiplication_incoming.fuse_projection_weights = use_fuse
+            model_config.model.embeddings_and_evoformer.evoformer.triangle_multiplication_outgoing.fuse_projection_weights = use_fuse
+            if "multimer" in model_suffix or model_number in [1,2]:
+                model_config.model.embeddings_and_evoformer.template.template_pair_stack.triangle_multiplication_incoming.fuse_projection_weights = use_fuse
+                model_config.model.embeddings_and_evoformer.template.template_pair_stack.triangle_multiplication_outgoing.fuse_projection_weights = use_fuse
+
+            # set number of sequences options
+            if max_seq is not None:
+                if "multimer" in model_suffix:
+                    model_config.model.embeddings_and_evoformer.num_msa = max_seq
+                else:
+                    model_config.data.eval.max_msa_clusters = max_seq
+
+            if max_extra_seq is not None:
+                if "multimer" in model_suffix:
+                    model_config.model.embeddings_and_evoformer.num_extra_msa = max_extra_seq
+                else:
+                    model_config.data.common.max_extra_msa = max_extra_seq
+
+            # set number of recycles and ensembles
+            if "multimer" in model_suffix:
+                if num_recycles is not None:
+                    model_config.model.num_recycle = num_recycles
+                if use_cluster_profile is not None:
+                    model_config.model.embeddings_and_evoformer.use_cluster_profile = use_cluster_profile
+                model_config.model.num_ensemble_eval = num_ensemble
+            else:
+                if num_recycles is not None:
+                    model_config.data.common.num_recycle = num_recycles
+                    model_config.model.num_recycle = num_recycles
+                model_config.data.eval.num_ensemble = num_ensemble
+
+
+            if recycle_early_stop_tolerance is not None:
+                model_config.model.recycle_early_stop_tolerance = recycle_early_stop_tolerance
+
+            # get model runner
             params = data.get_model_haiku_params(
-                model_name=model_name + model_suffix, data_dir=str(data_dir)
+                model_name="model_" + str(model_number) + model_suffix,
+                data_dir=str(data_dir), fuse=use_fuse)
+            model_runner = model.RunModel(
+                model_config,
+                params,
             )
-            # keep only parameters of compiled model
-            params_subset = {}
-            for k in model_runner.params.keys():
-                params_subset[k] = params[k]
+
+        model_name = f"model_{model_number}"
+        params = data.get_model_haiku_params(
+            model_name=model_name + model_suffix, data_dir=str(data_dir), fuse=use_fuse,
+        )
+        # keep only parameters of compiled model
+        params_subset = {}
+        for k in model_runner.params.keys():
+            params_subset[k] = params[k]
 
-            model_runner_and_params_build_order.append(
-                (model_name, model_runner, params_subset)
-            )
-        # reorder model
-        for n, model_number in enumerate(model_order):
-            if n == num_models:
+        model_runner_and_params_build_order.append(
+            (model_name, model_runner, params_subset)
+        )
+    # reorder model
+    for n, model_number in enumerate(model_order):
+        if n == num_models:
+            break
+        model_name = f"model_{model_number}"
+        for m in model_runner_and_params_build_order:
+            if model_name == m[0]:
+                model_runner_and_params.append(m)
                 break
-            model_name = f"model_{model_number}"
-            for m in model_runner_and_params_build_order:
-                if model_name == m[0]:
-                    model_runner_and_params.append(m)
-                    break
     return model_runner_and_params
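The docstring above describes the parameter-swap trick: only one configuration per parameter shape is compiled, and every other model reuses that compiled RunModel with its own haiku params swapped in. A hedged usage sketch, assuming the AlphaFold "_ptm" weights have already been downloaded into data_dir; it only inspects what the function returns:

```python
from pathlib import Path

from colabfold.alphafold.models import load_models_and_params

model_runner_and_params = load_models_and_params(
    num_models=5,
    use_templates=False,    # without templates, only model 3 is compiled
    num_recycles=3,
    model_suffix="_ptm",
    data_dir=Path("."),     # location of the downloaded AlphaFold weights
    use_fuse=True,          # fused triangle-multiplication weights (new default)
    use_bfloat16=True,      # bfloat16 activations (new default)
)

# In this configuration all five entries share one compiled RunModel;
# only the haiku params differ, so no recompilation happens between models.
shared_runner = model_runner_and_params[0][1]
for model_name, model_runner, params in model_runner_and_params:
    print(model_name, model_runner is shared_runner, len(params))
```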
