Diffusion scheduling code making abnormal protein output #29

jmoojun · 2024-07-03T07:51:11Z

I believe your code has some discrepancies when compared to the pseudocode in your article.

Algorithm 1 TRAINING
Input: Training examples of structures, sequences, and MSAs {(Si, Ai, Mi)}
for all (Si, Ai, Mi) do
    Extract x1 ← BetaCarbons(Si)
    Sample x0 ∼ HarmonicPrior(length(Ai))
    Align x0 ← RMSDAlign(x0, x1)
    Sample t ∼ Uniform[0, 1]
    Interpolate xt ← t · x1 + (1 − t) · x0
    Predict Ŝi ← AlphaFold(Ai, Mi, xt, t)
    Optimize loss L = FAPE²(Ŝi, Si)

Does this pseudocode correspond to your code in wrapper.py ModelWrapper.distillation_training_step?

for t, s in zip(schedule[:-1], schedule[1:]):
    output = self.teacher(batch, prev_outputs=prev_outputs)
    pseudo_beta = pseudo_beta_fn(batch['aatype'], output['final_atom_positions'], None)
    noisy = rmsdalign(pseudo_beta, noisy)
    noisy = (s / t) * noisy + (1 - s / t) * pseudo_beta

The same loop appears in ModelWrapper.inference.

The atoms in the PDB output seem to be clustered together very densely, which makes the result an abnormal protein structure.

bjing2016 · 2024-09-02T23:06:16Z

Which output are you showing here?
In the code, the time index is flipped --- so t=1 in the paper corresponds to t=0 in the code, and vice versa. Sorry that this is not documented more clearly.
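For anyone else comparing the two conventions, here is a minimal sketch (not the repository's code) of why the update quoted above is consistent with the paper's interpolation once the time index is flipped. It assumes a perfect predictor, skips the RMSD alignment step, and uses plain lists of floats in place of coordinates; `interpolate_code_time`, `step`, and the schedule values are hypothetical names and numbers chosen for illustration.

```python
import random


def interpolate_code_time(x1, x0, tau):
    """Point on the line between data x1 and noise x0 in the code's
    convention: tau = 1 is pure noise, tau = 0 is clean data.
    This is the paper's x_t = t*x1 + (1-t)*x0 with t = 1 - tau."""
    return [(1 - tau) * a + tau * b for a, b in zip(x1, x0)]


def step(noisy, x1_pred, t, s):
    """Same arithmetic as the update quoted in the issue: move from
    time t to an earlier time s < t, toward the predicted structure."""
    return [(s / t) * n + (1 - s / t) * p for n, p in zip(noisy, x1_pred)]


rng = random.Random(0)
x1 = [rng.gauss(0, 1) for _ in range(6)]  # stand-in for clean coordinates
x0 = [rng.gauss(0, 1) for _ in range(6)]  # stand-in for a prior sample

# Illustrative schedule: starts at 1 (noise) and ends at 0 (data).
schedule = [1.0, 0.75, 0.5, 0.25, 0.0]

noisy = list(x0)
max_err = 0.0
for t, s in zip(schedule[:-1], schedule[1:]):
    # Assumption: the model predicts x1 exactly at every step.
    noisy = step(noisy, x1, t, s)
    expected = interpolate_code_time(x1, x0, s)
    err = max(abs(a - b) for a, b in zip(noisy, expected))
    max_err = max(max_err, err)

print("largest deviation from the interpolation line:", max_err)
```

Under these assumptions every intermediate `noisy` stays on the straight line between `x0` and `x1`, and the last step (s = 0) returns `x1`. So with t_code = 1 - t_paper the loop walks the paper's interpolation from noise to data rather than away from it.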
