
Optimize beam search #269

Draft · bittremieux wants to merge 4 commits into dev

Conversation

bittremieux
Collaborator

- Move some initialization code for discarding invalid beams (i.e. which tokens are neutral losses or N-terminal residues) to `__init__` so that it is only done when initializing the model, instead of every time a batch of beams is checked for termination (see the first sketch after this list).
- Move peptide detokenizing to the point where it's absolutely needed. This is still a slow step (currently done in DepthCharge) that could maybe be skipped by calculating masses from tokens directly? @wfondrie What do you think?
- Only calculate the peptide mass once and then subtract the masses of the neutral losses, instead of adding the neutral losses to the peptide and then calculating the mass of the entire peptide.
- Early stopping when comparing delta masses for different isotopes.
- Numba JIT compilation of the mass error calculation (see the second sketch below). This requires moving the precursor m/z and charge tensors to the CPU, though, so I don't know whether it's actually beneficial. Any thoughts @wfondrie @melihyilmaz?
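
A minimal, self-contained sketch of the first and third points (all names here are hypothetical, not the actual Casanovo/DepthCharge API):

```python
# Illustrative sketch only; class and attribute names are hypothetical and do
# not reflect the actual Casanovo/DepthCharge code.

H2O = 18.010565  # monoisotopic mass of water
NH3 = 17.026549  # monoisotopic mass of ammonia


class BeamTerminationHelper:
    def __init__(self, idx2token: dict, token_masses: dict):
        # Determine once, at model initialization, which token indices are
        # neutral losses or N-terminal residues, instead of rebuilding these
        # sets every time a batch of beams is checked for termination.
        self.idx2token = idx2token
        self.token_masses = token_masses
        self.neutral_loss_tokens = {
            i for i, tok in idx2token.items() if tok.startswith("-")
        }
        self.nterm_tokens = {
            i for i, tok in idx2token.items() if tok.startswith("+")
        }

    def candidate_masses(self, token_idxs: list) -> list:
        # Calculate the peptide mass a single time ...
        peptide_mass = H2O + sum(
            self.token_masses[self.idx2token[i]] for i in token_idxs
        )
        # ... and derive neutral-loss variants by subtracting the loss mass,
        # instead of appending the loss to the peptide and recomputing the
        # mass of the entire modified peptide.
        return [peptide_mass, peptide_mass - H2O, peptide_mass - NH3]
```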

Because the stop token is always predicted last, we don't need to take peptide order into account.
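
The isotope early stopping and the Numba-compiled mass error check could look roughly like this (a hedged sketch; the constants and the exact comparison are illustrative):

```python
# Illustrative sketch; not the actual Casanovo implementation.
import numba as nb

PROTON = 1.00727646677
ISOTOPE_SPACING = 1.00335  # approximate spacing between isotopic peaks


@nb.njit
def matches_precursor(calc_mass, precursor_mz, charge, max_isotope, tol_ppm):
    """Check whether the calculated peptide mass matches the observed
    precursor for any isotope in 0..max_isotope."""
    obs_mass = (precursor_mz - PROTON) * charge
    for isotope in range(max_isotope + 1):
        shifted = calc_mass + isotope * ISOTOPE_SPACING
        ppm = (shifted - obs_mass) / obs_mass * 1e6
        if abs(ppm) <= tol_ppm:
            return True
        if ppm > tol_ppm:
            # The shifted mass only grows with the isotope index, so once we
            # overshoot the tolerance there is no point checking further
            # isotopes: stop early.
            return False
    return False
```

Because the compiled function operates on plain CPU scalars/arrays, the precursor m/z and charge tensors have to be moved off the GPU before calling it, which is the trade-off flagged in the last bullet point.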

codecov bot commented Dec 6, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (235420f) 89.43% compared to head (bc56c49) 88.71%.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #269      +/-   ##
==========================================
- Coverage   89.43%   88.71%   -0.73%     
==========================================
  Files          12       12              
  Lines         909      913       +4     
==========================================
- Hits          813      810       -3     
- Misses         96      103       +7     


@wfondrie
Collaborator

wfondrie commented Dec 6, 2023

> Move peptide detokenizing to the point where it's absolutely needed. This is still a slow step (currently done in DepthCharge) that could maybe be skipped by calculating masses from tokens directly? @wfondrie What do you think?

I'm working on a bigger PR for depthcharge right now and can add support for this 👍
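
For reference, one possible shape for that support (purely a sketch, not the actual DepthCharge API) would be a per-token mass lookup aligned with the tokenizer vocabulary, so that masses can be summed directly over token indices without building peptide strings:

```python
# Hypothetical sketch; DepthCharge's tokenizer API may end up looking different.
import torch

H2O = 18.010565


def masses_from_tokens(tokens: torch.Tensor, token_masses: torch.Tensor) -> torch.Tensor:
    """Sum residue masses directly from token indices.

    ``tokens`` has shape (n_beams, max_len) and is padded with index 0;
    ``token_masses`` maps every vocabulary index to its residue mass, with
    index 0 mapped to 0.0 so padding contributes nothing.
    """
    return token_masses[tokens].sum(dim=1) + H2O
```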

> Numba JIT compilation of the mass error calculation. This requires moving the precursor m/z and charge tensors to the CPU, though, so I don't know whether it's actually beneficial. Any thoughts @wfondrie @melihyilmaz?

That's a good question. I don't know either 🤔. I guess the alternative is to move the calculated m/z to the GPU and perform calculations using PyTorch.
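
A rough sketch of that alternative (hypothetical names; the isotope handling is illustrative): the ppm errors for all beams and isotopes can be computed in one vectorized step on whatever device the tensors already live on.

```python
# Illustrative sketch of a fully vectorized, device-agnostic ppm calculation.
import torch

PROTON = 1.00727646677
ISOTOPE_SPACING = 1.00335


def ppm_errors(calc_mass: torch.Tensor, precursor_mz: torch.Tensor,
               charge: torch.Tensor, max_isotope: int = 1) -> torch.Tensor:
    """Compute ppm errors for all beams and isotopes at once.

    All inputs are 1-D tensors of length n_beams; the result has shape
    (n_beams, max_isotope + 1) and stays on the inputs' device.
    """
    obs_mass = (precursor_mz - PROTON) * charge
    isotopes = torch.arange(
        max_isotope + 1, device=calc_mass.device, dtype=calc_mass.dtype
    )
    shifted = calc_mass[:, None] + isotopes[None, :] * ISOTOPE_SPACING
    return (shifted - obs_mass[:, None]) / obs_mass[:, None] * 1e6
```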

@bittremieux
Collaborator Author

bittremieux commented Dec 6, 2023

> I guess the alternative is to move the calculated m/z to the GPU and perform calculations using PyTorch.

Speeding this up with PyTorch would be ideal, of course. However, it would require some significant refactoring to make it efficient. Currently each beam is processed consecutively, and it's not immediately obvious how to vectorize/parallelize that to harness the GPU.

The device changes between initialization and actual running, so doing this in `__init__` is insufficient.
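
If the precomputed values are tensors, one common workaround (a sketch, assuming the model is a `torch.nn.Module`) is to register them as buffers in `__init__` so they follow the module whenever it is moved to another device:

```python
# Sketch only; the attribute name and size are hypothetical.
import torch


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Buffers are moved together with the module by .to(device) / .cuda(),
        # so a tensor precomputed in __init__ ends up on the right device.
        self.register_buffer("token_masses", torch.zeros(30))


model = Model().to("cpu")  # replace with "cuda" at run time; the buffer follows
```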